matchindex {truecluster} | R Documentation |
Permutes rows and columns of a cross-table to match the marginal classes.
matchindex(observed, method = c("heuristic", "truematch", "tracemax")[1]) matchtable(observed, method = c("heuristic", "truematch", "tracemax")[1]) matchcalcs(k)
observed |
an integer matrix representing a cross-table, non-square allowed |
method |
one of c("heuristic","truematch","tracemax"), see details |
k |
number of categories to match |
Method heuristic – the default – matches using the truematch heuristic, i.e. recursively calculating resisuals and removing the row/col with the biggest residuals. The number of residuals calculated during matching is given by function matchcalcs
= 1/6*(n*(n+1)*(2*n+1)) - 1 (Otto, Forster, Analysis 1, Diffential und Integralrechnung einer Veränderlichen, Viehweg, Braunschweig (1992).). Thus the truematch heuristic has time-complexity O(n^3), and memory-complexity O(n^2), if replacing recursion by while loops as done here.
Method truematch – the truematch algorithm – calculates residuals just once and then applies trace maximization (Munkres Hungarian Method).
Method tracemax applies trace maximization directly without calculating residuals.
All methods can cope with (non-square) rectangular matrices. All methods break ties by random. All methods use C implementations. All methods have polynomial time complexity.
Function matchindex
returns a list with components
row |
permutation index for the rows |
col |
permutation index for the columns |
row |
permutation index for the rows |
col |
permutation index for the columns |
Function matchcalcs
returns the number of residuals to calculate during matching
Jens Oehlschlägel
xx
munkres
, for another implementation of the Hungarian method see solve_LSAP
, for greedy heuristics see matchClasses
m <- rbind(c(1, 98), c(0, 1)) i <- matchindex(m) i m[i$row, i$col] i <- matchindex(m, method="truematch") i m[i$row, i$col] i <- matchindex(m, method="tracemax") i m[i$row, i$col] ## Not run: library(e1071) library(clue) #library(lpSolve) cat("Check quality with respect to trace maximization\n") x <- integer(6) names(x) <- c("e","s","l","t","r","m") for (i in 1:400){ k <- 5 m <- table(sample(1:k, 10000, TRUE), sample(1:k, 10000, TRUE)) i.e <- matchClasses(m, method="exact") m.e <- m[,i.e] i.s <- solve_LSAP(m, maximum=TRUE) m.s <- m[,i.s] i.l <- apply(lp.assign(-m)$solution, 1, which.max) m.l <- m[,i.l] i.t <- matchindex(m, method="tracemax") m.t <- m[i.t$row, i.t$col] i.r <- matchindex(m, method="truematch") m.r <- m[i.r$row, i.r$col] i.m <- matchindex(m, method="truecluster") m.m <- m[i.m$row, i.m$col] x["e"] <- x["e"] + sum(diag(m.e)) x["s"] <- x["s"] + sum(diag(m.s)) x["l"] <- x["l"] + sum(diag(m.l)) x["t"] <- x["t"] + sum(diag(m.t)) x["r"] <- x["r"] + sum(diag(m.r)) x["m"] <- x["m"] + sum(diag(m.m)) } x / max(x) cat("Check speed\n") K <- 2^(2:9) n <- length(K) tim <- matrix(NA, nrow=n, ncol=3, dimnames=list(NULL, c("s","m","t"))) for (i in 1:n){ k <- K[i] m <- table(sample(1:k, 1000000, TRUE), sample(1:k, 1000000, TRUE)) tim[i,"s"] <- system.time({solve_LSAP(m)})[3] tim[i,"m"] <- system.time({matchindex(m, method="truecluster")})[3] tim[i,"t"] <- system.time({matchindex(m, method="tracemax")})[3] matplot(K, tim, pch=c("s","m","t")) } ## End(Not run)