extract_plot {fabia} R Documentation

Extraction of Biclusters and Plotting of the Results

Description

extract_plot: R implementation of extract_plot. It extracts biclusters from the loading matrix L and the factor matrix Z and plots the results.

Usage


extract_plot(X,L,Z,thresZ=0.5,ti,thresL=NULL,Y=NULL,x11b=TRUE,norm=1)

Arguments

X original data matrix.
L loading, left matrix.
Z factor, right matrix.
thresZ threshold for sample belonging to bicluster (default 0.5).
thresL threshold for loading belonging to bicluster (estimated if not given).
ti plot title.
Y noise free data matrix.
x11b plot on screen (default TRUE).
norm data standardization: 1 (default) standardizes using the mean, 2 standardizes using the median.

Details

Essentially the model is the sum of outer products of vectors. The number of summands p is the number of biclusters.

X = L Z + U

X = sum_{i=1}^{p} L_i (Z_i)^T + U
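The equivalence of the two forms above can be checked numerically. The following is a minimal sketch on random toy matrices; the dimensions and variable names are illustrative assumptions and are not taken from the package.

```r
# Toy dimensions (hypothetical): n observations, l samples, p biclusters
set.seed(1)
n <- 6; l <- 4; p <- 2
L <- matrix(rnorm(n * p), n, p)   # loadings: one column L_i per bicluster
Z <- matrix(rnorm(p * l), p, l)   # factors: one row Z_i per bicluster

# Sum of the p outer products L_i (Z_i)^T
LZ_sum <- matrix(0, n, l)
for (i in seq_len(p)) {
  LZ_sum <- LZ_sum + outer(L[, i], Z[i, ])
}

# Largest absolute deviation from the matrix product L Z
max_diff <- max(abs(LZ_sum - L %*% Z))
```

Here max_diff is zero up to floating point rounding, so each summand can be read as the contribution of one bicluster to the reconstructed data.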

The hidden dimension p is used for kmeans clustering of L_i and Z_i.

The L_i and Z_i are used to extract the bicluster i, where a threshold determines which observations and which samples belong to the bicluster.

The method produces the plots listed below.

Plots: “Y”: noise free data (if available), “X”: data, “LZ”: reconstructed data, “LZ-X”: error, “abs(Z)”: absolute factors, “abs(L)”: absolute loadings, “abs(nL)”: absolute loadings normalized, “abs(nZ)”: absolute factors normalized, “nZ*pmZ”: factors sorted, “pmL*nL”: loadings sorted, “pmL*L*z*pmZ”: reconstructed matrix sorted, “pmL*X*pmZ”: original matrix sorted.

In the above plots the matrix L and the matrix Z are sorted. For sorting, kmeans is first performed in the p-dimensional space, and then the vectors that belong to the same cluster are placed next to each other. This sorting is done for visualization only; in general it is not possible to visualize all biclusters as blocks if they overlap.
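The sorting step can be sketched as follows. This is an illustration under stated assumptions, not the package's internal code: the toy data, the use of p cluster centers, and the names ordL, ordZ, and Xord_demo are hypothetical. Only stats::kmeans from base R is used.

```r
set.seed(1)
n <- 30; l <- 20; p <- 2

# Toy loadings and factors with two non-overlapping biclusters
L <- matrix(rnorm(n * p, sd = 0.1), n, p)
L[1:15, 1]  <- L[1:15, 1] + 3
L[16:30, 2] <- L[16:30, 2] + 3
Z <- matrix(rnorm(p * l, sd = 0.1), p, l)
Z[1, 1:10]  <- Z[1, 1:10] + 2
Z[2, 11:20] <- Z[2, 11:20] + 2

# Shuffle rows and columns so the block structure is hidden
L <- L[sample(n), ]
Z <- Z[, sample(l)]
X <- L %*% Z + matrix(rnorm(n * l, sd = 0.5), n, l)

# kmeans in the p-dimensional space: rows of L and columns of Z
kmL <- kmeans(L, centers = p, nstart = 10)
kmZ <- kmeans(t(Z), centers = p, nstart = 10)

# Put members of the same cluster next to each other
ordL <- order(kmL$cluster)
ordZ <- order(kmZ$cluster)

# The same reordering written with permutation matrices
pmL <- diag(n)[ordL, ]      # permutes rows when multiplied from the left
pmZ <- diag(l)[, ordZ]      # permutes columns when multiplied from the right
Xord_demo <- pmL %*% X %*% pmZ
```

Plotting Xord_demo with image() should show the two biclusters as contiguous blocks, which is the purpose of the sorted plots. With overlapping biclusters a single ordering cannot make every bicluster contiguous.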

In bic the biclusters are extracted according to the largest absolute values of the component i, i.e. the largest values of L_i and the largest values of Z_i. The factors Z_i are normalized to variance 1.

The components of bic are bin, bixv, bixn, biypv, biypn, biynv, and biynn. bin gives the size of the bicluster: number of observations, number of positive samples, number of negative samples. bixv gives the values of the observations that have absolute values above a threshold. They are sorted and bixn gives their names (e.g. gene names). biypv gives the values of the samples that have values above a threshold. They are sorted and biypn gives their names (e.g. sample names). biynv gives the values of the samples that have values below the negative of this threshold. They are sorted and biynn gives their names (e.g. sample names).

That means the samples are divided into two groups, where one group shows large positive values and the other group has negative values with large absolute values. In other words, an observation pattern can be switched on or switched off relative to the average value.
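The thresholding described above can be sketched for a single bicluster i as follows. This is a simplified illustration, not the package's code: the toy data, the fixed value used for thresL (which the function estimates when it is not given), and all variable names are assumptions. The negative sample group is taken as values below -thresZ.

```r
set.seed(1)
n <- 5; l <- 8; p <- 2
L <- matrix(rnorm(n * p), n, p, dimnames = list(paste0("gene", 1:n), NULL))
Z <- matrix(rnorm(p * l), p, l, dimnames = list(NULL, paste0("sample", 1:l)))

thresZ <- 0.5   # default of extract_plot
thresL <- 0.5   # hypothetical fixed value for this sketch

i  <- 1
li <- L[, i]                 # loadings of bicluster i (named vector)
nZ <- Z[i, ] / sd(Z[i, ])    # factor i normalized to variance 1

# Observations with large absolute loading
obs <- which(abs(li) > thresL)
# Samples split into a positive and a negative group
pos <- which(nZ >  thresZ)
neg <- which(nZ < -thresZ)

# Sorted values; names() of each vector gives the member names
x_vals   <- li[obs][order(abs(li[obs]), decreasing = TRUE)]
pos_vals <- sort(nZ[pos], decreasing = TRUE)
neg_vals <- sort(nZ[neg])

# Size of the bicluster: observations, positive samples, negative samples
size <- c(length(obs), length(pos), length(neg))
```

In this sketch x_vals, pos_vals, and neg_vals play the roles that bixv, biypv, and biynv play in the description above, and size corresponds to bin.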

numn gives the indexes of bic with components: numn1 = bix, numn2 = biyp, and numn3 = biyn.

The kmeans clusters are given by biclust with components biclustx (the clustered observations) and biclusty (the clustered samples).

Implementation in R.

Value

bic extracted biclusters.
numn indexes for the extracted biclusters.
biclust clusters of kmeans clustering.
pmZ permutation matrix of Z from kmeans clustering.
pmL permutation matrix of L from kmeans clustering.
nL normalized loadings (left matrix).
nZ normalized factors (right matrix).
Xord original matrix sorted according to kmeans on Z and kmeans on L.

Author(s)

Sepp Hochreiter

See Also

fabi, fabia, fabiap, fabias, fabiasp, mfsc, nmfdiv, nmfeu, nmfsc, nprojfunc, projfunc, make_fabi_data, make_fabi_data_blocks, make_fabi_data_pos, make_fabi_data_blocks_pos, extract_bic, myImagePlot, PlotBicluster, Breast_A, DLBCL_B, Multi_A, fabiaDemo, fabiaVersion

Examples


#---------------
# TEST
#---------------

dat <- make_fabi_data_blocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

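X <- dat[[1]]                      # data matrix
Y <- dat[[2]]                      # noise free data matrix
X <- X - rowMeans(X)               # center each row
XX <- (1/ncol(X))*tcrossprod(X)    # second moments of the rows
dXX <- 1/sqrt(diag(XX)+0.001*as.vector(rep(1,nrow(X))))
X <- dXX*X                         # scale each row (0.001 avoids division by zero)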


resEx <- fabia(X,20,0.3,1.0,1.0,3)

rEx <- extract_plot(X,resEx$L,resEx$Z,ti="FABIA",Y=Y,x11b=FALSE)

rEx$bic[1,]
rEx$bic[2,]
rEx$bic[3,]
rEx$biclust[1,]
rEx$biclust[2,]
rEx$biclust[3,]

## Not run: 
#---------------
# DEMO1
#---------------

dat <- make_fabi_data_blocks(n = 1000,l= 100,p = 10,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]

resToy <- fabia(X,200,0.4,1.0,1.0,13)

rToy <- extract_plot(X,resToy$L,resToy$Z,ti="FABIA",Y=Y)

#---------------
# DEMO2
#---------------

data(Breast_A)

X <- as.matrix(XBreast)

resBreast <- fabia(X,200,0.1,1.0,1.0,5)

rBreast <- extract_plot(X,resBreast$L,resBreast$Z,ti="FABIA Breast cancer(Veer)")

#sorting of predefined labels
CBreast
## End(Not run)

[Package fabia version 0.1.1 Index]