% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/SubtypingOmicsData.R
\name{SubtypingOmicsData}
\alias{SubtypingOmicsData}
\title{SubtypingOmicsData: Subtyping multi-omics data}
\usage{
SubtypingOmicsData(dataList, Kmax = 10, noisePercent = "med", iter = 200,
  kmIter = 50, agreementCutoff = 0.5)
}
\arguments{
\item{dataList}{list of data matrices or data frames. Each matrix represents a data type where the rows are samples and the columns are features. The matrices must have the same set of samples.}

\item{Kmax}{the maximum number of clusters. Default value is 10.}

\item{noisePercent}{the parameter that determines the noise standard deviation. Default is "med", i.e. the noise standard deviation is the median standard deviation of the features. If noisePercent is numeric, then the noise standard deviation is noisePercent * sd(data).}

\item{iter}{the number of perturbed datasets. Default value is 200.}

\item{kmIter}{the number of sets of initial centers tried in k-means clustering. Default value is 50.}

\item{agreementCutoff}{the agreement threshold above which partitions are considered consistent. Default value is 0.5.}
}
\value{
\emph{SubtypingOmicsData} returns a list with at least the following components:
\item{groups}{A vector of labels indicating the cluster to which each sample is allocated in Stage I.}
\item{groups2}{A vector of labels indicating the cluster to which each sample is allocated in Stage II.}
\item{dataTypeResult}{A list of results for the individual data types. Each element of the list is the result of \emph{PerturbationClustering} for the corresponding data matrix provided in dataList.}
}
\description{
Perform subtyping using multiple types of data.
}
\details{
The input is a list of data matrices where each matrix represents the molecular measurements of a data type. The matrices have the same number of rows (samples). The algorithm first partitions each data type using the function \emph{PerturbationClustering}. It then merges the connectivities across data types into similarity matrices. Similarity-based clustering algorithms, such as partitioning around medoids (pam), hierarchical clustering (hclust), and dynamicTreeCut, are used to partition the resulting similarity matrices. The algorithm returns the partitioning (from the different similarity-based algorithms) that agrees the most with the individual data types. That completes Stage I.
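
As a rough illustration of the merging step (a sketch under stated assumptions, not the package's internal code), a partition can be encoded as a binary connectivity matrix, and the matrices from several data types can be combined into one similarity matrix. Simple averaging is used here as a stand-in for the actual merging rule, and the helper \code{connectivity} and the toy label vectors are hypothetical:

\preformatted{
# Hypothetical helper: 1 if two samples share a cluster label, 0 otherwise
connectivity <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Toy partitions of five samples from two imagined data types
labelsA <- c(1, 1, 2, 2, 2)
labelsB <- c(1, 1, 1, 2, 2)

# Average the connectivities (assumed merging rule): entries near 1 mean
# that the data types agree on placing the two samples together
similarity <- (connectivity(labelsA) + connectivity(labelsB)) / 2

# Partition the similarity with hierarchical clustering, one possible choice
hc <- hclust(as.dist(1 - similarity), method = "average")
cutree(hc, k = 2)
}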

In Stage II, the algorithm attempts to split each discovered group if there is strong agreement between data types, or if the subtyping in Stage I is very unbalanced.
}
\examples{

# Load the kidney renal clear cell carcinoma (KIRC) data
data(KIRC)
# Perform subtyping on the multi-omics data
dataList <- list(mydatGE, mydatME, mydatMI)
names(dataList) <- c("GE", "ME", "MI")
result <- SubtypingOmicsData(dataList = dataList, Kmax = 10, noisePercent = "med", iter = 50)
# Plot the Kaplan-Meier curves and calculate the Cox p-value
library(survival)
groups <- result$groups
groups2 <- result$groups2
# Map each Stage II label to a color: labels shared with Stage I keep their
# value, and labels that appear only in Stage II get the next unused indices
shared <- intersect(unique(groups2), unique(groups))
a <- shared
names(a) <- shared
newLabels <- setdiff(unique(groups2), unique(groups))
# seq_along() is safe when there is exactly one new label; index by name
a[as.character(newLabels)] <- seq_along(newLabels) + max(groups)
colors <- a[levels(factor(groups2))]
coxFit <- coxph(Surv(time = Survival, event = Death) ~ as.factor(groups2),
                data = survival, ties = "exact")
mfit <- survfit(Surv(Survival, Death == 1) ~ as.factor(groups2), data = survival)
plot(mfit, col = colors, main = "Survival curves for KIRC, level 2",
     xlab = "Days", ylab = "Survival", lwd = 2)
legend("bottomright",
       legend = paste0("Cox p-value: ", round(summary(coxFit)$sctest[3], digits = 5)))
legend("bottomleft", fill = colors,
       legend = paste0("Group ", levels(factor(groups2)), ": ",
                       table(groups2)[levels(factor(groups2))]))
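
# Illustrative addition, not part of the original example: a quick way to
# inspect the returned list, using only base R and the documented components
# 'groups', 'groups2' and 'dataTypeResult' of 'result'.
# Cross-tabulate Stage I against Stage II labels to see which groups were split
print(table(StageI = result$groups, StageII = result$groups2))
# One element per input data matrix; show only the top level of the structure
str(result$dataTypeResult, max.level = 1)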

}
\references{
Tin Nguyen, Rebecca Tagett, Diana Diaz, and Sorin Draghici (2015) A novel approach for data integration and disease subtyping. Submitted.
}
\seealso{
\code{\link{PerturbationClustering}}, \code{\link{hclust}}, \code{\link{pam}}, \code{\link{dynamicTreeCut}}, \code{\link{clusGap}}
}
\author{
Tin Nguyen and Sorin Draghici
}
