% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MFUZZclustersNumber.R
\name{MFUZZclustersNumber}
\alias{MFUZZclustersNumber}
\title{Automatic choice of the number of clusters to use for
the Mfuzz analysis}
\usage{
MFUZZclustersNumber(
  SEresNorm,
  DATAnorm = TRUE,
  Method = "hcpc",
  Max.clust = 3,
  Min.std = 0.1,
  Plot.Cluster = TRUE,
  path.result = NULL
)
}
\arguments{
\item{SEresNorm}{Results of the function
\code{\link[=DATAnormalization]{DATAnormalization()}}.}

\item{DATAnorm}{\code{TRUE} or \code{FALSE}. \code{TRUE} as default.
\code{TRUE} means the function uses the normalized data.
\code{FALSE} means the function uses the raw counts data.}

\item{Method}{"kmeans" or "hcpc". The method used for selecting the number
of clusters to be used for the temporal cluster analysis
(see \code{Details}).
\code{Method="kmeans"} is advised for a large number of genes.}

\item{Max.clust}{Integer strictly greater than 1 indicating
the maximum number of clusters. The default is \code{Max.clust=3}.}

\item{Min.std}{Positive numeric value. Genes whose standard deviation
is smaller than the threshold \code{Min.std} are excluded.}

\item{Plot.Cluster}{\code{TRUE} or \code{FALSE}. \code{TRUE} as default.
If \code{TRUE}, the output graph will be plotted.
Otherwise the graph will not be plotted.}

\item{path.result}{Character or \code{NULL}.
Path to save the plot described in the section \code{Value}.
If \code{NULL}, the graph will not be saved in a folder.
\code{NULL} as default.}
}
\value{
The function returns the same SummarizedExperiment class object
\code{SEresNorm} with the different elements below,
saved in the metadata \code{Results[[1]][[4]]} of \code{SEresNorm},
\itemize{
\item the optimal number of clusters for each biological condition
(between 2 and \code{Max.clust}).
\item a data.frame with (\eqn{N_{bc}+1}) columns and \code{Max.clust} rows
with \eqn{N_{bc}} the number of biological conditions.
\itemize{
\item If \code{Method="kmeans"}, the ith row and the jth column correspond
to the within-cluster inertia of the (j-1)th biological condition,
computed by \code{\link[stats:kmeans]{stats::kmeans()}}
with i clusters (see the output \code{tot.withinss}),
divided by the sum of the variances of the rows of \code{ExprData}.
When there is only one cluster, the within-cluster inertia
corresponds to the sum of the variances of the rows of
\code{ExprData} (see \code{Details}).
The first column contains the integers between 1 and \code{Max.clust},
which correspond to the number of clusters selected for the
\code{\link[stats:kmeans]{stats::kmeans()}}
analysis.
\item If \code{Method="hcpc"}, the jth column corresponds to the clustering
heights (see the output \code{height} from
\code{\link[FactoMineR:HCPC]{FactoMineR::HCPC()}})
divided by the maximum value of \code{height}.
The first column contains the integers between 1 and \code{Max.clust},
which correspond to the number of clusters selected for the
\code{\link[FactoMineR:HCPC]{FactoMineR::HCPC()}}
analysis.
}
\item a plot which gives
\itemize{
\item If \code{Method="kmeans"}, the evolution of the weighted
within-cluster inertia per number of clusters
(from 1 to \code{Max.clust}) for each biological condition.
The optimal number of clusters for each biological condition
will be colored in blue.
\item If \code{Method="hcpc"}, the evolution of the scaled height per
number of clusters (from 1 to \code{Max.clust})
for each biological condition.
The optimal number of clusters for each biological condition will be
colored in blue.
}
}
}
\description{
The function uses
\code{\link[stats:kmeans]{stats::kmeans()}} or
\code{\link[FactoMineR:HCPC]{FactoMineR::HCPC()}}
in order to compute the number of clusters for the
\code{\link[Mfuzz:mfuzz]{Mfuzz::mfuzz()}} analysis.
}
\details{
All results are built from the results of our function
\code{\link[=DATAnormalization]{DATAnormalization()}}.

The \code{Mfuzz} package works with datasets where rows correspond to genes
and columns correspond to times.
If \code{RawCounts} (input of our function
\code{\link[=DATAprepSE]{DATAprepSE()}})
contains several replicates per time,
the algorithm computes the mean of replicates for each gene before using
\code{\link[Mfuzz:mfuzz]{Mfuzz::mfuzz()}}.
When there are several biological conditions, the algorithm performs
the \code{\link[Mfuzz:mfuzz]{Mfuzz::mfuzz()}}
analysis for each biological condition.

The k-means method or the hierarchical clustering method,
implemented respectively in
\code{\link[stats:kmeans]{stats::kmeans()}} and
\code{\link[FactoMineR:HCPC]{FactoMineR::HCPC()}},
is used to compute the optimal number of clusters.
If there are several biological conditions, the algorithm computes
one optimal number of clusters per biological condition.
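
As a rough illustration of the k-means criterion described in the section
\code{Value}, the within-cluster inertia can be scaled and followed as the
number of clusters increases.
The sketch below uses only base R on hypothetical toy data
(\code{toyProfiles} and \code{ratioInertia} are illustrative names).
It is not the package's internal code: scaling by \code{totss} is an
assumption made so that one cluster gives a ratio of 1, and the rule used
to pick the optimal number of clusters is not shown.
\preformatted{
## Toy data (hypothetical): 20 genes x 3 times, two groups of genes
set.seed(1)
toyProfiles <- rbind(matrix(rnorm(30, mean=0, sd=0.2), ncol=3),
                     matrix(rnorm(30, mean=5, sd=0.2), ncol=3))
## Scaled within-cluster inertia for 1 to 4 clusters
ratioInertia <- vapply(seq_len(4), function(k) {
    resK <- stats::kmeans(toyProfiles, centers=k, nstart=10)
    resK$tot.withinss/resK$totss
}, numeric(1))
round(ratioInertia, 3)
}
A sharp drop of the ratio followed by a plateau suggests that adding more
clusters brings little improvement.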
}
\examples{
## Data simulation
set.seed(33)
DATAclustSIM <- matrix(rnorm(12*10*3, sd=0.2,
                             mean=rep(c(rep(c(1, 6, 9, 4, 3, 1,
                                              6.5, 0.7, 10), times=2),
                                        rep(c(2, 3.6, 3.7, 5, 7.9, 8,
                                              7.5, 3.5, 3.4), times=2)),
                                      each=10)),
                       nrow=30, ncol=12)
DATAclustSIM <- floor(DATAclustSIM*100)
##
colnames(DATAclustSIM) <- c("G1_t0_r1", "G1_t1_r1", "G1_t2_r1",
                            "G1_t0_r2", "G1_t1_r2", "G1_t2_r2",
                            "G2_t0_r3", "G2_t1_r3", "G2_t2_r3",
                            "G2_t0_r4", "G2_t1_r4", "G2_t2_r4")
##------------------------------------------------------------------------##
## Plot the temporal expression of each individual
graphics::matplot(t(rbind(DATAclustSIM[, 1:3], DATAclustSIM[, 4:6],
                          DATAclustSIM[, 7:9], DATAclustSIM[, 10:12])),
                  col=rep(c("black", "red"), each=6*10),
                  xlab="Time", ylab="Gene expression", type=c("b"), pch=19)

##------------------------------------------------------------------------##
## Preprocessing step
DATAclustSIM <- data.frame(DATAclustSIM)

resDATAprepSE <- DATAprepSE(RawCounts=DATAclustSIM,
                            Column.gene=NULL,
                            Group.position=1,
                            Time.position=2,
                            Individual.position=3)
## Normalization
resNorm <- DATAnormalization(SEres=resDATAprepSE,
                             Normalization="rle",
                             Plot.Boxplot=FALSE,
                             Colored.By.Factors=FALSE)

##------------------------------------------------------------------------##
resMFUZZcluster <- MFUZZclustersNumber(SEresNorm=resNorm,
                                       DATAnorm=FALSE,
                                       Method="hcpc",
                                       Max.clust=5,
                                       Plot.Cluster=TRUE,
                                       path.result=NULL)
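
##------------------------------------------------------------------------##
## Hedged sketch: reading the results back from the returned object.
## The section Value states that they are stored in the metadata element
## Results[[1]][[4]]. The names of its sub-elements are not documented
## here, so the structure is only inspected and no element name is assumed.
## 'resClustNb' is an illustrative name.
resClustNb <- S4Vectors::metadata(resMFUZZcluster)$Results[[1]][[4]]
utils::str(resClustNb, max.level=1)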
}
\seealso{
The function is called by
\code{\link[=MFUZZanalysis]{MFUZZanalysis()}}.
}
