## Main Functions

### KBoost(X, TFs, prior_weights, g, v, ite)

Function to infer a gene regulatory network from gene expression data.

Input:

`X`

: an NxG matrix where N is the number of observations and G the number of genes.

`TFs`

: a vector of numerical indexes of the K genes in X that are TFs (default 1:G).

`prior_weights`

: a GxK matrix with the prior probabilities of each interaction (default is 0.5 for all values).

`g`

: a positive scalar that corresponds to the width parameter in the RBF Kernel (default 40).

`v`

: a positive scalar lower than 1 that is the shrinkage parameter for each boosting iteration (default 0.1).

`ite`

: an integer that represents the maximum number of iterations (default 3).
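
The input shapes listed above can be sketched in base R as follows. This is a minimal sketch on simulated data: the values of `N` and `G` and the choice of TF indexes are assumptions made for illustration, and the final call is left commented out because it requires the package to be loaded.

```r
# Simulated expression data: N observations (rows) by G genes (columns).
set.seed(1)
N <- 20
G <- 6
X <- matrix(rnorm(N * G), nrow = N, ncol = G)

# Indexes of the K genes in X that are TFs (an arbitrary choice here).
TFs <- c(1, 3, 5)
K <- length(TFs)

# G x K matrix of prior probabilities, using the documented default of 0.5.
prior_weights <- matrix(0.5, nrow = G, ncol = K)

# Documented defaults for the remaining arguments.
g <- 40    # width parameter of the RBF kernel
v <- 0.1   # shrinkage parameter, must be lower than 1
ite <- 3   # maximum number of boosting iterations

# With the package loaded, the call would follow the signature above:
# res <- KBoost(X, TFs, prior_weights, g, v, ite)
```

Building `prior_weights` explicitly makes the GxK orientation visible: one row per gene and one column per TF.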

Output:

List with the following fields:

`GRN`

: A matrix with the gene regulatory network.

`GRN_UP`

: A matrix with the gene regulatory network before the heuristic step of multiplying each column by its variance.

`prior`

: The prior for the best model at each iteration.

`model`

: the transcription factors with the highest posteriors at each iteration per gene.

`prior_weights`

: a GxK matrix with the prior probabilities of each interaction.

`g`

: a positive scalar that corresponds to the width parameter in the RBF Kernel.

`v`

: a positive scalar lower than 1 that is the shrinkage parameter for each boosting iteration.

`ite`

: an integer that represents the maximum number of iterations.

### KBoost_human_symbol(X, gen_names, g, v, ite, pos_weight, neg_weight)

Function to infer a gene regulatory network from human cell lines or patient samples. This function automatically builds a prior from *Gerstein et al. (2012)* and uses the list of TFs from *Lambert et al. (2018)*. The gene expression data needs to be a numerical matrix.

Input:

`X`

: an NxG numeric matrix with the expression values of G genes and N observations. The gene names can be specified as column names.

`gen_names`

: a set of SYMBOL gene names that correspond to the names of the columns of X. Not required if the column names of X are already gene names.

`g`

: a positive scalar with the width parameter for the RBF kernel (default = 40).

`v`

: a number between 0 and 1 with the shrinkage parameter (default = 0.1).

`ite`

: an integer with the number of iterations (default = 3).

`pos_weight`

: the prior weight for edges that were previously found in the *Gerstein et al.* network (default = 0.6).

`neg_weight`

: the prior weight for edges that were not found in the *Gerstein et al.* network (default = 0.5).

Output:

List with the following fields:

`GRN`

: A matrix with the gene regulatory network.

`GRN_UP`

: A matrix with the gene regulatory network before the heuristic step of multiplying each column by its variance.

`prior`

: The prior for the best model at each iteration.

`model`

: the transcription factors with the highest posteriors at each iteration per gene.

`prior_weights`

: a GxK matrix with the prior probabilities of each interaction.

`g`

: a positive scalar that corresponds to the width parameter in the RBF Kernel.

`v`

: a positive scalar smaller than 1 that is the shrinkage parameter for each boosting iteration.

`ite`

: an integer that represents the maximum number of iterations.

### AUPR_AUROC_matrix(Net, G_mat, auto_remove, TFs, upper_limit)

Function to calculate the AUROC and AUPR of an inferred network against a known (gold standard) network.

Input:

`Net`

: An inferred network with the predictive probabilities that each transcription factor regulates each gene.

`G_mat`

: A matrix with the gold standard network.

`auto_remove`

: TRUE if the auto-regulation is to be discarded.

`TFs`

: the indexes of the rows of Net that are TFs.

`upper_limit`

: Max number of edges to use (default = all possible edges).

Output:

List with the following fields:

`AUPR`

: the area under the precision-recall (PR) curve.

`AUROC`

: the area under the receiver operating characteristic (ROC) curve.

`th`

: All the unique values of Net.

`Prec`

: The precision at each value of th.

`Rec`

: The recall at each value of th.

`FPR`

: The false positive rate at each value of th.

`TP`

: The true positives at each value of th.

`FP`

: The false positives at each value of th.

`TN`

: The true negatives at each value of th.

`FN`

: The false negatives at each value of th.
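
How the fields above relate to each other can be sketched in base R. This is a minimal sketch on toy data and not the package's implementation: the trapezoidal integration, the way each curve is anchored at its starting point, and the helper name `trapz` are assumptions, and `auto_remove` and `upper_limit` are not handled.

```r
# Toy inferred network (4 genes x 2 TFs) and a gold standard of the same shape.
Net <- matrix(c(0.9, 0.2, 0.8, 0.1,
                0.3, 0.7, 0.4, 0.6), nrow = 4, ncol = 2)
G_mat <- matrix(c(1, 0, 1, 0,
                  0, 1, 0, 0), nrow = 4, ncol = 2)

scores <- as.vector(Net)
truth <- as.vector(G_mat)

# Thresholds: all unique predicted values, from highest to lowest.
th <- sort(unique(scores), decreasing = TRUE)

# Confusion counts when every edge with score >= threshold is called positive.
TP <- sapply(th, function(t) sum(scores >= t & truth == 1))
FP <- sapply(th, function(t) sum(scores >= t & truth == 0))
FN <- sum(truth == 1) - TP
TN <- sum(truth == 0) - FP

Prec <- TP / (TP + FP)
Rec <- TP / (TP + FN)
FPR <- FP / (FP + TN)

# Hypothetical helper: trapezoidal area under a curve given x and y.
trapz <- function(x, y) {
  n <- length(y)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Start the ROC curve at the origin before integrating.
AUROC <- trapz(c(0, FPR), c(0, Rec))
# Start the PR curve at recall 0 with the first observed precision.
AUPR <- trapz(c(0, Rec), c(Prec[1], Prec))
```

In this toy case the three true edges receive the three highest scores, so both areas equal 1.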

### d4_mfac(v, g, ite)

Function to produce the KBoost AUPR and AUROC results on the DREAM4 Multifactorial Challenge.

Input:

`g`

: a number larger than 0 that is the width parameter for the RBF Kernel.

`v`

: a number between 0 and 1 that is the shrinkage parameter.

`ite`

: an integer with number of iterations.

Output:

`auprs`

: a matrix with the AUPR per D4 multifactorial dataset.

`aurocs`

: a matrix with the AUROC per D4 multifactorial dataset.

### get_prior_Gerstein(gen_names, TFs, pos_weight, neg_weight)

Function to build a prior from the network that *Gerstein et al. (2012)* previously built from ChIP-Seq data.

Input:

`gen_names`

: the gene names of the G genes in the user’s subset in Symbol nomenclature.

`TFs`

: the indexes of the K genes in the user’s subset which are TFs.

`pos_weight`

: the prior weight for edges that were previously found in the *Gerstein et al.* network.

`neg_weight`

: the prior weight for edges that were not found in the *Gerstein et al.* network.

Output:

`prior_weights`

: a GxK matrix with prior weights that a TF regulates a gene given the network published by *Gerstein et al.*
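
The way `pos_weight` and `neg_weight` fill such a GxK matrix can be sketched in base R. This is a minimal sketch under stated assumptions: the `known_edges` table and its three edges are hypothetical stand-ins for the reference network, which the package takes from *Gerstein et al. (2012)*.

```r
# Gene names of the user's subset and the indexes of those that are TFs.
gen_names <- c("TP53", "MYC", "GAPDH", "JUN")
TFs <- c(1, 2, 4)

# Hypothetical table of known TF -> target edges (illustration only).
known_edges <- data.frame(
  tf = c("TP53", "MYC", "JUN"),
  target = c("MYC", "GAPDH", "TP53"),
  stringsAsFactors = FALSE
)

pos_weight <- 0.6
neg_weight <- 0.5

# Start with neg_weight everywhere: G genes in rows, K TFs in columns.
prior_weights <- matrix(neg_weight,
                        nrow = length(gen_names), ncol = length(TFs),
                        dimnames = list(gen_names, gen_names[TFs]))

# Keep only edges whose TF and target are both in the user's subset.
keep <- known_edges$tf %in% gen_names[TFs] &
  known_edges$target %in% gen_names
idx <- cbind(known_edges$target[keep], known_edges$tf[keep])

# Raise the prior for edges present in the reference network.
prior_weights[idx] <- pos_weight
```

Indexing by a two-column character matrix addresses each (target, TF) pair by name, which avoids a loop over edges.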

### grid_search_kboost(dataset, vs, gs, ite)

Function to perform a grid search and find the best hyperparameters.

Input:

`dataset`

: One of the three datasets in the package, 1 for IRMA, 2 for DREAM4 multifactorial and 3 for DREAM5.

`vs`

: The range of values of v. All values need to be between 0 and 1.

`gs`

: The range of values of g. All values need to be larger than 0.

`ite`

: An integer that is the number of iterations.

Output:

List with the following fields:

`aurocs`

: a 3-dimensional array with the AUROCs. Rows correspond to the values in vs, columns to the values in gs, and the third dimension to the individual networks within the chosen dataset.

`auprs`

: a 3-dimensional array with the AUPRs. Rows correspond to the values in vs, columns to the values in gs, and the third dimension to the individual networks within the chosen dataset.

### irma_check(g, v, ite)

Function to produce the KBoost AUPR and AUROC results on the IRMA datasets.

Input:

`g`

: a number larger than 0 that is the width parameter for the RBF Kernel.

`v`

: a number between 0 and 1 that is the shrinkage parameter.

`ite`

: an integer with number of iterations.

Output:

`auprs`

: a matrix with the AUPR per IRMA dataset.

`aurocs`

: a matrix with the AUROC per IRMA dataset.

### net_dist_bin(GRN,TFs,thr)

Function to calculate the shortest distance between nodes.

Input:

`GRN`

: An inferred network with the predictive probabilities that a transcription factor regulates a gene.

`TFs`

: A vector with indexes of the rows of GRN which correspond to TFs.

`thr`

: A scalar between 0 and 1 that is used to select the edges with large posterior probabilities.

Output:

`dist_mat`

: A matrix with the shortest distances between TFs (columns) and all genes (rows).
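
The idea of thresholding the network and measuring path lengths can be sketched with a breadth-first search in base R. This is a minimal sketch on toy data, not the package's implementation: the use of breadth-first search, the strict `>` comparison against `thr`, and the use of `Inf` for unreachable genes are assumptions.

```r
# Toy GRN: 4 genes (rows) by 2 TFs (columns); genes 1 and 2 are the TFs.
GRN <- matrix(c(0.0, 0.9, 0.1, 0.2,
                0.1, 0.0, 0.8, 0.7), nrow = 4, ncol = 2)
TFs <- c(1, 2)
thr <- 0.5

# Binary adjacency: edge from TF k to gene i when GRN[i, k] exceeds thr.
adj <- GRN > thr

G <- nrow(GRN)
K <- length(TFs)
dist_mat <- matrix(Inf, nrow = G, ncol = K)

for (k in seq_len(K)) {
  # Breadth-first search starting from TF k.
  d <- rep(Inf, G)
  d[TFs[k]] <- 0
  frontier <- TFs[k]
  while (length(frontier) > 0) {
    nxt <- integer(0)
    for (node in frontier) {
      col <- match(node, TFs)   # only TFs have outgoing edges
      if (is.na(col)) next
      # Unvisited genes regulated by this node are one step further away.
      targets <- which(adj[, col] & is.infinite(d))
      d[targets] <- d[node] + 1
      nxt <- c(nxt, targets)
    }
    frontier <- nxt
  }
  dist_mat[, k] <- d
}
```

Here TF 1 regulates gene 2, which in turn regulates genes 3 and 4, so the first column of `dist_mat` is 0, 1, 2, 2; gene 1 cannot be reached from TF 2 and stays at `Inf`.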

### net_summary_bin(GRN,TFs,thr,a,b)

Function to summarize the GRN filtered with a threshold.

Input:

`GRN`

: An inferred network with the predictive probabilities that a transcription factor regulates a gene.

`TFs`

: A vector with indexes of the rows of GRN which correspond to TFs.

`thr`

: a scalar between 0 and 1; edges with posterior probabilities lower than thr will be discarded.

`a`

: a scalar for the Katz and PageRank centrality measures. Default is the inverse of the largest eigenvalue of GRN.

`b`

: a scalar for the Katz and PageRank centrality measures. Default is 1.

Output:

List with the following fields:

`GRN_table`

: a sorted table version of the GRN.

`Outdegree`

: the outdegree of each TF.

`Indegree`

: the indegree of each gene.

`Close_centr`

: A matrix with the closeness centrality measure per TF.
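
The degree fields and the sorted edge table can be sketched in base R on a thresholded network. This is a minimal sketch on toy data: the column names of `GRN_table` and the sort order (decreasing probability) are assumptions, and the centrality measures are not covered.

```r
# Toy GRN: 4 genes (rows) by 2 TFs (columns).
GRN <- matrix(c(0.0, 0.9, 0.1, 0.2,
                0.1, 0.0, 0.8, 0.7), nrow = 4, ncol = 2)
thr <- 0.5

# Edges with probabilities lower than thr are discarded.
adj <- GRN >= thr

# Outdegree: number of genes each TF regulates (one value per column).
Outdegree <- colSums(adj)
# Indegree: number of TFs regulating each gene (one value per row).
Indegree <- rowSums(adj)

# Table of retained edges, sorted by decreasing probability.
edges <- which(adj, arr.ind = TRUE)
GRN_table <- data.frame(TF = edges[, "col"],
                        target = edges[, "row"],
                        weight = GRN[adj])
GRN_table <- GRN_table[order(GRN_table$weight, decreasing = TRUE), ]
```

With `thr = 0.5`, three edges survive: the first TF keeps one target and the second keeps two.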

### net_refine(Net)

Function to do a heuristic post-processing suggested by Slawek and Arodz that improves accuracy. Each column is multiplied by its variance.

Input:

`Net`

: a GRN with TFs in the columns.

Output:

`Net`

: a refined GRN.
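
The column-wise scaling described above can be sketched in base R. This is a minimal sketch on toy data: the use of the sample variance (`var`) is an assumption, and any further rescaling the package may apply is not shown.

```r
# Toy GRN: 3 genes (rows) by 2 TFs (columns).
Net <- matrix(c(0.1, 0.9, 0.5,
                0.4, 0.4, 0.4), nrow = 3, ncol = 2)

# Variance of each column, i.e. of each TF's scores across genes.
col_var <- apply(Net, 2, var)

# Multiply every entry by the variance of its column.
Net_refined <- sweep(Net, 2, col_var, "*")
```

A TF whose scores are identical for all genes, like the second column here, has zero variance, so its edges are scaled to zero, while TFs with a few strong targets keep relatively more weight.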

### write_GRN_D4(GRN,TFs, filename)

Function to write output in DREAM4 Challenge Format.

Input:

`GRN`

: a GxK gene regulatory network.

`TFs`

: a set of K indexes of the G genes that are TFs.

`filename`

: a string with the name of the file to store the GRN.