GRNdata

Pau Bellot, Catharina Olsen, Patrick Meyer

2023-10-26

This package contains a large set of gene expression data generated by various simulators and collected in what we call "datasources".

The data generated by the simulators is free of noise. Noise can be added afterwards, which makes it possible to control its properties independently of the simulators and to provide fully reproducible tests. This study involves data generated by three different gene regulatory network (GRN) simulators:
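The following sketch illustrates why storing noise-free data is convenient: noise can be added later in a controlled, reproducible way. It assumes additive zero-mean Gaussian noise with a user-chosen standard deviation and a fixed random seed. The function name `add_gaussian_noise` and the row-per-experiment layout are hypothetical and are not part of the package.

```python
import random

def add_gaussian_noise(data, sd, seed):
    """Return a noisy copy of a noise-free expression matrix.

    Hypothetical helper. Assumption: additive, zero-mean Gaussian noise with a
    fixed standard deviation `sd`; the package does not prescribe a noise model.
    `data` is a list of rows (one row per experiment, one column per gene).
    """
    rng = random.Random(seed)  # private generator: same seed -> same noise
    return [[value + rng.gauss(0.0, sd) for value in row] for row in data]

clean = [[1.0, 0.5, 0.2], [0.8, 0.4, 0.9]]
noisy = add_gaussian_noise(clean, sd=0.1, seed=42)
```

Because the noise level (`sd`) and the seed are explicit arguments, the same noise-free datasource can be reused at several noise levels, and any test can be rerun with identical noise.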

GNW

The GNW simulator (Schaffter, Marbach, and Floreano 2011) generates network structures by extracting parts of known real GRN structures, thereby capturing several of their important structural properties. To produce gene expression data, the simulator relies on a system of non-linear ordinary differential equations (ODEs).
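To make the ODE-based simulation step more concrete, here is a minimal sketch that integrates a small non-linear gene regulation model to steady state with the forward Euler method. The model (a sigmoid of a weighted sum of regulator levels minus first-order decay), the function `simulate_to_steady_state`, and all parameter values are illustrative assumptions. GNW uses its own kinetic equations, which are not reproduced here.

```python
import math

def simulate_to_steady_state(weights, bias, decay=1.0, dt=0.01, tol=1e-8, max_steps=200_000):
    """Integrate dx_i/dt = sigmoid(sum_j w_ij * x_j + b_i) - decay * x_i (forward Euler).

    Illustrative model only, not GNW's equations. weights[i][j] is the
    (hypothetical) effect of gene j on gene i; positive values activate,
    negative values repress.
    """
    n = len(bias)
    x = [0.0] * n  # start with all genes silent
    for _ in range(max_steps):
        dxdt = []
        for i in range(n):
            drive = sum(weights[i][j] * x[j] for j in range(n)) + bias[i]
            dxdt.append(1.0 / (1.0 + math.exp(-drive)) - decay * x[i])
        x = [xi + dt * d for xi, d in zip(x, dxdt)]
        # steady state: every derivative is (numerically) zero
        if max(abs(d) for d in dxdt) < tol:
            return x
    raise RuntimeError("no steady state reached within max_steps")

# Gene 0 is unregulated; gene 1 is activated by gene 0.
steady = simulate_to_steady_state([[0.0, 0.0], [4.0, 0.0]], [0.0, -2.0])
```

In this toy example both genes settle at approximately 0.5, since sigmoid(0) = 0.5 and 4 × 0.5 − 2 = 0.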

SynTReN

The SynTReN simulator (Van den Bulcke et al. 2006) generates the underlying networks by selecting sub-networks from the known regulatory networks of E. coli and yeast. The experiments are then obtained by simulating equations based on Michaelis-Menten and Hill kinetics under different conditions.
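The two kinetic laws mentioned above can be written as simple response functions of a regulator concentration `x`. This sketch uses their textbook forms, with a threshold `k` and a Hill coefficient `n`. The exact parameterization used by SynTReN, and how it combines several regulators, are not described here, so the `transcription_rate` helper is a hypothetical example.

```python
def michaelis_menten(x, k):
    """Saturating response x / (k + x); half-maximal when x == k."""
    return x / (k + x)

def hill_activation(x, k, n):
    """Sigmoidal response x**n / (k**n + x**n); a larger n gives a steeper switch."""
    return x ** n / (k ** n + x ** n)

def hill_repression(x, k, n):
    """Repressive counterpart k**n / (k**n + x**n), i.e. 1 - hill_activation(x, k, n)."""
    return k ** n / (k ** n + x ** n)

def transcription_rate(activator, repressor, v_max=1.0, k=1.0, n=2):
    """Hypothetical target gene with one activator and one repressor.

    Multiplying the two terms is one common modelling choice, assumed here.
    """
    return v_max * hill_activation(activator, k, n) * hill_repression(repressor, k, n)
```

With `n = 1` the Hill activation reduces to the Michaelis-Menten form, so the Hill coefficient can be read as a cooperativity parameter on top of plain saturation.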

Rogers

The data generator described in (Rogers and Girolami 2005), which will be referred to as Rogers, relies on a power-law distribution of the number of connections per gene to generate the underlying network. The steady state of the system is obtained by integrating a system of differential equations, and this generator simulates knockout data only.
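A minimal sketch of how a network with a power-law degree distribution might be generated is shown below. It assumes that the number of regulators of each gene follows a truncated power law, P(k) proportional to k^(-gamma) for k = 1, ..., k_max, and that regulators are chosen uniformly at random without self-loops. The exponent, the cut-off, and the function name `sample_power_law_network` are illustrative assumptions, not the settings of the Rogers generator.

```python
import random

def sample_power_law_network(n_genes, gamma=2.5, k_max=10, seed=0):
    """Return a set of directed edges (regulator, target).

    Assumptions: each gene's number of regulators k is drawn from a truncated
    power law P(k) ~ k**-gamma on 1..k_max; regulators are chosen uniformly,
    without self-loops. Parameter values are illustrative only.
    """
    if n_genes < 2:
        raise ValueError("need at least two genes")
    rng = random.Random(seed)
    k_max = min(k_max, n_genes - 1)  # cannot have more regulators than other genes
    degrees = list(range(1, k_max + 1))
    weights = [k ** -gamma for k in degrees]  # unnormalized power-law weights
    edges = set()
    for target in range(n_genes):
        k = rng.choices(degrees, weights=weights)[0]
        candidates = [g for g in range(n_genes) if g != target]
        for regulator in rng.sample(candidates, k):  # k distinct regulators
            edges.add((regulator, target))
    return edges

network = sample_power_law_network(1000, seed=1)
```

Because small degrees receive most of the probability mass, most genes end up with one or two regulators while a few have many, which produces the heavy tail of a power-law topology.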

Datasources

Using these simulators, five large datasources involving many noise-free experiments have been generated. Their characteristics are detailed in the following table:

| Datasource | Topology | Experiments | Genes | Edges |
|---|---|---|---|---|
| \(Rogers_{1000}\) | Power-law tail topology | 1000 | 1000 | 1350 |
| \(SynTReN_{300}\) | E. coli | 800 | 300 | 468 |
| \(SynTReN_{1000}\) | E. coli | 1000 | 1000 | 4695 |
| \(GNW_{1565}\) | E. coli | 1565 | 1565 | 7264 |
| \(GNW_{2000}\) | Yeast | 2000 | 2000 | 10392 |

To generate these datasources, we simulated multifactorial data with SynTReN and GNW, which is a less informative type of data (Marbach et al. 2010).


References:

Marbach, Daniel, Robert J Prill, Thomas Schaffter, Claudio Mattiussi, Dario Floreano, and Gustavo Stolovitzky. 2010. “Revealing Strengths and Weaknesses of Methods for Gene Network Inference.” Proceedings of the National Academy of Sciences 107 (14): 6286–91.

Rogers, Simon, and Mark Girolami. 2005. “A Bayesian Regression Approach to the Inference of Regulatory Networks from Gene Expression Data.” Bioinformatics 21 (14): 3131–7.

Schaffter, Thomas, Daniel Marbach, and Dario Floreano. 2011. “GeneNetWeaver: In Silico Benchmark Generation and Performance Profiling of Network Inference Methods.” Bioinformatics 27 (16): 2263–70.

Van den Bulcke, Tim, Koenraad Van Leemput, Bart Naudts, Piet van Remortel, Hongwu Ma, Alain Verschoren, Bart De Moor, and Kathleen Marchal. 2006. “SynTReN: A Generator of Synthetic Gene Expression Data for Design and Analysis of Structure Learning Algorithms.” BMC Bioinformatics 7 (1): 43.