OncoSimulR is an individual- or clone-based forward-time genetic simulator for biallelic markers (wildtype vs. mutated) in asexually reproducing populations without spatial structure (perfect mixing). Its design emphasizes flexible specification of fitness and mutator effects.
OncoSimulR was originally developed to simulate tumor progression with emphasis on allowing users to set restrictions in the accumulation of mutations as specified, for example, by Oncogenetic Trees (OT: Desper et al. 1999; Szabo and Boucher 2008) or Conjunctive Bayesian Networks (CBN: Beerenwinkel, Eriksson, and Sturmfels 2007; Gerstung et al. 2009, 2011), with the possibility of adding passenger mutations to the simulations and allowing for several types of sampling.
Since then, OncoSimulR has been vastly extended to allow you to specify other types of restrictions in the accumulation of genes, such as the XOR models of Korsunsky et al. (2014) or the “semimonotone” model of Farahani and Lagergren (2013). Moreover, different fitness effects related to the order in which mutations appear can also be incorporated, involving arbitrary numbers of genes. This is very different from “restrictions in the order of accumulation of mutations”. With order effects, described in a recent cancer paper by Ortmann and collaborators (Ortmann et al. 2015), the effect of having both mutations “A” and “B” differs depending on whether “A” appeared before or after “B” (the actual case involves genes JAK2 and TET2).
More generally, OncoSimulR now also allows you to specify arbitrary epistatic interactions between arbitrary collections of genes and to model, for example, synthetic mortality or synthetic viability (again, involving an arbitrary number of genes, some of which might also depend on other genes, or show order effects with other genes). Moreover, it is possible to specify the above interactions in terms of modules, not genes. This idea is discussed in, for example, Raphael and Vandin (2015) and Gerstung et al. (2011): the restrictions encoded in, say, CBNs or OT can be considered to apply not to genes, but to modules, where each module is a set of genes (and the intersection between modules is the empty set) that performs a specific biological function. Modules, then, play the role of a “union operation” over the set of genes in a module. In addition, arbitrary numbers of genes without interactions (and with fitness effects coming from any distribution you might want) are also possible.
You can also directly specify the mapping between genotypes and fitness and, thus, you can simulate on fitness landscapes of arbitrary complexity.
It is now (released initially in this repo as the freq-dep-fitness branch on February 2019) also possible to simulate scenarios with frequency-dependent fitness, where the fitness of one or more genotypes depends on the relative or absolute frequencies of other genotypes, as in game theory and adaptive dynamics. This makes it possible to model predation and parasitism, cooperation and mutualism, and commensalism. It also allows to model therapeutic interventions (where fitness changes at specified time points or as a function of the total populations size).
Simulations can start from arbitrary initial population compositions and it is also possible to simulate multiple species. Thus, simulations that involve both ecological and evolutionary processes are possible.
Mutator/antimutator genes, genes that alter the mutation rate of other genes (Gerrish et al. 2007; Tomlinson, Novelli, and Bodmer 1996), can also be simulated with OncoSimulR and specified with most of the mechanisms above (you can have, for instance, interactions between mutator genes). And, regardless of the presence or not of other mutator/antimutator genes, different genes can have different mutation rates.
Simulations can be stopped as a function of total population size, number of mutated driver genes, or number of time periods. Simulations can also be stopped with a stochastic detection mechanism where the probability of detecting a tumor increases with total population size. Simulations return the number of cells of every genotype/clone at each of the sampling periods and we can take samples from the former with single-cell or whole- tumor resolution, adding noise if we want. If we ask for them, simulations also store and return the genealogical relationships of all clones generated during the simulation.
The models so far implemented are all continuous time models, which are simulated using the BNB algorithm of Mather, Hasty, and Tsimring (2012). The core of the code is implemented in C++, providing for fast execution. To help with simulation studies, code to simulate random graphs of the kind often seen in CBNs, OTs, etc, is also available. Finally, OncoSimulR also allows for the generation of random fitness landscapes and the representation of fitness landscapes and provides statistics of evolutionary predictability.
As mentioned above, OncoSimulR is now a very general package for forward genetic simulation, with applicability well beyond tumor progression. This is a summary of some of its key features:
Restrictions in the accumulations of mutations, as specified by Oncogenetic Trees (OTs), Conjunctive Bayesian Networks (CBNs), semimonotone progression networks, and XOR relationships.
You can add arbitrary numbers of non-interacting genes with arbitrary fitness effects.
you can allow for deviations from the OT, CBN, semimonotone, and XOR models, specifying a penalty for such deviations (the \(s_h\) parameter).
You can conduct multiple simulations, and sample from them with different temporal schemes and using both whole tumor or single cell sampling.
You can stop the simulations using a flexible combination of conditions: final time, number of drivers, population size, fixation of certain genotypes, and a stochastic stopping mechanism that depends on population size.
Right now, three different models are available, two that lead to exponential growth, one of them loosely based on Bozic et al. (2010), and another that leads to logistic-like growth, based on McFarland et al. (2013).
You can use large numbers of genes (e.g., see an example of 50000 in section 6.5.3).
You can obtain the true sequence of events and the phylogenetic relationships between clones (see section 15.1 for the details of what we mean by “clone”).
You can generate random fitness landscapes (under the House of Cards, Rough Mount Fuji, or additive models, or combinations of the former and under the NK model) and use those landscapes as input to the simulation functions.
You can plot fitness landscapes.
You can obtain statistics of evolutionary predictability from the simulations.
You can now also use simulations with frequency-dependent fitness: fitness (birth rate) is not fixed for a genotype, but can be a function of the frequecies of the clones (see section 10). We can therefore use OncoSimulR to examine, via simulations, results from game theory and adaptive dynamics and study complex scenarios that are not amenable to analytical solutions. More generally, we can model predation and parasitism, cooperation and mutualism, and commensalism.
It is possible to start the simulation with arbitrary initial composition (section 6.7) and to simulate multiple species (section 6.8). You can thus run simulations that involve both ecological and evolutionary processes involving inter-species relationships plus genetic restrictions in evolution.
It is possible to simulate many different therapeutic
interventions. Section 12 shows examples of
interventions where certain genotypes change fitness (because of
chemotherapy) at specified times. More generally, since fitness
(birth rates) can be made a function of total populations sizes
and/or frequencies (see section 10), many different
arbitrary intervention schemes can be simulated. Possible models
are, of course, not limited to cancer chemotherapy, but could
treatment of bacteria, antiviral therapy, etc.
The table below, modified from the table at the Genetics Simulation Resources (GSR) page, provides a summary of the key features of OncoSimulR. (An explanation of the meaning of terms specific to the GSR table is available from https://popmodels.cancercontrol.cancer.gov/gsr/search/ or from the Genetics Simulation Resources table itself, by moving the mouse over each term).
|Type of Simulated Data||Haploid DNA Sequence|
|Variations||Biallelic Marker, Genotype or Sequencing Error|
|Type of Dynamical Model||Continuous time|
|Entities Tracked||Clones (see 15.2)|
|Input||Program specific (R data frames and matrices specifying genotypes’ fitness, gene effects, and starting genotype)|
|Data Type||Genotype or Sequence, Individual Relationship (complete parent-child relationships between clones), Demographic (populations sizes of all clones at sampling times), Diversity Measures (LOD, POM, diversity of genotypes), Fitness|
|Sample Type||Random or Independent, Longitudinal, Other (proportional to population size)|
|Mating Scheme||Asexual Reproduction|
|Population Size Changes||Exponential (two models), Logistic (McFarland et al., 2013)|
|Birth Rate||Individually Determined from Genotype (models “Exp”, “McFL”, “McFLD”). Frequency-Dependently Determined from Genotype (models “Exp”, “McFL”, “McFLD”)|
|Death Rate||Individually Determined from Genotype (model “Bozic”), Influenced by Environment —population size (models “McFL” and “McFLD”)|
|Determinant||Single and Multi-locus, Fitness of Offspring, Environmental Factors (population size, genotype frequencies)|
|Models||Directional Selection, Multi-locus models, Epistasis, Random Fitness Effects, Frequency-Dependent|
|Mutation Models||Two-allele Mutation Model (wildtype, mutant), without back mutation|
|Events Allowed||Varying Genetic Features: change of individual mutation rates (mutator/antimutator genes)|
|Spatial Structure||No Spatial Structure (perfectly mixed and no migration)|
Further details about the original motivation for wanting to simulate data this way in the context of tumor progression can be found in Diaz-Uriarte (2015), where additional comments about model parameters and caveats are discussed.
Are there similar programs? The Java program by Reiter et al. (2013), TTP, offers somewhat similar functionality to the previous version of OncoSimulR, but it is restricted to at most four drivers (whereas v.1 of OncoSimulR allowed for up to 64), you cannot use arbitrary CBNs or OTs (or XORs or semimonotone graphs) to specify restrictions, there is no allowance for passengers, and a single type of model (a discrete time Galton-Watson process) is implemented. The current functionality of OncoSimulR goes well beyond the the previous version (and, thus, also the TPT of Reiter et al. (2013)). We now allow you to specify all types of fitness effects in other general forward genetic simulators such as FFPopSim (Zanini and Neher 2012), and some that, to our knowledge (e.g., order effects) are not available from any genetics simulator. In addition, the “Lego system” to flexibly combine different fitness specifications is also unique; by “Lego system” I mean that we can combine different pieces and blocks, similarly to what we do with Lego bricks. (I find this an intuitive and very graphical analogy, which I have copied from Hothorn et al. (2006) and Hothorn et al. (2008)). In a nutshell, salient features of OncoSimulR compared to other simulators are the unparalleled flexibility to specify fitness and mutator effects, with modules and order effects as particularly unique, and the options for sampling and stopping the simulations, particularly convenient in cancer evolution models. Also unique in this type of software is the addition of functions for simulating fitness landscapes and assessing evolutionary predictability.
OncoSimulR can be used to address questions that span from the effect of mutator genes in cancer to the interplay between fitness landscapes and mutation rates. The main types of questions that OncoSimulR can help address involve combinations of:
Examining times to evolutionarily or biomedically relevant events
(fixation of genotypes, reaching a minimal size, acquiring a
minimal number of driver genes, etc —specified with the stopping
conditions to the
samplePop) that are related to:
sampledGenotypes) and any other inferences that depend on the observational process.
Tracking the genealogical relationships of clones
plotClonePhylog) and assessing evolutionary predictability
Some specific questions that you can address with the help of OncoSimulR are discussed in section 1.3.
A quick overview of the main functions and their relationships is shown in Figure 1.1, where we use italics for the type/class of R object and courier font for the name of the functions.