Author: Steffi Grote
Date: April 04, 2017
ABAEnrichment is designed to test user-defined genes for expression enrichment in different human brain regions.
The package integrates the expression of the input gene set and the structural information of the brain using an ontology, both provided by the Allen Brain Atlas project [1-4].
The statistical analysis is performed by the core function
aba_enrich which interfaces with the ontology enrichment software FUNC .
Additional functions provided in this package are
get_annotated_genes supporting the exploration and visualization of the expression data.
The package incorporates three different brain expression datasets:
All three datasets are filtered for protein-coding genes and gene expression is averaged across donors. Although the third dataset does not contain expression data, but a derived score, for simplicity we only refer to 'expression' in this documentation. For details on the datasets see the ABAData vignette.
Using the ontology that describes the hierarchical organization of the brain, brain regions get annotated all genes that are expressed in the brain region itself or in any of its substructures.
The boundary between 'expressed' and 'not expressed' is defined by different expression quantiles (e.g. using a quantile of 0.4, the lowest 40% of gene expression in the brain are considered 'not expressed' and the upper 60% are considered 'expressed').
These cutoffs are set with the parameter
cutoff_quantiles and an analysis is run for every cutoff separately. The default cutoffs are 10% to 90% in steps of 10%.
The enrichment analysis is performed by using either the hypergeometric or the Wilcoxon rank-sum test implemented in the ontology enrichment software FUNC . The hypergeometric test evaluates the enrichment of annotated (expressed) candidate genes compared to annotated background genes for each brain region. The background genes can be defined explicitly like the candidate genes or, by default, consist of all protein-coding genes from the dataset that are not contained in the set of candidate genes. In contrast to this binary distinction between candidate and background genes, the Wilcoxon rank-sum test uses user-defined scores that are assigned to the input genes. It then tests every brain region for an enrichment of genes with high scores in the set of expressed input genes.
To account for multiple testing, FUNC computes the family-wise error rate (FWER) using randomsets.
The randomsets are generated by permuting candidate and background genes or the scores assigned to genes for the hypergeometric and Wilcoxon rank-sum test, respectively (see Schematic 1 below).
This is also the default behavior in ABAEnrichment.
For the hypergeometric test, ABAEnrichment additionally provides the option to correlate the chance of a background gene to be selected as a random candidate gene with the length of the background gene (option
Furthermore, instead of defining genes explicitly, whole genomic regions can be provided as input.
ABAEnrichment then tests brain regions for enrichment of expressed genes located in the candidate regions, compared to expressed genes located in the background regions.
The randomsets then also consist of randomly chosen candidate regions inside the background regions, either as a whole block in one background region (default), or on the same chromosome allowing to overlap multiple background regions on that chromosome (option
circ_chrom, see Schematic 2 below).
|aba_enrich||core function for performing enrichment analyses given a candidate gene set|
|get_expression||returns expression data for a given set of genes and brain regions|
|plot_expression||plots a heatmap with expression data for a given set of genes and brain regions|
|get_name||returns the full name of a brain region given a structure ID|
|get_sampled_substructures||returns the substructures of a given brain region that have expression data available|
|get_superstructures||returns the superstructures of a given brain region|
|get_id||NEW: returns the structure ID given the name of a brain region|
|get_annotated_genes||NEW: returns genes annotated to enriched or user-defined brain regions|
For a random set of 13 candidate genes, two analyses to identify human brain regions with enriched expression of the candidate genes are performed: one using data from adult donors (from Allen Human Brain Atlas ) and one using data from five developmental stages (from BrainSpan Atlas of the Developing Human Brain ).
To run a hypergeometric test, a binary vector with
1 for a candidate gene and
0 for a background gene needs to be defined. The names of the vector elements are the corresponding gene identifiers (Entrez-ID, Ensembl-ID or gene-symbol).
In this example no background genes are defined, so all remaining protein-coding genes of the dataset are used as default background.
## load ABAEnrichment package library(ABAEnrichment) ## create input vector with candidate genes gene_ids = c('NCAPG', 'APOL4', 'NGFR', 'NXPH4', 'C21orf59', 'CACNG2', 'AGTR1', 'ANO1', 'BTBD3', 'MTUS1', 'CALB1', 'GYG1', 'PAX2') genes = rep(1, length(gene_ids)) names(genes) = gene_ids genes
## NCAPG APOL4 NGFR NXPH4 C21orf59 CACNG2 AGTR1 ANO1 BTBD3 MTUS1 CALB1 GYG1 ## 1 1 1 1 1 1 1 1 1 1 1 1 ## PAX2 ## 1
The core function
aba_enrich performs the enrichment analysis.
It takes the
genes vector as input, together with a
dataset argument which is set to
adult (default) or
5_stages for the analyses of the adult and the developing human brain, respectively. An example with the developmental effect score (
dev_effect) can be found below.
## run enrichment analyses with default parameters for the adult and developing human brain res_adult = aba_enrich(genes, dataset='adult') res_devel = aba_enrich(genes, dataset='5_stages')
In the following examples two additional parameters are set to lower computation time:
cutoff_quantiles=c(0.5,0.7,0.9) to use the 50%, 70% and 90% expression quantiles across all genes as the boundary between 'expressed' and 'not expressed' genes, and
n_randsets=100 to use 100 random permutations to calculate the FWER.
n_randsets have default values
## run enrichment analysis with less cutoffs and randomsets to save computation time res_devel = aba_enrich(genes, dataset='5_stages', cutoff_quantiles=c(0.5,0.7,0.9), n_randsets=100)
aba_enrich returns a list, the first element of which contains the results of the statistical analysis for each brain region and age category (analyses are performed independently for each developmental stage):
## extract first element from the output list, which contains the statistics fwers_devel = res_devel[] ## see results for the brain regions with highest enrichment for children (3-11 yrs, age_category 3) fwers_3 = fwers_devel[fwers_devel[,1]==3, ] head(fwers_3)
## age_category structure_id structure times_FWER_under_0.05 mean_FWER ## 55 3 Allen:10657 CBC_cerebellar cortex 0 0.5000000 ## 56 3 Allen:10361 AMY_amygdaloid complex 0 0.9500000 ## 57 3 Allen:10163 M1C_primary motor cortex (area M1, area 4) 0 0.9566667 ## 58 3 Allen:10225 IPC_posteroventral (inferior) parietal cortex 0 0.9733333 ## 59 3 Allen:10173 DFC_dorsolateral prefrontal cortex 0 0.9833333 ## 60 3 Allen:10161 FCx_frontal neocortex 0 0.9866667 ## min_FWER equivalent_structures FWERs ## 55 0.36 Allen:10657;Allen:10656;Allen:10655;Allen:10654;Allen:10653 0.66;0.48;0.36 ## 56 0.85 Allen:10361 0.85;1;1 ## 57 0.87 Allen:10163;Allen:10162 0.87;1;1 ## 58 0.92 Allen:10225;Allen:10214 0.92;1;1 ## 59 0.96 Allen:10173 0.96;1;0.99 ## 60 0.96 Allen:10161 0.96;1;1
The rows in the output data frame are ordered by
mean_FWER; with e.g.
min_FWER denoting the minimum FWER for enrichment of expressed candidate genes in that brain region across all expression cutoffs.
FWERs lists the individual FWERs for each cutoff.
equivalent_structures lists brain regions with identical expression data due to lack of independent expression measurements in all regions.
Nodes (brain regions) in the ontology inherit data from their children (substructures), and in the case of only one child node with expression data, the parent node inherits the child's data leading to identical enrichment statistics.
In addition to the statistics, the list that is returned from
aba_enrich also contains the input genes for which expression data are available, and for each age category the gene expression values that correspond to the requested
## $genes ## AGTR1 ANO1 BTBD3 C21orf59 CACNG2 CALB1 GYG1 MTUS1 NCAPG NGFR NXPH4 PAX2 ## 1 1 1 1 1 1 1 1 1 1 1 1 ## ## $cutoffs ## age_category_1 age_category_2 age_category_3 age_category_4 age_category_5 ## 50% 3.144481 2.854802 2.716617 2.776235 2.862117 ## 70% 7.823920 7.017616 6.897414 6.842193 7.118609 ## 90% 23.768641 22.478328 23.124388 21.625395 22.680811
For example, in the enrichment analysis of age category 2 (infant) with an expression cutoff of 0.7 (70%), genes are considered 'expressed' in a particular brain region when their expression value in that region is at least 7.017616.
The default behavior of
aba_enrich is to permute candidate and background genes randomly to compute the FWER.
With the option
gene_len=TRUE, random selection of background genes as candidate genes is dependent on the gene length, i.e. a gene twice as long as another gene also is twice as likely selected as a candidate gene in a randomset.
This is useful when the procedure that led to the identification of the candidate gene set is also more likely to discover longer genes. Gene coordinates were obtained from http://grch37.ensembl.org/biomart/martview/ (GRCh37.p13). NEW: the option
ref_genome=grch38 uses gene coordinates from the GRCh38 genome (GRCh38.p10) obtained from http://ensembl.org/biomart/martview/.
## run enrichment analysis, with randomsets dependent on gene length res_len = aba_enrich(genes, gene_len=TRUE) ## run the same analysis using gene coordinates from GRCh38 instead of the default GRCh37 res_len_hg20 = aba_enrich(genes, gene_len=TRUE, ref_genome='grch38')
When the genes are not divided into candidate and background genes, but are ranked by scores, a Wilcoxon rank-sum test can be performed to find brain regions with a high proportion of genes with high scores in the set of expressed genes.
genes input vector then contains the scores assigned to the genes, and is named with the respective gene identifiers.
The output is identical to the one produced with the hypergeometric test.
## assign random scores to the genes used above genes = sample(1:50, length(gene_ids)) names(genes) = gene_ids genes
## NCAPG APOL4 NGFR NXPH4 C21orf59 CACNG2 AGTR1 ANO1 BTBD3 MTUS1 CALB1 GYG1 ## 28 21 46 37 24 20 35 44 8 48 49 15 ## PAX2 ## 1
## test for enrichment of expressed genes with high scores in the adult brain using the Wilcoxon rank-sum test res_wilcox = aba_enrich(genes, test='wilcoxon', cutoff_quantiles=c(0.5,0.7,0.9), n_randsets=100)
## age_category structure_id structure times_FWER_under_0.05 mean_FWER ## 1 5 Allen:9017 ICjl_interstitial nucleus of Cajal, Right 0 0.8666667 ## 2 5 Allen:4314 SI_Substantia Innominata, Right 0 0.9000000 ## 3 5 Allen:4292 Acb_Nucleus Accumbens, Right 0 0.9033333 ## 4 5 Allen:4280 HCd_Head of Caudate Nucleus, Left 0 0.9033333 ## 5 5 Allen:9016 ICjl_interstitial nucleus of Cajal, Left 0 0.9033333 ## 6 5 Allen:4284 HCd_Head of Caudate Nucleus, Right 0 0.9033333 ## min_FWER equivalent_structures FWERs ## 1 0.63 Allen:9017 0.99;0.98;0.63 ## 2 0.74 Allen:4314 0.98;0.98;0.74 ## 3 0.74 Allen:4292 0.99;0.98;0.74 ## 4 0.74 Allen:4280 0.99;0.98;0.74 ## 5 0.74 Allen:9016 0.99;0.98;0.74 ## 6 0.74 Allen:4284 0.99;0.98;0.74
Instead of defining candidate and background genes explicitly in the
genes input vector, it is also possible to define entire chromosomal regions as candidate and background regions.
The expression enrichment is then tested for all protein-coding genes located in, or overlapping the candidate regions on the plus or the minus strand. The gene coordinates used to identify those genes were obtained from http://grch37.ensembl.org/biomart/martview/ (GRCh37.p13). NEW: the option
ref_genome=grch38 uses gene coordinates from the hg20 genome (GRCh38.p10) obtained from http://ensembl.org/biomart/martview/.
In comparison to defining candidate and background genes explicitly, this option has the advantage that the FWER accounts for spatial clustering of genes. For the random permutations used to compute the FWER, blocks as long as candidate regions are chosen from the merged candidate and background regions and genes contained in these blocks are considered candidate genes (Schematic 2).
To define chromosomal regions in the input vector, the names of the 1/0 vector have to be of the form
start always has to be smaller than
Note that this option requires the input of background regions.
If multiple candidate regions are provided, in the randomsets they are placed randomly (but without overlap) into the merged candidate and background regions.
The output of
aba_enrich is identical to the one that is produced for single genes.
The second element of the output list contains the candidate and background genes located in the user-defined regions:
## create input vector with a candidate region on chromosome 8 ## and background regions on chromosome 7, 8 and 9 genes = c(1, rep(0,6)) names(genes) = c('8:82000000-83000000', '7:1300000-56800000', '7:74900000-148700000', '8:7400000-44300000', '8:47600000-146300000', '9:0-39200000', '9:69700000-140200000') genes
## 8:82000000-83000000 7:1300000-56800000 7:74900000-148700000 8:7400000-44300000 8:47600000-146300000 ## 1 0 0 0 0 ## 9:0-39200000 9:69700000-140200000 ## 0 0
## run enrichment analysis for the adult human brain res_region = aba_enrich(genes, dataset='adult', cutoff_quantiles=c(0.5,0.7,0.9), n_randsets=100)
## look at the results from the enrichment analysis fwers_region = res_region[] head(fwers_region)
## age_category structure_id structure times_FWER_under_0.05 mean_FWER min_FWER ## 1 5 Allen:4671 MB_Mammillary Body, Left 1 0.2866667 0.02 ## 2 5 Allen:12926 MG_Medial Geniculate Complex 1 0.4500000 0.02 ## 3 5 Allen:4734 He-III_III, Left Lateral Hemisphere 1 0.2500000 0.03 ## 4 5 Allen:4738 He-VI_VI, Left Lateral Hemisphere 1 0.2600000 0.03 ## 5 5 Allen:12909 MB_Mammillary Body 1 0.3266667 0.03 ## 6 5 Allen:4665 MamR_Mammillary Region 1 0.3400000 0.03 ## equivalent_structures FWERs ## 1 Allen:4671 0.57;0.27;0.02 ## 2 Allen:12926 0.6;0.73;0.02 ## 3 Allen:4734 0.35;0.37;0.03 ## 4 Allen:4738 0.37;0.38;0.03 ## 5 Allen:12909 0.59;0.36;0.03 ## 6 Allen:4665 0.61;0.38;0.03
## see which genes are located in the candidate region input_genes = res_region[] candidate_genes = input_genes[input_genes==1] candidate_genes
## CHMP4C FABP4 FABP5 FABP9 FABP12 IMPA1 PAG1 PMP2 SLC10A5 SNX16 ZFAND1 ## 1 1 1 1 1 1 1 1 1 1 1
An alternative method to choose random blocks from the background regions can be used with the option
circ_chrom=TRUE. Every candidate region is then compared to background regions on the same chromosome (Schematic 2). And in contrast to the default
circ_chrom=FALSE, randomly chosen blocks do not have to be located inside a single background region, but are allowed to overlap multiple background regions. This means that a randomly chosen block can start at the end of the last background region and continue at the beginning of the first background region on a given chromosome.
get_expression enables the output of gene and brain region-specific expression data averaged across donors.
By only setting the parameter
structure_ids that defines the brain regions, the
dataset are automatically set to the genes and dataset used in the last enrichment analysis.
In comparison to defining genes and brain regions explicitly this saves some time since some pre-computations on the original dataset, e.g. aggregation of expression per gene, do not have to be redone.
Using the default options (
get_expression returns expression data for the candidate genes. If
background=TRUE, the gene expression data for both, candidate genes and background genes, are returned.
## get expression data for the first 5 brain regions from the last aba_enrich-analysis top_regions = fwers_region[1:5, 'structure_id'] top_regions
##  "Allen:4671" "Allen:12926" "Allen:4734" "Allen:4738" "Allen:12909"
expr = get_expression(top_regions, background=FALSE) head(expr)
## CHMP4C FABP12 FABP4 FABP5 FABP9 IMPA1 PAG1 PMP2 SLC10A5 SNX16 ## Allen:4671 3.315379 1.322007 2.948199 8.331502 1.289105 9.367726 8.846593 10.507419 2.517208 6.726171 ## Allen:4675 2.784213 1.969504 3.043131 9.223920 1.588357 8.954029 7.125901 10.477008 2.137033 7.080715 ## Allen:4672 2.645451 1.744787 2.177720 7.968980 1.448837 9.034155 8.390096 10.483289 2.437719 6.958238 ## Allen:4444 2.586348 1.648789 2.129794 7.979775 1.358975 9.586679 8.861224 11.058100 1.990756 6.375037 ## Allen:4499 3.086266 1.302643 3.250694 9.242289 1.293894 8.830867 8.682235 11.506694 1.704897 6.400059 ## Allen:4734 2.518846 1.477751 1.987022 8.811209 1.377784 7.703972 10.205105 9.345247 3.046310 6.080289 ## ZFAND1 ## Allen:4671 8.877031 ## Allen:4675 8.722243 ## Allen:4672 8.863811 ## Allen:4444 8.401070 ## Allen:4499 8.977492 ## Allen:4734 8.870895
The same output would be created independently of an
aba_enrich analysis by, in addition to
dataset manually. Like in all functions of the ABAEnrichment package
gene_ids can be Entrez-ID, Ensembl-ID or gene-symbol.
## get expression data independent from previous aba_enrich analysis regions = c('Allen:12926', 'Allen:4738', 'Allen:4671', 'Allen:12909', 'Allen:4718') gene_ids = c('CHMP4C', 'FABP12', 'FABP4', 'FABP5', 'FABP9', 'IMPA1', 'PAG1', 'PMP2', 'SLC10A5', 'SNX16', 'ZFAND1') expr2 = get_expression(regions, gene_ids=gene_ids, dataset='adult', background=FALSE)
5_stages dataset the output of
get_expression is a list with a data frame for each developmental stage, where the first element corresponds to the first developmental stage, the second element to the second developmental stage, and so on.
Note that the brain regions passed to
get_expression do not have to match the brain regions returned in the output. This is due to the fact that not all brain regions were measured independently. In case a brain region was not measured directly, all available expression data from its substructures are returned. The function get_sampled_substructures can be used to identify substructures with expression data.
plot_expression enables the visualization of expression data. The usage of
plot_expression is similar to that of
get_expression. Providing only brain regions as input, it plots the expression data for the genes and dataset used in the last
## get expression data for the first 5 brain regions from the last aba_enrich-analysis top_regions = fwers_region[1:5, 'structure_id'] plot_expression(top_regions, background=FALSE)
The optional argument
dendro determines whether or not a dendrogram should be added to the heatplot.
The colored side bar in the plot without dendrogram indicates candidate genes (red) and background genes (black). In this case only candidate gene expression was plotted (with the default option
## plot the same expression data without dendrogram plot_expression(top_regions, dendro=FALSE, background=FALSE)
When plotting expression data following an enrichment analysis with the Wilcoxon rank-sum test, the option
dendro=FALSE results in a side bar that indicates the scores that were used for the enrichment analysis.
plot_expression can also be used independently of an enrichment analysis. In that case the arguments
dataset have to be defined. If the
5_stages dataset is used, the additional argument
age_category selects the developmental stage for which the expression data should be plotted:
## plot expression of some genes for the frontal neocortex (Allen:10161) in age category 3 (children, 3-11 yrs) gene_ids = c('ENSG00000157764', 'ENSG00000163041', 'ENSG00000182158', 'ENSG00000147889', 'ENSG00000103126', 'ENSG00000184634') plot_expression('Allen:10161', gene_ids=gene_ids, dataset='5_stages', age_category=3)
In this example the frontal neocortex (Allen:10161) was not sampled directly for expression measurements, but some of its substructures have expression data annotated. Instead of pooling the data of the substructures, they are plotted separately.
As illustrated in the previous example, some brain regions like frontal neocortex (Allen:10161) do not have gene expression data available in the data set, but some of their substructures do have. Plotting or requesting expression data for such brain regions automatically obtains the data for all its sampled substructures.
ABAEnrichment offers some functions to explore the ontology graph which describes the hierarchical organization of the brain regions used in the enrichment analyses. The function
get_sampled_substructures returns the IDs of all the substructures for which expression data are available, and
get_superstructures returns all superstructures in the order 'general to special'. The function
get_name is useful to see the name of a brain region given a structure ID:
## get IDs of the substructures of the frontal neocortex (Allen:10161) for which expression data are available subs = get_sampled_substructures('Allen:10161') subs
##  "Allen:10173" "Allen:10185" "Allen:10194" "Allen:10163"
## get the full name of those substructures get_name(subs)
## Allen:10173 Allen:10185 ## "DFC_dorsolateral prefrontal cortex" "VFC_ventrolateral prefrontal cortex" ## Allen:10194 Allen:10163 ## "OFC_orbital frontal cortex" "M1C_primary motor cortex (area M1, area 4)"
## get the superstructures of the frontal neocortex (from general to special) supers = get_superstructures('Allen:10161') supers
##  "Allen:10153" "Allen:10154" "Allen:10155" "Allen:10156" "Allen:10157" "Allen:10158" "Allen:10159" ##  "Allen:10160" "Allen:10161"
## get the full names of those superstructures get_name(supers)
## Allen:10153 Allen:10154 Allen:10155 ## "NP_neural plate" "NT_neural tube" "Br_brain" ## Allen:10156 Allen:10157 Allen:10158 ## "F_forebrain (prosencephalon)" "FGM_gray matter of forebrain" "Tel_telencephalon" ## Allen:10159 Allen:10160 Allen:10161 ## "Cx_cerebral cortex" "NCx_neocortex (isocortex)" "FCx_frontal neocortex"
Note that the ontologies and the IDs for brain regions differ between the adult and the developing brain. However, the ontology functions
get_superstructures work with valid brain regions IDs from both ontologies.
The new function
get_id searches the ontologies of the adult and developing brain for the structure ID that belongs to a given brain region name:
## get structure IDs of brain regions that contain 'accumbens' in their names get_id('accumbens')
## structure ontology structure_id ## 1 Acb_Nucleus Accumbens adult Allen:4290 ## 2 Acb_Nucleus Accumbens, Left adult Allen:4291 ## 3 Acb_Nucleus Accumbens, Right adult Allen:4292
## get structure IDs of brain regions that contain 'telencephalon' in their names get_id('telencephalon')
## structure ontology structure_id ## 1 Tel_telencephalon developmental Allen:10158 ## 2 Tel_Telencephalon adult Allen:4007
Note that the output of
get_id is restricted to brain regions that are used in ABAEnrichment.
The function can also be used to get a full list of covered brain regions together with their IDs:
## get all brain regions included in ABAEnrichment together with thier IDs all_regions = get_id('') head(all_regions)
## structure ontology structure_id ## 1 NP_neural plate developmental Allen:10153 ## 2 NT_neural tube developmental Allen:10154 ## 3 Br_brain developmental Allen:10155 ## 4 F_forebrain (prosencephalon) developmental Allen:10156 ## 5 FGM_gray matter of forebrain developmental Allen:10157 ## 6 Tel_telencephalon developmental Allen:10158
Often it is useful to see which genes are annotated to a brain region, i.e. count as 'expressed', given an expression cutoff.
While this can be accomplished using the expression cutoffs from
aba_enrich(...)[] and the expression values from
get_expression, ABAEnrichment now also offers the convenient function
This function takes the output from
aba_enrich and a FWER-threshold (
fwer_threshold, default=0.05) as input and returns all brain-region/expression-cutoff combinations with a FWER below the FWER-threshold together with the corresponding annotated genes, the FWER and the score (1 for candidate and 0 for background genes or the scores from the Wilcoxon rank-sum test). Note that a brain region gets all genes annotated that are expressed in the brain region itself or in any of the sampled substructures (see Schematic 1 below).
## run an enrichment analysis with 7 candidate and 7 background genes for the developing brain gene_ids = c('FOXJ1', 'NTS', 'LTBP1', 'STON2', 'KCNJ6', 'AGT', 'ANO3', 'TTR', 'ELAVL4', 'BEAN1', 'TOX', 'EPN3', 'PAX2', 'KLHL1') genes = rep(c(1,0), each=7) names(genes) = gene_ids res = aba_enrich(genes, dataset='5_stages', cutoff_quantiles=c(0.5,0.7,0.9), n_randsets=100)
## age_category structure_id structure times_FWER_under_0.05 mean_FWER min_FWER ## 1 1 Allen:10294 HIP_hippocampus (hippocampal formation) 0 0.4566667 0.32 ## 2 1 Allen:10398 MD_mediodorsal nucleus of thalamus 0 0.8266667 0.61 ## 3 1 Allen:10361 AMY_amygdaloid complex 0 0.8900000 0.67 ## 4 1 Allen:10331 CN_cerebral nuclei 0 0.8900000 0.67 ## 5 1 Allen:10159 Cx_cerebral cortex 0 0.9500000 0.85 ## 6 1 Allen:10158 Tel_telencephalon 0 0.9500000 0.85 ## equivalent_structures FWERs ## 1 Allen:10294;Allen:10293;Allen:10292 0.67;0.38;0.32 ## 2 Allen:10398;Allen:10397;Allen:10391;Allen:10390;Allen:10389 0.87;1;0.61 ## 3 Allen:10361 0.67;1;1 ## 4 Allen:10331 0.67;1;1 ## 5 Allen:10159 0.85;1;1 ## 6 Allen:10158 0.85;1;1
## see which candidate genes are annotated to brain regions with a FWER < 0.01 anno = get_annotated_genes(res, fwer_threshold=0.1) anno
## age_category structure_id cutoff anno_gene FWER score ## 1 5 Allen:10333 0.5 AGT 0.07 1 ## 2 5 Allen:10333 0.5 ANO3 0.07 1 ## 3 5 Allen:10333 0.5 LTBP1 0.07 1 ## 4 5 Allen:10333 0.5 STON2 0.07 1
In addition to passing the result of an enrichment analysis together with a FWER-threshold to
get_annotated_genes, it is also possible to define brain regions, expression cutoffs and (optionally) genes directly. The function then returns all genes that have expression above the cutoffs in the respective brain regions. If
genes are defined, the output is reduced to this set of genes; if not, all protein-coding genes with available expression data are analyzed.
## find out which of the above genes have expression above the 70% and 90% expression-cutoff, respectively, ## in the basal ganglia of the adult human brain (Allen:4276) anno2 = get_annotated_genes(structure_ids='Allen:4276', dataset='adult', cutoff_quantiles=c(0.7,0.9), genes=gene_ids) anno2
## age_category structure_id cutoff anno_gene ## 1 5 Allen:4276 0.7 AGT ## 2 5 Allen:4276 0.7 ANO3 ## 3 5 Allen:4276 0.7 TTR ## 4 5 Allen:4276 0.9 AGT ## 5 5 Allen:4276 0.9 ANO3 ## 6 5 Allen:4276 0.9 TTR
In the previous examples genes got annotated to brain regions based on their expression. Besides the two gene expression datasets
5_stages, the dataset
dev_effect can be used, which provides scores for an age effect for genes based on their expression change during development. Using this dataset, the same analyses as above are performed, except that a gene is annotated to a brain region when its developmental effect score in that region is a above the
To test brain regions for enrichment of candidate genes in the set of all genes with a developmental effect score above cutoff, the
dataset parameter has to be set to
dev_effect. The output of the developmental effect enrichment analysis is equal to that of the expression enrichment analysis:
## use previously defined input genes vector genes
## FOXJ1 NTS LTBP1 STON2 KCNJ6 AGT ANO3 TTR ELAVL4 BEAN1 TOX EPN3 PAX2 KLHL1 ## 1 1 1 1 1 1 1 0 0 0 0 0 0 0
## run enrichment analysis with developmental effect score res_dev_effect = aba_enrich(genes, dataset='dev_effect', cutoff_quantiles=c(0.5,0.7,0.9), n_randsets=100)
## see the 5 brain regions with the lowest FWERs top_regions = res_dev_effect[][1:5, ] top_regions
## age_category structure_id structure times_FWER_under_0.05 ## 1 0 Allen:10294 HIP_hippocampus (hippocampal formation) 0 ## 2 0 Allen:10269 V1C_primary visual cortex (striate cortex, area V1/17) 0 ## 3 0 Allen:10163 M1C_primary motor cortex (area M1, area 4) 0 ## 4 0 Allen:13322 DLTC_dorsolateral temporal neocortex 0 ## 5 0 Allen:10243 STC_posterior (caudal) superior temporal cortex (area 22c) 0 ## mean_FWER min_FWER equivalent_structures FWERs ## 1 0.7700000 0.68 Allen:10294;Allen:10293;Allen:10292 0.68;0.79;0.84 ## 2 0.8233333 0.68 Allen:10269;Allen:10268 0.68;0.79;1 ## 3 0.9166667 0.75 Allen:10163;Allen:10162 0.75;1;1 ## 4 0.8666667 0.79 Allen:13322 0.97;0.79;0.84 ## 5 0.8666667 0.79 Allen:10243;Allen:10240 0.97;0.79;0.84
As for the expression datasets, the developmental effect scores can be retrieved with the functions
get_expression and plotted with
## plot developmental effect score of the 5 brain structures with the lowest FWERs plot_expression(top_regions[ ,'structure_id'])