1 Overview

This package provides a unified approach to programming with Bioconductor components to address problems in cancer genomics. Central concerns are:

2 Ontology

2.1 Oncotree

The NCI Thesaurus project distributes an OBO representation of oncotree. We can use it through the ontoProc (devel branch only) and ontologyPlot packages. The following code visualizes the location of ‘Glioblastoma’ among its ‘siblings’ in the ontology.

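library(ontoProc)
library(ontologyPlot)
oto = getOncotreeOnto()  # load the oncotree ontology
# tag of the term whose name ends in "Glioblastoma"
glioTag = names(grep("Glioblastoma$", oto$name, value=TRUE))
# terms to display around the Glioblastoma term
st = siblings_TAG(glioTag, oto, justSibs=FALSE)
onto_plot(oto, slot(st, "ontoTags"), fontsize=50)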

3 Resource interfaces

3.1 PanCancer Atlas

In conjunction with restfulSE, which handles aspects of the interface to BigQuery, this package provides tools for working with the PanCancer Atlas project data.

3.1.1 Sample types

A key feature distinguishing the PanCancer Atlas project from TCGA is the availability of data from normal tissue and from metastatic or recurrent tumor samples. Letter codes distinguish the different sources:

BiocOncoTK::pancan_sampTypeMap
##   SampleTypeLetterCode                                      SampleType
## 1                  TAM                           Additional Metastatic
## 2                  TAP                        Additional - New Primary
## 3                   TR                           Recurrent Solid Tumor
## 4                   TB Primary Blood Derived Cancer - Peripheral Blood
## 5                   TM                                      Metastatic
## 6                   NT                             Solid Tissue Normal
## 7                   TP                             Primary solid Tumor
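
As a small illustration of how these codes can be used, the sketch below translates letter codes into their descriptions. To stay runnable without the package, it rebuilds the table printed above by hand; the names sampTypeMap and describeCode are local stand-ins introduced here, not package exports. With BiocOncoTK installed, BiocOncoTK::pancan_sampTypeMap should be usable in place of the hand-built copy.

```r
# Local copy of the table printed above; sampTypeMap is a stand-in name,
# not an object exported by the package.
sampTypeMap = data.frame(
  SampleTypeLetterCode = c("TAM", "TAP", "TR", "TB", "TM", "NT", "TP"),
  SampleType = c("Additional Metastatic", "Additional - New Primary",
    "Recurrent Solid Tumor",
    "Primary Blood Derived Cancer - Peripheral Blood",
    "Metastatic", "Solid Tissue Normal", "Primary solid Tumor"),
  stringsAsFactors = FALSE)

# Hypothetical helper: translate letter codes into their descriptions
describeCode = function(code, map = sampTypeMap)
  map$SampleType[match(code, map$SampleTypeLetterCode)]

describeCode(c("NT", "TR"))
```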

3.1.2 SummarizedExperiments per assay per tumor or other sample type

The following code runs only if the environment variable CGC_BILLING is set to a valid value, which allows BiocOncoTK::pancan_BQ() to generate a proper BigQueryConnection.

library(BiocOncoTK)
if (nchar(Sys.getenv("CGC_BILLING"))>0) {
 pcbq = pancan_BQ() # basic connection
 BRCA_mir = restfulSE::pancan_SE(pcbq)
}
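
If the guard above is skipped, the variable is probably not visible to the R session. A minimal sketch of checking and setting it for the current session follows. The value "my-gcp-project-id" is a placeholder assumed for illustration; it must be replaced with the ID of a Google Cloud project you are allowed to bill queries to, otherwise the guard will pass but the connection will not work.

```r
# Check whether the billing variable is visible to this R session
billing = Sys.getenv("CGC_BILLING")
if (nchar(billing) == 0) {
  # "my-gcp-project-id" is a placeholder (assumption), not a real project;
  # substitute your own billing project ID before querying.
  Sys.setenv(CGC_BILLING = "my-gcp-project-id")
}
# TRUE once the variable has a non-empty value
nchar(Sys.getenv("CGC_BILLING")) > 0
```

To make the setting persistent, the same NAME=value pair can be placed in a .Renviron file, which R reads at startup.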

The result is

> BRCA_mir
class: SummarizedExperiment 
dim: 743 1068 
metadata(0):
assays(1): assay
rownames(743): hsa-miR-30d-3p hsa-miR-486-3p ... hsa-miR-525-3p
  hsa-miR-892b
rowData names(0):
colnames(1068): TCGA-LD-A7W6 TCGA-BH-A18I ... TCGA-E9-A1N9 TCGA-B6-A0X0
colData names(746): bcr_patient_uuid bcr_patient_barcode ...
  bilirubin_upper_limit days_to_last_known_alive

3.1.3 Subsetting to normal

To shift attention to the normal tissue samples provided, use

 BRCA_mir_nor = restfulSE::pancan_SE(pcbq, assaySampleTypeCode="NT")

to find

class: SummarizedExperiment 
dim: 743 90 
metadata(0):
assays(1): assay
rownames(743): hsa-miR-7641 hsa-miR-135a-5p ... hsa-miR-1323
  hsa-miR-520d-5p
rowData names(0):
colnames(90): TCGA-BH-A18P TCGA-BH-A18S ... TCGA-E9-A1N6 TCGA-E9-A1N9
colData names(746): bcr_patient_uuid bcr_patient_barcode ...
  bilirubin_upper_limit days_to_last_known_alive

The column names of the two SummarizedExperiments formed above have 89 entries in common: these are the patients contributing both a solid tumor sample and a matched normal sample.
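
The count of matched patients can be obtained with intersect(). The sketch below uses short character vectors as stand-ins for the two sets of column names so that it runs without a BigQuery connection; the membership of the toy vectors is invented for illustration, and the commented line shows the call on the real objects.

```r
# Toy stand-ins for colnames(BRCA_mir) and colnames(BRCA_mir_nor);
# which barcode appears in which vector is invented for illustration.
tumorIDs  = c("TCGA-LD-A7W6", "TCGA-BH-A18P", "TCGA-E9-A1N9")
normalIDs = c("TCGA-BH-A18P", "TCGA-E9-A1N6", "TCGA-E9-A1N9")

# Patients present in both sets
matched = intersect(tumorIDs, normalIDs)
length(matched)

# With the real objects:
# length(intersect(colnames(BRCA_mir), colnames(BRCA_mir_nor)))
```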

3.1.4 Shifting focus to another tissue/assay

You need to know which sample types have been assayed for the tumor type of interest. The following query lists the candidates.

library(dplyr)
# pcbq is the BigQueryConnection created above with pancan_BQ()
pcbq %>% tbl(pancan_longname("rnaseq")) %>% filter(Study=="GBM") %>% 
   group_by(SampleTypeLetterCode) %>% summarise(n=n())
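
The same dplyr verbs can be tried on an in-memory data frame before querying BigQuery. The rows below are invented purely for illustration and say nothing about the real table contents; only the column names Study and SampleTypeLetterCode are taken from the query above.

```r
library(dplyr)

# Toy rows, invented for illustration only
toy = data.frame(
  Study = c("GBM", "GBM", "GBM", "BRCA"),
  SampleTypeLetterCode = c("TP", "TP", "TR", "TP"),
  stringsAsFactors = FALSE)

# Count rows per sample type code within one study
counts = toy %>% filter(Study == "GBM") %>%
  group_by(SampleTypeLetterCode) %>% summarise(n = n())
counts
```

Note that n() counts rows, so on a table with several rows per sample the result is a row count rather than a count of distinct samples.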

To get RNA-seq on recurrent GBM samples:

# pcbq is the BigQueryConnection created above with pancan_BQ()
pancan_SE(pcbq, colDFilterValue="GBM", tumorFieldValue="GBM", 
  assayDataTableName=pancan_longname("rnaseq"), 
  assaySampleTypeCode="TR", assayFeatureName="Symbol", 
  assayValueFieldName="normalized_count")

3.1.5 Multiassay experiments per tumor

Suppose we want to work with the mRNA, RPPA, 27k/450k merged methylation and miRNA data together. We can invoke pancan_SE again, specifying the appropriate tables and fields; the miRNA data are already available as BRCA_mir.

BRCA_mrna = pancan_SE(pcbq,
   assayDataTableName = pancan_longname("rnaseq"),
   assayFeatureName = "Entrez",
   assayValueFieldName = "normalized_count")
BRCA_rppa = pancan_SE(pcbq,
   assayDataTableName = pancan_longname("RPPA"),
   assayFeatureName = "Protein",
   assayValueFieldName = "Value")
BRCA_meth = pancan_SE(pcbq,
   assayDataTableName = pancan_longname("27k")[2],
   assayFeatureName = "ID",
   assayValueFieldName = "Beta")

After obtaining the clinical data for BRCA with

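library(dplyr)
library(magrittr)
# pull the BRCA rows of the clinical table into an in-memory data.frame
clinBRCA = pcbq %>% tbl(pancan_longname("clinical")) %>% 
  filter(acronym=="BRCA") %>% as.data.frame() 
# column 2 is expected to be bcr_patient_barcode, the patient identifier
rownames(clinBRCA) = clinBRCA[,2]
clinDF = DataFrame(clinBRCA)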

we use

library(MultiAssayExperiment)
brcaMAE = MultiAssayExperiment(
  ExperimentList(rnaseq=BRCA_mrna, meth=BRCA_meth, rppa=BRCA_rppa,
    mirna=BRCA_mir),colData=clinDF)

to generate brcaMAE. The object itself holds no assay data; values are retrieved on request.

> brcaMAE
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] rnaseq: SummarizedExperiment with 20531 rows and 1097 columns 
 [2] meth: SummarizedExperiment with 22601 rows and 1067 columns 
 [3] rppa: SummarizedExperiment with 259 rows and 873 columns 
 [4] mirna: SummarizedExperiment with 743 rows and 1068 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

It is convenient to check sample availability across the different assays using upsetSamples from MultiAssayExperiment.
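
A minimal sketch of this check follows. It builds a small in-memory MultiAssayExperiment from invented matrices (toyMAE and all values in it are assumptions made for illustration) so that it runs without BigQuery; with the object built above, the analogous call would be upsetSamples(brcaMAE).

```r
library(MultiAssayExperiment)
set.seed(1)

# Toy matrices (invented); columns are named by patient
rna = matrix(rnorm(6), nrow = 2,
  dimnames = list(c("g1", "g2"), c("P1", "P2", "P3")))
rppa = matrix(rnorm(4), nrow = 2,
  dimnames = list(c("prot1", "prot2"), c("P2", "P3")))
clin = DataFrame(age = c(50, 60, 70), row.names = c("P1", "P2", "P3"))

# Column names match rownames(clin), so the sample map is inferred
toyMAE = MultiAssayExperiment(
  ExperimentList(rnaseq = rna, rppa = rppa), colData = clin)

# Number of samples available per assay
table(sampleMap(toyMAE)$assay)

# UpSet plot of which patients have data in which assays
upsetSamples(toyMAE)
```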