1 Introduction

The TCGA tumor types span a range of anatomical compartments. Grouping tumor types by related compartments may be useful when analyses need to be aggregated above the level of the individual tumor type. We will use the oncotree OBO representation, taken from an NCI Thesaurus OBO distribution, as provided in the Bioc 3.9 version of ontoProc.

2 A table

This table was constructed by hand on Oct 10 2019 using materials in the ontoProc package.

suppressPackageStartupMessages({
library(DT)
library(ontoProc)
library(magrittr)
library(dplyr)
library(BiocOncoTK)
library(AnnotationHub)
})
# load the oncotree ontology (retrieved from a local cache when available)
otree = getOncotreeOnto()
## loading from cache
data("map_tcga_ncit")
datatable(map_tcga_ncit)

3 Formal annotation of anatomic site

3.1 Expeditious mapping

We will drop the CNTL class, and use only the first NCIT mapping when two seem to match.

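# locate the CNTL (control) row so it can be excluded
controlindex = which(map_tcga_ncit[,1]=="CNTL")
tcgacodes = map_tcga_ncit[-controlindex,1]
ncitsites = map_tcga_ncit[-controlindex,3]
# multiple NCIT matches are separated by "|"; keep only the first
ssi = strsplit(ncitsites, "\\|")
sites = sapply(ssi, "[", 1)
# one row per TCGA code, with the oncotree name of the chosen NCIT term
simpmap = data.frame(code=tcgacodes, oncotr_site=otree$name[sites], ncit=sites,
  stringsAsFactors=FALSE)
# show five randomly chosen rows
simpmap[sample(seq_len(nrow(simpmap)),5),]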
##             code                           oncotr_site        ncit
## NCIT:C3483  LCML                      Chronic Leukemia  NCIT:C3483
## NCIT:C4436  CHOL                    Cholangiocarcinoma  NCIT:C4436
## NCIT:C34447 HNSC Head and Neck Squamous Cell Carcinoma NCIT:C34447
## NCIT:C3326  PCPG        Adrenal Gland Pheochromocytoma  NCIT:C3326
## NCIT:C7550    OV         Ovarian Serous Adenocarcinoma  NCIT:C7550
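
The "first mapping" rule used above can be seen in isolation with a small sketch. The input strings here are toy values invented for illustration (in particular, `NCIT:C9999` is a placeholder, not a term taken from the table); only base R is assumed.

```r
# Toy site strings (hypothetical); the second lists two candidate terms.
ncitsites_toy = c("NCIT:C3483", "NCIT:C4436|NCIT:C9999", "NCIT:C7550")

# "|" is a regular-expression metacharacter, so it is escaped for strsplit().
parts = strsplit(ncitsites_toy, "\\|")

# "[" with argument 1 extracts the first element of each split result.
first_site = sapply(parts, "[", 1)
first_site
```

Entries with a single term pass through unchanged, so the same two lines handle both the unambiguous and the ambiguous rows.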

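We now have a 1-1 mapping from TCGA code to NCIT site. These sites can be grouped by organ system, using the knowledge that NCIT:C3263 is the ‘neoplasm by site’ category (which really should be called ‘neoplasm by system’): its children are the candidate organ systems, and each site is assigned to the first of these found among its ancestors.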

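poss_sys = otree$children["NCIT:C3263"][[1]] # all possible systems
# ancestors of each mapped NCIT term
allanc = otree$ancestors[simpmap$ncit]
# first candidate system found among the ancestors of each term
specific = sapply(allanc, function(x) intersect(x, poss_sys)[1]) # ignore multiplicities
# translate system identifiers to readable names
sys = unlist(otree$name[specific])
# attach the system label to the simplified map and display it
datatable(systab <- cbind(simpmap, sys=sys))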
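
To make the ancestor lookup concrete, here is a minimal sketch on a toy ancestor list. All identifiers (`SYS:A`, `T1`, and so on) are hypothetical stand-ins for NCIT terms, and the list plays the role that `otree$ancestors` plays above.

```r
# Hypothetical candidate systems, standing in for the children of NCIT:C3263.
poss_sys_toy = c("SYS:A", "SYS:B")

# Hypothetical ancestor sets for three terms.
allanc_toy = list(
  T1 = c("ROOT", "SYS:A", "T1"),           # exactly one system
  T2 = c("ROOT", "SYS:A", "SYS:B", "T2"),  # two systems; only the first is kept
  T3 = c("ROOT", "T3")                     # no system among the ancestors
)

# Same pattern as above: intersect, then take the first element.
# Indexing an empty result with [1] yields NA.
specific_toy = sapply(allanc_toy, function(x) intersect(x, poss_sys_toy)[1])
specific_toy
```

Taking only the first element is a simplification: a term with several system ancestors is reported under one of them, and a term with none is reported as `NA`.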

Neither thymoma nor mesothelioma has an NCIT organ system mapping per se.
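
One way to list such unmapped tumor types is to look for missing system labels. The sketch below assumes that an unmapped type appears with `NA` in the `sys` column; the data frame is a toy stand-in for `systab`, not the real table.

```r
# Toy stand-in for systab (assumption: unmapped types carry NA in sys).
systab_na_toy = data.frame(
  code = c("THYM", "MESO", "OV"),
  sys = c(NA, NA, "Reproductive System Neoplasm"),
  stringsAsFactors = FALSE
)

# TCGA codes with no organ system label.
unmapped = systab_na_toy$code[is.na(systab_na_toy$sys)]
unmapped
```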

3.2 Aggregation

We now have 12 categories for 33 tumor types. A code pattern for finding the TCGA codes for a given system is:

systab %>% filter(grepl("Repro", sys))
##             code                   oncotr_site        ncit
## NCIT:C40195 CESC    Cervical Squamous Neoplasm NCIT:C40195
## NCIT:C7550    OV Ovarian Serous Adenocarcinoma  NCIT:C7550
## NCIT:C2919  PRAD       Prostate Adenocarcinoma  NCIT:C2919
## NCIT:C8591  TGCT    Testicular Germ Cell Tumor  NCIT:C8591
## NCIT:C42700  UCS        Uterine Carcinosarcoma NCIT:C42700
##                                      sys
## NCIT:C40195 Reproductive System Neoplasm
## NCIT:C7550  Reproductive System Neoplasm
## NCIT:C2919  Reproductive System Neoplasm
## NCIT:C8591  Reproductive System Neoplasm
## NCIT:C42700 Reproductive System Neoplasm
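
When all systems are of interest at once, a base R alternative to filtering one pattern at a time is to split the codes by system label. This is a sketch on a toy table: the reproductive rows are copied from the output above, while the "Other System (toy label)" rows are invented for illustration.

```r
# Toy stand-in for systab; the second label is a made-up placeholder.
systab_grp_toy = data.frame(
  code = c("CESC", "OV", "PRAD", "XXX1", "XXX2"),
  sys = c(rep("Reproductive System Neoplasm", 3),
          rep("Other System (toy label)", 2)),
  stringsAsFactors = FALSE
)

# List of TCGA codes, one element per system.
by_sys = split(systab_grp_toy$code, systab_grp_toy$sys)

# Number of tumor types in each system.
n_by_sys = table(systab_grp_toy$sys)

by_sys[["Reproductive System Neoplasm"]]
```

Rows whose system label is `NA` are dropped by both `split()` and `table()` by default, so unmapped tumor types need to be handled separately.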