With advances in cancer genomics, the Mutation Annotation Format (MAF) has become widely accepted for storing detected somatic variants. The Cancer Genome Atlas (TCGA) project has sequenced over 30 different cancers, with a sample size of over 200 for each cancer type. The resulting somatic variant data are stored in MAF. This package attempts to summarize, analyze, annotate, and visualize MAF files efficiently, whether they come from TCGA sources or from in-house studies, as long as the data are in MAF format.
For VCF files or simple tabular files, an easy option is the vcf2maf utility, which annotates VCFs, prioritizes transcripts, and generates a MAF. Recent updates to GATK have also enabled Funcotator to generate MAF files.
If you’re using ANNOVAR for variant annotation, maftools has a handy function, annovarToMaf, for converting tabular ANNOVAR outputs to MAF.
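As a rough illustration of the conversion, the sketch below runs annovarToMaf on the small ANNOVAR example table that is assumed to ship with maftools under extdata (variants.hg19_multianno.txt). The Center label is an arbitrary placeholder, and the tsbCol and table values are assumptions that match that example file; adjust them for your own ANNOVAR output.

```r
# Sketch only: requires the maftools package to be installed.
if (requireNamespace("maftools", quietly = TRUE)) {
  library(maftools)

  # Example ANNOVAR output assumed to be bundled with maftools
  var.annovar <- system.file("extdata", "variants.hg19_multianno.txt",
                             package = "maftools")

  # Center is a free-text placeholder; tsbCol names the column that holds
  # sample identifiers; table names the gene annotation used by ANNOVAR.
  var.annovar.maf <- annovarToMaf(annovar = var.annovar,
                                  Center = "MyCenter",
                                  refBuild = "hg19",
                                  tsbCol = "Tumor_Sample_Barcode",
                                  table = "ensGene")

  # Inspect the first few converted records
  print(head(var.annovar.maf))
}
```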
MAF files contain many fields, ranging from chromosome names to COSMIC annotations. However, most analyses in maftools use the following fields.
Mandatory fields: Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Classification, Variant_Type and Tumor_Sample_Barcode.
Recommended optional fields: non-MAF-specific fields containing VAF (Variant Allele Frequency) and amino acid change information.
The complete specification of MAF files can be found on the NCI GDC documentation page.
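Before handing a table to maftools, it can help to confirm that the mandatory fields listed above are present. The following is a minimal base R sketch; missing_maf_fields is a hypothetical helper written for this illustration (it is not part of maftools), and the toy table holds placeholder values only.

```r
# Mandatory MAF columns, as listed above
mandatory_fields <- c("Hugo_Symbol", "Chromosome", "Start_Position",
                      "End_Position", "Reference_Allele",
                      "Tumor_Seq_Allele2", "Variant_Classification",
                      "Variant_Type", "Tumor_Sample_Barcode")

# Hypothetical helper (not a maftools function): returns the names of
# mandatory columns that are absent from a data.frame
missing_maf_fields <- function(maf_table) {
  setdiff(mandatory_fields, colnames(maf_table))
}

# Toy one-row table with placeholder values; Tumor_Sample_Barcode is
# deliberately left out to show what the check reports
toy <- data.frame(Hugo_Symbol = "GENE1",
                  Chromosome = "1",
                  Start_Position = 100,
                  End_Position = 100,
                  Reference_Allele = "C",
                  Tumor_Seq_Allele2 = "T",
                  Variant_Classification = "Missense_Mutation",
                  Variant_Type = "SNP",
                  stringsAsFactors = FALSE)

missing <- missing_maf_fields(toy)
print(missing)  # "Tumor_Sample_Barcode"
```

An empty result means all mandatory columns are present; the check looks only at column names, not at the validity of the values.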
This vignette demonstrates the usage and application of maftools on an example MAF file from the TCGA LAML cohort [1].
maftools functions can be broadly categorized into Visualization and Analysis modules. Each function is summarized below with a short description. Usage is simple: read your MAF file with read.maf (along with copy-number data if available) and pass the resulting MAF object to the desired function for plotting or analysis.
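The workflow described above might look like the following sketch. It assumes the TCGA LAML example files that are expected to ship with maftools under extdata (tcga_laml.maf.gz and tcga_laml_annot.tsv); the clinical annotation file is optional and is included here only for illustration.

```r
# Sketch only: requires the maftools package to be installed.
if (requireNamespace("maftools", quietly = TRUE)) {
  library(maftools)

  # Paths to the example MAF and optional clinical annotations,
  # assumed to be bundled with the package
  laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
  laml.clin <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")

  # Read the MAF file into a MAF object
  laml <- read.maf(maf = laml.maf, clinicalData = laml.clin)

  # Analysis: per-sample summary of variant counts
  print(getSampleSummary(laml))

  # Visualization: summary plot of the whole cohort
  plotmafSummary(maf = laml)
}
```

The same MAF object can then be passed to other plotting or analysis functions in the package.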