1 Retrieval of cadd.v1.6.hg38 pathogenicity scores through AnnotationHub resources

The cadd.v1.6.hg38 package provides metadata for the AnnotationHub resources associated with human CADD pathogenicity scores [@kircher2014general]. The original data can be found at the University of Washington download site. Details about how those original data were processed into AnnotationHub resources can be found in the source file:

cadd.v1.6.hg38/scripts/make-metadata_cadd.v1.6.hg38.R

The pathogenicity scores for cadd.v1.6.hg38 can be retrieved through AnnotationHub, a web resource that provides a central location for finding genomic files (e.g., VCF, BED, wig) and other resources from standard (e.g., UCSC, Ensembl) and distributed sites. The Bioconductor AnnotationHub client creates and manages a local cache of the files retrieved by the user, which makes later access quick and reproducible.
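To make the hub lookup concrete, the following is a minimal sketch of how the underlying records could be located with the AnnotationHub package directly. It assumes an internet connection and that the records can be found by searching for the resource name; the variable names `ah` and `caddres` are illustrative only.

```r
library(AnnotationHub)

# Connect to the hub; the first call sets up the local cache
ah <- AnnotationHub()

# Search the hub metadata for records matching the resource name
# (assumption: the records are labelled with this string)
caddres <- query(ah, "cadd.v1.6.hg38")
caddres

# Number of matching records; files are downloaded only when requested
length(caddres)
```

Querying the hub only fetches metadata, so this step is cheap compared with downloading the scores themselves.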

While the AnnotationHub API can be used to query those resources, we encourage using the GenomicScores API [@puigdevall2018genomicscores], as follows. The first step in retrieving genomic scores is to check which ones are available for download.

library(GenomicScores)
availableGScores()
##  [1] "AlphaMissense.v2023.hg19"          "AlphaMissense.v2023.hg38"         
##  [3] "cadd.v1.6.hg19"                    "cadd.v1.6.hg38"                   
##  [5] "fitCons.UCSC.hg19"                 "linsight.UCSC.hg19"               
##  [7] "mcap.v1.0.hg19"                    "phastCons7way.UCSC.hg38"          
##  [9] "phastCons27way.UCSC.dm6"           "phastCons30way.UCSC.hg38"         
## [11] "phastCons35way.UCSC.mm39"          "phastCons46wayPlacental.UCSC.hg19"
## [13] "phastCons46wayPrimates.UCSC.hg19"  "phastCons60way.UCSC.mm10"         
## [15] "phastCons100way.UCSC.hg19"         "phastCons100way.UCSC.hg38"        
## [17] "phyloP35way.UCSC.mm39"             "phyloP60way.UCSC.mm10"            
## [19] "phyloP100way.UCSC.hg19"            "phyloP100way.UCSC.hg38"

The selected resource can be downloaded with the function getGScores(). After the first download, subsequent calls use the cached copy and return more quickly.

cadd <- getGScores("cadd.v1.6.hg38")
cadd
## GScores object 
## # organism: Homo sapiens (UCSC, hg38)
## # provider: UWashington
## # provider version: v1.6
## # download date: Oct 11, 2023
## # loaded sequences: chr1
## # maximum abs. error: 5
## # use 'citation()' to cite these data in publications
citation(cadd)
## Martin Kircher, Daniela M. Witten, Preti Jain, Brian J. O'Roak, Gregory
## M. Cooper, Jay Shendure (2014). "A general framework for estimating the
## relative pathogenicity of human genetic variants." _Nature Genetics_,
## *46*, 310-315. doi:10.1038/ng.2892 <https://doi.org/10.1038/ng.2892>.
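The fields printed above can also be read programmatically. The sketch below uses accessor functions documented for GScores objects in the GenomicScores package; it assumes the resource has been, or can be, downloaded as shown above.

```r
library(GenomicScores)

# Load the resource (served from the local cache after the first download)
cadd <- getGScores("cadd.v1.6.hg38")

organism(cadd)           # species the scores refer to
provider(cadd)           # data provider
providerVersion(cadd)    # version of the scores at the provider
seqnames(cadd)           # sequences for which scores are available
populations(cadd)        # score sets stored in the resource
defaultPopulation(cadd)  # score set used when none is specified
```

Checking the default score set is useful because its name is also the name of the metadata column that holds the retrieved scores.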

Finally, the CADD pathogenicity score of a particular genomic position is retrieved using the function gscores(). Please consult the documentation of the GenomicScores package for details on its use. For instance, @cheng2023accurate report likely pathogenic scores for variants in the human glucose sensor GCK. To retrieve the CADD score of the variant NM_000162.5(GCK):c.1174C>T (p.Arg392Cys), classified as pathogenic in the ClinVar database, we call gscores() as follows.

gscores(cadd, GRanges("chr7:44145576"), ref="C", alt="T")
## GRanges object with 1 range and 1 metadata column:
##       seqnames    ranges strand |   default
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chr7  44145576      * |        30
##   -------
##   seqinfo: 25 sequences (1 circular) from hg38 genome
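Because gscores() returns a GRanges object, the score itself sits in a metadata column. As a small sketch, and assuming the column is named default as in the output above, the value can be pulled out for downstream use like this (the variable name `res` is illustrative):

```r
library(GenomicScores)

cadd <- getGScores("cadd.v1.6.hg38")

# Same query as above, keeping the result for further processing
res <- gscores(cadd, GRanges("chr7:44145576"), ref="C", alt="T")

# Scores as a plain numeric vector, one element per queried range
res$default

# Coordinates and scores as an ordinary data frame
as.data.frame(res)
```

The data frame form is convenient when the scores need to be merged with other variant annotations.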

1.1 Building an annotation package from a GScores object

Retrieving genomic scores through AnnotationHub resources requires an internet connection, and we may want to work with such resources offline. For that purpose, we can build an annotation package ourselves, similar to phastCons100way.UCSC.hg38, from a GScores object that corresponds to a downloaded AnnotationHub resource. To do so, we use the function makeGScoresPackage() as follows:

makeGScoresPackage(cadd, maintainer="Me <me@example.com>", author="Me", version="1.0.0")
## Creating package in ./cadd.v1.6.hg38

The argument destDir, which defaults to the current working directory, controls where in the filesystem the package is created. Afterwards, we still need to build and install the package, e.g., with R CMD build and R CMD INSTALL, before we can use it offline.
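The build and install steps can also be driven from within R. The following is a sketch only: `buildAndInstall()` is a hypothetical helper, not part of GenomicScores, and it assumes the package directory was created in the current working directory by makeGScoresPackage().

```r
# Hypothetical helper: build a source tarball from a package directory
# and install it into the default library.
buildAndInstall <- function(pkgdir) {
  stopifnot(dir.exists(pkgdir))
  rbin <- file.path(R.home("bin"), "R")

  # R CMD build writes <package>_<version>.tar.gz to the working directory
  status <- system2(rbin, c("CMD", "build", shQuote(pkgdir)))
  if (status != 0) stop("R CMD build failed")

  # Locate the tarball that belongs to this package
  tarballs <- list.files(pattern="\\.tar\\.gz$")
  tarballs <- tarballs[startsWith(tarballs, paste0(basename(pkgdir), "_"))]
  if (length(tarballs) == 0) stop("no tarball found for ", pkgdir)
  tarball <- tarballs[length(tarballs)]

  status <- system2(rbin, c("CMD", "INSTALL", shQuote(tarball)))
  if (status != 0) stop("R CMD INSTALL failed")
  invisible(tarball)
}

# Example call, to be run after makeGScoresPackage():
# buildAndInstall("cadd.v1.6.hg38")
```

Once installed, the package can be loaded with library() like any other annotation package, without an internet connection.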

2 Session information

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] GenomicScores_2.14.1 GenomicRanges_1.54.1 GenomeInfoDb_1.38.0 
##  [4] IRanges_2.36.0       S4Vectors_0.40.1     AnnotationHub_3.10.0
##  [7] BiocFileCache_2.10.1 dbplyr_2.4.0         BiocGenerics_0.48.0 
## [10] BiocStyle_2.30.0    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.0              dplyr_1.1.3                  
##  [3] blob_1.2.4                    filelock_1.0.2               
##  [5] Biostrings_2.70.1             bitops_1.0-7                 
##  [7] fastmap_1.1.1                 RCurl_1.98-1.12              
##  [9] promises_1.2.1                XML_3.99-0.14                
## [11] digest_0.6.33                 mime_0.12                    
## [13] lifecycle_1.0.3               ellipsis_0.3.2               
## [15] KEGGREST_1.42.0               interactiveDisplayBase_1.40.0
## [17] RSQLite_2.3.2                 magrittr_2.0.3               
## [19] compiler_4.3.1                rlang_1.1.1                  
## [21] sass_0.4.7                    tools_4.3.1                  
## [23] utf8_1.2.4                    yaml_2.3.7                   
## [25] knitr_1.45                    S4Arrays_1.2.0               
## [27] bit_4.0.5                     curl_5.1.0                   
## [29] DelayedArray_0.28.0           abind_1.4-5                  
## [31] HDF5Array_1.30.0              withr_2.5.2                  
## [33] purrr_1.0.2                   grid_4.3.1                   
## [35] fansi_1.0.5                   xtable_1.8-4                 
## [37] Rhdf5lib_1.24.0               cli_3.6.1                    
## [39] rmarkdown_2.25                crayon_1.5.2                 
## [41] generics_0.1.3                httr_1.4.7                   
## [43] DBI_1.1.3                     cachem_1.0.8                 
## [45] rhdf5_2.46.0                  zlibbioc_1.48.0              
## [47] AnnotationDbi_1.64.0          BiocManager_1.30.22          
## [49] XVector_0.42.0                matrixStats_1.0.0            
## [51] vctrs_0.6.4                   Matrix_1.6-1.1               
## [53] jsonlite_1.8.7                bookdown_0.36                
## [55] bit64_4.0.5                   jquerylib_0.1.4              
## [57] glue_1.6.2                    BiocVersion_3.18.0           
## [59] later_1.3.1                   tibble_3.2.1                 
## [61] pillar_1.9.0                  rappdirs_0.3.3               
## [63] htmltools_0.5.6.1             rhdf5filters_1.14.0          
## [65] GenomeInfoDbData_1.2.11       R6_2.5.1                     
## [67] evaluate_0.22                 shiny_1.7.5.1                
## [69] Biobase_2.62.0                lattice_0.22-5               
## [71] png_0.1-8                     memoise_2.0.1                
## [73] httpuv_1.6.12                 bslib_0.5.1                  
## [75] Rcpp_1.0.11                   SparseArray_1.2.0            
## [77] xfun_0.40                     MatrixGenerics_1.14.0        
## [79] pkgconfig_2.0.3

3 References