0.1 Introduction

The raerdata package provides the datasets and databases used to demonstrate how the raer package characterizes RNA editing. It includes databases of known human and mouse RNA editing sites, along with datasets that have been preprocessed into smaller examples suitable for quick exploration and for demonstrating raer.

0.2 Installation

if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")

BiocManager::install("raerdata")
library(raerdata)

0.3 RNA editing Atlases

Atlases of known human and mouse A-to-I RNA editing sites are provided as GRanges objects.

0.3.1 REDIportal

REDIportal is a collection of RNA editing sites identified by multiple studies across multiple species (Picardi et al. 2017). The human (hg38) and mouse (mm10) collections are provided as GRanges objects, either in a coordinate-only format or with additional metadata.

rediportal_coords_hg38()
## GRanges object with 15638648 ranges and 0 metadata columns:
##              seqnames    ranges strand
##                 <Rle> <IRanges>  <Rle>
##          [1]     chr1     87158      -
##          [2]     chr1     87168      -
##          [3]     chr1     87171      -
##          [4]     chr1     87189      -
##          [5]     chr1     87218      -
##          ...      ...       ...    ...
##   [15638644]     chrY  56885715      +
##   [15638645]     chrY  56885716      +
##   [15638646]     chrY  56885728      +
##   [15638647]     chrY  56885841      +
##   [15638648]     chrY  56885850      +
##   -------
##   seqinfo: 44 sequences from hg38 genome; no seqlengths
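Because the atlas is an ordinary GRanges object, it can be narrowed to a chromosome or a genomic window with standard GenomicRanges tools. The following is a minimal sketch that uses a small toy object in place of the full atlas; the toy coordinates are copied from the printed output above, and the `window` region is a hypothetical region of interest.

```r
library(GenomicRanges)

# Toy stand-in for the object returned by rediportal_coords_hg38();
# coordinates are taken from the printed output above.
sites <- GRanges(
    seqnames = c("chr1", "chr1", "chrY", "chrY"),
    ranges = IRanges(start = c(87158, 87168, 56885715, 56885716), width = 1),
    strand = c("-", "-", "+", "+")
)

# Keep only chr1 sites and drop the unused sequence levels.
chr1_sites <- keepSeqlevels(sites, "chr1", pruning.mode = "coarse")

# Keep sites that fall inside a window (hypothetical region of interest).
window <- GRanges("chr1", IRanges(start = 87160, end = 87200))
window_sites <- subsetByOverlaps(sites, window, ignore.strand = TRUE)
```

The same calls should apply unchanged to the full object, for example to restrict the atlas to the chromosome covered by one of the example datasets.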

0.3.2 CDS recoding sites

Human CDS recoding RNA editing sites identified by Gabay et al. (2022) were formatted into GRanges objects. These sites were also lifted over to the mouse genome (mm10).

cds_sites <- gabay_sites_hg38()
cds_sites[1:4, 1:4]
## GRanges object with 4 ranges and 4 metadata columns:
##       seqnames    ranges strand |    GeneName
##          <Rle> <IRanges>  <Rle> | <character>
##   [1]     chr1    999279      - |        HES4
##   [2]     chr1   1014084      + |       ISG15
##   [3]     chr1   1281229      + |      SCNN1D
##   [4]     chr1   1281248      + |      SCNN1D
##       RefseqAccession_1,ExonNum_1,NucleotideSubstitution_1,AminoAcidSubstitution_1;…;RefseqAccession_N,ExonNum_N,NucleotideSubstitution_N,AminoAcidSubstitution_N
##                                                                                                                                                       <character>
##   [1]                                                                                                                                      NM_001142467.1,exon3..
##   [2]                                                                                                                                      NM_005101.3,exon2,c...
##   [3]                                                                                                                                      NM_001130413.3,exon2..
##   [4]                                                                                                                                      NM_001130413.3,exon2..
##          Syn/NonSyn Diversifying/Restorative/Syn
##         <character>                  <character>
##   [1] nonsynonymous                           NA
##   [2] nonsynonymous                           NA
##   [3]    synonymous                           NA
##   [4] nonsynonymous                           NA
##   -------
##   seqinfo: 23 sequences from hg38 genome; no seqlengths
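Several metadata column names in this object contain characters such as `/`, so they are easiest to reach with `[[` on `mcols()` rather than with `$`. The sketch below illustrates selecting nonsynonymous sites; it builds a toy object from the four rows printed above instead of downloading the real data, so only the `GeneName` and `Syn/NonSyn` columns are reproduced.

```r
library(GenomicRanges)

# Toy stand-in for gabay_sites_hg38(), built from the rows printed above.
cds_sites <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(999279, 1014084, 1281229, 1281248), width = 1),
    strand = c("-", "+", "+", "+"),
    GeneName = c("HES4", "ISG15", "SCNN1D", "SCNN1D")
)
mcols(cds_sites)[["Syn/NonSyn"]] <- c(
    "nonsynonymous", "nonsynonymous", "synonymous", "nonsynonymous"
)

# Column names containing "/" are accessed with [[ on the metadata columns.
effect <- mcols(cds_sites)[["Syn/NonSyn"]]

# Keep only the sites annotated as nonsynonymous.
recoding <- cds_sites[effect == "nonsynonymous"]

# Count the retained sites per gene.
sites_per_gene <- table(recoding$GeneName)
```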

0.4 Datasets

0.4.1 Whole genome and RNA sequencing data from NA12878 cell line

This dataset provides WGS and RNA-seq BAM files, with associated reference files, generated from a subset of chromosome 4. The function returns a list containing the file paths and related data objects.

NA12878()
## $bams
## BamFileList of length 2
## names(2): NA12878_RNASEQ NA12878_WGS
## 
## $fasta
## [1] "/home/biocbuild/.cache/R/ExperimentHub/2728a97176a5a_8469"
## 
## $snps
## GRanges object with 380175 ranges and 2 metadata columns:
##            seqnames    ranges strand |         name     score
##               <Rle> <IRanges>  <Rle> |  <character> <numeric>
##        [1]     chr4     10001      * | rs1581341342         0
##        [2]     chr4     10002      * | rs1581341346         0
##        [3]     chr4     10004      * | rs1581341351         0
##        [4]     chr4     10005      * | rs1581341354         0
##        [5]     chr4     10006      * | rs1209159313         0
##        ...      ...       ...    ... .          ...       ...
##   [380171]     chr4    999987      * | rs1577536513         0
##   [380172]     chr4    999989      * |  rs948695434         0
##   [380173]     chr4    999991      * | rs1044698628         0
##   [380174]     chr4    999996      * | rs1361920394         0
##   [380175]     chr4    999997      * |   rs59206677         0
##   -------
##   seqinfo: 711 sequences (1 circular) from hg38 genome
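The text above does not say how the `$snps` element is meant to be used. One plausible use, assumed here, is to exclude known SNP positions from a set of candidate editing sites. The sketch below shows that filtering step on toy data: the SNP coordinates are copied from the printed output, while the `candidates` object is hypothetical.

```r
library(GenomicRanges)

# Toy stand-in for NA12878()$snps, using the first rows printed above.
snps <- GRanges(
    seqnames = "chr4",
    ranges = IRanges(start = c(10001, 10002, 10004), width = 1),
    name = c("rs1581341342", "rs1581341346", "rs1581341351")
)

# Hypothetical candidate editing sites (not taken from the package).
candidates <- GRanges(
    seqnames = "chr4",
    ranges = IRanges(start = c(10002, 10003, 20000), width = 1),
    strand = c("+", "-", "+")
)

# Flag candidates that coincide with a known SNP, regardless of strand.
is_snp <- overlapsAny(candidates, snps, ignore.strand = TRUE)

# Retain only candidates that do not overlap a SNP.
filtered <- candidates[!is_snp]
```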

0.4.2 GSE99249: RNA-Seq of Interferon beta treatment of ADAR1KO cell line

This dataset provides RNA-seq BAM files from ADAR1KO and wild-type HEK293 cells, with associated reference files, from chromosome 18 (Chung et al. 2018).

GSE99249()
## $bams
## BamFileList of length 6
## names(6): SRR5564260 SRR5564261 SRR5564269 SRR5564270 SRR5564271 SRR5564277
## 
## $fasta
## [1] "/home/biocbuild/.cache/R/ExperimentHub/2728a93a243bf8_8310"
## 
## $sites
## GRanges object with 15638648 ranges and 0 metadata columns:
##              seqnames    ranges strand
##                 <Rle> <IRanges>  <Rle>
##          [1]     chr1     87158      -
##          [2]     chr1     87168      -
##          [3]     chr1     87171      -
##          [4]     chr1     87189      -
##          [5]     chr1     87218      -
##          ...      ...       ...    ...
##   [15638644]     chrY  56885715      +
##   [15638645]     chrY  56885716      +
##   [15638646]     chrY  56885728      +
##   [15638647]     chrY  56885841      +
##   [15638648]     chrY  56885850      +
##   -------
##   seqinfo: 44 sequences from hg38 genome; no seqlengths

0.4.3 10x Genomics 10k PBMC scRNA-seq

This dataset provides a 10x Genomics BAM file and RNA editing sites from chromosome 16 of a human PBMC scRNA-seq library. It also includes a SingleCellExperiment object containing gene expression values, cluster annotations, cell-type annotations, and a UMAP projection.

pbmc_10x()
## $bam
## class: BamFile 
## path: /home/biocbuild/.cache/R/ExperimentHub/2728a95c6f34ee_8311
## index: /home/biocbuild/.cache/R/ExperimentHub/2728a93bd8f51f_8312
## isOpen: FALSE 
## yieldSize: NA 
## obeyQname: FALSE 
## asMates: FALSE 
## qnamePrefixEnd: NA 
## qnameSuffixStart: NA 
## 
## $sites
## GRanges object with 15638648 ranges and 0 metadata columns:
##              seqnames    ranges strand
##                 <Rle> <IRanges>  <Rle>
##          [1]     chr1     87158      -
##          [2]     chr1     87168      -
##          [3]     chr1     87171      -
##          [4]     chr1     87189      -
##          [5]     chr1     87218      -
##          ...      ...       ...    ...
##   [15638644]     chrY  56885715      +
##   [15638645]     chrY  56885716      +
##   [15638646]     chrY  56885728      +
##   [15638647]     chrY  56885841      +
##   [15638648]     chrY  56885850      +
##   -------
##   seqinfo: 44 sequences from hg38 genome; no seqlengths
## 
## $sce
## class: SingleCellExperiment 
## dim: 36601 500 
## metadata(2): Samples mkrs
## assays(2): counts logcounts
## rownames(36601): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
## rowData names(3): ID Symbol Type
## colnames(500): TGTTTGTCAGTTAGGG-1 ATCTCTACAAGCTACT-1 ...
##   GGGCGTTTCAGGACGA-1 CTATAGGAGATTGTGA-1
## colData names(8): Sample Barcode ... r celltype
## reducedDimNames(2): PCA UMAP
## mainExpName: NULL
## altExpNames(0):
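The cell-type annotations and UMAP projection stored in the SingleCellExperiment can be pulled out with the standard accessors. The following sketch uses a small simulated object rather than the downloaded `$sce`; the `celltype` column and the `UMAP` reduced dimension name match the printed output above, but the cell-type labels and counts are made up for illustration.

```r
library(SingleCellExperiment)

# Simulated stand-in for pbmc_10x()$sce: 4 genes by 10 cells.
set.seed(42)
counts <- matrix(
    rpois(40, lambda = 5),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10))
)
sce <- SingleCellExperiment(
    assays = list(counts = counts),
    # Hypothetical labels; the real object carries its own annotations.
    colData = DataFrame(celltype = rep(c("B cell", "T cell"), each = 5))
)
reducedDim(sce, "UMAP") <- matrix(rnorm(20), ncol = 2)

# Number of cells per annotated cell type.
cells_per_type <- table(sce$celltype)

# Combine the UMAP coordinates with the annotation for plotting.
umap <- reducedDim(sce, "UMAP")
umap_df <- data.frame(
    UMAP1 = umap[, 1],
    UMAP2 = umap[, 2],
    celltype = sce$celltype
)
```

A data frame in this shape can be passed to any plotting function to color the projection by cell type.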

0.5 ExperimentHub access

Alternatively, individual files can be accessed directly from ExperimentHub.

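library(ExperimentHub)

# Connect to ExperimentHub and find the resources associated with raerdata
eh <- ExperimentHub()
raerdata_files <- query(eh, "raerdata")

# Tabulate the ID, title, and description of each resource
data.frame(
    id = raerdata_files$ah_id,
    title = raerdata_files$title,
    description = raerdata_files$description
)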

Session info

sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ExperimentHub_2.12.0              AnnotationHub_3.12.0             
##  [3] BiocFileCache_2.12.0              dbplyr_2.5.0                     
##  [5] SingleCellExperiment_1.26.0       SummarizedExperiment_1.34.0      
##  [7] Biobase_2.64.0                    MatrixGenerics_1.16.0            
##  [9] matrixStats_1.3.0                 Rsamtools_2.20.0                 
## [11] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.72.0                  
## [13] BiocIO_1.14.0                     Biostrings_2.72.0                
## [15] XVector_0.44.0                    rtracklayer_1.64.0               
## [17] GenomicRanges_1.56.0              GenomeInfoDb_1.40.0              
## [19] IRanges_2.38.0                    S4Vectors_0.42.0                 
## [21] BiocGenerics_0.50.0               raerdata_1.2.0                   
## [23] BiocStyle_2.32.0                 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1         dplyr_1.1.4              blob_1.2.4              
##  [4] filelock_1.0.3           bitops_1.0-7             fastmap_1.1.1           
##  [7] RCurl_1.98-1.14          GenomicAlignments_1.40.0 XML_3.99-0.16.1         
## [10] digest_0.6.35            mime_0.12                lifecycle_1.0.4         
## [13] KEGGREST_1.44.0          RSQLite_2.3.6            magrittr_2.0.3          
## [16] compiler_4.4.0           rlang_1.1.3              sass_0.4.9              
## [19] tools_4.4.0              utf8_1.2.4               yaml_2.3.8              
## [22] knitr_1.46               S4Arrays_1.4.0           bit_4.0.5               
## [25] curl_5.2.1               DelayedArray_0.30.0      abind_1.4-5             
## [28] BiocParallel_1.38.0      withr_3.0.0              purrr_1.0.2             
## [31] grid_4.4.0               fansi_1.0.6              cli_3.6.2               
## [34] rmarkdown_2.26           crayon_1.5.2             generics_0.1.3          
## [37] httr_1.4.7               rjson_0.2.21             DBI_1.2.2               
## [40] cachem_1.0.8             zlibbioc_1.50.0          parallel_4.4.0          
## [43] AnnotationDbi_1.66.0     BiocManager_1.30.22      restfulr_0.0.15         
## [46] vctrs_0.6.5              Matrix_1.7-0             jsonlite_1.8.8          
## [49] bookdown_0.39            bit64_4.0.5              jquerylib_0.1.4         
## [52] glue_1.7.0               codetools_0.2-20         BiocVersion_3.19.1      
## [55] UCSC.utils_1.0.0         tibble_3.2.1             pillar_1.9.0            
## [58] rappdirs_0.3.3           htmltools_0.5.8.1        GenomeInfoDbData_1.2.12 
## [61] R6_2.5.1                 evaluate_0.23            lattice_0.22-6          
## [64] png_0.1-8                memoise_2.0.1            bslib_0.7.0             
## [67] SparseArray_1.4.0        xfun_0.43                pkgconfig_2.0.3

Chung, Hachung, Jorg J A Calis, Xianfang Wu, Tony Sun, Yingpu Yu, Stephanie L Sarbanes, Viet Loan Dao Thi, et al. 2018. “Human ADAR1 Prevents Endogenous RNA from Triggering Translational Shutdown.” Cell 172 (4): 811–824.e14. https://doi.org/10.1016/j.cell.2017.12.038.

Gabay, Orshay, Yoav Shoshan, Eli Kopel, Udi Ben-Zvi, Tomer D Mann, Noam Bressler, Roni Cohen-Fultheim, et al. 2022. “Landscape of Adenosine-to-Inosine RNA Recoding Across Human Tissues.” Nat. Commun. 13 (1): 1184. https://doi.org/10.1038/s41467-022-28841-4.

Picardi, Ernesto, Anna Maria D’Erchia, Claudio Lo Giudice, and Graziano Pesole. 2017. “REDIportal: A Comprehensive Database of A-to-I RNA Editing Events in Humans.” Nucleic Acids Res. 45 (D1): D750–D757. https://doi.org/10.1093/nar/gkw767.