About This Document »

Package name liftOver
Built with Bioconductor (R) 3.5 (3.4.0)
Last Built Mon, 10 Apr 2017 15:55:35 -0700
Last Modified Wed, 14 Sep 2016 01:39:53 -0700 (r120929)
Source Package liftOver_0.99.120929.tar.gz
Windows Binary liftOver_0.99.120929.zip
Mac OS X 10.10 (Yosemite) liftOver_0.99.120929.tgz
R Script liftOver.R

To install this workflow under Bioconductor 3.4, start R and enter:
source("http://bioconductor.org/workflows.R")
workflowInstall("liftOver")

Changing genomic coordinate systems with rtracklayer::liftOver

Contents

The liftOver facilities developed in conjunction with the UCSC browser track infrastructure are available for transforming data in GRanges formats. This is illustrated here with an image of the NHGRI GWAS catalog that is, as of Oct. 31 2014, distributed with coordinates defined by NCBI build hg38.

Setup: The NHGRI GWAS catalog as an hg38-based GRanges

## Loading required package: Homo.sapiens

## Loading required package: AnnotationDbi

## Loading required package: stats4

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, cbind, colMeans,
##     colnames, colSums, do.call, duplicated, eval, evalq, Filter,
##     Find, get, grep, grepl, intersect, is.unsorted, lapply,
##     lengths, Map, mapply, match, mget, order, paste, pmax,
##     pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
##     rowMeans, rownames, rowSums, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which, which.max, which.min

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## Loading required package: IRanges

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:base':
## 
##     expand.grid

## Loading required package: OrganismDbi

## Loading required package: GenomicFeatures

## Loading required package: GenomeInfoDb

## Loading required package: GenomicRanges

## Loading required package: GO.db

## 

## Loading required package: org.Hs.eg.db

## 

## Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene

## gwascat loaded.  Use data(ebicat38) for hg38 coordinates;

##  data(ebicat37) for hg19 coordinates.

library(gwascat)
cur = makeCurrentGwascat()  # result varies by day

cur

## gwasloc instance with 17865 records and 35 attributes per record.
## Extracted:  2014-10-31 
## Genome:  GRCh38 
## Excerpt:
## GRanges object with 5 ranges and 35 metadata columns:
##       seqnames               ranges strand | Date.Added.to.Catalog
##          <Rle>            <IRanges>  <Rle> |           <character>
##   [1]       17 [79831041, 79831041]      * |            10/22/2014
##   [2]        5 [31766326, 31766326]      * |            10/22/2014
##   [3]       11 [13107616, 13107616]      * |            10/22/2014
##   [4]       10 [94922089, 94922089]      * |            10/22/2014
##   [5]       10 [94922089, 94922089]      * |            10/22/2014
##        PUBMEDID First.Author        Date             Journal
##       <integer>  <character> <character>         <character>
##   [1]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [2]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [3]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [4]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [5]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##                                              Link
##                                       <character>
##   [1] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [2] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [3] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [4] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [5] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##                                                                                                  Study
##                                                                                            <character>
##   [1] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [2] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [3] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [4] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [5] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##                                                                                                    Disease.Trait
##                                                                                                      <character>
##   [1] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [2] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [3] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [4] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [5] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##                                                                                        Initial.Sample.Description
##                                                                                                       <character>
##   [1] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [2] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [3] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [4] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [5] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##       Replication.Sample.Description      Region      Chr_id   Chr_pos
##                          <character> <character> <character> <numeric>
##   [1]                           <NA>     17q25.3          17  79831041
##   [2]                           <NA>      5p13.3           5  31766326
##   [3]                           <NA>     11p15.2          11  13107616
##   [4]                           <NA>    10q23.33          10  94922089
##   [5]                           <NA>    10q23.33          10  94922089
##       Reported.Gene.s.        Mapped_gene Upstream_gene_id
##            <character>        <character>      <character>
##   [1]             CBX4        CBX8 - CBX4            57332
##   [2]            PDZD2              PDZD2             <NA>
##   [3]     CTC-497E21.5    RASSF10 - ARNTL           644943
##   [4]   CYP2C19,CYP2C9 CYP2C115P - CYP2C9        100874513
##   [5]   CYP2C19,CYP2C9 CYP2C115P - CYP2C9        100874513
##       Downstream_gene_id Snp_gene_ids Upstream_gene_distance
##              <character>  <character>            <character>
##   [1]               8535                               33.93
##   [2]               <NA>        23037                   <NA>
##   [3]                406                               95.51
##   [4]               1559                               15.96
##   [5]               1559                               15.96
##       Downstream_gene_distance Strongest.SNP.Risk.Allele        SNPs
##                    <character>               <character> <character>
##   [1]                     2.12               rs9747992-?   rs9747992
##   [2]                     <NA>               rs2059865-?   rs2059865
##   [3]                   170.12             rs117020818-? rs117020818
##   [4]                     16.5               rs1074145-?   rs1074145
##   [5]                     16.5               rs1074145-?   rs1074145
##            Merged Snp_id_current     Context  Intergenic
##       <character>    <character> <character> <character>
##   [1]           0        9747992  Intergenic           1
##   [2]           0        2059865      intron           0
##   [3]           0      117020818  Intergenic           1
##   [4]           0        1074145  Intergenic           1
##   [5]           0        1074145  Intergenic           1
##       Risk.Allele.Frequency   p.Value Pvalue_mlog        p.Value..text.
##                 <character> <numeric>   <numeric>           <character>
##   [1]                 0.086     2e-07    6.698970 (S-DCT concentration)
##   [2]                 0.235     3e-07    6.522879 (S-DCT concentration)
##   [3]                 0.046     7e-07    6.154902 (S-DCT concentration)
##   [4]                 0.162     4e-09    8.397940  (S-CT concentration)
##   [5]                 0.162     2e-16   15.698970    (S-DCT/S-CT ratio)
##       OR.or.beta X95..CI..text.     Platform..SNPs.passing.QC.         CNV
##        <numeric>    <character>                    <character> <character>
##   [1]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [2]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [3]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [4]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [5]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##       num.Risk.Allele.Frequency
##                       <numeric>
##   [1]                     0.086
##   [2]                     0.235
##   [3]                     0.046
##   [4]                     0.162
##   [5]                     0.162
##   -------
##   seqinfo: 23 sequences from GRCh38 genome

Resource: The chain file for hg38 to hg19 transformation

The transformation to hg19 coordinates is defined by a chain file provided by UCSC. rtracklayer::import.chain will bring the data into R.

library(rtracklayer)
ch = import.chain("hg38ToHg19.over.chain")
ch

## Chain of length 25
## names(25): chr22 chr21 chr19 chr20 chrY chr18 ... chr5 chr4 chr3 chr2 chr1

str(ch[[1]])

## Formal class 'ChainBlock' [package "rtracklayer"] with 6 slots
##   ..@ ranges  :Formal class 'IRanges' [package "IRanges"] with 6 slots
##   .. .. ..@ start          : int [1:6842] 16367189 16386933 16386970 16387001 16387128 16395491 16395528 16395841 16395860 16395956 ...
##   .. .. ..@ width          : int [1:6842] 19744 36 31 112 8362 36 312 18 95 33 ...
##   .. .. ..@ NAMES          : NULL
##   .. .. ..@ elementType    : chr "integer"
##   .. .. ..@ elementMetadata: NULL
##   .. .. ..@ metadata       : list()
##   ..@ offset  : int [1:6842] -480662 -480702 -480702 -480726 -480726 -480726 -480726 -480726 -480726 -480726 ...
##   ..@ score   : int [1:1168] -1063867308 68830488 21156147 20814926 7358950 3927744 2928210 991419 880681 802146 ...
##   ..@ space   : chr [1:1168] "chr22" "chr14" "chr22" "chr21" ...
##   ..@ reversed: logi [1:1168] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..@ length  : int [1:1168] 1124 1280 173 465 398 110 43 173 342 84 ...

Some more details about the chain data structure are available in the import.chain man page

   A chain file essentially details many local alignments, so it is
   possible for the "from" ranges to map to overlapping regions in
   the other sequence. The "from" ranges are guaranteed to be
   disjoint (but do not necessarily cover the entire "from"
   sequence).

Action: liftOver

The liftOver function will create a GRangesList.

seqlevelsStyle(cur) = "UCSC"  # necessary
cur19 = liftOver(cur, ch)

## Warning in defined[[i]]: closing unused connection 5
## (hg38ToHg19.over.chain)

class(cur19)

## [1] "GRangesList"
## attr(,"package")
## [1] "GenomicRanges"

We unlist and coerce to the gwaswloc class, a convenient form for the GWAS catalog with its many mcols fields.

cur19 = unlist(cur19)
genome(cur19) = "hg19"
cur19 = new("gwaswloc", cur19)
cur19

## gwasloc instance with 17845 records and 35 attributes per record.
## Extracted:   
## Genome:  hg19 
## Excerpt:
## GRanges object with 5 ranges and 35 metadata columns:
##       seqnames               ranges strand | Date.Added.to.Catalog
##          <Rle>            <IRanges>  <Rle> |           <character>
##   [1]    chr17 [77804840, 77804840]      * |            10/22/2014
##   [2]     chr5 [31766433, 31766433]      * |            10/22/2014
##   [3]    chr11 [13129163, 13129163]      * |            10/22/2014
##   [4]    chr10 [96681846, 96681846]      * |            10/22/2014
##   [5]    chr10 [96681846, 96681846]      * |            10/22/2014
##        PUBMEDID First.Author        Date             Journal
##       <integer>  <character> <character>         <character>
##   [1]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [2]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [3]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [4]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##   [5]  24528284         Ji Y  08/01/2014 Br J Clin Pharmacol
##                                              Link
##                                       <character>
##   [1] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [2] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [3] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [4] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##   [5] http://www.ncbi.nlm.nih.gov/pubmed/24528284
##                                                                                                  Study
##                                                                                            <character>
##   [1] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [2] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [3] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [4] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##   [5] Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
##                                                                                                    Disease.Trait
##                                                                                                      <character>
##   [1] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [2] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [3] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [4] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##   [5] Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)
##                                                                                        Initial.Sample.Description
##                                                                                                       <character>
##   [1] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [2] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [3] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [4] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##   [5] 300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals
##       Replication.Sample.Description      Region      Chr_id   Chr_pos
##                          <character> <character> <character> <numeric>
##   [1]                           <NA>     17q25.3          17  79831041
##   [2]                           <NA>      5p13.3           5  31766326
##   [3]                           <NA>     11p15.2          11  13107616
##   [4]                           <NA>    10q23.33          10  94922089
##   [5]                           <NA>    10q23.33          10  94922089
##       Reported.Gene.s.        Mapped_gene Upstream_gene_id
##            <character>        <character>      <character>
##   [1]             CBX4        CBX8 - CBX4            57332
##   [2]            PDZD2              PDZD2             <NA>
##   [3]     CTC-497E21.5    RASSF10 - ARNTL           644943
##   [4]   CYP2C19,CYP2C9 CYP2C115P - CYP2C9        100874513
##   [5]   CYP2C19,CYP2C9 CYP2C115P - CYP2C9        100874513
##       Downstream_gene_id Snp_gene_ids Upstream_gene_distance
##              <character>  <character>            <character>
##   [1]               8535                               33.93
##   [2]               <NA>        23037                   <NA>
##   [3]                406                               95.51
##   [4]               1559                               15.96
##   [5]               1559                               15.96
##       Downstream_gene_distance Strongest.SNP.Risk.Allele        SNPs
##                    <character>               <character> <character>
##   [1]                     2.12               rs9747992-?   rs9747992
##   [2]                     <NA>               rs2059865-?   rs2059865
##   [3]                   170.12             rs117020818-? rs117020818
##   [4]                     16.5               rs1074145-?   rs1074145
##   [5]                     16.5               rs1074145-?   rs1074145
##            Merged Snp_id_current     Context  Intergenic
##       <character>    <character> <character> <character>
##   [1]           0        9747992  Intergenic           1
##   [2]           0        2059865      intron           0
##   [3]           0      117020818  Intergenic           1
##   [4]           0        1074145  Intergenic           1
##   [5]           0        1074145  Intergenic           1
##       Risk.Allele.Frequency   p.Value Pvalue_mlog        p.Value..text.
##                 <character> <numeric>   <numeric>           <character>
##   [1]                 0.086     2e-07    6.698970 (S-DCT concentration)
##   [2]                 0.235     3e-07    6.522879 (S-DCT concentration)
##   [3]                 0.046     7e-07    6.154902 (S-DCT concentration)
##   [4]                 0.162     4e-09    8.397940  (S-CT concentration)
##   [5]                 0.162     2e-16   15.698970    (S-DCT/S-CT ratio)
##       OR.or.beta X95..CI..text.     Platform..SNPs.passing.QC.         CNV
##        <numeric>    <character>                    <character> <character>
##   [1]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [2]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [3]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [4]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##   [5]       <NA>             NR Illumina [7,537,437] (Imputed)           N
##       num.Risk.Allele.Frequency
##                       <numeric>
##   [1]                     0.086
##   [2]                     0.235
##   [3]                     0.046
##   [4]                     0.162
##   [5]                     0.162
##   -------
##   seqinfo: 23 sequences from hg19 genome; no seqlengths

We see that the translation leads to a loss of some loci.

length(cur)-length(cur19)

## [1] 20

setdiff(cur$SNPs, cur19$SNPs)

##  [1] "rs687289"   "rs386000"   "rs718433"   "rs4911642"  "rs687621"  
##  [6] "rs757210"   "rs11672691" "rs644234"   "rs514659"   "rs9876781" 
## [11] "rs649129"   "rs1167796"  "rs644148"

It may be interesting to follow up some of the losses.