hg19 to hg38 conversion


Jiang, Zhijie

May 20, 2014, 12:13:03 PM
to gen...@soe.ucsc.edu

Hi,

I saw you have the alignment file for liftOver to convert hg38 to hg19. I am wondering if you have the alignment file for liftOver to do the reverse, from hg19 to hg38.

 

Thanks,

Zhijie

 

Brian Lee

May 20, 2014, 1:46:06 PM
to Jiang, Zhijie, gen...@soe.ucsc.edu
Dear Zhijie,

Thank you for using the UCSC Genome Browser and for your question about liftOver files.

From the homepage, you can click the "Downloads" link in the left column, then navigate to each assembly and find the appropriate liftOver directory:


Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group




Jiang, Zhijie

May 20, 2014, 2:08:49 PM
to Brian Lee, Jiang, Zhijie, gen...@soe.ucsc.edu

Hi Brian,

 

Thank you so much. This is exactly what I need.

 

Best,

Zhijie

Jiang, Zhijie

May 23, 2014, 1:07:08 PM
to Brian Lee, gen...@soe.ucsc.edu
Hi Brian,

I tried to convert chr1:100-100000 from GRCh37 to GRCh38 using liftOver and got the following message, with no corresponding genomic coordinates on GRCh38:

#Partially deleted in new
chr1 99 100000

In contrast, NCBI Remap (http://www.ncbi.nlm.nih.gov/genome/tools/remap) gave me the corresponding GRCh38 coordinates, chr1:10001-100000.

So I am wondering whether liftOver, like NCBI Remap, can report the corresponding coordinates on the target assembly even when there are insertions or deletions in the alignment between the two assemblies.

Best,
Zhijie

Jonathan Casper

May 23, 2014, 3:37:09 PM
to Jiang, Zhijie, gen...@soe.ucsc.edu

Hello Zhijie,

Thank you for your question about lifting coordinates from the start of GRCh37 to GRCh38. One of our engineers offers the following explanation for the "partially deleted in new" message that you received.

The assembly sequence on chr1 doesn't start until position 10,000; the first 10,000 bases are a telomere gap. Our chain file has no mapping for those first 10,000 bases, so the map starts at 10,000. The complaint about converting chr1:100-100000 is therefore somewhat technically correct, since chr1:100-9999 has no mapping, but it is also a kind of error on our part. The Genome Browser's Convert function handles this case correctly; liftOver does not unless you reduce -minMatch.
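
The arithmetic behind that message can be sketched roughly as follows. This is a simplified model, not liftOver's actual implementation. It assumes the input is the BED interval chr1 99-100000 (0-based, half-open), that nothing below position 10,000 maps, and (an assumption not confirmed in this thread) that every base from 10,000 onward does map. The function name `mapped_fraction` is hypothetical.

```python
def mapped_fraction(start: int, end: int, map_start: int) -> float:
    """Fraction of the half-open interval [start, end) lying at or beyond map_start."""
    if end <= start:
        raise ValueError("interval must have positive length")
    # Bases below map_start fall in the gap and are counted as unmapped.
    mapped = max(0, end - max(start, map_start))
    return mapped / (end - start)


# BED line from the thread: chr1 99 100000; map assumed to start at 10,000.
fraction = mapped_fraction(99, 100000, 10000)
print(f"mapped fraction: {fraction:.4f}")
print("meets default minMatch of 0.95:", fraction >= 0.95)
print("meets minMatch of 0.8:", fraction >= 0.8)
```

Under these assumptions about 90% of the interval maps (90,000 of 99,901 bases), which falls below the 0.95 default but above 0.8.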

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group





Jiang, Zhijie

May 27, 2014, 11:08:57 AM
to Jonathan Casper, gen...@soe.ucsc.edu

Hi Jonathan,

 

Thank you for your reply. The minMatch option really helps. When I lowered it to 0.8, some previously unmapped regions could be mapped, but for other regions I had to lower minMatch to 0.5. So I am wondering whether such a low minMatch value could introduce errors in the conversion. For example, suppose I have several GRCh37 regions: some map to GRCh38 at minMatch 0.95, some at 0.8, and some only at 0.5. If I use a minMatch of 0.5 to convert all of them, does the lower minMatch introduce any errors into the conversions that don't need such a low value?
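
One way to keep track of which regions needed a relaxed threshold is to record, for each region, the strictest minMatch value it satisfies. The sketch below is only an illustration of that bookkeeping under a simplified model in which a region "maps" when its mapped fraction is at least the threshold. The region names and fractions are made-up values, not liftOver output, and `strictest_passing_threshold` is a hypothetical helper.

```python
from typing import Dict, Optional, Sequence


def strictest_passing_threshold(
    fraction: float, thresholds: Sequence[float] = (0.95, 0.8, 0.5)
) -> Optional[float]:
    """Return the highest threshold that the mapped fraction satisfies, or None."""
    for threshold in sorted(thresholds, reverse=True):
        if fraction >= threshold:
            return threshold
    return None


# Hypothetical mapped fractions, for illustration only.
regions: Dict[str, float] = {
    "regionA": 0.99,
    "regionB": 0.90,
    "regionC": 0.60,
    "regionD": 0.30,
}

tiers = {name: strictest_passing_threshold(f) for name, f in regions.items()}
for name, tier in tiers.items():
    print(name, "->", tier if tier is not None else "unmapped at all tiers")
```

Keeping the tier next to each converted region makes it easy to single out the ones that only passed at 0.5 for closer inspection.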

 

 

Thanks,

Zhijie

Brian Lee

May 27, 2014, 12:12:47 PM
to Jiang, Zhijie, Jonathan Casper, gen...@soe.ucsc.edu

Dear Zhijie,

Thank you for your question regarding adjusting the minMatch parameter when using liftOver. You may be interested in reviewing related previously answered mailing list questions in our archives: https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/minMatch$20liftOver

By definition, lowering minMatch reduces the minimum ratio of bases that must remap, but a low minMatch value may be of no consequence in a highly conserved area, such as around genes, where a high ratio of the bases will remap regardless. Our liftOver tool was originally designed for converting coordinates between assemblies of the same species, so the -minMatch default is set quite high (0.95), as the assembly versions should be quite similar overall. When mapping between assemblies that differ greatly, or in problematic regions, we adjust this value downward. For cross-species usage, for example, tuning this parameter is not particularly helpful, and we recommend using a low value such as 0.01. Please also note the earlier explanation: you received the mapping message because the map starts at 10,000.
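
For readers who script their conversions, a minimal sketch of assembling a liftOver command line with an explicit -minMatch value might look like this. It assumes the usual argument order (input file, chain file, output file, unmapped file) and the `-minMatch=` option form. The BED file names are hypothetical placeholders, the chain file name `hg19ToHg38.over.chain.gz` is assumed rather than taken from this thread, and the range check is only a sanity check added for the example.

```python
from typing import List


def build_liftover_command(
    bed_in: str,
    chain: str,
    bed_out: str,
    unmapped: str,
    min_match: float = 0.95,
) -> List[str]:
    """Build the argument list for a liftOver run with a given -minMatch."""
    # Sanity check added for this sketch; not a documented liftOver rule.
    if not 0.0 < min_match <= 1.0:
        raise ValueError("min_match must be in (0, 1]")
    return ["liftOver", f"-minMatch={min_match}", bed_in, chain, bed_out, unmapped]


cmd = build_liftover_command(
    "regions.hg19.bed",          # hypothetical input file
    "hg19ToHg38.over.chain.gz",  # assumed chain file name
    "regions.hg38.bed",          # hypothetical output file
    "regions.unmapped.bed",      # hypothetical file for unmapped records
    min_match=0.8,
)
print(" ".join(cmd))

# To actually run it (requires the liftOver binary on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log exactly which threshold was used for each conversion.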

I hope this is helpful.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group



Brian Lee

May 27, 2014, 1:11:25 PM
to Jiang, Zhijie, Jonathan Casper, gen...@soe.ucsc.edu
Dear Zhijie,

Rather than changing the liftOver parameters, it is best to examine the hg19 and hg38 'Diff' tracks on each genome browser to review why parts of the assemblies are different:


As with the telomere gap (visible in the gap track on hg19 at chr1:1-10,000, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=gap), if something will not convert, there is likely a good reason for it.


All the best,

Brian Lee
UCSC Genome Bioinformatics Group