liftOver issue: Conversion from Hg19 to Hg38 for chr22

Elizabeth Chun

Aug 6, 2021, 3:03:03 PM
to gen...@soe.ucsc.edu

Hello,


I am noticing some inconsistent results when I use the UCSC liftOver tool to convert genomic coordinates from hg19 to hg38, and vice versa.


For example, using a BED file containing genomic coordinates in hg19, which looks like this:

chr22    16050173    16050174    ACGT


I get the following genomic coordinates converted in hg38:

chr22    15927842    15927843    ACGT


When I check the actual sequences in the UCSC Genome Browser, the two sets of coordinates point to different sequences in hg19 and hg38.


For example, (*[A|T|G|C]* indicating the base at the genomic position)

the genome sequence around chr22:16050174 in hg19 is: TGCA*C*GTGG.

the genome sequence around chr22:15927843 in hg38 is: CCAC*G*TGCA.


These are completely different sequences. I have checked multiple coordinates on chr22, and this kind of inconsistency in the genome sequence at the converted position keeps coming up for positions on chr22, but not on other chromosomes.
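A quick sanity check when comparing sequence context across assemblies is to test both strands, since an alignment between assemblies can be on the opposite strand. The sketch below is only an illustration: the helper names `reverse_complement` and `compare_strands` are made up for this example, and the two 9-mers are copied from the post above. For these two strings the check reports a reverse-complement match, which would be consistent with an inverted alignment at this position; the thread itself does not confirm whether that explains the observation.

```python
# Translation table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def compare_strands(a: str, b: str) -> str:
    """Classify two sequences as identical, reverse complement, or different."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "identical"
    if reverse_complement(a) == b:
        return "reverse complement"
    return "different"


# Sequence context quoted in the post (asterisks removed).
hg19_seq = "TGCACGTGG"  # around chr22:16050174 in hg19
hg38_seq = "CCACGTGCA"  # around chr22:15927843 in hg38

print(compare_strands(hg19_seq, hg38_seq))
```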


When I took the converted chr22 coordinates in hg38 and lifted them back to hg19, the result pointed to a region on a different chromosome in hg19 with an identical sequence.


So to summarize:

Hg19    chr22    16050173    16050174     gets lifted over to

Hg38    chr22    15927842    15927843    (different genome sequence between the two genome versions at the converted coordinate)


Hg38    chr22    15927842    15927843    gets lifted over to

Hg19    chr14    19792826    19792827    (identical genome sequence between the two genome versions at the converted coordinate)


Hg19    chr14    19792826    19792827    gets lifted over to

Hg38    chr14    19194878    19194879    (different genome sequence between the two genome versions at the converted coordinate)
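The round trip summarized above can be checked mechanically once both liftOver runs have been done. The following is a minimal sketch, not part of liftOver itself: the two dictionaries simply hard-code the results reported in this post, and `round_trip_mismatches` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style coordinates

# liftOver results as reported in the post; they are not recomputed here.
hg19_to_hg38: Dict[Interval, Interval] = {
    ("chr22", 16050173, 16050174): ("chr22", 15927842, 15927843),
    ("chr14", 19792826, 19792827): ("chr14", 19194878, 19194879),
}
hg38_to_hg19: Dict[Interval, Interval] = {
    ("chr22", 15927842, 15927843): ("chr14", 19792826, 19792827),
}


def round_trip_mismatches(
    forward: Dict[Interval, Interval],
    backward: Dict[Interval, Interval],
) -> List[Tuple[Interval, Interval, Interval]]:
    """Return (source, lifted, returned) for intervals that do not lift back to themselves.

    Intervals whose lifted position has no entry in `backward` are skipped.
    """
    mismatches = []
    for src, lifted in forward.items():
        back = backward.get(lifted)
        if back is not None and back != src:
            mismatches.append((src, lifted, back))
    return mismatches


for src, lifted, back in round_trip_mismatches(hg19_to_hg38, hg38_to_hg19):
    print(src, "->", lifted, "->", back)
```

With a full BED file, the same comparison could be run over every interval to list the positions that do not survive a round trip.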



I am aware of chr22 in hg38 being more similar to hg19 chr14 based on sequence alignments, and understand that some segments on chr22 get lifted over to chr14 in hg19. 


However, the inconsistency in this conversion from hg19 to hg38 is concerning. chr22:15927843 in hg38 should correspond to chr14:19792827 in hg19 regardless of whether the conversion is from hg19 to hg38 or from hg38 to hg19. Is there something I am missing that explains this inconsistency, or is this a bug?

Thank you for your help,

Elizabeth



——————————————

Hye-Jung Elizabeth Chun, PhD.

Post-doctoral research fellow, Marra Laboratory
University of British Columbia
Canada's Michael Smith Genome Sciences Centre
BC Cancer

I respectfully acknowledge that my place of work is within the unceded land of the Coast Salish peoples, including the Sḵwx̱wú7mesh (Squamish), xʷməθkʷəy̓əm (Musqueam), Sel̓íl̓witulh (Tsleil-Waututh), Stó:lō and Stz’uminus Nations.

Matthew Speir

Aug 12, 2021, 8:12:17 PM
to Elizabeth Chun, gen...@soe.ucsc.edu
Hello, Elizabeth.

Thank you for your question about UCSC LiftOver.

You have found one of the cases where our LiftOver chains get the answer wrong about how a region maps between two different assemblies. This usually happens when the contig used to build the assembly has been partially or even fully replaced in the new assembly. That appears to be the case here, as you can see from the brassy/gold line in the "Contigs Dropped or Changed from GRCh37(hg19) to GRCh38(hg38)" track in this session: http://genome.ucsc.edu/s/mspeir/hg19_chr22_16050174. In addition, this position falls within a segmental duplication that has greater than 99% similarity to another region in the genome. I'm guessing that other region is the one our LiftOver tool matches this position to when lifting from hg38 back to hg19.

A little bit more detail from one of our engineers about how we create our LiftOver chain files:

Our liftOver chains are taken from the nets, and the nets are single-coverage on the target genome (currently the "from" genome, although there are cases in which it might make sense to go the other way).
 
When a region of hg19 maps well to two different regions of hg38, only one of those regions is kept in the net & liftOver chains, and conversely, when a region of hg38 maps well to two different regions of hg19, only one can be kept. So we expect there to be some regions that don't map symmetrically because the single-coverage restriction means they can't.
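The effect of the single-coverage restriction can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual chain/net pipeline: the region names (`hg19:A`, `hg38:X`, and so on) and the alignment scores are invented, and `single_coverage` is a hypothetical helper that keeps the best-scoring alignment per region on one side.

```python
from typing import Dict, List, Tuple

Alignment = Tuple[str, str, int]  # (hg19_region, hg38_region, score)

# Hypothetical regions and scores, invented purely for illustration.
# hg38:X aligns well to two hg19 regions, and hg19:B to two hg38 regions.
alignments: List[Alignment] = [
    ("hg19:A", "hg38:X", 90),
    ("hg19:B", "hg38:X", 95),
    ("hg19:B", "hg38:Y", 99),
]


def single_coverage(alns: List[Alignment], key_index: int) -> Dict[str, str]:
    """Keep only the best-scoring alignment for each region on one side.

    key_index=0 enforces single coverage on hg19 (an hg19 -> hg38 map);
    key_index=1 enforces single coverage on hg38 (an hg38 -> hg19 map).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for aln in alns:
        key, other, score = aln[key_index], aln[1 - key_index], aln[2]
        if key not in best or score > best[key][1]:
            best[key] = (other, score)
    return {key: other for key, (other, _score) in best.items()}


forward = single_coverage(alignments, 0)   # toy hg19 -> hg38 map
backward = single_coverage(alignments, 1)  # toy hg38 -> hg19 map

start = "hg19:A"
lifted = forward[start]
returned = backward[lifted]
print(start, "->", lifted, "->", returned, "->", forward[returned])
```

In this toy setup `hg19:A` lifts to `hg38:X`, but `hg38:X` lifts back to `hg19:B` because that alignment scores higher on the hg38 side, and `hg19:B` in turn lifts to `hg38:Y`. That is the same shape as the chr22 to chr22 to chr14 to chr14 chain reported in the question.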

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Training videos & resources: http://genome.ucsc.edu/training/index.html

Want to share the Browser with colleagues? Host a workshop: http://bit.ly/ucscTraining

---

Matthew Speir

UCSC Cell Browser, Quality Assurance and Data Wrangler

Human Cell Atlas, User Experience Researcher

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.


