Hello,
I am noticing some inconsistent results when I use the UCSC liftOver tool to convert genomic coordinates from hg19 to hg38, and vice versa.
For example, using a BED file containing genomic coordinates in hg19, which looks like this:
chr22 16050173 16050174 ACGT
I get the following genomic coordinates converted in hg38:
chr22 15927842 15927843 ACGT
When I check the actual sequences via UCSC genome browser, the two sets of coordinates point to different sequences in hg19 and hg38.
For example (asterisks mark the base at the genomic position):
The genome sequence around chr22:16050174 in hg19 is: TGCA*C*GTGG.
The genome sequence around chr22:15927843 in hg38 is: CCAC*G*TGCA.
These are completely different sequences. I have checked multiple coordinates on chr22, and this kind of inconsistency in the genome sequence at the converted positions keeps coming up for positions on chr22, but not on other chromosomes.
When I took the converted chr22 coordinates in hg38 and lifted them back to hg19, the result pointed to a region on a different chromosome in hg19 with the identical sequence.
So to summarize:
hg19 chr22 16050173 16050174 gets lifted over to
hg38 chr22 15927842 15927843 (different genome sequence between the two genome versions at the converted coordinate)
hg38 chr22 15927842 15927843 gets lifted over to
hg19 chr14 19792826 19792827 (identical genome sequence between the two genome versions at the converted coordinate)
hg19 chr14 19792826 19792827 gets lifted over to
hg38 chr14 19194878 19194879 (different genome sequence between the two genome versions at the converted coordinate)
I am aware that chr22 in hg38 is more similar to chr14 in hg19 based on sequence alignments, and I understand that some segments on chr22 get lifted over to chr14 in hg19.
However, the inconsistency in the conversion from hg19 to hg38 is concerning. chr22:15927843 in hg38 should correspond to chr14:19792827 in hg19 regardless of whether the conversion runs from hg19 to hg38 or from hg38 to hg19. Is there something I am missing that explains this inconsistency, or is this a bug?
Thank you for your help,
Elizabeth
——————————————
Our liftOver chains are taken from the nets, and the nets are single-coverage on the target genome (currently the "from" genome, although there are cases in which it might make sense to go the other way).
When a region of hg19 maps well to two different regions of hg38, only one of those regions is kept in the net and liftOver chains. Conversely, when a region of hg38 maps well to two different regions of hg19, only one can be kept. So we expect some regions not to map symmetrically, because the single-coverage restriction means they can't.
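
As a rough illustration of why single coverage leads to asymmetric round trips, here is a minimal Python sketch. It is not how liftOver is implemented: real chains map intervals, while this toy uses plain point-to-point dictionaries filled with the coordinates reported above. The names HG19_TO_HG38, HG38_TO_HG19, lift and round_trip are made up for this example.

```python
# Toy model only: each dictionary plays the role of one liftOver chain file.
# "Single coverage" is modeled by the fact that a dictionary key (a position
# in the "from" genome) can map to exactly one position in the "to" genome.
# The coordinates are the ones reported in this thread (BED-style starts).

HG19_TO_HG38 = {
    ("chr22", 16050173): ("chr22", 15927842),
    ("chr14", 19792826): ("chr14", 19194878),
}

HG38_TO_HG19 = {
    # hg38 chr22:15927842 may align well to both hg19 chr22 and hg19 chr14,
    # but only one target survives in the hg38-to-hg19 chain.
    ("chr22", 15927842): ("chr14", 19792826),
}


def lift(position, chain):
    """Return the mapped position, or None if the position is unmapped."""
    return chain.get(position)


def round_trip(position, forward, backward):
    """Lift a position forward, then lift the result back again."""
    lifted = lift(position, forward)
    if lifted is None:
        return None, None
    return lifted, lift(lifted, backward)


start = ("chr22", 16050173)
lifted, back = round_trip(start, HG19_TO_HG38, HG38_TO_HG19)
print("hg19 start:  ", start)
print("hg38 lifted: ", lifted)
print("hg19 back:   ", back)
print("symmetric:   ", back == start)
```

Because each direction keeps its own single best target, the two dictionaries are not inverses of each other, so a round trip can land somewhere other than where it started.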
Matthew Speir
UCSC Cell Browser, Quality Assurance and Data Wrangler
Human Cell Atlas, User Experience Researcher
UCSC Genome Browser, User Support
UC Santa Cruz Genomics Institute
Revealing life’s code.