Issue with liftOver [hg19ToHg38.over.chain.gz]


Zhu, Helen

Dec 2, 2021, 7:57:48 PM
to gen...@soe.ucsc.edu
Hello!

I noticed an issue when lifting over SNPs from hg19 to hg38. I was using Picard LiftoverVcf with hg19ToHg38.over.chain.gz, but the error is also reproducible with the UCSC LiftOver web tool. The two SNPs are:
    1. https://www.ncbi.nlm.nih.gov/snp/?term=rs61776849
       rs61776849 (this is the correct label for that risk SNP)
         1:1714797 (GRCh38)
         1:1646236 (GRCh37)
    2. https://www.ncbi.nlm.nih.gov/snp/?term=rs72468214
       rs72468214 is supposed to be:
         1:1647642 (GRCh38)
         1:1583003 (GRCh37)
After the liftover, both SNPs ended up at the same hg38 position (the REF/ALT match is coincidental; the mapping stats are very different):

#CHROM POS ID REF ALT 
chr1 1714797 rs72468214 C T
chr1 1714797 rs61776849 C T  
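
A collision like the one above can be caught automatically after a liftover. The following is a minimal sketch, not part of Picard or the UCSC tools: the function name find_position_collisions is hypothetical, and the input is just the three lines shown above rather than a full VCF.

```python
from collections import defaultdict


def find_position_collisions(vcf_lines):
    """Return {(chrom, pos): [ids]} for positions shared by more than one ID."""
    ids_by_position = defaultdict(set)
    for line in vcf_lines:
        # Skip blank lines and header lines (those starting with '#').
        if not line.strip() or line.startswith("#"):
            continue
        # The first three VCF columns are CHROM, POS and ID.
        chrom, pos, variant_id = line.split()[:3]
        ids_by_position[(chrom, int(pos))].add(variant_id)
    return {
        position: sorted(ids)
        for position, ids in ids_by_position.items()
        if len(ids) > 1
    }


# The lifted records quoted in this message.
lifted = [
    "#CHROM POS ID REF ALT",
    "chr1 1714797 rs72468214 C T",
    "chr1 1714797 rs61776849 C T",
]

print(find_position_collisions(lifted))
# {('chr1', 1714797): ['rs61776849', 'rs72468214']}
```

Any position reported here is worth checking against dbSNP before the lifted file is used further.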

I tested this result on the LiftOver web server by lifting over the following positions from hg19 to hg38:
chr1:1583003-1583003
chr1:1646236-1646236

and both mapped to chr1:1714797-1714797.

Then I tested the reverse on the LiftOver web server by lifting over from hg38 to hg19:
chr1:1714797-1714797

and it returned only the coordinate that is correct for rs61776849 according to dbSNP:
chr1:1646236-1646236
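
The forward and reverse lifts above amount to a round-trip check. A minimal sketch of that check follows; the coordinates are hard-coded from the results in this message rather than computed by liftOver, and round_trip_failures is a hypothetical helper name.

```python
# hg19 -> hg38 results reported above (hard-coded, not computed here).
forward = {
    ("chr1", 1583003): ("chr1", 1714797),
    ("chr1", 1646236): ("chr1", 1714797),
}
# hg38 -> hg19 result reported above.
reverse = {
    ("chr1", 1714797): ("chr1", 1646236),
}


def round_trip_failures(forward, reverse):
    """Return source positions that do not map back to themselves."""
    failures = []
    for source, target in forward.items():
        # A position passes only if the reverse lift of its target is itself.
        if reverse.get(target) != source:
            failures.append(source)
    return sorted(failures)


print(round_trip_failures(forward, reverse))
# [('chr1', 1583003)]
```

Only the hg19 position of rs72468214 fails the round trip, which matches what the web server showed.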

Best wishes,
Helen

Luis Nassar

Dec 8, 2021, 4:21:15 PM
to Zhu, Helen, gen...@soe.ucsc.edu
Hello, Helen.

Thank you for your interest in the Genome Browser.

When lifting SNPs with rsIDs between human assemblies, it is highly recommended that you look up the updated coordinates in the target assembly's dbSNP table instead. In essence, you can use the list of rsIDs to query the updated coordinates from the source data. Looking up both of these SNPs this way, in either hg19 or hg38, results in the correct coordinates.

Take a look at the following archive question for more details on how to accomplish this using the Table Browser: https://groups.google.com/a/soe.ucsc.edu/g/genome/c/SM0Ae7NMf4k/m/oh4_M8DeAgAJ
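
Once a dbSNP table has been exported, matching a list of rsIDs against it is straightforward. The sketch below assumes a tab-separated, BED-style export with the columns chrom, chromStart, chromEnd and name; that layout, the function name lookup_rsids and the inline example rows are assumptions for illustration. The two rows are the GRCh38 positions quoted earlier in this thread, written with a 0-based start.

```python
import csv
import io


def lookup_rsids(bed_text, rsids):
    """Map each requested rsID to (chrom, 1-based position).

    Assumes single-base variants in BED-style rows, where chromEnd
    equals the 1-based position of the variant.
    """
    wanted = set(rsids)
    found = {}
    for row in csv.reader(io.StringIO(bed_text), delimiter="\t"):
        # Skip short rows and comment/header rows.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        chrom, _start, end, name = row[:4]
        if name in wanted:
            found[name] = (chrom, int(end))
    return found


# Example rows built from the GRCh38 positions quoted in this thread.
export = (
    "#chrom\tchromStart\tchromEnd\tname\n"
    "chr1\t1647641\t1647642\trs72468214\n"
    "chr1\t1714796\t1714797\trs61776849\n"
)

print(lookup_rsids(export, ["rs61776849", "rs72468214"]))
# {'rs72468214': ('chr1', 1647642), 'rs61776849': ('chr1', 1714797)}
```

With this approach each rsID keeps its own coordinate in the target assembly, so the two SNPs no longer collide.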

Now for the longer answer: one of these SNPs is located in a GRC incident region (https://www.ncbi.nlm.nih.gov/grc/human/issues/HG-172). It appears that clone AL691432.54 had some missing sequence, and in hg38 some novel sequence was added with FO704657. rs72468214 falls in the middle of this new sequence.

This can be seen by comparing the following two sessions:

hg19: http://genome.ucsc.edu/s/Lou/hg19RM28582
vs
hg38: http://genome.ucsc.edu/s/Lou/hg38RM28582

liftOver actually matches both locations, but it seems that the non-novel sequence was ultimately higher scoring (http://genome.ucsc.edu/s/Lou/hg38selfChain), and that match happens to land where the other rsID is.

It is for this reason that looking up the new coordinates directly from dbSNP is best. LiftOver is generally accurate; however, when you lift very small regions (especially SNPs) in areas that have new sequence, that accuracy can drop.

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute
