Liftover hg19 to hg38


Anna Lioznova

May 30, 2017, 10:41:05 AM
to gen...@soe.ucsc.edu, Yulia Medvedeva
Dear all,

I ran into a problem using LiftOver to convert coordinates from hg19 to hg38.

I have a short genomic interval from the hg19 build, namely:
chr1    145176339    145176456

The sequence here is
tgtggcaggcactgtgttgacatatccagtataGGAGTGCCCTGGGAGCCCATCTCTCATTTCTGAAAGA
GATAGCATTGTAGATCTGGACGTTTCATCACATATTCCCAGGAAAGC

If you BLAST this sequence against hg38, you get a single 100% exact match (of the same length, of course):
chr1 148712445 148712561

Unfortunately, when I run LiftOver from hg19 to hg38, I get an interval of length 2450870:
chr1    146261691    148712561

As you can see, the right coordinate of the locus is exactly the same as the coordinate reported by BLAST, but the left coordinate is far away.
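
For readers who want to check the numbers, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the hg19 interval and the LiftOver output are BED-style (0-based, half-open) and that the BLAST hit is 1-based and inclusive, which is consistent with the 117-base sequence quoted above. The variable names are made up for this sketch.

```python
# Coordinates copied from the message above.
hg19_start, hg19_end = 145176339, 145176456    # assumed BED-style, half-open
blast_start, blast_end = 148712445, 148712561  # assumed 1-based, inclusive
lift_start, lift_end = 146261691, 148712561    # LiftOver output, assumed half-open

hg19_len = hg19_end - hg19_start           # half-open: end - start
blast_len = blast_end - blast_start + 1    # inclusive: end - start + 1
lift_len = lift_end - lift_start

print("hg19 interval length:", hg19_len)
print("BLAST hit length:", blast_len)
print("LiftOver interval length:", lift_len)

# The right ends agree; the left end of the LiftOver interval sits far upstream.
print("right ends agree:", lift_end == blast_end)
print("left end offset:", blast_start - lift_start)
```

Under these assumptions the source interval and the BLAST hit have the same length, while the LiftOver interval is several orders of magnitude longer.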

What's the problem with this locus?

Best regards,
Anna

Bert Gold

May 30, 2017, 1:19:35 PM
to Anna Lioznova, gen...@soe.ucsc.edu, Yulia Medvedeva
Anna,

You can resolve this kind of thing by BLASTing against HTGS and the Trace Archive. There are whole-genome sequencing contigs of 1000 base pairs or so that perfectly match this sequence. You build out from those and then BLAT. This is the work of genomics. You have to decide that you are interested in a particular sequence for some good reason before trying to fit it in. The human genome is not fully sequenced and understood. Mostly, but not fully.

Best,

Bert Gold


Bert Gold, MB(ASCP)CM, Ph.D., FACMGG, CGMBS  
Senior Clinical Molecular Geneticist
Natera, Inc.
201 Industrial Rd
San Carlos, California

Anna Lioznova

May 31, 2017, 11:01:40 AM
to Bert Gold, gen...@soe.ucsc.edu, Yulia Medvedeva
Dear Bert,

Thank you for your reply and for the idea.

We managed to find the correct locus in this particular case, but we are worried about LiftOver output in general: it is used by many researchers, and cases like this can affect their results.

Thanks a lot,
Anna Lioznova

Matthew Speir

Jun 7, 2017, 12:27:09 PM
to Anna Lioznova, gen...@soe.ucsc.edu, Yulia Medvedeva
Hi Anna,

Thank you for your question about lifting over regions between the hg19 and hg38 genomes.

LiftOver isn't perfect and if the results don't seem correct to you, you can and should always use BLAT to realign the sequence for that region and see if it matches up with the LiftOver results. Additionally, you can use the "Hg38 Diff" track for the hg19 assembly or the "Hg19 Diff" track for the hg38 assembly to see if the contigs used to assemble a particular region have changed between assemblies. In the following session, you can see the "Hg38 Diff" track (in red) alongside a custom track of BLAT results for your sequence: https://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=mspeir&hgS_otherUserSessionName=hg19_blatLiftRegion. For the "Hg38 Diff" track, red indicates that this is an "hg19 contig dropped in the construction of the hg38 assembly" and that converting the coordinates from hg19 --> hg38 may present some difficulties.

However, the excellent example that you provided may indicate some underlying issues with the programs we used to create these "LiftOver" files. We have limited resources to investigate this issue, but we hope to get to it sometime in the future.
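
The cross-check suggested above, comparing a LiftOver result with an independent realignment of the sequence, could be automated roughly as follows. This is a minimal sketch, not part of any UCSC tool: the helper names and the 10% length tolerance are hypothetical, and intervals are assumed to be (chrom, start, end) tuples in BED-style half-open coordinates.

```python
# Intervals are (chrom, start, end) tuples, assumed 0-based half-open (BED-style).
# All helper names and the default tolerance are hypothetical, for illustration only.

def interval_length(interval):
    """Number of bases covered by a half-open interval."""
    return interval[2] - interval[1]


def length_ratio(source, lifted):
    """Lifted length divided by source length; close to 1.0 for a clean lift."""
    return interval_length(lifted) / interval_length(source)


def is_suspicious(source, lifted, tolerance=0.1):
    """Flag a lift whose length changed by more than `tolerance` (as a fraction)."""
    return abs(length_ratio(source, lifted) - 1.0) > tolerance


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return outer[0] == inner[0] and outer[1] <= inner[1] and inner[2] <= outer[2]


# Values from this thread.
source = ("chr1", 145176339, 145176456)        # hg19 interval
lifted = ("chr1", 146261691, 148712561)        # LiftOver result on hg38
# Alignment hit, converted from assumed 1-based inclusive to half-open.
aligned = ("chr1", 148712445 - 1, 148712561)

print("LiftOver result suspicious:", is_suspicious(source, lifted))
print("alignment hit suspicious:", is_suspicious(source, aligned))
print("alignment hit inside LiftOver interval:", contains(lifted, aligned))
```

A length check like this only flags candidates for manual review; whether a flagged interval is actually wrong still has to be settled by realigning the sequence, as described above.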

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group