Discrepancies in data

Nitish Tayal

Dec 6, 2016, 11:11:32 AM
to gen...@soe.ucsc.edu, Kuljeet Sandhu
Respected Sir,

I was converting coordinates from the hg19 to the hg38 human assembly using a minimum match of 0.99. When I took the successful cases and mapped them back to hg19, many of them failed. One example is chr1:120760888-120761029 in hg19, which mapped to chr1:121262308-121262449 in hg38 with a minimum match of 0.99, but when I checked the sequences at these two coordinates I found that they are entirely different. Kindly explain the reason for such discrepancies.

Thanking you.
Yours truly,
Nitish Tayal
JRF, IISER Mohali

Chris Villarreal

Dec 12, 2016, 2:48:46 PM
to Nitish Tayal, gen...@soe.ucsc.edu, Kuljeet Sandhu
Dear Nitish Tayal,

Thank you for your question about the UCSC Genome Browser. I took the DNA sequence at chr1:120760888-120761029 from hg19, then used BLAT to find that sequence in hg38. I used the chr1:121262309-121262450 location and looked at the side-by-side alignment shown here:


000000001 aagatttctgcctaattgcctctatctccactcttccttccccttccctc 000000050
<<<<<<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<<<<<<
121262450 aagatttctgcctaattgcctctatctccactcttccttccccttccctc 121262401

000000051 tccacctccagaggggagttcccgctggaaattgcacaattctttgtgca 000000100
<<<<<<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<<<<<<
121262400 tccacctccagaggggagttcccgctggaaattgcacaattctttgtgca 121262351

000000101 gagagaaacaacaagcttagttcctgttgacctgaagagcat 000000142
<<<<<<<<< |||||||||||||||||||||||||||||||||||||||||| <<<<<<<<<
121262350 gagagaaacaacaagcttagttcctgttgacctgaagagcat 121262309


You can see a 100% match between the DNA at chr1:120760888-120761029 in hg19 and the new location at chr1:121262309-121262450 in hg38. I am unable to replicate any discrepancy between these locations. If I misunderstood your question, please provide us with more details.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

-Chris V
UCSC Genome Browser
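
The `<<<<<<<<<` markers and the descending hg38 coordinates in the alignment above suggest that the match is on the reverse strand. If so, comparing the two extracted sequences directly as strings would make them look entirely different even though they are identical after reverse-complementing. A minimal Python sketch of a strand-aware comparison follows; the function names are hypothetical, and it assumes both sequences have already been fetched as plain strings.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_aware_match(a: str, b: str) -> str:
    """Return '+' if a equals b, '-' if a equals the reverse
    complement of b, and 'none' otherwise (case-insensitive)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "+"
    if reverse_complement(a) == b:
        return "-"
    return "none"


# First ten bases of the query from the alignment above, compared with
# their reverse complement (what the forward strand would show).
print(strand_aware_match("aagatttctg", "cagaaatctt"))  # prints "-"
```

A result of `-` means the two regions carry the same sequence on opposite strands, which is still a valid match.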

Nitish Tayal

Dec 13, 2016, 10:51:59 AM
to Chris Villarreal, gen...@soe.ucsc.edu
The discrepancy appears when you try to map the hg38 result back to hg19 using the liftOver tool: it does not map back. Try chr9:45507945-45508127 in hg19. It maps to chr9:41424922-41425104 in hg38 using a 0.99 cutoff. Now change the parameters to map from hg38 to hg19. Logically, chr9:41424922-41425104 in hg38 should give chr9:45507945-45508127 in hg19, but instead the conversion fails. Kindly try both directions using liftOver in the UCSC browser only and you will see the problem.
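
The round-trip test described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `round_trip_report` is a hypothetical helper, and `forward` and `backward` stand in for whatever performs the hg19-to-hg38 and hg38-to-hg19 conversions (here they are toy dictionary lookups, not actual liftOver calls).

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Region = Tuple[str, int, int]                 # (chrom, start, end)
Lift = Callable[[Region], Optional[Region]]   # returns None on failure


def round_trip_report(regions: Iterable[Region],
                      forward: Lift,
                      backward: Lift) -> Dict[str, List[Region]]:
    """Classify each region by how it behaves when lifted forward
    and then lifted back again."""
    report: Dict[str, List[Region]] = {
        "ok": [], "forward_failed": [], "reverse_failed": [], "moved": [],
    }
    for region in regions:
        lifted = forward(region)
        if lifted is None:
            report["forward_failed"].append(region)
            continue
        back = backward(lifted)
        if back is None:
            report["reverse_failed"].append(region)
        elif back != region:
            report["moved"].append(region)   # came back somewhere else
        else:
            report["ok"].append(region)
    return report


# Toy stand-ins reproducing the behaviour reported in this message:
# the forward lift succeeds, the reverse lift finds nothing.
hg19_region = ("chr9", 45507945, 45508127)
forward_map = {hg19_region: ("chr9", 41424922, 41425104)}
backward_map: Dict[Region, Region] = {}

report = round_trip_report([hg19_region], forward_map.get, backward_map.get)
print(report["reverse_failed"])
```

Keeping "reverse_failed" and "moved" separate makes it easier to tell regions that vanish on the way back from regions that return to a different location.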

Chris Villarreal

Dec 14, 2016, 11:43:31 AM
to Nitish Tayal, UCSC
Dear Nitish Tayal,

Thank you for your question about the UCSC Genome Browser. The liftOver utility uses chain files to try to find regions of long-range homology. In regions that are part of segmental duplications in hg19, like the example you provided, it is not uncommon for the reciprocal alignments to disagree about which copy is the most homologous when going back and forth between assemblies.

-Chris V
UCSC Genome Browser