Hello, Giulio.
Thank you for your interest in the UCSC Genome Browser and for your question about how the hg19ToHg38.over.chain.gz file was generated.
The hg19ToHg38.over.chain.gz file was generated by the DoSameSpeciesLiftOver.pl script. The following page of our wiki details how to use this script: http://genomewiki.ucsc.edu/index.php/DoSameSpeciesLiftOver.pl
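For readers unfamiliar with what the resulting file contains: a .over.chain.gz file is gzip-compressed text in UCSC chain format. Each `chain` header line gives a score, the target coordinates (tName, tSize, tStrand, tStart, tEnd), the query coordinates (qName, qSize, qStrand, qStart, qEnd), and an id. It is followed by `size dt dq` lines, one per ungapped block, where dt and dq are the gaps that follow the block in the target and query. The Python below is a minimal sketch under those assumptions, not UCSC code, and `read_chains` and `lift_position` are hypothetical helper names. It maps a single 0-based target position to the query assembly using the first chain that covers it. It does not reproduce the multiple-hit, region, or `-minMatch` handling of UCSC's `liftOver` tool.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Chain:
    """One chain record: header fields plus its ungapped alignment blocks."""
    score: float
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str
    # Each block is (size, dt, dq): an ungapped block of `size` bases, then a gap
    # of dt bases in the target and dq bases in the query (0, 0 for the last block).
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)


def read_chains(lines: Iterable[str]) -> List[Chain]:
    """Parse UCSC chain-format text (e.g. a decompressed .over.chain.gz)."""
    chains: List[Chain] = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank separator lines and comment lines
        if fields[0] == "chain":
            chains.append(Chain(
                float(fields[1]), fields[2], int(fields[3]), fields[4],
                int(fields[5]), int(fields[6]), fields[7], int(fields[8]),
                fields[9], int(fields[10]), int(fields[11]),
                fields[12] if len(fields) > 12 else ""))
        else:
            size = int(fields[0])
            dt = int(fields[1]) if len(fields) > 1 else 0
            dq = int(fields[2]) if len(fields) > 2 else 0
            chains[-1].blocks.append((size, dt, dq))
    return chains


def lift_position(chains: List[Chain], name: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based target position to (query name, 0-based query position).

    Returns the hit from the first chain whose ungapped blocks cover `pos`,
    or None if `pos` falls in a gap or outside every chain.
    """
    for chain in chains:
        if chain.t_name != name or not chain.t_start <= pos < chain.t_end:
            continue
        t, q = chain.t_start, chain.q_start
        for size, dt, dq in chain.blocks:
            if t <= pos < t + size:
                q_pos = q + (pos - t)
                if chain.q_strand == "-":
                    # Minus-strand query coordinates count along the reverse
                    # complement; convert back to forward-strand coordinates.
                    q_pos = chain.q_size - 1 - q_pos
                return chain.q_name, q_pos
            t += size + dt
            q += size + dq
    return None
```

To try it on the real file, open it with `gzip.open("hg19ToHg38.over.chain.gz", "rt")` and pass the handle to `read_chains`. In that file the target is hg19 and the query is hg38.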
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute
I was wondering how the liftover chain file hg19ToHg38.over.chain.gz from here was generated. Is there documentation that explains what tools were used to create these old chain files? -Giulio
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/CABDkqW5Vc3xWjMwYE68xdwRZMjtDNiFx85RdhLk_11oWX2zbqg%40mail.gmail.com.
Would UCSC consider generating chain files for T2T-CHM13v2.0 using the more thoroughly tested DoSameSpeciesLiftOver.pl script?
Which of the two approaches is better?
Matthew Speir
Hello,
Thank you for using the UCSC Genome Browser and sending your inquiry.
Yes, the hg19ToHg38.over.chain.gz file was generated using DoSameSpeciesLiftOver.pl and BLAT. The hg38ToHs1.over.chain.gz file (CHM13 alignments) was created by the CHM13 group using minimap2, and we simply imported it. "hs1" is our internal name for T2T CHM13 because it is easier to handle in our internal databases, but we will try to remove the internal name and show "CHM13" wherever possible.
Lastz was used to generate hg38ToPanTro6.over.chain.gz (the chain file between human and chimp). Same-species alignments use BLAT, and cross-species alignments use lastz. By "same species," we mean "almost identical sequence." hg38 and hg19 share the same contigs, so their sequence is almost 100% identical. The same is true of the mm10 and mm39 genome assemblies. However, this does not extend to CHM13, as that assembly is a new sequence with lower identity to hg38.
hg38 to hs1 is a very unusual case: the assemblies are divergent enough that BLAT no longer works well, but so repetitive that lastz falls into rabbit holes, chases repeats for days, and produces far too many overlapping alignments in the end. We therefore provide the minimap2 alignments. They were not made by our pipeline, but they look better than any of our own alignments. We may integrate minimap2 into our pipeline if this problem comes up again with other new assemblies.
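One rough way to see how closely two assemblies align is to measure how much of each target sequence is covered by ungapped blocks in their chain file. Chain files record block sizes and gaps but not mismatches, so this is an alignment-coverage measure, not percent identity. The sketch below uses a hypothetical helper, `aligned_fraction`, in plain Python. It sums block sizes per target sequence and divides by the target size, and it assumes chains do not overlap on the target; overlapping chains would be double-counted.

```python
from collections import defaultdict
from typing import Dict, Iterable


def aligned_fraction(lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of each target sequence covered by ungapped chain blocks.

    Counts aligned bases only (chain format stores no mismatch information)
    and double-counts target bases covered by more than one chain.
    """
    t_sizes: Dict[str, int] = {}
    aligned: Dict[str, int] = defaultdict(int)
    current = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank separator lines and comments
        if fields[0] == "chain":
            current = fields[2]                # tName
            t_sizes[current] = int(fields[3])  # tSize
        else:
            aligned[current] += int(fields[0])  # ungapped block size
    return {name: aligned[name] / size for name, size in t_sizes.items()}
```

The function accepts any iterable of lines, so it can be fed a handle from `gzip.open(path, "rt")` on a downloaded .over.chain.gz file.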
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser
Hello,
Thank you for sending your follow-up inquiry.
You are correct, and I was confusing the two files. The hg38 and hs1 alignment was created using the lastz/chain/net procedure. You can review the steps from the following makedoc:
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser