hg19ToHg38.over.chain.gz


Giulio Genovese

Aug 14, 2023, 5:03:00 PM
to gen...@soe.ucsc.edu
I was wondering how the liftover chain file hg19ToHg38.over.chain.gz from here was generated. Is there documentation that explains what tools were used to create these old chain files? -Giulio

Gerardo Perez

Aug 21, 2023, 8:34:56 PM
to Giulio Genovese, gen...@soe.ucsc.edu

Hello, Giulio.

Thank you for your interest in the UCSC Genome Browser and for your question about how the hg19ToHg38.over.chain.gz file was generated.

The hg19ToHg38.over.chain.gz file was generated by the DoSameSpeciesLiftOver.pl script. The following page of our wiki details how to use this script: http://genomewiki.ucsc.edu/index.php/DoSameSpeciesLiftOver.pl

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute



Giulio Genovese

Sep 26, 2023, 3:43:36 PM
to Gerardo Perez, gen...@soe.ucsc.edu
I have noticed that the UCSC chain file contains gaps that occur simultaneously in both the target assembly and the query assembly, while other chain files do not. Here is an example:

$ zcat hg19ToHg38.over.chain.gz | awk -F"\t" 'NF==3 && $2>0 && $3>0' | wc -l
35658

$ zcat GRCh37_to_GRCh38.chain.gz | awk -F"\t" 'NF==3 && $2>0 && $3>0' | wc -l
0

$ zcat hg38-chm13v2.over.chain.gz | awk -F"\t" 'NF==3 && $2>0 && $3>0' | wc -l
0

Do you know what specific difference in the pipelines that generate the chains causes this qualitative discrepancy? And which approach is better than the other?
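
For readers less familiar with the chain format, the counts above can be broken down further. Each alignment data line in a chain file has three fields: the size of an ungapped block, then dt and dq, the gap lengths before the next block in the target and in the query. The one-liners above count lines where dt and dq are both positive. Below is a minimal sketch (the function name count_gaps is hypothetical) that also reports target-only and query-only gaps. It splits fields on any whitespace instead of on tabs only; this is an assumption, made so that a file writing its data lines with spaces would not silently report 0.

```shell
#!/bin/sh
# count_gaps FILE: classify the gaps in a gzipped chain file.
# Alignment data lines have three fields: size, dt, dq.
# Assumption: fields may be separated by tabs or spaces, so awk's default
# whitespace splitting is used instead of -F"\t".
count_gaps() {
    gzip -dc < "$1" | awk '
        $1 == "chain" { next }               # skip chain header lines
        NF == 3 {
            if ($2 > 0 && $3 > 0) both++     # gap in target and query at once
            else if ($2 > 0)      tonly++    # gap in target only
            else if ($3 > 0)      qonly++    # gap in query only
        }
        END {
            printf "target_only=%d query_only=%d both=%d\n", tonly, qonly, both
        }
    '
}
```

Usage would be along the lines of `count_gaps hg19ToHg38.over.chain.gz`. If the "both" count from this sketch differs from the tab-only one-liner for a given file, that would suggest the file's data lines are not tab-separated.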

Giulio Genovese

Sep 28, 2023, 2:08:39 AM
to UCSC Genome Browser Public Support, Gerardo Perez, gen...@soe.ucsc.edu
So I have noticed something puzzling. I think there was a bug in the way the GRCh38 <-> T2T-CHM13v2.0 chain files were generated. I have reported the problem here; in short, the chain files contain bits of wrong alignments here and there, and maybe they should be regenerated. Would UCSC consider generating chain files for T2T-CHM13v2.0 using the more thoroughly tested DoSameSpeciesLiftOver.pl script?

Matthew Speir

Oct 6, 2023, 11:22:55 AM
to Giulio Genovese, Gerardo Perez, gen...@soe.ucsc.edu
Hello, Giulio.

Here are some pages where you can read more about how these files were created:
For your question:

"Would UCSC consider generating chain files for T2T-CHM13v2.0 using the more thoroughly tested DoSameSpeciesLiftOver.pl script?"

We have in fact generated these files using our pipeline, and they are available on our downloads server:

For your other question:

"which approach is better than the other?"

There doesn't seem to be a clear consensus. Different LiftOver files are better depending on your use case, so I would encourage you to try them and see what works for you.

---

Matthew Speir

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.



Giulio Genovese

Nov 20, 2023, 5:00:54 PM
to UCSC Genome Browser Public Support, Matthew Speir, Gerardo Perez
Hi Matthew,

One last pair of questions (for publication purposes).

Is it correct to say that:
(i) hg19ToHg38.over.chain.gz was generated using DoSameSpeciesLiftOver.pl and BLAT
(ii) hg38ToHs1.over.chain.gz was generated using DoBlastzChainNet.pl and LASTZ

And what script/aligner was used to generate hg38ToPanTro6.over.chain.gz (chain file between human and chimp)?

I am trying to give guidance on which chain files to use, so this information would eventually be useful to a larger audience.

Giulio

Jairo Navarro Gonzalez

Dec 8, 2023, 6:21:16 PM
to Giulio Genovese, UCSC Genome Browser Public Support, Matthew Speir, Gerardo Perez

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

Yes, hg19ToHg38.over.chain.gz was generated using DoSameSpeciesLiftOver.pl and BLAT. The hg38ToHs1.over.chain.gz file (CHM13 alignments) was created by the CHM13 group using minimap2. We simply imported it. "hs1" is our internal name for T2T CHM13, as it's easier for our internal databases, but we'll be attempting to remove the internal name and show "CHM13" whenever possible.

LASTZ was used to generate hg38ToPanTro6.over.chain.gz (chain file between human and chimp). Same-species alignments use BLAT, and different-species alignments use LASTZ. By "same species," we mean "almost identical sequence". hg38 and hg19 share the same contigs, so the sequence is almost 100% identical. The same can be said for the mm10 and mm39 genome assemblies. However, this does not extend to CHM13, as the assembly is a new sequence with a lower identity.

Hg38 to hs1 is a very unusual case: it's divergent enough that BLAT doesn't work well anymore, but so repetitive that LASTZ falls into rabbit holes, chases repeats for days, and shows way too many overlapping alignments in the end. We therefore provide the minimap2 alignments. They were not made by our pipeline, but they look better than any of our own alignments. We may integrate minimap2 into our pipeline if this problem comes up again with other new assemblies.

I hope this is helpful.

Jairo Navarro
UCSC Genome Browser


Giulio Genovese

Dec 11, 2023, 12:47:45 PM
to UCSC Genome Browser Public Support, Jairo Navarro Gonzalez, UCSC Genome Browser Public Support, Matthew Speir, Gerardo Perez, Giulio Genovese, Maximilian Haeussler
I am getting conflicting information. Through a private conversation with Maximilian Haeussler I was under the impression that:
1) hg19ToHg38.over.chain.gz was generated by UCSC using DoSameSpeciesLiftOver.pl from BLAT alignments
2) hg38ToHs1.over.chain.gz was generated by UCSC using DoBlastzChainNet.pl from LASTZ alignments
3) hg38-chm13v2.over.chain.gz was generated by the CHM13 group using nf-LO and chaintools from minimap2 alignments
4) hg38ToPanTro6.over.chain.gz was generated by UCSC using DoBlastzChainNet.pl from LASTZ alignments
Are you sure that you are not confusing hg38ToHs1.over.chain.gz with hg38-chm13v2.over.chain.gz?

Jairo Navarro Gonzalez

Dec 29, 2023, 1:37:32 PM
to Giulio Genovese, UCSC Genome Browser Public Support, Matthew Speir, Gerardo Perez, Maximilian Haeussler

Hello,

Thank you for sending your follow-up inquiry.

You are correct, and I was confusing the two files. The hg38 and hs1 alignment was created using the lastz/chain/net procedure. You can review the steps from the following makedoc:

https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/makeDb/doc/hg38/lastzRuns.txt#L13823

I hope this is helpful.

Jairo Navarro
UCSC Genome Browser
