liftover files clarification


Johan Henriksson

Jul 27, 2017, 11:33:06 AM
to gen...@soe.ucsc.edu
hi!
I am trying to lift regions from human<->mouse and I have found

and

could someone please clarify the difference as these files are fairly different? would be worth also adding a note to the FTP site

thanks!
/Johan



--
-----------------------------------------------------------
Johan Henriksson, PhD
Karolinska Institutet / European Bioinformatics Institute (EMBL-EBI)
Labstory - Integrated laboratory documentation and databases (www.labstory.se)
http://mahogny.areta.org  http://www.endrov.net

Cath Tyner

Aug 2, 2017, 1:45:43 PM
to Johan Henriksson, UCSC Genome Browser Public Help Forum
Hi Johan,

Thanks for contacting the UCSC Genome Browser support team, and thank you for your patience while waiting for our response. You asked about the definitions and differences of the <db1>To<Db2>.over.chain.gz files and the "vs<Db2>" <db1>.<db2>.all.chain.gz files. Below is a summary, followed by additional details that explain the major differences between these files.

Summary:

  • hg38.mm10.all.chain.gz contains all chained lastz alignments of human hg38 and mouse mm10 (Mouse Chains track).

  • hg38ToMm10.over.chain.gz is a filtered subset of hg38.mm10.all.chain.gz, containing only the parts of chains that are included in the single-coverage "net" (Mouse Nets track).

  • The vsMm10 download directory contains chains, nets and a couple more stringently filtered versions of nets briefly described in the README.

  • The liftOver download directory contains the complete set of *.over.chain.gz (i.e. net-filtered chains) from hg38 to other genomes, for use with our liftOver utility or web tool.

  • Viewing the Mouse Chains and Mouse Nets tracks (as seen in this session) in the Genome Browser helps to visualize the differences. Both are subtracks of the Placental Chain/Net track in the Comparative Genomics group. The chain track represents the all.chain file, and the net track represents the over.chain file. 
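As a rough illustration of the naming patterns and download directories in the summary above, the two paths can be built from a pair of assembly names. This is only a sketch: the function name `chain_file_names` is hypothetical, and it assumes every assembly pair follows the same capitalization pattern as hg38/mm10.

```python
def chain_file_names(from_db: str, to_db: str) -> dict:
    """Build the all.chain and over.chain paths for a pair of assemblies.

    Assumes the pattern seen for hg38/mm10 holds for other pairs:
    the second assembly name is capitalized in "vs<Db2>" and "To<Db2>".
    """
    to_cap = to_db[0].upper() + to_db[1:]
    return {
        # All chained lastz alignments (Chains track).
        "all": f"goldenPath/{from_db}/vs{to_cap}/{from_db}.{to_db}.all.chain.gz",
        # Net-filtered subset used by liftOver (Nets track).
        "over": f"goldenPath/{from_db}/liftOver/{from_db}To{to_cap}.over.chain.gz",
    }


names = chain_file_names("hg38", "mm10")
print(names["all"])
print(names["over"])
```

For liftOver itself, the "over" file is the one to use; the "all" file is the unfiltered superset.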

Details:

all.chain files
See the index-page README for the all.chain.gz files here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/vsMm10/

The lastz alignment data is described here: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/vsMm10/
In addition to the chains themselves, the all.chain file records the lastz alignment parameters in header comment lines, like this:

##matrix=axtChain 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91
##gapPenalties=axtChain O=400 E=30
# lastz.v1.03.66 H=2000 --format=axt+
#
# hsp_threshold      = 3000
# gapped_threshold   = 3000
# x_drop             = 910
# y_drop             = 9400
# gap_open_penalty   = 400
# gap_extend_penalty = 30
#        A    C    G    T
#   A   91 -114  -31 -123
#   C -114  100 -125  -31
#   G  -31 -125  100 -114
#   T -123  -31 -114   91
# seed=1110100110010101111 w/transition
# step=1
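If you want to read these header values programmatically, here is a minimal sketch for the `##matrix=` line. The function name `parse_matrix_header` is hypothetical, and the row-major A, C, G, T ordering is an assumption based on how the 16 values line up with the commented table in the same header.

```python
def parse_matrix_header(line: str) -> dict:
    """Parse a '##matrix=axtChain 16 v1,v2,...' line into {(row, col): score}.

    Assumes the values are listed row by row in A, C, G, T order,
    which matches the commented table shown in the header above.
    """
    fields = line.strip().split()
    if len(fields) != 3 or not fields[0].startswith("##matrix="):
        raise ValueError(f"not a matrix header line: {line!r}")
    count = int(fields[1])
    values = [int(v) for v in fields[2].split(",")]
    if len(values) != count or count != 16:
        raise ValueError("expected 16 substitution scores")
    bases = "ACGT"
    return {
        (bases[i], bases[j]): values[4 * i + j]
        for i in range(4)
        for j in range(4)
    }


matrix = parse_matrix_header(
    "##matrix=axtChain 16 91,-114,-31,-123,-114,100,-125,-31,"
    "-31,-125,100,-114,-123,-31,-114,91"
)
print(matrix[("A", "A")], matrix[("C", "G")], matrix[("T", "A")])
```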

over.chain files
See the index-page README for over.chain files here: 
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver

Here are a few lines from the liftOver/hg38ToMm10.over.chain.gz file.
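The over.chain file contains one header line for each alignment chain, followed by multiple lines describing that chain's ungapped blocks and the gaps between them. Each of these lines gives the size of an ungapped block, then the gap to the next block in the target (hg38) and in the query (mm10).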
Chain format: http://genome.ucsc.edu/goldenPath/help/chain.html

chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
chain 619757641 chr14 107043718 + 24687985 105865965 chr12 120129022 + 44636774 113430848 1
11    0    1
74    0    4
100    0    1
24    0    2
7    3    0
13    551    1060
13    2    0
13    1    0
11    0    1
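To make the layout above concrete, here is a minimal sketch that parses the chain header line and walks the first few block lines of the excerpt, tracking where each ungapped block starts in the target and query. The names `ChainHeader`, `parse_chain_header` and `walk_blocks` are hypothetical and not part of any UCSC tool, and the sketch only uses the first three block lines shown.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class ChainHeader(NamedTuple):
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Parse a 'chain score tName ... id' header line."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    (_, score, t_name, t_size, t_strand, t_start, t_end,
     q_name, q_size, q_strand, q_start, q_end, chain_id) = fields
    return ChainHeader(int(score), t_name, int(t_size), t_strand,
                       int(t_start), int(t_end), q_name, int(q_size),
                       q_strand, int(q_start), int(q_end), chain_id)


def walk_blocks(header: ChainHeader,
                block_lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (target_start, query_start, size) for each ungapped block."""
    t_pos, q_pos = header.t_start, header.q_start
    for line in block_lines:
        numbers = [int(x) for x in line.split()]
        size = numbers[0]
        yield t_pos, q_pos, size
        # The last line of a chain has only the size; others add two gaps.
        if len(numbers) == 3:
            t_pos += size + numbers[1]
            q_pos += size + numbers[2]


header = parse_chain_header(
    "chain 619757641 chr14 107043718 + 24687985 105865965 "
    "chr12 120129022 + 44636774 113430848 1"
)
blocks = list(walk_blocks(header, ["11 0 1", "74 0 4", "100 0 1"]))
for t_start, q_start, size in blocks:
    print(header.t_name, t_start, header.q_name, q_start, size)
```

Both strands are "+" in this chain. When a strand is "-", the chain format gives the coordinates on the reverse-complemented sequence, so they would need converting before comparison with forward-strand positions.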


Please respond to this list if you have further questions!

Thank you for contacting the UCSC Genome Browser support team.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe...@soe.ucsc.edu 
  * Unsubscribe: Email genome-announce+unsubscri...@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, Wordpress Blog, YouTube

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute

