Question regarding the liftOver pipeline


Zeta Mui

Aug 28, 2017, 11:37:51 AM
to UCSC Genome Browser Public Help Forum
Dear UCSC personnel,

I was trying to lift over the common bean genome to the soybean genome, following this tutorial: http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto. At the "Sort and filter the chains" step, that tutorial uses only the chainMergeSort utility to process the .chain files. However, while checking which chain file produced by the pipeline is the one passed to the liftOver utility, I came across a second tutorial, http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver, which also runs chainSplit on the chain file. That step is absent from the "Whole genome alignment howto". What does this step do? And can I simply continue with the "Whole genome alignment howto" tutorial, taking the output of chainNet and running netChainSubset on it?

--
Best,
Zeta

Jairo Navarro Gonzalez

Sep 5, 2017, 3:14:37 PM
to Zeta Mui, UCSC Genome Browser Public Help Forum

Hello Zeta,

Thank you for using the UCSC Genome Browser and for your inquiry.

The purpose of the chainSplit tool is to split a chain file into separate files, one per target chromosome.

================================================================
========   chainSplit   ====================================
================================================================
chainSplit - Split chains up by target or query sequence
usage:
   chainSplit outDir inChain(s)
options:
   -q  - Split on query (default is on target)
   -lump=N  Lump together so have only N split files.
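
To illustrate what "split by target" means, here is a minimal Python sketch. It is not the chainSplit implementation; it only assumes the standard chain header layout (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id). The function name and the sample data are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample: three chains, two on target chr1 and one on target chr2.
SAMPLE = """chain 1000 chr1 500 + 0 100 chrA 400 + 10 110 1
100

chain 900 chr2 600 + 5 55 chrB 300 - 0 50 2
50

chain 800 chr1 500 + 200 260 chrB 300 + 100 160 3
60
"""


def split_chains_by_target(chain_text, on_query=False):
    """Group chain records by target name (or query name if on_query is True).

    Returns a dict mapping sequence name -> chain text for that sequence.
    """
    groups = defaultdict(list)
    record = []

    def flush():
        # Store the record collected so far under its sequence name.
        if record:
            fields = record[0].split()
            # Header field 2 is tName, field 7 is qName.
            name = fields[7] if on_query else fields[2]
            groups[name].append("\n".join(record))
            record.clear()

    for line in chain_text.splitlines():
        if line.startswith("chain "):
            flush()              # a new header closes the previous record
            record.append(line)
        elif line.strip() and record:
            record.append(line)  # alignment block line of the current chain
    flush()

    # Records are separated by a blank line, as in chain files.
    return {name: "\n\n".join(recs) + "\n" for name, recs in groups.items()}


if __name__ == "__main__":
    for name, text in sorted(split_chains_by_target(SAMPLE).items()):
        print(name, text.count("chain "), "chain(s)")
```

Passing on_query=True mirrors the idea of the -q option: the same records are grouped by the query sequence name instead.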

The procedure to follow depends on the type of alignment you want as a result. Since you want to create a liftOver file to convert annotations between the two assemblies, I would recommend following the LiftOver tutorial. The Whole genome alignment tutorial uses the tool chainPreNet, which removes all chains that do not have a chance of being netted. The LiftOver tutorial does not use chainPreNet; instead it uses netChainSubset, which creates a single chain file containing only the chains that also appear in the net.

================================================================
========   chainPreNet   ====================================
================================================================
chainPreNet - Remove chains that don't have a chance of being netted
usage:
   chainPreNet in.chain target.sizes query.sizes out.chain
options:
   -dots=N - output a dot every so often
   -pad=N - extra to pad around blocks to decrease trash
            (default 1)
   -inclHap - include query sequences name in the form *_hap*|*_alt*.
              Normally these are excluded from nets as being haplotype
              pseudochromosomes

================================================================
========   netChainSubset   ====================================
================================================================
netChainSubset - Create chain file with subset of chains that appear in the net
usage:
   netChainSubset in.net in.chain out.chain
options:
   -gapOut=gap.tab - Output gap sizes to file
   -type=XXX - Restrict output to particular type in net file
   -splitOnInsert - Split chain when get an insertion of another chain
   -wholeChains - Write entire chain references by net, don't split
    when a high-level net is encoundered.  This is useful when nets
    have been filtered.
   -skipMissing - skip chains that are not found instead of generating
    an error.  Useful if chains have been filtered.
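
As a rough guide to how these tools fit together in the LiftOver tutorial, the sketch below only builds the command lines in order; it does not run anything. The directory and file names (chainRaw, chain, net, over, target.chrom.sizes, query.chrom.sizes, target_to_query.over.chain) are placeholders chosen for this example. The sketch also assumes that chainSplit accepts "stdin" as its input and writes one <name>.chain file per target sequence, so please check the tutorial and each tool's usage message before relying on it.

```python
def liftover_chain_commands(targets,
                            target_sizes="target.chrom.sizes",
                            query_sizes="query.chrom.sizes"):
    """Return the shell commands, in order, as strings (nothing is executed)."""
    cmds = [
        # Merge and sort the raw chains, then split them by target sequence.
        "chainMergeSort chainRaw/*.chain | chainSplit chain stdin",
    ]
    for name in targets:
        # Net the chains of one target sequence; the query-side net is discarded.
        cmds.append(
            f"chainNet chain/{name}.chain {target_sizes} {query_sizes} "
            f"net/{name}.net /dev/null"
        )
        # Keep only the chains (or parts of chains) that appear in the net.
        cmds.append(
            f"netChainSubset net/{name}.net chain/{name}.chain over/{name}.chain"
        )
    # Concatenate the per-sequence results into one file for liftOver.
    cmds.append("cat over/*.chain > target_to_query.over.chain")
    return cmds


if __name__ == "__main__":
    for cmd in liftover_chain_commands(["chr1", "chr2"]):
        print(cmd)
```

Splitting first means chainNet and netChainSubset each work on one target sequence at a time, which is where chainSplit enters the LiftOver tutorial.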

Here is a previously answered question that may also be useful:

https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/zcHuWtmw-LE/M4kdIInQ7TQJ

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro 
UCSC Genomics Institute

