looking for best chains


Anaïs Gouin

Mar 31, 2015, 12:43:46 PM
to gen...@soe.ucsc.edu
Good morning,

I would like to get the reciprocal best chains for my alignments. I realized that your pipeline (http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best) starts from best chains in one direction (genomeA as reference, genomeB as query).
I looked at this page, http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto, where it says that we have to use axtSort and axtBest to "keep only the longest chains". Is that the way to get the best chains?

But these two tools work on axt files. So should I run them directly on the alignments in axt format, or should I generate the chains with axtChain, convert the chains to axt (if that is possible), and then apply this filtering step to the chains?

Thanks very much in advance for your help.

Best,

Anaïs

Jonathan Casper

Apr 10, 2015, 7:31:06 PM
to Anaïs Gouin, gen...@soe.ucsc.edu

Hello Anaïs,

Thank you for your question about creating reciprocal-best alignment files. Comments on the whole genome alignment page (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto), for example at the bottom of the page, explain that axtBest is no longer used by UCSC. Instead, the pipeline on the page that you referenced, http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best, starts from a liftOver chain file (in that case, hg38.oviAri3.over.chain.gz). More information on generating a liftOver chain file can be found at http://genomewiki.ucsc.edu/index.php/LiftOver_Howto. Note that while BLAT is the alignment tool used on that page, other alignment tools can also be used. For the data that were fed into the creation of hg38.oviAri3.over.chain.gz, we performed the alignments with lastz.

The outline of the process on that page is:
1. Generate PSL alignments (e.g., with BLAT or lastz).
2. Turn those alignments into chains with axtChain.
3. Merge the short chains using chainMergeSort, chainSplit, and chainSort.
4. You may wish to filter your chains at this point with chainPreNet, to remove chains that don't have a chance of being part of the final file.
5. Create a net from the chains using the chainNet program, pass that to netSyntenic to add synteny information, use netChainSubset to create a liftOver file, and finally (optionally) join chain fragments with chainStitchId (this is skipped on the wiki page).

The result is a liftOver chain file of the sort used to begin the reciprocal best pipeline.
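
As a rough illustration of steps 2 and 3 of the outline, the sketch below wraps axtChain and chainMergeSort in two small shell functions. It is not taken from the UCSC scripts: the function names, the file names, the DRY_RUN switch, and the choice of -linearGap=loose are all assumptions made for the example, and the chainSplit and chainSort parts of step 3 are left out. Check each tool's usage message before relying on these invocations.

```shell
# Sketch of outline steps 2 and 3: PSL alignments -> chains -> one merged chain file.
# Assumptions (not from the thread): function names, file names, the DRY_RUN
# switch, and -linearGap=loose are illustrative choices only.

# chain_psl IN.psl TARGET.2bit QUERY.2bit OUT.chain
# Chains one PSL file; with DRY_RUN=1 the command is printed instead of run.
chain_psl() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'axtChain -linearGap=loose -psl %s %s %s %s\n' "$1" "$2" "$3" "$4"
    else
        axtChain -linearGap=loose -psl "$1" "$2" "$3" "$4"
    fi
}

# merge_chains OUT.chain IN.chain [IN.chain ...]
# Merges several sorted chain files into one; chainMergeSort writes to stdout.
merge_chains() {
    _merged_out=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'chainMergeSort %s > %s\n' "$*" "$_merged_out"
    else
        chainMergeSort "$@" > "$_merged_out"
    fi
}
```

A typical call would be `chain_psl part1.psl target.2bit query.2bit part1.chain` for each PSL file, followed by `merge_chains all.chain part1.chain part2.chain`. Setting DRY_RUN=1 first prints the commands so that they can be reviewed before anything is run.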

Here is an example segment from the scripts that created hg38.oviAri3.over.chain.gz. This corresponds to steps 4 and 5 above.

# Make nets ("noClass", i.e. without rmsk/class stats which are added later):
chainPreNet  hg38.oviAri3.all.chain.gz /hive/data/genomes/hg38/chrom.sizes /hive/data/genomes/oviAri3/chrom.sizes stdout \
| chainNet  stdin -minSpace=1 /hive/data/genomes/hg38/chrom.sizes /hive/data/genomes/oviAri3/chrom.sizes stdout /dev/null \
| netSyntenic stdin noClass.net

# Make liftOver chains:
netChainSubset -verbose=0 noClass.net hg38.oviAri3.all.chain.gz stdout \
| chainStitchId stdin stdout | gzip -c > hg38.oviAri3.over.chain.gz
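
Once a file such as hg38.oviAri3.over.chain.gz exists, a quick sanity check can be useful before feeding it to the reciprocal best pipeline. The helper below is a hypothetical sketch, not part of the UCSC scripts: it relies only on the chain format, in which each chain begins with a header line starting with "chain" and carrying the target start and end in fields 6 and 7. The function name is an assumption.

```shell
# chain_summary FILE.chain.gz
# Hypothetical helper: counts chain header lines in a gzipped chain file and
# sums the target span (tEnd - tStart, gaps included) over those headers.
chain_summary() {
    gzip -dc "$1" | awk '
        $1 == "chain" { n++; span += $7 - $6 }
        END { printf "chains=%d target_span=%d\n", n, span }
    '
}
```

For example, `chain_summary hg38.oviAri3.over.chain.gz` prints a single line with the chain count and the total target span, which makes an empty or truncated file easy to spot.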

You may also be interested in reading our wiki entry on Chains and Nets, which provides a bit more background on the relationship between the files: http://genomewiki.ucsc.edu/index.php/Chains_Nets.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

