filter chain file


Lorenzo Barchi

Aug 19, 2015, 12:39:07 PM
to gen...@soe.ucsc.edu
Hi
I am constructing a chain file between two genome assemblies of a plant species in order to transfer the annotation in a GFF file (I will use CrossMap for this).
I used LASTZ, followed by axtChain, chainMergeSort, chainSplit, chainNet, netChainSubset and finally chainStitchId.
When I check the resulting chain file, I can see that some query sequences have two or more target regions (on different chromosomes), as here:

chain  202398 6 75408369 + 69796579 69798736 572 2157  - 0   2157 1978023
chain  10070 12 84011688 + 61122526 61122741 572 2157  - 1932   2153 10650860
chain  8358  1 112608612 + 7563194 7563350 572 2157  + 0          156 36943095

So I was wondering whether there is a way to filter my final .chain file to keep only the chain with the highest score (which is also the one in which the query sequence fully aligns with the target).
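
The filtering described above could be sketched in Python roughly as follows. This is a minimal sketch, assuming the standard UCSC chain layout (header fields: score, tName, tSize, tStrand, tStart, tEnd, qName, qSize, qStrand, qStart, qEnd, id, followed by alignment block lines). The function names read_chains and best_chain_per_query are hypothetical and not part of any UCSC tool. It keeps exactly one chain per query sequence, which only makes sense for short queries that align in full; it is not equivalent to a reciprocal best net.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Chain = Tuple[List[str], List[str]]  # (header fields, alignment block lines)


def read_chains(lines: Iterable[str]) -> Iterator[Chain]:
    """Yield (header_fields, block_lines) for each chain record."""
    header = None
    blocks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("chain"):
            if header is not None:
                yield header, blocks
            header, blocks = line.split(), []
        elif line.strip() and header is not None and not line.startswith("#"):
            blocks.append(line)
    if header is not None:
        yield header, blocks


def best_chain_per_query(lines: Iterable[str]) -> List[str]:
    """Keep only the highest-scoring chain for each query name (field 7)."""
    best: Dict[str, Chain] = {}
    for header, blocks in read_chains(lines):
        score = float(header[1])
        q_name = header[7]
        if q_name not in best or score > float(best[q_name][0][1]):
            best[q_name] = (header, blocks)
    out: List[str] = []
    for header, blocks in best.values():
        out.append(" ".join(header))
        out.extend(blocks)
        out.append("")  # blank line terminates each chain record
    return out


if __name__ == "__main__":
    # Header lines from the post; alignment block lines are not shown there.
    example = [
        "chain  202398 6 75408369 + 69796579 69798736 572 2157  - 0   2157 1978023",
        "chain  10070 12 84011688 + 61122526 61122741 572 2157  - 1932   2153 10650860",
        "chain  8358  1 112608612 + 7563194 7563350 572 2157  + 0          156 36943095",
    ]
    for line in best_chain_per_query(example):
        print(line)
```

On a real file one would pass an open file handle instead of the example list and write the returned lines back out joined with newlines.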
Thank you in advance for the help
Lorenzo


--
Lorenzo Barchi - Ph.D.
University of Torino - Plant Genetics and Breeding
Largo Braccini 2 - 10095 Grugliasco (TO), ITALY
Tel: +39-(0)11-6708809
Fax: +39-(0)11-2368809

Jonathan Casper

Aug 20, 2015, 7:34:16 PM
to Lorenzo Barchi, gen...@soe.ucsc.edu

Hello Lorenzo,

Thank you for your question about obtaining a single-coverage chain file. The type of file you are looking for is called a reciprocal best net. The procedure for generating this kind of file is discussed on our wiki in several related pages about building whole-genome liftOvers. The best page to start with is probably http://genomewiki.ucsc.edu/index.php/LiftOver_Howto, which contains links to the other pages. Please note that the "Same species liftover construction" page actually contains a couple of scripts to automate the entire process. You may also be interested in our Syntenic Net/Reciprocal Best page at http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best, which provides examples of the actual commands we used to create net files between the hg38 and oviAri3 assemblies.

More background information on chains and nets can be found on our wiki at http://genomewiki.ucsc.edu/index.php/Chains_Nets.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group


--


Lorenzo Barchi

Aug 21, 2015, 1:23:08 PM
to gen...@soe.ucsc.edu
Hi
Thank you for your help.
Unfortunately, when I check the rbest.chain file obtained in the step '# Swap to get hg38-ref'd reciprocal best chain',
I still have some queries matching two different target regions (on the same chromosome).
Is it possible to somehow remove the remaining lower-scoring chains?
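
To see which queries are affected, a small diagnostic along these lines might help. It is only a sketch, assuming the standard UCSC chain header layout and the convention that query coordinates on the '-' strand are given relative to the reverse-complemented sequence. The function names query_spans and multiply_covered are hypothetical. It compares whole chain extents (including gaps), so it flags candidates for inspection rather than proving that individual bases are covered twice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Span = Tuple[float, str, int, int]  # (score, tName, qStart, qEnd) on the + strand


def query_spans(lines: Iterable[str]) -> Dict[str, List[Span]]:
    """Collect, per query name, the query extent of every chain header."""
    hits: Dict[str, List[Span]] = defaultdict(list)
    for line in lines:
        if not line.startswith("chain"):
            continue
        f = line.split()
        score, t_name = float(f[1]), f[2]
        q_name, q_size, q_strand = f[7], int(f[8]), f[9]
        q_start, q_end = int(f[10]), int(f[11])
        if q_strand == "-":
            # Convert reverse-strand coordinates to forward-strand coordinates.
            q_start, q_end = q_size - q_end, q_size - q_start
        hits[q_name].append((score, t_name, q_start, q_end))
    return hits


def multiply_covered(lines: Iterable[str]) -> List[str]:
    """Return query names having at least two chains with overlapping extents."""
    flagged: List[str] = []
    for q_name, chains in query_spans(lines).items():
        spans = sorted((start, end) for _, _, start, end in chains)
        max_end = -1
        for start, end in spans:
            if start < max_end:  # this extent begins before an earlier one ends
                flagged.append(q_name)
                break
            max_end = max(max_end, end)
    return flagged


if __name__ == "__main__":
    # Header lines quoted earlier in this thread.
    example = [
        "chain  202398 6 75408369 + 69796579 69798736 572 2157  - 0   2157 1978023",
        "chain  10070 12 84011688 + 61122526 61122741 572 2157  - 1932   2153 10650860",
        "chain  8358  1 112608612 + 7563194 7563350 572 2157  + 0          156 36943095",
    ]
    print(multiply_covered(example))  # → ['572']
```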
Best
Lorenzo 





--
Lorenzo Barchi - Ph.D.
University of Torino - Plant Genetics and Breeding
Largo Braccini 2 - 10095 Grugliasco (TO), ITALY
Tel: +39-(0)11-6708809
Fax: +39-(0)11-2368809

Matthew Speir

Aug 21, 2015, 4:07:11 PM
to Lorenzo Barchi, gen...@soe.ucsc.edu
Hi Lorenzo,

You are looking for a file containing "reciprocal best nets", not chains. If you continue following the directions on the wiki page http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best, starting at the section "# Net those on hg38 to get hg38-ref'd reciprocal best net:", then you should get a file that contains these reciprocal best nets.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--

