reciprocal best hit

107 views
Skip to first unread message

Anaïs Gouin

unread,
Feb 5, 2015, 1:27:47 PM2/5/15
to gen...@soe.ucsc.edu
Hello,

I runned lastz + axtchain to do a whole genome comparison between two variants of a same insect, that are almost considered as two different species. I would like tu run the netting step, following the comments here :
However, I also heard about the reciprocal best analysis, but I do not find example to follow. I found this post on the UCSC genome support forum
so I was tempted to use the chainStitchId/chainSwap program but I am not sure of what to do first.

In the example I found they applied chainStitchId/chainSwap/chainSort over the set of best chains (mm9.hg19.tBest.chain) before netting. Is there a specific program I do not hear about selecting the best chains, or is it chainPreNet that does this thing (+ the netting that allows the target positions to be single coverage)?

Here is what I would from what I read knowing that

Lastz+axtchain with A genome as reference and B genome as query
chainSort/chainPreNet/chainNet/netSyntenic do get the netting of the chains

extraction of chains from nets (ref-referenced)
chainStitchId/chainSwap to get the chains query-reference
chainSort/chainPreNet/chainNet/netSyntenic to get reciprocal best nets

netChainSubset/chainStitchId to extract again of reciprocal best chains

chainSwap to swa agin to get the reciprocal best chains in target reference (A genome)

and netting again to finish.

Do I do it in the good way? Can you give some explanations about chainStitchId? I am not sure to well understand what it does.

Thank you very much in advance for your help,

Anaïs

Matthew Speir

unread,
Feb 6, 2015, 5:24:55 PM2/6/15
to Anaïs Gouin, gen...@soe.ucsc.edu
Hi Anaïs,

Thank you for your question about . You can look at the following GenomeWiki page for some sample scripts that you can use to create your own Reciprocal Best or Syntenic Net file: http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best. You should be able to find any of the utilities referenced in those scripts on our download server at http://hgdownload.soe.ucsc.edu/downloads.html under the appropriate folder for your machine.

If you are ever curious about what a particular UCSC Genome Browser utility does, you can always run it on the command line without any arguments to see that usage message. For example, if you run chainStichId without any arguments, you should see the following usage message:

    chainStitchId - Join chain fragments with the same chain ID into a single
       chain per ID.  Chain fragments must be from same original chain but
       must not overlap.  Chain fragment scores are summed.
    usage:
       chainStitchId in.chain out.chain


I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--


Anaïs Gouin

unread,
Feb 9, 2015, 12:15:01 PM2/9/15
to gen...@soe.ucsc.edu
Thank you very much for your answer and for the link, it was indeed very useful.

I have however a  little question. I got no error messages during all steps of reciprocal best. But I get the warnings at the end :
Warning: mais rbest net coverage 238772284 != riz 238772322
Warning: mais rbest chain coverage 238772322 != net cov 238772284

Is it a problem? Or Does not it come from the fact that the chains can cover a longer region in the ref than in the query or reciprocally (because of gaps)?

Thanks again,

Anaïs


Anaïs Gouin

unread,
Feb 9, 2015, 12:15:05 PM2/9/15
to gen...@soe.ucsc.edu
A last question to finish. I thought that getting the reciprocal best chains would give me only one2one regions.
By one2one I mean that a region in the first genome match only with one other region of the second genome.

But actually here is an example of my reciprocal best pipeline :

chain 237344 scaffold_1 873266 + 56 5232 SFRU_RICE_003848 4548 - 1112 4548 24274
chain 5577 scaffold_1 873266 + 233 5622 SFRU_RICE_003530 4082 - 731 3943 31497

Then I have two chains from the same region in scaffold_1 that matches with two defferent scaffolds of my second genome.

Can you give me some explanations about the reciprocal best chains please?

Thanks a lot in advance,


Anaïs



Steve Heitner

unread,
Feb 10, 2015, 4:46:23 PM2/10/15
to Anaïs Gouin, gen...@soe.ucsc.edu

Hello, Anaïs.

You should indeed get single coverage.  Please ensure that you are following the exact procedure.  The warning message is probably an indication of where the problem is occurring.

Please contact us again at gen...@soe.ucsc.edu if you have any further questions. 
Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users.  If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group

--

Anaïs Gouin

unread,
Feb 18, 2015, 12:11:24 PM2/18/15
to gen...@soe.ucsc.edu
Hello Steve,

I followed the exact procedure explained at http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best. However, maybe it is the initial file that is not correct.
The first step is : Swap hg38-best chains to be oviAri3-referenced. What do you call hg38-best? Actually I was inspired by the previous cited page and did a first netting and extract what I think was the best chains by using netChainSubset.
Then I followed your pipeline using this file of chains as initial point. Is it the good way to do it?

The warning message actually only occurs when I test if all coverage figures are equal. Here is the results I get :
Warning: Maïs rbest net coverage 238772284 != Riz 238772322
Warning: Maïs rbest chain coverage 238772322 != net cov 238772284
 (Maïs and Riz being the names of my two genomes).

Can you give more explanations about what does exactly netChainSubset please? If it actually retrieves all chaine of the net, isn't normal that I do not get a single coverage at the end? Indeed, if I do not use the option splitOnInsert, I will keep a big chain and a smaller one that can be included in the longest one.

Thanks for you help,

Anaïs


Hiram Clawson

unread,
Feb 18, 2015, 5:41:08 PM2/18/15
to Anaïs Gouin, gen...@soe.ucsc.edu
Good Afternoon Anaïs:

The number of bases difference mentioned in the warning message is
a total of 38 bases. This type of warning message turns out to
be typical. If you can find the 38 bases not covered by each chain/net
combination, you may discover why they do not map exactly.

--Hiram

On 2/18/15 7:13 AM, Anaïs Gouin wrote:
> Hello Steve,
>
> I followed the exact procedure explained at http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best . However, maybe it is the initial file that is not correct.
> The first step is : Swap hg38-best chains to be oviAri3-referenced . What do you call hg38-best ? Actually I was inspired by the previous cited page and did a first netting and extract what I think was the best chains by using netChainSubset.
> Then I followed your pipeline using this file of chains as initial point. Is it the good way to do it?
>
> The warning message actually only occurs when I test if all coverage figures are equal. Here is the results I get :
> Warning: Maïs rbest net coverage 238772284 != Riz 238772322
> Warning: Maïs rbest chain coverage 238772322 != net cov 238772284
> (Maïs and Riz being the names of my two genomes).
>
> Can you give more explanations about what does exactly netChainSubset please? If it actually retrieves all chaine of the net, isn't normal that I do not get a single coverage at the end? Indeed, if I do not use the option splitOnInsert, I will keep a big chain and a smaller one that can be included in the longest one.
>
> Thanks for you help,
>
> Anaïs
>
> ----- Mail original -----
Reply all
Reply to author
Forward
0 new messages