Coverage difference in hg19/mm10 example nets

Peter Ebert

Aug 16, 2016, 11:36:19 AM
to gen...@soe.ucsc.edu
Hi,
I am trying to wrap my head around the information contained in chain and net files. I have read (Jonathan Caspar, groups.google.com/a/soe.ucsc.edu/d/msg/genome/wzRkJp4XDKw/_8dHfhLXFAAJ ) that, in order to get the single coverage alignment information between two species, one should use reciprocal best net (and not chain) files. I also read the explanations in the genome Wiki (genomewiki.ucsc.edu/index.php/Chains_Nets) and in this post (Brooke Rhead, groups.google.com/a/soe.ucsc.edu/d/msg/genome/bYaydHLNUb0/DovSn8gs6j8J ) and then took the part of the provided script that tests for equal net coverage for the rbest net files downloaded here hgdownload-test.cse.ucsc.edu/goldenPath/hg19/vsMm10/reciprocalBest/ (mm10.hg19.rbest.net.gz and hg19.mm10.rbest.net.gz) and executed it. Doing so results in the following warning:

Warning: hg19 rbest net coverage 967305078 != mm10 967305089

The difference is small, granted, but it does not satisfy my understanding of single coverage in both target and query species. Now I found another post by Hiram ( groups.google.com/a/soe.ucsc.edu/d/msg/genome/H2O-cTdC1-o/1CCay9hVnp0J ) where he states that "this type of warning message turns out to be typical" (as I understand: minor differences in coverage); unfortunately, it is not explained why this is typical (i.e., to be expected?) and how this is to be interpreted in the context of looking for single coverage tracks. Can you shed light on this?
I presume the only solution to this problem is to filter out those genomic positions where the coverage is not symmetrical? On that note, is there a tool in the Kent source tree that outputs the one-to-one mapping between target and query? The netToBed tool seems to limit the information about the query to the chromosome only and does not print the exact genomic coordinates, which would simplify this filtering step.
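To illustrate the kind of output meant here: the fill lines of a net file carry the query name, strand, start, and size next to the target start and size, so a short script can list both sides. The following is only a minimal sketch, assuming the fixed leading fields of the UCSC net format (class, target start, target size, query name, strand, query start, query size); the names `Fill` and `iter_fills` and the example lines are made up for illustration and are not part of the Kent source tree.

```python
from typing import Iterable, Iterator, NamedTuple


class Fill(NamedTuple):
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_strand: str
    q_start: int
    q_end: int
    indent: int  # leading spaces on the line, i.e. nesting level in the net


def iter_fills(lines: Iterable[str]) -> Iterator[Fill]:
    """Yield one record per 'fill' line, with target and query coordinates."""
    t_name = None
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "net":
            # 'net <chrom> <size>' opens the block for one target sequence.
            t_name = fields[1]
        elif fields[0] == "fill":
            if t_name is None:
                raise ValueError("fill line seen before any net line")
            t_start, t_size = int(fields[1]), int(fields[2])
            q_start, q_size = int(fields[5]), int(fields[6])
            yield Fill(
                t_name, t_start, t_start + t_size,
                fields[3], fields[4], q_start, q_start + q_size,
                len(raw) - len(raw.lstrip(" ")),
            )


# Hypothetical example lines, not taken from the hg19/mm10 files.
example = [
    "net chr1 1000",
    " fill 100 50 chr4 + 700 48 id 1 score 3000 ali 45",
    "  gap 120 5 chr4 + 720 3",
    "   fill 121 3 chr9 - 10 3 id 2 score 100 ali 3",
]
for fill in iter_fills(example):
    print(*fill, sep="\t")
```

Note that a fill span includes its nested gaps, so these records describe blocks rather than a base-by-base mapping; for that, the corresponding chain file would still be needed.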
Thanks a lot for your help and advice.
Best,
Peter




Cath Tyner

Aug 17, 2016, 2:43:57 PM
to Peter Ebert, UCSC Genome Browser Public Help Forum
Hello Peter,

Thank you for using the UCSC Genome Browser and for asking about chain and net files. One of our engineers has carefully examined the situation that you described by doing the steps below:

1. The hg19.mm10 lift over chain file is swapped to make an equivalent mm10.hg19 lift over chain.
2. That swapped chain file is then run through chainNet and netSyntenic to obtain a reciprocal best net file.
3. That reciprocal best net file is used via netChainSubset to extract a chain file from the swapped lift over chain.
4. That extracted chain file is then swapped to get a reciprocal best chain file for the hg19.mm10 direction.
5. This second reciprocal best chain is run through chainNet and netSyntenic to obtain a second reciprocal best net file.
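
The five steps above can be sketched as a list of command lines. This is only an illustration under stated assumptions, not the script the engineer ran: all file names are hypothetical, any tuning options are omitted, and the chainSort calls are an added assumption (they are not listed in the steps) on the grounds that chainNet expects score-sorted chains. The function only builds the commands; it does not run them.

```python
import shlex


def reciprocal_best_commands(t_db, q_db, t_sizes, q_sizes):
    """Return argv lists for the swap / net / subset / swap / net cycle."""
    over = f"{t_db}.{q_db}.over.chain"            # input lift over chain
    q_unsorted = f"{q_db}.{t_db}.swap.unsorted.chain"
    q_swapped = f"{q_db}.{t_db}.swap.chain"
    q_raw_net = f"{q_db}.{t_db}.raw.net"
    q_net = f"{q_db}.{t_db}.rbest.net"
    q_chain = f"{q_db}.{t_db}.rbest.chain"
    t_unsorted = f"{t_db}.{q_db}.rbest.unsorted.chain"
    t_chain = f"{t_db}.{q_db}.rbest.chain"
    t_raw_net = f"{t_db}.{q_db}.raw.net"
    t_net = f"{t_db}.{q_db}.rbest.net"
    return [
        # Step 1: swap target and query (sorting is an added assumption).
        ["chainSwap", over, q_unsorted],
        ["chainSort", q_unsorted, q_swapped],
        # Step 2: net the swapped chains, then annotate with netSyntenic.
        ["chainNet", q_swapped, q_sizes, t_sizes, q_raw_net, "/dev/null"],
        ["netSyntenic", q_raw_net, q_net],
        # Step 3: keep only the chain parts that the net actually uses.
        ["netChainSubset", q_net, q_swapped, q_chain],
        # Step 4: swap back to the original target/query orientation.
        ["chainSwap", q_chain, t_unsorted],
        ["chainSort", t_unsorted, t_chain],
        # Step 5: net again in the original direction.
        ["chainNet", t_chain, t_sizes, q_sizes, t_raw_net, "/dev/null"],
        ["netSyntenic", t_raw_net, t_net],
    ]


commands = reciprocal_best_commands(
    "hg19", "mm10", "hg19.chrom.sizes", "mm10.chrom.sizes"
)
for argv in commands:
    print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` once the Kent tools are on the PATH; the script on the Genome Wiki remains the reference for the exact options.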
 
To measure whether everything is OK:
 
1. Both reciprocal best chain files are converted to PSL files to be measured.
2. Both reciprocal best net files are converted to BED files to be measured.
 
The measurements indicate an 11 base difference in coverage between one reciprocal best result and the other.
Chaining and netting are not simple operations. They may not be symmetrical,
so there may be some slight difference in each direction.
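
As a rough illustration of this kind of measurement, the sketch below sums the bases covered by a BED file (merging overlapping intervals so that no base is counted twice) and then compares the two totals reported in the warning. The helper name `bed_coverage` and the small BED example are hypothetical; only the two coverage numbers come from this thread.

```python
from collections import defaultdict


def bed_coverage(lines):
    """Count bases covered by BED intervals, counting overlaps only once."""
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if (len(fields) < 3 or fields[0] in ("track", "browser")
                or fields[0].startswith("#")):
            continue
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:
                # Disjoint interval: close the current merged block.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


# Hypothetical BED lines to show the merging behaviour.
toy_bed = ["chr1\t0\t100", "chr1\t50\t150", "chr2\t10\t20"]
toy_total = bed_coverage(toy_bed)

# Coverage totals quoted in the warning message in this thread.
hg19_cov = 967305078
mm10_cov = 967305089
difference = mm10_cov - hg19_cov
fraction = difference / hg19_cov
print(toy_total, difference, f"{fraction:.2e}")
```

With real data, the lines would come from the BED files produced from each reciprocal best net, for example via `open(path)`.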
 
I tried taking these results and running them around in another cycle to get comparable bed files
in the same coordinate system to see what might be missing, but this led to even more missing bases.
Evidently the cycle itself does something to cause bases to go missing.

Ultimately, we do not have the resources to uncover exactly why the 11 bases out of a billion are missing, as there can be an expected amount of "noise" in the bioinformatics process. 

Here is a related Biostars question which may be helpful to review.

My apologies for not being able to provide a better answer for you. Please do continue to ask questions in this forum, and we will provide support in any way that we are able to do so. 

Thank you again for your inquiry and for using the UCSC Genome Browser.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe@soe.ucsc.edu 
  * Unsubscribe: Email genome-announce+unsubscribe@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, Wordpress Blog, YouTube

Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.

Peter Ebert

Aug 29, 2016, 10:36:31 AM
to UCSC Genome Browser Public Help Forum
Hello Cath,
Thanks a lot for investigating the situation. I understand that limited resources may not allow you to solve the problem once and for all, but may I suggest that you update the information in the Genome Wiki to state more explicitly that the script posted there does not (necessarily) produce reciprocal best single coverage files, as shown by the hg19/mm10 example data? Given the recurring user inquiries about this topic that I see on the mailing list, this would probably be very useful to many people.
Thanks again for your help.
Best,
Peter

Christopher Lee

Sep 1, 2016, 12:12:37 PM
to Peter Ebert, UCSC Genome Browser Public Help Forum

Hi Peter,

Thank you for your suggestion about adding a notice to the GenomeWiki about creating reciprocal best chains and nets. We will consider updating this page in the future.

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute


