Coverage vs CoverageQ2Q3


dethlefs

Aug 28, 2017, 5:28:55 PM
to Anvi'o
Greetings, Anvians!

As I understand it, the mean coverage shown in the interactive and refine displays is based only on the particular split/contig and sample for which it is calculated: it is the arithmetic mean of that sample's coverage across all nucleotides in the contig.
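
A minimal sketch of that definition, assuming the per-nucleotide coverage of one contig in one sample is available as a plain list of integers. The function name and the coverage values are hypothetical illustrations, not anvi'o code or output:

```python
def mean_coverage(coverage):
    """Arithmetic mean of per-nucleotide coverage for one contig in one sample."""
    if not coverage:
        raise ValueError("coverage must contain at least one position")
    return sum(coverage) / len(coverage)


# Hypothetical per-nucleotide coverage of a 10 bp contig in a single sample.
coverage = [4, 5, 5, 6, 5, 4, 6, 5, 5, 5]
print(mean_coverage(coverage))  # 5.0
```

Only this sample's values enter the calculation; no other sample is consulted.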

And then from Mike Lee's post (http://merenlab.org/2017/05/08/anvio-views/), the Q2Q3 coverage of a contig in a sample sounds like it is *also* based only on that contig and sample; it's just a trimmed mean where the 25% of nucleotide positions in the contig that have the highest coverage, and the 25% of nucleotides with the lowest coverage, are excluded when calculating the mean.
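
Assuming the definition in that post, Q2Q3 can be sketched as a 25% trimmed mean computed from one contig's coverage in one sample only. This is an illustration of the idea, not anvi'o's own implementation, and the exact rounding of the quartile boundaries in anvi'o may differ:

```python
def mean_coverage_q2q3(coverage):
    """Mean of per-nucleotide coverage after discarding the lowest and the
    highest quarter of positions (i.e. keeping the 2nd and 3rd quartiles)."""
    n = len(coverage)
    if n == 0:
        raise ValueError("coverage must contain at least one position")
    ordered = sorted(coverage)
    q = n // 4                  # number of positions dropped from each end
    middle = ordered[q:n - q]   # never empty for n >= 1
    return sum(middle) / len(middle)


# Hypothetical 8 bp contig: one position with an outlying 100x coverage.
coverage = [1, 2, 3, 4, 5, 6, 7, 100]
print(sum(coverage) / len(coverage))  # 16.0 (untrimmed mean)
print(mean_coverage_q2q3(coverage))   # 4.5 (mean of 3, 4, 5, 6)
```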

So I'm confused by these two screenshots: I'm using anvi-refine on a particular bin obtained via CONCOCT based on clustering across about a hundred samples (which are in temporal order from the top of the screen toward the bottom).  Obviously, the display of mean coverage shows that these contigs were binned together by CONCOCT because they all attain very high coverage in 6 sequential samples towards the end of the series (and one other sample towards the middle)...otherwise, the contigs in this bin are mostly low coverage.

But if the Q2Q3 coverage is only a trimmed mean of coverage within the contig and sample, why does the pattern look so dramatically different in the second screenshot? I suppose it's mathematically possible that for essentially all the 1,220 contigs in this bin, in the 6 or 7 samples where the untrimmed mean coverage is high, it's only a subset of the 25% highest-coverage nucleotides that drive the high coverage...and the trimmed mean coverage shows instead that these 6 or 7 samples aren't so distinct from all other samples.
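
To make that scenario concrete, here is a hypothetical contig in one sample where only 10% of positions carry very high coverage. Because those positions all fall in the top quarter, a per-sample trimmed mean discards them entirely, while the untrimmed mean is inflated roughly tenfold. The numbers are invented for illustration:

```python
def mean_coverage(coverage):
    """Untrimmed arithmetic mean of per-nucleotide coverage."""
    return sum(coverage) / len(coverage)


def mean_coverage_q2q3(coverage):
    """25% trimmed mean: drop the lowest and highest quarter of positions."""
    ordered = sorted(coverage)
    q = len(ordered) // 4
    middle = ordered[q:len(ordered) - q]
    return sum(middle) / len(middle)


# Hypothetical 1,000 bp contig in one sample: 900 positions at 5x and a
# 100-position "skyscraper" at 500x (10% of the contig).
coverage = [5] * 900 + [500] * 100

print(mean_coverage(coverage))       # 54.5
print(mean_coverage_q2q3(coverage))  # 5.0
```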

That doesn't seem likely, though...instead, it seems to me that the Q2Q3 picture results from some cross-sample comparisons, e.g. for a given contig, the trimmed mean is obtained from coverage by nucleotide position across ALL samples, and most nucleotide positions in most contigs in the half dozen high-abundance samples are among the 25% highest coverage values, and hence are dropped. 
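
The two interpretations can be contrasted in a small sketch. `per_sample_q2q3` trims within each sample independently (the definition from Mike Lee's post), while `pooled_cutoffs` is a hypothetical cross-sample variant, written only to illustrate the alternative hypothesis above, that derives the Q2/Q3 boundaries from all samples at once. Under the pooled variant, every position of the single high-coverage sample falls outside the pooled cutoffs and would be discarded. All names and data here are invented:

```python
def q2q3(values):
    """25% trimmed mean of a list of coverage values."""
    ordered = sorted(values)
    q = len(ordered) // 4
    middle = ordered[q:len(ordered) - q]
    return sum(middle) / len(middle)


def per_sample_q2q3(coverage_by_sample):
    """Trim within each sample independently."""
    return {s: q2q3(cov) for s, cov in coverage_by_sample.items()}


def pooled_cutoffs(coverage_by_sample):
    """HYPOTHETICAL: lower/upper trimming boundaries taken from the
    coverage values of all samples pooled together."""
    pooled = sorted(v for cov in coverage_by_sample.values() for v in cov)
    q = len(pooled) // 4
    return pooled[q], pooled[len(pooled) - q - 1]


def retained_fraction(coverage_by_sample):
    """Fraction of each sample's positions kept under the pooled cutoffs."""
    lo, hi = pooled_cutoffs(coverage_by_sample)
    return {s: sum(lo <= v <= hi for v in cov) / len(cov)
            for s, cov in coverage_by_sample.items()}


# Invented data: four low-coverage samples and one "bloom" sample, 4 bp each.
coverage_by_sample = {f"s{i}": [5, 5, 5, 5] for i in range(1, 5)}
coverage_by_sample["bloom"] = [200, 200, 200, 200]

print(per_sample_q2q3(coverage_by_sample))    # bloom keeps 200.0
print(retained_fraction(coverage_by_sample))  # bloom keeps 0.0 of its positions
```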

Am I confused here, either about what Q2Q3 coverage is supposed to do or what might give rise to this kind of pattern?  Or is the formula for Q2Q3 not doing what you think it's doing?

Thanks!

Les

MeanCoverage.tiff
MeanCoverageQ2Q3.tiff

A. Murat Eren

Aug 28, 2017, 6:27:06 PM
to Anvi'o
Hey Les,

You are correct about the mean coverage and mean coverage Q2Q3, and I agree that the two screenshots you sent look very different. 

This could be due to differences in normalization (from the settings panel), but I would not bet on it since I'm sure you checked for that.

It is possible that this is due to very, very abundant 'non-specific mapping' from those 6 metagenomes to these contigs (imagine a population that is abundant only on those days, that did not get assembled, but that is very closely related to this assembled set of contigs: Q2Q3 removes most of the skyscraper coverage caused by non-specific mapping, and you see the true low abundance). But I see at least four genome bins in this cluster, which makes this scenario quite unlikely.
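
As a rough screen for picking contigs worth inspecting, one could compare the two statistics sample by sample: where non-specific mapping creates skyscrapers, the untrimmed mean should be several times the Q2Q3 value, while genuinely uniform coverage keeps the two close. The helper below is hypothetical, not an anvi'o command, and the 2.0 ratio cutoff and the data are arbitrary illustrations:

```python
def mean_coverage(coverage):
    """Untrimmed arithmetic mean of per-nucleotide coverage."""
    return sum(coverage) / len(coverage)


def q2q3(coverage):
    """25% trimmed mean: drop the lowest and highest quarter of positions."""
    ordered = sorted(coverage)
    q = len(ordered) // 4
    middle = ordered[q:len(ordered) - q]
    return sum(middle) / len(middle)


def skyscraper_samples(coverage_by_sample, ratio_threshold=2.0):
    """HYPOTHETICAL heuristic: flag samples whose untrimmed mean exceeds the
    Q2Q3 mean by more than `ratio_threshold` (an arbitrary cutoff)."""
    flagged = {}
    for sample, cov in coverage_by_sample.items():
        m, t = mean_coverage(cov), q2q3(cov)
        if t == 0:
            ratio = float("inf") if m > 0 else 1.0
        else:
            ratio = m / t
        if ratio > ratio_threshold:
            flagged[sample] = round(ratio, 2)
    return flagged


# Invented 100 bp contig in three samples.
data = {
    "even": [10] * 100,                     # uniform low coverage
    "bloom": [200] * 100,                   # uniformly high: a real bloom
    "nonspecific": [3] * 90 + [400] * 10,   # skyscrapers on 10% of positions
}
print(skyscraper_samples(data))  # {'nonspecific': 14.23}
```

Note that a uniformly high sample is not flagged: the heuristic only separates skyscraper-driven means from evenly elevated coverage.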

Did you inspect some of those contigs to get a feeling for their actual nucleotide-level coverage? What do they look like? That will be the most accurate way to say something. I am sorry in advance that it will be painfully slow to collect coverage data from that many samples for large contigs to generate the inspection page :( But it will be worth the wait.

Apart from these questions, I have a suggestion: if you use the 'push' button to send your view for this bin to anvi-server (for both views separately) and share private URLs to that data from the home -> share screen on anvi-server, it would be much easier to comment on these. The screenshots do not contain much of the key information needed to offer accurate feedback.


Best wishes,

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
