Greetings, Anvians!
As I understand it, the mean coverage presented in the interactive or refine displays is based only on the particular split/contig and sample for which it's calculated: it's the arithmetic mean of the coverage in that sample across all nucleotides in the contig.
And then from Mike Lee's post (http://merenlab.org/2017/05/08/anvio-views/), the Q2Q3 coverage of a contig in a sample sounds like it is *also* based only on that contig and sample; it's just a trimmed mean where the 25% of nucleotide positions in the contig that have the highest coverage, and the 25% of nucleotides with the lowest coverage, are excluded when calculating the mean.
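For concreteness, here is a minimal Python sketch of the two statistics as I understand them from the descriptions above. The function names and the toy coverage values are made up for illustration; this is not taken from the anvi'o code, and the way the quartile boundary is rounded is an assumption.

```python
def mean_coverage(cov):
    """Arithmetic mean of per-nucleotide coverage for one contig in one sample."""
    return sum(cov) / len(cov)


def q2q3_coverage(cov):
    """Trimmed mean: drop the lowest 25% and highest 25% of positions, average the rest.

    Assumption: the number of positions trimmed from each end is len(cov) // 4.
    """
    ordered = sorted(cov)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)


# Hypothetical contig of 8 positions where two positions have very high coverage.
cov = [2, 3, 3, 4, 4, 5, 200, 220]

plain = mean_coverage(cov)    # pulled up by the two extreme positions
trimmed = q2q3_coverage(cov)  # averages only the middle four positions
```

With these toy numbers the plain mean is 55.125 while the Q2Q3 value is 4.0, so the two can diverge sharply, but only when the high coverage is confined to a minority of positions in the contig.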
So I'm confused by these two screenshots: I'm using anvi-refine on a particular bin obtained via CONCOCT based on clustering across about a hundred samples (which are in temporal order from the top of the screen toward the bottom). Obviously, the display of mean coverage shows that these contigs were binned together by CONCOCT because they all attain very high coverage in 6 sequential samples towards the end of the series (and one other sample towards the middle)...otherwise, the contigs in this bin are mostly low coverage.
But if the Q2Q3 coverage is only a trimmed mean of coverage within the contig and sample, why does the pattern look so dramatically different in the second screenshot? I suppose it's mathematically possible that for essentially all the 1,220 contigs in this bin, in the 6 or 7 samples where the untrimmed mean coverage is high, it's only a subset of the 25% highest-coverage nucleotides that drive the high coverage...and the trimmed mean coverage shows instead that these 6 or 7 samples aren't so distinct from all other samples.
That doesn't seem likely, though...instead, it seems to me that the Q2Q3 picture results from some cross-sample comparisons, e.g. for a given contig, the trimmed mean is obtained from coverage by nucleotide position across ALL samples, and most nucleotide positions in most contigs in the half dozen high-abundance samples are among the 25% highest coverage values, and hence are dropped.
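To illustrate the difference between the two possibilities, here is a toy Python sketch that compares per-sample trimming with a purely hypothetical "pooled" trimming, where the quartile cutoffs are computed from all positions across all samples. The function `pooled_q2q3`, the choice to report 0.0 when no positions survive, and the coverage values are all my own assumptions for illustration; I am not claiming this is what anvi'o does.

```python
def q2q3(values):
    """Per-sample trimmed mean: drop the lowest and highest quarter of positions."""
    ordered = sorted(values)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)


def pooled_q2q3(samples):
    """Hypothetical cross-sample variant: cutoffs come from the pooled values.

    For each sample, only positions whose coverage lies within the pooled
    interquartile cutoffs are averaged; 0.0 is reported if none remain
    (an arbitrary choice for this sketch).
    """
    pooled = sorted(v for cov in samples for v in cov)
    k = len(pooled) // 4
    lo, hi = pooled[k], pooled[len(pooled) - k - 1]
    result = []
    for cov in samples:
        kept = [v for v in cov if lo <= v <= hi]
        result.append(sum(kept) / len(kept) if kept else 0.0)
    return result


# One contig of 4 positions in 4 samples; only the last sample is high coverage.
samples = [
    [1, 2, 2, 3],
    [2, 2, 3, 3],
    [1, 1, 2, 2],
    [100, 110, 120, 130],
]

per_sample = [q2q3(cov) for cov in samples]
pooled = pooled_q2q3(samples)
```

In this toy case the high-coverage sample keeps a Q2Q3 value of 115.0 under per-sample trimming but falls to 0.0 under pooled trimming, because every one of its positions lies above the pooled upper cutoff. That is the kind of flattening I see in the second screenshot.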
Am I confused here, either about what Q2Q3 coverage is supposed to do or what might give rise to this kind of pattern? Or is the formula for Q2Q3 not doing what you think it's doing?
Thanks!
Les