Samples with no otu hits are darkest in otu heat map

Luke McKay

Apr 5, 2016, 12:02:11 PM
to Qiime 1 Forum
Hi!
I'm having a problem with the coloring pattern of my OTU heat map.  When using make_otu_heatmap.py, higher relative abundance corresponds to more intense colors and lower relative abundance to lighter colors.  However, when relative abundance is zero, Qiime apparently assigns the DARKEST possible color from the chosen color spectrum by default.  In other words, with the default spectrum (YlGn) it looks like this:

No hits (darkest green), lots of hits (dark green), few hits (light yellow).  It would make a lot more sense if the OTUs not present in a sample were colored white or very light yellow.  Is there a way to make this happen?

I've attached a heat map I made to illustrate this problem.  In the last sample listed, "WS0a", the darkest color purple occurs for the otu represented by k__Archaea;p__Crenarchaeota;c__Thaumarchaeota;o__Nitrosocaldales;f__Nitrosocaldaceae;g__Candidatus Nitrosocaldus.  However, as can be seen by the pie chart for this sample (also attached), this taxon does not occur in this sample.  From the pie chart it is clear that the taxon that dominates this sample is k__Archaea;p__Crenarchaeota;c__MCG;o__pGrfC26;f__;g__, and this taxon is one of the darkest in this sample in the heat map, but not THE darkest.  The lightest colored otu in this sample corresponds to k__Archaea;p__Crenarchaeota;c__MCG;o__B10;f__;g__, which is correct because there are hardly any hits for that otu in sample WS0a. 

I should mention that the otu table I'm using has been filtered to a minimum count fraction of 1% for 11 different samples--maybe that has something to do with it?

While I'm at it, I'll bring up an only slightly related issue: the pdf file names generated for taxonomy charts are quite difficult to decipher.  Is there a way to name them according to the sample name?  As it stands, it takes forever to find the right legend for each chart.  I have to open the html file at the same time and then open each pdf legend file sequentially until I figure out which one is correct.

Thanks for the help and sorry if these have already been answered.
-Luke
HeatArc_NoEnrich_Greater1.pdf
jCZ8X0ksNP3Kt8Dj5zGoZJiql0pX4r.pdf
OkKZh18fi8Sr0hUTbmg9hJSjAY1OAW_legend.pdf

jonsan

Apr 5, 2016, 12:33:32 PM
to Qiime 1 Forum
Hi Luke, 

Great questions. 

I'm going to check with some of the devs on this, but I suspect that the discontinuous behavior is due to the log transformation of the data, where 0 counts end up as -inf. I agree that this is nonintuitive behavior. According to the documentation, zeroes should get a pseudocount to prevent this, but I can't find that actually happening in the code. For now, you could try passing the --no_log_transform option and see if it helps.

For the taxonomy charts, my suggestion is to open the .html file and follow the links above each image to save the desired file directly.


Cheers,
-jon
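
To make the suspected cause concrete, here is a minimal sketch of what a log transform does to a zero abundance. It is not QIIME's code: the abundance values and the helper `log10_or_neg_inf` are hypothetical, and the helper simply mimics the convention that the log of zero is -inf.

```python
import math

def log10_or_neg_inf(value):
    """Return log10(value), using -inf for zero (the usual array-library convention)."""
    return math.log10(value) if value > 0 else float("-inf")

# Hypothetical relative abundances for one sample; the last OTU is absent.
abundances = [0.60, 0.30, 0.10, 0.0]

logged = [log10_or_neg_inf(a) for a in abundances]
print(logged)

# The absent OTU is no longer a finite number after the transform, so where it
# lands on a color scale depends on how the plotting code treats such values.
non_finite = [not math.isfinite(x) for x in logged]
print(non_finite)
```

The three present OTUs keep their ordering after the transform, while the absent one becomes a non-finite value that a color scale has no natural place for. That would be consistent with zeros being colored unexpectedly.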

Luke McKay

Apr 5, 2016, 12:43:14 PM
to Qiime 1 Forum
hi Jon,
Thanks for the quick response.  I tried passing --no_log_transform previously and you are right that it does fix the coloring issue.  However, I'm not really sure whether I want my data log transformed for this analysis or not.  Do you have any info on the methodological justification for doing the log transform?  I.e., why is this the default?  Is it simply used to spread the numbers further apart to give a larger relative color spread in the heatmap?

jonsan

Apr 5, 2016, 12:55:45 PM
to Qiime 1 Forum
Hi Luke,

Off the top of my head I don't have a specific reference for using log transformations for perceptual color mapping. However, most microbial abundances in these types of datasets tend to be log or log-normally distributed, so linear-mapped coloring can look very sparse, obscuring trends, particularly towards the lower end of the scale.

-j

Luke McKay

Apr 5, 2016, 1:07:10 PM
to Qiime 1 Forum
Cool, that's what I suspected.  Attached is the new heatmap after passing --no_log_transform.  As you can see, it doesn't have the previous color problem, but it is difficult to distinguish low abundance colors from no abundance colors.  My two cents: it would be great, and arguably a more accurate depiction of the data, if OTUs with 0 hits in a sample were colored completely white by default and the color spectrum was only used to color OTUs with hits in a sample, whether high or low in abundance.  There are no coloring options in matplotlib where no abundance would equal white, but rather only options for no abundance to be a very light shade of a color.

Thanks again for looking into this.
HeatArc_NoEnrich_Greater1_NoLogTran.pdf

Colin Brislawn

Apr 5, 2016, 2:03:05 PM
to Qiime 1 Forum
Hello Luke,

I'm glad you got --no_log_transform working well. 

"it is difficult to distinguish low abundance colors from no abundance colors"
That's because low abundance OTUs are really similar to no abundance OTUs. :-)

I guess I like the color scheme because, with a few thousand more or fewer reads, those bottom OTUs could be added to or dropped from the heat map. Because of this, I like that this theme sweeps uncommon OTUs into the 'hardly there' category using visually similar colors.

If you want this feature, you could try using a manual color scheme. I'm not totally sure what matplotlib supports... 


Keep in touch,
Colin
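
One way to picture a manual color scheme is a lookup that reserves white for exact zeros and uses a light-to-dark ramp only for OTUs that are present. The sketch below is plain Python, not matplotlib and not part of make_otu_heatmap.py. The function names and RGB endpoints are hypothetical choices for illustration. For what it's worth, matplotlib colormaps do have a `set_bad` method for coloring masked values, but this thread does not establish whether the QIIME script exposes it.

```python
WHITE = (255, 255, 255)
LIGHT = (255, 255, 204)  # hypothetical light-yellow end of the ramp
DARK = (0, 104, 55)      # hypothetical dark-green end of the ramp

def lerp(start, end, t):
    """Linear interpolation between start and end for t in [0, 1]."""
    return start + (end - start) * t

def color_for(value, vmin, vmax):
    """Return an RGB tuple: white for zero, otherwise a light-to-dark ramp."""
    if value == 0:
        return WHITE
    if vmax == vmin:
        t = 1.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    return tuple(round(lerp(lo, hi, t)) for lo, hi in zip(LIGHT, DARK))

# Hypothetical relative abundances: absent, at a 1% floor, and dominant.
for abundance in (0.0, 0.01, 1.0):
    print(abundance, color_for(abundance, vmin=0.01, vmax=1.0))
```

With this mapping an OTU at the 1% floor gets the lightest ramp color while an absent OTU gets white, so presence and absence stay visually distinct.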

jonsan

Apr 5, 2016, 2:25:24 PM
to Qiime 1 Forum
As an interim solution, I wrote a very very simple script to add pseudocounts to a biom file. You can find it here. This should enable the expected behavior of the log-transformed heatmap. 

-j

Luke McKay

Apr 5, 2016, 2:35:40 PM
to Qiime 1 Forum
  "That's because low abundance OTUs are really similar to no abundance OTUs. :-)"

I disagree.  Presence is very different from absence, whether it's high or low presence.  A low abundance OTU can very possibly be a significant contributor to an ecosystem whereas a no abundance OTU is very likely not in that ecosystem.  This is especially true once you consider extraction/amplification/sequencing biases and varying 16S gene copy numbers for distinct microbes, and how each of these can skew interpretation of actual microbial numbers in a sample.  I also filtered my otu table at a minimum count fraction of 1%, so in my case the heatmap is forcing 1% abundance to be side by side (in color) with 0% abundance.  

Thanks for the input, though, I will try the manual color scheme.

Luke McKay

Apr 5, 2016, 2:36:09 PM
to Qiime 1 Forum
Thanks, Jon, I will try this.

Luke McKay

Apr 5, 2016, 4:35:32 PM
to Qiime 1 Forum
Jon, I tried your pseudocount script and it took care of the original problem (heatmap attached).  Thanks very much for that.
heatmappseudo.pdf

jonsan

Apr 5, 2016, 4:57:18 PM
to Qiime 1 Forum
Awesome, glad it's useful. You may want to adjust the pseudocount multiplier if you want to see a bigger distinction between things with a small count and things with no count (like 0.1 to get a whole log scale instead of half a log scale). And do keep in mind that I threw this together in ten minutes so it's not tested. :)

Cheers, 
-jon
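
The effect of the multiplier can be sketched as follows. This is not the script linked above. It assumes, purely for illustration, that zeros are replaced by the multiplier times the smallest nonzero count; the function name `add_pseudocount` and the counts are hypothetical.

```python
import math

def add_pseudocount(counts, multiplier=0.5):
    """Replace zeros with multiplier * (smallest nonzero count)."""
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        raise ValueError("all counts are zero")
    pseudo = multiplier * min(nonzero)
    return [c if c > 0 else pseudo for c in counts]

counts = [120, 10, 1, 0]  # hypothetical counts; the last OTU has no hits
smallest_real = min(c for c in counts if c > 0)

for multiplier in (0.5, 0.1):
    filled = add_pseudocount(counts, multiplier)
    # Gap, in log10 units, between the smallest real count and the filled zero.
    gap = math.log10(smallest_real) - math.log10(filled[-1])
    print(multiplier, filled[-1], round(gap, 2))
```

Under that assumption, a multiplier of 0.1 places the filled zero one full log10 unit below the smallest observed count, while 0.5 leaves a smaller gap, so low counts and no counts end up closer in color.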

Luke McKay

Apr 15, 2016, 1:41:43 PM
to Qiime 1 Forum
Just wanted to add an update on the pseudocount...

It took me a while to realize this, but when using a pseudocount the relative colors for each OTU in the heatmap are misleading.  The low pseudocount value is colored differently for different OTUs (see attached), which makes it look like certain "no hit OTUs" are more abundant in some samples than other "no hit OTUs" are, when in reality there are no hits for either OTU in the sample.  In other words, the pseudocount of 0.5 (which represents no hits) is not colored consistently across all samples as it would be if it were a 0.  Unfortunately, without the pseudocount we revert to the original problem of no hits being colored darkest.

However, if it works with your data, you can pass --absolute_abundance when making an OTU heatmap and the problem is solved.  I think in my case I'll normalize my OTU table with CSS (normalize_table.py), then add a pseudocount, and then pass --absolute_abundance (also attached).

Thanks again for the help, just adding this reply bc sharing is caring 
heatARC_NS_Each1pct_NoPhyloSort_Unweight.pdf
heatARC_NS_Each1pct_CSS_NoPhyloSort_Unweight_Abso.pdf

jonsan

Apr 15, 2016, 2:24:08 PM
to Qiime 1 Forum
Thanks for the catch -- it looks like the pseudocount is materially affecting the total counts for each sample so that the renormalization messes it up. 
As you note, it should work to add the pseudocount to an already-proportional OTU table and then pass it to the heatmap visualization as an absolute-abundance table.

Cheers,
-jon
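
A small worked example of the order-of-operations issue described above, with hypothetical counts. Simple proportions stand in for the normalization step here (the thread uses CSS via normalize_table.py), and the helper names and the fill value are assumptions for illustration only.

```python
def fill_zeros(values, fill):
    """Replace exact zeros with a fixed fill value."""
    return [v if v > 0 else fill for v in values]

def to_relative(values):
    """Scale values so that they sum to 1 within a sample."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical raw counts; the last OTU has no hits in either sample.
deep_sample = [9000, 1000, 0]
shallow_sample = [90, 10, 0]

# Order A: add a 0.5 pseudocount to raw counts, then convert to relative
# abundance (modeled here by to_relative).
deep_a = to_relative(fill_zeros(deep_sample, 0.5))
shallow_a = to_relative(fill_zeros(shallow_sample, 0.5))
print(deep_a[-1], shallow_a[-1])  # the two "no hit" cells now differ

# Order B: convert to proportions first, fill zeros with one fixed value,
# and skip any further renormalization.
FILL = 1e-4  # hypothetical fill value
deep_b = fill_zeros(to_relative(deep_sample), FILL)
shallow_b = fill_zeros(to_relative(shallow_sample), FILL)
print(deep_b[-1], shallow_b[-1])  # the two "no hit" cells are identical
```

In order A the same pseudocount is divided by different sample totals, so "no hit" cells get different values and therefore different colors. In order B every "no hit" cell carries the same value, which is the behavior --absolute_abundance is meant to preserve.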
