Why is multivariate variance significant across group of samples that overlap completely in PCoA?

Andrea Campisano

May 4, 2015, 5:40:18 PM
to qiime...@googlegroups.com
Hello all.

I do not know if this is even the right forum; I encountered this issue using QIIME, but it is more of a general question about the analyses.

I have a set of about 70 samples that I am analysing using 16S amplicons and QIIME 1.8. Everything works fine, but it seems that most of the variables I tried to assess do not cause the samples to group. Instead, samples form largely overlapping groups in the 3D space onto which the PCoA is projected.

Also, when looking at the group_significance_parameter.txt file produced by the workflow script core_diversity_analyses.py, most of my sampling variables fail to produce statistically significant differences in taxon frequencies.
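
For context, I run the workflow with a command along these lines (file names, rarefaction depth, and category here are placeholders, not my actual values):

core_diversity_analyses.py -i otu_table.biom -o cdout/ -m map.txt -t rep_set.tre -e 10000 -c Treatment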

But when I run any of the multivariate analysis tools (compare_categories.py), I always get very high levels of statistical significance (p = 0.001); even for treatments that appear to have no effect at all according to all the previously described analyses, the highest p-value I see is 0.01.
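
For reference, I call compare_categories.py along these lines (again, file names, category, and permutation count are just placeholders):

compare_categories.py --method permanova -i bdiv/unweighted_unifrac_dm.txt -m map.txt -c Treatment -o permanova_Treatment/ -n 999
compare_categories.py --method adonis -i bdiv/unweighted_unifrac_dm.txt -m map.txt -c Treatment -o adonis_Treatment/ -n 999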

In terms of discussing the data this is puzzling. I am used to analysing multivariate datasets from non-sequencing sources, and there I usually have a hard time getting multivariate statistical significance from my variable groupings, even when a vague variable-related grouping can be seen in the PCA ordination. Here I see nothing in the PCoA graphs (even when changing the visible axes), yet all the analyses (PERMANOVA, MRPP, adonis, all of them) tell me that the groups defined by every variable are different.

Should I believe these analyses and discuss my data accordingly, or should I believe what I see in the PCoA and the other visualisation methods (such as averaged histograms) and in the group_significance output files?

Thanks a lot for reading this; I would greatly appreciate any comments.

Andrea

Andrea Campisano

May 5, 2015, 10:09:29 AM
to qiime...@googlegroups.com
Hello
I do not know whether what I wrote has enough information for someone to answer, or if I perhaps posed a stupid question. If so, please let me know; I am not touchy.

I also wonder about another issue that I do not easily understand (one of many, but one has to start somewhere) ^__^

When I use the core_diversity_analyses.py workflow multiple times on the same BIOM file, changing only the -c parameter (that is, the variable to use from the mapping file), I get different distance matrices (the unweighted_unifrac_dm.txt file) every time. Is this to be expected? I would guess that the distances between samples would not change according to which variable is taken into account, or do they? The changes are not huge, but they are definitely there.
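
To check, I was thinking of computing the distance matrix once, outside the workflow, and comparing it between runs; roughly something like this (the rarefaction depth and file names are only placeholders):

single_rarefaction.py -i otu_table.biom -o otu_table_even10000.biom -d 10000
beta_diversity.py -i otu_table_even10000.biom -m unweighted_unifrac -t rep_set.tre -o bdiv/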

Thank you again in advance
Andrea

Luke Thompson

May 5, 2015, 7:46:06 PM
to qiime...@googlegroups.com
Hi Andrea,

I'm in charge of the QIIME forum this week and wanted to get back to you. I think I understand your question, but unfortunately I don't have a good answer for you. Let me consult with some other current/former lab members who are more versed in the statistics behind group_significance.py and core_diversity_analyses.py. Then I'll get back to you.

Best,
Luke

Will Van Treuren

May 6, 2015, 1:25:34 PM
to qiime...@googlegroups.com
Hi Andrea, 

Your question is a good one but hard to answer because of the number of factors involved. Let me rephrase your question to make sure I understand and am answering it correctly.

You are comparing your samples with a variety of techniques including PCoA, PCA, differential abundance testing (group_significance.py), and distance matrix comparisons like permanova (compare_categories.py). You are concerned because some of the techniques show that there is significant grouping but others show no significant grouping. 

As my primary answer, I would say that the results you are getting are not unexpected; different techniques have different abilities to find patterns in the data. For instance, group_significance.py is telling you that there aren't any single-feature differences between sample groups. However, it's only testing a single feature at a time. You can imagine that there exist combinations of features that differ between your sample groups, which is what you are testing with the distance-matrix-based methods (PCoA, PCA, PERMANOVA, etc.). In these cases, the combined differences across several features might serve to differentiate groups even when no single feature does.

With that said, it's important not to rely on just a single method and report only those results. The fact that you are getting significant comparisons with only a single type of analysis suggests that whatever patterns are in your data may not be robust. compare_categories.py is known to produce quite significant p-values even for very small effect sizes. You might take a look at this poster for some information on that question.

One other thought: I would suggest running each analysis independently instead of using core_diversity_analyses.py. You want to be able to change the parameters of each so you can get more information. In the case of group_significance.py, for instance, you might want to filter out more features before you run the test so that your multiple-hypothesis correction isn't so severe.
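
A rough sketch of what I mean (file names and thresholds are just examples; adjust them for your data):

filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_filtered.biom -n 50 -s 10
group_significance.py -i otu_table_filtered.biom -m map.txt -c Treatment -s kruskal_wallis -o kruskal_wallis_Treatment.txt

The -n/-s options in the filtering step drop OTUs with very low total counts or that occur in very few samples, so fewer tests are run and the multiple-comparisons correction is less punishing.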

Hope this helps,
Will  

Andrea Campisano

May 7, 2015, 4:15:50 AM
to qiime...@googlegroups.com
Dear Will,
thank you very much for your answer. I realise it is a tricky question and you do not have many details to go on. In fact, I had also decided to break down the workflow scripts to try to see whether some parameter tweak is causing the issue.

In practice, one of the "metadata" variables, which was there mostly as an internal control, turned out to be quite significant (around p = 0.005) when multivariate tests such as PERMANOVA, ANOSIM, adonis, etc. are performed.

I now realise that I have not checked the data for multivariate normality (which I think cannot be done in QIIME, right?), and the fact that the data may not have a multivariate normal distribution may be causing the unexpectedly low p-values. Do you think this is possible? Is there an implementation of multivariate normality tests in QIIME?

If this is true, it would also explain why the group_significance.py tests (which I understand are nonparametric) all give negative results.

Best
Andrea

Andrea Campisano

May 7, 2015, 5:03:54 AM
to qiime...@googlegroups.com
It occurs to me now that PERMANOVA does not require normality but rather homoscedasticity... but my previous post also applies to testing that: is it possible without knowledge of R?
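
If I understand correctly, something like PERMDISP might test the homogeneity of group dispersions; I am assuming (not sure) that compare_categories.py would be invoked roughly like this, with placeholder file names and category:

compare_categories.py --method permdisp -i bdiv/unweighted_unifrac_dm.txt -m map.txt -c Treatment -o permdisp_Treatment/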

:D