adonis vs PERMANOVA

Sasha M

Nov 27, 2012, 5:09:35 AM
to qiime...@googlegroups.com
Hi all,
Thanks for providing such a comprehensive resource.
In reading about various methods of evaluating community clustering in the documentation for vegan, I got the impression that Adonis and PERMANOVA were different names for the same analysis. But in compare_categories.py, Adonis and PERMANOVA are given as two different options. Could someone explain what the difference is?
Thank you!
Sasha

Jai Ram Rideout

Nov 27, 2012, 10:16:20 AM
to qiime...@googlegroups.com
Hi Sasha,

You're right: they are essentially the same analysis, though Adonis is more flexible than PERMANOVA because it can handle numeric variables (i.e. numeric mapping file columns) in addition to categorical variables. I've also found that Adonis results are easier to interpret because an R^2 value is given as part of the output, whereas with PERMANOVA you only get a pseudo-F statistic. Adonis can also handle multiple variables, though we currently do not support this in QIIME.
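For a single categorical variable, the pseudo-F and the R^2 both come from the same sums of squares. The following is a minimal sketch of the one-way test described by Anderson (2001), not QIIME's or vegan's code; it assumes the distance matrix is a plain list of lists whose rows line up with the group labels, and the permutation count and seed are arbitrary choices for illustration.

```python
import random
from itertools import combinations


def permanova_one_way(dm, labels):
    """One-way PERMANOVA statistics from a square distance matrix.

    dm: list of lists, dm[i][j] is the distance between samples i and j.
    labels: group label for each sample, in the same order as dm.
    Returns (pseudo_F, R2).
    """
    n = len(labels)
    levels = sorted(set(labels))
    a = len(levels)
    if not 1 < a < n:
        raise ValueError("need at least 2 groups and more samples than groups")
    # Total sum of squares: squared distances over all pairs, divided by N
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed per group
    ss_within = 0.0
    for level in levels:
        idx = [i for i, lab in enumerate(labels) if lab == level]
        ss_within += sum(dm[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    pseudo_f = (ss_among / (a - 1)) / (ss_within / (n - a))
    r2 = ss_among / ss_total  # proportion of total sum of squares explained
    return pseudo_f, r2


def permanova_p_value(dm, labels, permutations=999, seed=0):
    """Permutation p-value: how often shuffled labels give pseudo-F >= observed."""
    rng = random.Random(seed)
    f_obs, _ = permanova_one_way(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        f_perm, _ = permanova_one_way(dm, shuffled)
        if f_perm >= f_obs:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Toy example: four points on a line, two well-separated groups
points = [0.0, 1.0, 10.0, 11.0]
dm = [[abs(x - y) for y in points] for x in points]
labels = ["A", "A", "B", "B"]
print(permanova_one_way(dm, labels), permanova_p_value(dm, labels))
```

With only four samples the permutation p-value cannot get small, which is a reminder that p-values from these tests depend heavily on sample size.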

We added both methods to QIIME because we are in the process of evaluating them (and several others) to see which ones perform the best, and we wanted to have easy access to these methods using QIIME-formatted files.

Hope this helps,
Jai


Martin Kostovcik

Sep 24, 2013, 10:13:28 AM
to qiime...@googlegroups.com
Hi Jai,
I'm concerned about the usefulness of the statistical methods available in compare_categories.py. First, the introduction to the compare_categories tutorial says ANOSIM is prone to low specificity, but I couldn't find a relevant citation for this, so if someone could point me to one it would be much appreciated. This relates to exactly what I'm getting: significant p-values for all categories, including ones that don't seem to affect clustering at all in the ordination plot. Second, the ADONIS and PERMANOVA results do seem difficult to interpret. For example, the category that is apparently responsible for the clustering (and that got the highest R value in ANOSIM) has a lower proportion of variance explained, i.e. a lower R2 in adonis, than other categories that hardly influence clustering. And PERMANOVA, as you said, only reports a hard-to-interpret pseudo-F value, although at least it is highest for the right category. Any hints on these, or is your paper out already?
Thanks
Martin

Kyle Bittinger

Sep 24, 2013, 12:15:36 PM
to qiime...@googlegroups.com
Martin,

If you have the chance to read MJ Anderson's paper on PERMANOVA ["A new method..." Austral Ecology 26, 32 (2001)], I cannot recommend it highly enough.  It is very well written and contains a couple of nice examples.  She does not use the name PERMANOVA, but instead calls her method "non-parametric MANOVA."

Anderson discusses the ANOSIM approach in her paper, because that was the established method prior to PERMANOVA.  On page 37, she points out that ANOSIM is sensitive to differences in group dispersion.  PERMANOVA is designed to be less sensitive to differences in dispersion and more sensitive to differences in "location."  A difference in location is to be understood using the analogy introduced in the paper, but it roughly corresponds to a difference in group centroid, looking at the PCoA plot.

Another difference between ANOSIM and PERMANOVA is that the ANOSIM statistic is based on ranks.  Depending on your perspective, this may or may not be a benefit to the method.  In any case it could account for some of the apparent discrepancy between the test results and the PCoA plot.  The R statistic of ANOSIM and the R-squared value returned by the adonis function are completely unrelated.
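To illustrate the rank point, here is a minimal sketch of the standard ANOSIM statistic, R = (mean between-group rank - mean within-group rank) / (M / 2), where M = N(N - 1) / 2 is the number of sample pairs and tied distances get averaged ranks. It is not QIIME's implementation; the list-of-lists distance matrix and the helper names are assumptions for illustration.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..M, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def anosim_r(dm, labels):
    """ANOSIM R from a square distance matrix and per-sample group labels."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dm[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


points = [0.0, 1.0, 10.0, 11.0]
labels = ["A", "A", "B", "B"]
dm = [[abs(x - y) for y in points] for x in points]
dm_squared = [[d ** 2 for d in row] for row in dm]
# Squaring preserves the order of distances, so ranks (and R) do not change
print(anosim_r(dm, labels), anosim_r(dm_squared, labels))
```

Because only the ordering of distances matters, any monotone transformation leaves R unchanged, whereas the sums of squares behind PERMANOVA's pseudo-F and adonis's R^2 would change. This is one reason the two families of tests can disagree.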

Jai and QIIME devs:

To my understanding, the permanova and adonis methods should return identical results for a one-way test among categories (i.e. the pseudo-F statistic should be the same).  Can someone confirm this?

Best,
Kyle



--
 
---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jai Ram Rideout

Sep 24, 2013, 1:26:13 PM
to qiime...@googlegroups.com
Hi Kyle and Martin,

As far as I know, the PERMANOVA and adonis pseudo-F statistics should be identical.

Martin, if you're getting significant p-values for categories that do not seem to affect the clustering, it could be due to differences in dispersion, as Kyle mentioned. You can check this by running PERMDISP, which is also available in compare_categories.py, to see whether the dispersion of your groups of samples is significantly different.
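PERMDISP asks whether groups differ in spread rather than in location. A rough sketch of the idea, assuming Euclidean-embeddable distances: compute each sample's distance to its group centroid directly from the distance matrix, then compare those distances across groups with an ordinary one-way F statistic. A full PERMDISP works in principal-coordinate space (which also handles non-Euclidean distances such as Bray-Curtis) and assesses significance by permutation; neither is done here, and the helper names are hypothetical.

```python
import math


def distances_to_centroid(dm, labels):
    """Distance from each sample to its group centroid, using only the distance matrix.

    Uses ||x_i - c||^2 = mean_j d_ij^2 - (1 / (2 n^2)) * sum_jk d_jk^2 within each group.
    Exact only for Euclidean-embeddable distances; negative values (possible with
    e.g. Bray-Curtis) are clamped to 0 in this sketch.
    """
    out = [0.0] * len(labels)
    for level in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == level]
        n = len(idx)
        half_mean_sq = sum(dm[j][k] ** 2 for j in idx for k in idx) / (2 * n * n)
        for i in idx:
            z2 = sum(dm[i][j] ** 2 for j in idx) / n - half_mean_sq
            out[i] = math.sqrt(max(z2, 0.0))
    return out


def one_way_f(values, labels):
    """Ordinary one-way ANOVA F statistic on one number per sample."""
    levels = sorted(set(labels))
    n, a = len(values), len(levels)
    grand_mean = sum(values) / n
    ss_among = ss_within = 0.0
    for level in levels:
        group = [v for v, lab in zip(values, labels) if lab == level]
        mean = sum(group) / len(group)
        ss_among += len(group) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in group)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


# Toy data: group A is more spread out than group B
points = [0.0, 2.0, 4.0, 10.0, 11.0]
labels = ["A", "A", "A", "B", "B"]
dm = [[abs(x - y) for y in points] for x in points]
z = distances_to_centroid(dm, labels)
print([round(v, 3) for v in z], round(one_way_f(z, labels), 4))
```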

More commonly, what we've seen is that having a large number of samples tends to yield highly significant p-values in these tests (including ANOSIM, PERMANOVA, adonis, MRPP, etc.), even if the effect size is very small. For example, you might run adonis and get an R-squared value of 0.01 (very low), but a p-value that is significant. Thus, it is crucial to interpret both the effect size (R-squared, in this example) and the p-value, because for any nonzero effect the p-value tends toward zero as you increase the number of samples in your study; extremely small/weak effects may become significant as the sample size increases. There are a number of discussions about these issues with p-values in papers and online; if you're interested in reading more about this, take a look at http://galitshmueli.com/system/files/Largesample-12-6-2012.pdf, which also has some references to other papers on this subject.
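To see why a fixed, small effect becomes "significant" as samples accumulate: in a one-way design, pseudo-F depends only on R^2, the number of samples N and the number of groups a, via F = (R^2 / (a - 1)) / ((1 - R^2) / (N - a)). The sketch below uses a hypothetical helper (not part of QIIME) and holds R^2 at 0.01. Pseudo-F grows roughly in proportion to N, while the permutation null distribution of pseudo-F stays roughly centred around 1, so the p-value shrinks even though the effect size never changes.

```python
def pseudo_f_from_r2(r2, n_samples, n_groups):
    """Pseudo-F implied by R^2 = SS_among / SS_total in a one-way design.

    Follows from F = (SS_among / (a - 1)) / (SS_within / (N - a))
    with SS_among = R^2 * SS_total and SS_within = (1 - R^2) * SS_total.
    """
    return (r2 / (n_groups - 1)) / ((1 - r2) / (n_samples - n_groups))


# Same small effect size (R^2 = 0.01, two groups), increasing numbers of samples
for n in (20, 100, 500, 2000):
    print(n, round(pseudo_f_from_r2(0.01, n, 2), 2))
```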

Hope this helps,
Jai

Kyle Bittinger

Sep 24, 2013, 1:31:35 PM
to qiime...@googlegroups.com
Thanks for the informative answer, Jai!

Martin Kostovcik

Sep 24, 2013, 5:55:27 PM
to qiime...@googlegroups.com
Thank you both very much, especially for the useful literature; I'll take a look at it. Yes, I was aware that a dispersion effect might overwhelm any real underlying category effect on clustering. But the biggest mystery is still that I would expect the R-squared from adonis to be highest for the category that is apparently responsible for the clustering, since it should give the proportion of variance explained by that category, but it isn't. Also, my F statistics from adonis are not the same as those from PERMANOVA. I'll try to assess the dispersion, but it's clear there would be significant differences within the tested category, which could definitely make the ANOSIM p-values more significant. Still, the ANOSIM p-values are the least extravagant compared to those from ADONIS and PERMANOVA (for categories irrelevant to clustering, the ANOSIM p-values at least approach nonsignificance).
Thank you for the insight.

Jai Ram Rideout

Sep 26, 2013, 10:26:25 AM
to qiime...@googlegroups.com
Hi Martin,

Without having more details about your particular dataset, I'm not really able to provide much more help here. Note that adonis is also affected by groups with different dispersion (though apparently less so than with ANOSIM), so that may be something to keep in mind. Also note what Kyle mentioned before: ANOSIM uses ranked distances, while adonis/PERMANOVA do not, so that may be affecting the differences you're seeing when comparing the methods.

It's strange that the F statistics don't match between adonis and PERMANOVA. If you want, send me your data files and I'll take a quick look. Send them to jai.r...@gmail.com

I also recommend posting your questions on the R vegan forum/mailing list, as there may be people there who can help you.

-Jai

Martin Kostovcik

Sep 26, 2013, 10:49:49 AM
to qiime...@googlegroups.com
Thanks Jai,
I will bring it to the R forum. I sent the files (the Bray-Curtis distance matrix and the mapping file) to your email. The differences between ADONIS and PERMANOVA are mostly a curiosity for me, since you said they should be the same; what I'm really interested in is the outcome, and finding out why the ADONIS results contradict ANOSIM.
Thanks for the comments.
Martin

Martin Kostovcik

Sep 26, 2013, 10:50:12 AM
to qiime...@googlegroups.com
Hi Jai,
Here are the promised files.
Martin
You received this message because you are subscribed to a topic in the Google Groups "Qiime Forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/qiime-forum/qyDJAdECBnc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to qiime-forum...@googlegroups.com.
bray_curtis_dm.txt
Fungal1_Map_bispinatus=ferrugineus.txt

Jai Ram Rideout

Sep 30, 2013, 10:09:01 AM
to qiime...@googlegroups.com
Hi Martin,

Sorry for the delay in getting to this. Which category in the mapping file are you testing?

-Jai

Martin Kostovcik

Sep 30, 2013, 10:19:22 AM
to qiime...@googlegroups.com
Species and Location. Location shouldn't have an effect, but it gives higher R2 values; that is most probably caused by the dispersion effects we discussed before, which are definitely present. The weird thing is that the statistics from ADONIS and PERMANOVA are not the same, as I mentioned. I'm going to discuss it with statisticians. Thanks for the support.
Martin

Jai Ram Rideout

Sep 30, 2013, 12:13:51 PM
to qiime...@googlegroups.com
Thanks, Martin. I gave your data a quick look but didn't find anything obvious that explains the difference between the adonis and PERMANOVA F statistics. I'll find some time this week to do more digging and will get back to you as soon as possible.

-Jai

Martin Kostovcik

Sep 30, 2013, 1:10:01 PM
to qiime...@googlegroups.com
Thanks a bunch for that.
Martin

Jai Ram Rideout

Oct 1, 2013, 2:18:09 PM
to qiime...@googlegroups.com
Hi Martin,

Okay, no way to write a short answer for this one, so here goes:

The reason you're seeing different pseudo-F statistics between PERMANOVA and adonis is due to differences in how your metadata mapping file is being parsed by QIIME. Several of your samples (9, I think) have IDs with spaces at the end of their names. Some of the category states in the Locality category also have trailing spaces (for example, some are 'Colt_Creek    ', while others are 'Colt_Creek').

QIIME has two different routines to parse metadata mapping files. The Python parsing routine (the one most commonly used throughout QIIME) strips away any leading or trailing whitespace from a cell's value. This is the routine being used by compare_categories.py's PERMANOVA and ANOSIM.

compare_categories.py's adonis, in contrast, runs adonis within R, using R's vegan package. The R code to parse mapping files was not stripping away the leading/trailing whitespace from your sample IDs and metadata, so adonis was seeing 5 distinct groups of samples, when really there should have only been two (Colt_Creek and Lake_Wales). PERMANOVA was correctly grouping the samples, but adonis was not. After removing the trailing whitespace from your sample IDs and the Locality category, adonis and PERMANOVA both report the exact same pseudo-F statistic (4.0961).
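The effect of skipping the whitespace strip can be reproduced on a toy file. This sketch uses a hypothetical parser, not QIIME's actual Python or R routines, and made-up sample IDs: with stripping, the file has two Locality groups; without it, each padded value becomes its own group, and the padded ID 'S2 ' would no longer match an 'S2' entry in a distance matrix.

```python
def parse_mapping(text, strip=True):
    """Parse tab-delimited mapping-file text into {sample_id: {column: value}}.

    Hypothetical helper: the first line is the header (starting with '#SampleID'),
    and later lines starting with '#' are treated as comments and skipped.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = [h.strip() for h in lines[0].lstrip("#").split("\t")]
    mapping = {}
    for line in lines[1:]:
        if line.startswith("#"):
            continue
        cells = line.split("\t")
        if strip:
            cells = [c.strip() for c in cells]
        mapping[cells[0]] = dict(zip(header[1:], cells[1:]))
    return mapping


# Toy mapping file with a trailing space in one sample ID and in some Locality values
example = (
    "#SampleID\tLocality\n"
    "S1\tColt_Creek\n"
    "S2 \tColt_Creek    \n"
    "S3\tLake_Wales\n"
    "S4\tLake_Wales \n"
)

for strip in (True, False):
    m = parse_mapping(example, strip=strip)
    groups = {row["Locality"] for row in m.values()}
    print(f"strip={strip}: {len(groups)} groups, sample IDs {sorted(m)}")
```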

I've submitted a bug fix to resolve the differences in the two parsing functions (it is currently under review). I apologize for the inconvenience this has caused you, and thank you for bringing this to our attention!

For more details about the issue and bug fix:


Once this bug fix is merged into QIIME, you can update to the latest developer's version (1.7.0-dev) to obtain the fix. You'll then be able to use your mapping file as-is and obtain correct results.

If this isn't an option for you, you should correct your mapping file by removing any leading or trailing whitespace from your sample ID column, Locality, and any others (I didn't check the others).
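For the manual workaround, a small script like the following writes a copy of the mapping file with every tab-separated cell stripped. The function names and file paths are hypothetical, not part of QIIME; the returned count of changed cells gives a quick sense of how many values were affected.

```python
def clean_mapping_text(text):
    """Strip leading/trailing whitespace from every tab-separated cell.

    Returns (cleaned_text, number_of_cells_changed).
    """
    cleaned_lines = []
    changed = 0
    for line in text.splitlines():
        cells = line.split("\t")
        stripped = [c.strip() for c in cells]
        changed += sum(1 for old, new in zip(cells, stripped) if old != new)
        cleaned_lines.append("\t".join(stripped))
    return "\n".join(cleaned_lines) + "\n", changed


def clean_mapping_file(in_path, out_path):
    """Write a whitespace-cleaned copy of a mapping file; returns cells changed."""
    with open(in_path, encoding="utf-8") as fh:
        cleaned, changed = clean_mapping_text(fh.read())
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(cleaned)
    return changed


cleaned, n_changed = clean_mapping_text("#SampleID\tLocality\nS2 \tColt_Creek    \n")
print(repr(cleaned), n_changed)
```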

In case it helps, here's the output that I received from running adonis with the bug fix in place:

Call:
adonis(formula = as.dist(qiime.data$distmat) ~ qiime.data$map[[opts$category]],      permutations = opts$num_permutations) 

Terms added sequentially (first to last)

                                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
qiime.data$map[[opts$category]]  1    1.0092 1.00925  4.0961 0.06292  0.004 **
Residuals                       61   15.0299 0.24639         0.93708          
Total                           62   16.0392                 1.00000          
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


The degrees of freedom column (Df) shows the number of sample groups that adonis found, minus 1. Since you only have two groups (Colt_Creek and Lake_Wales), that gives 2 - 1 = 1 Df for the category of interest. You can also check that the correct total number of samples is being recognized by the model by looking at the Total Df cell. This should be (n - 1), where n is the number of samples in your dataset. The distance matrix you provided has 63 samples, so this output looks correct (63 - 1 = 62). Before the fix, adonis was reporting Df = 4 (five groups) and Total Df = 53 (only 54 samples), which is wrong. You can use this information to validate your results in the future.
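The Df check above can be automated before running adonis. A minimal sketch, assuming the mapping file has already been parsed into a dict and that samples without an exactly matching ID are simply left out of the model; the helper name and toy data are hypothetical.

```python
def expected_adonis_df(dm_ids, mapping, category):
    """Degrees of freedom a one-way adonis run should report.

    dm_ids: sample IDs from the distance matrix.
    mapping: {sample_id: {column: value}} parsed from the mapping file.
    An ID that differs only by trailing whitespace counts as a mismatch here.
    """
    shared = [sid for sid in dm_ids if sid in mapping]
    n_groups = len({mapping[sid][category] for sid in shared})
    return {
        "category_df": n_groups - 1,  # number of groups minus 1
        "total_df": len(shared) - 1,  # number of usable samples minus 1
        "unmatched_ids": len(dm_ids) - len(shared),
    }


dm_ids = ["S1", "S2", "S3", "S4"]
padded = {
    "S1": {"Locality": "Colt_Creek"},
    "S2 ": {"Locality": "Colt_Creek    "},
    "S3": {"Locality": "Lake_Wales"},
    "S4": {"Locality": "Lake_Wales "},
}
clean = {k.strip(): {c: v.strip() for c, v in row.items()} for k, row in padded.items()}
print(expected_adonis_df(dm_ids, padded, "Locality"))
print(expected_adonis_df(dm_ids, clean, "Locality"))
```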

Note that the R2 value is now pretty low (about 6.3% of the variation), which also agrees with ANOSIM (R = 0.0588), so I think this also answers your question about the large discrepancy between the two methods.
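As a quick arithmetic check, the R2 and pseudo-F in the adonis output can be recomputed from its SumsOfSqs and Df columns. The values below are copied from the table; small differences in the last digit come from rounding in the printed sums of squares.

```python
# Values copied from the adonis table above
ss_among, df_among = 1.0092, 1     # category term
ss_resid, df_resid = 15.0299, 61   # Residuals
ss_total = ss_among + ss_resid     # Total (printed as 16.0392)

r2 = ss_among / ss_total
pseudo_f = (ss_among / df_among) / (ss_resid / df_resid)

print(f"R2 = {r2:.4f}, i.e. about {100 * r2:.1f}% of the total sum of squares")
print(f"pseudo-F = {pseudo_f:.3f}")
```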

I hope this helps. Please let me know if you have any additional questions or run into any other issues along the way!

-Jai

Martin Kostovcik

Oct 1, 2013, 3:26:37 PM
to qiime...@googlegroups.com
Thank you, Jai, for resolving this issue. I hope it draws the attention of other users.
Martin

Jai Ram Rideout

Oct 30, 2013, 6:05:58 PM
to qiime...@googlegroups.com
Martin, and anyone else who's following this thread:

Sorry for the delay in announcing this bug. Here's a blog post with more details and a workaround:


-Jai


On Tue, Oct 1, 2013 at 3:26 PM, Martin Kostovcik <kost...@gmail.com> wrote:
Thank you Jai for resolving this issue and hope it will draw attention of the other users.
Martin

On 10/1/2013 2:18 PM, Jai Ram Rideout wrote:
Hi Martin,

Okay, no way to write a short answer for this one, so here goes:

The reason you're seeing different pseudo-F statistics between PERMANOVA and adonis is due to differences in how your metadata mapping file is being parsed by QIIME. Several of your samples (9, I think) have IDs with spaces at the end of their names. Some of the category states in the Locality category also have trailing spaces (for example, 'Colt_Creek    ', while other are 'Colt_Creek').

QIIME has two different routines to parse metadata mapping files. The Python parsing routine (the one most commonly used throughout QIIME) strips away any leading or trailing whitespace from a cell's value. This is the routine being used by compare_categories.py's PERMANOVA and ANOSIM.

compare_categories.py's adonis, in contrast, runs adonis within R, using R's vegan package. The R code to parse mapping files was not stripping away the leading/trailing whitespace from your sample IDs and metadata, so adonis was seeing 5 distinct groups of samples, when really there should have only been two (Colt_Creek and Lake_Wales). PERMANOVA was correctly grouping the samples, but adonis was not. After removing the trailing whitespace from your sample IDs and the Locality category, adonis and PERMANOVA both report the exact same pseudo-F statistic (4.0961).

I've submitted a bug fix to resolve the differences in the two parsing functions (it is currently under review). I apologize for the inconvenience this has caused you, and thank you for bringing this to our attention!

For more details about the issue and bug fix:


Once this bug fix is merged into QIIME, you can update to the latest developer's version (1.7.0-dev) to obtain the fix. You'll then be able to use your mapping file as-is and obtain correct results.

If this isn't an option for you, you should correct your mapping file by removing any leading or trailing whitespace from your sample ID column, Locality, and any others (I didn't check the others).

In case it helps, here's the output that I received from running adonis with the bug fix in place:

Call:
adonis(formula = as.dist(qiime.data$distmat) ~ qiime.data$map[[opts$category]],      permutations = opts$num_permutations) 

Terms added sequentially (first to last)

                                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
qiime.data$map[[opts$category]]  1    1.0092 1.00925  4.0961 0.06292  0.004 **
Residuals                       61   15.0299 0.24639         0.93708          
Total                           62   16.0392                 1.00000          
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
The degrees of freedom column (Df) shows the number of sample groups that adonis found, minus 1. Thus, since you only have two groups (Colt_Creek and Lake_Wales), 2 - 1 = 1 Df for the category of interest. You can also check that the correct total number of samples are being recognized by the model by looking at the Total Df cell. This should be (n - 1), where n is the number of samples in your dataset. The distance matrix you provided has 63 samples, so this output looks correct (63 - 1 = 62). Before the fix, adonis was report Df = 4 and Total Df = 53, which is wrong. You can use this information to validate your results in the future.

Note that the R2 value is now pretty low (6.2% variability), which also agrees with ANOSIM (R = 0.0588), so I think this also answers your question about the large discrepancy between the two methods.

-Jai


Thanks you both much. Especially for useful literature - I'll take a look on it. Yes I was aware that dispersion effect might overwhelm any real underlying category effect on clustering...but still the biggest mystery is that I would expect the R squared form adonis to be highest for category that is apparently responsible for clustering as it should explain the proportion of variance explained by given category but it's not so. And also my F-statistics from the adonis are not the same as form PERMANOVA. I'll try to assess the dispersion but it's clear there would be significant differences within tested category so that could definitely inflate p-values of ANOSIM but still those p-values are the least  extravagant compare to those from ADONIS and PERMANOVA (in ANOSIM are at least approaching nonsignificance for categories irrelevant to clustering).
Thanks you guys for some insight into it.


On Tuesday, September 24, 2013 1:31:35 PM UTC-4, Kyle Bittinger wrote:
Thanks for the informative answer, Jai!
On Tue, Sep 24, 2013 at 1:26 PM, Jai Ram Rideout <jai.r...@gmail.com> wrote:
Hi Kyle and Martin,

As far as I know, the PERMANOVA and adonis pseudo-F statistics should be identical.

Martin, if you're getting significant p-values for categories that do not seem to affect the clustering, it could be due to differences in dispersion, as Kyle mentioned. You can check this by running PERMDISP, which is also available in compare_categories.py, to see whether the dispersion of your groups of samples is significantly different.

More commonly, what we've seen is that having a large number of samples tends to yield highly significant p-values in these tests (including ANOSIM, PERMANOVA, adonis, MRPP, etc.), even if the effect size is very small. For example, you might run adonis and get an R-squared value of 0.01 (very low), but a p-value that is significant. Thus, it is crucial to interpret both the effect size (R-squared, in this example) and the p-value, because the p-value tends to drop to zero as you increase the number of samples in your study; extremely small/weak effects may become significant as the sample size increases. There are a number of discussions about these issues with p-values in papers and online; if you're interested in reading more about this, take a look at http://galitshmueli.com/system/files/Largesample-12-6-2012.pdf, which also has some references to other papers on this subject.
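The sample-size effect can be made concrete for a one-way test. Because R-squared = SS_among / SS_total, the pseudo-F is fully determined by R-squared, the number of samples N and the number of groups a: F = [R2 / (a - 1)] / [(1 - R2) / (N - a)]. The short Python sketch below (the function name is made up for illustration) holds R-squared at 0.01 with two groups and varies N; F climbs from about 0.18 at N = 20 to about 10.1 at N = 1000, so a permutation test will tend to call the same tiny effect significant once enough samples are added.

```python
def pseudo_f_from_r2(r2, n_samples, n_groups):
    """Pseudo-F implied by a one-way R^2: F = [R2/(a-1)] / [(1-R2)/(N-a)]."""
    among = r2 / (n_groups - 1)                   # among-group mean square / SS_total
    within = (1.0 - r2) / (n_samples - n_groups)  # within-group mean square / SS_total
    return among / within


# Same small effect size (R^2 = 0.01, two groups), increasing sample size.
for n in (20, 100, 1000):
    print(n, round(pseudo_f_from_r2(0.01, n, 2), 2))
```

The identity only covers a single categorical term; it says nothing about the permutation null itself, which is why the p-value statement above is a tendency rather than an exact rule.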

Hope this helps,
Jai


On Tue, Sep 24, 2013 at 12:15 PM, Kyle Bittinger <kylebi...@gmail.com> wrote:
Martin,

If you have the chance to read MJ Anderson's paper on PERMANOVA ["A new method..." Austral Ecology 26, 32 (2001)], I cannot recommend it highly enough.  It is very well written and contains a couple of nice examples.  She does not use the name PERMANOVA, but instead calls her method "non-parametric MANOVA."

Anderson discusses the ANOSIM approach in her paper, because that was the established method prior to PERMANOVA.  On page 37, she points out that ANOSIM is sensitive to differences in group dispersion.  PERMANOVA is designed to be less sensitive to differences in dispersion and more sensitive to differences in "location."  A difference in location is to be understood using the analogy introduced in the paper, but it roughly corresponds to a difference in group centroid, looking at the PCoA plot.

Another difference between ANOSIM and PERMANOVA is that the ANOSIM statistic is based on ranks.  Depending on your perspective, this may or may not be a benefit to the method.  In any case it could account for some of the apparent discrepancy between the test results and the PCoA plot.  The R statistic of ANOSIM and the R-squared value returned by the adonis function are completely unrelated.
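To see how the rank-based ANOSIM R differs from adonis's R-squared, here is a minimal Python sketch of the standard ANOSIM statistic, R = (mean between-group rank - mean within-group rank) / (M / 2), where M = n(n - 1)/2 is the number of sample pairs and tied distances get averaged ranks. It is an illustration only (no permutation test, and not QIIME's implementation); the function names are hypothetical. R runs from -1 to 1 and depends only on the ordering of the distances, so it is not a proportion of variance.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the first one in the run.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(dm, labels):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(labels)), 2))  # M = n(n-1)/2 pairs
    ranks = average_ranks([dm[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


dm = [[0, 1, 5, 5],
      [1, 0, 5, 5],
      [5, 5, 0, 1],
      [5, 5, 1, 0]]
print(anosim_r(dm, ["A", "A", "B", "B"]))  # 1.0: every within-group distance is the smallest
```

Because only ranks enter the formula, doubling the between-group distances in this example would leave R at exactly 1.0, whereas a sums-of-squares statistic would change.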

Jai and QIIME devs:

To my understanding, the permanova and adonis methods should return identical results for a one-way test among categories (i.e. the pseudo-F statistic should be the same).  Can someone confirm this?

Best,
Kyle
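For a one-way test, the pseudo-F that both methods report comes from the sums of squares in Anderson's paper, computed directly from the distances: SS_total is the sum of all squared pairwise distances divided by N, SS_within is the same quantity computed inside each group (divided by that group's size) and summed over groups, and SS_among = SS_total - SS_within. A minimal pure-Python sketch follows, assuming a square, symmetric distance matrix given as a list of lists plus one label per sample; the function names are hypothetical and this is not the QIIME or vegan code. It also returns SS_among / SS_total, the R-squared that adonis prints for a single term.

```python
import random
from itertools import combinations


def permanova_f(dm, labels):
    """One-way pseudo-F and R^2 from a square, symmetric distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # SS_total: sum of squared distances over all pairs, divided by N.
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # SS_within: the same sum taken inside each group, divided by that group's size.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dm[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f_stat, ss_among / ss_total


def permanova_p(dm, labels, permutations=999, seed=0):
    """Permutation p-value: share of label shuffles whose pseudo-F >= the observed one."""
    rng = random.Random(seed)
    observed, _ = permanova_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if permanova_f(dm, shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)
```

For example, two pairs of samples that are 1 unit apart within a pair and 5 units apart between pairs give F = 49 and an R-squared of about 0.96. Since F and R-squared come from the same decomposition, two implementations that group the samples identically should agree on both.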



--
 
---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.
--
 
---
You received this message because you are subscribed to a topic in the Google Groups "Qiime Forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/qiime-forum/qyDJAdECBnc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to qiime-forum...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

Martin Kostovcik

Oct 30, 2013, 6:07:51 PM10/30/13
to qiime...@googlegroups.com
Thanks Jai for helping with that.
M

On 10/30/2013 6:05 PM, Jai Ram Rideout wrote:
Martin, and anyone else who's following this thread:

Sorry for the delay in announcing this bug. Here's a blog post with more details and a workaround:


-Jai
On Tue, Oct 1, 2013 at 3:26 PM, Martin Kostovcik <kost...@gmail.com> wrote:
Thank you, Jai, for resolving this issue; I hope it will draw the attention of other users.
Martin

On 10/1/2013 2:18 PM, Jai Ram Rideout wrote:
Hi Martin,

Okay, no way to write a short answer for this one, so here goes:

The reason you're seeing different pseudo-F statistics between PERMANOVA and adonis is a difference in how QIIME parses your metadata mapping file. Several of your samples (9, I think) have IDs with trailing spaces. Some of the category states in the Locality category also have trailing spaces (for example, some cells contain 'Colt_Creek  ' while others contain 'Colt_Creek').

QIIME has two different routines to parse metadata mapping files. The Python parsing routine (the one most commonly used throughout QIIME) strips away any leading or trailing whitespace from a cell's value. This is the routine being used by compare_categories.py's PERMANOVA and ANOSIM.

compare_categories.py's adonis, in contrast, runs adonis within R, using R's vegan package. The R code to parse mapping files was not stripping away the leading/trailing whitespace from your sample IDs and metadata, so adonis was seeing 5 distinct groups of samples, when really there should have only been two (Colt_Creek and Lake_Wales). PERMANOVA was correctly grouping the samples, but adonis was not. After removing the trailing whitespace from your sample IDs and the Locality category, adonis and PERMANOVA both report the exact same pseudo-F statistic (4.0961).
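The grouping problem is easy to reproduce on its own. In this Python sketch the Locality values are made up for illustration and the function name is hypothetical: comparing raw strings treats every whitespace variant as a separate group, while stripping each value first, as QIIME's Python parser does, collapses them back to the two real localities.

```python
def distinct_groups(values, strip_whitespace):
    """Return the set of group labels, optionally trimming surrounding whitespace."""
    if strip_whitespace:
        return {value.strip() for value in values}
    return set(values)


# Hypothetical Locality cells; two of them carry trailing spaces.
locality = ["Colt_Creek", "Colt_Creek  ", "Lake_Wales", "Lake_Wales "]

print(len(distinct_groups(locality, strip_whitespace=False)))  # 4 groups
print(len(distinct_groups(locality, strip_whitespace=True)))   # 2 groups
```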

I've submitted a bug fix to resolve the differences in the two parsing functions (it is currently under review). I apologize for the inconvenience this has caused you, and thank you for bringing this to our attention!

For more details about the issue and bug fix:


Once this bug fix is merged into QIIME, you can update to the latest developer's version (1.7.0-dev) to obtain the fix. You'll then be able to use your mapping file as-is and obtain correct results.

If this isn't an option for you, you should correct your mapping file by removing any leading or trailing whitespace from your sample ID column, Locality, and any others (I didn't check the others).
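If you need to clean a mapping file by hand, a short Python script can do it. This is a sketch, not a QIIME tool: it assumes the usual tab-delimited layout with a '#SampleID' header line, leaves any other '#' comment lines untouched, and trims leading and trailing whitespace from every other cell. The function name and the file names in the comment are placeholders.

```python
def strip_mapping_file(text):
    """Return a tab-separated mapping file with whitespace trimmed from every cell.

    Assumes a QIIME-style mapping file: tab-delimited, header line starting with
    '#SampleID', optional '#' comment lines (which are left as they are).
    """
    out_lines = []
    for line in text.splitlines():
        if line.startswith("#") and not line.startswith("#SampleID"):
            out_lines.append(line)  # free-text comment line; keep unchanged
            continue
        out_lines.append("\t".join(cell.strip() for cell in line.split("\t")))
    return "\n".join(out_lines) + "\n"


# Example use (placeholder file names):
# with open("map.txt") as src, open("map_clean.txt", "w") as dst:
#     dst.write(strip_mapping_file(src.read()))
```

After cleaning, rerunning adonis and checking its Df column is a quick way to confirm that the expected number of groups and samples is recognized.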

In case it helps, here's the output that I received from running adonis with the bug fix in place:

Call:
adonis(formula = as.dist(qiime.data$distmat) ~ qiime.data$map[[opts$category]], permutations = opts$num_permutations)

Terms added sequentially (first to last)

                                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
qiime.data$map[[opts$category]]  1    1.0092 1.00925  4.0961 0.06292  0.004 **
Residuals                       61   15.0299 0.24639         0.93708
Total                           62   16.0392                 1.00000
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The degrees of freedom column (Df) shows the number of sample groups that adonis found, minus 1. Since you have only two groups (Colt_Creek and Lake_Wales), that gives 2 - 1 = 1 Df for the category of interest. You can also check that the model recognizes the correct total number of samples by looking at the Total Df cell, which should be n - 1, where n is the number of samples in your dataset. The distance matrix you provided has 63 samples, so this output looks correct (63 - 1 = 62). Before the fix, adonis was reporting Df = 4 and Total Df = 53, which is wrong. You can use this information to validate your results in the future.

Note that the R2 value is now pretty low (6.2% of the variability), which also agrees with ANOSIM (R = 0.0588), so I think this also answers your question about the large discrepancy between the two methods.
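The same checks can be scripted. The Python sketch below (function names hypothetical) tests the degrees of freedom against a one-way design and recomputes pseudo-F = (SS_term / Df_term) / (SS_residual / Df_residual) and R2 = SS_term / SS_total from the sums of squares in the adonis output. With the corrected values it gives roughly 4.10 and 0.063, matching the table up to rounding, while the pre-fix degrees of freedom (4 and 53) fail the check.

```python
def df_matches_design(n_samples, n_groups, df_term, df_total):
    """True if adonis's degrees of freedom fit a one-way design with these counts."""
    return df_term == n_groups - 1 and df_total == n_samples - 1


def f_and_r2(ss_term, ss_residual, df_term, df_residual):
    """Recompute pseudo-F and R^2 from the sums of squares in an adonis table."""
    f_stat = (ss_term / df_term) / (ss_residual / df_residual)
    r2 = ss_term / (ss_term + ss_residual)  # SS_total = SS_term + SS_residual
    return f_stat, r2


# Values from the corrected run: 63 samples, 2 localities.
print(df_matches_design(63, 2, df_term=1, df_total=62))  # True
print(df_matches_design(63, 2, df_term=4, df_total=53))  # False (pre-fix output)
f_stat, r2 = f_and_r2(1.0092, 15.0299, df_term=1, df_residual=61)
print(round(f_stat, 2), round(r2, 3))
```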

-Jai


