kruskal wallis results


Stef

Apr 25, 2014, 9:45:38 AM
to qiime...@googlegroups.com
Hi,
First time I am using Kruskal Wallis so I have a few very basic questions:
The numbers in the output under xxx_mean, is that the mean rank? How exactly is it defined and calculated? 
If I have two groups with 100 samples and one with only 13 does that have an impact?
I'd like to plot the results, but how can I get the variance or CI or whatever I should use?
If I have three groups and get e.g.:
control_mean = 18
treatment1_mean = 14
and
treatment2_mean = 22
How do I know which treatment is significantly different than the control? 
I apologize if these are weird questions. I am not used to working with non-parametric tests.
Thank you for your kind help.
Cheers,
Stef

Will Van Treuren

Apr 28, 2014, 3:50:42 AM
to qiime...@googlegroups.com
Hi Stef, 

> The numbers in the output under xxx_mean, is that the mean rank? How exactly is it defined and calculated?

It is not the mean rank, but rather the mean abundance of the taxon in the particular sample grouping. As an example, assume we have Treatment_A = [S1, S2, S3], Treatment_B = [S4, S5], and our OTU table looks like:

      S1   S2   S3   S4   S5
O1    10    7    0   50  100

Treatment_A_mean = (10 + 7 + 0)/3 = 17/3 ≈ 5.67
Treatment_B_mean = (50 + 100)/2 = 150/2 = 75
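
For illustration, the calculation in this example can be reproduced in a few lines of plain Python. This is only a sketch of the arithmetic, not QIIME's own code; the names `otu_counts`, `groups`, and `group_means` are made up for the example.

```python
# Counts of OTU O1 in each sample, copied from the example table.
otu_counts = {"S1": 10, "S2": 7, "S3": 0, "S4": 50, "S5": 100}

# Which samples belong to which treatment group, as in the example.
groups = {
    "Treatment_A": ["S1", "S2", "S3"],
    "Treatment_B": ["S4", "S5"],
}


def group_means(counts, grouping):
    """Return the arithmetic mean abundance of one OTU within each group."""
    return {
        name: sum(counts[sample] for sample in samples) / len(samples)
        for name, samples in grouping.items()
    }


means = group_means(otu_counts, groups)
for name, value in means.items():
    print(f"{name}_mean = {value:.2f}")
```

Note that the zero count in S3 is included in the average, which is why Treatment_A_mean is 17/3 rather than 17/2.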

> If I have two groups with 100 samples and one with only 13 does that have an impact?

It will impact the variance estimates, but there is no reason that you can't do it with 100 and 13 samples. 

> How do I know which treatment is significantly different than the control?

Kruskal-Wallis is a non-parametric analogue of one-way ANOVA. Like ANOVA, a significant Kruskal-Wallis p-value only tells you that at least one of the groups has a mean rank that differs significantly (accounting for ties and deviances) from another. To find which groups differ, you have to conduct post-hoc tests between pairs of groups. Those could take the form of t-tests or their non-parametric equivalent (Mann-Whitney U etc.). The post-hoc tests are implemented in QIIME, but it's not easy to do the pairwise comparisons.
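
To make the two-step logic concrete, here is a minimal pure-Python sketch: an overall Kruskal-Wallis test, then pairwise rank tests as a post-hoc step. It is not QIIME's implementation. The abundance values are invented, all function names are hypothetical, the p-values use the chi-square approximation (rough for groups this small), and the Bonferroni correction is just one common choice made here for the example.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2
    a = 1.0 if df % 2 == 0 else 0.5
    q = math.exp(-y) if df % 2 == 0 else math.erfc(math.sqrt(y))
    while a < df / 2:  # incomplete-gamma recurrence, one step per +2 df
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Return (H, p, mean rank per group) for a dict of name -> values."""
    pooled = [v for values in groups.values() for v in values]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start, mean_ranks = 0.0, 0, {}
    for name, values in groups.items():
        rank_sum = sum(ranks[start:start + len(values)])
        mean_ranks[name] = rank_sum / len(values)
        total += rank_sum ** 2 / len(values)
        start += len(values)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction  # standard tie correction
    return h, chi2_sf(h, len(groups) - 1), mean_ranks


def pairwise_posthoc(groups):
    """Bonferroni-adjusted p-values from a two-group test on every pair."""
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        _, p, _ = kruskal_wallis({a: groups[a], b: groups[b]})
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return adjusted


# Invented abundances of one OTU in three groups (illustration only).
abundances = {
    "control": [18, 20, 15, 22, 17],
    "treatment1": [14, 9, 12, 11, 16],
    "treatment2": [25, 30, 21, 28, 24],
}

h, p, mean_ranks = kruskal_wallis(abundances)
print(f"H = {h:.2f}, p = {p:.4f}, mean ranks = {mean_ranks}")
for pair, p_adj in pairwise_posthoc(abundances).items():
    print(pair, f"adjusted p = {p_adj:.4f}")
```

The overall test only says that some group differs; the adjusted pairwise p-values are what indicate which treatment differs from the control.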

Hope this helps,
Will 


--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stefanie Prast-Nielsen

Apr 28, 2014, 5:54:05 AM
to qiime...@googlegroups.com
Hi Will,
Thanks for your reply! Just to be really sure: the command calculates the mean, which I get in my output file, although it compares the mean ranks when it calculates the significance. How much does the mean tell us in such skewed data? What would you recommend for plotting? Means with std error, std deviation, or confidence intervals? How can I calculate these values?
So, if I want to compare only two groups, I use Mann-Whitney-U and if I have 3 or more, I use K-W and they basically work in the same way and use the same assumptions?
Thanks for your help!
Kind regards,
Stef




Will Van Treuren

Apr 28, 2014, 10:56:15 AM
to qiime...@googlegroups.com
Hi Stef,

> Just to be really sure: the command calculates the mean which I get in my output file although it compares the mean rank when it calculates the significance.

Correct, it compares the ranks but outputs the abundance means since those have traditionally been more useful than ranked means. 

> How much does the mean tell us in such skewed data? What would you recommend for plotting? Means with std error, std deviation of confidence intervals? How can I calculate these values?

Depends on how skewed your data are. You can plot a histogram of the values of a given OTU across the samples of a given group by converting your biom table to a traditional OTU table with the command `biom convert` (QIIME 1.8+) or convert_biom.py (QIIME 1.7 and before). Then isolate the row of the OTU table that represents the OTU of interest and plot it in your favorite program. I would suggest histograms, as they show you the most information about distribution shape in an easily digestible way.
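
As a rough sketch of the "isolate the row and plot" step, the snippet below pulls one OTU's values out of a tab-delimited table and prints a crude text histogram. The table text is invented, and its layout (a `#OTU ID` header row followed by one row per OTU) is an assumption about what the converted file looks like, so check it against your own output. `otu_row` and `text_histogram` are hypothetical helper names; a real plot would normally be made in a plotting program.

```python
import io
from collections import Counter

# Invented table text; the "#OTU ID" header layout is an assumption.
table_text = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\tS3\tS4\tS5\n"
    "O1\t10\t7\t0\t50\t100\n"
    "O2\t3\t0\t1\t0\t2\n"
)


def otu_row(text, otu_id):
    """Return {sample_id: value} for one OTU from tab-delimited table text."""
    sample_ids = None
    for line in io.StringIO(text):
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "#OTU ID":
            sample_ids = fields[1:]
        elif fields[0] == otu_id and sample_ids is not None:
            return dict(zip(sample_ids, map(float, fields[1:])))
    raise KeyError(otu_id)


def text_histogram(values, bin_width):
    """Return one text line per bin, with one '#' per value in that bin."""
    bins = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(bins), max(bins) + 1):
        low = b * bin_width
        lines.append(f"{low:>5g} to {low + bin_width:<5g} {'#' * bins[b]}")
    return lines


row = otu_row(table_text, "O1")
histogram = text_histogram(row.values(), bin_width=25)
for line in histogram:
    print(line)
```

To look at one treatment group at a time, filter `row` to that group's sample IDs before binning.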

> So, if I want to compare only two groups, I use Mann-Whitney-U and if I have 3 or more, I use K-W and they basically work in the same way and use the same assumptions?

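Correct. For two groups, KW and MWU are equivalent: the KW statistic is the square of the z-score from MWU's normal approximation, so both give the same p-value (apart from continuity or exact-test corrections). You can use either for two-sample comparisons, but MWU is standard.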
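
The two-group equivalence can be checked numerically. The sketch below assumes no tied values and uses the plain normal approximation for MWU with no continuity correction; exact MWU p-values or corrected approximations will differ slightly. The function names are made up, and the data reuse the O1 counts from the example earlier in the thread.

```python
import math


def rank_sums(a, b):
    """Rank sums of two groups in the pooled data; assumes no tied values."""
    pooled = sorted(a + b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[v] for v in a), sum(rank[v] for v in b)


def kw_h(a, b):
    """Kruskal-Wallis H statistic for two groups (no tie correction)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    r1, r2 = rank_sums(a, b)
    return 12 / (n * (n + 1)) * (r1 ** 2 / n1 + r2 ** 2 / n2) - 3 * (n + 1)


def mwu_z(a, b):
    """Mann-Whitney U z-score, normal approximation, no continuity correction."""
    n1, n2 = len(a), len(b)
    r1, _ = rank_sums(a, b)
    u1 = r1 - n1 * (n1 + 1) / 2
    return (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


treatment_a = [10, 7, 0]
treatment_b = [50, 100]

h = kw_h(treatment_a, treatment_b)
z = mwu_z(treatment_a, treatment_b)

# Chi-square (1 df) tail for H and two-sided normal tail for z coincide.
p_kw = math.erfc(math.sqrt(h / 2))
p_mwu = math.erfc(abs(z) / math.sqrt(2))
print(f"H = {h:.4f}, z^2 = {z * z:.4f}, p_kw = {p_kw:.4f}, p_mwu = {p_mwu:.4f}")
```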

Best,
Will 