Small Sample Size

Ryan Trexler

Aug 21, 2013, 1:02:58 PM

to lefse...@googlegroups.com

Hi Nicola and others,

I am using LEfSe to identify taxonomic biomarkers for 3 different Classes, with no subclass information. Unfortunately, I have some issues with sample size, and I have read that results from Kruskal-Wallis tests with small sample sizes (<5) should be treated with caution. It would be great if you could help me with a few questions.

Analysis Details: Class I, n=2, Class II, n=10, Class III, n=12. The result of computing LEfSe (K-W alpha = 0.05, LDA threshold = 3.5) on my dataset identifies: 12 biomarkers for Class I, 10 biomarkers for Class II, and 7 biomarkers for Class III.

How sensitive is LEfSe to small sample sizes and dissimilar sample sizes? Would computing the K-W test at a lower alpha level (say, 0.001) produce more reliable results? Additionally, are there any bootstrapping methods I can apply that would help the situation?

Thanks so much for your help,

Ryan Trexler

Nicola Segata

Aug 22, 2013, 8:21:20 AM

to lefse...@googlegroups.com

Hi Ryan,

thanks for getting in touch.

As long as at least two classes have a reasonable sample size (as in your case), the Kruskal-Wallis step is robust. For classes with only a few samples, the pairwise comparisons between subclasses (or classes, if subclasses are not specified), which are usually performed with the Wilcoxon test, are instead tested only for the directionality of the difference between the medians. Practically speaking, a little care should be devoted to the biomarkers of Class I: although it is safe to say that those features break the null hypothesis of being equally distributed among all classes (i.e. they pass the KW step), the fact that Class I is higher than both Class II and Class III cannot be said to be statistically significant.
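As a rough illustration of the two steps described above (not LEfSe's actual code), the sketch below runs a Kruskal-Wallis test on one feature across three classes of sizes 2, 10, and 12, then checks only the direction of the medians for the small class. The abundance values are made up, the function names are hypothetical, and the p-value uses the chi-square approximation, which is itself rough when a class has n=2.

```python
import math
from statistics import median

# Made-up relative abundances for one feature; only the class sizes
# (2, 10, 12) mirror the thread. All values are distinct (no ties).
class_1 = [0.30, 0.34]
class_2 = [0.080, 0.085, 0.090, 0.095, 0.100,
           0.105, 0.110, 0.115, 0.120, 0.125]
class_3 = [0.010, 0.015, 0.020, 0.025, 0.030, 0.035,
           0.040, 0.045, 0.050, 0.055, 0.060, 0.065]


def kruskal_wallis_3(groups):
    """Return (H, p) for exactly three groups with no tied values."""
    assert len(groups) == 3
    pooled = sorted(v for g in groups for v in g)
    assert len(set(pooled)) == len(pooled), "tie correction not implemented"
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # With 3 groups the reference chi-square has 2 degrees of freedom,
    # whose survival function is exp(-h / 2).
    return h, math.exp(-h / 2)


def median_is_highest(small, others):
    """Direction-only check: is the small class's median above all others?"""
    return all(median(small) > median(o) for o in others)


h_stat, p_value = kruskal_wallis_3([class_1, class_2, class_3])
passes_kw = p_value < 0.05
class_1_highest = median_is_highest(class_1, [class_2, class_3])
print(f"H = {h_stat:.2f}, p = {p_value:.2e}, passes K-W: {passes_kw}")
print(f"Class I median highest (direction only): {class_1_highest}")
```

The second check returns a plain yes/no on direction and carries no p-value, which is why a Class I biomarker that passes it still deserves caution.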

Lowering the alpha values is definitely a sound way to make the biomarker search more stringent. However, in LEfSe, the idea is that the significant biomarkers are ranked based on the effect size ("the magnitude of the variation") rather than on the statistical significance.
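To make the distinction concrete, here is a minimal sketch, assuming each feature is summarized by a K-W p-value and an LDA score. The feature names, values, and the `select_biomarkers` helper are all hypothetical. Alpha only decides which features are kept; the ordering of the kept features comes from the effect size.

```python
# Hypothetical (feature, K-W p-value, LDA score) triples; not real results.
features = [
    ("taxon_A", 0.0400, 4.8),
    ("taxon_B", 0.0005, 3.6),
    ("taxon_C", 0.0090, 4.1),
    ("taxon_D", 0.2000, 5.0),
]


def select_biomarkers(rows, alpha, lda_threshold):
    """Keep rows passing both cutoffs, ranked by LDA score (largest first)."""
    kept = [r for r in rows if r[1] < alpha and r[2] >= lda_threshold]
    return sorted(kept, key=lambda r: r[2], reverse=True)


loose = select_biomarkers(features, alpha=0.05, lda_threshold=3.5)
strict = select_biomarkers(features, alpha=0.001, lda_threshold=3.5)
print([name for name, _, _ in loose])
print([name for name, _, _ in strict])
```

In this toy example, tightening alpha shrinks the list but does not change how the surviving features are ranked.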

Let me know if you have comments or questions

thanks

Nicola

Ryan Trexler

Aug 22, 2013, 10:33:47 AM

to lefse...@googlegroups.com

Great, thanks for the clarification. I will keep that in mind when I am analyzing the results!