Hi All,
I believe this has been discussed before, but even after reading the explanations I couldn't make much sense of what to do in my scenario.
I'm using LEfSe to analyze 4 classes, which correspond to 4 time points: 2, 21, 42, and 56 days old.
I am not using subclasses, and I set the multiclass strategy to one-against-all. I often set the LDA threshold very high (e.g., > 4) to keep only the strongest differences. In the output, I often get significant LDA scores for bacterial groups at 2 days, 21 days, 42 days, and 56 days.
So my question is twofold. First, is there a way to understand why LEfSe generated an LDA score for, say, day 2? Is that group supposed to be significantly more abundant at day 2 than at days 21, 42, and 56, or just statistically more abundant than one other time point? For example, LEfSe says that Bacteroidia has a highly significant LDA score at day 56, yet the median percentages of sequences at 2, 21, 42, and 56 days are 0.57, 37.14, 37.04, and 39.39, respectively.
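To make the two readings of that question concrete, here is a minimal sketch that compares one time point against each of the others separately (one-against-one) and against all the others pooled (one-against-all, the setting I mentioned), using the Mann-Whitney U statistic. The helper names `mann_whitney_u`, `one_against_one`, and `one_against_all` and the toy values are hypothetical; this is not how LEfSe implements its tests internally.

```python
from itertools import product
from typing import Dict, List, Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney U of x versus y: pairs with x > y count 1, ties count 0.5.

    U ranges from 0 to len(x) * len(y); values near the maximum mean
    x is almost always larger than y.
    """
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def one_against_one(groups: Dict[str, List[float]], target: str) -> Dict[str, float]:
    """U of `target` against every other group, one comparison per group."""
    return {
        other: mann_whitney_u(groups[target], values)
        for other, values in groups.items()
        if other != target
    }


def one_against_all(groups: Dict[str, List[float]], target: str) -> float:
    """U of `target` against all other groups pooled into a single sample."""
    rest = [v for other, values in groups.items() if other != target for v in values]
    return mann_whitney_u(groups[target], rest)


# Hypothetical per-sample % of sequences; NOT data from this study.
toy = {
    "2 d": [0.2, 0.5, 0.9],
    "21 d": [35.0, 37.1, 40.2],
    "56 d": [36.0, 39.4, 41.0],
}
print(one_against_one(toy, "56 d"))  # {'2 d': 9.0, '21 d': 6.0}
print(one_against_all(toy, "56 d"))  # 15.0 (out of a maximum of 18)
```

In this toy example the pooled comparison gives 15 of a possible 18, which looks like a strong "56 d is higher than the rest" signal, even though 56 d only partly separates from 21 d (6 of 9) and the near-zero day 2 samples do most of the work.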
It seems more likely that, by some calculation, day 56 was picked just because it had the highest percentage of sequences, but it doesn't look as though it would be significantly higher than days 21 and 42. I imagine day 2 is what drove the difference, so why was day 56 the only one to show a high LDA score? And how would you write this up as a result: "Bacteroidia were differentially abundant at day 56 compared with the other time points"? Does that simply mean they were most abundant at day 56?
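A quick check of that hunch on the medians I quoted. It assumes the reported class is simply the one with the highest average abundance; that is my assumption, and as far as I know LEfSe works with class means rather than medians, so this is only a rough proxy:

```python
# Median % of sequences assigned to Bacteroidia at each time point (values from this post).
medians = {"2 d": 0.57, "21 d": 37.14, "42 d": 37.04, "56 d": 39.39}

# Time point with the highest central value.
top = max(medians, key=medians.get)

# Gap between the top time point and each of the others, in percentage points.
gaps = {tp: round(medians[top] - v, 2) for tp, v in medians.items() if tp != top}

print(top)   # 56 d
print(gaps)  # {'2 d': 38.82, '21 d': 2.25, '42 d': 2.35}
```

Day 56 comes out on top, but only by about 2 percentage points over days 21 and 42, versus almost 39 over day 2.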
The second part of my question: since I'm not using subclasses, I guess I'm essentially only performing a Kruskal-Wallis test, so does the LDA score really add any value here?
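For reference, the Kruskal-Wallis step on its own boils down to something like the sketch below (pure Python, tie-averaged ranks, no tie-correction factor). The function names and toy values are hypothetical. The H statistic is compared with a chi-squared distribution with (number of classes - 1) degrees of freedom; it only tests whether at least one time point differs, and says nothing about which one or by how much.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Dict[str, List[float]]) -> float:
    """Kruskal-Wallis H statistic (without the tie-correction factor)."""
    labels = list(groups)
    pooled = [v for label in labels for v in groups[label]]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for label in labels:
        size = len(groups[label])
        rank_sum = sum(ranks[start:start + size])
        total += rank_sum ** 2 / size
        start += size
    return 12.0 * total / (n * (n + 1)) - 3 * (n + 1)


# Hypothetical per-sample % of sequences; NOT data from this study.
toy = {
    "2 d": [0.2, 0.5, 0.9],
    "21 d": [35.0, 37.1, 40.2],
    "56 d": [36.0, 39.4, 41.0],
}
print(round(kruskal_wallis_h(toy), 3))  # 5.6
```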
Sorry if this is confusing.
Let me know if I can help clarify anything.
Kind regards,
Metasaur