Is it okay to apply statistical methods (e.g., Kruskal-Wallis and others) to HUMAnN2 output (RPK values)?


Minjae Kim

Dec 20, 2019, 9:18:33 AM
to HUMAnN Users
Hello,

Since metagenomic data are compositional, I wonder whether it is okay to apply LEfSe and other statistical comparisons to the RPK values from HUMAnN2.
Or would it be better to use raw counts and normalize them with metagenomeSeq, DESeq2, or edgeR? (Is it possible to get raw counts?)
What do you recommend for statistical comparisons of HUMAnN2 output?

Thanks

Eric Franzosa

Dec 20, 2019, 11:49:41 AM
to humann...@googlegroups.com
It's not possible to get raw counts, so I'd avoid methods that strictly require those.

RPKs adjust for gene length but not sequencing depth, so you'll want to manage the latter in SOME way. One option is to normalize the RPKs to relative abundance or CPM units (e.g. via the renorm_table script).

Thanks,
Eric
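
The depth normalization Eric describes can be sketched in a few lines. This is only an illustration of the arithmetic (each sample's values rescaled to sum to 1 for relative abundance, or to one million for CPM), not the code of the renorm_table script. The function name `renormalize` and the toy table are hypothetical, and the sketch assumes a table containing only community-level features, ignoring any special or stratified rows a real HUMAnN2 table may contain.

```python
def renormalize(rpk_by_sample, units="cpm"):
    """Rescale each sample's RPK values to sum to 1 ("relab") or 1e6 ("cpm")."""
    scale = {"relab": 1.0, "cpm": 1e6}[units]
    normalized = {}
    for sample, features in rpk_by_sample.items():
        total = sum(features.values())
        # A sample whose values sum to zero cannot be rescaled; keep zeros.
        normalized[sample] = {
            feature: (value / total * scale if total > 0 else 0.0)
            for feature, value in features.items()
        }
    return normalized


# Hypothetical toy table: sampleB was sequenced ten times deeper than sampleA,
# but both have the same underlying composition.
rpk = {
    "sampleA": {"geneX": 10.0, "geneY": 30.0},
    "sampleB": {"geneX": 100.0, "geneY": 300.0},
}

cpm = renormalize(rpk, units="cpm")
relab = renormalize(rpk, units="relab")
```

After rescaling, the two toy samples become identical, which is the point of removing sequencing depth. Relative abundance and CPM differ only by a constant factor, so rank-based tests give the same result with either unit.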



--
You received this message because you are subscribed to the Google Groups "HUMAnN Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to humann-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/humann-users/8f75868a-4bd7-477b-a4e7-71d9bdb28522%40googlegroups.com.

Minjae Kim

Dec 20, 2019, 11:54:56 AM
to HUMAnN Users
Thanks for the quick reply.
But my main question was whether it is okay to apply a nonparametric method, such as the Kruskal-Wallis test, to the HUMAnN2 output values without any normalization.


Eric Franzosa

Dec 20, 2019, 12:01:07 PM
to humann...@googlegroups.com
Sorry for missing that detail. Yes, it's certainly OK to use non-parametric tests on HUMAnN2 outputs, but with the general caveat that dropping distributional assumptions tends to reduce statistical power. Also be aware that some implementations of non-parametric tests behave poorly in the presence of lots of tied values (e.g. zeroes).

Thanks,
Eric
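
To make the point about ties concrete, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic with the standard tie correction: tied values share the average of the ranks they occupy, and H is divided by 1 - sum(t^3 - t) / (n^3 - n), where t is the size of each group of tied values. The function name `kruskal_wallis_h` and the example values are hypothetical. The sketch returns only the statistic; in practice a library routine such as scipy.stats.kruskal, which also returns a p-value, would be the usual choice.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for the groups."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    counts = Counter(pooled)

    # Assign midranks: tied values share the average of the ranks they span.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t

    # Uncorrected H from the per-group rank sums.
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    ) - 3 * (n + 1)

    # Tie correction: equals 1 when there are no ties, shrinks as ties grow.
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical CPM values for one feature in three groups, with many zeroes.
control = [0.0, 0.0, 12.5, 3.1]
treated_a = [0.0, 8.4, 20.2, 15.0]
treated_b = [0.0, 0.0, 0.0, 1.2]
h_stat = kruskal_wallis_h(control, treated_a, treated_b)
```

When a feature is zero in most samples the correction factor becomes small, and if every value is identical the statistic is undefined, which is one reason heavily zero-inflated features are often filtered before testing.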




Minjae Kim

Dec 20, 2019, 12:03:42 PM
to HUMAnN Users
Thanks for the reply.
I will try it.