Normalization method recommendation for LEfSe analysis

gu71l...@gmail.com

Dec 8, 2021, 6:12:08 PM
to picrust-users
Hi all,

I've run PICRUSt on my 16S metabarcoding data, and I'd like to apply Linear Discriminant Analysis Effect Size (LEfSe), using the microbiomeMarker R package, to identify the EC functions that most likely drive the differences observed among my sample groups. What normalization procedure would you recommend applying upstream to my EC abundance table?

Thank you,

Guillaume



Gavin Douglas

Dec 9, 2021, 9:22:49 AM12/9/21
to 'cervant...@licifug.ugto.mx' via picrust-users
Hi there,

Prior to running LEfSe I would recommend that you convert your data to relative abundances.


Cheers,

Gavin


--
You received this message because you are subscribed to the Google Groups "picrust-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to picrust-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/picrust-users/85116a91-4910-456c-9f47-8fe2f2f65cd7n%40googlegroups.com.

yokmok

Dec 15, 2021, 8:01:58 AM12/15/21
to picrust-users
Hi all

Sorry for jumping into the middle of your thread.
I would like to check something about the downstream analysis.
I also generated "path_abun_unstrat_descrip.tsv" with MetaCyc pathway descriptions added, following the tutorial.
My question is: do I need to convert it to relative abundances for any of the downstream analyses (STAMP, ANCOM, ALDEx2, LEfSe, etc.)?
For example, if "N10-formyl-tetrahydrofolate biosynthesis" in sample A is 2000, is it correct to divide this number by the sum of all pathways in sample A?
Also, regarding these downstream analyses, as mentioned in the previous questions, is it correct to say that no one method is better or worse than the others?
I'm personally familiar with LEfSe, so I'm thinking of using that.

yokmok
On Thursday, December 9, 2021 at 11:22:49 PM UTC+9, Gavin Douglas wrote:

Gavin Douglas

Dec 15, 2021, 11:27:58 AM12/15/21
to picrus...@googlegroups.com
Hey Yokmok,

ANCOM and ALDEx2 expect count data as input, so to use them you could round the pathway abundances to the nearest integer rather than converting them to relative abundances. You would need to convert to relative abundances for STAMP and LEfSe, though. And yes, to convert to relative abundances you would perform the transformation you describe.
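
For readers following along, the rounding step described above can be sketched in Python. This is a minimal illustration, not part of PICRUSt2: it assumes the table has already been loaded into a dict mapping each feature ID to its per-sample abundances, and the function name `round_to_counts` and the feature IDs are hypothetical. It rounds halves upward rather than using Python's built-in `round()`, which rounds ties to the nearest even integer.

```python
import math
from typing import Dict, List


def round_to_counts(table: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Round predicted abundances to the nearest integer (ties round up).

    Hypothetical helper: `table` maps a feature ID (e.g. an EC number or a
    MetaCyc pathway) to its abundance in each sample, in column order.
    Assumes non-negative abundances, for which floor(x + 0.5) gives
    conventional half-up rounding.
    """
    return {
        feature: [int(math.floor(value + 0.5)) for value in values]
        for feature, values in table.items()
    }


if __name__ == "__main__":
    # Made-up pathway IDs; the first-column values are the examples from this thread.
    example = {
        "PWY-A": [7386.12973, 12.5],
        "PWY-B": [8691.92432, 0.49],
    }
    print(round_to_counts(example))
    # {'PWY-A': [7386, 13], 'PWY-B': [8692, 0]}
```

The rounded table keeps the same sample columns, so it can then be passed to a count-based tool in place of the original values.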

It's controversial which methods are better for differential abundance testing in general. I wrote this preprint highlighting that these methods can produce extremely different results, which is worrying: https://www.biorxiv.org/content/10.1101/2021.05.10.443486v1

Also, in the PICRUSt2 manuscript we point out that this is also true for a few tools applied specifically to predicted metagenome data. In the preprint I linked to, we concluded that the best idea is to use a consensus approach and to report significant hits that are consistently found to be differentially abundant by at least a few different tools.
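
One way to apply this consensus idea, once each tool has been run separately, is to count how many tools flag each feature. The following is a minimal Python sketch, assuming you have already collected each tool's significant features into a list. The function name, the example tool results, and the `min_tools=2` threshold are all hypothetical choices, not something prescribed in the preprint.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_hits(hits_by_tool: Dict[str, Iterable[str]], min_tools: int = 2) -> List[str]:
    """Return features called significant by at least `min_tools` tools.

    `hits_by_tool` maps a tool name to the feature IDs it reported as
    differentially abundant. Each tool counts at most once per feature.
    """
    counts = Counter()
    for hits in hits_by_tool.values():
        counts.update(set(hits))  # set() avoids double-counting within one tool
    return sorted(feature for feature, n in counts.items() if n >= min_tools)


if __name__ == "__main__":
    # Hypothetical significant features reported by three tools.
    hits = {
        "ALDEx2": ["EC:1.1.1.1", "EC:2.7.7.7"],
        "ANCOM": ["EC:2.7.7.7", "EC:3.2.1.23"],
        "LEfSe": ["EC:2.7.7.7", "EC:3.2.1.23", "EC:4.1.1.39"],
    }
    print(consensus_hits(hits, min_tools=2))
    # ['EC:2.7.7.7', 'EC:3.2.1.23']
```

Raising `min_tools` makes the consensus stricter; with three tools, `min_tools=3` keeps only features that every tool agrees on.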


All the best,

Gavin


犬飼庸介

Dec 15, 2021, 4:45:08 PM12/15/21
to picrus...@googlegroups.com
Dear Gavin

Thanks for your reply
"ANCOM and ALDEX2 expect count data as input, so to use them you could round the pathway abundances to the nearest integer " 
For example:
7386.12973 → 7386
8691.92432 → 8692
Is this what you mean?
I'm not familiar with statistics or programming, and I've managed to analyze my data with QIIME 2 and PICRUSt2 through trial and error,
so I apologize for asking such a stupid question.

yokmok

On Thu, Dec 16, 2021 at 1:28 AM Gavin Douglas <gavinm...@gmail.com> wrote:

Gavin Douglas

Dec 15, 2021, 4:48:05 PM12/15/21
to 'cervant...@licifug.ugto.mx' via picrust-users

Alan Hudson

Jan 14, 2022, 4:31:55 PM1/14/22
to picrust-users
Hi,

I may be being a bit dim, but I have read a number of threads mentioning that normalisation of PICRUSt2 output is necessary to correctly visualise functional trends among samples/experimental groups. However, it is still unclear to me which aspect of the PICRUSt2 output needs to be normalised and what the best method for doing this is.

I read in another thread that the "pred_metagenome_unstrat.tsv" file is the most appropriate for visualising differences among groups. I assume that some aspect of this file would need to be normalised by sample read count to make the samples comparable. I am just confused about which data variable to normalise and the most appropriate method/software for doing so.

Any advice on how best to do this would be very gratefully received!

Alan

Gavin Douglas

Jan 25, 2022, 3:53:38 PM1/25/22
to picrus...@googlegroups.com
Sorry for not responding to this!

In terms of which statistical tests to use for differential abundance testing, there are no easy answers, unfortunately. See here for a discussion on this: https://github.com/picrust/picrust2/wiki/Frequently-Asked-Questions#how-should-i-analyze-the-picrust2-output . Currently I would run several approaches, including ANCOM, and make sure that significant features were robust to the choice of tool.

The output needs to be normalized for the same reason that varying read depth across samples needs to be normalized in some way. For STAMP, for instance, I believe most plots show relative abundances. If you were to run a compositionally aware tool like ANCOM, you could instead round the abundance data to the nearest integer. I believe that STAMP automatically converts data to relative abundances, so in that case you wouldn't need to do anything, but make sure to check that. The input format is simply a tab-delimited file, so it's very similar to the PICRUSt2 output; you may just need to change the headers to match the input file description in the STAMP manual.

Just to be clear: if you wanted to transform to relative abundances, this would be done for each sample separately, i.e., for each column of the output table.

I would personally use custom R or Python code for visualization so I don’t have any other recommendations for specific software.


All the best,

Gavin



Alan Hudson

Jan 25, 2022, 8:21:00 PM1/25/22
to picrust-users
Thanks so much for replying, and thanks for all the effort you put into making this forum so useful!

Just to clarify, for normalisation I should use the "pred_metagenome_unstrat.tsv" (EC, KO) and the "path_abun_unstrat.tsv" (MetaCyc) files? To normalise the results, I should sum all the values in a particular column (corresponds to a single sample) and then divide each entry in the column by this total and then repeat this process for all the columns/samples in the data set, is that correct?

Gavin Douglas

Jan 26, 2022, 7:53:50 AM1/26/22
to 'cervant...@licifug.ugto.mx' via picrust-users
Hi Alan,

Yes, those are the correct files, and that is how you would convert to relative abundances (you could multiply by 100 to convert to percentages).
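
The column-wise conversion confirmed above can be sketched in Python using only the standard library. This is a minimal sketch, not part of PICRUSt2: it assumes the layout described in this thread (feature IDs in the first column, one column per sample, tab-delimited) and that any added description column has been removed first. The function names `read_unstrat_table` and `to_relative_abundance` and the example table are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_unstrat_table(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read an unstratified table such as pred_metagenome_unstrat.tsv.

    Assumes the first column holds the feature ID and every other column is
    a sample. Drop any added description column before using this.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {row[0]: [float(x) for x in row[1:]] for row in reader if row}
    return samples, table


def to_relative_abundance(table: Dict[str, List[float]], percent: bool = False) -> Dict[str, List[float]]:
    """Divide each value by its sample's (column's) total; optionally scale by 100."""
    n_samples = len(next(iter(table.values()), []))
    totals = [sum(values[j] for values in table.values()) for j in range(n_samples)]
    scale = 100.0 if percent else 1.0
    return {
        # Columns summing to zero are left as zeros to avoid dividing by zero.
        feature: [scale * v / totals[j] if totals[j] > 0 else 0.0
                  for j, v in enumerate(values)]
        for feature, values in table.items()
    }


if __name__ == "__main__":
    # Made-up two-sample table in the same tab-delimited layout.
    tsv = "function\tS1\tS2\nEC:1.1.1.1\t30\t10\nEC:2.7.7.7\t70\t30\n"
    samples, table = read_unstrat_table(io.StringIO(tsv))
    print(samples)                       # ['S1', 'S2']
    print(to_relative_abundance(table))  # {'EC:1.1.1.1': [0.3, 0.25], 'EC:2.7.7.7': [0.7, 0.75]}
```

To read a real file, you could pass `open("pred_metagenome_unstrat.tsv", newline="")` instead of the `io.StringIO` example. Each column of the result sums to 1 (or to 100 with `percent=True`).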


Cheers,

Gavin

Alan Hudson

Feb 1, 2022, 9:32:00 PM2/1/22
to picrust-users
Thanks again!

Marwa Tawfik

Feb 22, 2022, 12:32:20 PM2/22/22
to picrust-users
Hi Gavin,
Do I need to transform the PICRUSt2 outputs into relative abundances before using them in STAMP? I did for LEfSe.
Cheers