Questions about per-sample normalization and low-abundance taxa for LEfSe

geebzbz

Nov 21, 2014, 2:31:28 PM

to lefse-users

Dear LEfSe Staff,

I have a question about LEfSe. It may be a very basic one.

Take a taxa abundance file A, e.g. 1000 taxa x 100 samples. If I delete some low-abundance taxa, e.g. keeping only the top 500 taxa, then the LDA results differ from those of the full table. I do not know whether this is normal.

Also, I always set the field "Per-sample normalization of the sum of the values to 1M (recommended when very low values are present)" to "NO" in LEfSe Galaxy, because I did not understand what this normalization is, nor how to apply it to my data. Can you give any suggestions on this?

Thanks for your response. I tried to post this to the Google Group, but it failed with a message saying that I am not permitted to post.

Bangzhou Zhang

Nicola Segata

Nov 24, 2014, 9:38:43 AM

to geebzbz, lefse-users

Hi Bangzhou,

The normalization is just dividing each single abundance in each sample by the sum of abundances in the sample and multiplying the obtained fraction by 1M. This is done consistently with the taxonomic structure of the problem (i.e. the normalization is performed w.r.t. the sum of all leaf nodes). If the samples are already normalized (e.g. they already are percentages or fractions rather than absolute OTU counts) then the LEfSe normalization will not change the values (normalizing something already normalized does not introduce any changes) but just multiply them by 1M.
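A minimal sketch of the normalization described above, not LEfSe's actual code. It assumes a hypothetical table stored as a dict from taxon name to per-sample abundances, with taxonomy levels joined by "|", and it treats a taxon as a leaf when no other taxon name extends it. The function name `normalize_to_1m` is made up for illustration.

```python
def normalize_to_1m(table, sep="|", target=1_000_000.0):
    """Scale every sample so that its leaf taxa sum to `target`.

    table: dict mapping taxon name -> list of abundances (one per sample).
    Assumption: hierarchy levels in a taxon name are joined by `sep`.
    """
    names = list(table)
    # A leaf is a taxon that is not an ancestor of any other taxon.
    leaves = [n for n in names
              if not any(m.startswith(n + sep) for m in names)]
    n_samples = len(next(iter(table.values())))
    # Per-sample total over leaf taxa only, so parent clades are not double counted.
    totals = [sum(table[leaf][j] for leaf in leaves) for j in range(n_samples)]
    return {
        name: [v / totals[j] * target if totals[j] else 0.0
               for j, v in enumerate(values)]
        for name, values in table.items()
    }


# Absolute counts: two samples, one parent clade and its two leaves.
counts = {
    "Bacteria": [30, 50],
    "Bacteria|Firmicutes": [10, 40],
    "Bacteria|Bacteroidetes": [20, 10],
}
# The same data already expressed as per-sample fractions.
fractions = {
    "Bacteria": [1.0, 1.0],
    "Bacteria|Firmicutes": [10 / 30, 0.8],
    "Bacteria|Bacteroidetes": [20 / 30, 0.2],
}

from_counts = normalize_to_1m(counts)
from_fractions = normalize_to_1m(fractions)
```

Under these assumptions, `from_counts` and `from_fractions` agree up to floating-point rounding, which matches the point that fractions are only rescaled by 1M.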

I hope this helps

thanks

Nicola

gee...@gmail.com

Nov 24, 2014, 1:19:03 PM

to lefse...@googlegroups.com, gee...@gmail.com, nicola...@unitn.it

Hi Nicola,

Thanks for your explanation. It makes this quite clear. So I can just choose "Yes" for it, and the normalization will be performed correctly no matter whether the input is absolute OTU counts or fractions. I will try it now. Thanks again.

Bangzhou

On Monday, November 24, 2014 at 9:38:43 AM UTC-5, Nicola Segata wrote:

NN

Sep 14, 2018, 7:47:57 PM

to LEfSe-users

Thank you for explaining this, Nicola.

I am getting very different histograms from my relative abundance (or raw data) table compared to the '--output_table' produced by the 'format_input.py' script with 1M normalization.

I looked at the '--output_table' but can't figure out how to obtain the original data based on your (quoted) explanation of how normalization is performed. I am attaching the relative abundance table I provided as input for 'format_input.py', along with the generated '--output_table'.

I would really appreciate it if you or someone could help me understand how my output_table was transformed.

Many thanks.

perm_l6.xlsx

perm_output_table