Contributional diversity analysis (Figure 2a in the HUMAnN2 paper)


Jessie HOU

Apr 18, 2019, 4:13:41 AM
to HUMAnN Users
Hello,

I am following the HUMAnN2 paper to do some contributional diversity analysis, but I am confused about the "Within- and between-sample contributional diversity for core metabolic pathways" shown in Fig. 2a. I have already obtained the core pathways in my data (i.e. 65 core skin microbiome MetaCyc pathways) using the thresholds mentioned in the paper, but I do not know what to do next for the contributional diversity analysis. Does anyone know how to use the stratified pathabundance table to obtain within- and between-sample contributional diversity values for each core pathway?

[Attached image: contributional diversity.png]

In addition, in Fig 2, the authors said "Stars indicate background species-level whole-community diversity". I still don't know what this means. Finally, what is the threshold for low/high within, low/high between contributional diversity? Is it an arbitrary threshold? Any suggestion would be greatly appreciated. Thanks in advance.

Best regards,
Jessie

Eric Franzosa

Apr 18, 2019, 11:49:58 AM
to humann...@googlegroups.com
Hi Jessie,

The trick to contributional diversity analysis is taking all stratifications of a given function and then performing ecological diversity analyses on the stratified values as if they were species abundances within a community. The star values in the plot are the ACTUAL ecological diversity indices of the community calculated on species abundances (i.e. traditional community diversity measures) provided for reference. To be conservative in the paper, we ignored functions that had a high percentage of unclassified copies, as the species contributions to "unclassified" are not clear. The "high" and "low" cutoffs were arbitrary at 0.5 in this figure - there is probably a more principled way to select these (perhaps median values?).
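
A minimal sketch of the within-sample part of this idea: the per-species stratifications of one pathway in one sample are treated as a community and a diversity index is computed on them. The Gini-Simpson index, the function name `gini_simpson`, and the toy abundances are assumptions chosen for illustration; any standard alpha diversity index could be substituted.

```python
import math

def gini_simpson(abundances):
    """Gini-Simpson index (1 - sum of squared proportions) of non-negative abundances."""
    total = sum(abundances)
    if total <= 0:
        # No stratified signal for this pathway in this sample.
        return math.nan
    return 1.0 - sum((a / total) ** 2 for a in abundances)

# Hypothetical stratified abundances of one pathway in one sample,
# one value per contributing species ("unclassified" left out).
one_dominant = [0.009, 0.0005, 0.0005]
evenly_shared = [0.002, 0.002, 0.002, 0.002]

print("one dominant contributor:", gini_simpson(one_dominant))
print("evenly shared:", gini_simpson(evenly_shared))
```

A pathway supplied mostly by one species scores low, while a pathway shared evenly among several species scores high, which is the same behaviour the index has on ordinary species abundances.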

Thanks,
Eric


Jinpao Hou

Apr 19, 2019, 2:29:11 AM
to humann...@googlegroups.com
Hello Eric,

Thank you so much for your clear explanation. I can calculate the alpha and beta diversity for each of the core pathways as you suggested. By the way, in Fig. 2a of the HUMAnN2 paper, each pathway boils down to one point in the figure. I could use the median alpha diversity across all samples in a group (i.e. plotted separately for the case and control groups) as the x coordinate, but what should the y coordinate be? Do you think I could use the maximum value from the distance matrix (e.g. Bray-Curtis dissimilarity)? Many thanks again!

Best regards,
Jessie

Eric Franzosa

Apr 19, 2019, 11:42:44 AM
to humann...@googlegroups.com
Hi Jessie,

Good point - I confirmed that we did this in the HUMAnN2 paper by averaging: "Diversity values for a pathway computed over samples (or sample pairs) were summarized by averaging." Medians would also be a good choice. I would avoid taking the max as this will be very sensitive to outliers.
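
The averaging step can be sketched as follows: within-sample values are averaged over samples and between-sample values over sample pairs, giving one (x, y) point per pathway. The Gini-Simpson and Bray-Curtis choices, the helper names, and the toy profiles are assumptions for illustration; `statistics.median` can be swapped in for `mean`.

```python
import math
from itertools import combinations
from statistics import mean

def gini_simpson(abundances):
    """Within-sample diversity of one pathway's species contributions."""
    total = sum(abundances)
    if total <= 0:
        return math.nan
    return 1.0 - sum((a / total) ** 2 for a in abundances)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two contribution vectors (same species order)."""
    denominator = sum(u) + sum(v)
    if denominator <= 0:
        return math.nan
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator

def summarize_pathway(profiles):
    """Return (mean within-sample, mean between-sample) diversity for one pathway."""
    within = mean(gini_simpson(p) for p in profiles.values())
    between = mean(
        bray_curtis(profiles[a], profiles[b]) for a, b in combinations(profiles, 2)
    )
    return within, between

# Hypothetical relative contributions of three species to one pathway.
profiles = {
    "S1": [0.5, 0.5, 0.0],
    "S2": [0.5, 0.5, 0.0],
    "S3": [0.0, 0.0, 1.0],
}
x, y = summarize_pathway(profiles)
print(f"x (within) = {x:.3f}, y (between) = {y:.3f}")
```

With the maximum instead of the mean, the single S3 sample alone would set y to 1 for this pathway, which illustrates the sensitivity to outliers.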

Thanks,
Eric


Jinpao Hou

Apr 19, 2019, 10:04:11 PM
to humann...@googlegroups.com
Hi Eric,

I see. Thank you for your suggestions. 

Best regards,
Jessie

shriram patel

May 1, 2019, 11:23:31 AM
to HUMAnN Users
Hi,

Sorry for hijacking the thread, but I have a follow-up question to the answers provided above. 

For a "core pathway" to be identified, it should be present in >75% of samples and should not have unclassified species explaining >25% of the total pathway contribution. But the stratified HUMAnN2 pathway abundances do not always sum to the original pathway abundance. In that case, how should I proceed with the "core pathway" diversity analysis, especially when excluding pathways with a large contribution from unclassified species?

To make this easier to understand, please see the example below. For this pathway, Samp2 has no contribution from "unclassified" species, yet the known species explain only 36% (sum: 0.006) of the total pathway abundance in that sample.

A) Should I still flag this pathway as "core" for that particular sample, despite the low abundance attributed to known species?
B) Or, rather than relying on the original abundance of that pathway (0.0166), should I use the sum of the stratified abundances of the known species (0.006)?


Pathway Samp1 Samp2 Samp3 Samp4
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis 0.00765589 0.0166637 0.00411085 0.00948822
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Akkermansia.s__Akkermansia_muciniphila 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_barnesiae 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_caccae 9.45E-05 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_cellulosilyticus 0 0 3.30E-05 0.000710572
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_clarus 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_coprocola 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_coprophilus 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_faecis 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_finegoldii 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_fluxus 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_fragilis 0 0.00166574 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_intestinalis 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_massiliensis 0.000471696 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_ovatus 0.00013764 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_sp_1_1_6 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_sp_2_1_22 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_stercoris 0.00052892 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_thetaiotaomicron 2.03E-05 0 0 0.000176184
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_uniformis 0.000932087 0.00420038 0.000214384 0.000721061
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Bacteroides.s__Bacteroides_xylanisolvens 0 0 5.10E-05 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Escherichia.s__Escherichia_coli 0 0.000164541 4.95E-05 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Parabacteroides.s__Parabacteroides_distasonis 0 0 3.59E-05 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Parabacteroides.s__Parabacteroides_goldsteinii 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Parabacteroides.s__Parabacteroides_johnsonii 0 0 0 3.43E-05
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Parabacteroides.s__Parabacteroides_merdae 5.92E-05 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Ralstonia.s__Ralstonia_pickettii 0 0 0 1.63E-05
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Streptococcus.s__Streptococcus_gordonii 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Streptococcus.s__Streptococcus_parasanguinis 0 0 0 0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|unclassified 0.000423685 0 0.000542891 4.52E-05

Sum of known-species contributions in Samp2 = 0.006
Thank you very much for your inputs and time,

Regards
Shriram


Eric Franzosa

May 1, 2019, 2:43:23 PM
to humann...@googlegroups.com
This is a good observation. In practice, we required that the unclassified abundance be no greater than 25% of the summed, stratified abundances. Since the sum of the stratified abundances is usually less than the community total abundance for pathways, this also means that the unclassified abundance is <25% of community total abundance.
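
A minimal sketch of this filter, applied to the nonzero values of the 1CMET2-PWY example table earlier in the thread. It assumes that the "summed, stratified abundances" include the unclassified row itself; the names `unclassified_fraction` and `MAX_UNCLASSIFIED_FRACTION` are hypothetical.

```python
import math

# Nonzero known-species contributions to 1CMET2-PWY, copied from the example table.
known = {
    "Samp1": [9.45e-05, 0.000471696, 0.00013764, 0.00052892,
              2.03e-05, 0.000932087, 5.92e-05],
    "Samp2": [0.00166574, 0.00420038, 0.000164541],
    "Samp3": [3.30e-05, 0.000214384, 5.10e-05, 4.95e-05, 3.59e-05],
    "Samp4": [0.000710572, 0.000176184, 0.000721061, 3.43e-05, 1.63e-05],
}
unclassified = {
    "Samp1": 0.000423685, "Samp2": 0.0, "Samp3": 0.000542891, "Samp4": 4.52e-05,
}

MAX_UNCLASSIFIED_FRACTION = 0.25  # cutoff described in the reply above

def unclassified_fraction(known_values, unclassified_value):
    """Unclassified share of the summed stratified abundance (assumed to include it)."""
    total = sum(known_values) + unclassified_value
    if total <= 0:
        return math.nan
    return unclassified_value / total

passes = {}
for sample, values in known.items():
    fraction = unclassified_fraction(values, unclassified[sample])
    passes[sample] = fraction <= MAX_UNCLASSIFIED_FRACTION
    print(f"{sample}: unclassified fraction = {fraction:.3f}, passes = {passes[sample]}")
```

Under this assumption Samp2 passes the filter even though its known species account for only about 36% of the community total, because the denominator is the stratified sum rather than the community total.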

I'll also take this opportunity to emphasize that these _specific_ filters are not fundamental to contributional diversity analysis. You can perform the analysis on non-core pathways (though I would exclude samples where the pathway was "absent" by some reasonable definition) and you can perform the analysis treating "unclassified" as a single clade (noting that this will tend to underestimate diversity measures).
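
To illustrate why treating "unclassified" as a single clade tends to underestimate diversity, the sketch below compares a fully resolved set of contributors with the same community after two of them are lumped into one bin. The Gini-Simpson index and the toy proportions are assumptions for illustration.

```python
def gini_simpson(abundances):
    """Gini-Simpson index of a vector of positive abundances."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)

# Hypothetical pathway with four equal contributors, all resolved to species.
resolved = [0.25, 0.25, 0.25, 0.25]
# Same pathway if two of those contributors were only seen as "unclassified".
lumped = [0.25, 0.25, 0.50]

print("resolved:", gini_simpson(resolved))
print("lumped:  ", gini_simpson(lumped))
```

Merging distinct contributors into one bin can only keep the index the same or lower it, so the lumped value is a conservative estimate of the true within-sample diversity.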

Thanks,
Eric


shriram patel

May 1, 2019, 5:10:23 PM
to HUMAnN Users
Perfect. Thank you for your suggestions. I appreciate your time.
Best
Shriram

shriram patel

May 9, 2019, 12:19:45 PM
to HUMAnN Users
Hi Eric, 

I have a follow-up question to this thread, so to keep things organized I am including it here (rather than opening a new thread).

Regarding contributional diversity, I have observed that some of the "core functions" have only 2-3 stratifications (species-level contributions). In most samples one species contributes the majority, the others occur in only a few samples, and some samples have no species contributions at all. What would be the optimal filtering in that case? For between-sample diversity, comparing a sample with no species contributions against a sample with species contributions will always give the maximum beta diversity of 1 (no species in common). By "core function" I mean the criteria used in the HUMAnN2 methods paper.

And what are your thoughts on filtering by the number of stratifications per function, in addition to the criteria defined above (say, keeping functions with more than 10 contributing species)?

Thank You for your time
Shriram

Eric Franzosa

May 9, 2019, 4:25:55 PM
to humann...@googlegroups.com
Thanks for replying on the related thread - it's helpful for me in answering questions.

I don't think you want to filter on # of stratifications, since that's a component of within-sample diversity. It's biologically interesting if a function is supplied by one species in one sample and a mix of 10 in another.

I'm not sure how that differs from the first part of your question? If the issue is that in some samples a function is stratified while in others it isn't, I might treat the unstratified samples as "unknown" for the purposes of the analysis (and not include them in averaging, for example). This is what we did for the samples with a high unclassified fraction in the HUMAnN2 paper.
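
A minimal sketch of the "treat as unknown" approach: samples with no stratified signal for a function are dropped before averaging, so they neither drag down the within-sample mean nor create artificial between-sample distances of 1. The helper names and toy profiles are hypothetical, and Bray-Curtis is an assumed choice of dissimilarity.

```python
import math
from itertools import combinations
from statistics import mean

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two contribution vectors (same species order)."""
    denominator = sum(u) + sum(v)
    if denominator <= 0:
        return math.nan
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator

def mean_between(profiles, skip_unknown=True):
    """Average pairwise dissimilarity, optionally skipping unstratified samples."""
    usable = {
        name: p for name, p in profiles.items()
        if not skip_unknown or sum(p) > 0
    }
    pairs = list(combinations(usable, 2))
    if not pairs:
        return math.nan
    return mean(bray_curtis(usable[a], usable[b]) for a, b in pairs)

# Hypothetical species contributions to one function; S3 has no stratified signal.
profiles = {
    "S1": [0.6, 0.4],
    "S2": [0.5, 0.5],
    "S3": [0.0, 0.0],
}

print("skipping unknown samples:", mean_between(profiles))
print("including them:", mean_between(profiles, skip_unknown=False))
```

When the unstratified sample is included, every pair involving it contributes a distance of 1, which inflates the average; skipping it leaves only the comparison between samples whose contributors are actually known.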

Thanks,
Eric




shriram patel

May 10, 2019, 11:51:26 AM
to HUMAnN Users
Hi Eric, 

Now that makes much more sense to me. Thank you very much for your timely feedback. 

Have a lovely weekend ahead.

Regards
Shriram