humann2's pathway abundance calculations

Billy Taj

Oct 8, 2019, 11:26:09 AM

to HUMAnN Users

I'm trying to test out and understand HUMAnN2's pathway abundance output.

I want to see how well HUMAnN2 performs at metabolic pathway analysis. To do this, I have a list of ECs from an external source, and I want to run it through HUMAnN2's pathway abundance analysis.

Each EC in the list has an associated abundance score. However, the documentation on how the pathway abundance output is computed is murky.

I have the EC -> MetaCyc reaction (A) and MetaCyc reaction -> pathway (B) maps.

I was originally taking all entries in map B, and summing up all EC abundances associated with the pathway, and calling this result the "pathway abundance". Is this wrong?

According to the literature, I should be doing something different, but I'm not sure what:
-> reconstruct the pathway, and check whether the reactions I've got satisfy any of the pathways
-> then, using the abundances of the reactions, take the pathway abundance to be the lowest reaction abundance

e.g. Pathway 1 has reactions A and B. A is 15, B is 5. Therefore, pathway 1's abundance is 5.
Additionally, suppose pathway 1 has an alternate definition with reactions C and D, where C is 10 and D is 15.
Pathway 1's real abundance is then 15: 5 (from A and B) + 10 (from C and D).

If I could just insert the ECs into HUMAnN2's existing workflow, this should solve my problems, and I can avoid reverse-engineering the code.

Does anyone have an opinion, or experience in the matter?

Eric Franzosa

Oct 8, 2019, 3:00:46 PM

to humann...@googlegroups.com

Some responses inline below.

====

> I was originally taking all entries in map B, and summing up all EC abundances associated with the pathway, and calling this result the "pathway abundance". Is this wrong?

This procedure will be very prone to false positives. You will be assigning a non-zero abundance to pathways with many reactions when only one or a few were detected. It would be analogous to seeing one gene in a community from a species X and assigning a non-zero abundance to species X. If X is there, we should see a bunch of its genes.
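To make the false-positive risk concrete, here is a minimal Python sketch. The pathway, reaction names, and abundance are invented for illustration, and `summed_abundance` and `detected_fraction` are hypothetical helpers, not part of HUMAnN2.

```python
# Hypothetical pathway with five reactions; only one of them was detected.
pathway_reactions = ["RXN-1", "RXN-2", "RXN-3", "RXN-4", "RXN-5"]
reaction_abundance = {"RXN-3": 40.0}  # made-up abundance for illustration


def summed_abundance(reactions, abundance):
    """Naive score: add up whatever reaction abundances are present."""
    return sum(abundance.get(r, 0.0) for r in reactions)


def detected_fraction(reactions, abundance):
    """Fraction of the pathway's reactions with a non-zero abundance."""
    detected = sum(1 for r in reactions if abundance.get(r, 0.0) > 0)
    return detected / len(reactions)


naive = summed_abundance(pathway_reactions, reaction_abundance)
coverage = detected_fraction(pathway_reactions, reaction_abundance)
print(f"summed abundance: {naive}, fraction of reactions detected: {coverage}")
```

The summed score is non-zero even though four of the five reactions are missing, which is the situation described above.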

> According to the literature, I should be doing something different, but I'm not sure what:
> -> reconstruct the pathway, and check whether the reactions I've got satisfy any of the pathways
> -> then, using the abundances of the reactions, take the pathway abundance to be the lowest reaction abundance
>
> e.g. Pathway 1 has reactions A and B. A is 15, B is 5. Therefore, pathway 1's abundance is 5.
> Additionally, suppose pathway 1 has an alternate definition with reactions C and D, where C is 10 and D is 15.
> Pathway 1's real abundance is then 15: 5 (from A and B) + 10 (from C and D).

Indeed, these are the sorts of computations involved in real-world pathway reconstruction: optimizing over sub-pathways and (typically) being conservative about pathway abundance by focusing on the minimum (rather than average or max) abundance of reactions.
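The arithmetic in the quoted example can be sketched in a few lines of Python. This only mirrors the example from this thread (minimum within each alternative definition, sum across alternatives); it is an assumption-laden illustration, not HUMAnN2's actual implementation, and `pathway_abundance` is a hypothetical helper.

```python
def pathway_abundance(alternatives, abundance):
    """Sum, over alternative definitions, of the minimum reaction abundance.

    alternatives: list of reaction lists, one per alternative definition.
    abundance: dict mapping reaction ID to abundance; missing reactions count as 0.
    """
    total = 0.0
    for reactions in alternatives:
        # A definition is limited by its least abundant reaction.
        total += min(abundance.get(r, 0.0) for r in reactions)
    return total


# Numbers taken from the example in the thread.
abundance = {"A": 15, "B": 5, "C": 10, "D": 15}
pathway_1 = [["A", "B"], ["C", "D"]]

result = pathway_abundance(pathway_1, abundance)  # 5 (A, B) + 10 (C, D)
print(result)
```

Because a missing reaction counts as zero here, a definition with any undetected reaction contributes nothing, which is the conservative behavior the minimum is meant to give.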

> If I could just insert the ECs into HUMAnN2's existing workflow, this should solve my problems, and I can avoid reverse-engineering the code.

You can do this. See this section of the manual:

https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-custom-pathways-database

The short version is that if you already have EC/reaction abundances, you can give HUMAnN2 that file via --input and a set of pathway definitions via --pathways-database. If you have raw gene abundances, you can give HUMAnN2 that file via --input and then two files to --pathways-database "$A,$B", where $A is a mapping from genes to reactions and $B is the set of pathway definitions (as referenced in my first example).

Hope this helps!

Thanks,

Eric

Eric Franzosa

Oct 9, 2019, 1:28:13 PM

to humann...@googlegroups.com

Using custom pathways, a sample call might look like (note the comma):

humann2 --input $GENE_ABUNDANCES.tsv --output . --pathways-database $REACTION_GENE_MAP,$PATHWAY_REACTION_MAP

Or, if you already have reaction (or EC) abundance:

humann2 --input $REACTION_ABUNDANCES.tsv --output . --pathways-database $PATHWAY_REACTION_MAP

In these cases, HUMAnN2 knows to proceed right to pathway quantification based on the format of the input files (TSV). The file "metacyc_pathways_structured_filtered" you cite would be an example of a $PATHWAY_REACTION_MAP file (as referenced above).
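If you drive HUMAnN2 from a script, the two calls above could be assembled along these lines. This is a sketch only: the file names are placeholders, `humann2_command` is a hypothetical helper, and it uses just the three flags mentioned in this thread.

```python
import shlex


def humann2_command(input_tsv, output_dir, pathway_map, gene_reaction_map=None):
    """Build the humann2 argument list for a custom pathways database."""
    if gene_reaction_map is not None:
        # Gene-level input: both maps go into one comma-separated value,
        # gene-to-reaction map first, then the pathway definitions.
        database = f"{gene_reaction_map},{pathway_map}"
    else:
        # Reaction (or EC) level input: only the pathway definitions are needed.
        database = pathway_map
    return [
        "humann2",
        "--input", input_tsv,
        "--output", output_dir,
        "--pathways-database", database,
    ]


cmd = humann2_command("reaction_abundances.tsv", ".", "pathway_reaction_map.tsv")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems around the comma-separated value.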

I believe non-key reactions can contribute to a pathway's abundance when present but don't adversely affect the abundance when absent.

Thanks,

Eric

On Tue, Oct 8, 2019 at 5:47 PM Billy Taj <bill...@gmail.com> wrote:

> There is also a portion of the documentation that describes the format of the pathway database, but I am confused by the text.
>
> If a minus sign denotes an item that is not a "key item" in the path, why include it in the path definitions? What criteria does it satisfy?

--
You received this message because you are subscribed to the Google Groups "HUMAnN Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to humann-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/humann-users/495549b4-7614-4ad1-b7b4-90cccc9a5bd7%40googlegroups.com.

Billy Taj

Oct 9, 2019, 4:28:29 PM

to HUMAnN Users

On Wednesday, 9 October 2019 13:28:13 UTC-4, Eric Franzosa wrote:

> Or, if you already have reaction (or EC) abundance:
>
> humann2 --input $REACTION_ABUNDANCES.tsv --output . --pathways-database $PATHWAY_REACTION_MAP

What is the format of $REACTION_ABUNDANCES.tsv? I don't see it in the manual. Is it two columns, reaction name and abundance? Does the table have a header? Does it need to have specific names?

Billy Taj

Oct 16, 2019, 10:14:22 AM

to HUMAnN Users

On Wednesday, 9 October 2019 13:28:13 UTC-4, Eric Franzosa wrote:

> humann2 --input $REACTION_ABUNDANCES.tsv --output . --pathways-database $PATHWAY_REACTION_MAP

What is the format of $REACTION_ABUNDANCES.tsv? I don't see it in the manual. Is it two columns, reaction name and abundance? Does the table have a header? Does it need to have specific names? Does anyone have an answer?

Eric Franzosa

Oct 16, 2019, 11:47:10 AM

to humann...@googlegroups.com

It's a standard HUMAnN2 abundance file: two tab-separated columns with a header row. The first column holds the (stratified) feature IDs and the second holds the abundances. The format is the same as a genefamilies.tsv file from any HUMAnN2 run. If you want to see exactly what this looks like for reactions, you can regroup gene family abundances to reactions using the humann2_regroup_table script.
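As a rough illustration of that layout, the following Python sketch writes a two-column, tab-separated table with a header row. The header text, reaction IDs, and the stratified ID after the "|" are assumptions made up for this example; compare against a real genefamilies.tsv or a regrouped table before relying on any of them.

```python
import csv
import io


def write_abundance_table(rows, handle, header=("# Reaction", "sample_Abundance")):
    """Write (feature ID, abundance) pairs as a two-column TSV with a header row.

    The default header is a placeholder, not a name HUMAnN2 is known to require.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for feature_id, value in rows:
        writer.writerow([feature_id, value])


# Invented reaction IDs; the "|" suffix imitates a stratified (per-species) row.
rows = [
    ("RXN-EXAMPLE-1", 15.0),
    ("RXN-EXAMPLE-1|g__Genus.s__Species", 15.0),
    ("RXN-EXAMPLE-2", 5.0),
]

buffer = io.StringIO()
write_abundance_table(rows, buffer)
table_text = buffer.getvalue()
print(table_text)
```

Swapping the `io.StringIO` buffer for `open("reaction_abundances.tsv", "w", newline="")` would write the same table to disk.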

Thanks,

Eric
