Number of pathway report by PICRUSt2 between abundance report and coverage report?

738 views
Skip to first unread message

Preecha Patumcharoenpol

unread,
Nov 20, 2018, 8:44:41 PM11/20/18
to picrust-users
Hi all,

I ran PICRUSt2 (version: 5e4cc67341300d62a2b66235f1b313f6f57d0f7a) on my data, by using pipeline script with "--stratified" option. Everything ran fine, with some warning from pandas. However; I couldn't figure it out why a number of pathway report in "path_abun_unstrat.tsv" and file "path_cov_unstrat.tsv" are different?  They list 347 and 153 pathways.

Even weirder is that in "strat" files, both abundance/coverage report the same number, only 313 pathways.


Thank you in advance,
Preecha

Gavin Douglas

unread,
Nov 21, 2018, 9:09:56 AM11/21/18
to picrus...@googlegroups.com
Hi there,

If the pathways are missing from the pathcoverage file that means that they are 0 across all samples. The pathway coverage is an alternative approach to assess how likely a pathway is present (ranging from 0 to 1) and likely will only be of use to advanced users. I included it in the output to be consistent with how HUMAnN2 outputs pathway abundances and coverages, but they usually aren’t used by users. The coverage is based on the harmonic mean of confidence scores for each reaction in a pathway. This scores will differ depending on what the median reaction abundance is for the group being assessed (i.e. different reaction scores will be inferred if the reaction is within one predicted genome compared to across an entire sample). Since the median reaction abundance will be higher across an entire sample it’s harder to be confident in rare reactions within a sample based on this approach.


Best,

Gavin

--
You received this message because you are subscribed to the Google Groups "picrust-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to picrust-user...@googlegroups.com.
To post to this group, send email to picrus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Preecha Patumcharoenpol

unread,
Nov 21, 2018, 10:38:49 PM11/21/18
to picrust-users
Ah, thank you very much for a detail explanation. I though that if a reaction/gene is listed in abundance then it should be in coverage file too, I guess I need to look up how it is all calculate from the ground up.

I have a bit more question on output. I looked into stratified/unstratified of pathway abundance, but they aren't exactly equal.

For example, this is a snippet from my path_abun_unstrat.tsv

pathway description
001M19 004M19 006M19
PWY-7003 glycerol degradation to butanol
453.5733 1783.8435 1260.9629


Compare to result from same pathway in path_abun_strat.tsv
pathway description sequence 001M19 004M19 006M19
PWY-7003 glycerol degradation to butanol df3d6113f855e22f2a6d44e60f01baa7 0 10.678 17.6949
PWY-7003 glycerol degradation to butanol efa98c673cef7e11e51b27d01167c2af 0 5.4915 0

As shown here, the sum of all abundance from 004M19 in path_abun_strat.tsv doesn't equal to value in path_abun_unstrat.tsv. Is this an expected behavior? I skimmed over wiki and couldn't find the explanation.

Thank you very much,
Preecha

Gavin Douglas

unread,
Nov 22, 2018, 6:50:22 AM11/22/18
to picrus...@googlegroups.com
Hi again,

There are different options for this step that affect the output - could you post the full command you used?


Thanks,

Gavin

Aqleem Abbas

unread,
Nov 22, 2018, 9:34:03 AM11/22/18
to picrus...@googlegroups.com
Dear Sir,

I want to use Qiime 2 output for  Picrust. I have downloaded greengene database from Picrust and now what to do? Can anyone suggest me?


Thanks 
Abbas

Preecha Patumcharoenpol

unread,
Nov 22, 2018, 8:46:25 PM11/22/18
to picrust-users
Hi Gavin,

Sure,  it was

picrust2_pipeline.py -s dna-sequences_filtered.fasta -i otu_table_rfc.tsv -o picrust2_contrib_stratified_withfiltered --stratified  --threads 17 -n --per_sequence_contrib"

ran on rock cluster with picrust2 (installed on a separate conda environment because I am that paranoid).  Just an hour ago, I tried this exact command again with subset of my original data, and the strat/unstrat still does not correlate to each other.

I would love to give a dataset I used, but it is a clinical data that is not mine :\.


Thank you very much for looking into this.

Best wishes,
Preecha

Gavin Douglas

unread,
Nov 22, 2018, 9:12:02 PM11/22/18
to picrus...@googlegroups.com

Hi there,


When the "--per_sequence_contrib" option is used that means that pathway abundances and coverages are calculated for each predicted genome individually. If this option is not set then the contribution of each predicted genome to the community-wide pathway abundances is reported.


Best,


Gavin

Preecha Patumcharoenpol

unread,
Nov 22, 2018, 9:42:54 PM11/22/18
to picrust-users
Hi Gavin,

I used that option since I would like to see a contribution from each sequence ( We are looking at how some family of sequence is assigned to some pathways, part curious, part validation). But what I am curious is that why a sum of all contribution in a certain pathway (path_abun_strat.tsv), is not equal to the community-wide abundance of the same pathway  (path_abun_unstrat.tsv). The number seems to have some correlation but it is 3x-10x off.

Thank you very much for helping me so far,

Best wishes,
Preecha

Gavin Douglas

unread,
Nov 23, 2018, 10:48:08 AM11/23/18
to picrus...@googlegroups.com
Hi there,

It is a little confusing to distinguish these two abundances. It may be easiest with an example.
 
Let’s say pathway X is made up of 3 reactions (A, B, and C). 

If genome 1 has 40 copies of reaction A, but no copies of B and C then the pathway wont called as present within that particular genome. However, if genome 2 has 30 copies of B and C each then pathway X might be called as present in that genome and at an abundance of 30.

To calculate the “community-wide” pathway abundances the individual genomes aren’t considered, only the total gene family abundances overall. If there is only genomes 1 and 2 in the community then the abundances would be 40, 30, and 30 for reactions A, B, and C respectively. The harmonic mean of these 3 values is higher than 30, which is what you would get by summing the pathway abundances within each genome alone. In other words, in this simple example the community-wide and summed per-genome pathway abundance differs.

Hopefully that helps,

Gavin

Preecha Patumcharoenpol

unread,
Nov 25, 2018, 12:24:20 PM11/25/18
to picrust-users
This thing is much more clear for me now. Thank you very much.

Best wishes,
Preecha

Aqleem Abbas

unread,
Nov 25, 2018, 8:10:49 PM11/25/18
to picrus...@googlegroups.com
Dear Sir,

I have paired 2*300bp 16S forward and reverse reads. The interactive quality plots I am attaching with this email. Kindly see the plots, If I do not truncate the forward and reverse reads 3511 highest and lowest frequency 784. However, when I truncate forward at 295b and reverse at  285b, the highest sequence count was  21309 and the lowest 7215. What is this I don't understand? When I will truncate further I will get more. Please explain this, I could not understand. I have attached interactive quality plots of both forward and reverse reads and suggest the best trimming method. Further after using DADA2 how much sequence count retained from the original company forward and reverse reads. 

Yours Sincerely
Abbas  
triming.png
Untitled.png

Preecha Patumcharoenpol

unread,
Nov 27, 2018, 9:49:56 PM11/27/18
to picrust-users
That happened to me once (and it made me frustrated for a week), it turned out that DADA2 though that the low quality reads (no-truncate ones) are chimera and threw them away.

Turn DADA's verbosity up and you can see the number of reads in each step. I hope that help.

Best wishes,
Preecha
Reply all
Reply to author
Forward
0 new messages