Inquiry Regarding PICRUSt2 Input Data

youngtae won

Jul 7, 2025, 11:34:15 AM

to picrust-users

Hello, I am using PICRUSt2 in my microbiome research to predict specific functional features. My question concerns the appropriate level of data processing for the input.

I have preprocessed 16S rRNA sequencing fastq data using DADA2. As you may know, I followed the standard workflow: cutadapt, dada2::filterAndTrim, dada2::learnErrors, dada2::dada, dada2::removeBimeraDenovo, and dada2::assignTaxonomy. My question is, at which step should I create the .fasta and .biom files for PICRUSt2?

In particular, after running assignTaxonomy, the number of ASVs in my data decreases from about 4,000 to 1,400 (this difference is due to keeping only the ASVs that matched down to the species level). I am wondering how to handle this situation and would appreciate any opinions or advice you can share.
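For reference, the two inputs in question are a FASTA file of ASV sequences and a table of per-sample counts whose row ids match the FASTA headers. The following is a minimal sketch of writing both from a DADA2-style sequence table, assuming the table has already been loaded into a Python dict. The function name `export_picrust2_inputs`, the `ASV1`, `ASV2`, ... ids, and the `ASV_ID` column header are hypothetical choices for illustration; the table layouts PICRUSt2 actually accepts should be checked against its documentation.

```python
import csv
import io


def export_picrust2_inputs(seqtab, fasta_handle, table_handle):
    """Write ASV sequences as FASTA and their counts as a tab-separated table.

    seqtab: dict mapping sample name -> dict mapping ASV sequence -> read count.
    Returns the mapping from sequence to the short id used in both outputs.
    """
    # Collect every distinct sequence in first-seen order.
    sequences = []
    seen = set()
    for counts in seqtab.values():
        for seq in counts:
            if seq not in seen:
                seen.add(seq)
                sequences.append(seq)

    # Short ids (hypothetical naming) shared by the FASTA and the table.
    ids = {seq: f"ASV{i}" for i, seq in enumerate(sequences, start=1)}

    for seq in sequences:
        fasta_handle.write(f">{ids[seq]}\n{seq}\n")

    samples = list(seqtab)
    writer = csv.writer(table_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ASV_ID"] + samples)
    for seq in sequences:
        # A sequence absent from a sample gets a count of 0.
        writer.writerow([ids[seq]] + [seqtab[s].get(seq, 0) for s in samples])
    return ids


# Toy data: two samples, three distinct (shortened) sequences.
example_seqtab = {
    "S1": {"ACGT": 5, "GGCC": 2},
    "S2": {"ACGT": 1, "TTAA": 7},
}
fasta_out, table_out = io.StringIO(), io.StringIO()
example_ids = export_picrust2_inputs(example_seqtab, fasta_out, table_out)
```

The point of the sketch is only that the same ids must appear in both files; with real data the handles would be opened files rather than `io.StringIO` buffers.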

Robyn Wright

Jul 7, 2025, 1:20:51 PM

to picrust-users

Hi there,

I have not used the DADA2 pipeline in probably 5 years or so, but I think my answer here is not dependent on the workflow used to generate ASVs anyway. I would certainly not limit your PICRUSt2 input (or any analysis) to only the ASVs that you have species-level taxonomic classifications for - this would massively bias your analysis. It is often difficult to get species-level taxonomy for ASVs, particularly when they are from short variable regions, and some taxonomic classifiers won't even give species-level classifications for this reason. Prior to running PICRUSt2 I would:

1. Trim primers, quality filter and denoise sequences.
2. Remove very low abundance ASVs (typically for MiSeq this would be 0.001x the mean sequencing depth across your samples, as this is roughly what can bleed through between sequencing runs).
3. Remove low prevalence ASVs (the limit here will depend on your sampling strategy and what you are interested in; people often apply something like a 10% prevalence filter).
4. Depending on the region sequenced and the primers used, you may need to remove ASVs classified as mitochondria or chloroplasts.
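The filtering steps above could be sketched roughly as follows in plain Python. This is only an illustration under stated assumptions: the function `filter_asvs` and its parameter names are hypothetical, the abundance cutoff is interpreted as an ASV's total count across all samples compared with 0.001x the mean sequencing depth, and the organelle check is a simple substring match on whatever taxonomy string is available. ASVs with no species-level (or no) classification are deliberately kept.

```python
def filter_asvs(counts, taxonomy, min_abundance_fraction=0.001, min_prevalence=0.10):
    """Return the subset of ASVs that pass abundance, prevalence and organelle filters.

    counts:   dict mapping ASV id -> dict mapping sample name -> read count
    taxonomy: dict mapping ASV id -> taxonomy string (may be missing or partial)
    """
    samples = sorted({s for per_sample in counts.values() for s in per_sample})

    # Sequencing depth per sample, computed before any filtering.
    depths = {
        s: sum(per_sample.get(s, 0) for per_sample in counts.values())
        for s in samples
    }
    mean_depth = sum(depths.values()) / len(samples)
    abundance_cutoff = min_abundance_fraction * mean_depth

    kept = {}
    for asv, per_sample in counts.items():
        total = sum(per_sample.values())
        n_present = sum(1 for s in samples if per_sample.get(s, 0) > 0)
        label = taxonomy.get(asv, "").lower()

        if total < abundance_cutoff:
            continue  # very low abundance (assumed: total count across samples)
        if n_present / len(samples) < min_prevalence:
            continue  # present in too few samples
        if "mitochondria" in label or "chloroplast" in label:
            continue  # organelle sequences
        kept[asv] = per_sample
    return kept


# Toy data: four samples, four ASVs; ASV4 has no taxonomy at all.
example_counts = {
    "ASV1": {"S1": 5000, "S2": 4000, "S3": 6000, "S4": 5000},
    "ASV2": {"S1": 3, "S2": 0, "S3": 0, "S4": 0},
    "ASV3": {"S1": 500, "S2": 700, "S3": 0, "S4": 300},
    "ASV4": {"S1": 0, "S2": 0, "S3": 900, "S4": 0},
}
example_taxonomy = {
    "ASV1": "Bacteria;Firmicutes;Bacilli;Lactobacillales",
    "ASV2": "Bacteria;Proteobacteria",
    "ASV3": "Bacteria;Cyanobacteria;Chloroplast",
}
example_kept = filter_asvs(example_counts, example_taxonomy)
```

In this toy table the rare ASV2 and the chloroplast ASV3 are dropped, while the unclassified ASV4 is retained because taxonomy resolution is not a filtering criterion here.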

Best wishes,

Robyn

youngtae won

Jul 8, 2025, 9:32:38 AM

to picrust-users

Thank you for your response.

In my dataset, there are three groups, and each group is divided into an experimental and a control group. So, I have a total of six datasets: A_experimental, A_control, B_experimental, B_control, C_experimental, and C_control.
In this case, should I run PICRUSt2 separately for each of these groups?

On Tuesday, July 8, 2025 at 2:20:51 AM UTC+9, roby...@gmail.com wrote:

Robyn Wright

Jul 8, 2025, 9:34:13 AM

to picrust-users

It won't make a difference whether you run PICRUSt2 separately for each of the groups or together - it is run on a per sample basis anyway.

Robyn
