Hello, I am using PICRUSt2 in my microbiome research to predict specific functional features. My question is about how far the data should be processed before it is used as input.
I preprocessed my 16S rRNA sequencing FASTQ data with DADA2, following the standard workflow: cutadapt, dada2::filterAndTrim, dada2::learnErrors, dada2::dada, dada2::removeBimeraDenovo, and dada2::assignTaxonomy. At which step should I create the .fasta and .biom files for PICRUSt2?
In particular, after running assignTaxonomy, the number of ASVs in my data decreases from about 4,000 to 1,400 (this difference is due to keeping only the ASVs that matched down to the species level). I am wondering how to handle this situation and would appreciate any opinions or advice you can share.
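To make the question above concrete, here is a minimal sketch of the two files I mean. It is in Python rather than R purely for illustration, and it assumes the ASV table has already been exported from DADA2 as a mapping from sequence to per-sample counts. The function name `build_picrust2_inputs`, the `ASV1`, `ASV2`, ... identifiers, and the `#OTU ID` header are my own assumptions, and a tab-separated table like this would still need converting if a .biom file is required. Which table to feed in (all ~4,000 ASVs or only the 1,400 with species-level matches) is exactly what I am unsure about.

```python
from typing import Dict, List, Tuple

# Assumed input shape: ASV sequence -> {sample name: read count}
CountTable = Dict[str, Dict[str, int]]


def build_picrust2_inputs(counts: CountTable) -> Tuple[str, str]:
    """Return (fasta_text, tsv_text) with matching ASV identifiers."""
    # Collect every sample name that appears anywhere in the table.
    samples = sorted({s for per_sample in counts.values() for s in per_sample})

    fasta_lines: List[str] = []
    # "#OTU ID" is an assumed header for the ID column, not a confirmed requirement.
    tsv_lines: List[str] = ["\t".join(["#OTU ID"] + samples)]

    for i, (seq, per_sample) in enumerate(counts.items(), start=1):
        asv_id = f"ASV{i}"  # hypothetical naming scheme
        fasta_lines.append(f">{asv_id}")
        fasta_lines.append(seq)
        # Samples in which this ASV was not observed get a count of 0.
        row = [str(per_sample.get(s, 0)) for s in samples]
        tsv_lines.append("\t".join([asv_id] + row))

    return "\n".join(fasta_lines) + "\n", "\n".join(tsv_lines) + "\n"
```

The point of the sketch is only that the FASTA headers and the table row IDs have to agree, whichever step the table is taken from.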
Thank you for your response.
My dataset has three groups (A, B, and C), and each group is split into an experimental and a control arm. So I have six datasets in total: A_experimental, A_control, B_experimental, B_control, C_experimental, and C_control.
In this case, should I run PICRUSt2 separately for each of these six datasets?
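If a single combined run turns out to be the better option, I assume the six tables would first have to be merged on the ASV sequence. Here is a minimal sketch of what I have in mind, in Python for illustration only. `merge_group_tables` and the `group.sample` naming are hypothetical, and each table is assumed to map a sequence to its per-sample counts.

```python
from typing import Dict

# Assumed shape of one dataset: ASV sequence -> {sample name: read count}
CountTable = Dict[str, Dict[str, int]]


def merge_group_tables(tables: Dict[str, CountTable]) -> CountTable:
    """Merge per-dataset tables into one table keyed by ASV sequence."""
    merged: CountTable = {}
    for group, table in tables.items():
        for seq, per_sample in table.items():
            row = merged.setdefault(seq, {})
            for sample, count in per_sample.items():
                # Prefix the sample with its dataset name so names stay unique.
                key = f"{group}.{sample}"
                row[key] = row.get(key, 0) + count
    return merged
```

Identical sequences from different datasets end up in the same row, so an ASV shared between, say, A_control and A_experimental is only listed once.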