archaea output for v3-v4 sequencing?

Zoë Williams

Mar 28, 2025, 7:23:04 PM

to picrust-users

Hello! I'm running the new picrust2 and would greatly appreciate some help if you have a moment. For my experiments we sequenced v3-v4 and I'm getting the following errors:

Experiment 1:

Warning - 114 input sequences aligned poorly to reference sequences

Warning - 991 input sequences aligned poorly to reference sequences

Experiment 2:

Warning - 4 input sequences aligned poorly to reference sequences

Warning - 157 input sequences aligned poorly to reference sequences

Then for both I get the following:

Warning: There was only one file for the function: KO

Maybe that's fine if you used custom traits or there were no sequences matching one of the domains.

Warning: There was only one file for the function: EC

Maybe that's fine if you used custom traits or there were no sequences matching one of the domains.

My understanding is that the new picrust2 aligns bacteria first and archaea second, so it seems I am having less alignment to reference sequences for archaea, but I am still getting some alignment. However, despite some alignment occurring no final files are generated for (I assume) the archaea domain. Taken together, I have two questions:

1) Is it unusual to have no archaea outputs from picrust2 for v3-v4 sequencing despite having some alignment to reference sequences? I am inexperienced in the archaea realm, but a quick look online suggested that v3-v4 regions should sequence archaea too. My experiment is exploring bacteria specifically, so I'm not concerned about losing archaea if none were sequenced, but I wanted to make sure that this isn't a sign that something else is wrong (especially given that some alignment to reference sequences is indeed occurring).

2) I have quite different alignment outcomes from Experiment 1 and Experiment 2, and I'm wondering if the amount being dropped in Experiment 1 is concerning? I'm planning to go back over my initial filtering to make sure that I have no mitochondria or chloroplasts sneaking in, but I was curious if there is a rule of thumb for how many input sequences to expect to be filtered out.

Thank you so much in advance for your help!!

Zoë

Robyn Wright

Mar 28, 2025, 7:55:36 PM

to picrust-users

Hi there,

First off, that output looks totally normal - assuming that you have a lot more than ~1000 sequences in the files that you're running, or that they're from an environment where you don't expect them to closely match reference sequences. I am also not an expert in archaea, but if you used standard V3-V4 primers then I don't think that you'd really expect any archaea at all. The sequencing facility that runs adjacent to our lab has these tables of primers and they suggest that coverage of archaea is not expected to be good. Have you already classified the sequences taxonomically? The best way to check that the output is as expected is to see whether any sequences are classified as archaea in your taxonomic output. In those first alignment steps, only sequences that align really badly to reference sequences will be dropped. As they're both still 16S, you'll usually get some alignment for sequences of the other domain. The next step compares which tree each sequence fits best in (has the lowest Nearest Sequenced Taxon Index, NSTI). In our comparisons, we didn't really ever find that this was ambiguous unless we were looking at a sequence that actually looked like it may be a bit messed up (didn't hit anything with BLAST either).
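A minimal sketch of that taxonomy check, assuming the classifications have been exported as a tab-separated table with one row per ASV. The column name `Kingdom` and the helper `count_by_domain` are illustrative assumptions, not part of PICRUSt2 or dada2, and the table below is made-up data:

```python
import csv
import io
from collections import Counter


def count_by_domain(taxonomy_tsv, domain_column="Kingdom"):
    """Count ASVs per top-level domain in a tab-separated taxonomy table.

    `domain_column` is an assumed column name; adjust it to match your export.
    """
    reader = csv.DictReader(taxonomy_tsv, delimiter="\t")
    counts = Counter()
    for row in reader:
        domain = (row.get(domain_column) or "").strip()
        # Empty cells and R-style "NA" values are grouped as "Unassigned".
        if domain in ("", "NA"):
            domain = "Unassigned"
        counts[domain] += 1
    return counts


# Tiny made-up table standing in for an exported taxonomy file.
example = io.StringIO(
    "ASV\tKingdom\tPhylum\n"
    "ASV1\tBacteria\tFirmicutes\n"
    "ASV2\tBacteria\tBacteroidota\n"
    "ASV3\t\t\n"
)
domain_counts = count_by_domain(example)
print("Archaea ASVs:", domain_counts.get("Archaea", 0))
```

If the Archaea count is zero, a single (bacterial) prediction file per function is what you would expect.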

It's difficult to say how many I'd expect to be dropped... it really depends a lot on the environment that your samples are from, how many sequences are in your input, and how those sequences were processed. Could you tell me a bit more about what your two experiments are, what the differences were between the two, how you processed your sequencing data to get the sequences that you used for PICRUSt, and how many sequences you had in each?

Best,

Robyn

Zoë Williams

Mar 31, 2025, 1:43:31 PM

to picrust-users

Hello Robyn,

That is such a good point about taxonomic classification; I did already classify with SILVA and just checked -- I have no archaea sequences... I'm so sorry for not having thought of this myself! Thank you also for the very clear explanation of what is going on under the hood and why I'm still getting alignment at that first step.

Experiment 1 = Mouse cecal content, sequenced with NextSeq 2000 // 4,245 ASVs

Experiment 2 = Rat feces, sequenced with MiSeq // 2,038 ASVs

Both were processed through the same pipeline using dada2 in R:

1) fastqc files were assessed to determine trunc/trim settings and samples were processed with dada2 in R.

2) Sequences were taxonomically classified using a pre-trained SILVA classifier (silva_nr99_v138.1_train_set.fa.gz and silva_species_assignment_v138.1.fa.gz -- https://zenodo.org/records/4587955#.Y_NmdXbP02x)

3) Samples were then filtered by phylum (excluded all ASVs that were unassigned at phylum level).

4) Data was then converted from its phyloseq object into the appropriate tsv and fasta files for analysis with picrust2.
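For readers unfamiliar with step 4, here is a rough sketch of the two files involved: a FASTA file of ASV sequences and a tab-separated abundance table with ASVs as rows and samples as columns. It is written in Python rather than the R used above, the helper names, the `ASV` header label, and the data are all hypothetical, and the main point is only that the ids in the two files must match:

```python
import csv
import io


def write_fasta(seqs, handle):
    """Write {asv_id: sequence} as FASTA records."""
    for asv_id, seq in seqs.items():
        handle.write(f">{asv_id}\n{seq}\n")


def write_abundance_tsv(abundance, samples, handle):
    """Write {asv_id: {sample: count}} as a TSV with ASVs as rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ASV"] + samples)  # header label is an assumption
    for asv_id, counts in abundance.items():
        writer.writerow([asv_id] + [counts.get(s, 0) for s in samples])


# Made-up sequences and counts for illustration only.
seqs = {"ASV1": "ACGT", "ASV2": "GGCC"}
abundance = {"ASV1": {"S1": 10, "S2": 3}, "ASV2": {"S1": 0, "S2": 7}}

# Every ASV in the table needs a sequence with the same id, and vice versa.
assert set(seqs) == set(abundance)

fasta_out, tsv_out = io.StringIO(), io.StringIO()
write_fasta(seqs, fasta_out)
write_abundance_tsv(abundance, ["S1", "S2"], tsv_out)
print(fasta_out.getvalue())
print(tsv_out.getvalue())
```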

Thank you so much for your help! I greatly appreciate it,

Zoë

Robyn Wright

Mar 31, 2025, 1:51:26 PM

to picrust-users

Hi Zoë,

No problem!!

OK, so it makes sense that more are dropped from the experiment with more sequences... that does seem like quite a lot for what (I assume) is a relatively well-characterised environment, but I don't think it is necessarily problematic. To determine how well your PICRUSt2 results likely represent your samples, I'd take a look at the relative abundance accounted for by the ASVs that are included in your PICRUSt2 output. I'd guess that it's more of the rare ones that have been excluded, and if you're retaining most of the relative abundance then I wouldn't be too concerned about the dropped sequences. Another thing to check - do you have a lower proportion of genus/species level taxonomic classifications for experiment 1? If so, I'd guess that - for whatever reason - the sequences in experiment 1 just aren't as well represented in reference databases as those in experiment 2. You could also do some NCBI BLAST searches with some of the ASVs that are excluded - we have sometimes experienced some slightly funky results from taxonomic classifiers, and looking at the BLAST results might give you a hint as to whether some of the sequences could reasonably have come from several different taxa.
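A minimal sketch of the relative abundance check, assuming you have the ASV counts per sample and the set of ASV ids that made it into your PICRUSt2 output (however you read those in). The function name `retained_fraction` and the numbers are made up for illustration:

```python
def retained_fraction(abundance, kept_ids):
    """Return {sample: fraction of reads belonging to retained ASVs}.

    abundance: {asv_id: {sample: count}}
    kept_ids:  set of ASV ids present in the PICRUSt2 output
    """
    totals = {}
    kept = {}
    for asv_id, per_sample in abundance.items():
        for sample, count in per_sample.items():
            totals[sample] = totals.get(sample, 0) + count
            if asv_id in kept_ids:
                kept[sample] = kept.get(sample, 0) + count
    # Samples with no reads at all get a fraction of 0.0.
    return {s: (kept.get(s, 0) / t if t else 0.0) for s, t in totals.items()}


# Made-up counts: ASV3 is rare and was dropped at the alignment step.
abundance = {
    "ASV1": {"S1": 90, "S2": 50},
    "ASV2": {"S1": 8, "S2": 40},
    "ASV3": {"S1": 2, "S2": 10},
}
kept_ids = {"ASV1", "ASV2"}

fractions = retained_fraction(abundance, kept_ids)
for sample, frac in sorted(fractions.items()):
    print(f"{sample}: {frac:.1%} of reads retained")
```

Samples where the retained fraction is low are the ones where the dropped sequences deserve a closer look.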

Robyn

Zoë Williams

Apr 7, 2025, 8:53:06 AM

to picrust-users

Hello Robyn,

These are great ideas for validating taxonomic classifications/picrust2 outputs that I will definitely look into; thank you so much for your help!