PacBio CCS Amplicon SOP v2

D S

Jun 6, 2025, 7:55:14 AM

to Microbiome Helper

Hi Langille Lab,

First off, thank you for sharing the CCS Amplicon SOP V2.

I’m working with full-length 16S amplicons (bacterial primer set), and overall the sequencing quality looked great for most of my samples before starting (photo attached). I followed the SOP through to the end, and everything ran smoothly. However, during the denoising and chimera removal steps I noticed a significant drop in reads: some samples dropped below 10,000 reads, which is usually my cutoff for downstream analysis, even though most of them started with around 15,000 reads or more per sample (table attached: ccs pacbio stats.tsv). This seemed unexpected given the quality of the initial FASTA files.

The only deviation I made from the SOP V2 was integrating the chimera check directly into the denoising command (script attached - CCS_SOP_V2_commands). Could you please take a look and let me know if you have any suggestions or thoughts on what might be going on?

Thanks again for making the SOP available,

Best,

CCS_SOP_V2_commands

ccs pacbio stats.tsv

fastqc_per_base_sequence_quality_plot.png

Andre Comeau

Jun 6, 2025, 5:24:19 PM

to Microbiome Helper

OK, first of all, let's look at the command you ran for the DADA2 step:

qiime dada2 denoise-ccs \
  --i-demultiplexed-seqs reads_qza/seqs.qza \
  --p-min-len 1200 --p-max-len 1800 \
  --p-front AGRGTTYGATYMTGGCTCAG \
  --p-adapter RGYTACCTTGTTACGACTT \
  --p-n-threads $NCORES \
  --p-chimera-method consensus \
  --p-min-fold-parent-over-abundance 5 \
  --o-table dada2_output/table-denoise-seqs.qza \
  --o-representative-sequences dada2_output/rep-denoise-seqs.qza \
  --o-denoising-stats dada2_output/stats-denoise-seqs.qza \
  --verbose

You mention that the only change was the chimera check; however, chimera checking is always integrated into the DADA2 step, and all you have done here is add the flag (--p-chimera-method consensus) for what is the default anyway (https://amplicon-docs.qiime2.org/en/latest/references/plugins/dada2.html#q2-action-dada2-denoise-ccs). What you have changed is "--p-min-fold-parent-over-abundance", which defaults to 3.5, whereas you chose 5. Since you lost hardly any reads to chimeras, this is not what is making the difference.

As for read depth, you could easily go down to 5000 reads as your minimum (our minimum acceptable depth is often about 2000) and still lose only 5 samples, most of which just didn't sequence well:

sample-id	input	primer-removed	percentage of input primer-removed	filtered	percentage of input passed filter	denoised	non-chimeric	percentage of input non-chimeric
#q2:types	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric
KK1QH167EC6	161	78	48.45	31	19.25	1	1	0.62
KK1791K6395	197	93	47.21	38	19.29	3	3	1.52
KK15K0FFPT8	157	98	62.42	35	22.29	9	9	5.73
KX1A8LCNE28	1560	1359	87.12	1135	72.76	854	854	54.74
KK1K5R4PCX3	7812	7136	91.35	5727	73.31	3515	3498	44.78
KK1R99NVGR0	15044	13597	90.38	10638	70.71	5190	5175	34.4

10,000 reads is quite a high cutoff, especially for PacBio 16S, which tends to be lower depth than Illumina MiSeq.
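To compare cutoffs quickly, something like the following could be used. It is only a sketch: it hard-codes the non-chimeric counts of the six samples quoted above (not the full dataset), and the helper name count_below is made up for illustration. In practice the second column would come from the "non-chimeric" column of the full stats table.

```shell
#!/usr/bin/env bash
# Sketch: count how many samples fall below candidate depth cutoffs.
# Rows are the six samples quoted above: sample-id <TAB> non-chimeric reads.
stats=$(mktemp)
printf '%s\t%s\n' \
  KK1QH167EC6 1 \
  KK1791K6395 3 \
  KK15K0FFPT8 9 \
  KX1A8LCNE28 854 \
  KK1K5R4PCX3 3498 \
  KK1R99NVGR0 5175 > "$stats"

# count_below CUTOFF FILE (hypothetical helper):
# prints the number of samples whose non-chimeric read count is below CUTOFF.
count_below() {
  awk -F'\t' -v cutoff="$1" '$2 < cutoff { n++ } END { print n + 0 }' "$2"
}

below_10000=$(count_below 10000 "$stats")
below_5000=$(count_below 5000 "$stats")
below_2000=$(count_below 2000 "$stats")
echo "below 10000: $below_10000, below 5000: $below_5000, below 2000: $below_2000"
```

Among these six samples, a 5000-read cutoff keeps KK1R99NVGR0 (5175 reads) while a 10,000-read cutoff would drop it.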

As for the read loss at the DADA2 step: this is not an uncommon phenomenon (see the author of DADA2 discussing it here: https://github.com/benjjneb/dada2/issues/1164). Many of our clients, and we ourselves when processing our own and client data, have noticed it too (to be clear, this can happen with Illumina reads as well, depending on community makeup). It has less to do with read quality, though at the extremes that can influence things, and seems more related to community composition.

In fact, you are seeing the same general phenomenon that we see: for some reason, ITS amplicons retain more reads through DADA2, while 16S and 18S do a bit worse. With the latter two we usually see around 50% of reads left after DADA2 (i.e., up to 50% loss at times).

You are not losing many sequences at the primer-removal step, nor at chimera removal, so the PCRs themselves are OK. DADA2 is simply "throwing out" a lot of reads during denoising (the drop between the "filtered" and "denoised" columns). This most probably indicates that ASV/species diversity is low in these samples (as discussed in the GitHub issue linked above): DADA2 finds many slightly different (potential SNP) versions of the dominant sequences and calls them "errors" (which they may be).

You could always try Deblur instead if you wanted to test whether there is as much loss: follow the first part of our PacBio SOPv1 (Step 1), then substitute Step 2 onwards with the Deblur steps from our other SOPs. Be aware, though, that Deblur wasn't explicitly designed for PacBio data.
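To see where the reads go for a given sample, the stats columns can be differenced stage by stage. This is a rough sketch, assuming the three deepest samples from the table above are hard-coded with the percentage columns dropped; the stage labels are my own shorthand, not QIIME 2 names.

```shell
#!/usr/bin/env bash
# Sketch: attribute read loss to each stage of the DADA2 step.
# Columns: sample-id, input, primer-removed, filtered, denoised, non-chimeric
# (values copied from the table above; percentage columns omitted).
stats=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  KX1A8LCNE28 1560 1359 1135 854 854 \
  KK1K5R4PCX3 7812 7136 5727 3515 3498 \
  KK1R99NVGR0 15044 13597 10638 5190 5175 > "$stats"

# For each sample, print the reads lost at each stage as a percentage of input,
# followed by the stage with the largest loss.
loss_report=$(awk -F'\t' '
  BEGIN { split("primer-removal filtering denoising chimera-removal", stage, " ") }
  {
    worst = 1
    line = $1
    for (i = 1; i <= 4; i++) {
      lost[i] = $(i + 1) - $(i + 2)   # reads dropped between consecutive columns
      if (lost[i] > lost[worst]) worst = i
      line = line sprintf("\t%s=%.1f%%", stage[i], 100 * lost[i] / $2)
    }
    print line "\tlargest=" stage[worst]
  }' "$stats")
echo "$loss_report"
```

For all three of these samples the largest single drop is at the denoising stage, with chimera removal accounting for almost nothing.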

ANDRÉ M. COMEAU, PhD
Manager • Integrated Microbiome Resource (IMR)
T: 902.494.2684 | E: andre....@dal.ca

Address for deliveries:
Dept. of Pharmacology
Tupper Med. Bldg., room 5D
Dalhousie University
5850 College St.
Halifax NS B3H 4R2

Research Associate (Lab Manager)

Morgan Langille Lab • Dept. of Pharmacology
ResearchGate Profile • GoogleScholar Publications

"Without fantasy, there is no science. Without fact, there is no art." - Nabokov
"The good thing about science is that it's true whether or not you believe in it." - Neil deGrasse Tyson
