PacBio CCS Amplicon SOP v2


D S

Jun 6, 2025, 7:55:14 AM
to Microbiome Helper
Hi Langille Lab,

First off, thank you for sharing the CCS Amplicon SOP V2.

I’m working with full-length 16S amplicons (bacterial primer set), and the sequencing quality looked great for most of my samples before starting (photo attached). I followed the SOP through to the end, and everything ran smoothly. However, during the denoising and chimera removal steps I noticed a significant drop in reads: some samples fell below 10,000 reads, which is usually my cutoff for downstream analysis, even though most of them started with around 15,000 reads per sample or more (table attached: ccs pacbio stats.tsv). This seemed unexpected given the quality of the initial FASTA files.

The only deviation I made from the SOP V2 was integrating the chimera check directly into the denoising command (script attached - CCS_SOP_V2_commands). Could you please take a look and let me know if you have any suggestions or thoughts on what might be going on?

Thanks again for making the SOP available,

Best,
DS
CCS_SOP_V2_commands
ccs pacbio stats.tsv
fastqc_per_base_sequence_quality_plot.png

Andre Comeau

Jun 6, 2025, 5:24:19 PM
to Microbiome Helper
OK, first of all, let's look at the command you ran for the DADA2 step:

qiime dada2 denoise-ccs \
  --i-demultiplexed-seqs reads_qza/seqs.qza \
  --p-min-len 1200 \
  --p-max-len 1800 \
  --p-front AGRGTTYGATYMTGGCTCAG \
  --p-adapter RGYTACCTTGTTACGACTT \
  --p-n-threads $NCORES \
  --p-chimera-method consensus \
  --p-min-fold-parent-over-abundance 5 \
  --o-table dada2_output/table-denoise-seqs.qza \
  --o-representative-sequences dada2_output/rep-denoise-seqs.qza \
  --o-denoising-stats dada2_output/stats-denoise-seqs.qza \
  --verbose

So you mention that the only change was to the chimera check; however, chimera checking is always integrated into the DADA2 step, and all you have done here is add the flag (--p-chimera-method consensus) for what is the default anyway (https://amplicon-docs.qiime2.org/en/latest/references/plugins/dada2.html#q2-action-dada2-denoise-ccs). What you have actually changed is "--p-min-fold-parent-over-abundance", which defaults to 3.5, but you chose 5. Since you had hardly any loss to chimeras, this isn't what is making the difference.
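To illustrate what that parameter controls, here is a toy Python sketch (not DADA2's actual code; the function name and the numbers are made up for illustration). As I understand the DADA2 documentation, only sequences more than this many fold more abundant than a given sequence can be considered as its chimeric "parents", so raising the value from 3.5 to 5 should make the check less aggressive, not more.

```python
def eligible_parents(child_abundance, candidate_abundances, min_fold):
    """Return the candidate abundances large enough to count as possible parents.

    Assumption: a candidate qualifies only if it is strictly more than
    `min_fold` times as abundant as the sequence being checked.
    """
    threshold = min_fold * child_abundance
    return [a for a in candidate_abundances if a > threshold]


# Hypothetical abundances: one low-abundance sequence and three candidates.
child = 10
candidates = [40, 45, 60]

print(eligible_parents(child, candidates, 3.5))  # default setting
print(eligible_parents(child, candidates, 5))    # the setting used above
```

With fewer eligible parents, fewer sequences can be flagged as chimeras, which fits with the small chimera losses in your stats.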

As for read depth, you could easily go down to 5000 reads as your minimum (about 2000 is often our minimum acceptable) and still only lose 5 samples (most of which just didn't sequence well):


sample-id   | input   | primer-removed | percentage of input primer-removed | filtered | percentage of input passed filter | denoised | non-chimeric | percentage of input non-chimeric
#q2:types   | numeric | numeric        | numeric                            | numeric  | numeric                           | numeric  | numeric      | numeric
KK1QH167EC6 | 161     | 78             | 48.45                              | 31       | 19.25                             | 1        | 1            | 0.62
KK1791K6395 | 197     | 93             | 47.21                              | 38       | 19.29                             | 3        | 3            | 1.52
KK15K0FFPT8 | 157     | 98             | 62.42                              | 35       | 22.29                             | 9        | 9            | 5.73
KX1A8LCNE28 | 1560    | 1359           | 87.12                              | 1135     | 72.76                             | 854      | 854          | 54.74
KK1K5R4PCX3 | 7812    | 7136           | 91.35                              | 5727     | 73.31                             | 3515     | 3498         | 44.78
KK1R99NVGR0 | 15044   | 13597          | 90.38                              | 10638    | 70.71                             | 5190     | 5175         | 34.4


10,000 reads is quite a high cutoff, especially for PacBio 16S, which tends to be lower depth than Illumina MiSeq.
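If it helps to compare cutoffs, a quick Python sketch along these lines would do it. It hard-codes the non-chimeric counts for the six samples listed above only (the function name samples_below is just illustrative); in practice you would read the counts from your full stats table instead.

```python
# Final (non-chimeric) read counts for the six samples listed above.
non_chimeric = {
    "KK1QH167EC6": 1,
    "KK1791K6395": 3,
    "KK15K0FFPT8": 9,
    "KX1A8LCNE28": 854,
    "KK1K5R4PCX3": 3498,
    "KK1R99NVGR0": 5175,
}


def samples_below(counts, cutoff):
    """Return the sample IDs whose final read count is below the cutoff."""
    return sorted(s for s, n in counts.items() if n < cutoff)


# Compare the cutoffs mentioned in this thread.
for cutoff in (10000, 5000, 2000):
    dropped = samples_below(non_chimeric, cutoff)
    print(f"cutoff {cutoff}: {len(dropped)} of {len(non_chimeric)} listed samples dropped")
```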


As for the read loss at the DADA2 step: this is not an uncommon phenomenon (see the author of DADA2 discussing it here: https://github.com/benjjneb/dada2/issues/1164), and many of our clients, and we ourselves when processing our own and client data, have noticed it too (to be clear, this can also happen with Illumina reads, depending on community makeup). It has less to do with the quality of the reads, though at extremes that can influence things, and seems more related to community composition.


In fact, you are seeing the same general phenomenon that we see: for some reason, ITS seems to perform better after DADA2, while 16S and 18S are a bit worse. With those latter ones we usually see around 50% of reads left after DADA2 (i.e., up to 50% loss at times).


You are not losing many sequences at the primer-removal step, nor to chimera removal, so the PCRs themselves are OK. It is simply that DADA2 is "throwing out" a lot of reads during its denoising step. This indicates (most probably) that the ASV/species diversity is low in these samples (as discussed in the GitHub issue linked above), and that DADA2 is finding a lot of slightly different (potential SNP) versions of the dominant sequences and calling them "errors" (which they may be). You could always try Deblur instead if you wanted to test whether there is as much loss (you'd need to follow the first part of our PacBio SOPv1, i.e. Step 1, then substitute Step 2 onwards with Deblur from our other SOPs), but be aware that Deblur wasn't explicitly designed for PacBio data.
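To see where the reads go, the stage-by-stage loss can be worked out from the stats table. The Python sketch below does this for the two deepest samples in the table above, expressing each loss as a percentage of the input reads (the helper name loss_per_stage is just illustrative, and the counts are copied by hand rather than read from the .tsv).

```python
# Read counts at each stage, copied from the stats table above.
stages = ["input", "primer-removed", "filtered", "denoised", "non-chimeric"]
counts = {
    "KK1K5R4PCX3": [7812, 7136, 5727, 3515, 3498],
    "KK1R99NVGR0": [15044, 13597, 10638, 5190, 5175],
}


def loss_per_stage(values):
    """Percent of the input reads lost at each successive stage."""
    total = values[0]
    return [round(100 * (prev - cur) / total, 2)
            for prev, cur in zip(values, values[1:])]


for sample, values in counts.items():
    losses = loss_per_stage(values)
    # Pair each loss with the stage at which it occurred.
    report = ", ".join(f"{stage}: {loss}%" for stage, loss in zip(stages[1:], losses))
    print(f"{sample} -> {report}")
```

For both samples the largest single loss falls between the filtered and denoised counts, and the chimera step removes well under 1% of the input.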


 

ANDRÉ M. COMEAU, PhD
Manager  Integrated Microbiome Resource (IMR)
T: 902.494.2684 | E: andre....@dal.ca 

 

Research Associate (Lab Manager)

Morgan Langille Lab  Dept. of Pharmacology

 

 


