correct pacbio command


Stephane Plaisance

Feb 14, 2022, 4:00:22 AM
to LotuS rRNA pipeline
Dear,

I am running LotuS2 on PacBio HiFi reads and get a warning even though I set -CL cdhit and -useVsearch 1. Is my syntax incorrect, or should I just ignore the warning?
Also, is lambda the correct option for V1-V9 16S amplicon HiFi reads, as suggested by the --help output?

```
# lotus2 -i /data/analyses/PB_16S_datasets/lotus2 -m /data/analyses/PB_16S_datasets/lotus2/pacbio_sm_s.txt -o lotus2_pacbio_s -tmp /data/analyses/PB_16S_datasets/lotus2/tmp -s /opt/biotools/lotus2/configs/sdm_PacBio_LSSU.txt -p PacBio -t 64 -amplicon_type SSU -CL cdhit -refDB SLV -taxAligner lambda -useVsearch 1

```

Thanks for your help

PS: there is a small typo in the last text block of the output ("Alpha diveristy").

```
lotus2 -i $PWD \
> -m ${meta} \
> -o ${outfolder} \
> -tmp $PWD/tmp \
> -s ${sdmopt} \
> -p ${platform} \
> -t ${thr} \
> -amplicon_type SSU \
> -CL cdhit \
> -refDB SLV \
> -taxAligner lambda \
> -useVsearch 1

Using Silva SSU ref seq database.
WARNING:: CD-HIT or Vsearch clustering is recommended for PacBio HiFi reads.
--------------------------------------------------------------------------------
00:00:00 LotuS 2.19
COMMAND
perl /opt/biotools/bin/lotus2 -i /data/analyses/PB_16S_datasets/lotus2
-m /data/analyses/PB_16S_datasets/lotus2/pacbio_sm_s.txt
-o lotus2_pacbio_s -tmp /data/analyses/PB_16S_datasets/lotus2/tmp
-s /opt/biotools/lotus2/configs/sdm_PacBio_LSSU.txt -p PacBio
-t 64 -amplicon_type SSU -CL cdhit -refDB SLV -taxAligner
lambda -useVsearch 1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:00 Reading mapping file
Sequence files are indicated in mapping file.
Samples will be combined.
Found "SequencingRun" column, with 3 categories (runPB3s
runPB1s runPB2s)
--------------------------------------------------------------------------------
------------ I/O configuration --------------
Input /data/analyses/PB_16S_datasets/lotus2
Output lotus2_pacbio_s
SDM options /opt/biotools/lotus2/configs/sdm_PacBio_LSSU.txt
TempDir /data/analyses/PB_16S_datasets/lotus2/tmp
------------ Configuration LotuS --------------
de novo sequence clustering with CD-HIT into OTU's
Sequencing platform pacbio
Amplicon target bacteria, SSU
Dereplication filter 0
Clustering algorithm CD-HIT into OTU's
Read mapping (non tax) minimap2
OTU nt id 0.97
Precluster read merging No
Ref Chimera checking Yes (DB=/opt/biotools/lotus2/DB/rdp_gold.fa, -chim_skew 2)
deNovo Chimera check Yes
Tax assignment Lambda (-LCA_frac 0.8, -LCA_cover 0.5, ids 97,95,93,91,88,78,0, -useBestBlastHitOnly 0)
ReferenceDatabase SILVA
RefDB location /opt/biotools/lotus2/DB/SLV_138.1_SSU.fasta
OTU phylogeny Yes (mafft, fasttree2)
Unclassified OTU's Kept in matrix
--------------------------------------------
--------------------------------------------------------------------------------
00:00:00 Demultiplexing, filtering, dereplicating input files, this
might take some time..
check progress at lotus2_pacbio_s/LotuSLogS/LotuS_progout.log
00:00:31 Finished primary read processing with sdm:
Reads processed: 904,137
Accepted (High qual): 52 (48,125 end-trimmed)
Accepted (Mid qual): 356
Rejected: 903,729
Dereplication: 52 unique sequences (avg size 1; 52 counts)
0/52 not passing derep conditions (0 counts; 0)
For an extensive report see lotus2_pacbio_s/LotuSLogS//demulti.log
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:31 CD-HIT OTU clustering
Cluster at 97
00:00:31 Finished
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:31 Starting backmapping of
- mid-quality reads
to OTU's using minimap2
00:00:31 Backmapping mid qual reads:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:31 Extending and merging pairs of OTU Seeds
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:31 De novo chimera filter using vsearch uchime
Total removed OTUs: (0/6)
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:33 Ref chimera filter using vsearch uchime_ref
Total removed OTUs: (0/6)
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:33 Found 0 OTU's using minimap2 (phiX.0: /opt/biotools/lotus2/DB/phiX.fasta)
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:33 Postfilter:
Extended logs active, contaminant and chimeric matrix will be created.
After filtering 6 OTU's (138 reads) remaining in matrix.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:33 Assigning taxonomy against /opt/biotools/lotus2/DB/SLV_138.1_SSU.fasta
using LAMBDA
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:51 Calculating Taxonomic Abundance Tables from SILVA assignments
--------------------------------------------------------------------------------

Calculating higher abundance levels
Adding 5 unclassified OTU's to output matrices
Total reads in matrix: 138
TaxLvl %Assigned_Reads %Assigned_OTUs
Phylum 3 100
Class 3 100
Order 3 100
Family 3 100
Genus 3 100
Species 3 100

WARNING:: Combined samples in lotus run.. attempting merge of metadata in .biom file

--------------------------------------------------------------------------------
00:00:51 Building tree (fasttree) and aligning (mafft) OTUs
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:01:04 LotuS2 finished. Output in:
lotus2_pacbio_s
Next steps:
- Phylogeny: OTU phylogentic tree available in lotus2_pacbio_s/OTUphylo.nwk
- .biom: lotus2_pacbio_s/OTU.biom contains biom formatted output
- Alpha diveristy/rarefaction curves: rtk (available as
R package or in bin/rtk)
- LotuSLogS/ contains run statistics (useful for describing
data/amount of reads/quality and citations to programs used
- Tutorial: Visit http://lotus2.earlham.ac.uk for a numerical
ecology tutorial
--------------------------------------------------------------------------------

The following WARNINGS occured:
WARNING:: CD-HIT or Vsearch clustering is recommended for PacBio HiFi reads.
WARNING:: Combined samples in lotus run.. attempting merge of metadata in .biom file
```

Falk Hildebrand

Feb 14, 2022, 4:10:09 AM
to LotuS rRNA pipeline
Hey, I pushed a fix to the GitHub repository, so the warning will no longer appear. It was displayed in error; the command looks all good to me. lambda is fine for PacBio, but in my experience the difference in performance between lambda, blast and vsearch is not very big; it is more a matter of execution time. I would rather expect RDP classifier and SINTAX to really make a difference.
Message has been deleted

Falk Hildebrand

Feb 15, 2022, 9:05:20 AM
to LotuS rRNA pipeline
I think I might have deleted a message here, as it needed approval. If I remember correctly, it asked how to use RDP. This is very straightforward: just add the flag "-taxAligner RDP" (technically, RDP does not do an alignment).
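As a rough sketch of what that looks like with the command from the first post, only the -taxAligner value changes. The paths and the other flags are copied from that post. The DRY_RUN switch, the lotus2_rdp function name and the output folder name are illustrative helpers, not part of LotuS2, and keeping -refDB SLV in place when RDP is selected is an assumption that this thread does not confirm.

```shell
# Sketch: same run as in the first post, but with the RDP classifier.
# Paths are copied from that post; adjust them to your own setup.
in_dir=/data/analyses/PB_16S_datasets/lotus2
map_file="$in_dir/pacbio_sm_s.txt"
sdm_opt=/opt/biotools/lotus2/configs/sdm_PacBio_LSSU.txt

# Illustrative helper (not part of LotuS2): while DRY_RUN is non-empty the
# command is only printed; set DRY_RUN= (empty) to actually launch lotus2.
DRY_RUN=1

lotus2_rdp() {
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
  ${DRY_RUN:+echo} lotus2 \
    -i "$in_dir" \
    -m "$map_file" \
    -o lotus2_pacbio_s_rdp \
    -tmp "$in_dir/tmp" \
    -s "$sdm_opt" \
    -p PacBio \
    -t 64 \
    -amplicon_type SSU \
    -CL cdhit \
    -refDB SLV \
    -taxAligner RDP \
    -useVsearch 1
}

lotus2_rdp
```

Writing to a separate output folder (here the hypothetical lotus2_pacbio_s_rdp) keeps the lambda results from the earlier run available for comparison.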
About mixing 16S with 23S in one long amplicon: honestly, I have never tested this, nor thought about it so far. I would personally go with a conservative clustering that does not make too many assumptions (CD-HIT) and do a lambda alignment against SILVA SSU, assuming that the larger part of your OTUs is 16S and not 23S. But this might well be a suboptimal solution.
best, Falk

