Pipeline for 16s OTU analysis

831 views
Skip to first unread message

david martin

unread,
Mar 8, 2016, 4:09:26 AM3/8/16
to VSEARCH Forum
Hi,
I´m trying to build a pipeline to detect 16s OTU. My starting point is a fasta file with all 16s amplicons. I assume they have been all cleaned, chimeras removed and are ready for OTU clustering.


Here are the main three steps i´m running. I would appreciate your comments relative to the correct way of using vsearch for OTU clustering.


# Dereplicate and discard singletons                                                                                                                          
vsearch --derep_fulllength ALL_amplicons.fa --output derep.fa  --log=log --minuniquesize 2  --relabel OTU --fasta_width 0 --sizeout

# Sort by size                                                                                                                                                
vsearch --sortbysize derep.fa -output derep_sorted.fa

# Align my 16s sequences against a database of 16s amplicons with an ID of 99%
vsearch --usearch_global derep_sorted.fa --threads 16 --dbmask none --qmask none --rowlen 0 --notrunclabels --userfields query+id1+target --maxaccepts 0 --ma\
xrejects 1 --top_hits_only --output_no_hits --db 97_otusdb.fasta --id 0.99 -uc final.txt --userout assignments.txt


Then build my OTU matrix...

Is this one possible correct way ?

Frédéric Mahé

unread,
Mar 11, 2016, 10:05:29 AM3/11/16
to VSEARCH Forum
Hi David,

your pipeline is correct.

Regarding your taxonomic assignment against a database of 16S amplicons, you need to make sure your reference amplicons where trimmed using the same primers you used for your environmental amplicons. In other words, when using global pairwise alignment (--usearch_global), query and subject sequences have to be homologous (i.e. cover exactly the same genomic region, from primer F to primer R).

Best,

david martin

unread,
Mar 11, 2016, 11:19:39 AM3/11/16
to VSEARCH Forum
Thanks Frédéric. Sounds good.

So if my primers were for example mapping the V3V4 regions than i should extract first the subregions from my 16S database ?? How would you do that ? thinking of mappers such as bwa or bowtie ??


On the other side i have also been playing with a second option that is swarm and end up doing the same thing differently

For instance by running the same pipeline but replacing the global_alignment search by swarm worked very well:
swarm -f -t 4 -d 1 -z  #swarm -f -t 4 -d 1 -z  $outdir/derep_sorted.fa -s amplicons.stats -u ucfinal.txt -w OTUs.fa > amplicons.swarms
 -s $outdir/amplicons.stats -u $outdir/ucfinal.txt -w $outdir/OTUs.fa -l $outdir/logswarm > $outdir/amplicons.s
warms

Then i have use blast to mapp back OTUs to 16s database. My question is that should i replace blast with the vsearch global_alignment or should i consider swarm an alternative for OTU clustering followed by blast to 16s databases and doing LCA afterwards or RDP....



It looks confusing to me if swarm is a better alternative in my case ?


david martin

unread,
Mar 11, 2016, 11:21:19 AM3/11/16
to VSEARCH Forum
Sorry i meant (problem with copy and paste)

swarm -f -t 4 -d 1 -z  derep_sorted.fa -s amplicons.stats -u ucfinal.txt -w OTUs.fa > swarms

Frédéric Mahé

unread,
Mar 19, 2016, 7:26:45 AM3/19/16
to VSEARCH Forum

Hi David,

To extract the V3V4 region from your reference sequences, you can use cutadapt to trim before and after your primers.

Now, regarding the rest of your question, here is my own pipeline: (1) use swarm to clusterize my amplicons (i.e. OTU picking); (2) for each OTU representative, use vsearch to find the best match with a reference sequence (like a blast); (3) build an OTU table.

I prefer to use vsearch (global pairwise alignment) instead of blast (local pairwise alignment).

A more detailed version of my pipeline with all necessary code is available here: https://github.com/frederic-mahe/swarm/wiki/Fred's-metabarcoding-pipeline

david martin

unread,
Mar 19, 2016, 4:27:16 PM3/19/16
to VSEARCH Forum
Thanks for the update Frederic.

I actually use RDP for the classification but that is just a matter of taste. One more thing, i have also tried to build a tree from the output vsearch alignment but the file is not really compatible with fasttree.

fasttree -nt < vsearch.aln > mytree

Did you manage to get a phylogenetic tree from the alignement output of vsearch ? 

Frédéric Mahé

unread,
Mar 20, 2016, 1:12:27 PM3/20/16
to VSEARCH Forum
Hi David,

vsearch performs pairwise alignments and fastree expects a MSA (multiple sequence alignment). You can use mafft, muscle or T-coffee to compute MSAs.

david martin

unread,
Mar 20, 2016, 3:00:34 PM3/20/16
to VSEARCH Forum
Oopps,
Sorry for the confusion. Too overloaded with many things.

Thanks, 
david

Reply all
Reply to author
Forward
0 new messages