closed reference approach in vsearch


anna....@gmail.com

Nov 21, 2016, 7:32:58 AM
to VSEARCH Forum
Hi everyone,

I was wondering what the steps are for closed-reference OTU picking against the Greengenes database. I want to obtain a closed-reference OTU table to use in PICRUSt.
1) Can I cluster against a reference before chimera checking?
2) Which file should I use with vsearch --uchime_ref for chimera checking? The usual recommendation is core_set_aligned.fasta.imputed, but it is an aligned FASTA file, so it gives errors because of the format...

I am really confused about the workflow for obtaining a closed-reference OTU table using vsearch... If anyone can give me any clues, I would be so happy!

Thank you so much.
Best regards,

Anna

Colin Brislawn

Jan 20, 2017, 12:35:19 PM
to VSEARCH Forum
Hello Anna,

> which are the steps to perform a closed reference OTU picking approach against Greengenes database

Great question! The terms 'open-ref' and 'closed-ref' were created by the QIIME team to describe their OTU picking workflow scripts, which attempt to combine the speed of reference-based clustering with the inclusiveness of de novo OTU clustering. When they say 'closed reference OTU picking' they just mean 'match all reads to a database.'

Here is how I perform the equivalent of closed-ref OTU picking using vsearch:
vsearch --usearch_global "$reads" --db "$ref" \
--strand plus --id 0.97 --threads "$threads" \
--uc closedref.97.map.uc --biomout closedref.97.biom

Here, $reads is my seqs.fna file with usearch sample labels.
$ref is the Greengenes database (the newest version, which matches the precomputed PICRUSt database).
$threads is the number of CPU threads to use.
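
Since closed-ref picking only keeps reads that match the database, it can be useful to check how many reads actually matched. Below is a minimal Python sketch, assuming the .uc file is the usual 10-column tab-separated format in which each query appears as either an H (hit) or an N (no hit) record. The function name `summarize_uc` and the example records are made up for illustration.

```python
from collections import Counter


def summarize_uc(lines):
    """Count hit (H) and no-hit (N) records in uc-formatted lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        record_type = line.split("\t")[0]
        counts[record_type] += 1
    hits, misses = counts["H"], counts["N"]
    total = hits + misses
    fraction = hits / total if total else 0.0
    return hits, misses, fraction


# Made-up records for illustration only (labels and OTU IDs are hypothetical).
example = [
    "H\t0\t250\t99.2\t+\t0\t0\t250M\tread1;sample=S1;\totu_111",
    "H\t1\t250\t97.6\t+\t0\t0\t250M\tread2;sample=S1;\totu_222",
    "N\t*\t*\t*\t.\t*\t*\t*\tread3;sample=S2;\t*",
]
hits, misses, fraction = summarize_uc(example)
print(f"{hits} matched, {misses} unmatched ({fraction:.1%} of reads kept)")
```

To run it on real output, pass an open file handle instead of the list, e.g. `summarize_uc(open("closedref.97.map.uc"))`.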


Let me know if that helps!
Colin

Colin Brislawn

Jan 20, 2017, 12:45:49 PM
to VSEARCH Forum
Oh, I forgot to mention chimera checking.

> 1) Can I cluster against a reference before chimera checking?
I think the consensus is yes. With closed-ref OTU picking, every OTU found is already in the database, and the database is presumed to be chimera-free (we hope!). The standard PICRUSt workflow (which is always closed-ref) uses no chimera checking.
If you are not using closed-ref OTU picking, the order in which to search for chimeras is contested: https://github.com/torognes/swarm/wiki/Frequently-Asked-Questions#when-clustering-with-swarm-when-is-an-appropriate-time-to-check-for-chimeras 

> 2) Which file do I have to use to make vsearch --uchime_ref to make chimera checking?
The uchime author argues to use a large reference database: 

> they usually recommended to use core_set_aligned.fasta.imputed, but it's an aligned fasta file so it gives errors because of the format...
Neither usearch nor vsearch will process aligned sequences. Perhaps you could try an unaligned version of that database.
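
If no unaligned version is available, one option is to strip the gap characters yourself. This is a minimal Python sketch, assuming gaps are written as '-' and '.' (the usual alignment conventions); the function name `degap_fasta` is made up for illustration, and whether the degapped file is a good chimera reference is a separate question.

```python
def degap_fasta(lines):
    """Yield FASTA lines with alignment gap characters removed from sequences."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # header lines are passed through unchanged
        else:
            stripped = line.replace("-", "").replace(".", "")
            if stripped:  # drop sequence lines that were gaps only
                yield stripped


# Made-up aligned records for illustration only.
aligned = [">seq1", "--ACGT..AC-GT", "------", ">seq2", "..TTGA--"]
print("\n".join(degap_fasta(aligned)))
```

Note that a record consisting entirely of gaps would end up as a header with no sequence, so it is worth checking the output before using it as a database.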

Colin

fra.p...@gmail.com

Jan 17, 2018, 5:19:38 AM
to VSEARCH Forum
Hi Colin,
you have already helped me a lot in this thread.
Now my boss has asked me to use PICRUSt, so I need to perform closed-reference OTU picking.
The pipeline I use is:

> #3 vsearch --derep_fulllength seqs.fna --output derep.fna --log=log --sizeout --minuniquesize 2

> #4 vsearch --cluster_fast derep.fna --id 0.97 --sizein --sizeout --relabel OTU_ --centroids otus.fna --log=log1

> #5 vsearch --uchime_denovo otus.fna --nonchimeras otus_checked.fna --sizein --xsize --log=log2

> #6 vsearch --usearch_global joined/split/seqs.fna --db otus_checked.fna --strand plus --id 0.97 --uc otu_table_mapping.uc

> #7 python '/home/ngs/vsearch-2.4.3/Scripts vari/uc2otutab.py' otu_table_mapping.uc > tabfile.tsv

> #8 biom convert --table-type="OTU table" -i tabfile.tsv -o otu_table.biom --to-json

How should I modify it for the closed reference OTU picking?

Thank you in advance for your help.

Francesco

Colin Brislawn

Jan 18, 2018, 6:42:00 PM
to VSEARCH Forum
Hello Francesco,

Closed-ref OTU picking is super easy. You just start at step 6, using your closed-ref OTU database (greengenes) as -db. 

Maybe like this:
> #6 vsearch --usearch_global joined/split/seqs.fna --db gg_13_8_otus/rep_set/97_otus.fasta --strand plus --id 0.97 --uc closed-ref.uc
> #7 python '/home/ngs/vsearch-2.4.3/Scripts vari/uc2otutab.py' closed-ref.uc > tabfile.tsv
> #8 biom convert --table-type="OTU table" -i tabfile.tsv -o closed-ref.biom --to-json
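
To make it clearer what step 7 produces, here is a rough Python sketch of turning the hit records of a .uc file into a tab-separated OTU table. It is not the uc2otutab.py script itself, just an illustration. It assumes the sample name is embedded in each read label as `sample=NAME;` (a hypothetical convention; adjust the pattern to your own labels), and the function names are made up.

```python
import re
from collections import defaultdict

# Assumed label convention: "...;sample=NAME;..." (hypothetical, adjust as needed).
SAMPLE_RE = re.compile(r"(?:^|;)sample=([^;]+)")


def otu_table_from_uc(lines):
    """Return {otu_id: {sample: count}} built from H (hit) records only."""
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[0] != "H":
            continue  # ignore no-hit records and malformed lines
        query, target = fields[8], fields[9]
        match = SAMPLE_RE.search(query)
        if match is None:
            continue  # label does not follow the assumed convention
        table[target][match.group(1)] += 1
    return table


def to_tsv(table):
    """Format the table as classic tab-separated text with an '#OTU ID' header."""
    samples = sorted({s for counts in table.values() for s in counts})
    rows = ["#OTU ID\t" + "\t".join(samples)]
    for otu in sorted(table):
        counts = "\t".join(str(table[otu].get(s, 0)) for s in samples)
        rows.append(otu + "\t" + counts)
    return "\n".join(rows)


# Made-up records for illustration only.
example = [
    "H\t0\t250\t99.2\t+\t0\t0\t250M\tread1;sample=S1;\tOTU_A",
    "H\t0\t250\t98.0\t+\t0\t0\t250M\tread2;sample=S1;\tOTU_A",
    "H\t0\t250\t97.6\t+\t0\t0\t250M\tread3;sample=S2;\tOTU_A",
    "H\t1\t250\t99.6\t+\t0\t0\t250M\tread4;sample=S2;\tOTU_B",
    "N\t*\t*\t*\t.\t*\t*\t*\tread5;sample=S2;\t*",
]
print(to_tsv(otu_table_from_uc(example)))
```

In the closed-ref case the row names are simply the Greengenes sequence IDs from the -db file, which is what lets the table line up with the Greengenes taxonomy and tree.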


Also, you don't have to do taxonomy assignment. (Greengenes already comes with taxonomy, so you just use that with your .biom table.)
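
As an illustration of "just use that", here is a small Python sketch that looks up the lineage for each OTU ID in a table. It assumes the taxonomy file is two tab-separated columns (OTU ID, then lineage string); the function names, IDs, and lineages below are made up.

```python
def load_taxonomy(lines):
    """Parse 'otu_id<TAB>lineage' lines into a dict."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        otu_id, _, lineage = line.partition("\t")
        taxonomy[otu_id] = lineage
    return taxonomy


def annotate(otu_ids, taxonomy):
    """Map each OTU ID to its lineage, or 'Unassigned' if it is missing."""
    return {otu: taxonomy.get(otu, "Unassigned") for otu in otu_ids}


# Made-up taxonomy lines for illustration only.
tax_lines = [
    "1001\tk__Bacteria; p__Firmicutes",
    "1002\tk__Bacteria; p__Bacteroidetes",
]
taxonomy = load_taxonomy(tax_lines)
print(annotate(["1001", "9999"], taxonomy))
```

Because closed-ref OTU IDs come straight from the reference database, every OTU in the table should have an entry; an 'Unassigned' result would suggest the table and the taxonomy file come from different database versions.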

Colin

fra.p...@gmail.com

Jan 22, 2018, 3:59:11 AM
to VSEARCH Forum
Hi Colin,
again thank you very much, everything went great!!
I have one question, if you can help me.

Using the same dataset, de novo OTU picking with chimera checking gave me 836 OTUs,
while the closed-ref method without chimera checking gave me 1522 OTUs.

Could you help me understand?

Francesco

fra.p...@gmail.com

Jan 26, 2018, 9:11:21 AM
to VSEARCH Forum
Hi Colin,
I have another question.
I need the rep_tre.tre for my beta diversity analysis.
In my de novo pipeline I do this:

#12 align_seqs.py -i otus_checked.fna -o pynast_aligned/
#13 filter_alignment.py -i pynast_aligned_closed-ref/otus_checked_aligned.fasta -o pynast_aligned/
#14 make_phylogeny.py -i pynast_aligned/otus_checked_aligned_pfiltered.fasta -o rep_tre.tre

In the closed-ref pipeline I don't have the otus_checked.fna file, since I don't check for chimeras, as you said before.
In this case, which file should I align? joined/split/seqs.fna?

Thanks again for your help.

Francesco


Colin Brislawn

Jan 27, 2018, 11:00:02 AM
to VSEARCH Forum
Hello Francesco,

> Using the same dataset, with the de novo OTU picking and with chimera checking I got 836 OTUs,
> while with the closed-ref method without chimera checking I got 1522 OTUs.
>
> Could you help me understand?

Why are the numbers different? It could be the chimera checking, but I'm guessing it has to do with de novo vs closed-ref OTU picking. The de novo OTU picking methods used in uclust / usearch / vsearch all try to explain the largest number of sequences using the smallest number of OTUs. Closed-reference OTU picking is not really 'picking' any OTUs; all the OTUs are already in the database, and the method just matches your reads to existing OTUs.
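
A toy Python example may make the difference concrete. It is a deliberately simplified sketch, not how vsearch works internally: identity is a plain position-by-position comparison of equal-length sequences, de novo clustering is a simple greedy pass, and closed-ref assignment picks the most similar reference. All sequences and names are made up.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mutate(seq, positions, base="C"):
    """Return seq with the given positions replaced by base."""
    chars = list(seq)
    for p in positions:
        chars[p] = base
    return "".join(chars)


def de_novo(reads, threshold=0.97):
    """Greedy clustering: a read starts a new OTU only if no centroid is close enough."""
    centroids = []
    for read in reads:
        if not any(identity(read, c) >= threshold for c in centroids):
            centroids.append(read)
    return centroids


def closed_ref(reads, refs, threshold=0.97):
    """Assign each read to its most similar reference; return the references used."""
    used = set()
    for read in reads:
        best = max(refs, key=lambda name: identity(read, refs[name]))
        if identity(read, refs[best]) >= threshold:
            used.add(best)
    return used


base = "A" * 100
# Two references that are 96% identical to each other (4 differences).
refs = {"R1": mutate(base, [0, 1]), "R2": mutate(base, [98, 99])}
# Two reads that are 98% identical to each other, each closest to a different reference.
reads = [mutate(base, [0]), mutate(base, [99])]

print("de novo OTUs:", len(de_novo(reads)))
print("closed-ref OTUs:", sorted(closed_ref(reads, refs)))
```

Here the two reads fall into a single de novo OTU, yet each is closest to a different reference, so closed-ref reports two OTUs. The de novo pipeline earlier in this thread also uses --minuniquesize 2 before clustering, which may lower its OTU count further.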


Now on to tree building!
> I need the rep_tre.tre for my beta diversity analysis.
> In my de novo pipeline I do this:
> #12 align_seqs.py -i otus_checked.fna -o pynast_aligned/
> #13 filter_alignment.py -i pynast_aligned_closed-ref/otus_checked_aligned.fasta -o pynast_aligned/
> #14 make_phylogeny.py -i pynast_aligned/otus_checked_aligned_pfiltered.fasta -o rep_tre.tre
> In the closed-ref pipeline I don't have the otus_checked.fna file since I don't check for chimeras as you said before.
> In this case, which is the file to align? joined/split/seqs.fna?
Nope! You don't need to align anything. You just use the tree file that comes with Greengenes. It's already inside the Greengenes database folder, in the subfolder called trees. Pick the tree that matches the OTU level you used for -db (97% in your case).

Colin
