Vsearch Chimera Checking


Kevin Zhang

Aug 23, 2016, 3:12:37 PM
to Qiime 1 Forum
Hello,

Following recommendations from several people on the forum, I have switched to Vsearch for chimera checking. In my old pipeline, I did chimera checking after sequence alignment. From the forums, I gather there are two acceptable places for Vsearch chimera checking (before or after picking OTUs). I am currently testing chimera checking before picking OTUs.

In my old ChimeraSlayer step, I supplied an aligned template file. Is there a way of doing that in Vsearch at the stage where I'm running it? If not, I would want to move the chimera checking step to a point where I can use an aligned reference template. Thank you in advance!

Colin Brislawn

Aug 23, 2016, 5:52:25 PM
to Qiime 1 Forum
Hello Kevin,

Vsearch implements both --uchime_ref and --uchime_denovo. Both of these commands could be run before or after OTU picking, but you will need size=x; annotation on your reads for --uchime_denovo. 
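
To make the size=x; annotation concrete, here is a minimal sketch that counts how many FASTA headers in a file carry one. The function name count_size_annotated is made up for this example, and it assumes the abundance appears as ";size=N" in the header line, which is the style --sizeout writes.

```shell
# Hypothetical helper: count FASTA headers that carry a ";size=N" annotation.
# Assumes headers look like ">read1;size=42;" or ">read1;size=42".
count_size_annotated() {
    grep -c -E '^>.*;size=[0-9]+' "$1"
}
```

If this prints 0 for a file you plan to give to --uchime_denovo, the reads probably still need dereplicating with --sizeout first.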

Because --uchime_ref does not require size annotations, you could easily check for chimeras before or after clustering. Maybe like this:

# seqs.fna: your demultiplexed reads.
vsearch --derep_fulllength seqs.fna --output seqs.derep.fna --sizeout
vsearch --uchime_ref seqs.derep.fna --db rdp_gold.fna --nonchimeras seqs.derep.uchime_ref.fna
vsearch --cluster_fast seqs.derep.uchime_ref.fna --id .97 --centroids OTUs.fna


Does that help answer your question?
Colin

Kevin Zhang

Sep 1, 2016, 3:11:49 PM
to Qiime 1 Forum
Yes, that is exactly what I needed. Thank you very much.

Colin Brislawn

Sep 1, 2016, 6:34:27 PM
to Qiime 1 Forum
Great! Let me know if you have any other questions.
Colin

Kevin Zhang

Sep 6, 2016, 3:42:33 PM
to Qiime 1 Forum
Does it make experimental sense to use both types of chimera detection?
Another silly question, about this command:
vsearch -uchime_denovo combined.test.derep.fna -sizein -sizeout -strand plus -nonchimeras seqs.derep.fna -chimeras seqs.denovo_chimeras.fna --log seqs.derep.checked_denovo.log
Since the chimeras are written to seqs.denovo_chimeras.fna, does that mean they are removed from the original combined.test.derep.fna?


Thanks in advance. 

Colin Brislawn

Sep 6, 2016, 3:49:00 PM
to Qiime 1 Forum
Hi Kevin,


> Does it make experimental sense to use both types of chimera detection?
Yeah, it could make sense. Robert Edgar recommends using both de novo and reference chimera checking on these pages (note: he uses his undocumented 'uparse-ref' algorithm for de novo checking instead of --uchime_denovo):


> Another silly question, about this command:
> vsearch -uchime_denovo combined.test.derep.fna -sizein -sizeout -strand plus -nonchimeras seqs.derep.fna -chimeras seqs.denovo_chimeras.fna --log seqs.derep.checked_denovo.log
> Since the chimeras are written to seqs.denovo_chimeras.fna, does that mean they are removed from the original combined.test.derep.fna?
Not a silly question at all!
This command does not change the input in any way. Instead, the reads that are not flagged as chimeric are written to the new file specified by the --nonchimeras flag, and the reads flagged as chimeric go to the file given to --chimeras.
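
One way to convince yourself of this is to count the records in each file after the run. This is only a sketch: count_records is a made-up helper name, and the file names in the comments are the ones from the command quoted above.

```shell
# Hypothetical helper: count the records (header lines) in a FASTA file.
count_records() {
    grep -c '^>' "$1"
}

# Example usage with the file names from the command above:
#   count_records combined.test.derep.fna     # input, left untouched
#   count_records seqs.derep.fna              # reads not flagged as chimeric
#   count_records seqs.denovo_chimeras.fna    # reads flagged as chimeric
```

The input count should be the same before and after the run. The two output counts need not add up to it exactly, since, as far as I know, vsearch can also class some reads as borderline and report them separately.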

Let me know if you have more great questions,
Colin

Kevin Zhang

Sep 12, 2016, 10:56:08 AM
to Qiime 1 Forum
Thank you so much for all of your help. Another question: since I want to incorporate both types of chimera detection (de novo and reference-based), does the order in which I do them matter?
For example, running the reference-based check and feeding the unflagged reads into the de novo check, versus running the de novo check and then feeding the non-chimeric reads into the reference-based check.

Kevin Zhang

Sep 12, 2016, 12:01:11 PM
to Qiime 1 Forum
Also, the way my pipeline is set up right now, I am using vsearch for chimera checking and then going back to QIIME for OTU picking.

In one of your other posts I saw that I need to include the -xsize option, since the size annotations can break other scripts. Do I add that on the same line as -uchime_denovo?

Colin Brislawn

Sep 12, 2016, 12:41:39 PM
to Qiime 1 Forum
Hello Kevin,

That's a very perceptive question. Yes, the order does matter here!

Because uchime de novo uses other amplicons in the data set as the potential 'parents' of chimeric 'children,' I run --uchime_denovo first, followed by --uchime_ref.
> For example, running the reference-based check and feeding the unflagged reads into the de novo check, versus running the de novo check and then feeding the non-chimeric reads into the reference-based check.
Exactly. Well said.
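
The de novo then reference order could be sketched as below. The function name chimera_check_both, the DRY_RUN switch, and the output file names are placeholders invented for this example; the vsearch flags are the ones already used in this thread.

```shell
# Sketch: de novo chimera check first, then a reference-based check on the survivors.
# Function name, DRY_RUN switch, and output file names are placeholders.
chimera_check_both() {
    derep="$1"               # dereplicated reads with size annotations
    refdb="$2"               # reference database, e.g. rdp_gold.fna
    run="${DRY_RUN:+echo}"   # set DRY_RUN=1 to print the commands instead of running them

    $run vsearch --uchime_denovo "$derep" --sizein --sizeout \
        --nonchimeras seqs.derep.denovochecked.fna &&
    $run vsearch --uchime_ref seqs.derep.denovochecked.fna --db "$refdb" \
        --nonchimeras seqs.derep.denovochecked.refchecked.fna
}
```

Calling DRY_RUN=1 chimera_check_both seqs.derep.fna rdp_gold.fna prints the two commands in order without needing vsearch installed, which is handy for checking the file names before a real run.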

> In one of your other posts I saw that I need to include the -xsize option, since the size annotations can break other scripts. Do I add that on the same line as -uchime_denovo?
You can pass --xsize as an optional flag to vsearch, and it will remove the size annotations from the output file. So maybe like this:
vsearch --uchime_ref seqs.derep.denovochecked.fna --db rdp.fna --nonchimeras seqs.derep.denovochecked.refchecked.fna --xsize
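
For a file that already carries size annotations, roughly the same clean-up can be done outside vsearch. This is a sketch, not vsearch's own implementation: strip_size is a made-up name, and it only handles a ";size=N" annotation at the end of the header line.

```shell
# Hypothetical helper: drop a trailing ";size=N" (or ";size=N;") from FASTA headers.
# Only an annotation at the end of the header line is handled; sequence lines are untouched.
strip_size() {
    sed -E '/^>/ s/;size=[0-9]+;?$//' "$1"
}
```

The result goes to standard output, so something like strip_size in.fna > out.fna keeps the original file intact.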



Because these are more vsearch-focused questions, you could also ask the vsearch devs directly over in the vsearch forum. They are probably more helpful.

It sounds like you are using vsearch to construct OTUs, so I totally have to recommend "Fred's metabarcoding pipeline," written by the excellent dev of vsearch.

Keep in touch,
Colin

leah reshef

Sep 13, 2016, 3:18:00 PM
to Qiime 1 Forum
Hi Colin
Just a follow-up question on this thread: are you running uchime_denovo on the entire dataset, containing sequences from all samples at once? It seems reasonable to me to run it per sample, as both parents of a chimeric sequence should be detected in the same sample. But I can't see an easy way of doing that in vsearch. It would be something like dereplicating each sample separately, running uchime_denovo on each, keeping only the non-chimeric seqs, de-dereplicating them, then joining them into a single file which can then be dereplicated and clustered. Whew! Or can uchime_denovo do all that automatically? In another thread in this forum, I found a reference to such an option (--split_by_sampleid) when calling usearch through the identify_chimeric_seqs.py QIIME script, but I don't see any such option in the vsearch manual.
Thanks
Leah

Colin Brislawn

Sep 13, 2016, 4:17:33 PM
to Qiime 1 Forum
Hey Leah,

> It seems reasonable to me to run it per sample, as both parents of a chimeric sequence should be detected in the same sample.
Yes, that does make sense, especially as chimeras are formed during PCR and PCR happens in separate wells for separate samples. However, I perform chimera checking on the full data set, for the reasons outlined in these two blog posts:

Some folks like to check for chimeras at the sample level (before global dereplication). I think this is what the flag --split_by_sampleid does.
The DADA2 algorithm implements something similar, called isBimeraDenovoTable(). https://github.com/benjjneb/dada2/blob/master/NEWS#L6
The deblur algorithm also implements per-sample chimera checking. https://github.com/biocore/deblur/blob/master/images/deblur_workflow.pdf
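
For anyone who wants to try the per-sample route with vsearch by hand, the first step (splitting the reads by sample) could be sketched like this. The function name split_by_sample is made up, and it assumes QIIME-style headers of the form ">SampleID_123 ..." and an output directory that starts out empty.

```shell
# Hypothetical helper: split a QIIME-style seqs.fna into one FASTA file per sample.
# Assumes headers look like ">SampleID_123 ..." and that outdir starts out empty
# (records are appended to outdir/SampleID.fna).
split_by_sample() {
    input="$1"
    outdir="$2"
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        /^>/ {
            id = substr($1, 2)          # first word of the header, without ">"
            sub(/_[0-9]+$/, "", id)     # drop the trailing read counter
            out = outdir "/" id ".fna"
        }
        out != "" { print >> out; close(out) }
    ' "$input"
}

# Each per-sample file could then be dereplicated and checked on its own, e.g.:
#   for f in per_sample/*.fna; do
#       vsearch --derep_fulllength "$f" --output "${f%.fna}.derep.fna" --sizeout
#       vsearch --uchime_denovo "${f%.fna}.derep.fna" --nonchimeras "${f%.fna}.nonchimeras.fna"
#   done
```

Closing the output file after every line is slow, but it avoids running into open-file limits when there are many samples.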

In the end, the choice is up to you!

Colin

leah reshef

Sep 14, 2016, 11:47:49 AM
to Qiime 1 Forum
Right, thanks for all the explanations (on this and numerous other threads :-) )

Colin Brislawn

Sep 14, 2016, 1:16:07 PM
to Qiime 1 Forum
:-)