no chimeras detected with vsearch --uchime_ref --uchime_denovo

sar...@gmail.com

Jan 21, 2016, 11:25:58 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi,

I was very happy to learn about vsearch and have been testing it for chimera detection in place of usearch. Unfortunately, it does not detect any chimeras, in either reference or de novo mode, whereas usearch detects around 20,000 on the same data. There are no error messages, so I'm not sure what the problem could be. This is my command:

vsearch --uchime_ref file.fasta --db ~/db/silva.gold.align --threads 4 --minh 0.2 --mindiv 1.5 --chimeras chimera.out.fasta --nonchimeras non.chimera.out.fasta

I'm using vsearch v1.9.7_linux_x86_64

Reading file rdp_gold.fa 100% 
29007378 nt in 20098 seqs, min 320, max 2210, avg 1443
Masking 100%
Counting unique k-mers 100% 
Creating index of unique k-mers 100% 
Detecting chimeras 100% 
Found 0 (0.0%) chimeras, 213043 (100.0%) non-chimeras,
and 0 (0.0%) suspicious candidates in 213043 sequences.

And this is the output for the same data set in usearch:

00:00 2.0Mb Reading file.fasta
00:00  70Mb 213043 (213.0k) seqs, min 248, avg 253, max 255nt
00:00  73Mb Reading rdp_gold.fa, 30Mb
00:00 103Mb 20098 (20.1k) seqs, min 320, avg 1443, max 2210nt
00:01 103Mb  100.0% Masking
00:02 104Mb  100.0% Word stats
00:03 214Mb  100.0% Build index
03:46 583Mb  100.0% Search 23950/213043 chimeras found (11.2%)
03:46 583Mb  100.0% Writing 23950 chimeras                   
03:47 583Mb  100.0% Writing 189093 non-chimeras

To make sure, I also tried one of the test data sets:

vsearch --uchime_ref AF091148.fsa --db ~/db/Rfam_11_0.repr.fasta --chimeras chimtest.out
vsearch v1.9.7_linux_x86_64
https://github.com/torognes/vsearch

Reading file /home/shauzmayer/db/Rfam_11_0.repr.fasta 100% 
253269 nt in 2208 seqs, min 19, max 1800, avg 115
Masking 100%
Counting unique k-mers 100% 
Creating index of unique k-mers 100% 
Detecting chimeras 100% 
Found 0 (0.0%) chimeras, 1403 (100.0%) non-chimeras,
and 0 (0.0%) suspicious candidates in 1403 sequences.


It would be great if you had any suggestions as to what could have gone wrong here. I feel a bit lost without any indication of the possible cause.

Any help would be appreciated!
thx,
Sandra

Torbjørn Rognes

Jan 22, 2016, 4:26:53 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi Sandra

Thanks for your interest in vsearch and for your error report. I am sorry for your trouble with vsearch, but hope to solve it soon. I am looking into it now.

Would it be possible for you to provide me with a few of the sequences in your file "file.fasta" which are reported as chimeras by usearch? This could help identify the problem.

Best wishes,

- Torbjørn

sar...@gmail.com

Jan 25, 2016, 9:34:05 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi Torbjørn,

Thanks for your quick reply, I appreciate the help!

The file I was testing with is a publicly available mock community data set (ERR348713), so that is no problem. Please find below some of the sequences that usearch reported as chimeras.

Also, I just noticed that I specified the wrong database file above: I used rdp_gold.fa, not silva.gold.align (which is an alignment file and does not work).

Thanks again!
Sandra

>ERR348713.2
TACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTCTTTAAGTCTGAAGTTAAAGGCA
GTGGCTTAACCATTGTACGCTTTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTGA
AATGCGTAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGG
GGAGCAAACAGG
>ERR348713.3
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCC
CCGGCTTAACCGGGGAGGGTCATTGGAAACTGGAAGACTGGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTG
AAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTG
GGGAGCAAACAGG
>ERR348713.17
TACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAAGCCC
ACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTG
AAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTG
GGTAGCGAACAGG
>ERR348713.19
TACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGGCTAAAGCGTGCGTAGGCGGCTTATTAAGTCGGATGTGAAATCCC
CGAGCTTAACTTGGGAATTGCATTCGATACTGGTGAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTG
AAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCTCCAGGGACAACACTGACGTTCATGCCCGAAAGCGTG
GGTAGCAAACAGG
>ERR348713.24
TACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGGGAGCGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCC
CCGGCTTAACCGGGGAGGGTCATTGGAAACTGGAAGACTGGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTG
AAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTG
GGGAGCAAACAGG

Torbjørn Rognes

Jan 25, 2016, 11:01:52 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi Sandra

Thanks for providing the sequences.

I have identified the problem. The bug was introduced in version 1.9.7 and caused database sequences in lower case to be masked. Since all the sequences in rdp_gold.fa are lower case, no matches are found.
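
For anyone who wants to check whether a reference file is affected, here is a minimal sketch in Python that reports the fraction of lower-case sequence letters in FASTA text and can upper-case the sequence lines. The function names are hypothetical helpers, not part of vsearch. That upper-casing would sidestep the 1.9.7 bug is an assumption and is not confirmed in this thread; upgrading vsearch is the actual fix.

```python
def lowercase_fraction(fasta_text: str) -> float:
    """Return the fraction of sequence letters that are lower case."""
    lower = total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">") or not line.strip():
            continue  # skip header lines and blank lines
        for ch in line.strip():
            if ch.isalpha():
                total += 1
                lower += ch.islower()
    return lower / total if total else 0.0


def uppercase_sequences(fasta_text: str) -> str:
    """Upper-case sequence lines and leave header lines untouched."""
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else line.upper())
    return "\n".join(out)


# Tiny made-up example, not taken from rdp_gold.fa.
example = ">ref1 some description\nacgtacgt\nACGT\n>ref2\nttgaccaa\n"
print(lowercase_fraction(example))
print(uppercase_sequences(example))
```

A value close to 1.0 from `lowercase_fraction` would suggest a file like the one described above, where every database sequence is lower case.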

Thanks for reporting the bug. I'll fix the bug and release a new version very soon.

- Torbjørn

Torbjørn Rognes

Jan 25, 2016, 2:34:30 PM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
The problem should be fixed now in the new VSEARCH version 1.9.10 just released.

- Torbjørn

sar...@gmail.com

Feb 16, 2016, 1:07:33 PM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi Torbjørn,

I just noticed that the problem still persists for the de novo chimera search. I did not check this earlier because I was mostly interested in the reference mode, but I thought I would give you a heads-up in case you would like to know.
I tried this with the new version 1.10, using the same data set as above: the reference mode now detects 27017 chimeras, but the de novo mode still finds zero.

Cheers,
Sandra

-----------------------------------
this is the output:

Reading file xx/ERR348713_Ill_16S_single_mock.fasta 100%
53888560 nt in 213043 seqs, min 248, max 255, avg 253
Masking 100%
Sorting by abundance 100%
Counting unique k-mers 100%

Detecting chimeras 100%
Found 0 (0.0%) chimeras, 213043 (100.0%) non-chimeras,
and 0 (0.0%) borderline sequences in 213043 unique sequences.
Taking abundance information into account, this corresponds to

0 (0.0%) chimeras, 213043 (100.0%) non-chimeras,
and 0 (0.0%) borderline sequences in 213043 total sequences.
vsearch v1.10.0_linux_x86_64, 15.6GB RAM, 8 cores
https://github.com/torognes/vsearch

Torbjørn Rognes

Feb 22, 2016, 5:49:48 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi

Thanks for reporting that the problem persists. I have reopened the issue on github and will look into it shortly.

- Torbjørn

Torbjørn Rognes

Feb 23, 2016, 9:40:57 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi Sandra

I've looked into the case again. I've downloaded the ERR348713 file from EBI here: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR348/ERR348713/ERR348713.fastq.gz

It appears that you have not dereplicated the dataset. Please run "vsearch --derep_fulllength ERR348713.fasta --sizeout --output ERR348713.derep.fasta" or a similar command before you run de novo chimera detection.
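
To illustrate what dereplication does conceptually, here is a minimal Python sketch that collapses identical sequences and records how often each occurred. It is not vsearch's implementation: the helper names are hypothetical, the case-insensitive comparison is a simplifying assumption, and the `;size=N` suffix only approximates the annotation written with --sizeout.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse identical sequences, most abundant first."""
    counts = Counter()
    first_header = {}
    for header, seq in records:
        key = seq.upper()  # assumption: compare sequences case-insensitively
        counts[key] += 1
        first_header.setdefault(key, header)  # keep the first header seen
    # sorted() is stable, so ties keep their first-seen order
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    return [(f"{first_header[seq]};size={n}", seq) for seq, n in ordered]


# Made-up reads: three copies of ACGT (one in lower case) and one GGCC.
reads = ">a\nACGT\n>b\nGGCC\n>c\nacgt\n>d\nACGT\n"
for header, seq in dereplicate(parse_fasta(reads)):
    print(f">{header}\n{seq}")
```

The abundance recorded in each header is what the de novo step uses later to decide which sequences can act as parents.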

De novo chimera detection depends on finding parent sequences of potential chimeras, where the abundance of each parent sequence is at least twice that of the chimera. This ratio may be adjusted with the --abskew option, but currently it cannot be 1 or lower. By default it is 2.
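
The abundance rule can be sketched in a few lines of Python. This only illustrates the ratio test described above, not vsearch's code: the function names are hypothetical, and treating a header without a size annotation as abundance 1 is an assumption.

```python
import re

# Matches a usearch-style abundance annotation such as ";size=42;" in a header.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def abundance(header: str) -> int:
    """Return the annotated abundance, or 1 if no annotation is present (assumption)."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else 1


def can_be_parent(parent_header: str, candidate_header: str, abskew: float = 2.0) -> bool:
    """True if the parent is at least `abskew` times as abundant as the candidate."""
    return abundance(parent_header) >= abskew * abundance(candidate_header)


# Without dereplication every read counts once, so no read can be a parent of another.
print(can_be_parent("ERR348713.2", "ERR348713.3"))
# With size annotations, an abundant sequence can be a parent of a rarer one.
print(can_be_parent("seqA;size=10;", "seqB;size=5;"))
```

Under these assumptions, a file in which every sequence has abundance 1 can never satisfy the default ratio of 2, which is consistent with zero chimeras being reported on the non-dereplicated input.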

- Torbjørn

Torbjørn Rognes

Feb 23, 2016, 9:46:34 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Here are my results:

Dereplication:

$ vsearch --derep_full ERR348713.fasta --output ERR348713.derep.fasta --sizeout
vsearch v1.10.0_osx_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading file ERR348713.fasta 100%
53888560 nt in 213043 seqs, min 248, max 255, avg 253
Dereplicating 100%
Sorting 100%
21496 unique sequences, avg cluster 9.9, median 1, max 22479
Writing output file 100%

Chimera detection:

$ vsearch --uchime_denovo ERR348713.derep.fasta --chimeras chimera.out.fasta --nonchimeras non.chimera.out.fasta
vsearch v1.10.0_osx_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading file ERR348713.derep.fasta 100%
5434813 nt in 21496 seqs, min 248, max 255, avg 253
Masking 100%
Sorting by abundance 100%
Counting unique k-mers 100%
Detecting chimeras 100%
Found 8385 (39.0%) chimeras, 12887 (60.0%) non-chimeras,
and 224 (1.0%) borderline sequences in 21496 unique sequences.
Taking abundance information into account, this corresponds to
26953 (12.7%) chimeras, 184839 (86.8%) non-chimeras,
and 1251 (0.6%) borderline sequences in 213043 total sequences.

sar...@gmail.com

Feb 24, 2016, 6:01:01 AM
to VSEARCH Forum, sandra.h...@fh-campuswien.ac.at
Hi Torbjørn,
Sorry, I missed the dereplication prerequisite. You are absolutely right: I am getting identical results now!
Nevertheless, thanks for checking this out again!
Best regards,
Sandra
