Chimera Checking Illumina MiSeq Data

Sara Dunaj

Jan 15, 2017, 11:13:07 PM

to qiime...@googlegroups.com

Hi,

I have run my pre-processed data through the following pick OTU script with USearch61: pick_open_reference_otus.py -m usearch61 -o Full_Reads_otus_USEARCH61/ -i seqs.fna


From the description of pick_open_reference_otus.py, it looks like this workflow has no chimera checking and removal step. Is this correct?

If so, I think I will need to run the identify chimera script, but it asks for an aligned reference database (-a, --aligned_reference_seqs_fp) that was used to build the input sequences. Where do I find this reference database? Lastly, do I use rep_set_aligned.fasta or rep_set_aligned_pfiltered.fasta as my input file?

Also, what reference database does pick_open_reference_otus.py use?

Thank you,

Sara

Stefan Janssen

Jan 16, 2017, 12:59:04 PM

to Qiime 1 Forum

Hi Sara,
I reached out to my co-workers with your problem. Stay tuned.

zech xu

Jan 17, 2017, 12:53:20 PM

to Qiime 1 Forum

Hi Sara,

"pick_open_reference_otus.py" does NOT do chimera checking. The "-a" option only provides a reference sequence collection for the first step of the open-reference OTU picking strategy.

Sara Dunaj

Jan 19, 2017, 7:50:08 AM

to Qiime 1 Forum

Hi,

I understand that it doesn't do chimera checking. I need assistance with running ChimeraSlayer: which input files do I need for this mode of the identify chimera script, and where do I find them in the open-reference OTU output files? If ChimeraSlayer assistance isn't available, can you provide the location/path in QIIME of the Greengenes reference set to use with USearch?

Stefan Janssen

Jan 19, 2017, 10:43:31 AM

to Qiime 1 Forum

Run the script print_qiime_config.py to get information about those paths.

Try running identify_chimeric_seqs.py prior to OTU picking to get rid of chimeric reads.
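
As a rough sketch of that order of operations (not a tested pipeline): flag chimeras first, remove them from the reads, then pick OTUs on what is left. It assumes the usearch61 method and that identify_chimeric_seqs.py writes a chimeras.txt ID list into its output directory. The function name, the run helper, and all file and directory names are placeholders invented for illustration. The run helper only prints each command unless DRY_RUN=0 is set.

```shell
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: chimera check, filter, then OTU picking.
chimera_filter_then_pick() {
    local seqs="$1"      # demultiplexed reads, e.g. seqs.fna
    local ref="$2"       # unaligned reference FASTA for usearch61
    local outdir="$3"    # directory for the chimera-checking output

    # 1. Flag chimeric reads with usearch61 against the reference set.
    run identify_chimeric_seqs.py -i "$seqs" -m usearch61 -o "$outdir" -r "$ref"

    # 2. Drop the flagged reads; -n keeps everything NOT listed in chimeras.txt.
    run filter_fasta.py -f "$seqs" -o seqs_chimeras_filtered.fna -s "$outdir/chimeras.txt" -n

    # 3. Pick OTUs on the chimera-filtered reads.
    run pick_open_reference_otus.py -m usearch61 -i seqs_chimeras_filtered.fna -o otus_chimera_checked/
}
```

Calling chimera_filter_then_pick seqs.fna 97_otus.fasta usearch_checked_chimeras/ prints the three commands for review. Setting DRY_RUN=0 beforehand would actually run them, which requires a working QIIME 1 install.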

Sara Dunaj

Jan 20, 2017, 6:32:41 PM

to qiime...@googlegroups.com

Hi,

I have been trying to use the identify chimera script to remove chimeras. ChimeraSlayer is not working: the txt file was empty after running it. I was reaching out to confirm whether I was using the correct input files (-i rep_set_aligned.fasta, from the pynast aligned folder, and -a rep_set.fna).

I also just tried running it prior to clustering with USearch, as you recommended. It worked without a reference set, but when I ran it with a reference set it failed with a fatal out-of-memory error (see below):

$ identify_chimeric_seqs.py -i seqs.fna -m usearch61 -o usearch_checked_chimeras_Silva99ref/ -r /Users/Sara_Jeanne/Desktop/QIIME/SILVA123_QIIME_release/rep_set/rep_set_all/99/99_otus.fasta
Traceback (most recent call last):
File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 354, in <module>
    main()
File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 350, in main
    threads=threads)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 774, in usearch61_chimera_check
    log_lines, verbose, threads)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 961, in identify_chimeras_usearch61
    HALT_EXEC=HALT_EXEC)
File "/macqiime/anaconda/lib/python2.7/site-packages/bfillings/usearch.py", line 2411, in usearch61_chimera_check_ref
    app_result = app()
File "/macqiime/anaconda/lib/python2.7/site-packages/burrito/util.py", line 285, in __call__
    'StdErr:\n%s\n' % open(errfile).read())
burrito.util.ApplicationError: Unacceptable application exit status: 1
Command:
cd "/Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/"; usearch61 --mindiffs 3 --uchime_ref "/Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_consensus_fixed.fasta" --minh 0.28 --xn 8.0 --minseqlength 64 --threads 0.5 --mindiv 0.8 --uchimeout "/Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_chimeras_ref.uchime" --dn 1.4 --strand plus --db "/Users/Sara_Jeanne/Desktop/QIIME/SILVA123_QIIME_release/rep_set/rep_set_all/99/99_otus.fasta" --log "/Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_chimeras_ref.log" > "/tmp/tmp1ayFh13eW2nmNUfGRkAR.txt" 2> "/tmp/tmpHaRsZdDNwDW9wE96bAYa.txt"
StdOut:
usearch_i86osx32 v6.1.544, 4.0Gb RAM (8.6Gb total), 2 cores
(C) Copyright 2010-12 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch

StdErr:
00:00 966.7kb Reading /Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_consensus_fixed.fasta, 10Mb
00:00 11Mb 28993 (29.0k) seqs, min 184, avg 309, max 493nt
00:01 11Mb Reading /Users/Sara_Jeanne/Desktop/QIIME/SILVA123_QIIME_release/rep_set/rep_set_all/99/99_otus.fasta, 781Mb
00:39 782Mb 537233 (537.2k) seqs, min 900, avg 1436, max 3674nt
01:23 788Mb 100.0% Masking
02:00 789Mb 100.0% Word stats
usearch61(59351,0xa10771d4) malloc: *** mach_vm_map(size=8388608) failed (error code=3)
*** error: can't allocate region securely
*** set a breakpoint in malloc_error_break to debug

Out of memory mymalloc(10952), curr 1.04e+09 bytes

usearch61 --mindiffs 3 --uchime_ref /Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_consensus_fixed.fasta --minh 0.28 --xn 8.0 --minseqlength 64 --threads 0.5 --mindiv 0.8 --uchimeout /Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_chimeras_ref.uchime --dn 1.4 --strand plus --db /Users/Sara_Jeanne/Desktop/QIIME/SILVA123_QIIME_release/rep_set/rep_set_all/99/99_otus.fasta --log /Users/Sara_Jeanne/Desktop/QIIME/Split_Lib_FullReads_attachedBC_q20/usearch_checked_chimeras_Silva99ref/seqs.fna_chimeras_ref.log

---Fatal error---
Out of memory, mymalloc(10952), curr 1.04e+09 bytes

Is there any way to work around this so I can run it on my local machine? I was able to get it to work with the Greengenes (gg) set, but I would prefer to use a more up-to-date reference set.

Thank you,

Sara

TonyWalters

Jan 21, 2017, 1:47:27 AM

to Qiime 1 Forum

Hello Sara,

It looks like you've hit the memory allocation limit of the 32-bit version of usearch61. Your options are to get the 64-bit (not free) version of usearch61 or to try vsearch (https://github.com/torognes/vsearch) instead. For vsearch, you would do these steps:

1. Find your current usearch61 executable (e.g., by running: which usearch61) and rename it.

2. Download vsearch and rename the downloaded file to usearch61.

3. Put the renamed file in the location where the original usearch61 was found.

4. Make sure it responds correctly, i.e., that you get the vsearch version when you type: usearch61 --version
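
The four steps above can be sketched as a small shell function. This is only an illustration under assumptions: swap_in_vsearch is a made-up name, both arguments are paths you supply yourself, and the original binary is kept with a .orig suffix (my choice, not part of the steps) so the change can be undone.

```shell
# Hypothetical helper: put a downloaded vsearch binary in place of usearch61.
swap_in_vsearch() {
    local usearch_path="$1"   # e.g. the output of: which usearch61
    local vsearch_bin="$2"    # the vsearch executable you downloaded

    [ -f "$usearch_path" ] || { echo "not found: $usearch_path" >&2; return 1; }
    [ -f "$vsearch_bin" ]  || { echo "not found: $vsearch_bin" >&2; return 1; }

    # Step 1: keep the original binary under a new name so it can be restored.
    mv "$usearch_path" "${usearch_path}.orig" || return 1

    # Steps 2-3: copy vsearch to where usearch61 was, under the usearch61 name.
    cp "$vsearch_bin" "$usearch_path" || return 1
    chmod +x "$usearch_path" || return 1
}

# Example call (commented out; adjust the download path to your system):
#   swap_in_vsearch "$(which usearch61)" "$HOME/Downloads/vsearch"
#   usearch61 --version    # Step 4: should now report a vsearch version
```

To undo the swap, move the .orig file back over the usearch61 path.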

-Tony

Sara Dunaj

Jan 21, 2017, 11:26:34 AM

to Qiime 1 Forum

Hi Tony,

Thank you very much for your help with this. I was able to get usearch to work with a smaller, QIIME-compatible version of the SILVA database. I would still like to try vsearch on the most recent SILVA release, though.

One last question regarding the chimera output file from USearch / UCHIME: does it list every sequence flagged by either the de novo or the reference-based chimera checking method, or only the sequences flagged by both methods?

Thank you!

Sara

TonyWalters

Jan 21, 2017, 12:20:36 PM

to Qiime 1 Forum

Hello Sara,

By default, a sequence has to be flagged by both methods to be labeled as a chimera. You can change this behavior with the --non_chimeras_retention option. If you want to see which sequences were detected by each method specifically, you'd have to dig through the intermediate de_novo and reference chimera files.
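
To make the "both" versus "either" distinction concrete, here is a toy sketch using plain sort and comm on two made-up ID lists. It does not parse QIIME's actual intermediate files; the file names and sequence IDs are hypothetical. As far as I understand the QIIME 1 documentation, the default corresponds to --non_chimeras_retention union (a read is kept if either method calls it non-chimeric, so only reads flagged by both are removed), and intersection is the stricter setting.

```shell
# Toy ID lists standing in for the per-method chimera calls (hypothetical data).
workdir=$(mktemp -d)
printf 'seq1\nseq2\nseq3\n' > "$workdir/denovo_chimeras.txt"
printf 'seq2\nseq3\nseq4\n' > "$workdir/ref_chimeras.txt"

# comm needs sorted input.
sort -u "$workdir/denovo_chimeras.txt" > "$workdir/denovo.sorted"
sort -u "$workdir/ref_chimeras.txt"    > "$workdir/ref.sorted"

# Flagged by BOTH methods: what the default behavior labels as chimeric.
both=$(comm -12 "$workdir/denovo.sorted" "$workdir/ref.sorted")

# Flagged by EITHER method: the stricter alternative.
either=$(sort -u "$workdir/denovo.sorted" "$workdir/ref.sorted")

echo "flagged by both:   $(echo $both)"
echo "flagged by either: $(echo $either)"

rm -rf "$workdir"
```

With these toy lists, seq2 and seq3 are flagged by both methods, while seq1 and seq4 are each flagged by only one and would be kept under the default.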

-Tony

Sara Dunaj

Jan 21, 2017, 8:18:23 PM

to Qiime 1 Forum

Hi Tony,

Thank you. That is great to know.

Sara
