ChimeraSlayer resulting file is empty

Alexis Walker

Apr 26, 2016, 4:24:44 PM

to Qiime 1 Forum

Hello,

I am attempting to check for chimeric sequences but continue to get a resulting blank file after running the script. I have run into this problem before and thought I had figured it out. These are the scripts I have attempted to run.

parallel_identify_chimeric_seqs.py -O 6 -m ChimeraSlayer -i /Users/alexiswalker/Desktop/16S/Beaufort/Silva_run/pynast_aligned_seqs/rep_set_aligned.fasta -a /Users/alexiswalker/Desktop/18S/Silva119_release_aligned_rep_files/97_16S_only/Silva_119_rep_set97_aligned_16S_only.fna -o /Users/alexiswalker/Desktop/16S/Beaufort/Silva_run/pynast_aligned_seqs/16S_97_beau_chim.txt

# I tried to set a PATH to chimera slayer as well:

export PATH=$PATH:/Users/alexiswalker/MacQIIME_1.9.0-20150227_OS10.6/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer

#Lastly I tried to move the -i and -a files to the chimeraslayer directory which also didn't work.

Any help on this would be very much appreciated!

Thanks,

Alexis

TonyWalters

Apr 26, 2016, 4:38:46 PM

to Qiime 1 Forum

Alexis,

Does this run for a while and stop, producing an empty file? What happens when you run the non-parallel version?

Can you run

print_qiime_config.py -tf

Finally, can you download the attached file, and download this file:

http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed

and put them all in one directory and run this command:

identify_chimeric_seqs.py -m ChimeraSlayer -i 3seqs_1chimera.fasta -a core_set_aligned.fasta.imputed -o test_chimeric_seqs.txt

If this does detect a chimera, then it indicates that the ChimeraSlayer software is working, it just isn't detecting chimeras in your dataset for some reason. You can alternatively run usearch61 (or vsearch, which can be downloaded and renamed to usearch61, https://github.com/torognes/vsearch), but it needs to be run before OTU picking, see: http://qiime.org/tutorials/chimera_checking.html#usearch-6-1

3seqs_1chimera.fasta

Alexis Walker

Apr 26, 2016, 6:50:06 PM

to Qiime 1 Forum

Hi Tony,

The non-parallel version also doesn't work. I went ahead and ran print_qiime_config.py -tf, and there appear to be 3 failures and 1 error, which may be the reason for my issues with chimera checking. What do you recommend for fixing these?

Thank you so much for the help!

Alexis

AW_print_config.txt

TonyWalters

Apr 26, 2016, 7:14:45 PM

to Qiime 1 Forum

BLAST doesn't appear to be working, and you need that for ChimeraSlayer.

You can follow the instructions for the installation on MacQIIME, which is how I'm assuming you are running QIIME based upon your print_qiime_config.py result: http://www.wernerlab.org/software/macqiime/macqiime-installation/installing-blast-in-os-x

But the current link to the BLAST files isn't working, so use this link instead: http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/executables/release/2.2.22/blast-2.2.22-universal-macosx.tar.gz

Alexis Walker

Apr 26, 2016, 9:52:10 PM

to Qiime 1 Forum

I saw that BLAST wasn't working, so I looked into it and realized I had downloaded the latest BLAST+ version and not the legacy 2.2.22 release. I downloaded the proper version of BLAST and followed the installation instructions to a T, but I am still getting the following error when I run print_qiime_config.py -tf:

FAIL: test_blast_supported_version (__main__.QIIMEDependencyFull)
blast is in path and version is supported
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/print_qiime_config.py", line 456, in test_blast_supported_version
    "which components of QIIME you plan to use.")
AssertionError: blast not found. This may or may not be a problem depending on which components of QIIME you plan to use.

Any suggestions?

Thanks,

Alexis

TonyWalters

Apr 26, 2016, 9:57:29 PM

to Qiime 1 Forum

Did you follow all of the macqiime blast installation instructions? Did you create the .ncbirc file? Is the downloaded file in the $PATH? Can you run:

blastall

and

formatdb

and get the response from the executable blast files?
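The PATH check asked for above can also be scripted. The following is a minimal Python 3 sketch, not part of QIIME: `find_tools` and `report` are hypothetical helper names, and the tool list is simply the set of executables mentioned in this thread. It only reports whether each name resolves on the PATH; it does not verify the BLAST version.

```python
import shutil

# Tool names taken from this thread; adjust the list for your own setup.
REQUIRED_TOOLS = ["blastall", "formatdb", "ChimeraSlayer.pl"]


def find_tools(names, path=None):
    """Return {tool name: resolved path or None} for the given PATH string.

    path=None means "use the PATH of the current process".
    """
    return {name: shutil.which(name, path=path) for name in names}


def report(found):
    """Build one human-readable line per tool."""
    lines = []
    for name, location in sorted(found.items()):
        lines.append("%-18s %s" % (name, location or "NOT FOUND on PATH"))
    return "\n".join(lines)
```

Running `print(report(find_tools(REQUIRED_TOOLS)))` from the same terminal session that runs QIIME shows what that session can actually see, which may differ from what a login shell sees.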

Alexis Walker

Apr 26, 2016, 10:12:57 PM

to Qiime 1 Forum

Yes

This is what my etc/bashrc file looks like (note PATH at bottom):

# System-wide .bashrc file for interactive bash(1) shells.
if [ -z "$PS1" ]; then
    return
fi

PS1='\h:\W \u\$ '
# Make bash check its window size after a process completes
shopt -s checkwinsize
# Tell the terminal about the working directory at each prompt.
if [ "$TERM_PROGRAM" == "Apple_Terminal" ] && [ -z "$INSIDE_EMACS" ]; then
    update_terminal_cwd() {
        # Identify the directory using a "file:" scheme URL,
        # including the host name to disambiguate local vs.
        # remote connections. Percent-escape spaces.
        local SEARCH=' '
        local REPLACE='%20'
        local PWD_URL="file://$HOSTNAME${PWD//$SEARCH/$REPLACE}"
        printf '\e]7;%s\a' "$PWD_URL"
    }
    PROMPT_COMMAND="update_terminal_cwd; $PROMPT_COMMAND"
fi

#path to bin in blast
export PATH=/opt/blast-2.2.22/bin/:$PATH

This is what my ~/.ncbirc looks like:

[NCBI]

Data=/opt/blast-2.2.22/data/

When I run ./blastall and ./formatdb everything looks okay.

Alexis Walker

Apr 26, 2016, 10:14:25 PM

to Qiime 1 Forum

I was only able to get the blastall error fixed when I added the full file path to it in the qiime_config.

Alexis Walker

Apr 26, 2016, 10:37:12 PM

to Qiime 1 Forum

Update: I tried adding the PATH to blast/bin in my root profile (/profile), and I am no longer getting the BLAST error from print_qiime_config.py -tf. The remaining failures are:

FAIL: test_ampliconnoise_install (__main__.QIIMEDependencyFull)
AmpliconNoise install looks sane.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/print_qiime_config.py", line 382, in test_ampliconnoise_install
    "$PYRO_LOOKUP_FILE variable is not set. See %s for help." % url)
AssertionError: $PYRO_LOOKUP_FILE variable is not set. See http://qiime.org/install/install.html#ampliconnoise-install-notes for help.

======================================================================
FAIL: test_usearch_supported_version (__main__.QIIMEDependencyFull)
usearch is in path and version is supported
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/print_qiime_config.py", line 650, in test_usearch_supported_version
    "which components of QIIME you plan to use.")
AssertionError: usearch not found. This may or may not be a problem depending on which components of QIIME you plan to use.

Since I don't need AmpliconNoise or usearch, I assume that is good news.

However, I am still getting a blank chimera output text file.

TonyWalters

Apr 26, 2016, 10:43:08 PM

to Qiime 1 Forum

Good, and those other failures aren't important for ChimeraSlayer, as you surmised.

Let's try running the known true chimera test. This is copied text from the prior message:

Can you download the attached file, and download this linked file as well:

http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed

and put them all in one directory and run this command:

identify_chimeric_seqs.py -m ChimeraSlayer -i 3seqs_1chimera.fasta -a core_set_aligned.fasta.imputed -o test_chimeric_seqs.txt

See if the output test_chimeric_seqs.txt file contains data when running this.

3seqs_1chimera.fasta

Alexis Walker

Apr 26, 2016, 11:04:30 PM

to Qiime 1 Forum

I did it and this is what I got:

Traceback (most recent call last):
  File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 354, in <module>
    main()
  File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 328, in main
    keep_intermediates=keep_intermediates)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 159, in chimeraSlayer_identify_chimeras
    keep_intermediates=keep_intermediates):
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 143, in __call__
    keep_intermediates=keep_intermediates)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 637, in get_chimeras_from_Nast_aligned
    app_results = app()
  File "/macqiime/anaconda/lib/python2.7/site-packages/burrito/util.py", line 295, in __call__
    result_paths = self._get_result_paths(data)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 419, in _get_result_paths
    raise ApplicationError("Calling ChimeraSlayer failed.")
burrito.util.ApplicationError: Calling ChimeraSlayer failed

TonyWalters

Apr 27, 2016, 7:01:33 AM

to Qiime 1 Forum

What do you get from typing the following commands (from a new terminal):
ChimeraSlayer.pl

formatdb

blastall

Alexis Walker

Apr 27, 2016, 1:13:42 PM

to Qiime 1 Forum

ChimeraSlayer.pl :

./ChimeraSlayer.pl

##########################################################################################

#

# Required:

#

# --query_NAST multi-fasta file containing query sequences in alignment format

#

# Common opts:

#

# --db_NAST db in NAST format (default: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta)

# --db_FASTA db in fasta format (megablast formatted) (default: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.fasta)

#

# -n number of top matching database sequences to compare to (default 15)

# -R min divergence ratio default: 1.007

# -P min percent identity among matching sequences (default: 90)

#

# ## parameters to tune ChimeraParentSelector:

#

# Scoring parameters:

# -M match score (default: +5)

# -N mismatch penalty (default: -4)

# -Q min query coverage by matching database sequence (default: 70)

# -T maximum traverses of the multiple alignment (default: 1)

#

# ## parameters to tune ChimeraPhyloChecker:

#

# --windowSize default 50

# --windowStep default 5

# --minBS minimum bootstrap support for calling chimera (default: 90)

# -S percent of SNPs to sample on each side of breakpoint for computing bootstrap support (default: 10)

# --num_parents_test number of potential parents to test for chimeras (default: 3)

# --MAX_CHIMERA_PARENT_PER_ID Chimera/Parent alignments with perID above this are considered non-chimeras (default 100; turned off)

#

# ## misc opts

#

# --printFinalAlignments shows alignment between query sequence and pair of candidate chimera parents

# --printCSalignments print ChimeraSlayer alignments in ChimeraSlayer output

# --exec_dir chdir to here before running

#

#########################################################################################

formatdb :

./formatdb

[formatdb 2.2.22] ERROR: No database name was specified

blastall :

./blastall

blastall 2.2.22 arguments:

-p Program Name [String]

-d Database [String]

default = nr

-i Query File [File In]

default = stdin

-e Expectation value (E) [Real]

default = 10.0

-m alignment view options:

0 = pairwise,

1 = query-anchored showing identities,

2 = query-anchored no identities,

3 = flat query-anchored, show identities,

4 = flat query-anchored, no identities,

5 = query-anchored no identities and blunt ends,

6 = flat query-anchored, no identities and blunt ends,

7 = XML Blast output,

8 = tabular,

9 tabular with comment lines

10 ASN, text

11 ASN, binary [Integer]

default = 0

range from 0 to 11

-o BLAST report Output File [File Out] Optional

default = stdout

-F Filter query sequence (DUST with blastn, SEG with others) [String]

default = T

-G Cost to open a gap (-1 invokes default behavior) [Integer]

default = -1

-E Cost to extend a gap (-1 invokes default behavior) [Integer]

default = -1

-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior)

blastn 30, megablast 20, tblastx 0, all others 15 [Integer]

default = 0

-I Show GI's in deflines [T/F]

default = F

-q Penalty for a nucleotide mismatch (blastn only) [Integer]

default = -3

-r Reward for a nucleotide match (blastn only) [Integer]

default = 1

-v Number of database sequences to show one-line descriptions for (V) [Integer]

default = 500

-b Number of database sequence to show alignments for (B) [Integer]

default = 250

-f Threshold for extending hits, default if zero

blastp 11, blastn 0, blastx 12, tblastn 13

tblastx 13, megablast 0 [Real]

default = 0

-g Perform gapped alignment (not available with tblastx) [T/F]

default = T

-Q Query Genetic code to use [Integer]

default = 1

-D DB Genetic code (for tblast[nx] only) [Integer]

default = 1

-a Number of processors to use [Integer]

default = 1

-O SeqAlign file [File Out] Optional

-J Believe the query defline [T/F]

default = F

-M Matrix [String]

default = BLOSUM62

-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]

default = 0

-z Effective length of the database (use zero for the real size) [Real]

default = 0

-K Number of best hits from a region to keep. Off by default.

If used a value of 100 is recommended. Very high values of -v or -b is also suggested [Integer]

default = 0

-P 0 for multiple hit, 1 for single hit (does not apply to blastn) [Integer]

default = 0

-Y Effective length of the search space (use zero for the real size) [Real]

default = 0

-S Query strands to search against database (for blast[nx], and tblastx)

3 is both, 1 is top, 2 is bottom [Integer]

default = 3

-T Produce HTML output [T/F]

default = F

-l Restrict search of database to list of GI's [String] Optional

-U Use lower case filtering of FASTA sequence [T/F] Optional

-y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior)

blastn 20, megablast 10, all others 7 [Real]

default = 0.0

-Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior)

blastn/megablast 100, tblastx 0, all others 25 [Integer]

default = 0

-R PSI-TBLASTN checkpoint file [File In] Optional

-n MegaBlast search [T/F]

default = F

-L Location on query sequence [String] Optional

-A Multiple Hits window size, default if zero (blastn/megablast 0, all others 40 [Integer]

default = 0

-w Frame shift penalty (OOF algorithm for blastx) [Integer]

default = 0

-t Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments. (0 invokes default behavior; a negative value disables linking.) [Integer]

default = 0

-B Number of concatenated queries, for blastn and tblastn [Integer] Optional

default = 0

-V Force use of the legacy BLAST engine [T/F] Optional

default = F

-C Use composition-based score adjustments for blastp or tblastn:

As first character:

D or d: default (equivalent to T)

0 or F or f: no composition-based statistics

2 or T or t: Composition-based score adjustments as in Bioinformatics 21:902-911,

1: Composition-based statistics as in NAR 29:2994-3005, 2001

2005, conditioned on sequence properties

3: Composition-based score adjustment as in Bioinformatics 21:902-911,

2005, unconditionally

For programs other than tblastn, must either be absent or be D, F or 0.

As second character, if first character is equivalent to 1, 2, or 3:

U or u: unified p-value combining alignment p-value and compositional p-value in round 1 only

[String]

default = D

-s Compute locally optimal Smith-Waterman alignments (This option is only

available for gapped tblastn.) [T/F]

default = F

TonyWalters

Apr 27, 2016, 1:56:07 PM

to Qiime 1 Forum

Alexis, can you post the results of the following commands:

cd

which blastall

which ChimeraSlayer.pl

pwd

TonyWalters

Apr 27, 2016, 2:03:04 PM

to Qiime 1 Forum

Also, could you run print_qiime_config.py -tf > printed_qiime_config.txt

and post the printed_qiime_config.txt file? I want to double check something.

Alexis Walker

Apr 27, 2016, 2:16:18 PM

to Qiime 1 Forum

MacQIIME Hardy:~ $ cd

MacQIIME Hardy:~ $

MacQIIME Hardy:~ $ which blastall

/macqiime/microbiomeutil_2010-04-29/blast-2.2.22/bin/blastall

MacQIIME Hardy:~ $ which ChimeraSlayer.pl

/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraSlayer.pl

MacQIIME Hardy:~ $ pwd

/Users/alexiswalker

Note: I just added the absolute paths to my .bash_profile whereas previously I added them to /profile and bashrc. The "which" command just started working after I added the paths to my .bash_profile, so I'm going to run chimera checking with your test samples again and see what happens and get back to you.

printed_qiime_config.txt

Alexis Walker

Apr 27, 2016, 2:32:47 PM

to Qiime 1 Forum

Hi Tony,

I re-ran the test chimera checking and got the following resulting text file:

11057_115098_chimera 188898 11057

Seems like it worked?
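For checking larger result files than the one-line test output above, here is a small Python 3 sketch. It assumes, based only on the line shown in this post, that each line holds the flagged sequence ID followed by whitespace-separated parent IDs; `read_chimeras` is a hypothetical helper, not a QIIME function.

```python
def read_chimeras(path):
    """Parse a chimera report into {query id: [parent ids]}.

    Assumes one flagged sequence per line, whitespace-separated, with the
    query ID first and the putative parent IDs after it.
    """
    chimeras = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            chimeras[fields[0]] = fields[1:]
    return chimeras
```

An empty result here cannot distinguish "no chimeras detected" from "the run failed silently", which is why running a known chimera first is a useful control.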

TonyWalters

Apr 27, 2016, 2:38:00 PM

to Qiime 1 Forum

Yes! So now you want to try again on your data set.

Just to confirm, did you generate the PyNAST-aligned data for your dataset using the SILVA aligned file as the template (rather than the default Greengenes)? We want to make sure that the alignments are matched in size.

Alexis Walker

Apr 27, 2016, 3:49:50 PM

to Qiime 1 Forum

Great!

I tried to run it on my dataset but got another blank result file, so I decided to move everything into one directory and run it again. It is running now, so far so good. Yes, I used the SILVA database; here is my command for OTU picking:

pick_open_reference_otus.py -a -O 6 -i /Users/alexiswalker/Desktop/16S/Beaufort/Raw_data/16S_Beau_sed.faa -o /Users/alexiswalker/Desktop/16S/Beaufort/Silva_run -r /Users/alexiswalker/Desktop/18S/Silva119_release/rep_set/97/Silva_119_rep_set97.fna -p /Users/alexiswalker/Desktop/16S/Beaufort/Silva_run/parameters_silva119.txt --min_otu_size 2 -f

It seems that the overall fixes for this issue may have been adding the paths to ChimeraSlayer and BLAST in my .bash_profile and then putting all of the needed files in one directory when performing chimera checking.

I'll keep you posted on the results (I ran the parallel version so if it doesn't work I'll try the non-parallel).

Thanks so much for all your help in troubleshooting this issue, you have been so helpful!

Alexis

Alexis Walker

Apr 28, 2016, 3:52:16 PM

to Qiime 1 Forum

Hi Tony,

It seemed like everything was running its course, but when the script finally finished I got another blank chimera file. So I tried the non-parallel script this morning and received this error:

ApplicationError("Calling ChimeraSlayer failed.")

burrito.util.ApplicationError: Calling ChimeraSlayer failed.

Now whenever I run my dataset I get this error again, while the test set you sent me works fine.

When I run print_qiime_config.py -tf there are no additional errors.

I tried moving the data and directory I am working with to the ChimeraSlayer directory, because that is where I put the test set, but this still didn't work.

Could this issue happen if I am not using the proper alignment files for Silva?

Thanks,

Alexis

TonyWalters

Apr 28, 2016, 4:01:59 PM

to Qiime 1 Forum

Because it works with the test data set, the most likely possibilities are:

1. There is a resource issue with the larger dataset (e.g. memory) that is causing the crash.

2. There is something happening on the system you're running it on that is causing the crash (e.g. if you're doing job submissions on a cluster, make sure you use absolute filepaths).

3. There is something wrong with the alignment, fasta format, or fasta labels that is causing the ChimeraSlayer software to crash.

One possible way that the third could have happened is if you did not use the SILVA template alignment when you processed your reads, leading to an alignment built with the Greengenes template that is a different size (i.e. alignment length) than the core alignment you are using for ChimeraSlayer. Look through your log files to see what parameters you used for running align_seqs.py.
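The alignment-length mismatch described above can be checked directly. This is a rough Python 3 sketch under the assumption that both files are plain aligned FASTA; `alignment_lengths` and `same_alignment_width` are hypothetical helper names, not QIIME or PyNAST functions.

```python
from collections import Counter


def alignment_lengths(path):
    """Count how many sequences of each length an aligned FASTA file holds."""
    lengths = Counter()
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths[current] += 1
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths[current] += 1  # close the final record
    return lengths


def same_alignment_width(query_path, template_path):
    """True when both files contain a single, identical alignment length."""
    query = alignment_lengths(query_path)
    template = alignment_lengths(template_path)
    return len(query) == 1 and query.keys() == template.keys()
```

If `alignment_lengths` reports more than one length for a single file, that file is not a consistent alignment, which would be worth resolving before comparing it against the reference.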

Alexis Walker

Apr 28, 2016, 4:16:04 PM

to Qiime 1 Forum

I am pretty sure I included everything I needed in my parameters file. For my sanity, would you mind taking a look at my log file and parameters file?

Another thing I am wondering is whether the Silva download is corrupted, but I can't seem to find an md5 checksum to verify the download against.

I am thinking of re-running my OTU picking step anyway, since these data were run on the 1.9.0 version.

I want to rerun using the RDP classifier and then try chimera checking on that new run.

What is the best way to include RDP in the parameters file?

Thanks so much for all your help!

log_20160411182722.txt

parameters_silva119.txt

TonyWalters

Apr 28, 2016, 4:40:15 PM

to Qiime 1 Forum

For RDP, you would have to add these lines to the parameters file:

assign_taxonomy:assignment_method rdp

assign_taxonomy:rdp_max_memory X

where X is at least 12000 (the value is in MB).

I don't see an error in the current parameters for align_seqs.py, but you might try using the core_alignment/core_Silva119_alignment.fna file for both align_seqs.py and the -a input when running identify_chimeric_seqs.py to try to get around resource issues.

Alexis Walker

Apr 29, 2016, 2:02:33 AM

to Qiime 1 Forum

Hi Tony,

Thanks for info on RDP.

Unfortunately I am still having problems with ChimeraSlayer. I tried using the Silva core alignment and have the same issues: a blank chimera text file with the parallel script, and this error when I run the non-parallel script:

ApplicationError("Calling ChimeraSlayer failed.")

burrito.util.ApplicationError: Calling ChimeraSlayer failed.

Not really sure what I can try at this point, any thoughts? I was thinking that it may be a memory issue. I currently have 16GB of RAM and I can normally spare ~12GB for chimera checking, so not sure if that is enough.

I decided to try the blast_fragments method and it seems to be working, in that it has been running for quite a while, but we shall see. I couldn't find much about this method; do you know why ChimeraSlayer is often chosen over it?

TonyWalters

Apr 29, 2016, 6:18:56 AM

to Qiime 1 Forum

Hello Alexis,

One main reason for the use of ChimeraSlayer is that it has a direct citation associated with it (Haas et al. 2011). You might try running ChimeraSlayer.pl directly on the reads, rather than through QIIME, to see if we get more informative feedback. Another alternative would be to run usearch61, but it is run differently, before OTU picking: http://qiime.org/tutorials/chimera_checking.html#usearch-6-1

To do that, you can download usearch 6.1.544, rename it to usearch61, chmod 775 usearch61, and put it in the $PATH:

http://www.drive5.com/usearch/download.html

Alternatively, you can use vsearch in the same way as usearch61 (rename, chmod, and put it in the $PATH as usearch61):

https://github.com/torognes/vsearch

vsearch may be preferable, as only the 32-bit usearch software is available for free (non-commercial users only), so you will sometimes run into memory allocation limits with it.
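The rename, chmod, and PATH steps described above could be scripted roughly as follows. This is an illustrative Python 3 sketch: `install_as_usearch61` is a hypothetical helper, and the destination directory is whatever directory on your $PATH you choose; nothing here is specific to QIIME.

```python
import os
import shutil


def install_as_usearch61(binary_path, bin_dir):
    """Copy a usearch or vsearch binary into bin_dir under the name usearch61.

    bin_dir must already exist and should be a directory listed in $PATH.
    Returns the path of the installed copy.
    """
    target = os.path.join(bin_dir, "usearch61")
    shutil.copyfile(binary_path, target)
    os.chmod(target, 0o775)  # same mode as the chmod 775 step
    return target
```

Copying rather than moving keeps the downloaded binary under its original name, so the step can be repeated or undone without downloading again.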

Alexis Walker

May 1, 2016, 1:39:50 PM

to Qiime 1 Forum

Hi Tony,

I tried running ChimeraSlayer directly with the following command:

ChimeraSlayer.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta

Things seemed to be running, but sometime in the night it stopped with the following error message:

Use of uninitialized value in pattern match (m//) at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/chimeraParentSelector.pl line 876.

CMD: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/CPS_to_RENAST.pl --CPS_output rep_set_aligned.fasta.CPS --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta > /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta.CPS_RENAST

Error, cannot open file rep_set_aligned.fasta.CPS at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/CPS_to_RENAST.pl line 51.

Error, cmd (/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/CPS_to_RENAST.pl --CPS_output rep_set_aligned.fasta.CPS --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta > /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta.CPS_RENAST) died with ret(512) at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraSlayer.pl line 243.

It almost seems like the job "timed out". Not sure what this error means, any thoughts?

I am going to try and fiddle with ChimeraSlayer for as long as I can as this is really the preferred method for my data set. Although I did get a chimera file from the blast_fragments script I ran and I may try usearch if I get desperate or bored of trying to get ChimeraSlayer to work :).

Thanks,

Alexis

Alexis Walker

May 1, 2016, 2:30:48 PM

to Qiime 1 Forum

Also, I figured it might be useful to update this feed on some changes I made with respect to my environmental variables. As I kept running:

print_qiime_config.py -tf

the output would change from no errors to errors indicating I was having path problems. Additionally when I ran:

./test_identify_chimeric_seqs.py

I got errors regarding path issues to usearch.

So I went ahead and downloaded usearch. I also removed all of the macqiime related paths from my mac environmental variable files (~/.bash_profile, bashrc, /profile) and just added the paths to blast and usearch to the bash_profile.txt in the macqiime/configs.

This seems to have appeased any issues that would come up when using the above commands.

Alexis Walker

May 1, 2016, 2:36:42 PM

to Qiime 1 Forum

Last but not least, when I run ChimeraSlayer from QIIME using identify_chimeric_seqs.py, the error I get always refers to line 419 of qiime/identify_chimeric_seqs.py; the surrounding code is below:

    def _get_result_paths(self, data):
        """ Set the result paths """

        result = {}

        inp_file_name = str(self.Parameters['--query_NAST'].Value)
        inp_file_name = inp_file_name.rstrip('"')
        inp_file_name = inp_file_name.lstrip('"')

        exec_dir = self.Parameters['--exec_dir']
        if exec_dir.isOn():
            exec_dir = str(exec_dir.Value)
            exec_dir = exec_dir.lstrip('"')
            exec_dir = exec_dir.rstrip('"')

            if inp_file_name[0] == '/':
                # path is already absolute
                pass
            else:
                inp_file_name = exec_dir + "/" + inp_file_name

        if not exists(inp_file_name + ".CPS.CPC"):
            raise ApplicationError("Calling ChimeraSlayer failed.")

It seems like both the identify_chimeric_seqs.py script in MacQIIME and the direct use of ChimeraSlayer.pl are yielding similar errors, indicating difficulty locating input and result files.
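Read literally, the excerpt raises the error whenever a file named after the input with ".CPS.CPC" appended is missing once ChimeraSlayer returns. A standalone Python 3 sketch of that check follows; `expected_cpc_path` and `chimeraslayer_produced_output` are hypothetical names, and the logic is a simplified restatement of the excerpt rather than QIIME's actual code path.

```python
import os


def expected_cpc_path(query_nast, exec_dir=None):
    """Return the path whose absence triggers "Calling ChimeraSlayer failed."

    Mirrors the excerpt: strip surrounding double quotes, prefix a relative
    input path with exec_dir when one is given, then append ".CPS.CPC".
    """
    name = str(query_nast).strip('"')
    if exec_dir is not None and not name.startswith("/"):
        name = str(exec_dir).strip('"') + "/" + name
    return name + ".CPS.CPC"


def chimeraslayer_produced_output(query_nast, exec_dir=None):
    """True if the file QIIME looks for is present on disk."""
    return os.path.exists(expected_cpc_path(query_nast, exec_dir))
```

So the QIIME message by itself says only that ChimeraSlayer did not leave its final output next to the input; the underlying cause has to come from ChimeraSlayer's own messages.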

TonyWalters

May 1, 2016, 2:38:45 PM

to Qiime 1 Forum

Hmm, there aren't many results when searching for that "died with ret(512)" error message. How much memory do you have on this system?

Something else I noticed: it looks like that command is using the default reference database, which would be an alignment that doesn't match your query SILVA alignment in size.

I think this command should let you run with the SILVA database as the reference (you may have to fix the paths; I think these are the correct ones from the previous emails):
ChimeraSlayer.pl --query_NAST /Users/alexiswalker/Desktop/16S/Beaufort/Silva_run/pynast_aligned_seqs/rep_set_aligned.fasta --db_FASTA /Users/alexiswalker/Desktop/18S/Silva119_release/core_alignment/core_Silva119_alignment.fna

I saw the other response regarding usearch as I was typing this. You can also use vsearch if you run into a memory issue; you'll see an error referring to "malloc" if so. There isn't great agreement between the chimera detection tools, and there is still no real gold standard, so I wouldn't consider usearch/vsearch to be less valid than ChimeraSlayer.

Alexis Walker

May 1, 2016, 3:07:53 PM

to Qiime 1 Forum

Hi Tony,

I have 16GB of memory.

Thanks for pointing out that I forgot the Silva alignment file! I have been using the 97 set instead of core because I want to use it in the downstream analysis, and I seemed to be getting the same errors with the core alignment.

I just ran it and received this error:

ChimeraSlayer.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna

CMD: formatdb -i /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna -p F 2>/dev/null

CMD: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/run_chimeraParentSelector.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna -n 15 -P 90 -R 1.007 > rep_set_aligned.fasta.CPS

Error, no fasta entry retrieved by accession: JX193306.1.1540

Error, cmd (/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/run_chimeraParentSelector.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna -n 15 -P 90 -R 1.007 > rep_set_aligned.fasta.CPS) died with ret(65280) at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraSlayer.pl line 243.

It looked like if I don't provide a --db_NAST it will default and that may be the issue. Do you know how I can construct a NAST db with the silva db files?
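One way to test the suspicion above, that sequences found via --db_FASTA are then looked up in a --db_NAST file that does not contain them, is to compare the record IDs of the two files. The Python 3 sketch below assumes both are FASTA files whose ID is the text after ">" up to the first whitespace; `fasta_ids` and `ids_missing_from_nast` are hypothetical helper names.

```python
def fasta_ids(path):
    """Return the set of record IDs (text after '>' up to the first whitespace)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def ids_missing_from_nast(db_fasta_path, db_nast_path):
    """IDs present in the BLAST database FASTA but absent from the NAST file."""
    return sorted(fasta_ids(db_fasta_path) - fasta_ids(db_nast_path))
```

If an accession such as the one in the error message appears in this list, the two reference files are not describing the same set of sequences.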

Alexis Walker

May 1, 2016, 3:29:16 PM

to Qiime 1 Forum

Hi Tony,

I am thinking maybe the Silva database formats are clashing with ChimeraSlayer. I was looking at my formatdb.log after the most recent run and there was a lot of this:

Version 2.2.22 [Sep-27-2009]

Started database file "/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna"

WARNING: [000.000] Sequence number 1 (lcl|1_/macqiime/microbiomeutil_2010-04), 30307 illegal characters were removed:

99 -s

WARNING: [000.000] Sequence number 2 (lcl|2_/macqiime/microbiomeutil_2010-04), 30461 illegal characters were removed:

WARNING: [000.000] Sequence number 3 (lcl|3_/macqiime/microbiomeutil_2010-04), 30321 illegal characters were removed:

113 -s

Alexis Walker

May 1, 2016, 3:32:03 PM

to Qiime 1 Forum

On a side note, I am wondering why QIIME doesn't have a silva 123 fasta as mothur does?

TonyWalters

May 1, 2016, 3:32:15 PM

to Qiime 1 Forum

Yep, was just looking at that too. Give me a few minutes to test something locally.

TonyWalters

May 1, 2016, 3:34:56 PM

to Qiime 1 Forum

There is one here, not on the official SILVA site (they are apparently going to take over making these in the future; formatting for the RDP classifier is the biggest hurdle, but perhaps ChimeraSlayer is also a hurdle...): https://www.dropbox.com/s/ndkfgyy2n4yd0b4/SILVA123_QIIME_release.zip?dl=0

I don't think this one will get past the ChimeraSlayer issue either though.

TonyWalters

May 1, 2016, 4:10:15 PM

to Qiime 1 Forum

It's not the BLAST database formatting that's causing the error. I did a test with a small subset of SILVA and a small subset of the Greengenes alignment: both give that same warning in the log, but Greengenes completes the process while the SILVA database errors out in steps after the BLAST database formatting. Digging further, there was this error:

Error, nast coord 23777 >= nast alignment length: 7682 at /Users/tony/code/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/../PerlLib/NAST_to_Eco_coords.pm line 28.

It seems that the software is hardcoded to expect a particular 16S alignment length (which is a problem for SILVA, since its alignment is quite a bit larger).

It's a hack, but I got around the error by modifying this line (line 28) in the /microbiomeutil_2010-04-29/ChimeraSlayer/PerlLib/NAST_to_Eco_coords.pm file (make a backup copy of it first):

confess "Error, nast coord $coord >= nast alignment length: " . length($ECO_NAST_SEQ);

by putting a # character in front of the line.
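
Since the error is about alignment length, it can help to check how wide a reference alignment actually is before running ChimeraSlayer on it. A minimal sketch in plain Python; the helper name is hypothetical and 7682 is simply the length reported in the error message above:

```python
from collections import Counter

EXPECTED_WIDTH = 7682  # alignment length reported in the error message


def alignment_widths(lines):
    """Return a Counter mapping aligned sequence length -> number of records."""
    widths = Counter()
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                widths[length] += 1
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        widths[length] += 1
    return widths


if __name__ == "__main__":
    # In-memory demo; for a real file use: with open(path) as fh: ...
    demo = [">ref1", "AC-GT", "--A", ">ref2", "ACGT-..A"]
    for width, n in sorted(alignment_widths(demo).items()):
        flag = "" if width == EXPECTED_WIDTH else " (not 7682)"
        print("%d columns: %d sequence(s)%s" % (width, n, flag))
```

Any width other than 7682 would be expected to trip the check on line 28 unless that line is commented out.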

Alexis Walker

May 1, 2016, 4:26:16 PM

to Qiime 1 Forum

Hi Tony,

I went ahead and made a back up and changed that line to:

#confess "Error, nast coord $coord >= nast alignment length: " . length($ECO_NAST_SEQ);

Then I ran:

ChimeraSlayer.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna

and got the same error:

TonyWalters

May 1, 2016, 4:28:18 PM

to Qiime 1 Forum

Ooops, forgot to add this to the command (it's odd that it works this way...):
ChimeraSlayer.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna

Alexis Walker

May 1, 2016, 5:00:21 PM

to Qiime 1 Forum

Great, thanks. I ran it and it is currently running ... fingers crossed.

In the meantime, any thoughts on how to get chimera slayer to work through qiime? Pretty soon I'm going to have a rather massive dataset and will need to work in an HPC environment. I will probably be running through an initial protocol with a sample dataset and I am definitely worried about whether I will be able to get through the chimera checking step on the HPC environment when it has been so tough on macqiime. The strange thing is that I have successfully run this same dataset through chimera checking on previous macqiime versions, but I think that was with the default gg ref db. I have also been reading/hearing that gg is no longer the best database with respect to curation and being updated as the silva db is, so maybe the newest QIIME version could change the default? Not that this would necessarily fix the chimera checking issue, but it does seem like a potential upgrade.

Anywho, thank you for all your help! I hope that this most recent script works. I'll let you know the outcome.

-Alexis

Alexis Walker

May 1, 2016, 5:13:32 PM

to Qiime 1 Forum

Woo, looks like it worked! I'm not sure how the filtering step proceeds afterward, I'm thinking that -e is the output ${sequences}.NAST.CPS.CPC ?

TonyWalters

May 1, 2016, 5:17:55 PM

to Qiime 1 Forum

Can you try running it through QIIME now? I think it will work, since the underlying code uses the same db_NAST/db_FASTA files as reference too (https://github.com/biocore/qiime/blob/master/qiime/identify_chimeric_seqs.py#L139). The CPS file is parsed to get the putative chimeras (https://github.com/biocore/qiime/blob/master/qiime/identify_chimeric_seqs.py#L526).

I'm expecting new Greengenes releases that could improve the core alignment and reference database, but that isn't under control of the QIIME team.

Alexis Walker

May 1, 2016, 5:41:40 PM

to qiime...@googlegroups.com

I am running it through qiime now. It is looking good and has definitely made it past its usual error point. I'll keep you updated.

In my previous email I meant more so to ask whether qiime would think about using silva as its default over greengenes. Sorry I was pretty unclear :).

On that note I noticed that there is only a core alignment and not a 97% alignment for the Silva123. Can I just change this in the otu picking script or do I need a specific 97 alignment fasta if that is what I am going for?

Alexis Walker

May 2, 2016, 3:53:43 PM

to Qiime 1 Forum

Hi Tony,

The script ran for quite a while and even yielded most of the output with associated content; however, I did get an error and no resulting chimeras.txt.

The error was:

Traceback (most recent call last):

File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 354, in <module>

main()

File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 328, in main

keep_intermediates=keep_intermediates)

File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 159, in chimeraSlayer_identify_chimeras

keep_intermediates=keep_intermediates):

File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 143, in __call__

keep_intermediates=keep_intermediates)

File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 637, in get_chimeras_from_Nast_aligned

app_results = app()

File "/macqiime/anaconda/lib/python2.7/site-packages/burrito/util.py", line 295, in __call__

result_paths = self._get_result_paths(data)

File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 419, in _get_result_paths

raise ApplicationError("Calling ChimeraSlayer failed.")

burrito.util.ApplicationError: Calling ChimeraSlayer failed.

TonyWalters

May 2, 2016, 4:28:05 PM

to Qiime 1 Forum

Hmm, so that line of code is here:

https://github.com/biocore/qiime/blob/master/qiime/identify_chimeric_seqs.py#L418

Did it create a file that looks like the input sequence file name + .CPS.CPC when running it through QIIME?

I'm about out of ideas here. I would say we could open an issue for QIIME on this, but there is at least something in the underlying code of ChimeraSlayer that has to be resolved with its hardcoded alignment length test that isn't compatible with the SILVA alignment.

Actually, can you create a small test set of sequences for your input sequences?
E.g.:
head /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta > /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/TestSeqs.fna

Then run

identify_chimeric_seqs.py -i /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/TestSeqs.fna -o testChimeraSlayer97repset.txt -a /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna

and

identify_chimeric_seqs.py -i /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/TestSeqs.fna -o testChimeraSlayerCoreSet.txt -a INSERT_PATH_TO_COREALIGNED_SEQS_HERE

If those crash, can you attach the TestSeqs.fna file?

Alexis Walker

May 2, 2016, 4:50:08 PM

to Qiime 1 Forum

Hi Tony,

I will go ahead and run those tests, but before I do I wanted to address the error message. I do have an input sequence file name + .CPS.CPC file; it is not blank and looks to have been completely filled out. However, this file, as well as the input sequence file name + .CPS.CPC.w/taxon files, are in the directory ChimeraSlayer, while all the other files are located or written in the directory ChimeraSlayer/silva-macq. So I'm wondering whether it would work if everything were in the same directory as ChimeraSlayer.pl (ChimeraSlayer/). I will try this when I run the tests.
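
In case it helps anyone tracking down where the intermediate files ended up, here is a small sketch in plain Python that walks a directory tree and lists every file ending in .CPS.CPC. The function name is made up, and the demo uses a throwaway directory rather than a real ChimeraSlayer install:

```python
import os
import tempfile


def find_cps_cpc(root, suffix=".CPS.CPC"):
    """Return sorted paths under root whose file name ends with suffix."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Demo layout only; on a real run pass the ChimeraSlayer directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "silva-macq"))
        open(os.path.join(tmp, "rep_set_aligned.fasta.CPS.CPC"), "w").close()
        open(os.path.join(tmp, "silva-macq", "formatdb.log"), "w").close()
        for path in find_cps_cpc(tmp):
            print(path, os.path.getsize(path), "bytes")
```

Comparing where the hits are with where QIIME looks for them should show whether the file location is the problem.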

Alexis Walker

May 2, 2016, 8:01:29 PM

to Qiime 1 Forum

Hi Tony,

I went ahead and ran both tests with a smaller dataset, with both the Silva 97 and core alignments. I also ran each of them in the directory containing ChimeraSlayer.pl and in a directory within ChimeraSlayer/. It seems like all of them worked; however, it is difficult to say for sure, since all of the resulting chimera text files were blank, which could mean either that there are no chimeras in this small subset or that it didn't work. I am sending along the test subset I created either way.

Thanks!

test_seqs.fasta

TonyWalters

May 2, 2016, 8:07:49 PM

to Qiime 1 Forum

Maybe we could try a larger sample file?

So:

head -n 5000 /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta > /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/TestSeqs2500.fna

and see if those 2500 sequences complete.

How many total sequences are in there by the way? You can tell with:

grep -c ">" /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta
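
Note that head -n 5000 only gives 2500 sequences if every record is exactly two lines (one header, one sequence line). If the FASTA file wraps sequences over several lines, something like the following plain-Python sketch is safer for both counting and subsetting; the function names are just for illustration:

```python
from itertools import islice


def fasta_records(lines):
    """Yield (header_line, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def count_records(lines):
    """Count sequences by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def first_n_records(lines, n):
    """Return the first n records as a list of (header_line, sequence)."""
    return list(islice(fasta_records(lines), n))


if __name__ == "__main__":
    # In-memory demo; for a real file use: with open(path) as fh: ...
    demo = [">a\n", "AC-\n", "GT\n", ">b\n", "ACGT\n", ">c\n", "A--T\n"]
    print(count_records(demo))
    for header, seq in first_n_records(demo, 2):
        print(header)
        print(seq)
```

Writing each header and sequence back out on its own line also gives a two-lines-per-record file, after which head behaves as expected.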

Alexis Walker

May 5, 2016, 3:10:41 PM

to Qiime 1 Forum

Hi Tony,

I am currently running the 2500 test sequences through and will update you when that job has finished.

Also, I'm not sure why, but grep seems to take a very long time on my computer and often times out. So, I normally use the count_seqs.py script instead. I have 35,369 sequences in my rep_set_aligned.fasta file.

I know it's a separate question but I wanted to ask if you were planning to make a 97% alignment for the Silva123? I noticed that there is only a core alignment.

Thanks,

Alexis

Alexis Walker

May 5, 2016, 8:05:33 PM

to Qiime 1 Forum

Hi Tony,

I ran the test with 2500 and 5000 test sequences and still got a blank chimera text file. Hmmmmm ... I am thinking I might try to work with uchime instead. I emailed tech support for ChimeraSlayer recently and received a reply saying that he is no longer supporting ChimeraSlayer and is exclusively recommending uchime.

Additionally I also tried this test using our supercomputing program to see if it would work better with more memory to work with and got an error:

line 410, in _get_result_paths raise ApplicationError,"Calling ChimeraSlayer failed."

#cogent.app.util.ApplicationError: Calling ChimeraSlayer failed.

Not sure if I should try my entire dataset again on my personal laptop and see if it yields a chimera text file with seqs in it. I know there are chimeric seqs in my data, as this worked a while back with the same dataset using Greengenes and an older version of qiime.

Alexis Walker

May 5, 2016, 11:49:33 PM

to Qiime 1 Forum

UCHIME worked in minutes and identified chimeras in the test sub-sample of my data (head -n 5000)! I had to use it outside of qiime, but it worked very well. I used the silva123 database and core alignment.

Quick unrelated question: Our supercomputing center still needs to update its QIIME version due to downloading issues, so I am currently testing scripts on version 1.8.0. Is there any major difference in my results when running the same workflow through different versions?

Thanks,

Alexis

TonyWalters

May 6, 2016, 6:57:17 AM

to Qiime 1 Forum

Most of the results will be the same, the major changes are listed here:

https://github.com/biocore/qiime/releases/tag/1.9.0

https://github.com/biocore/qiime/blob/1.9.1/ChangeLog.md

The default settings for OTU picking with uclust/usearch are also different between 1.8.0 and 1.9.X, so you'd want to match certain settings to make sure you get the same results (the 1.8.0 settings for uclust/usearch are quite a bit slower, but a bit more accurate):
http://qiime.org/1.8.0/scripts/pick_otus.html
--max_accepts 20

--max_rejects 500

--stepwords 20

--word_length 12
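
To keep those four settings together when scripting runs on different QIIME versions, one option is to build the pick_otus.py command from a single list. This is only a sketch: seqs.fna and picked_otus/ are placeholder names, and the helper is hypothetical rather than part of QIIME.

```python
import shlex

# 1.8.0-style uclust/usearch settings listed above.
LEGACY_OPTIONS = [
    ("--max_accepts", 20),
    ("--max_rejects", 500),
    ("--stepwords", 20),
    ("--word_length", 12),
]


def pick_otus_command(input_fp, output_dir, options=LEGACY_OPTIONS):
    """Return the pick_otus.py argument list with the given options appended."""
    cmd = ["pick_otus.py", "-i", input_fp, "-o", output_dir]
    for flag, value in options:
        cmd.extend([flag, str(value)])
    return cmd


if __name__ == "__main__":
    # Placeholder paths; this only prints the command, it does not run it.
    cmd = pick_otus_command("seqs.fna", "picked_otus/")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line can be pasted into a shell or a job script, so the same settings are used on both installations.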

Alexis Walker

May 6, 2016, 4:19:40 PM

to Qiime 1 Forum

Thanks so much for all of your help Tony!

Wenshu Yap

Jun 27, 2016, 10:53:33 PM

to Qiime 1 Forum

Hi Alexis and Tony,

I am having the same problem as Alexis. I followed all the steps mentioned in this thread and, like Alexis, I still get an empty file after every attempt. Before I follow in Alexis's footsteps and try usearch, I am just wondering whether there is any updated solution for this besides having to work with usearch?

Thanks

Wenshu

TonyWalters

Jun 27, 2016, 11:08:07 PM

to Qiime 1 Forum

Hello Wenshu,

There haven't been updates to the software or the application controllers in QIIME. It's a big thread, but there should be test files and instructions early in the thread that can act as a "positive control" of sorts to make sure the software completes, at least on a small dataset. If that is working, but the actual dataset only produces an empty file, that could be due to ChimeraSlayer not actually detecting any chimeras.

Alexis Walker

Jun 27, 2016, 11:28:53 PM

to Qiime 1 Forum

Hello Wenshu and Tony,

It turns out, if you have qiime or macqiime, you already have a version of usearch that you can use to check chimeras without an additional download. It works incredibly quickly and the creator of ChimeraSlayer says that this method is by far better. Here is a sample script that worked for me while in qiime:

usearch -uchime /path/to/rep_set_aligned.fasta -db align.fasta -nonchimeras /path/to/nochimeras.fasta -chimeras /path/to/chimeras.fasta

-nonchimeras and -chimeras are the output, so you can use the -chimeras output as -e file for the filter otus step following chimera checking in qiime.

Hope this works for you, good luck!
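
If the filtering step wants a plain list of sequence IDs rather than a FASTA file (I have not checked which formats it accepts, so treat that as an assumption), the IDs can be pulled out of the -chimeras output with a few lines of plain Python. The function name is made up for illustration:

```python
def chimera_ids(fasta_lines):
    """Return sequence IDs (first token of each header line), in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


if __name__ == "__main__":
    # In-memory demo; for a real file use: with open("chimeras.fasta") as fh: ...
    demo = [">denovo12 sample1_345\n", "ACGT\n", ">denovo98\n", "GGCC\n"]
    print("\n".join(chimera_ids(demo)))
```

Writing that output to a text file gives one ID per line.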

Wenshu Yap

Jun 28, 2016, 7:47:00 PM

to Qiime 1 Forum

Hi Tony and Alexis,

Thank you very much for the prompt reply. I guess the conclusion here is to use usearch61 instead of ChimeraSlayer to check for chimeras, right?

I tried usearch61 just now and it is indeed faster than ChimeraSlayer. I ran it directly in QIIME on our cluster. It works fine with a small file (although the chimera output file is still empty; I assume there may be no chimeras in such a small file). So I tried my actual file, and it returned "Fatal error, file size too big for 32bit version". I guess this might be because I am running on the cluster and that is the version packaged there? I am communicating with my IT department to sort this out. Hopefully things will work well.

Thanks again.

Cheers.

Wenshu

TonyWalters

Jun 28, 2016, 8:33:52 PM

to Qiime 1 Forum

Hello Wenshu,

Alternatively, you could get vsearch, which is 64-bit and doesn't have the memory restrictions of the freely available 32-bit usearch.

https://github.com/torognes/vsearch

To run it within QIIME, it can replace usearch61, so you'd have to rename the current usearch61 executable, rename the vsearch downloaded file to "usearch61", and put it in the $PATH environment (the same folder that usearch61 is in would work, you can find this with the command: which usearch61).

Colin Brislawn

Jun 28, 2016, 11:25:00 PM

to Qiime 1 Forum

Hello Wenshu,

It's great to hear that you have usearch61 installed! With that on your cluster, you can run identify_chimeric_seqs.py -m usearch61 as described here:

http://qiime.org/scripts/identify_chimeric_seqs.html

You can also check for chimeras directly with usearch61 --uchime_denovo, but this has some other requirements. For example, the input is NOT the normal seqs.fna file you would use with identify_chimeric_seqs.py. Instead, this file is supposed to be dereplicated, with size annotations inside the headers.
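
To illustrate what "dereplicated with size annotations" means, here is a minimal sketch in plain Python. It collapses identical sequences and appends ;size=N; to the header of each unique sequence, sorted by decreasing abundance. The function name is hypothetical, and the exact header layout your usearch version expects should be checked against its documentation.

```python
from collections import OrderedDict


def dereplicate(records):
    """Collapse identical sequences and label each with its abundance.

    records: iterable of (seq_id, sequence) pairs.
    Returns a list of (header, sequence) sorted by decreasing abundance,
    where header looks like 'seq_id;size=N;'.
    """
    counts = OrderedDict()  # sequence -> [first id seen, count]
    for seq_id, seq in records:
        key = seq.upper()
        if key in counts:
            counts[key][1] += 1
        else:
            counts[key] = [seq_id, 1]
    # sorted() is stable, so ties keep their first-seen order.
    ordered = sorted(counts.items(), key=lambda item: -item[1][1])
    return [("%s;size=%d;" % (first_id, n), seq) for seq, (first_id, n) in ordered]


if __name__ == "__main__":
    demo = [("s1", "ACGT"), ("s2", "GGCC"), ("s3", "acgt"), ("s4", "ACGT")]
    for header, seq in dereplicate(demo):
        print(">" + header)
        print(seq)
```

In practice usearch and vsearch have their own dereplication commands, which would be the better choice for a large dataset; this is only meant to show the shape of the input.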