FAIL: test_blast_supported_version (__main__.QIIMEDependencyFull)
blast is in path and version is supported
----------------------------------------------------------------------
Traceback (most recent call last):
File "/macqiime/anaconda/bin/print_qiime_config.py", line 456, in test_blast_supported_version
"which components of QIIME you plan to use.")
AssertionError: blast not found. This may or may not be a problem depending on which components of QIIME you plan to use.
Any suggestions?
Thanks,
Alexis
# System-wide .bashrc file for interactive bash(1) shells.
if [ -z "$PS1" ]; then
return
fi
PS1='\h:\W \u\$ '
# Make bash check its window size after a process completes
shopt -s checkwinsize
# Tell the terminal about the working directory at each prompt.
if [ "$TERM_PROGRAM" == "Apple_Terminal" ] && [ -z "$INSIDE_EMACS" ]; then
update_terminal_cwd() {
# Identify the directory using a "file:" scheme URL,
# including the host name to disambiguate local vs.
# remote connections. Percent-escape spaces.
local SEARCH=' '
local REPLACE='%20'
local PWD_URL="file://$HOSTNAME${PWD//$SEARCH/$REPLACE}"
printf '\e]7;%s\a' "$PWD_URL"
}
PROMPT_COMMAND="update_terminal_cwd; $PROMPT_COMMAND"
fi
#path to bin in blast
export PATH=/opt/blast-2.2.22/bin/:$PATH
This is what my ~/.ncbirc looks like:
[NCBI]
Data=/opt/blast-2.2.22/data/
When I run ./blastall and ./formatdb everything looks okay.
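One thing worth separating out: running ./blastall from inside the bin directory only shows that the binary works, not that it is on the PATH of the shell that launches QIIME. Below is a minimal sketch of a PATH lookup in Python. The name find_on_path is made up for illustration, and the check is a simplification, not what print_qiime_config.py itself does.

```python
import os


def find_on_path(program, path=None):
    """Return the first executable called `program` found in `path`, or None.

    `path` is a PATH-style string; it defaults to the current $PATH.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # Must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for name in ("blastall", "formatdb"):
        print("%s -> %s" % (name, find_on_path(name)))
```

If this prints None for blastall in the same terminal where QIIME is run, the export line in the rc file is probably not being read by that shell.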
FAIL: test_ampliconnoise_install (__main__.QIIMEDependencyFull)
AmpliconNoise install looks sane.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/macqiime/anaconda/bin/print_qiime_config.py", line 382, in test_ampliconnoise_install
"$PYRO_LOOKUP_FILE variable is not set. See %s for help." % url)
AssertionError: $PYRO_LOOKUP_FILE variable is not set. See http://qiime.org/install/install.html#ampliconnoise-install-notes for help.
======================================================================
FAIL: test_usearch_supported_version (__main__.QIIMEDependencyFull)
usearch is in path and version is supported
----------------------------------------------------------------------
Traceback (most recent call last):
File "/macqiime/anaconda/bin/print_qiime_config.py", line 650, in test_usearch_supported_version
"which components of QIIME you plan to use.")
AssertionError: usearch not found. This may or may not be a problem depending on which components of QIIME you plan to use.
Since I don't need ampliconnoise or usearch, I assume that is good news.
However, I am still getting a blank chimera output txt file:
Traceback (most recent call last):
File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 354, in <module>
main()
File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 328, in main
keep_intermediates=keep_intermediates)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 159, in chimeraSlayer_identify_chimeras
keep_intermediates=keep_intermediates):
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 143, in __call__
keep_intermediates=keep_intermediates)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 637, in get_chimeras_from_Nast_aligned
app_results = app()
File "/macqiime/anaconda/lib/python2.7/site-packages/burrito/util.py", line 295, in __call__
result_paths = self._get_result_paths(data)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 419, in _get_result_paths
raise ApplicationError("Calling ChimeraSlayer failed.")
burrito.util.ApplicationError: Calling ChimeraSlayer failed
./ChimeraSlayer.pl
##########################################################################################
#
# Required:
#
# --query_NAST multi-fasta file containing query sequences in alignment format
#
# Common opts:
#
# --db_NAST db in NAST format (default: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta)
# --db_FASTA db in fasta format (megablast formatted) (default: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.fasta)
#
#
# -n number of top matching database sequences to compare to (default 15)
# -R min divergence ratio default: 1.007
# -P min percent identity among matching sequences (default: 90)
#
# ## parameters to tune ChimeraParentSelector:
#
# Scoring parameters:
# -M match score (default: +5)
# -N mismatch penalty (default: -4)
# -Q min query coverage by matching database sequence (default: 70)
# -T maximum traverses of the multiple alignment (default: 1)
#
# ## parameters to tune ChimeraPhyloChecker:
#
#
# --windowSize default 50
# --windowStep default 5
# --minBS minimum bootstrap support for calling chimera (default: 90)
# -S percent of SNPs to sample on each side of breakpoint for computing bootstrap support (default: 10)
# --num_parents_test number of potential parents to test for chimeras (default: 3)
# --MAX_CHIMERA_PARENT_PER_ID Chimera/Parent alignments with perID above this are considered non-chimeras (default 100; turned off)
#
# ## misc opts
#
# --printFinalAlignments shows alignment between query sequence and pair of candidate chimera parents
# --printCSalignments print ChimeraSlayer alignments in ChimeraSlayer output
# --exec_dir chdir to here before running
#
#########################################################################################
./formatdb
[formatdb 2.2.22] ERROR: No database name was specified
./blastall
blastall 2.2.22 arguments:
-p Program Name [String]
-d Database [String]
default = nr
-i Query File [File In]
default = stdin
-e Expectation value (E) [Real]
default = 10.0
-m alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 tabular with comment lines
10 ASN, text
11 ASN, binary [Integer]
default = 0
range from 0 to 11
-o BLAST report Output File [File Out] Optional
default = stdout
-F Filter query sequence (DUST with blastn, SEG with others) [String]
default = T
-G Cost to open a gap (-1 invokes default behavior) [Integer]
default = -1
-E Cost to extend a gap (-1 invokes default behavior) [Integer]
default = -1
-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior)
blastn 30, megablast 20, tblastx 0, all others 15 [Integer]
default = 0
-I Show GI's in deflines [T/F]
default = F
-q Penalty for a nucleotide mismatch (blastn only) [Integer]
default = -3
-r Reward for a nucleotide match (blastn only) [Integer]
default = 1
-v Number of database sequences to show one-line descriptions for (V) [Integer]
default = 500
-b Number of database sequence to show alignments for (B) [Integer]
default = 250
-f Threshold for extending hits, default if zero
blastp 11, blastn 0, blastx 12, tblastn 13
tblastx 13, megablast 0 [Real]
default = 0
-g Perform gapped alignment (not available with tblastx) [T/F]
default = T
-Q Query Genetic code to use [Integer]
default = 1
-D DB Genetic code (for tblast[nx] only) [Integer]
default = 1
-a Number of processors to use [Integer]
default = 1
-O SeqAlign file [File Out] Optional
-J Believe the query defline [T/F]
default = F
-M Matrix [String]
default = BLOSUM62
-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]
default = 0
-z Effective length of the database (use zero for the real size) [Real]
default = 0
-K Number of best hits from a region to keep. Off by default.
If used a value of 100 is recommended. Very high values of -v or -b is also suggested [Integer]
default = 0
-P 0 for multiple hit, 1 for single hit (does not apply to blastn) [Integer]
default = 0
-Y Effective length of the search space (use zero for the real size) [Real]
default = 0
-S Query strands to search against database (for blast[nx], and tblastx)
3 is both, 1 is top, 2 is bottom [Integer]
default = 3
-T Produce HTML output [T/F]
default = F
-l Restrict search of database to list of GI's [String] Optional
-U Use lower case filtering of FASTA sequence [T/F] Optional
-y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior)
blastn 20, megablast 10, all others 7 [Real]
default = 0.0
-Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior)
blastn/megablast 100, tblastx 0, all others 25 [Integer]
default = 0
-R PSI-TBLASTN checkpoint file [File In] Optional
-n MegaBlast search [T/F]
default = F
-L Location on query sequence [String] Optional
-A Multiple Hits window size, default if zero (blastn/megablast 0, all others 40 [Integer]
default = 0
-w Frame shift penalty (OOF algorithm for blastx) [Integer]
default = 0
-t Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments. (0 invokes default behavior; a negative value disables linking.) [Integer]
default = 0
-B Number of concatenated queries, for blastn and tblastn [Integer] Optional
default = 0
-V Force use of the legacy BLAST engine [T/F] Optional
default = F
-C Use composition-based score adjustments for blastp or tblastn:
As first character:
D or d: default (equivalent to T)
0 or F or f: no composition-based statistics
1: Composition-based statistics as in NAR 29:2994-3005, 2001
2 or T or t: Composition-based score adjustments as in Bioinformatics 21:902-911,
2005, conditioned on sequence properties
3: Composition-based score adjustment as in Bioinformatics 21:902-911,
2005, unconditionally
For programs other than tblastn, must either be absent or be D, F or 0.
As second character, if first character is equivalent to 1, 2, or 3:
U or u: unified p-value combining alignment p-value and compositional p-value in round 1 only
[String]
default = D
-s Compute locally optimal Smith-Waterman alignments (This option is only
available for gapped tblastn.) [T/F]
default = F
MacQIIME Hardy:~ $ cd
MacQIIME Hardy:~ $
MacQIIME Hardy:~ $ which blastall
/macqiime/microbiomeutil_2010-04-29/blast-2.2.22/bin/blastall
MacQIIME Hardy:~ $ which ChimeraSlayer.pl
/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraSlayer.pl
MacQIIME Hardy:~ $ pwd
/Users/alexiswalker
Note: I just added the absolute paths to my .bash_profile, whereas previously I had added them to /profile and bashrc. The "which" command only started working after I added the paths to my .bash_profile, so I'm going to run chimera checking with your test samples again and get back to you.
11057_115098_chimera 188898 11057
Seems like it worked?
ApplicationError("Calling ChimeraSlayer failed.")
burrito.util.ApplicationError: Calling ChimeraSlayer failed.
Use of uninitialized value in pattern match (m//) at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/chimeraParentSelector.pl line 876.
CMD: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/CPS_to_RENAST.pl --CPS_output rep_set_aligned.fasta.CPS --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta > /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta.CPS_RENAST
Error, cannot open file rep_set_aligned.fasta.CPS at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/CPS_to_RENAST.pl line 51.
Error, cmd (/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/CPS_to_RENAST.pl --CPS_output rep_set_aligned.fasta.CPS --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta > /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva2/rep_set_aligned.fasta.CPS_RENAST) died with ret(512) at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraSlayer.pl line 243.
It almost seems like the job "timed out". I'm not sure what this error means. Any thoughts?
I am going to keep fiddling with ChimeraSlayer for as long as I can, as this is really the preferred method for my data set. That said, I did get a chimera file from the blast_fragments script I ran, and I may try usearch if I get desperate or bored of trying to get ChimeraSlayer to work :).
Thanks,
Alexis
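On the "timed out" question: assuming the ret(512) in the message is the raw status returned by Perl's system() call (an assumption, since ChimeraSlayer.pl line 243 is not shown here), the exit code sits in the high byte and a terminating signal would sit in the low bits. A small sketch to decode it, with decode_wait_status being a name invented for this example:

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    signal_number = status & 0x7F      # low 7 bits: signal that ended it, if any
    return exit_code, signal_number


if __name__ == "__main__":
    code, sig = decode_wait_status(512)
    print("ret(512): exit code %d, signal %d" % (code, sig))
```

If the assumption holds, ret(512) is exit code 2 with no signal, which points to the helper script exiting on its own error (the "cannot open file" message) rather than being killed.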
def _get_result_paths(self, data):
    """ Set the result paths """
    result = {}
    inp_file_name = str(self.Parameters['--query_NAST'].Value)
    inp_file_name = inp_file_name.rstrip('"')
    inp_file_name = inp_file_name.lstrip('"')
    exec_dir = self.Parameters['--exec_dir']
    if exec_dir.isOn():
        exec_dir = str(exec_dir.Value)
        exec_dir = exec_dir.lstrip('"')
        exec_dir = exec_dir.rstrip('"')
        if inp_file_name[0] == '/':
            # path is already absolute
            pass
        else:
            inp_file_name = exec_dir + "/" + inp_file_name
    if not exists(inp_file_name + ".CPS.CPC"):
        raise ApplicationError("Calling ChimeraSlayer failed.")
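The check quoted above only tests whether a file ending in .CPS.CPC exists next to the query, so "Calling ChimeraSlayer failed." really means "the expected output file was not found". A standalone sketch that computes the same path so it can be checked by hand; expected_cps_cpc_path is a hypothetical helper, not part of QIIME, and the file path in the usage part is a placeholder:

```python
import os


def expected_cps_cpc_path(query_nast, exec_dir=None):
    """Return the path whose existence the quoted check tests."""
    name = query_nast.strip('"')
    if exec_dir is not None:
        exec_dir = exec_dir.strip('"')
        # Relative query paths are resolved against exec_dir.
        if not name.startswith("/"):
            name = exec_dir + "/" + name
    return name + ".CPS.CPC"


if __name__ == "__main__":
    # Placeholder path, replace with the real --query_NAST value.
    path = expected_cps_cpc_path("rep_set_aligned.fasta", os.getcwd())
    print("%s exists: %s" % (path, os.path.exists(path)))
```

If the file is missing, the real error is whatever ChimeraSlayer.pl printed before it stopped, not the Python traceback.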
It seems like both the identify_chimeric_seqs.py script in MacQIIME and the direct use of ChimeraSlayer.pl are yielding similar errors, which point to trouble locating the input and result files.
ChimeraSlayer.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna
CMD: formatdb -i /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna -p F 2>/dev/null
CMD: /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/run_chimeraParentSelector.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna -n 15 -P 90 -R 1.007 > rep_set_aligned.fasta.CPS
Error, no fasta entry retrieved by accession: JX193306.1.1540
Error, cmd (/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/run_chimeraParentSelector.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/../RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna -n 15 -P 90 -R 1.007 > rep_set_aligned.fasta.CPS) died with ret(65280) at /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraSlayer.pl line 243.
It looks like if I don't provide a --db_NAST, it defaults to the bundled rRNA16S.gold.NAST_ALIGNED.fasta (see the CMD line above), and that may be the issue. Do you know how I can construct a NAST db from the SILVA db files?
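A quick way to test that suspicion is to compare sequence IDs between the two database files. The sketch below assumes (this is a guess, not confirmed from the ChimeraSlayer source) that hits found in --db_FASTA are looked up in --db_NAST by the first word of the FASTA header. The function names are made up for this example.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def ids_missing_from(db_fasta_lines, db_nast_lines):
    """IDs present in the FASTA db but absent from the NAST db, sorted."""
    return sorted(fasta_ids(db_fasta_lines) - fasta_ids(db_nast_lines))
```

Passing the open file handles for the --db_FASTA and --db_NAST files gives the list of IDs that would fail a lookup. If JX193306.1.1540 shows up in it, the two databases simply do not describe the same sequences.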
Version 2.2.22 [Sep-27-2009]
Started database file "/macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna"
WARNING: [000.000] Sequence number 1 (lcl|1_/macqiime/microbiomeutil_2010-04), 30307 illegal characters were removed:
99 -s
WARNING: [000.000] Sequence number 2 (lcl|2_/macqiime/microbiomeutil_2010-04), 30461 illegal characters were removed:
WARNING: [000.000] Sequence number 3 (lcl|3_/macqiime/microbiomeutil_2010-04), 30321 illegal characters were removed:
113 -s
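The "illegal characters were removed" warnings are consistent with formatdb being given an aligned file: the "-s" counts look like alignment gap characters being dropped, although that is an inference from a partial log. If an unaligned copy of the database is wanted for --db_FASTA, here is a minimal sketch for stripping gaps, assuming '-' and '.' are the only gap characters in the file (degap_fasta is a hypothetical helper):

```python
def degap_fasta(lines, gap_chars="-."):
    """Yield FASTA lines with gap characters removed from sequence lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are passed through unchanged.
            yield line
        else:
            stripped = "".join(ch for ch in line if ch not in gap_chars)
            if stripped:  # skip lines that were gaps only
                yield stripped
```

Writing the yielded lines to a new file, one per line, leaves the original aligned file untouched for use as the NAST database.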
Error, nast coord 23777 >= nast alignment length: 7682 at /Users/tony/code/microbiomeutil_2010-04-29/ChimeraSlayer/ChimeraParentSelector/../PerlLib/NAST_to_Eco_coords.pm line 28.
It seems that the software is hardcoded to expect the 7682-column 16S NAST alignment (which is a problem for SILVA, since its alignment is quite a bit longer).
It's a hack, but I got around the error by modifying this line (line 28) in the /microbiomeutil_2010-04-29/ChimeraSlayer/PerlLib/NAST_to_Eco_coords.pm file (make a backup copy of it first):
confess "Error, nast coord $coord >= nast alignment length: " . length($ECO_NAST_SEQ);
by putting a # character in front of the line.
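Before commenting the check out, it may help to confirm that alignment length really is the mismatch. The sketch below measures the aligned length of every sequence in a file and compares it with 7682, the value reported in the error message. Treating 7682 as the only length the tool accepts is an assumption drawn from that message, and the function names are invented for this example.

```python
EXPECTED_NAST_LENGTH = 7682  # taken from the error message above


def alignment_lengths(lines):
    """Return the set of aligned sequence lengths found in FASTA lines."""
    lengths = set()
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.add(current)
            current = 0
        elif current is not None:
            current += len(line)  # gaps count towards aligned length
    if current is not None:
        lengths.add(current)
    return lengths


def fits_expected_length(lines, expected=EXPECTED_NAST_LENGTH):
    """True only if every sequence has exactly the expected aligned length."""
    return alignment_lengths(lines) == {expected}
```

A SILVA-aligned file would be expected to return False here, which would match the "nast coord 23777 >= nast alignment length: 7682" error.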
Hi Tony,
I went ahead and made a backup and changed that line to:
#confess "Error, nast coord $coord >= nast alignment length: " . length($ECO_NAST_SEQ);
Then I ran:
ChimeraSlayer.pl --query_NAST /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/rep_set_aligned.fasta --db_FASTA /macqiime/microbiomeutil_2010-04-29/ChimeraSlayer/silva_2/Silva_119_rep_set97_aligned_16S_only.fna
Traceback (most recent call last):
File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 354, in <module>
main()
File "/macqiime/anaconda/bin/identify_chimeric_seqs.py", line 328, in main
keep_intermediates=keep_intermediates)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 159, in chimeraSlayer_identify_chimeras
keep_intermediates=keep_intermediates):
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 143, in __call__
keep_intermediates=keep_intermediates)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 637, in get_chimeras_from_Nast_aligned
app_results = app()
File "/macqiime/anaconda/lib/python2.7/site-packages/burrito/util.py", line 295, in __call__
result_paths = self._get_result_paths(data)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 419, in _get_result_paths
raise ApplicationError("Calling ChimeraSlayer failed.")
burrito.util.ApplicationError: Calling ChimeraSlayer failed.