Trouble Building a Phylogeny with !pick_open_reference_otus.py

oak...@lemoyne.edu

Aug 17, 2016, 5:18:21 PM
to Qiime 1 Forum
I'm a new QIIME user and I'm attempting to build a phylogenetic tree from a sample of eukaryotes (mainly algal). I'm using the Fungal ITS tutorial as a guide for my code, but I'm getting a strange error when I run:

pick_open_reference_otus.py -i seqs.fna -r silva_104_rep_set.fasta -o otus_ft/ -p params.txt


The error message looks like this:

Mac-Pro:McManus_Gutter Karen$ pick_open_reference_otus.py -i seqs.fna -r silva_104_rep_set.fasta -o otus_ft/ -p params.txt  -i seqs.fna -r silva_104_rep_set.fasta -o otus_ft/ -p params.txt

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/pick_open_reference_otus.py", line 453, in <module>
    main()
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/pick_open_reference_otus.py", line 432, in main
    minimum_failure_threshold=minimum_failure_threshold)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/qiime/workflow/pick_open_reference_otus.py", line 1071, in pick_subsampled_open_reference_otus
    status_update_callback=status_update_callback)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/qiime/workflow/pick_open_reference_otus.py", line 327, in align_and_tree
    close_logger_on_success=close_logger_on_success)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:

*** ERROR RAISED DURING STEP: Align sequences

Command run was:
align_seqs.py -i otus_ft//rep_set.fna -o otus_ft//pynast_aligned_seqs --template_fp silva_104/core_Silva_aligned.fasta

Command returned exit status: 1

Stdout:

Stderr
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/align_seqs.py", line 211, in <module>
    main()
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/align_seqs.py", line 194, in main
    log_path=log_path, failure_path=failure_path)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/qiime/align_seqs.py", line 266, in __call__
    temp_dir=get_qiime_temp_dir())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pynast/util.py", line 812, in pynast_seqs
    for seq, status in pynast_iterator:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pynast/util.py", line 707, in ipynast_seqs
    Sequence(seq=template_alignment[template_seq_id],moltype=DNA)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/skbio/alignment/_alignment.py", line 248, in __getitem__
    return self.get_seq(index)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/skbio/alignment/_alignment.py", line 515, in get_seq
    return self[self._id_to_index[id]]
KeyError: 'AY191848.1.1408 1408 bp Ralstonia sp. 12F'


I'm not sure what this error means and what I need to fix. My params.txt is as follows:


pick_otus:enable_rev_strand_match True
assign_taxonomy:assignment_method blast
assign_taxonomy:silva_104/Silva_taxa_mapping_104set_97_otus.txt
assign_taxonomy:silva_104/silva_104_rep_set.fasta
align_seqs:template_fp silva_104/core_Silva_aligned.fasta
filter_alignment:suppress_lane_mask_filter True
filter_alignment:entropy_threshold 0.10


Jai Ram Rideout

Aug 17, 2016, 6:16:27 PM
to Qiime 1 Forum
Hello,

I've never seen this error before but I think the Silva 104 core alignment (silva_104/core_Silva_aligned.fasta) is causing issues due to the FASTA headers. The FASTA headers in that file contain whitespace and I think two different pieces of software are interpreting them in incompatible ways: one is (incorrectly) assuming the entire header is the sequence identifier, while the other is (correctly) parsing the FASTA header into a sequence identifier and description based on whitespace.
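
To make the suspected mismatch concrete, here is a minimal sketch in plain Python. It is not the actual PyNAST or scikit-bio code, and the helper names are made up for illustration. It only shows that an index keyed by the text before the first whitespace has no entry for the full header line, which is the kind of lookup that would end in the KeyError from the traceback.

```python
# Illustrative sketch only: these helpers are hypothetical and do not
# reproduce the real PyNAST or scikit-bio parsers.

def index_by_id(fasta_lines):
    """Index sequences by the header text before the first whitespace."""
    index = {}
    current = None
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            current = fields[0] if fields else ""
            index[current] = ""
        elif current is not None:
            index[current] += line.strip()
    return index


def full_headers(fasta_lines):
    """Return each header line in full, description included."""
    return [line[1:].rstrip("\n") for line in fasta_lines
            if line.startswith(">")]


# One record using the header named in the KeyError; the sequence is a
# made-up placeholder.
template = [
    ">AY191848.1.1408 1408 bp Ralstonia sp. 12F\n",
    "ACGT--ACGT\n",
]

index = index_by_id(template)
label = full_headers(template)[0]

print(label in index)             # full header used as the lookup key
print(label.split()[0] in index)  # bare identifier used as the lookup key
```

The first lookup misses and the second succeeds, so the two interpretations of the same header cannot find each other's records.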

To work around this, you might try modifying core_Silva_aligned.fasta to not contain whitespace in the FASTA headers (make a backup of the original file first!). I'd truncate the header at the first occurrence of whitespace so that only the sequence identifier is retained. For example:

>AY191848.1.1408 1408 bp Ralstonia sp. 12F

would become:

>AY191848.1.1408
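
A minimal sketch of that truncation in Python, assuming a plain-text FASTA file in which every header line starts with ">". The function names are made up for illustration. It writes a cleaned copy instead of editing in place, so the original file stays untouched.

```python
def truncate_header(line):
    """Cut a FASTA header at the first whitespace; leave other lines alone."""
    if line.startswith(">"):
        fields = line[1:].split(None, 1)
        return ">" + (fields[0] if fields else "") + "\n"
    return line


def clean_fasta(in_fp, out_fp):
    """Write a copy of in_fp whose headers hold only the sequence identifier."""
    with open(in_fp) as src, open(out_fp, "w") as dst:
        for line in src:
            # Sequence lines (including alignment gaps) pass through unchanged.
            dst.write(truncate_header(line))
```

For example, `clean_fasta("silva_104/core_Silva_aligned.fasta", "silva_104/core_Silva_aligned_clean.fasta")` would write the cleaned copy (the output file name is hypothetical), and `align_seqs:template_fp` in params.txt would then need to point at the new file.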

You might also try newer versions of the Silva database; they shouldn't have this issue.

Best,
Jai