assign taxonomy; custom database


Stephanie Matthews

May 15, 2017, 3:29:47 PM
to Qiime 1 Forum
Hi, 

I am picking de novo OTUs using uclust and assigning taxonomy using the mothur algorithm, with a custom database. When I run the pick_de_novo_otus.py workflow script, I get the following error:


/usr/Modules/3.2.10/init/sh: line 10: unalias: python: not found

Traceback (most recent call last):
  File "/lus/scratch/software/python/Python-2.7.10/bin/pick_de_novo_otus.py", line 180, in <module>
    main()
  File "/lus/scratch/software/python/Python-2.7.10/bin/pick_de_novo_otus.py", line 177, in main
    status_update_callback=status_update_callback)
  File "/lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime/workflow/upstream.py", line 306, in run_pick_de_novo_otus
    close_logger_on_success=close_logger_on_success)
  File "/lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError: 

*** ERROR RAISED DURING STEP: Assign taxonomy
Command run was:
 assign_taxonomy.py -o uclust97_1//uclust_assigned_taxonomy -i uclust97_1//rep_set//seqs_trimmed_nochimeras_rep_set.fasta --reference_seqs_fp ~/lus/steph_share/references/MIDORI_LONGEST_srRNA/MIDORI_LONGEST_1.1_srRNA_RDP.fasta --id_to_taxonomy_fp ~/lus/steph_share/references/MIDORI_LONGEST_srRNA/MIDORI_LONGEST_1.1_srRNA_RDP.tax2
Command returned exit status: 1
Stdout:

Stderr
Traceback (most recent call last):
  File "/lus/scratch/software/python/Python-2.7.10/bin/assign_taxonomy.py", line 417, in <module>
    main()
  File "/lus/scratch/software/python/Python-2.7.10/bin/assign_taxonomy.py", line 394, in main
    log_path=log_path)
  File "/lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime/assign_taxonomy.py", line 1306, in __call__
    result = self._uc_to_assignments(app_result['ClusterFile'])
  File "/lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime/assign_taxonomy.py", line 1364, in _uc_to_assignments
    tax = self.id_to_taxonomy[subject_id].split(';')
KeyError: 'HM851364'




The output of print_qiime_config.py is: 


System information
==================
         Platform: linux2
   Python version: 2.7.10 (default, Sep 29 2015, 01:41:59)  [GCC Intel(R) C++ gcc 4.4 mode]
Python executable: /lus/scratch/software/python/Python-2.7.10/bin/python

QIIME default reference information
===================================
For details on what files are used as QIIME's default references, see here:
 https://github.com/biocore/qiime-default-reference/releases/tag/0.1.3

Dependency versions
===================
          QIIME library version: 1.9.1
           QIIME script version: 1.9.1
qiime-default-reference version: 0.1.3
                  NumPy version: 1.12.0
                  SciPy version: 0.18.1
                 pandas version: 0.19.2
             matplotlib version: 2.0.0
            biom-format version: 2.1.5
                   h5py version: 2.6.0 (HDF5 version: 1.8.17)
                   qcli version: 0.1.1
                   pyqi version: 0.3.2
             scikit-bio version: 0.2.3
                 PyNAST version: 1.2.2
                Emperor version: 0.9.60
                burrito version: 0.9.1
       burrito-fillings version: 0.1.1
              sortmerna version: SortMeRNA version 2.0, 29/11/2014
              sumaclust version: SUMACLUST Version 1.0.10
                  swarm version: Swarm 2.1.5 [Sep 25 2015 23:34:51]
                          gdata: Installed.

QIIME config values
===================
For definitions of these settings and to learn how to configure QIIME, see here:
 http://qiime.org/install/qiime_config.html
 http://qiime.org/tutorials/parallel_qiime.html

                     blastmat_dir: None
      pick_otus_reference_seqs_fp: /lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                         sc_queue: all.q
      topiaryexplorer_project_dir: None
     pynast_template_alignment_fp: /lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta
                  cluster_jobs_fp: start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                     torque_queue: friendlyq
                    jobs_to_start: 1
                       slurm_time: None
            denoiser_min_per_core: 50
assign_taxonomy_id_to_taxonomy_fp: /lus/scratch/software/python/Python-2.7.10/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
                         temp_dir: /tmp/
                     slurm_memory: None
                      slurm_queue: None
                      blastall_fp: blastall
                 seconds_to_sleep: 1


HM851364 is the ID of one of my reference sequences. I've tried removing this sequence, and I get the same error with a different sequence (I've removed about thirty sequences so far). I think this might be related to my custom reference database, but I haven't been able to identify the problem: there don't seem to be any non-standard characters in the sequences that give errors, and they're formatted the same as the sequences that work. My files look like:

MIDORI_own_tab_v3.tax: 

U33576 root;Eukaryota;Chordata;Actinopteri;Characiformes;Serrasalmidae;Acnodon;Acnodon_normani
U33565 root;Eukaryota;Chordata;Actinopteri;Characiformes;Serrasalmidae;Catoprion;Catoprion_mento
U33569 root;Eukaryota;Chordata;Actinopteri;Characiformes;Serrasalmidae;Myloplus;Myloplus_asterias
...

MIDORI_LONGEST_1.1_srRNA_RDP.fasta: 

>U33576 root;Eukaryota;Chordata;Actinopteri;Characiformes;Serrasalmidae;Acnodon;Acnodon normani
TTAGATGGTAAAACCTACAAGTAACATCCGCCAGGGTACTACAAGCGCTAGCTTAAAACC
CAAAGGACTTGACGGTGTCTCAGACCCACCTAGAGGAGCCTGTTCTAGAACCGATAATCC
CCGTTAAACCTCACCATCCCTTGTCTTACCCGCCTATATACCGCCGTCGCAAGCTTACCC
TGTGAAGGGCCTACAGTAAGCAAAATGGGCAAGCCCCAGAACGTCAGGTCGAGGTGTAGC
TTACGAGATGGAAAGAAATGGGCTACATTTTCTTAAACAGAATATTACGAACGGCACCAT
GAAATGTGGTGCCTGAAGGTGGATTTAGCAGTAAAAAAA
>U33565 root;Eukaryota;Chordata;Actinopteri;Characiformes;Serrasalmidae;Catoprion;Catoprion mento
TCAGATGTAGGTACGTACAAACAACATCCGCCAGGGCACTACAAGCGCTAGCTTAAAACC
CAAAGGACTTGACGGTGTCTCAGACCCGCCTAGAGGAGCCTGTTCTAGAACCGATAACCC
CCGTTAAACCTCACCATCCCTTGTCTTCCCCGCCTATATACCGCCGTCGCAAGCTTACCC
TGTGAAGGACTTACAGTAAGCAAAATGGGCCAACCCCAGAACGTCAGGTCGAGGTGTAGC
TCACGAGATGGAAAGAAATGGGCTACATTTTCTACAACAGAATATCACGAACGGTACCAT
GAAACCTGGTACCCGAAGGTGGATTTAGCAGTAAAAAAA
>U33569 root;Eukaryota;Chordata;Actinopteri;Characiformes;Serrasalmidae;Myloplus;Myloplus asterias
TCAGATGTTAACACGCACAAACAACATCCGCCAGGGTACTACAAGCGCTAGCTTAAAACC
CAAAGGACTTGACGGTGTCTCAGACCCGCCTAGAGGAGCCTGTTCTAGAACCGATAATCC
CCGTTAAACCTCACCATCCCTTGTTTTCCCCGCCTATATACCGCCGTCGCAAGCTTACCC
TGTGAAGGGCCTACAGTAAGCAAAATGGGTAAACCCCAGAACGTCAGGTCGAGGTGTAGC
TCACGAGATGGGAAGAAATGGGCTACATTTTCTACAACAGAATATCACGAACGGCACCAT
GAAATTTAGTGCCTGAAGGTGGATTTAGCAGTAAAAAAA


Strangely, I have another analysis that uses a separate set of input seqs/taxonomy/fasta (also custom, in the same format as these files) with all the same parameters, and it works just fine. The format of the taxonomy files and fasta files, and the QIIME installation, are the same.

Any ideas on how else to troubleshoot this? 

Jai Ram Rideout

May 15, 2017, 7:31:42 PM
to Qiime 1 Forum
Hi Stephanie,

It looks like some of your reference sequence IDs aren't in your taxonomy mapping file. For example, the error message seems to indicate that the reference sequence ID HM851364 is not in your taxonomy mapping file. Can you verify that each reference sequence has a corresponding entry in the taxonomy file?
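
One way to run that check is sketched below. It is only an illustration, not part of QIIME: the function names are made up for this example, and it assumes the taxonomy map is tab-separated (ID, tab, taxonomy string) and that the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the first token after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def taxonomy_ids(lines):
    """Collect IDs from a tab-separated ID-to-taxonomy map (assumed format)."""
    ids = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            ids.add(line.split("\t")[0])
    return ids


def missing_from_taxonomy(fasta_lines, tax_lines):
    """Return FASTA IDs that have no entry in the taxonomy map."""
    return sorted(fasta_ids(fasta_lines) - taxonomy_ids(tax_lines))


# Tiny in-memory demo; the taxonomy strings are shortened for illustration.
fasta = [">U33576 root;Eukaryota", "TTAGATGG", ">HM851364 root;Eukaryota", "ACGTACGT"]
tax = ["U33576\troot;Eukaryota"]
print(missing_from_taxonomy(fasta, tax))
```

To run it on real files, pass open file handles instead of the demo lists, for example `missing_from_taxonomy(open(fasta_path), open(tax_path))`, where both paths are placeholders for your own files. If the map turns out to be space-separated rather than tab-separated, every ID will be reported as missing, which is itself a useful hint.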

Best,
Jai

Stephanie Matthews

May 15, 2017, 9:07:14 PM
to Qiime 1 Forum
Hi Jai, 

Thanks for responding!

The sequences named in the errors are present in both the taxonomy mapping file and the fasta file (unfortunately; I was hoping the issue would be that simple). I've tried editing the files and copy/pasting the sequence IDs to make sure that they're identical, and the taxonomy mapping file was created from the headers of the fasta file.
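
For anyone hitting the same wall, IDs that look identical on screen can still differ by invisible characters. The snippet below is a rough sketch, not a QIIME tool: `suspicious_lines` is a hypothetical helper, and it assumes the mapping file should have exactly one tab per line. It prints the `repr()` of any line with a carriage return, stray whitespace around the ID, a missing or extra tab, or non-ASCII characters.

```python
def suspicious_lines(lines, delimiter="\t"):
    """Return (line_number, repr(line), reasons) for lines that look malformed."""
    problems = []
    for number, line in enumerate(lines, 1):
        body = line.rstrip("\n")
        if not body.strip():
            continue  # ignore blank lines
        reasons = []
        if body.endswith("\r"):
            reasons.append("carriage return (Windows line ending)")
        if body.count(delimiter) != 1:
            reasons.append("expected exactly one tab")
        seq_id = body.split(delimiter)[0]
        if seq_id != seq_id.strip():
            reasons.append("whitespace around ID")
        if any(ord(char) > 127 for char in body):
            reasons.append("non-ASCII character")
        if reasons:
            problems.append((number, repr(body), reasons))
    return problems


# Demo with deliberately broken lines (taxonomy strings shortened).
demo = [
    "U33576\troot;Eukaryota\n",       # well-formed
    "HM851364 \troot;Eukaryota\r\n",  # trailing space in ID, CRLF ending
    "U33565 root;Eukaryota\n",        # space instead of tab
]
for number, shown, reasons in suspicious_lines(demo):
    print(number, shown, "; ".join(reasons))
```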

When I said I had removed the sequence, I meant that I removed it from both files (the taxonomy mapping file and the fasta file).

Stephanie

Evan Bolyen

May 16, 2017, 5:21:16 PM
to Qiime 1 Forum
Hi Stephanie,

Could you provide your taxonomy map and your reference sequences? It definitely sounds like something very strange is going on, and we probably need the files themselves to figure this out.

Thanks!
-Evan

Stephanie Matthews

May 25, 2017, 3:29:11 PM
to Qiime 1 Forum
Hi Jai and Evan, 

My apologies for the delay in responding! 

I seem to have figured out the problem with my files (or at least found a workaround). It turns out that some of my taxonomy mapping strings contained ':' characters, which QIIME/the mothur algorithm doesn't like. I substituted '_' characters for the colons, and this seems to have fixed the problem. Interestingly, the entries that contained ':' were not the same entries that were giving me the errors I described above.
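
In case it helps anyone else, the substitution can be sketched roughly as follows. This is an illustration under assumptions rather than the exact commands I ran: it assumes a tab-separated mapping file, `sanitize_taxonomy_line` is a made-up helper name, and the demo ID and taxonomy string are invented.

```python
def sanitize_taxonomy_line(line, bad=":", replacement="_"):
    """Replace `bad` characters in the taxonomy column only, leaving the ID untouched."""
    seq_id, sep, taxonomy = line.rstrip("\r\n").partition("\t")
    return seq_id + sep + taxonomy.replace(bad, replacement)


def lines_with_colons(lines):
    """Return 1-based line numbers whose taxonomy column contains ':'."""
    hits = []
    for number, line in enumerate(lines, 1):
        taxonomy = line.rstrip("\r\n").partition("\t")[2]
        if ":" in taxonomy:
            hits.append(number)
    return hits


# Demo with invented entries (hypothetical ID and taxonomy).
demo = [
    "U33576\troot;Eukaryota;Acnodon;Acnodon_normani\n",
    "X00001\troot;Eukaryota;Genus;Genus_sp.:strain_1\n",
]
print(lines_with_colons(demo))
for line in demo:
    print(sanitize_taxonomy_line(line))
```

If the FASTA headers also carry the taxonomy string, as mine do, the same replacement may be needed there so that the two files stay in sync.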

Also, the errors that I was getting with mothur weren't descriptive enough for me to troubleshoot, so I had been trying to get the files to work using the uclust taxonomy assignment method (that's what I was running in the error output above). Changing ':' to '_' lets the mothur algorithm work, so it's possible that the uclust errors I described above are unrelated?

In any case, removing the ':' characters let me run the command I wanted to run! Thank you for your willingness to help!

Best,
Stephanie