Using the Silva database


lande...@greenmtn.edu

May 11, 2017, 7:55:41 PM
to Qiime 1 Forum

I am working with 18S data and will use the Silva database (F566Euk/R1200Euk and Euk_1391f/EukBr). Based on the wealth of information in the forum I think I've got it working on my local computer. However, I was wondering how I can access Silva through an EC2 instance. Specifically, I think I need to know the path to the Silva directory. Sorry if it's in an older thread that I might have missed. Thanks for any help!

TonyWalters

May 12, 2017, 1:56:08 AM
to Qiime 1 Forum
Hello,


Be sure to check the license.txt and readme.txt files on this page regarding academic and non-academic use: https://www.arb-silva.de/no_cache/download/archive/qiime/

-Tony

lande...@greenmtn.edu

May 12, 2017, 11:57:36 AM
to Qiime 1 Forum
Thanks Tony. I have the download but perhaps I am misunderstanding something. Is that already installed on an EC2 instance or would I upload the file to my instance? Thanks again.

Bill

TonyWalters

May 12, 2017, 12:12:45 PM
to qiime...@googlegroups.com
It's not installed by default on EC2 instances, so you'd have to either use a wget command to download it while you're logged into the instance, or scp it to your instance if you've already downloaded it locally.
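A sketch of both options (the exact archive URL and filename are assumptions; confirm them on the download page above, and substitute your own key file and instance address):

wget https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_128_release.tgz
tar -xzf Silva_128_release.tgz

or, from your local machine:

scp -i your_key.pem Silva_128_release.tgz ubuntu@your-instance-address:~/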

lande...@greenmtn.edu

May 12, 2017, 12:22:23 PM
to Qiime 1 Forum
Thanks for the clarification!

lande...@greenmtn.edu

May 15, 2017, 9:24:50 AM
to qiime...@googlegroups.com
I seem to be having some trouble with the assign_taxonomy step when using open-reference OTU picking with the Silva database. This is 18S data, and I tried both the "18S only" and the "taxonomy all" releases. Everything appears to have worked except for assigning taxonomy. The error message below is from the "taxonomy all" run. The parameters file contains:

assign_taxonomy:id_to_taxonomy_fp ~/SILVA_128_QIIME_release/taxonomy/taxonomy_all/97/consensus_taxonomy_all_levels.txt

Any thoughts? Thanks again for the help!

Error message from:
pick_open_reference_otus.py -i ~/Split_Host/seqs.fna -o ~/OpenReference_Host/ -p ~/openref_param.txt -r ~/SILVA_128_QIIME_release/rep_set/rep_set_all/97/97_otus.fasta

Traceback (most recent call last):
  File "/usr/local/bin/pick_open_reference_otus.py", line 453, in <module>
    main()
  File "/usr/local/bin/pick_open_reference_otus.py", line 432, in main
    minimum_failure_threshold=minimum_failure_threshold)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/pick_open_reference_otus.py", line 1030, in pick_subsampled_open_reference_otus
    status_update_callback=status_update_callback)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/pick_open_reference_otus.py", line 232, in assign_tax
    close_logger_on_success=close_logger_on_success)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:
*** ERROR RAISED DURING STEP: Assign taxonomy
Command run was:
 assign_taxonomy.py -o /home/ubuntu/OpenReference_Host//uclust_assigned_taxonomy -i /home/ubuntu/OpenReference_Host//rep_set.fna --id_to_taxonomy_fp ~/SILVA_128_QIIME_release/taxonomy/taxonomy_all/97/consensus_taxonomy_all_levels.txt
Command returned exit status: 1
Stdout:
Stderr
Traceback (most recent call last):
  File "/usr/local/bin/assign_taxonomy.py", line 417, in <module>
    main()
  File "/usr/local/bin/assign_taxonomy.py", line 394, in main
    log_path=log_path)
  File "/usr/local/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 1306, in __call__
    result = self._uc_to_assignments(app_result['ClusterFile'])
  File "/usr/local/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 1364, in _uc_to_assignments
    tax = self.id_to_taxonomy[subject_id].split(';')
KeyError: '815498'

ubuntu@ip-10-37-180-150:~$ qiime_print_config.py
qiime_print_config.py: command not found
ubuntu@ip-10-37-180-150:~$ print_qiime_config.py
System information
==================
         Platform: linux2
   Python version: 2.7.3 (default, Aug  1 2012, 05:14:39)  [GCC 4.6.3]
Python executable: /usr/bin/python
QIIME default reference information
===================================
For details on what files are used as QIIME's default references, see here:
 https://github.com/biocore/qiime-default-reference/releases/tag/0.1.2
Dependency versions
===================
          QIIME library version: 1.9.1
           QIIME script version: 1.9.1
qiime-default-reference version: 0.1.2
                  NumPy version: 1.9.2
                  SciPy version: 0.15.1
                 pandas version: 0.16.1
             matplotlib version: 1.4.3
            biom-format version: 2.1.4
                   h5py version: 2.5.0 (HDF5 version: 1.8.4)
                   qcli version: 0.1.1
                   pyqi version: 0.3.2
             scikit-bio version: 0.2.3
                 PyNAST version: 1.2.2
                Emperor version: 0.9.51
                burrito version: 0.9.1
       burrito-fillings version: 0.1.1
              sortmerna version: SortMeRNA version 2.0, 29/11/2014
              sumaclust version: SUMACLUST Version 1.0.00
                  swarm version: Swarm 1.2.19 [May 26 2015 15:28:37]
                          gdata: Installed.
QIIME config values
===================
For definitions of these settings and to learn how to configure QIIME, see here:
 http://qiime.org/install/qiime_config.html
 http://qiime.org/tutorials/parallel_qiime.html
                     blastmat_dir: /qiime_software/blast-2.2.22-release/data
      pick_otus_reference_seqs_fp: /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                         sc_queue: all.q
      topiaryexplorer_project_dir: None
     pynast_template_alignment_fp: /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta
                  cluster_jobs_fp: start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                     torque_queue: friendlyq
                    jobs_to_start: 1
                       slurm_time: None
            denoiser_min_per_core: 50
assign_taxonomy_id_to_taxonomy_fp: /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
                         temp_dir: /home/ubuntu/temp/
                     slurm_memory: None
                      slurm_queue: None
                      blastall_fp: /qiime_software/blast-2.2.22-release/bin/blastall
                 seconds_to_sleep: 1

TonyWalters

May 15, 2017, 9:44:24 AM
to Qiime 1 Forum
Hello,

I'm guessing that the following line is missing from the ~/openref_param.txt specified above:

assign_taxonomy:reference_seqs_fp ~/SILVA_128_QIIME_release/rep_set/rep_set_all/97/97_otus.fasta

Without it, assign_taxonomy.py pulls the default Greengenes reference sequences instead, and it gets confused when the IDs in the taxonomy file don't match the IDs in the reference sequences file (hence the KeyError above). This file has to be set independently of the -r file passed to pick_open_reference_otus.py, which is only used during the clustering steps.
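In other words, a minimal ~/openref_param.txt for this run would need both of these lines (paths taken from your messages above):

assign_taxonomy:id_to_taxonomy_fp ~/SILVA_128_QIIME_release/taxonomy/taxonomy_all/97/consensus_taxonomy_all_levels.txt
assign_taxonomy:reference_seqs_fp ~/SILVA_128_QIIME_release/rep_set/rep_set_all/97/97_otus.fasta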

-Tony

lande...@greenmtn.edu

May 15, 2017, 9:54:29 AM
to Qiime 1 Forum
I see. So keep the -r option as above, but also include the reference sequences line in the parameters file. Thank you!

lande...@greenmtn.edu

May 15, 2017, 2:57:23 PM
to Qiime 1 Forum
Thanks Tony, the suggestion worked. I think I'm almost there but still have one more snag: with the open-reference OTU picking workflow I get 7 levels of taxonomy. I can improve this to all 14 levels by running a separate summarize_taxa.py command on the otu_table_w_tax.biom file. I was wondering if there is a way to get the 14 levels from within the open-reference workflow, for example via the QIIME parameters file. I know there is a -L option in summarize_taxa.py, but since that script is not part of the open-reference workflow I don't think I can use it.

The open-reference command takes 4 hours, hence I'm trying to avoid too much trial and error! Thanks again.

Bill

TonyWalters

May 15, 2017, 3:07:47 PM
to Qiime 1 Forum
Hello Bill,

I would just run summarize_taxa_through_plots.py on the OTU table output by the current open-reference OTU picking run (this takes very little time compared to the OTU picking pipeline), and pass the parameters file with -p, with this line in it:
summarize_taxa:level 14

There are going to be a lot of empty levels in that plot, though, since only some of the Eukaryotes have that many levels of taxonomy defined.
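For example, something along these lines (the table filename and output directory are illustrative; point -i at the BIOM table your open-reference run actually produced):

summarize_taxa_through_plots.py -i ~/OpenReference_Host/otu_table_w_tax.biom -o ~/taxa_summary_L14/ -p ~/openref_param.txt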

-Tony


lande...@greenmtn.edu

May 15, 2017, 4:44:25 PM
to Qiime 1 Forum
That did the job - thank you once again!

lande...@greenmtn.edu

May 17, 2017, 11:20:30 AM
to Qiime 1 Forum
For my first dataset the OTU picking was fairly quick, about 4 hours. For the next dataset, which has about half the number of sequences and a shorter amplicon length, it's been going for over a day and is still running uclust for step 1. My guess is that this is because there are far more OTUs in the second dataset, which is expected. However, does this processing time sound plausible? There are about 9 million sequences with an amplicon size of <200 bp, running on a c3.8xlarge instance.

Command is:
pick_open_reference_otus.py -i ~/Split_Protist/seqs.fna -o ~/OpenReference_Protist/ -p ~/openref_param.txt -r ~/SILVA_128_QIIME_release/rep_set/rep_set_18S_only/97/97_otus_18S.fasta

Jai Ram Rideout

May 17, 2017, 6:54:54 PM
to Qiime 1 Forum
Hello,

That runtime sounds reasonable for open-reference OTU picking (especially because it isn't being run in parallel with -aO). However, it's possible that the job failed internally. Can you check whether the job is still using CPU and memory? If it isn't, it's likely that the job failed somewhere and will remain in a "stuck" state.
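For reference, a parallel run would look something like this (the job count is illustrative; -a enables parallel execution and -O sets the number of jobs to start, which you'd match to your instance's cores):

pick_open_reference_otus.py -i ~/Split_Protist/seqs.fna -o ~/OpenReference_Protist/ -p ~/openref_param.txt -r ~/SILVA_128_QIIME_release/rep_set/rep_set_18S_only/97/97_otus_18S.fasta -a -O 16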

Best,
Jai

lande...@greenmtn.edu

May 17, 2017, 8:36:55 PM
to Qiime 1 Forum
Hi Jai,

Thanks for the info. It's definitely still running, so I guess I'll just sit tight. As for running in parallel, do you know how I could determine whether my instance has that capability? If so, is it as simple as passing "-a"? Thanks again for your help!

Bill

lande...@greenmtn.edu

May 18, 2017, 9:05:41 AM
to Qiime 1 Forum
OK, I found the instructions for this.
