What files from greengenes do I need to download for assign_taxonomy.py?


james.odon

Apr 13, 2015, 4:13:17 PM
to qiime...@googlegroups.com
I am trying to figure out what files I need to download from greengenes.

From the text at the bottom of the page on qiime.org, it seems I need a map from id to taxonomy and another file that has the reference sequences.  I followed the directions to the greengenes database download page:

Which of these files should I download?  Where should I put them on my computer?

Thanks,

Jim

relevant text from qiime.org:

Reference data sets and id-to-taxonomy maps for 16S rRNA sequences can be found in the Greengenes reference OTU builds. To get the latest build of the Greengenes OTUs (and other marker gene OTU collections), follow the “Resources” link from http://qiime.org. After downloading and unzipping you can use the following files as -r and -t, where <otus_dir> is the name of the new directory after unzipping the reference OTUs tgz file.

-r <otus_dir>/rep_set/97_otus.fasta -t <otus_dir>/taxonomy/97_otu_taxonomy.txt
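
For anyone following along, here is a minimal sketch of how the two paths in that line fit together. It assumes only the directory layout quoted above (rep_set/ and taxonomy/ under <otus_dir>); the helper names and the "gg_13_8_otus" directory name are placeholders of mine, not something QIIME provides.

```python
import os


def reference_paths(otus_dir, percent=97):
    """Build the -r and -t paths for one OTU identity level.

    Assumes the layout quoted above:
    <otus_dir>/rep_set/ and <otus_dir>/taxonomy/.
    """
    ref_seqs = os.path.join(otus_dir, "rep_set", "%d_otus.fasta" % percent)
    id_to_tax = os.path.join(otus_dir, "taxonomy", "%d_otu_taxonomy.txt" % percent)
    return ref_seqs, id_to_tax


def missing_files(paths):
    """Return the subset of paths that do not exist as files on disk."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # "gg_13_8_otus" is a placeholder directory name (an assumption).
    ref_seqs, id_to_tax = reference_paths("gg_13_8_otus")
    print("-r", ref_seqs)
    print("-t", id_to_tax)
    for path in missing_files([ref_seqs, id_to_tax]):
        print("missing:", path)
```

If either path is reported as missing, the directory passed in is probably not the one created by unpacking the reference OTUs archive.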

John Chase

Apr 13, 2015, 5:23:06 PM
to qiime...@googlegroups.com
Hi James, 

The files you want are available on the QIIME resources page. The latest Greengenes release is the first link on that page. Once the files are downloaded, you can put them anywhere on your system as long as the path to the files is defined in the QIIME config file. QIIME 1.9.0 already contains the 97 percent Greengenes database as the default, so you only need to change your config file if you wish to change the default reference files.

If you want to use other Greengenes reference sets occasionally, I would recommend just passing the paths to those files on the command line. For instance:

assign_taxonomy.py -i input.fasta -r 99_otus.fasta -t 99_otu_taxonomy.txt
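
Before passing a pair of files as -r and -t, it can help to confirm that they belong together. The sketch below is only an illustration: it assumes the reference file is plain FASTA with the sequence id as the first token after ">", and that the taxonomy file is tab-separated with the id in the first column. The function names are hypothetical and not part of QIIME.

```python
def fasta_ids(path):
    """Collect sequence ids: the first whitespace-delimited token after '>'."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def taxonomy_ids(path):
    """Collect ids from a tab-separated id<TAB>taxonomy file (assumed format)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                ids.add(line.split("\t", 1)[0])
    return ids


def unmapped_ids(ref_seqs_fp, id_to_taxonomy_fp):
    """Return reference ids that have no taxonomy entry, sorted."""
    return sorted(fasta_ids(ref_seqs_fp) - taxonomy_ids(id_to_taxonomy_fp))
```

An empty list from unmapped_ids("99_otus.fasta", "99_otu_taxonomy.txt") would suggest the two files come from the same reference set; a long list would suggest a mismatched pair, for example a 99 percent FASTA combined with a 97 percent taxonomy map.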

John

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James O'Donnell

Apr 13, 2015, 6:47:47 PM
to qiime...@googlegroups.com
Hi, I went to the QIIME resources page, then went to the first link.
The links are listed below; I clicked on "Greengenes OTUs (16S)", not the 13_8 link.
Greengenes OTUs (16S)
13_8 (most recent)
13_5
12_10

Clicking on that link sent me to another page, where I clicked on the "Greengenes database" from May 2013 and it sent me to the following link.

Am I supposed to download everything?  Which two files am I supposed to put after the -r and -t flags?
I assumed the file that should go after the -r flag should be
and 
the file that should go after the -t flag should be
gg_13_5_taxonomy.txt.gz (9315 KiB)  (after unzipping)

Is that correct?  Do I also need the rest of the files on that page?

Thanks in advance!

-Jim

Also, is it correct that assign_taxonomy.py should not work at all until these files are downloaded?  In other words, the original VirtualBox image did not have these files.

I am running QIIME 1.8.0.

James J.  O'Donnell PhD
Postdoctoral Fellow
Boone Lab
Indiana University School of Medicine -- South Bend

james.odon

Apr 13, 2015, 7:16:57 PM
to qiime...@googlegroups.com
So I'm looking more at these files on the resources page, and it looks like the first link, "Greengenes OTUs (16S)", just shows you all the individual files, while the other links have them tarred.  So I guess I should copy the link address for the 13_8 link and then use wget on that?

I have tried that, and for some reason it can't connect to the web page.  Specifically, I copied the link, which started out "ftp://greengenes.microbio.me/.......etc......"
I then typed the command

The connection times out, though.

-Jim


james.odon

Apr 13, 2015, 10:23:24 PM
to qiime...@googlegroups.com
So I've looked around and apparently I already have these files.  I guess they were downloaded automatically, or perhaps by an attempted upgrade.  What's more, the .log file that is created when I run assign_taxonomy.py with all the defaults (no -r or -t options) shows the correct full paths to 97_otus.fasta and 97_otu_taxonomy.txt, so it knows where to find them.

While this is good, I'm back to square one in that I don't know why assign_taxonomy.py will not run.  I've tried my files on someone else's computer and the command runs fine.  

Any ideas?

-Jim

james.odon

Apr 13, 2015, 10:55:51 PM
to qiime...@googlegroups.com
So it looks like a big problem was the fact that there was no .qiime_config file.
I copied a .qiime_config file from this Michigan State site: https://wiki.hpcc.msu.edu/display/Bioinfo/Using+QIIME
I am using QIIME version 1.8.  assign_taxonomy.py is still not working, but at least now it gives me an error.
I will post the error in a new thread.

Is the following copied .qiime_config file ok?

For QIIME version 1.8.0 the following works correctly:

cluster_jobs_fp /opt/software/QIIME/1.8.0--GCC-4.4.5/bin/start_parallel_jobs_torque.py
python_exe_fp python
working_dir /mnt/scratch/someUSER
blastmat_dir /opt/software/BLAST/2.2.22--GCC-4.4.5/data
blastall_fp blastall
pynast_template_alignment_fp
pynast_template_alignment_blastdb
template_alignment_lanemask_fp
jobs_to_start 100
seconds_to_sleep 60
qiime_scripts_dir /opt/software/QIIME/1.8.0--GCC-4.4.5/bin
temp_dir /mnt/scratch/someUSER/tmp/
denoiser_min_per_core 50
cloud_environment False
topiaryexplorer_project_dir
torque_queue main
assign_taxonomy_reference_seqs_fp
assign_taxonomy_id_to_taxonomy_fp
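
One quick way to inspect a config like this is to list which keys have no value. The sketch below is a rough illustration, not QIIME's own parser: it assumes the whitespace-separated "key value" layout shown above and treats lines starting with "#" as comments. The function names are hypothetical.

```python
def parse_qiime_config(lines):
    """Parse 'key value' lines; a key with nothing after it maps to None."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)
        settings[fields[0]] = fields[1].strip() if len(fields) > 1 else None
    return settings


def unset_keys(settings):
    """Return the keys that were listed without a value, sorted."""
    return sorted(key for key, value in settings.items() if value is None)


if __name__ == "__main__":
    # A few lines copied from the config above.
    sample = [
        "python_exe_fp python",
        "jobs_to_start 100",
        "assign_taxonomy_reference_seqs_fp",
        "assign_taxonomy_id_to_taxonomy_fp",
    ]
    for key in unset_keys(parse_qiime_config(sample)):
        print("no value set:", key)
```

Run against the full file above, this would flag both assign_taxonomy keys, since neither has a path after it in the pasted config.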

John Chase

Apr 14, 2015, 4:36:28 PM
to qiime...@googlegroups.com
Hi, 

The most recent question in this thread, regarding the config file and assign_taxonomy.py, is being addressed in a separate thread. I will follow up there.

Thanks,

John
