using greengenes 13_5

moni

Nov 18, 2013, 11:26:46 AM

to qiime...@googlegroups.com

Hello,
I'm a PhD student working on metagenomic data from the soil microbiome.
I've obtained 454 reads of 16S amplicons from soil DNA, and I'm currently using QIIME for data analysis.
I noticed on the QIIME website that a new release of Greengenes (13_5) is available.
I would like to use QIIME with this updated database, but I don't understand how to retrain QIIME with it, since this latest version has no "train" files.
I'm attaching a list of the file names in the Greengenes download.
Can you help?
Thanks

Greengenes name files.odt

arp

Nov 18, 2013, 1:22:10 PM

to qiime...@googlegroups.com

Hi,

Are you planning to do open- or closed-reference OTU picking?

Note that there is an even newer release of Greengenes (13_8), which includes updates to the 13_5 taxonomy files.

If you just need to use Greengenes as a reference for closed-reference OTU picking, you should not need to retrain anything; just change your settings so that those reference files are being used instead of the older version. You can change this in your qiime config file or, if you are using a workflow script, by passing a parameters file.

If you are using QIIME 1.7.0, you should not need to retrain.
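
As a rough sketch of the parameters-file route mentioned above: the snippet below writes a small parameters file that points assign_taxonomy.py at the Greengenes files. The directory location, the GG_DIR variable, and the file name gg_params.txt are assumptions for illustration, not something taken from this thread.

```shell
# Assumed location of the unpacked Greengenes release; adjust to your system.
GG_DIR="${GG_DIR:-$HOME/gg_13_5_otus}"

# QIIME parameters files use lines of the form "script_name:option_name value".
cat > gg_params.txt <<EOF
assign_taxonomy:reference_seqs_fp $GG_DIR/rep_set/97_otus.fasta
assign_taxonomy:id_to_taxonomy_fp $GG_DIR/taxonomy/97_otu_taxonomy.txt
EOF

# Show the result so the paths can be checked before use.
cat gg_params.txt
```

A workflow script would then be given this file with its -p option. If you call assign_taxonomy.py directly, the same two files can be passed on the command line with -r and -t instead.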

Best,

Adam


moni

Nov 19, 2013, 11:40:12 AM

to qiime...@googlegroups.com

Hi Adamrp,
I am a beginner with QIIME. When I received the sequences in FASTA format, I followed a procedure with these steps:
pick_otus.py
pick_rep_set.py
assign_taxonomy.py
make_otu_table.py
summarize_taxa_through_plots.py
I thus obtained the graphs with all the bacterial species present in my samples.
I would now like to use Greengenes for alignment, but I do not know how to do this in practice.
Thanks

arp

Nov 19, 2013, 1:22:31 PM

to qiime...@googlegroups.com

Sorry, I'm not sure what you're trying to do. What alignment are you trying to create?

Also, can you post a copy of the commands you ran for the previous steps you mentioned (pick_otus.py, pick_rep_set.py, assign_taxonomy.py, make_otu_table.py, and summarize_taxa_through_plots.py)?

Thanks,

Adam

moni

Nov 20, 2013, 4:18:32 AM

to qiime...@googlegroups.com

Hello,
I would like to do the taxonomic assignment of my samples using Greengenes.
Starting from the FASTA sequences, I have so far followed the 7 steps in the article "Using QIIME to Analyze 16S rRNA Gene Sequences from Microbial Communities" and obtained plots of the bacteria present in each sample.
Now I would like to have confirmation of the results using Greengenes, but I do not know how to get started.
I am not sure whether I have been clear.
Thanks

arp

Nov 20, 2013, 1:43:03 PM

to qiime...@googlegroups.com

Which "Basic Protocol" from that article were you following? #2 covers taxonomy assignment. What kind of OTU picking do you want to do (de novo, open-reference, or closed-reference)? Greengenes is a reference; I'm not sure what you mean by "confirming your results." If you're already at the point where you are looking at taxa summary plots (presumably identified by their taxonomy strings), I assume you have already assigned taxonomy.

Can you please post a copy of the commands that you ran for the steps you ran?

Thanks,

Adam

moni

Nov 21, 2013, 3:52:04 AM

to qiime...@googlegroups.com

Hello,
following the protocol in the article, I ran:
- pick_otus.py -i ./split_library_output/seqs.fna
obtaining the output uclust_picked_otus
- pick_rep_set.py -i ./uclust_picked_otus/seqs_otus.txt -f ./split_library_output/seqs.fna
obtaining the output seqs.fna_rep_set.fasta
- assign_taxonomy.py -i ./split_library_output/seqs.fna_rep_set.fasta
obtaining the output rdp22_assigned_taxonomy
- make_otu_table.py -i uclust_picked_otus/seqs_otus.txt -t rdp22_assigned_taxonomy/seqs.fna_rep_set_tax_assignments.txt -o otu_table.txt
obtaining otu_table
- summarize_taxa_through_plots.py -i otu_table.txt -o graphics -m Fasting_Map
I got the charts with the bacteria in all my samples.
This is the work I've done.

Now I have been asked to use Greengenes for a new taxonomic assignment, I think to confirm the results obtained.

What do you advise me to do?

arp

Nov 21, 2013, 3:45:52 PM

to qiime...@googlegroups.com

It sounds like you want to do closed-reference or open-reference OTU picking using Greengenes as the reference set. What kind of samples do you have? I would recommend doing closed-reference OTU picking first, since it is much faster. If you find that too few of your reads hit the reference (say, < 95%), then you should consider doing open-reference. See these scripts:

http://qiime.org/scripts/pick_closed_reference_otus.html

http://qiime.org/scripts/pick_open_reference_otus.html

In both cases, you will want to provide the Greengenes rep_set as the reference sequences, and the corresponding taxonomy file. If you download Greengenes 13_5 and unzip it into a directory called gg_13_5_otus, then these files will be located at gg_13_5_otus/rep_set/97_otus.fasta (if you want to use 97% OTU clusters) and gg_13_5_otus/taxonomy/97_otu_taxonomy.txt, respectively.
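
For illustration, the two calls might look roughly like the sketch below. The input path, the output directory names, and the SEQS, GG_DIR, and DRY_RUN variables are assumptions; by default the commands are only printed, so you can inspect them before running anything.

```shell
# Assumed locations (not from the thread): adjust to your own files.
SEQS="${SEQS:-split_library_output/seqs.fna}"
GG_DIR="${GG_DIR:-gg_13_5_otus}"
REF="$GG_DIR/rep_set/97_otus.fasta"
TAX="$GG_DIR/taxonomy/97_otu_taxonomy.txt"

# Dry run by default: commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Closed-reference: reads that do not match the reference are discarded.
run pick_closed_reference_otus.py -i "$SEQS" -r "$REF" -t "$TAX" -o closed_ref_otus

# Open-reference: reads that do not match are clustered de novo afterwards.
# How the taxonomy file is supplied to this workflow may differ (for example
# through a parameters file); check the script's help before relying on this.
run pick_open_reference_otus.py -i "$SEQS" -r "$REF" -o open_ref_otus
```

Running the script with DRY_RUN=0 in the environment executes both workflows, which requires a QIIME version that ships these scripts.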

Best,

Adam

moni

Nov 22, 2013, 5:24:56 AM

to qiime...@googlegroups.com

Adam, thanks for your time.
I have QIIME version 1.3.0, and when I type the command "pick_closed_reference.py" it tells me "command not found".
Do I need to update or download some scripts?
Thanks

arp

Nov 22, 2013, 11:35:27 AM

to qiime...@googlegroups.com

Yes, I would recommend updating to version 1.7.0. You can still do closed-reference OTU picking in the older version, but the newer version has a lot of improvements and bug fixes.

Best,

Adam

Julia Vierheilig

Nov 27, 2013, 10:08:49 AM

to qiime...@googlegroups.com

Hi,

where can I find the new release of greengenes (13_8)?

I couldn't find it on the Greengenes homepage (http://greengenes.secondgenome.com/downloads), on the QIIME homepage (http://qiime.org/home_static/dataFiles.html), or anywhere else online.

Thanks,

Julia

arp

Nov 27, 2013, 10:23:47 AM

to qiime...@googlegroups.com

Hi Julia,

You can get it here: ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz

Best,
Adam

Julia Vierheilig

Nov 27, 2013, 10:30:27 AM

to qiime...@googlegroups.com

Thank you very much Adam!

Bibaswan

Dec 2, 2013, 6:44:09 PM

to qiime...@googlegroups.com, vierh...@waterresources.at

Hi Adam,

Since you brought up the different ways to pick OTUs: I am using Greengenes 13_8 for my updated data analysis, but with the de novo method. Could you please tell me the differences between the de novo, open-reference, and closed-reference OTU picking methods? After reading this thread I am confused about which one is best to use.

Thanks,

Bibaswan

Luke Ursell

Dec 2, 2013, 6:48:23 PM

to qiime...@googlegroups.com

Hi Bibaswan,

Closed reference is when you take your sequences and cluster them against the Greengenes reference set; any of your sequences that don't hit are discarded.

Open reference is when you do closed reference, but instead of discarding the sequences that don't hit, you cluster them at a certain percent identity to create de novo OTUs, which you then try to assign taxonomy to using Greengenes.

De novo is where all sequences are first clustered by percent identity, then a representative sequence is used to assign taxonomy to the entire cluster.

Closed reference is the fastest, de novo is the slowest, and open reference is in the middle. Which one to use depends on your question of interest (i.e. are you interested in finding unique OTUs, or just comparing communities between, say, control and treatment groups), the number of sequences you have, and your computing power (and how long you are willing to wait for your OTU table).
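
To make the distinction concrete, here is a simplified sketch of how the three strategies could map onto single pick_otus.py calls. The helper name otu_cmd and its arguments are made up for illustration, and the commands are only printed. The dedicated open-reference workflow script does more than this one-step version.

```shell
# Print the pick_otus.py call for a given strategy (nothing is executed).
# $1 = strategy (closed | open | de_novo), $2 = reads FASTA, $3 = reference FASTA
otu_cmd() {
    strategy=$1
    seqs=$2
    ref=$3
    case "$strategy" in
        # Reads that miss the reference become failures and are dropped.
        closed)  echo "pick_otus.py -i $seqs -m uclust_ref -r $ref --suppress_new_clusters" ;;
        # Reads that miss the reference are allowed to seed new clusters.
        open)    echo "pick_otus.py -i $seqs -m uclust_ref -r $ref" ;;
        # No reference at all: every cluster is built from the reads.
        de_novo) echo "pick_otus.py -i $seqs -m uclust" ;;
        *)       echo "unknown strategy: $strategy" >&2; return 1 ;;
    esac
}

otu_cmd closed seqs.fna 97_otus.fasta
otu_cmd open seqs.fna 97_otus.fasta
otu_cmd de_novo seqs.fna 97_otus.fasta
```

The closed-reference case corresponds to a log that reports suppress_new_clusters:True and a non-zero "Num failures" count.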

Luke

moni

Dec 9, 2013, 3:33:15 AM

to qiime...@googlegroups.com

Hello Luke,
I also had doubts about which kind of system to use.
I have bacterial DNA extracted from soil, and I used QIIME for taxonomic assignment starting from the open system.
Was I right?

Luke Ursell

Dec 9, 2013, 11:41:23 AM

to qiime...@googlegroups.com

Hi,

By open system do you mean open reference OTU picking?

There are pros and cons to using open reference vs. closed reference vs. de novo. It sounds like you want to do closed-reference or open-reference OTU picking using Greengenes as the reference set. What kind of samples do you have? I would recommend doing closed-reference OTU picking first, since it is much faster. If you find that too few of your reads hit the reference (say, < 95%), then you should consider doing open-reference. See these scripts:

http://qiime.org/scripts/pick_closed_reference_otus.html

http://qiime.org/scripts/pick_open_reference_otus.html

Luke

moni

Dec 10, 2013, 3:22:30 AM

to qiime...@googlegroups.com

Hello,
I have extracted and amplified 16S DNA from soil bacteria across the region.
These are the results of the open system; do you think they look OK?

UclustOtuPicker parameters:
Application:uclust
Similarity:0.97
enable_rev_strand_matching:False
exact:False
max_accepts:20
max_rejects:500
new_cluster_identifier:None
optimal:False
output_dir:uclust_picked_otus
prefilter_identical_sequences:True
presort_by_abundance:True
save_uc_files:True
stable_sort:True
stepwords:20
suppress_sort:True
word_length:12
Num OTUs:48458
Result path: uclust_picked_otus/seqs_otus.txt

moni

Dec 10, 2013, 3:56:21 AM

to qiime...@googlegroups.com

I also did the analysis with the closed system, and these are the results. Which of the two should I consider correct?

OtuPicker parameters:
Application:uclust
Similarity:0.97
chimeras_retention:union
enable_rev_strand_matching:False
exact:False
max_accepts:20
max_rejects:500
new_cluster_identifier:denovo
next_new_cluster_number:1
optimal:False
output_dir:otus_w_tax/uclust_ref_picked_otus
prefilter_identical_sequences:True
presort_by_abundance:True
save_uc_files:True
stable_sort:True
stepwords:20
suppress_new_clusters:True
suppress_sort:True
word_length:12
Reference seqs:/home/qiime/Desktop/Shared_Folder/sffs/2013_10_28_AndreaPorceddu_1-10_SFF.fna
Num OTUs:93648
Num new OTUs:0
Num failures:3542
Result path: otus_w_tax/uclust_ref_picked_otus/seqs_otus.txt

Luke Ursell

Dec 10, 2013, 11:39:07 AM

to qiime...@googlegroups.com

Hi Moni,

In principle both look fine. Whether to use closed- or open-reference OTU picking is up to you, based on speed and on whether you want to identify new OTUs of interest or compare communities using known OTUs.
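
One way to relate a closed-reference log like the one above to the 95% rule of thumb mentioned earlier in the thread is to compare its "Num failures" line with the number of input reads. The sketch below assumes that "Num failures" counts reads that did not match the reference; the function name percent_hit and the file names in the usage comment are placeholders.

```shell
# Percentage of input reads that matched the reference in a uclust_ref run.
# $1 = pick_otus log file, $2 = FASTA file of the input reads
percent_hit() {
    # Value after the colon on the "Num failures" line of the log.
    failures=$(awk -F: '$1 == "Num failures" { print $2 }' "$1")
    # One header line (starting with ">") per read in the FASTA file.
    total=$(grep -c '^>' "$2")
    awk -v f="$failures" -v t="$total" 'BEGIN {
        if (t == 0) { print "NA"; exit }
        printf "%.1f\n", 100 * (t - f) / t
    }'
}

# Example call (paths are placeholders):
# percent_hit otus_w_tax/uclust_ref_picked_otus/seqs_otus.log split_library_output/seqs.fna
```

It is also worth reading the "Reference seqs" line of the same log to confirm that the run used the reference file you intended.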

Luke
