Statistics metrics with QIIME


beatrizgi...@gmail.com

Jun 27, 2016, 10:56:54 AM6/27/16
to Qiime 1 Forum
Hi, 

I would like to compute the Shannon index but I don't know how. Once I generated the alpha_rarefaction_plots.html, I can only see the Chao1, Observed_species, and PD_whole_tree metrics.

Thanks. 

Bea. 

zhiying Guo

Jun 27, 2016, 11:42:41 AM6/27/16
to Qiime 1 Forum
Hello,
alpha_diversity.py offers many choices of alpha diversity metrics; the script is documented here: http://qiime.org/scripts/alpha_diversity.html
You can run alpha_diversity.py -s to list all the available metrics, and use the -m option to calculate the diversity index you like.

If you want to generate rarefaction results with alpha_rarefaction.py (which is a workflow), you need to supply a parameter file. This parameter file tells the workflow which metrics to use when calculating the alpha diversity indices.
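For what it's worth, the Shannon index itself is just a short formula over each sample's OTU relative abundances, H = -Σ p_i·log(p_i). A minimal sketch in Python (assuming base-2 logarithms, which I believe is what QIIME 1's shannon metric uses):

```python
from math import log

def shannon(counts, base=2):
    """Shannon diversity of one sample, from its raw OTU counts."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]  # relative abundances p_i
    return -sum(p * log(p, base) for p in freqs)

# Four equally abundant OTUs give maximum evenness for 4 OTUs:
print(shannon([10, 10, 10, 10]))  # -> 2.0 (bits)
```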

Best,
Zhiying Guo

beatrizgi...@gmail.com

Jun 28, 2016, 6:36:00 AM6/28/16
to qiime...@googlegroups.com
Hi Zhiying, 

Thanks. 

I'm working with just 4 samples, and I'm using the otu_table.biom itself, without any filtering, to calculate the alpha diversity. I've read something about filtering the OTU table for singletons and doubletons before computing the statistical metrics, but I don't know how much impact skipping that could have on my analysis.

 


beatrizgi...@gmail.com

Jun 29, 2016, 6:38:48 AM6/29/16
to qiime...@googlegroups.com
Hi, 

From what I've been reading, discarding singletons is recommended before calculating the metrics. I'm just wondering if it is OK to do that only for the diversity metrics, and not necessary for the previous steps such as taxonomy assignment.

Should the singletons be filtered from the otu_table.biom (with filter_otus_from_otu_table.py) before running the diversity scripts?

When I filter the singletons from my OTU table, the resulting otu_table_no_singletons.biom output shows exactly the same values. I mean, there is no difference at all between the two OTU tables (otu_table and otu_table_no_singletons).




Colin Brislawn

Jun 29, 2016, 2:18:15 PM6/29/16
to Qiime 1 Forum
Good morning,

I'm a little leery to comment here, because statistical normalization and singleton removal are such contentious topics. Reasonable scientists disagree!

Here are two resources which could be helpful when you make your decision:
Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible. This directly relates to throwing away low-abundance OTUs.
Robert Edgar (the uclust and usearch developer) on singletons. This relates to singleton reads during OTU clustering, and some philosophy about how to think about singletons and errors.

I hope this helps.
Colin

beatrizgi...@gmail.com

Jun 30, 2016, 5:13:28 AM6/30/16
to Qiime 1 Forum
Hi Colin, good morning: 

Thanks for your answer. I already knew Robert Edgar's website and had a look at the singletons page yesterday. I will also have a look at the other link you've shared.

I don't know at what point my singletons were discarded; looking through the converted BIOM table, I've realized I don't have any singletons. So that is great :D Maybe I discarded them in one of the quality filters I applied to the data.

Regards, 

Beatriz. 

Colin Brislawn

Jun 30, 2016, 12:39:44 PM6/30/16
to Qiime 1 Forum
Hello Beatriz,

Some (all?) of the qiime OTU-picking scripts will throw away OTUs which appear only once, essentially following Robert Edgar's reasonable recommendations.

Alpha diversity is calculated for each sample separately. So while you may have no singletons if you take the study as a whole, you may have OTUs that appear once in a single sample. If you choose to remove these, mention it in the methods section. The choice is yours. 
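The distinction can be sketched with a toy table (hypothetical counts; the study-wide filter is what filter_otus_from_otu_table.py -n 2 performs):

```python
# Rows are OTUs, columns are 4 samples (hypothetical counts).
table = {
    "OTU_1": [0, 1, 0, 0],  # study-wide singleton: total count is 1
    "OTU_2": [1, 3, 0, 2],  # a per-sample singleton in sample 1, but total is 6
    "OTU_3": [4, 2, 5, 1],
}

# Study-wide filter: keep OTUs whose TOTAL count across all samples is >= 2.
kept = {otu: counts for otu, counts in table.items() if sum(counts) >= 2}
print(sorted(kept))  # -> ['OTU_2', 'OTU_3']; per-sample singletons survive
```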

Colin

beatrizgi...@gmail.com

Jul 1, 2016, 7:51:14 AM7/1/16
to Qiime 1 Forum
Hi Colin, 

Yes, I know what you mean. When I convert my OTU table I don't see anything with fewer than two counts, so I don't have singletons. I guess they were removed during OTU picking by one of the filters, even after the quality filtering was done. I still have to check each single sample, but I would say I don't have any there either. Maybe I'm just wrong, as I'm only starting to work with qiime.

Regards, 

Bea. 

beatrizgi...@gmail.com

Jul 14, 2016, 11:41:55 AM7/14/16
to Qiime 1 Forum
HI Colin, 

I went through your email again. I have confirmed that I don't have singletons when taking the study as a whole, as you mentioned; that is, there are no singletons in the total OTU counts. But I do have singletons if I look within each sample separately. I removed the singletons using the filter script, so I assume that was done correctly. However, I'm wondering why I still have singletons when looking at each single sample. Anyway, is that correct for the analysis? I don't really see what the difference is if I remove the singletons from the total counts but they still appear in the individual samples.

Thanks, 

Bea. 

Colin Brislawn

Jul 14, 2016, 4:13:17 PM7/14/16
to Qiime 1 Forum
Hello Bea,

But, I have singletons if we look within each sample separately. 
Yep. That sounds normal. OTUs which appear once in the whole study are removed, but OTUs that appear multiple times but only once in some samples are kept. 


I removed the singletons using the filter script so I imagine that is correct done. However, I'm wondering why I still have singletons looking at each single sample.
What is the full command you used for filtering? I ask because I'm not sure if that command worked on each sample, or on the data set as a whole. 

Anyway, is that correct for the analysis? 
That choice is up to you! I tend to keep all my OTUs, including the rare ones, because I'm interested in all sources of variance in community structure. Also, filtering out OTUs will change your diversity metrics. If you are focused on the most common microbes, filtering rare OTUs from samples makes sense.

Colin
 

beatrizgi...@gmail.com

Jul 15, 2016, 6:27:17 AM7/15/16
to Qiime 1 Forum
Hi Colin, 

Thanks so much for your explanations; the whole question about singletons is much clearer for me now.

The full command I used for filtering was: filter_otus_from_otu_table.py -n 2. Once I ran that command, I realized that at some point during OTU picking the singletons had already been discarded, because the output otu_table_no_singletons.biom is the same, with no differences at all from the original otu_table.biom.

I get the point about filtering rare OTUs. Maybe for the beta diversity it would make sense to filter the most common ones, but I still have to think about it. In case I would like to filter the singletons from the individual samples, what would I have to do? Because, as we discussed before, the -n argument only applies to the whole study.


Regards, 
Bea. 
 

beatrizgi...@gmail.com

Jul 15, 2016, 11:35:42 AM7/15/16
to Qiime 1 Forum
Hi Colin,

I have another question related to the OTU BIOM table used as input for the metrics. I generated the data using the otu_table.biom itself, I mean, without any normalization. I've recently read that some people use a rarefied OTU table for running the metrics.

I used pick_open_reference_otus.py, excluding the taxonomy assignment. Then I generated the otu_table.biom with the observation metadata (taxonomy metadata for each OTU), and that is the table I'm using to run the different metric scripts. The Qiime tutorial explains that there are two different ways to rarefy the OTU table: single_rarefaction.py and multiple_rarefactions.py. Then there is another option, normalize_table.py. But I don't really know how important it is to do any of this normalization. So I would like to know if there is something I have to do before running the metric scripts.

I guess for summarize_taxa_through_plots.py it is OK to use the BIOM table without any normalization?

Regards. 
Bea. 

Colin Brislawn

Jul 15, 2016, 11:54:17 AM7/15/16
to Qiime 1 Forum
Good morning Bea,

In case I would like to filter the singletons from the single samples, what would I have to do?
I'm not sure if there is a way to do that with qiime. I looked over the documentation for filter_otus_from_otu_table.py and I didn't see how to do it. Also, I'm not sure that's a good idea. This paper makes the argument that changing the counts of reads in samples, including by filtering, will mess up the statistical distributions and reduce the resolution of your metrics. The argument is that by keeping all your OTUs, you are able to resolve smaller differences between samples.

But I don't really know how important is to do all of this normalization or not. So I would like to know if there is something I have to do before to run the metric scripts.
Yes, normalization is important. If a sample has 10x more sequences in it, it will probably have more observed OTUs, because the additional reads will capture more of the microbial community. When doing tests, you want to make sure the differences are caused by biology, and not by random differences in sequencing depth. Normalization will control for sequencing depth. That paper I mentioned also compares different normalization methods. I think it's a great resource to help answer the questions you are asking.

I guess to summarize_taxa_through_plots.py is ok to use the BIOM table without any normalization?
Yes. But alpha and beta diversity metrics really need some sort of normalization. 
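As an illustration of what rarefying (one of the normalization options, and conceptually what single_rarefaction.py does) means: each sample is subsampled without replacement down to a common depth. A rough sketch:

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to an even depth, without replacement."""
    # Expand the counts into a pool with one entry per read, labelled by OTU index.
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds sample size")
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied

deep_sample = [50, 30, 15, 5]         # 100 reads total
print(sum(rarefy(deep_sample, 40)))   # -> 40: every sample ends up at the same depth
```

Reads beyond the chosen depth are simply discarded, which is exactly the information loss the "Waste Not, Want Not" paper objects to.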

Keep in touch,
Colin

beatrizgi...@gmail.com

Jul 15, 2016, 12:00:13 PM7/15/16
to Qiime 1 Forum
Hi Colin, 

Many thanks for your advice and the resource. I'm going to have a look at that paper to decide how to normalize my OTU table before going further with the alpha and beta diversity metrics.

Keep in touch, 

Bea. 

beatrizgi...@gmail.com

Jul 18, 2016, 8:57:54 AM7/18/16
to qiime...@googlegroups.com
Hi Colin, 

I had a look at the paper, as well as at old conversations in the forum related to rarefying and normalization. After all that, I've realized that maybe it is not necessary to normalize or rarefy my OTU table, but as I'm not sure, I'd like to ask for your opinion first. Here is the summary of the OTU table I'm using for the analysis:

Num samples: 4
Num observations: 1004
Counts/sample summary:
Min: 12459
Max: 25212
Counts/sample detail:
1: 12459
2: 16183
3: 22066
4: 25212


I only have 4 samples and I'm not sure how important it would be to rarefy or normalize the table before computing the diversity metrics, as, from my point of view, the counts are roughly in the same range and the sequencing depths are not very uneven. Looking at the rarefaction plots, the curves start to flatten around 2000 sequences per sample.

Besides, when beta_diversity_through_plots.py is run, the first step is to rarefy the OTU table, but in my case I didn't specify any -e argument (which sets the rarefaction depth). If I set -e to 12459 (the lowest count), I see that the PCoA differs from the one generated without -e, but only for the unweighted metric.

So, although after reading all the information it makes sense to me not to normalize or rarefy the OTU table for the alpha and beta diversity metrics (as I only have 4 samples and they are roughly in the same range of counts per sample), I can see that the PCoA plots for the unweighted metric are quite different if I don't rarefy the OTU table.

When using the same number of counts for each sample (rarefying), what happens to the counts that are left out of the metrics?

And my last question is about alpha_diversity.py: should I run single_rarefaction.py before using the alpha diversity script? (For beta_diversity_through_plots.py, rarefaction is part of step 1.)

Sorry for all these questions, but I'm new to this whole Qiime world :)


Thanks, 
Beatriz. 

Colin Brislawn

Jul 18, 2016, 1:46:03 PM7/18/16
to Qiime 1 Forum, Sophie Weiss, Jenya Kopylov
These are great questions, Beatriz! Like you, when I first read the "Waste Not, Want Not" paper, it raised more questions than it answered. These are important to address, but a big challenge for a non-statistician like me. Some comments... 

I only have 4 samples and I'm not sure about how important would be to rarefy or normalized the table before the diversity metrics. As, from my point of view, they are kind in the same range and I don't have uneven sequences.
The largest sample is 2x larger than the smallest sample. Some kind of normalization is probably needed...
 
Having a look of the rarefaction plots, the curves start to become flat around 2000 sequences per sample. 
... but it may not be all that sensitive, if samples seem consistent around 2000 reads per sample.

And my last question is about alpha_diversity.py: should I run single_rarefaction.py before using the alpha diversity script? (For beta_diversity_through_plots.py, rarefaction is part of step 1.)
When you choose a method, you should only perform normalization once. So for certain scripts, you could pass in the raw table, knowing that the workflow script will start by rarefying your samples. 

I'm not sure if I'm the right person to answer all these questions. I'll see if one of the qiime statisticians can help more.

Keep in touch!
Colin


beatrizgi...@gmail.com

Jul 19, 2016, 1:09:25 PM7/19/16
to qiime...@googlegroups.com, sophie...@colorado.edu, jenya....@gmail.com
Hi Colin,

I think the question of normalizing vs. rarefying the OTU table is becoming clearer for me, and I'm leaning more toward normalizing than rarefying.

I've been trying all day to run normalize_table.py without any success. There are some problems we cannot resolve when running that command. Here is the error that appears in the qiime terminal:

eri@ERI97[4times]  qiime > normalize_table.py -i otus/otu_table.biom -a CSS -o CSS_normalized_otu_table.biom                                                                                       [ 4:27PM]
Traceback (most recent call last):
  File "/usr/local/bin/normalize_table.py", line 157, in <module>
    main()
  File "/usr/local/bin/normalize_table.py", line 140, in main
    normalize_CSS(input_path, out_path, output_CSS_statistics)
  File "/usr/local/lib/python2.7/dist-packages/qiime/normalize_table.py", line 38, in normalize_CSS
    run_CSS(temp_fh.name, out_path, output_CSS_statistics=output_CSS_statistics)
  File "/usr/local/lib/python2.7/dist-packages/qiime/normalize_table.py", line 76, in run_CSS
    app_result = rsl(command_args=command_args, script_name='CSS.r')
  File "/usr/local/lib/python2.7/dist-packages/qiime/util.py", line 1968, in __call__
    (exit_status, command, stdout, stderr))
burrito.util.ApplicationError: Unacceptable application exit status: 1
command: cd "/home/eri/Desktop/4times/";  R --slave --args --source_dir /usr/local/lib/python2.7/dist-packages/qiime/support_files/R -i /tmp/QIIME-normalize-table-temp-table-HlORgh.biom -o CSS_normalized_otu_table.biom  < /usr/local/lib/python2.7/dist-packages/qiime/support_files/R/CSS.r
stdout:

stderr:
Error in CSS(opts$input_path, opts$out_path, opts$output_CSS_statistics) :
  could not find function "load_biom"
Execution halted


R has been updated and all the packages have been installed (following the R install notes). But something is going wrong and the command does not run to completion.

Then I tried to do the CSS normalization in R, which I don't really understand very well. I followed the instructions from https://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html and left the parameter p at its default, but I don't really understand what p means. The normalized OTU table is in txt format (and it includes the taxonomy metadata), but when I use biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy I get the following error: ValueError: column index exceeds matrix dimension

Anyway, I would like to resolve the issue with normalize_table.py in qiime and work with that, as I'm not really familiar with R.

Can I use collapse_samples.py as a normalization method?


Thanks.
Bea.

Colin Brislawn

Jul 19, 2016, 1:46:02 PM7/19/16
to Qiime 1 Forum, Justine Debelius
Hello Bea,

I think you are right, there is still something not quite right about your R environment. 
Error in CSS(opts$input_path, opts$out_path, opts$output_CSS_statistics) : 
  could not find function "load_biom"
Execution halted

I've cc'ed a qiime developer who may have seen this error before and could be more help. We will get normalize_table.py working for you!

Can I use collapse_samples.py as a normalization method?
That script just merges samples together. normalize_table.py is the right script to use, once we get it working.

Thanks for working with me on this,
Colin

beatrizgi...@gmail.com

Jul 19, 2016, 1:58:01 PM7/19/16
to Qiime 1 Forum, j.deb...@gmail.com
Hi Colin, 

Many thanks for your help. I hope we find a way to resolve the issue and run the normalize_table.py command.

I have another question: once I get the normalized OTU table and run beta_diversity_through_plots.py, what should I do with the -e argument? The first step of this command is to rarefy the OTU table, which I would not like to do since the table will already be normalized. I don't know if I can just run the command while ignoring that argument.

At the same time, I would like to ask how to make a heatmap using relative abundances instead of raw counts. Someone on the qiime forum told me to use normalize_table.py for that, but I don't know how.

Thanks again, 
Bea. 

Colin Brislawn

Jul 19, 2016, 2:15:53 PM7/19/16
to Qiime 1 Forum, j.deb...@gmail.com
Hi Bea,

Now these are questions I can answer! 

I have another question: once I get the normalized OTU table and run beta_diversity_through_plots.py, what should I do with the -e argument? The first step of this command is to rarefy the OTU table, which I would not like to do since the table will already be normalized. I don't know if I can just run the command while ignoring that argument.
According to the documentation, it looks like if you don't pass the -e flag, no rarefaction is done. I think you can safely ignore it. 

At the same time, I would like to ask how to make a heatmap using relative abundances instead of raw counts. Someone on the qiime forum told me to use normalize_table.py for that, but I don't know how.
The script make_otu_heatmap.py will convert to relative abundances for you! 

I hope that helps,
Colin

beatrizgi...@gmail.com

Jul 19, 2016, 2:20:13 PM7/19/16
to Qiime 1 Forum, j.deb...@gmail.com
That is what I thought about ignoring -e :)

I will try to make the heatmap using relative abundances instead. I guess it would be better to use the normalized OTU table rather than the one with the raw counts?

Regards, 
Bea.

Colin Brislawn

Jul 19, 2016, 5:03:37 PM7/19/16
to Qiime 1 Forum, j.deb...@gmail.com
Hello Bea,

I guess it would be better to use the normalized OTU table rather than the one with the raw counts?
I try to use the same .biom table for all my downstream analyses, so using the normalized OTU table makes sense to me. You could also try it both ways and see how they differ.


Colin

Sophie

Jul 20, 2016, 1:49:35 AM7/20/16
to Qiime 1 Forum, j.deb...@gmail.com
Hi Bea,
For your load_biom R error, please double check your metagenomeSeq installation.  Open a new R session, and type:
library("metagenomeSeq")

Can you then see the load_biom function?
Thanks,
Sophie

beatrizgi...@gmail.com

Jul 20, 2016, 5:02:26 AM7/20/16
to Qiime 1 Forum, j.deb...@gmail.com
Good morning Sophie,

This is what appears after typing library("metagenomeSeq"):

> library("metagenomeSeq")
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unlist

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: limma

Attaching package: ‘limma’

The following object is masked from ‘package:BiocGenerics’:

    plotMA

Loading required package: matrixStats
matrixStats v0.50.2 (2016-04-24) successfully loaded. See ?matrixStats for help.

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians

Loading required package: RColorBrewer
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

I cannot see any load_biom function.

We also use RStudio at some points, and I think metagenomeSeq was installed from RStudio. As I am not familiar with R, I don't know if that makes any difference.

Bea.

beatrizgi...@gmail.com

Jul 20, 2016, 8:02:25 AM7/20/16
to Qiime 1 Forum, j.deb...@gmail.com
Hi Colin, 

Yes, it makes sense of course.
In relation to the heatmap, and making it with relative abundances instead of raw counts: I used the --absolute_abundance argument, but the scale runs from 0 to 3.6 and I don't know if that is correct or not.

Once I have my normalized OTU table, I don't think I should run make_otu_heatmap.py with its default normalization, since the samples will already be normalized, should I?

As I have 1004 observations, the output results in a huge tree, but if I filter the samples and keep just a few of them, I won't be showing the whole results. Any advice? :)

Regards, 
Bea. 

Sophie

Jul 20, 2016, 11:00:59 AM7/20/16
to Qiime 1 Forum, j.deb...@gmail.com
Hi Bea,
That makes sense. After library("metagenomeSeq"), type:
load_biom

then press enter.  What do you see?
Thanks, 
Sophie

beatrizgi...@gmail.com

Jul 20, 2016, 11:09:25 AM7/20/16
to qiime...@googlegroups.com, j.deb...@gmail.com
Hi Sophie, 

After typing load_biom: 

Error: object 'load_biom' not found 

I've followed the instructions in the qiime installation guide and ran the following commands in R:

install.packages(c('ape', 'biom', 'optparse', 'RColorBrewer', 'randomForest', 'vegan'))
source('http://bioconductor.org/biocLite.R')
biocLite(c('DESeq2', 'metagenomeSeq'))
If I run them separately, there are a few errors with install.packages. Running them all together, I also get a few errors at the end, plus warning messages.

Error in file(file, if (append) "a" else "w") :
cannot open the connection
ERROR: installing package DESCRIPTION failed for package ‘metagenomeSeq’
* removing ‘/home/eri/R/x86_64-pc-linux-gnu-library/3.3/metagenomeSeq’

The downloaded source packages are in
‘/tmp/RtmpVfgmBv/downloaded_packages’
Warning messages:
1: In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
installation of package ‘DESeq2’ had non-zero exit status
2: In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
installation of package ‘metagenomeSeq’ had non-zero exit status
3: installed directory not writable, cannot update packages 'rggobi', 'rgl',
'RGtk2'
>


I cannot copy all the text because it is so long, so that is the last part, after the installation.

Thanks,
Bea.


Colin Brislawn

Jul 20, 2016, 12:29:35 PM7/20/16
to Qiime 1 Forum
Hello Bea,

Once I have my normalized OTU table, I don't think I should run make_otu_heatmap.py with its default normalization, since the samples will already be normalized, should I?
I usually don't recommend normalizing twice, but heatmaps are an exception. Here, scaling everything to relative abundance is nice for graphing (that way the scale runs from 0 to 1 instead of 0 to 3.6).
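Scaling to relative abundance just divides each sample's counts by that sample's total, so every column sums to 1. A small sketch of the idea (not the make_otu_heatmap.py code itself):

```python
def to_relative_abundance(table):
    """Scale each sample (column) of an OTUs-by-samples count table so it sums to 1."""
    n_samples = len(table[0])
    col_sums = [sum(row[j] for row in table) for j in range(n_samples)]
    return [[row[j] / col_sums[j] for j in range(n_samples)] for row in table]

counts = [[10, 0],   # OTU A in samples 1 and 2
          [30, 5]]   # OTU B in samples 1 and 2
print(to_relative_abundance(counts))  # -> [[0.25, 0.0], [0.75, 1.0]]
```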

As I have 1004 observations, the output results in a huge tree, but If I filter the samples and take just few of them I won't show the whole results. Any advice? :) 
Good point! A graph showing 1004 OTUs would be hard to read. I usually merge OTUs with similar taxonomy together, then graph. So I may use summarize_taxa.py to group at the class level, then graph those classes (so while there may be 1004 OTUs, they would only fall into maybe 20 classes, making the graph easier to read).
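Grouping OTUs by a shared taxonomic level, which is what summarize_taxa.py does, is conceptually just a group-by-and-sum over the lineage strings (hypothetical data below):

```python
from collections import defaultdict

# Hypothetical OTU counts (2 samples) with their assigned lineages.
otus = [
    ("Bacteria;Proteobacteria;Gammaproteobacteria", [3, 0]),
    ("Bacteria;Proteobacteria;Gammaproteobacteria", [1, 2]),
    ("Bacteria;Bacteroidetes;Cytophagia",           [0, 5]),
]

by_class = defaultdict(lambda: [0, 0])
for lineage, counts in otus:
    cls = lineage.split(";")[2]          # third level of the lineage = class
    for i, c in enumerate(counts):
        by_class[cls][i] += c

print(dict(by_class))  # -> {'Gammaproteobacteria': [4, 2], 'Cytophagia': [0, 5]}
```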

Good luck!
Keep up the good work!
Colin

beatrizgi...@gmail.com

Jul 21, 2016, 7:07:32 AM7/21/16
to qiime...@googlegroups.com
Hello Colin, 

First of all, I finally got the normalized table! :) So now I'm working with the normalized OTU table to make the heatmap and to compute the beta diversity metrics.

Again, many thanks for your help; I'm starting to understand the different options for working with QIIME. However, every time I end up with more questions :) So:

1. When you say "scaling everything", I don't know how to scale the heatmap with the arguments of the make_otu_heatmap.py script.

- I've summarized the OTU table at family and genus level, as those are the ones of interest for my study. But two things happen here:

2. I did not use the Greengenes database, and the format of the taxonomy is something like this:

D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 4;D_4__Unknown Family
D_0__Bacteria;D_1__Bacteroidetes;D_2__Cytophagia;D_3__Cytophagales;D_4__Cyclobacteriaceae 

I don't really know if that is the normal format for Greengenes too, but once I summarize the otu_table.biom at -L 5 (family level), the output has the format I showed you above. I expected something like "Unknown Family" or "Cyclobacteriaceae" (as I passed -L 5 for family level).

I can delete that information from the .txt output file and then convert the .txt file into a .biom table, but I'm wondering if there is another way to do it.

Then, if I run summarize_taxa.py at genus level (-L 6), I get a .txt file that looks like this:

D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 4; D-4__Unknown Family;D_5__Blastocatella 
D_0__Bacteria;D_1__Bacteroidetes;D_2__Cytophagia;D_3__Cytophagales;D_4__Cyclobacteriaceae;D_5__Algoriphagus

Once again I expected just "Blastocatella" and "Algoriphagus". 

I think it might be something to do with the format of the reference file I used (from SILVA). My rep_set_tax_assignments file has that format.

3. To run make_otu_heatmap.py on the output from summarize_taxa.py, I used the following command:

make_otu_heatmap.py -i otu_table.biom --absolute_abundance -o heatmap_L5_abundance.txt 

As I would like to show the tree in the heatmap as well, I set the -t argument for that. But when I run the command make_otu_heatmap.py -i otu_table.biom --absolute_abundance -o heatmap_L5_abundance.txt -t otus/rep_set.tre, the following error appears: biom.exception.TableException: Duplicate sample IDs!

I don't know how to fix or avoid that to get the heatmap with the tree. It also happens if I pass the -m argument to the last command.

Thanks, 
Regards. 
Bea. 

Sophie

Jul 21, 2016, 11:14:37 AM7/21/16
to Qiime 1 Forum
Hi Bea,
What version of metagenomeSeq is loaded in your R session? It might pre-date BIOM support. Please check that it is the latest version of metagenomeSeq.
Thanks,
Sophie

beatrizgi...@gmail.com

Jul 21, 2016, 1:12:47 PM7/21/16
to Qiime 1 Forum
Hi Sophie, 

The problem is resolved. The metagenomeSeq version wasn't the latest, and actually R wasn't the correct version either, so there was a conflict between the versions of the different packages and R. Now everything works, and I made the normalized OTU table using normalize_table.py.

So many thanks. 

Regards, 
Bea. 

Colin Brislawn

Jul 21, 2016, 4:39:31 PM7/21/16
to Qiime 1 Forum
Hello Bea,

I'm glad you got the normalized table and have more great questions. Let's dive in!

1. When you say "scaling everything", I don't know how to scale the heatmap with the arguments of the make_otu_heatmap.py script.
I should have said "scale every sample so its abundance sums to 1". As it says in the documentation, this is done automatically by make_otu_heatmap.py.

2. I did not use Greengenes database and the format for the taxonomy is something like this: 
Those look great!

Once again I expected just "Blastocatella" and "Algoriphagus". 
When you summarize at a level, the full lineage is included, so this is the 'correct', expected result.
(Qiime does this because taxonomy is messy and some genera of microbes are placed in different families. Having the full lineage lets you separate these similar microbes.)
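If you do want only the last named rank for a figure label, stripping the SILVA-style "D_N__" prefixes is easy to do yourself (a sketch, not a qiime option):

```python
def last_rank(lineage):
    """Return the most specific named rank from a 'D_0__...;...;D_5__...' lineage."""
    ranks = [part.split("__", 1)[-1] for part in lineage.split(";")]
    return ranks[-1].strip()

lin = "D_0__Bacteria;D_1__Bacteroidetes;D_2__Cytophagia;D_3__Cytophagales;D_4__Cyclobacteriaceae;D_5__Algoriphagus"
print(last_rank(lin))  # -> Algoriphagus
```

Just keep in mind the caveat above: two different families can contain a genus with the same name, so shortened labels can collide.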


I'm not sure what's causing the duplicate sample ID... Have you checked for duplicate sample IDs in that metadata file?
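A quick way to check a mapping file for duplicate sample IDs (assuming, as in standard qiime mapping files, that the ID is the first tab-separated column):

```python
from collections import Counter

def duplicate_ids(lines):
    """Return sample IDs that occur more than once among a mapping file's rows."""
    ids = [ln.split("\t")[0] for ln in lines
           if ln.strip() and not ln.startswith("#")]  # skip header/comment lines
    return [sid for sid, n in Counter(ids).items() if n > 1]

rows = ["#SampleID\tTreatment", "S1\tA", "S2\tB", "S1\tA"]  # hypothetical rows
print(duplicate_ids(rows))  # -> ['S1']
```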

Colin

beatrizgi...@gmail.com

Jul 22, 2016, 11:53:15 AM7/22/16
to Qiime 1 Forum
Hello Colin,

1. Oh, OK. I was probably wrong to expect a heatmap with a scale from 0 to 1. Is that abundance shown as a percentage?

2. OK, I understand. But I don't think those long names are a nice way to present results, so maybe I have to remove them and keep only the part of interest (in the case of -L 5, the family name). I don't know if there is a way to do that in qiime or if I have to do it manually.

3. About the duplicate sample ID, I don't have any idea either... I had a look at the files I used to run the command, but I cannot find why it happens.

Here the error that I get:

eri@ERI97[4timesnew]  qiime > make_otu_heatmap.py -i tax_mapping/otu_table_L5.biom --absolute_abundance -o heatpmap_L5_absolute.pdf -t otus/rep_set.tre

Traceback (most recent call last):
  File "/usr/local/bin/make_otu_heatmap.py", line 254, in <module>
    main()
  File "/usr/local/bin/make_otu_heatmap.py", line 242, in main
    otu_table = otu_table.sort_order(otu_id_order, axis='observation')
  File "/usr/local/lib/python2.7/dist-packages/biom/table.py", line 1827, in sort_order
    self.type)
  File "/usr/local/lib/python2.7/dist-packages/biom/table.py", line 331, in __init__
    errcheck(self)
  File "/usr/local/lib/python2.7/dist-packages/biom/err.py", line 472, in errcheck
    raise ret
biom.exception.TableException: Duplicate sample IDs!


Regards,

Bea.

beatrizgi...@gmail.com

Jul 22, 2016, 12:01:54 PM7/22/16
to qiime...@googlegroups.com
Hi Sophie,

After I resolved the issue with metagenomeSeq, I'm now running all the commands from the start. When I try to run summarize_taxa_through_plots.py, I get an error saying the plots cannot be made because of some error in matplotlib. We've tried a few things to fix it (we didn't have that problem before updating QIIME, so maybe something is missing in the new version), but with no results at all. Here is the error:

eri@ERI97[4timesnew]  qiime > summarize_taxa_through_plots.py -i otus/otu_table.biom -m mapping_file_R3.txt -s -o taxa_summary2

Traceback (most recent call last):
  File "/usr/local/bin/summarize_taxa_through_plots.py", line 143, in <module>
    main()
  File "/usr/local/bin/summarize_taxa_through_plots.py", line 140, in main
    status_update_callback=status_update_callback)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/downstream.py", line 711, in run_summarize_taxa_through_plots
    close_logger_on_success=close_logger_on_success)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:

*** ERROR RAISED DURING STEP: Plot Taxonomy Summary
Command run was:
 plot_taxa_summary.py -i taxa_summary2/otu_table_sorted_L2.txt,taxa_summary2/otu_table_sorted_L3.txt,taxa_summary2/otu_table_sorted_L4.txt,taxa_summary2/otu_table_sorted_L5.txt,taxa_summary2/otu_table_sorted_L6.txt -o taxa_summary2/taxa_summary_plots/
Command returned exit status: 1
Stdout:

Stderr

Traceback (most recent call last):
  File "/usr/local/bin/plot_taxa_summary.py", line 278, in <module>
    main()
  File "/usr/local/bin/plot_taxa_summary.py", line 274, in main
    resize_nth_label, label_type, include_html_legend)
  File "/usr/local/lib/python2.7/dist-packages/qiime/plot_taxa_summary.py", line 1138, in make_all_charts
    resize_nth_label, label_type, include_html_legend))
  File "/usr/local/lib/python2.7/dist-packages/qiime/plot_taxa_summary.py", line 1073, in get_counts
    label_type, include_html_legend))
  File "/usr/local/lib/python2.7/dist-packages/qiime/plot_taxa_summary.py", line 861, in make_HTML_table
    props={'title': title})
  File "/usr/local/lib/python2.7/dist-packages/qiime/plot_taxa_summary.py", line 662, in make_area_bar_chart
    background_color, img_abs, generate_image_type, dpi)
  File "/usr/local/lib/python2.7/dist-packages/qiime/plot_taxa_summary.py", line 187, in make_legend
    shadow=False, fancybox=False)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/figure.py", line 1277, in legend
    l = Legend(self, handles, labels, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/legend.py", line 385, in __init__
    self._init_legend_box(handles, labels, markerfirst)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/legend.py", line 654, in _init_legend_box
    fontsize, handlebox))
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/legend_handler.py", line 119, in legend_artist
    fontsize, handlebox.get_transform())
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/legend_handler.py", line 252, in create_artists
    self.update_prop(p, orig_handle, legend)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/legend_handler.py", line 78, in update_prop
    legend._set_artist_props(legend_handle)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/legend.py", line 401, in _set_artist_props
    a.set_figure(self.figure)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/artist.py", line 640, in set_figure
    raise RuntimeError("Can not put single artist in "
RuntimeError: Can not put single artist in more than one figure


I need the total counts and percentages for each taxonomy level. 

Thanks,

Bea.

Sophie

Jul 24, 2016, 10:56:09 PM
to Qiime 1 Forum
Hi Bea,
Try using the -l option for multiple files as specified at the bottom of the page here: http://qiime.org/scripts/plot_taxa_summary.html
Thanks,
Sophie

beatrizgi...@gmail.com

Jul 25, 2016, 4:42:39 AM
to Qiime 1 Forum
Hi Sophie,

Same type of error:

eri@ERI97[4timesnew]  qiime > plot_taxa_summary.py -i taxa_summary/otu_table_sorted_L2.txt -l Phylum -o sample_charts/

Beatriz Gil Pulido

Aug 5, 2016, 8:38:07 AM
to qiime...@googlegroups.com
Hi Colin, 

I have another question about the beta diversity metrics. 

1. PCoA plots: after running beta_diversity_through_plots.py I only got PCoA plots in three dimensions, not in 2D, and no histograms. Also, I don't know what the percentages shown on each of the three axes refer to.

2. Again, I should use the normalized otu_table for the beta_diversity.py command, shouldn't I? However, to generate the beta diversity plots, should I use the non-normalized OTU table? When generating the rarefaction curves I did not use the normalized OTU table, but I'm not sure if I'm doing it correctly.

3. After running the different beta_diversity metrics, I got a few .txt files. But for the "phylogenetic beta diversity metrics" I expected some plots or tree image files. Is there any way to get a phylogenetic tree?

4. Is there any way to run an RDA analysis and generate graphs using QIIME?


Regards, 

Bea. 

Colin Brislawn

Aug 5, 2016, 2:30:43 PM
to Qiime 1 Forum
Hello Bea,

1. PCoA plots: after running beta_diversity_through_plots.py I only got PCoA plots in three dimensions, not in 2D, and no histograms. Also, I don't know what the percentages shown on each of the three axes refer to.
Those percentages are the percent of variation in the distance matrix explained by that principal coordinate. This lets you compare the relative importance or strength of the various principal coordinates.
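In other words, each axis percentage is that axis's eigenvalue divided by the sum of the positive eigenvalues of the decomposed distance matrix. A minimal sketch with made-up eigenvalues, not your data:

```python
# Sketch: axis percentages on a PCoA plot = each positive eigenvalue's
# share of the total positive-eigenvalue sum. Eigenvalues here are invented.

def percent_explained(eigenvalues):
    """Percent of variation explained by each positive eigenvalue."""
    positive = [e for e in eigenvalues if e > 0]
    total = sum(positive)
    return [100.0 * e / total for e in positive]

print(percent_explained([4.0, 3.0, 2.0, 1.0]))  # → [40.0, 30.0, 20.0, 10.0]
```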

2. Again, I should use the normalized otu_table for the beta_diversity.py command, shouldn't I? 
Yes. Once I have a normalized table, I use it for all downstream analysis (except for key scripts which require raw counts, like alpha rarefaction and differential abundance with deseq2). 

I'm afraid I'm not much help with 3 or 4. I do most of my downstream analysis in R with phyloseq. If you are comfortable using R, I highly recommend this package.

Beatriz Gil Pulido

Aug 6, 2016, 12:20:37 PM
to Qiime 1 Forum
Hello Colin, 

1. I'm still confused about the meaning of those results and about what the PCoA is based on. I don't understand what "percent of variation in the distance matrix" means; I thought a PCoA had to be related to some variables for the results to make sense.
Is there any way to get a tree from the distance matrix? Maybe by using the command jackknifed_beta_diversity.py?

2. Yes, I thought so. Since we last talked I've been working with a normalized otu_table; only for the rarefaction curves and for summarize_taxa_through_plots did I use the non-normalized table.

3 and 4: thanks for the link. I'm trying to get familiar with R (I have no idea about phyloseq), but I'm not comfortable using it yet, so I was wondering how to do this within QIIME rather than moving to R. But maybe it's time to move on :)

Thanks once again, 

Bea. 

Beatriz Gil Pulido

Aug 6, 2016, 12:22:16 PM
to Qiime 1 Forum
Hi Sophie,

I'm still having the same problem and don't know how to resolve it. In the meantime I'm working with the .txt files in Excel, but I would like to have some of the plots that can be generated with QIIME.

Regards, 

Bea. 

justink

Aug 7, 2016, 7:24:29 PM
to Qiime 1 Forum
Let me just chime in here with one thing:

"Is there any way to get a tree of the distance matrix?"

yes, check out upgma_cluster.py

Beatriz Gil Pulido

Aug 9, 2016, 10:04:37 AM
to Qiime 1 Forum
Hi Justink, 

Many thanks for the tip. I ran upgma_cluster.py and got a file with a .tre extension as output, but when I open the file I just see data, not a tree at all. I don't know whether I have to open it with another program or what to do, as there are no more arguments to include for that command.

 


Beatriz Gil Pulido

Aug 9, 2016, 11:21:58 AM
to qiime...@googlegroups.com
Hi, 

I have a couple more questions related to the alpha diversity metrics:

1. Do they refer to the genus level? I'm trying to use the different alpha metrics I'm working with to explain the diversity in the samples.

2. Can I get the statistical significance (p-value) of the alpha metrics I'm interested in?

Thanks, 



Beatriz Gil Pulido

Aug 9, 2016, 7:13:25 PM
to Qiime 1 Forum, sophie...@colorado.edu, jenya....@gmail.com
Hi Colin,

I want to go back to a question about the rarefaction curves and the normalized otu_table. Should I work with the normalized otu_table for the rarefaction? I'm using the non-normalized table for the taxonomy and also for the rarefaction curves, and I'm now wondering whether it might be more correct to work with the normalized one.

In the data set I'm working with, most of the samples' curves go up to the same number of seqs (around 8000), but some end around 3500 (using the non-normalized OTU table). What does that mean in terms of the quality of the sampling? Does it interfere with the statistical analysis?

Thanks, 

Jai Ram Rideout

Aug 10, 2016, 4:56:05 PM
to Qiime 1 Forum, sophie...@colorado.edu, jenya....@gmail.com
Hi Beatriz,

I ran upgma_cluster.py and got a file with a .tre extension as output, but when I open the file I just see data, not a tree at all. I don't know whether I have to open it with another program or what to do, as there are no more arguments to include for that command.

The tree file is in Newick format. There are many tree visualization programs you can use to visualize your tree (e.g. FigTree).
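For what that format looks like: a Newick file is plain text, nested parentheses with branch lengths, which is why it reads as "just data" in a text editor. A tiny sketch with a made-up tree and a hypothetical helper that lists the tips:

```python
# Sketch: a UPGMA .tre file holds a Newick string like the one below.
# Viewers such as FigTree draw it; a text editor only shows the raw text.
import re

def newick_leaves(newick):
    """Pull tip names out of a simple Newick string (no quoted labels)."""
    # A tip name directly follows '(' or ',' and precedes its ':length'.
    return re.findall(r"[(,]([^(),:;]+):", newick)

tree = "(SampleA:0.1,(SampleB:0.2,SampleC:0.2):0.05);"
print(newick_leaves(tree))  # → ['SampleA', 'SampleB', 'SampleC']
```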

1. Do they refer to the genus level? I'm trying to use the different alpha metrics I'm working with to explain the diversity in the samples.

The alpha diversity metrics are performed using whatever "features" are in your .biom file. For example, if you provide an OTU table, the alpha diversity metrics will be applied to the OTUs. If you provide a genus-level taxonomy table (e.g. from summarize_taxa.py using the --absolute_abundance option), the alpha diversity metrics will be applied to genera.
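To make this concrete with the metric the thread started from: the index only sees the feature counts in each sample, whatever those features are. A toy sketch of the Shannon index itself (QIIME 1's shannon metric uses log base 2, so values are in bits; this is an illustration, not QIIME's code):

```python
# Sketch of the Shannon index on one sample's feature counts
# (OTUs, genera, ... — whatever the .biom table holds).
from math import log

def shannon(counts):
    """Shannon diversity, log base 2 (bits)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p, 2) for p in proportions)

print(shannon([5, 5]))        # two equally abundant features → 1.0 bit
print(shannon([3, 3, 3, 1]))  # richer, fairly even sample → higher value
```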

2. Can I get the statistical significance (p-value) of the alpha metrics I'm interested in?

Check out compare_alpha_diversity.py.

Should I work with the normalized otu_table for the rarefaction? I'm using the non-normalized table for the taxonomy and also for the rarefaction curves, and I'm now wondering whether it might be more correct to work with the normalized one.

I recommend using the non-normalized table for rarefaction since rarefaction is a normalization process.
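Roughly, rarefying means drawing a fixed number of reads from each sample without replacement, so all samples end up at the same depth. A toy sketch of the idea (an illustration only, not the actual implementation behind single_rarefaction.py):

```python
# Sketch: subsample one sample's per-OTU counts to a fixed depth,
# without replacement. Seeded so the example is reproducible.
import random

def rarefy(counts, depth, seed=0):
    """Draw `depth` reads from a sample given its per-OTU counts."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in drawn:
        rarefied[otu] += 1
    return rarefied

sub = rarefy([50, 30, 20], depth=10)
print(sub, "-> total:", sum(sub))  # the subsampled counts sum to the depth
```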

In the data set I'm working with, most of the samples' curves go up to the same number of seqs (around 8000), but some end around 3500 (using the non-normalized OTU table). What does that mean in terms of the quality of the sampling? Does it interfere with the statistical analysis?

alpha_rarefaction.py intentionally truncates each group's curve at the minimal sampling depth within that group; otherwise the rarefaction curve for that group would be a flat line (which would be an incorrect result). If you don't want the curves truncated at different depths, pass -e/--max_rare_depth 3500 to truncate them all at 3500. For stats (assuming you're using compare_alpha_diversity.py), the stats are only performed at a single rarefaction depth, so you could use -d/--depth 3500 to compare across your samples.

Best,
Jai

Beatriz Gil Pulido

Aug 11, 2016, 11:55:15 AM
to Qiime 1 Forum, sophie...@colorado.edu, jenya....@gmail.com
Hi Jai, 

Thanks for the answers. I find them really useful. I will be back with further questions if there is any other "issue" with my analysis. 

Regards, 
Bea. 

Jai Ram Rideout

Aug 11, 2016, 5:46:57 PM
to Qiime 1 Forum, sophie...@colorado.edu, jenya....@gmail.com
Hi Beatriz,

Please create separate new threads for each question so that we can better help you. This will also make it easier for other users who may be having similar issues.

Best,
Jai

Beatriz Gil Pulido

Aug 11, 2016, 6:24:12 PM8/11/16