otu_table.biom, OTUS and filters (filter_otus, filter_samples, filter_taxa)


beatrizgi...@gmail.com

Jun 12, 2016, 8:18:09 AM
to Qiime 1 Forum
Hi, 

I'm trying to filter the OTUs in my otu_table.biom, and also to run filter_taxa_from_otu_table.py, but I got a few errors, as explained below: 

1. otu_table.biom content: 

When I look inside the otu_table.biom generated with make_otu_table.py, there seems to be nothing inside. This message appears: " {"id":"None", "format": "Biological Observation Matrix 1.0.0", "format_url":"http://biom-format.org", "type":"OTU table", "generated_by":"QIIME 1.8.0", "date":"2016-05-31" ... 

I was surprised, as I had used that otu_table.biom for the next steps in QIIME, such as compute_core_microbiome.py, make_otu_network.py and the different metrics. 

The same type of message also appears when I use filter_otus_from_otu_table.py.  

Should I summarize those outputs too, to check their contents? 


2. filter_taxa_from_otu_table.py: after running the command, an error appears: raise TableException, "All observations were filtered out". 


About how I ran make_otu_table.py: 

I used pick_open_reference_otus.py, but since I suppressed the taxonomy assignment, I generated my own otu_table.biom so that it includes the taxonomy observation metadata (otu_table_mc2.biom contains no taxonomy references). I used the following command: 

make_otu_table.py -i final_otu_map_mc2.txt -t rep_set_tax_assignments.txt -e rep_set_failures.fasta -o 

-i was generated by pick_open_reference_otus.py, as was rep_set_failures.fasta. 

-t was generated by running assign_taxonomy.py myself, using the rep_set.fna generated by the OTU picking command as input. 

I've been working with the output plots from summarize_taxa_through_plots.py (where I used the otu_table.biom) without any problem, and I can perfectly see the information from my otu_table.biom in otu_table_summary.txt as well as in otu_table_qual_summary.txt. 

So I don't know what happens when I run the commands filter_otus_from_otu_table.py and filter_taxa_from_otu_table.py. 

Can anyone help me with this question? 

Many thanks. 

beatrizgi...@gmail.com

Jun 12, 2016, 8:37:12 AM
to qiime...@googlegroups.com
I attach the summary of the generated OTU table in case it helps. 

Regards, 


otu_table_summary.txt

abir...@gmail.com

Jun 13, 2016, 2:25:12 AM
to Qiime 1 Forum
Hi, 
Could you please post the OTU table itself that you are using as input to the filter scripts, as well as the command lines that you ran to invoke these scripts?
Thanks,
Amanda

beatrizgi...@gmail.com

Jun 13, 2016, 5:05:27 PM
to qiime...@googlegroups.com
Hi Amanda, 

Here is the otu_table.biom I'm using as input. 

I've realized that when I open the otu_table.biom in WordPad there is text inside, but the "error message"  I mentioned before still appears at the top, as you can probably see by opening the document. 

The filter commands I used were:

1. filter_taxa_from_otu_table.py -i otu_table.biom -p -o otu_table_filtered.biom

2. filter_otus_from_otu_table.py -i otu_table.biom -s 100 -o otu_table_filtered100.biom


otu_table.biom

abir...@gmail.com

Jun 14, 2016, 1:38:43 PM
to Qiime 1 Forum
Hi,
>the "error message"  I mentioned before still appears at the top
This is not an error message--it is in fact the contents of the file.  Your OTU table is apparently in the JSON version of the biom format, which is a text-based file format, and this is the beginning of its contents.

>1. filter_taxa_from_otu_table.py -i otu_table.biom -p -o otu_table_filtered.biom
This command will give an error because you have not specified a value for the -p switch; it is necessary to provide a comma-separated list of taxa to retain (e.g., p__Bacteroidetes,p__Firmicutes).  See http://qiime.org/scripts/filter_taxa_from_otu_table.html for further examples.

>2. filter_otus_from_otu_table.py -i otu_table.biom -s 100 -o otu_table_filtered100.biom
This command works fine; the "error" reported indicates that filtering resulted in an empty biom table. That is the correct outcome here: the -s switch (--min_samples) sets the minimum number of samples an OTU must be observed in to be retained, and since your table has only four samples, no OTU can be observed in 100 of them.  Did you perhaps have some other filtering goal in mind, like filtering out samples with <100 counts?

Best,
Amanda

beatrizgi...@gmail.com

Jun 15, 2016, 8:12:08 AM
to Qiime 1 Forum
Hi Amanda, 
Oh good, so it is just about the format. Great. What I would like to do is check how many OTUs I have in each of the four samples (I guess I can get that information using: biom summarize_table --qualitative).

Then I would like to look at the specific sequences for some of the taxa. I mean, I want to find the OTUs assigned to a specific family so that I can BLAST their sequences. 

I would also like to check, out of the overall number of OTUs in one of the samples, how many are assigned to a specific genus. 

Sorry if my questions are not clear enough; I'm still trying to get all of this clear in my own mind too. 

1. Regarding the taxa filter: I did specify the -p value (actually I used the same one you mention in the example in brackets), but I didn't get anything. 

2. Oh, ok, maybe I have to use another -s value, I will try. 

3. I have another question: how can I convert the otu_table.biom into Excel format? 

Many thanks, 

Bea. 

beatrizgi...@gmail.com

Jun 20, 2016, 12:43:23 PM
to Qiime 1 Forum
Hi Amanda, 

Thanks for the answer. 

There are a few things that I have in mind to do: 

1. After running summarize_taxa_through_plots.py I got the plots for the different taxonomy levels. I would like to redo them after filtering out relative abundances below 1%, as I don't think they provide useful information. I don't know how I have to use filter_otus_from_otu_table.py for this. I guess I have to generate another otu_table.biom to use in summarize_taxa_through_plots.py, don't I? 

filter_otus_from_otu_table.py -i otu_table.biom --min_count 0.01 -o filtered_otu_table.biom

2. I also want to check the sequences related to one specific family. So I guess I have to identify the OTU IDs of those particular sequences to look for them in the OTU table? I'm not really sure how to proceed. 

3. To convert the OTU table into Excel format: biom convert -i otu_table.biom -o otu_table.biom  Is that correct?
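
For point 2, a minimal sketch of one way to collect the OTU IDs assigned to a family and pull their representative sequences for BLAST. It assumes the assignments file is tab-separated with the OTU ID in the first column and the taxonomy string in the second, and that each FASTA header starts with the OTU ID. The function names, the family name and the sample data are hypothetical and only illustrate the idea:

```python
def otu_ids_for_taxon(assignment_lines, taxon):
    """Return OTU IDs whose taxonomy string contains `taxon` (case-sensitive)."""
    ids = []
    for line in assignment_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and taxon in fields[1]:
            ids.append(fields[0])
    return ids


def sequences_for_ids(fasta_lines, wanted_ids):
    """Return {otu_id: sequence} for FASTA records whose header starts with a wanted ID."""
    wanted = set(wanted_ids)
    seqs = {}
    current = None  # ID of the record being read, or None if it is not wanted
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            otu_id = parts[0] if parts else ""
            current = otu_id if otu_id in wanted else None
            if current is not None:
                seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


# Made-up data for illustration only (assumed file layouts, not real output).
assignments = [
    "OTU_1\tD_0__Bacteria;D_1__Bacteroidetes;D_4__Flavobacteriaceae\t0.98",
    "OTU_2\tD_0__Bacteria;D_1__Firmicutes\t0.91",
]
fasta = [">OTU_1 sampleA_12", "ACGTAC", "GGTA", ">OTU_2 sampleB_3", "TTGACA"]

family_ids = otu_ids_for_taxon(assignments, "Flavobacteriaceae")
family_seqs = sequences_for_ids(fasta, family_ids)
print(family_ids, family_seqs)
```

With real files, the two line lists would come from reading rep_set_tax_assignments.txt and rep_set.fna.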

Regards, 

Bea. 


abir...@gmail.com

Jun 20, 2016, 7:36:51 PM
to Qiime 1 Forum
Hi,
Sorry for the delay in replying; I have been unavailable for a while.

1) I think your suggested command for filter_otus_from_otu_table.py is *almost* right.  However, the --min_count switch expects a whole number (i.e., a number of counts).  I think you will want to use --min_count_fraction: "Fraction of the total observation (sequence) count to apply as the minimum total observation count of an otu for that otu to be retained. this is a fraction, not percent, so if you want to filter to 1%, you specify 0.01." (see http://qiime.org/scripts/filter_otus_from_otu_table.html for more details).
2) filter_taxa_from_otu_table.py is the script you want here.  The reason it did not work for you when you ran it with a -p value of p__Bacteroidetes,p__Firmicutes is that these strings don't actually occur in the taxonomies in your table.  I don't know what the source of this biom table is, but I see it uses (for example) D_1__Bacteroidetes instead of p__Bacteroidetes and D_1__Firmicutes instead of p__Firmicutes.  I suggest you open up the biom table in a text editor and search for the un-prefixed taxonomy name you want to filter on (e.g., Firmicutes rather than p__Firmicutes, etc) ... once you find out what exact format that name has in your biom file, you can filter by that.  For example, running the following command on your biom file works fine:

filter_taxa_from_otu_table.py -i otu_table.biom -p D_1__Bacteroidetes -o otu_table_filtered.biom

Note that the output, filtered biom table is going to be in the HDF5 biom format (assuming you aren't using a very old version of QIIME :) so you will no longer be able to look at it in a text editor.

3) There is no explicit excel format for biom, but there *is* a tab-delimited text file, which can be opened easily in Excel.  For that, you would want to use a command like the following:
biom convert -i otu_table.biom -o table.from_biom_w_taxonomy.txt --to-tsv --header-key taxonomy
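
The text-editor search described in point 2 can also be scripted. A minimal sketch, assuming a JSON-format (BIOM 1.0) table whose rows carry a `taxonomy` metadata entry; the function name and the tiny example table are hypothetical:

```python
import json


def find_taxonomy_labels(biom_json_text, name):
    """Return the sorted taxonomy labels in a JSON biom table that contain `name`."""
    table = json.loads(biom_json_text)
    labels = set()
    for row in table.get("rows", []):
        metadata = row.get("metadata") or {}
        taxonomy = metadata.get("taxonomy") or []
        if isinstance(taxonomy, str):
            # Some tables store the lineage as one ';'-separated string.
            taxonomy = taxonomy.split(";")
        for label in taxonomy:
            label = label.strip()
            if name in label:
                labels.add(label)
    return sorted(labels)


# Made-up table fragment for illustration only.
example = json.dumps({
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["D_0__Bacteria", "D_1__Firmicutes"]}},
        {"id": "OTU_2", "metadata": {"taxonomy": ["D_0__Bacteria", "D_1__Bacteroidetes"]}},
        {"id": "OTU_3", "metadata": None},
    ]
})
print(find_taxonomy_labels(example, "Firmicutes"))
```

The labels it returns are the exact strings to pass to -p. This only works on the JSON form of the table, not on an HDF5 one.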


Best,
Amanda

beatrizgi...@gmail.com

Jun 21, 2016, 7:40:15 AM
to qiime...@googlegroups.com
Hi Amanda, 

No worries at all. 

I will try your suggestions for the filters. Just another question about that: is there any way to filter the OTUs at a given taxonomy level? For example, I want to filter out of my table the families present at less than 1%. Can I do that specifically for this taxonomy level, or do I have to filter all the levels?

I have no idea why my otu_table.biom uses that taxonomy format, I mean, those prefixes. I generated it using final_otu_map_mc2.txt from the open-reference OTU picking as input, with -t rep_set_tax_assignments.txt. The reference database used is SILVA 119. 

Anyway, since there isn't any problem with those prefixes, it is fine. I just have to find the exact ones I want to look for and use, for example, D_1__Bacteroidetes. 
How can I know the type of OTU table do I have? 

Regards. 
Bea. 

 

abir...@gmail.com

Jun 22, 2016, 1:36:35 PM
to Qiime 1 Forum
Hi, 
Can you clarify for me what you mean by "How can I know the type of OTU table do I have?"  Are you referring to the OTU table file format (tab-delimited text, JSON-type biom format, HDF5-type biom format) or to some other aspect of the table?
Thanks,
Amanda

beatrizgi...@gmail.com

Jun 23, 2016, 7:23:21 AM
to Qiime 1 Forum
Hi Amanda, 

Exactly, what I meant is which format I have :D 

Thanks 

beatrizgi...@gmail.com

Jun 29, 2016, 1:08:39 PM
to qiime...@googlegroups.com
Hi Amanda, 

Once I filtered my OTU table using --min_count_fraction 0.01 (to remove the OTUs making up less than 1% of the counts), I got a filtered_otu_table to work with. I then generated new taxonomy plots, expecting to get the same percentages as before, just without the taxa under 1%. 

An example: 

1. Without filtering the OTU table: Bacteroidetes: total percent = 29.2%
2. Filtering the OTU table (--min_count_fraction 0.01): Bacteroidetes: total percent = 17.3%

The same happens at the other taxonomic levels: the total percentages are completely different. What I wanted to do was exclude the taxa under 1% while keeping the same total percentages for the others. Maybe this cannot be done with QIIME and the option is to work in Excel? 

I don't know if I'm explaining myself well, but what I want to do is just filter out the taxa that make up less than 1% of the total. Do I have to use one of the taxa filters? 


Thanks. 

abir...@gmail.com

Jul 5, 2016, 2:29:28 PM
to Qiime 1 Forum
Hi, 
Re your earlier question of how to find out what format of OTU table you have, I think the best way is to try to open it in a text editor.  If the top of the file looks basically like a simple text file, as shown here:

# Constructed from biom file
#OTU ID T0 T1 T8 T4
EF405987.1.1475 6.0 1.0 0.0 0.0

then it probably is tab-delimited.  If the file is readable in a text editor but starts with a curly brace, like this:

{"id": "None","format": "Biological Observation Matrix 1.0.0","format_url": "http://biom-format.org","type": "OTU table","generated_by": "QIIME 1.8.0","date": 

then it is probably the old JSON-based biom format.  If opening it in a text editor gives gobbledy-gook like:

âHDF
ˇˇˇˇˇˇˇˇÕÒ ˇˇˇˇˇˇˇˇ` à® ò TREE ˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇÿ HEAPX » observationsample 8 à®

then it is probably the current HDF5-based biom format.
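
The same check can be done without opening the file by hand. A minimal sketch, assuming the HDF5 signature sits at the very start of the file (the usual case) and that anything that is neither HDF5 nor JSON is plain text; the function name and return labels are hypothetical:

```python
# Fixed 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_otu_table_format(path):
    """Guess the OTU table format from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "biom-hdf5"
    if head.lstrip().startswith(b"{"):
        return "biom-json"
    # Assumption: everything else is treated as tab-delimited text.
    return "tab-delimited text"


# Example call (the path is illustrative):
# print(guess_otu_table_format("otu_table.biom"))
```

This only looks at the first few bytes, so it is a guess rather than a validation of the file.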


Next, regarding your question about the results of your filtering: you gave an example looking at the amount of Bacteroidetes.  Since Bacteroidetes is a phylum, and by default OTUs are assigned to the approximately-species level (97% sequence identity), the outcome you report doesn't surprise me--there may be multiple species within the Bacteroidetes phylum that make up fewer than 1% of the counts in the OTU table, and they are being discarded as expected.  Can you explain further why this result seems incorrect to you?
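
To illustrate with made-up numbers (not taken from the attached table): a fraction-based OTU filter compares each OTU's total count against a fraction of all counts in the table, so a phylum spread across many small OTUs loses much of its share once they are removed. A minimal sketch of that arithmetic; the function names are hypothetical and this is not QIIME's own code:

```python
def filter_by_min_count_fraction(otu_counts, fraction):
    """Keep OTUs whose total count is at least `fraction` of all counts."""
    threshold = fraction * sum(otu_counts.values())
    return {otu: n for otu, n in otu_counts.items() if n >= threshold}


def phylum_share(otu_counts, otu_phylum, phylum):
    """Fraction of the table's counts that belong to `phylum`."""
    total = sum(otu_counts.values())
    in_phylum = sum(n for otu, n in otu_counts.items() if otu_phylum[otu] == phylum)
    return in_phylum / total


# Made-up table: one large Bacteroidetes OTU plus 30 small ones, two Firmicutes OTUs.
counts = {"B_big": 150, "F_1": 400, "F_2": 300}
phyla = {"B_big": "Bacteroidetes", "F_1": "Firmicutes", "F_2": "Firmicutes"}
for i in range(30):
    counts["B_small_%d" % i] = 5
    phyla["B_small_%d" % i] = "Bacteroidetes"

before = phylum_share(counts, phyla, "Bacteroidetes")
filtered = filter_by_min_count_fraction(counts, 0.01)
after = phylum_share(filtered, phyla, "Bacteroidetes")
print(round(before * 100, 1), round(after * 100, 1))
```

Each small OTU is below 1% of the table on its own, so all 30 are dropped even though together they account for half of the phylum.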

Best,
Amanda

beatrizgi...@gmail.com

Jul 12, 2016, 8:33:11 AM
to qiime...@googlegroups.com
Hi Amanda,

1. About the results of filtering: 

What I meant is that after using the OTU filter to exclude those with less than 1% representation, the relative abundances are quite different, as I show in the attached file.

My idea was to filter out the taxa under 1%, but when the filter is applied, everything under 1% of the overall total is discarded. For example, for the family "Cryomorphaceae" the total is 0.7%, so if the filter is used, that family is removed. But looking at the percentage in each sample: 

T0: 0.0% | T1: 0.0% | T4: 0.2% | T8: 2.7% 

As the example shows, this family might be representative for T8, so I would keep it, but not the other ones. 

I know I can do that using Excel, but I'm wondering if there is any way to do this per-sample filtering from QIIME. 

I hope the explanation is a bit clearer now. 
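
The rule described above (keep a family if it reaches 1% in at least one sample, rather than 1% of the overall total) can be written as a small check outside QIIME. A minimal sketch using the per-sample percentages from the example; the function name and the second, made-up row are hypothetical:

```python
def keep_if_abundant_in_any_sample(percent_by_sample, min_percent=1.0):
    """Return True when the taxon reaches `min_percent` in at least one sample."""
    return any(pct >= min_percent for pct in percent_by_sample.values())


# Per-sample percentages of the family from the example above.
example_family = {"T0": 0.0, "T1": 0.0, "T4": 0.2, "T8": 2.7}
# Made-up taxon that is rare in every sample.
rare_everywhere = {"T0": 0.3, "T1": 0.1, "T4": 0.2, "T8": 0.4}

print(keep_if_abundant_in_any_sample(example_family))
print(keep_if_abundant_in_any_sample(rare_everywhere))
```

Applied to a summarized family-level table, this keeps the first family and drops the second.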


2. About the OTU table conversion

I converted my OTU table into an Excel document with the idea of checking the total count in each sample for the different OTU IDs. As I would like to relate those OTU IDs to a taxonomy, I used rep_set_tax_assignments. What I realized is that the two tables don't match each other: the converted OTU table has fewer entries than rep_set_tax_assignments. I attached a file named " OTU convert+taxonomy" to show what I am trying to explain. 
It seems that I have more rep_set_tax_assignments entries than OTU IDs in the OTU table. 
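
One way to see exactly which IDs differ is to compare the two ID sets directly. A minimal sketch, assuming both files are tab-delimited with the OTU ID in the first column and that header lines of the converted table start with `#`; the function names and the sample lines are hypothetical:

```python
def ids_from_tsv_table(lines):
    """OTU IDs from a tab-delimited OTU table, skipping '#' header lines."""
    return {line.split("\t")[0].strip() for line in lines
            if line.strip() and not line.startswith("#")}


def ids_from_assignments(lines):
    """OTU IDs from a taxonomy assignments file (ID in the first column)."""
    return {line.split("\t")[0].strip() for line in lines if line.strip()}


# Made-up lines for illustration only.
table_lines = [
    "# Constructed from biom file",
    "#OTU ID\tT0\tT1\tT8\tT4\ttaxonomy",
    "OTU_1\t6.0\t1.0\t0.0\t0.0\tD_0__Bacteria; D_1__Firmicutes",
]
assignment_lines = [
    "OTU_1\tD_0__Bacteria;D_1__Firmicutes\t0.95",
    "OTU_2\tD_0__Bacteria;D_1__Bacteroidetes\t0.88",
]

only_in_assignments = ids_from_assignments(assignment_lines) - ids_from_tsv_table(table_lines)
print(sorted(only_in_assignments))
```

One possible explanation, which is an assumption and not confirmed in this thread: the table was built with -e rep_set_failures.fasta, which excludes those sequence IDs from the OTU table, while the taxonomy assignments were made on the full rep_set.fna.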

Regards, 

Beatriz. 

