Attempting to remove unwanted taxa from OTU table... filter_taxa_from_otu_table.py not working?

Seaver Wang

Oct 31, 2016, 10:00:38 AM
to qiime...@googlegroups.com
Hi all,

I've been attempting to remove several unwanted taxa from an OTU table using the filter_taxa_from_otu_table.py script, but running the script doesn't seem to have any effect on the resulting OTU table. This is what I'm attempting to do:

filter_taxa_from_otu_table.py -i otu_table_mc2_w_tax_no_pynast_failures.biom -o otu_table_Metazoa_Syndiniales_filtered.biom -n __Metazoa,__Syndiniales

I've also tried all sorts of other variations of syntax with the -n flag, including:

-n "D_3__Metazoa","D_4__Syndiniales"
-n Metazoa,Syndiniales
-n D_3__Metazoa,D_4__Syndiniales

and a bunch of similar variations, with no success.

Looking at a number of other threads on similar topics, I also tried the workaround of converting the biom table to tab-separated format, using grep -vE to remove my unwanted taxa, and converting the tab-separated text file back to biom format. When I do this, however, I receive the following error upon trying to work with the OTU table further:

Command returned exit status: 1
Stdout:

Stderr
Traceback (most recent call last):
  File "/usr/local/bin/summarize_taxa.py", line 261, in <module>
    main()
  File "/usr/local/bin/summarize_taxa.py", line 237, in main
    md_identifier)
  File "/usr/local/lib/python2.7/dist-packages/qiime/summarize_taxa.py", line 43, in make_summary
    md_identifier)
  File "/usr/local/lib/python2.7/dist-packages/qiime/summarize_taxa.py", line 79, in sum_counts_by_consensus
    raise ValueError("BIOM table does not contain any "
ValueError: BIOM table does not contain any observation metadata (e.g., taxonomy). You can add metadata to it using the 'biom add-metadata' command.

This is quite confusing to me, as my biom table obviously contained taxonomy prior to converting it to tab-separated text format, and I have no idea how this information was lost.

Does anyone have a solution for fixing either approach or a suggestion for how I can accomplish this another way? Many thanks!

-

Here are the other threads that I mentioned:

Jamie Morton

Oct 31, 2016, 12:10:54 PM
to Qiime 1 Forum
Hi Seaver,

At first glance, it doesn't seem like taxonomy has been assigned. How did you run your OTU picking step?

I'd also double-check your biom table to see if there are taxonomy fields present. This can be done as follows:

biom convert -i otu_table_mc2_w_tax_no_pynast_failures.biom -o taxa.biom --to-tsv

grep "__Metazoa" taxa.biom

Best,
Jamie

Seaver Wang

Oct 31, 2016, 12:55:32 PM
to Qiime 1 Forum
Hi Jamie!

Hm... interesting... checking my biom table as you suggested, grep returns nothing for Metazoans, and checking the converted biom table it does appear that taxonomy is missing.

That's strange, as I picked OTUs using pick_open_reference_otus.py without passing --suppress_taxonomy_assignment or --suppress_align_and_tree! I had thought that I would get an OTU table with taxonomy out of this step by default, but I take it that this means that I still need to assign taxonomy myself? I am using usearch v6.1.544 as my OTU picking method, with SILVA 123.1 as my reference database.

--Seaver

Daniel McDonald

Nov 1, 2016, 9:03:01 PM
to Qiime 1 Forum
Hi Seaver,

When you converted back to a BIOM table, did you explicitly tell `biom convert` to handle the observation metadata?

The conversion to a TSV is not necessary though, so I wonder if the taxa strings to be filtered are incomplete? I'd be happy to take a look at your table and tell you in more detail about what's going on -- feel free to send it directly to me at my email address.

Best,
Daniel
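
A likely explanation for the lost taxonomy: `biom convert` has to be told to carry observation metadata in both directions. The sketch below builds the two command lines for a TSV round trip as Python argument lists. The flag names (`--header-key`, `--process-obs-metadata`, `--table-type`, `--to-hdf5`) are the biom-format 2.1 options as best recalled, so check `biom convert --help` for your install; the function name and file names are placeholders, not anything from the thread.

```python
def round_trip_commands(biom_in, tsv_path, biom_out):
    """Build `biom convert` argument lists that keep taxonomy through a TSV round trip.

    Assumption: biom-format 2.1 flag names. Paths are caller-supplied placeholders.
    """
    # BIOM -> TSV: --header-key asks for the taxonomy metadata as an extra column.
    to_tsv = [
        "biom", "convert",
        "-i", biom_in,
        "-o", tsv_path,
        "--to-tsv",
        "--header-key", "taxonomy",
    ]
    # TSV -> BIOM: --process-obs-metadata tells biom to parse that column
    # back into per-observation taxonomy metadata.
    to_biom = [
        "biom", "convert",
        "-i", tsv_path,
        "-o", biom_out,
        "--to-hdf5",
        "--table-type", "OTU table",
        "--process-obs-metadata", "taxonomy",
    ]
    return to_tsv, to_biom


to_tsv, to_biom = round_trip_commands("table.biom", "table.tsv", "table_back.biom")
print(" ".join(to_tsv))
print(" ".join(to_biom))
```

Each list can be passed to `subprocess.check_call` to actually run the conversion. If HDF5 support is not available, `--to-json` is the alternative output format.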

Daniel McDonald

Nov 2, 2016, 11:47:33 AM
to Qiime 1 Forum
Hi Seaver, 

Thank you for passing on the biom table. I took a look, and it appears the taxon corresponding to Metazoa is "D_3__Metazoa (Animalia)" in the table. I do see D_4__Syndiniales, however. One way to find the exact labels is to examine the output from summarize_taxa.py, or to interrogate the table directly, although that requires some programming. Please see below for the QIIME command which worked for me. I also verified the result directly against the contents of the resulting table.

Best,
Daniel

08:41:00 (dtmcdonald@barnacle):~$ biom summarize-table -i otu_table_mc2_w_tax_no_pynast_failures.biom | head
Num samples: 20
Num observations: 8429
Total count: 2439804
Table density (fraction of non-zero values): 0.233

Counts/sample summary:
 Min: 40659.0
 Max: 258132.0
 Median: 105990.000
 Mean: 121990.200
08:41:05 (dtmcdonald@barnacle):~$ filter_taxa_from_otu_table.py -i otu_table_mc2_w_tax_no_pynast_failures.biom -o test.biom -n "D_3__Metazoa (Animalia)",D_4__Syndiniales
08:42:07 (dtmcdonald@barnacle):~$ biom summarize-table -i test.biom | head
Num samples: 20
Num observations: 5490
Total count: 1454559
Table density (fraction of non-zero values): 0.214

Counts/sample summary:
 Min: 19788.0
 Max: 243778.0
 Median: 48918.500
 Mean: 72727.950
08:43:13 (dtmcdonald@barnacle):~$ ipython
08:43:24 (dtmcdonald@barnacle):~> import biom

08:43:25 (dtmcdonald@barnacle):~> t = biom.load_table('test.biom')

08:43:32 (dtmcdonald@barnacle):~> tax_strings = ['; '.join(md['taxonomy']).lower() for md in t.metadata(axis='observation')]

08:44:09 (dtmcdonald@barnacle):~> for ts in tax_strings:
   ...:     if 'metazoa' in ts:
   ...:         print(ts)
   ...:         break
   ...:

08:44:24 (dtmcdonald@barnacle):~> for ts in tax_strings:
   ...:     if 'syndiniales' in ts:
   ...:         print(ts)
   ...:         break
   ...:

08:44:39 (dtmcdonald@barnacle):~> t_original = biom.load_table('otu_table_mc2_w_tax_no_pynast_failures.biom')

08:44:55 (dtmcdonald@barnacle):~> tax_strings = ['; '.join(md['taxonomy']).lower() for md in t_original.metadata(axis='observation')]

08:45:01 (dtmcdonald@barnacle):~> for ts in tax_strings:
   ...:     if 'metazoa' in ts:
   ...:         print(ts)
   ...:         break
   ...:
d_0__eukaryota; d_1__opisthokonta; d_2__holozoa; d_3__metazoa (animalia); d_9__copepoda; d_10__calanoida

08:45:04 (dtmcdonald@barnacle):~> for ts in tax_strings:
   ...:     if 'syndiniales' in ts:
   ...:         print(ts)
   ...:         break
   ...:
d_0__eukaryota; d_1__sar; d_2__alveolata; d_3__protalveolata; d_4__syndiniales; d_5__syndiniales group ii; d_6__uncultured eukaryote
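
The session above lowercases each lineage and stops at the first hit, which is enough to confirm presence but hides the exact label that -n needs. A minimal sketch of a helper that lists the distinct level strings containing a keyword, with their original case, might look like this. `levels_containing` is a hypothetical name, and the example metadata is an illustrative stand-in modeled on the lineages printed above, not the real table; it assumes each taxonomy entry is a list of level strings, as the `'; '.join(...)` call in the session implies.

```python
def levels_containing(observation_metadata, keyword, key="taxonomy"):
    """Return the sorted distinct taxonomy levels that contain `keyword`.

    `observation_metadata` is an iterable of per-OTU metadata dicts, shaped
    like the value returned by `t.metadata(axis='observation')`. Entries that
    are None or lack `key` are skipped. Matching ignores case, but the levels
    are returned with their original case so they can be copied as-is.
    """
    needle = keyword.lower()
    found = set()
    for md in observation_metadata or ():
        if not md or key not in md:
            continue
        for level in md[key]:
            if needle in level.lower():
                found.add(level.strip())
    return sorted(found)


# Illustrative stand-in for real observation metadata (assumed values).
example_metadata = [
    {"taxonomy": ["D_0__Eukaryota", "D_2__Holozoa", "D_3__Metazoa (Animalia)"]},
    {"taxonomy": ["D_0__Eukaryota", "D_2__Alveolata", "D_4__Syndiniales"]},
    None,
]

print(levels_containing(example_metadata, "metazoa"))
print(levels_containing(example_metadata, "syndiniales"))
```

With a table loaded as in the session, `levels_containing(t_original.metadata(axis='observation'), 'metazoa')` would be the equivalent call.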

Seaver Wang

Nov 2, 2016, 3:51:31 PM
to Qiime 1 Forum
Hi Daniel,

Ah, that indeed does the trick--passing the command as you phrased it worked. While I knew that filter_taxa_from_otu_table.py worked based on exact string matching, I guess the lesson here is that the taxon string has to be specified in full, exactly as it appears in the table.

I'm glad it was a simple fix! Thanks so much for the help!

--Seaver
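
The lesson above can be illustrated with a small sketch of whole-level exact matching. This is not QIIME's code, only a model of the behaviour described in the thread; `is_removed` is a hypothetical name, the lineage is illustrative, and any case handling the real script may do is not modeled.

```python
def is_removed(lineage, negative_taxa):
    """True if any whole level of `lineage` equals one of `negative_taxa`.

    A level only matches when the entire string is equal, so a prefix such
    as "D_3__Metazoa" does not match "D_3__Metazoa (Animalia)".
    """
    unwanted = {taxon.strip() for taxon in negative_taxa}
    return any(level.strip() in unwanted for level in lineage)


# Illustrative lineage modeled on the one shown in the thread.
lineage = ["D_0__Eukaryota", "D_2__Holozoa", "D_3__Metazoa (Animalia)"]

print(is_removed(lineage, ["__Metazoa"]))                # False: substring only
print(is_removed(lineage, ["D_3__Metazoa"]))             # False: prefix only
print(is_removed(lineage, ["D_3__Metazoa (Animalia)"]))  # True: full level
```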

Daniel McDonald

Nov 3, 2016, 12:57:09 AM
to Qiime 1 Forum
Exactly, and glad to help! Regular expression support would be nice though :) Not sure if it's slated for QIIME 2.

Best,
Daniel
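
For readers wondering what regular expression matching could look like, here is a minimal sketch using Python's standard `re` module on plain lists of taxonomy levels. It is not a QIIME feature; `drop_matching` is a hypothetical helper and the lineages are illustrative.

```python
import re


def drop_matching(lineages, pattern):
    """Keep only lineages in which no level matches the regex `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        lineage
        for lineage in lineages
        if not any(regex.search(level) for level in lineage)
    ]


# Illustrative lineages (assumed values, not taken from the real table).
lineages = [
    ["D_0__Eukaryota", "D_2__Holozoa", "D_3__Metazoa (Animalia)"],
    ["D_0__Eukaryota", "D_2__Alveolata", "D_4__Syndiniales"],
    ["D_0__Bacteria", "D_1__Proteobacteria"],
]

# One pattern covers both unwanted groups, whatever rank prefix or
# parenthetical suffix the label carries.
kept = drop_matching(lineages, r"__(Metazoa|Syndiniales)\b")
print(kept)
```

Note that a pattern this loose would also drop any level that merely starts with one of the names, such as a "Syndiniales" subgroup, which may or may not be what is wanted.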