filter_$(otus/samples)_from_otu_table.py does not work, returns error: object dtype dtype('O')


Alf Pascu

Oct 26, 2017, 1:57:28 PM
to Qiime 1 Forum


For a description of the problem with highlighted code and the files needed to reproduce it, please go here; I also paste it below and attach the files to this message for completeness. I forgot to mention that the file was generated with the PICRUSt script predict_metagenomes.py:


I'm trying to use **filter_otus_from_otu_table.py** and  **filter_samples_from_otu_table.py** with no success. The three files needed to reproduce the issues are [test2git.zip](https://github.com/biocore/qiime/files/1419249/test2git.zip).

If I start trying to filter with a file containing just one observation (contained in prueba.txt) it works:

`$ filter_otus_from_otu_table.py -i otu.2test.metagenomes.biom -o otu.metagenomes.prueba.biom -e prueba.txt --negate_ids_to_exclude`

But if I want to get two observations (file prueba2.txt):

`$ filter_otus_from_otu_table.py -i otu.2test.metagenomes.biom -o otu.metagenomes.prueba.biom -e prueba2.txt --negate_ids_to_exclude`

It doesn't work, and it returns: `TypeError: Object dtype dtype('O') has no native HDF5 equivalent`

The same happens if I reuse the list with one observation (first example) but omit the option `--negate_ids_to_exclude`, so the scripts fail when multiple observations/samples have to be filtered but not with just one. The error is also reproduced if I use biom directly:

`$ biom subset-table -i otu.2test.metagenomes.biom -a observation -s prueba2.txt -o otu.2test.metagenomes.prueba.biom`
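As a rough illustration of what the two filtering modes above are expected to do (a plain-Python sketch, not the QIIME or biom implementation), the function below drops the listed IDs by default and keeps only the listed IDs when negated. The function name `filter_ids` and the observation IDs are hypothetical and chosen only for this example.

```python
def filter_ids(all_ids, listed_ids, negate=False):
    """Return the IDs that survive filtering, preserving input order.

    negate=False: drop every ID that appears in listed_ids.
    negate=True:  keep only the IDs that appear in listed_ids.
    """
    listed = set(listed_ids)
    if negate:
        return [i for i in all_ids if i in listed]
    return [i for i in all_ids if i not in listed]


# Hypothetical observation IDs, used only for illustration.
observations = ["K00001", "K00002", "K00003"]
keep_two = filter_ids(observations, ["K00001", "K00003"], negate=True)
drop_one = filter_ids(observations, ["K00001"])
print(keep_two)
print(drop_one)
```

Neither mode touches metadata, which is consistent with the error coming from writing the filtered table rather than from selecting the IDs.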

Following this issue in [biom-format (#513)](https://github.com/biocore/biom-format/issues/513), which suggests that the metadata may be the problem, I tried to convert to JSON:

`$ biom convert -i otu.2test.metagenomes.biom -o otu.2test.metagenomes.json.biom --table-type="OTU table" --to-json`

I get this error `TypeError: array([u'["cathepsin L [EC:3.4.22.15]"]'], dtype=object) is not JSON serializable`. And if I try to convert it to hdf5 with the suggested option `--collapsed-samples`:

`$ biom convert -i otu.2test.metagenomes.biom -o otu.2test.metagenomes.hdf5.biom --table-type="OTU table" --to-hdf5  --collapsed-samples`

I get `TypeError: Object dtype dtype('O') has no native HDF5 equivalent`. Please note that I checked that the fixes for this bug ([#759](https://github.com/biocore/biom-format/pull/759)) are incorporated in my code. If it helps, I found a similar issue in the project [CellProfiler (#995)](https://github.com/CellProfiler/CellProfiler/issues/995).
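One way to narrow down which metadata fields trigger this kind of error is to try serializing each value on its own. The sketch below uses only the standard library: `OpaqueArray` is a hypothetical stand-in for the object-dtype array seen in the error message, and the ID `K00001` and the `note` field are invented for illustration. With a real table, the metadata dictionaries would come from the table itself.

```python
import json


class OpaqueArray(object):
    """Hypothetical stand-in for an object-dtype array that json cannot encode."""

    def __init__(self, items):
        self.items = list(items)


def unserializable_fields(metadata_by_id):
    """Map each ID to the sorted metadata keys whose values json.dumps rejects."""
    problems = {}
    for obs_id, md in metadata_by_id.items():
        bad = []
        for key, value in md.items():
            try:
                json.dumps(value)
            except TypeError:
                bad.append(key)
        if bad:
            problems[obs_id] = sorted(bad)
    return problems


# Hypothetical ID and "note" field; the description string is the one
# quoted in the JSON conversion error.
metadata = {
    "K00001": {
        "KEGG_Description": OpaqueArray(['["cathepsin L [EC:3.4.22.15]"]']),
        "note": "plain strings serialize fine",
    },
}
report = unserializable_fields(metadata)
print(report)
```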


Alf Pascu

Oct 26, 2017, 5:53:28 PM
to Qiime 1 Forum

I've been able to perform the filtering by piecing together code that PICRUSt uses to deal with these tables. It confirms that the problem comes from the metadata:

```
import json

from biom import load_table
from picrust.util import write_biom_table, picrust_formatter

table = load_table('otu.2test.metagenomes.biom')

# Adapted from categorize_by_function.py:
# metadata are not deserializing correctly. Duct tape it.
# Each value is a one-element array holding a JSON string, so decode it.
update_d = {}
for i, md in zip(table.ids(axis='observation'),
                 table.metadata(axis='observation')):
    update_d[i] = {k: json.loads(v[0]) for k, v in md.items()}
# Apply all decoded metadata once, after the loop.
table.add_metadata(update_d, axis='observation')

# Read the observation IDs to keep, one per line, skipping blank lines.
with open("prueba2.txt", "r") as target:
    genes = [row.strip() for row in target if row.strip()]
table_red = table.filter(genes, axis='observation', inplace=False)

# Output in BIOM format, adapted from predict_metagenomes.py.
format_fs = {'KEGG_Description': picrust_formatter,
             'COG_Description': picrust_formatter,
             'KEGG_Pathways': picrust_formatter,
             'COG_Category': picrust_formatter}
write_biom_table(table_red, 'table.test.biom', format_fs=format_fs)  # HDF5
# write_biom_table(table_red, 'table.test.biom', write_hdf5=False, format_fs=format_fs)  # JSON
```
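The metadata fix in the script above can be tried in isolation, without biom or PICRUSt installed. The following is a minimal sketch under the assumption that each metadata value is a one-element sequence holding a JSON-encoded string, as the error message from the JSON conversion suggests. The helper name `decode_metadata` is hypothetical, and a plain list stands in for the array.

```python
import json


def decode_metadata(md):
    """Replace each one-element sequence holding a JSON string by the decoded value."""
    return {key: json.loads(value[0]) for key, value in md.items()}


# The raw value mirrors the one in the error message: a one-element
# sequence whose only item is a JSON-encoded list of descriptions.
raw = {"KEGG_Description": ['["cathepsin L [EC:3.4.22.15]"]']}
decoded = decode_metadata(raw)
print(decoded)

# After decoding, the metadata holds plain lists of strings,
# which the json module can serialize again.
round_trip = json.loads(json.dumps(decoded))
```

Once the values are plain Python lists, the formatters passed through `format_fs` receive data they can write, which appears to be why the script succeeds where the command-line tools fail.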

zech xu

Oct 26, 2017, 8:43:16 PM
to Qiime 1 Forum
Thanks for reporting back the solution to this problem, Alf!