Hi,
I have a problem with the filter_samples_from_otu_table.py command that may be related to the BIOM format, and which I cannot solve.
I am using MacQIIME 1.9 and am trying to filter my samples based on the number of counts (I also tried with a file listing the samples to exclude), and I get this error:
filter_samples_from_otu_table.py -i TrimmedSortedTaxaFull.otu_table.biom -o Filtered/TrimmedSortedTaxaFull.otu_table-f.biom --output_mapping_fp=Filtered/mapping_file_forEdited-f.txt -m mapping_file_forEdited.txt -n 26297
Traceback (most recent call last):
File "/macqiime/anaconda/bin/filter_samples_from_otu_table.py", line 162, in <module>
main()
File "/macqiime/anaconda/bin/filter_samples_from_otu_table.py", line 138, in main
write_biom_table(filtered_otu_table, output_fp)
File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/util.py", line 577, in write_biom_table
biom_table.to_hdf5(biom_file, generated_by, compress)
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3535, in to_hdf5
self.group_metadata(axis='observation'), 'csr', compression)
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3507, in axis_dump
formatter[category](grp, category, md, compression)
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 243, in general_formatter
compression=compression)
File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 99, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 60, in make_new_dset
raise ValueError("Shape tuple is incompatible with data")
ValueError: Shape tuple is incompatible with data
From the error message and an internet search (which did not turn up much), I guess this has to do with the format: my TrimmedSortedTaxaFull.otu_table.biom is in JSON format, and maybe this command expects an HDF5 file or tries to convert to one.
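To double-check the format guess, here is a minimal sketch that looks at the first bytes of a file. It relies only on the standard HDF5 signature at byte offset 0 and on JSON BIOM files starting with an opening brace; the function name guess_biom_format is just a hypothetical helper, not part of QIIME or biom-format, and an HDF5 file with a user block in front of the signature would be reported as "unknown".

```python
# Standard 8-byte signature found at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_biom_format(path):
    """Guess whether a BIOM file is HDF5 or JSON from its leading bytes.

    Hypothetical helper: it only inspects offset 0, so it does not parse
    or validate the table itself.
    """
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    # JSON BIOM tables are a single JSON object, so they open with "{".
    if head.lstrip().startswith(b"{"):
        return "json"
    return "unknown"
```

Calling guess_biom_format("TrimmedSortedTaxaFull.otu_table.biom") should return "json" if my assumption about the input file is right.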
The problem is that I cannot convert to hdf5 either:
biom convert -i TrimmedSortedTaxaFull.otu_table.biom -o TrimmedSortedTaxaFull.otu_table.txt --to-tsv --header-key taxonomy
#OK
biom convert -i TrimmedSortedTaxaFull.otu_table.txt -o TrimmedSortedTaxa_Fullhdf5.otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy -m mapping_file_forEdited.txt
Traceback (most recent call last):
File "/macqiime/anaconda/bin/pyqi", line 184, in <module>
optparse_main(cmd_obj, argv[1:])
File "/macqiime/anaconda/lib/python2.7/site-packages/pyqi/core/interfaces/optparse/__init__.py", line 275, in optparse_main
result = optparse_cmd(local_argv[1:])
File "/macqiime/anaconda/lib/python2.7/site-packages/pyqi/core/interface.py", line 41, in __call__
return self._output_handler(cmd_result)
File "/macqiime/anaconda/lib/python2.7/site-packages/pyqi/core/interfaces/optparse/__init__.py", line 250, in _output_handler
opt_value)
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/interfaces/optparse/output_handler.py", line 80, in write_biom_table
table.to_hdf5(f, generatedby())
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3537, in to_hdf5
self.metadata(), self.group_metadata(), 'csc', compression)
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3507, in axis_dump
formatter[category](grp, category, md, compression)
File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 238, in general_formatter
compression=compression)
File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 102, in create_dataset
self[name] = dset
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2405)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2362)
File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __setitem__
h5o.link(
obj.id,
self.id, name, lcpl=lcpl, lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2405)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2362)
File "h5py/h5o.pyx", line 202, in h5py.h5o.link (-------src-dir--------/h5py-2.4.0/h5py/h5o.c:3624)
RuntimeError: Unable to create link (Name already exists)
(Note that my mapping file passed validation with no errors)
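In case it helps with the diagnosis: both tracebacks end in biom's general_formatter while metadata is being written, so the metadata itself may be worth a look. The following is only a sketch, assuming the table is a BIOM 1.0 JSON file with "rows" and "columns" lists whose entries carry a "metadata" dictionary; summarize_metadata is a hypothetical helper, and I do not know that irregular metadata is actually the cause.

```python
import json
from collections import Counter


def summarize_metadata(biom_json, axis="rows"):
    """Count metadata keys and value shapes along one axis of a parsed
    BIOM 1.0 JSON table ("rows" = observations, "columns" = samples)."""
    key_counts = Counter()
    shape_counts = Counter()
    for entry in biom_json.get(axis, []):
        # "metadata" may be null in the JSON, which json loads as None.
        metadata = entry.get("metadata") or {}
        for key, value in metadata.items():
            key_counts[key] += 1
            # Record the list length, or None for non-list values.
            length = len(value) if isinstance(value, list) else None
            shape_counts[(key, type(value).__name__, length)] += 1
    return key_counts, shape_counts


def summarize_file(path, axis="rows"):
    """Load a JSON BIOM file from disk and summarize one axis."""
    with open(path) as handle:
        return summarize_metadata(json.load(handle), axis)
```

Running summarize_file("TrimmedSortedTaxaFull.otu_table.biom", "rows") would show whether, for example, the taxonomy entries mix strings and lists or have differing lengths across OTUs.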
Can anyone help? I don't understand why the filtering command fails with a JSON file or, if that is not the reason, what is. Also, is my workaround for converting JSON to HDF5 correct?
Finally, more generally, is it worth continuing with the HDF5 format if I want to, e.g., collapse or summarize by category? I see that there have been format-related issues with these commands, so what is the general guideline here? I also see that the JSON format will gradually be abandoned...
Thank you,
Natassa