filter_samples_from_otu_table.py error and JSON to hdf5 conversion

natass...@googlemail.com

Jul 22, 2015, 1:54:31 PM
to Qiime Forum

Hi,
I have a problem with the filter_samples_from_otu_table.py command that may be related to the BIOM format, and which I cannot solve.
I am using MacQIIME 1.9 and am trying to filter my samples based on the number of counts (I also tried with a file containing the samples to exclude), and I get this error:

filter_samples_from_otu_table.py -i TrimmedSortedTaxaFull.otu_table.biom -o Filtered/TrimmedSortedTaxaFull.otu_table-f.biom --output_mapping_fp=Filtered/mapping_file_forEdited-f.txt -m mapping_file_forEdited.txt -n 26297
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/filter_samples_from_otu_table.py", line 162, in <module>
    main()
  File "/macqiime/anaconda/bin/filter_samples_from_otu_table.py", line 138, in main
    write_biom_table(filtered_otu_table, output_fp)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/util.py", line 577, in write_biom_table
    biom_table.to_hdf5(biom_file, generated_by, compress)
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3535, in to_hdf5
    self.group_metadata(axis='observation'), 'csr', compression)
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3507, in axis_dump
    formatter[category](grp, category, md, compression)
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 243, in general_formatter
    compression=compression)
  File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 99, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 60, in make_new_dset
    raise ValueError("Shape tuple is incompatible with data")
ValueError: Shape tuple is incompatible with data

From the error message and a search on the internet (where I could not find much), I guess this has to do with the format: my TrimmedSortedTaxaFull.otu_table.biom is in JSON format, and maybe this command somehow tries to convert to, or expects, an HDF5 one.
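For what it's worth, one quick way to check which internal format a .biom file actually uses is to look at its first few bytes: HDF5 files always begin with the magic bytes \x89HDF, while BIOM 1.0 tables are plain JSON text. A small sketch (the biom_format helper below is hypothetical, not part of QIIME or biom-format):

```shell
# Hypothetical helper, not part of biom-format: report whether a
# .biom file is HDF5 (BIOM 2.x) or JSON (BIOM 1.0) by inspecting
# its first bytes. HDF5 files start with the magic bytes \x89HDF.
biom_format() {
    if head -c 4 "$1" | grep -q 'HDF'; then
        echo "hdf5"
    else
        echo "json"
    fi
}
# Usage: biom_format TrimmedSortedTaxaFull.otu_table.biom
```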
The problem is that I cannot convert to HDF5 either:
biom convert -i TrimmedSortedTaxaFull.otu_table.biom -o TrimmedSortedTaxaFull.otu_table.txt --to-tsv --header-key taxonomy
#OK
biom convert -i TrimmedSortedTaxaFull.otu_table.txt -o TrimmedSortedTaxa_Fullhdf5.otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy -m mapping_file_forEdited.txt
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/pyqi", line 184, in <module>
    optparse_main(cmd_obj, argv[1:])
  File "/macqiime/anaconda/lib/python2.7/site-packages/pyqi/core/interfaces/optparse/__init__.py", line 275, in optparse_main
    result = optparse_cmd(local_argv[1:])
  File "/macqiime/anaconda/lib/python2.7/site-packages/pyqi/core/interface.py", line 41, in __call__
    return self._output_handler(cmd_result)
  File "/macqiime/anaconda/lib/python2.7/site-packages/pyqi/core/interfaces/optparse/__init__.py", line 250, in _output_handler
    opt_value)
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/interfaces/optparse/output_handler.py", line 80, in write_biom_table
    table.to_hdf5(f, generatedby())
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3537, in to_hdf5
    self.metadata(), self.group_metadata(), 'csc', compression)
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 3507, in axis_dump
    formatter[category](grp, category, md, compression)
  File "/macqiime/anaconda/lib/python2.7/site-packages/biom/table.py", line 238, in general_formatter
    compression=compression)
  File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 102, in create_dataset
    self[name] = dset
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2405)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2362)
  File "/macqiime/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2405)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py-2.4.0/h5py/_objects.c:2362)
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link (-------src-dir--------/h5py-2.4.0/h5py/h5o.c:3624)
RuntimeError: Unable to create link (Name already exists)

(Note that my mapping file passed validation with no errors)

Can anyone help? I don't understand why the filtering command fails with a JSON file, or, if that is not the reason, what is. Also, is my workaround for converting JSON to HDF5 correct?
Finally, more generally, is there any point in my continuing with the HDF5 format if I want to e.g. collapse or summarize by category? I see that there have been issues with these commands related to the format, so what is the general guideline here? I also see that the JSON format will be gradually abandoned...

Thank you,
Natassa

Colin Brislawn

Jul 22, 2015, 6:32:39 PM
to Qiime Forum, natass...@googlemail.com
Hello Natassa,

All the scripts that accept .biom files should automatically detect their format and work with them regardless of internal format. Maybe that's not happening...
Could a qiime dev take a look at this script and comment on what may be going wrong?


For the time being, I think your workaround of converting everything to JSON is perfect. Currently I do all my analysis using JSON .biom files because some of my external software likes that format.

There should be no downside to using JSON .biom files for the time being. HDF5 provides smaller file sizes for very large data sets, but for now, try running it all through JSON.


I hope that helps,
Colin

natass...@googlemail.com

Jul 24, 2015, 11:31:23 AM
to Qiime Forum, cbr...@gmail.com

Thanks Colin,
The update is that I managed to get the workaround to work, i.e. I converted the JSON to txt and from txt to HDF5, and on that file the filtering command runs with no errors.
However, the fact that the commands threw no errors is not reassuring here, because the HDF5 file created is not human-readable. I checked what it should look like and that is not the case, but I am not sure whether this is an encoding issue. So I am not sure whether I won't get stuck on the next steps of the process (alpha and beta diversity analyses)...
If a developer could look into the reasons why the filtering command did not work on the initial JSON format, that would be great! I can send the files needed so that they can reproduce the error.
Thanks,
Natassa

Colin Brislawn

Jul 24, 2015, 12:23:15 PM
to natass...@googlemail.com, Qiime Forum
Qiime has an active (and very cool) team of developers on github. Check it out!

You could totally start an issue describing this problem so the developers are aware of it. 

Colin

SAMIK

Nov 1, 2015, 1:08:09 PM
to Qiime Forum
Hi Natassa,

Thank you for taking up the issue. I think there is a bug in version 1.9.0, since I had the same error as you. But when I worked in QIIME 1.8.0, no error appeared. So it needs to be fixed in QIIME 1.9.0. Can anybody say whether it has been solved?

Thank you

Jai Ram Rideout

Nov 2, 2015, 2:00:47 PM
to qiime...@googlegroups.com
Hello,

Can you please provide a set of input files that reproduces the error, along with the command you're running and your QIIME version?

Thanks,
Jai

--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

SAMIK BAGCHI

Nov 2, 2015, 3:01:06 PM
to qiime...@googlegroups.com
Hi Jai,

Thank you. Please find the input files and the error from QIIME 1.9.0. Please note that when I run in QIIME 1.8.0, no error occurs, so I believe the problem is with the current version. I have also provided the qiime config for both releases.
filter_otus_from_otu_table.py -i otu_per_category/Combined_otu_Early.biom -n 1 -o otu_per_category/Early_otus.biom
Traceback (most recent call last):
  File "/tools/cluster/6.2/qiime/1.9.0/bin/filter_otus_from_otu_table.py", line 5, in <module>
    pkg_resources.run_script('qiime==1.9.0', 'filter_otus_from_otu_table.py')
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/pkg_resources.py", line 534, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/pkg_resources.py", line 1434, in run_script
    execfile(script_filename, namespace, namespace)
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/EGG-INFO/scripts/filter_otus_from_otu_table.py", line 137, in <module>
    main()
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/EGG-INFO/scripts/filter_otus_from_otu_table.py", line 134, in main
    write_biom_table(filtered_otu_table, opts.output_fp)
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/qiime/util.py", line 555, in write_biom_table
    biom_table.to_hdf5(biom_file, generated_by, compress)
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/biom_format-2.1.3-py2.7-linux-x86_64.egg/biom/table.py", line 3486, in to_hdf5
    self.metadata(), self.group_metadata(), 'csc', compression)
    formatter[category](grp, category, md, compression)
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 81, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py/h5t.pyx", line 1448, in h5py.h5t.py_create (/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/build/h5py/h5py/h5t.c:15500)
  File "h5py/h5t.pyx", line 1468, in h5py.h5t.py_create (/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/build/h5py/h5py/h5t.c:15331)
  File "h5py/h5t.pyx", line 1523, in h5py.h5t.py_create (/panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/build/h5py/h5py/h5t.c:15249)
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
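
For reference, this final TypeError is the kind of error numpy and h5py produce when metadata values of mixed types are involved: numpy falls back to the generic object dtype, which h5py cannot map to any native HDF5 type. A minimal sketch of that failure mode (illustrative only; mixed_metadata is a made-up example, not biom-format's actual data):

```python
import numpy as np

# Illustration only (not biom-format's actual code): metadata values
# of mixed types force numpy into the generic object dtype, and h5py
# has no native HDF5 equivalent for dtype('O').
mixed_metadata = np.array(["sampleA", None, 3.0])
print(mixed_metadata.dtype)  # object
```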


(qiime-1.9.0)print_qiime_config.py

System information
==================
         Platform:      linux2
   Python version:      2.7.9 (default, Dec 15 2014, 22:30:12)  [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)]
Python executable:      /panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/bin/python

QIIME default reference information
===================================
For details on what files are used as QIIME's default references, see here:
 https://github.com/biocore/qiime-default-reference/releases/tag/0.1.1

Dependency versions
===================
          QIIME library version:        1.9.0
           QIIME script version:        1.9.0
qiime-default-reference version:        0.1.1
                  NumPy version:        1.9.1
                  SciPy version:        0.15.1
                 pandas version:        0.15.2
             matplotlib version:        1.4.2
            biom-format version:        2.1.3
                   h5py version:        2.4.0 (HDF5 version: 1.8.5)
                   qcli version:        0.1.1
                   pyqi version:        0.3.2
             scikit-bio version:        0.2.2
                 PyNAST version:        1.2.2
                Emperor version:        0.9.51
                burrito version:        0.9.0
       burrito-fillings version:        Installed.
              sortmerna version:        SortMeRNA version 2.0, 29/11/2014
              sumaclust version:        SUMACLUST Version 1.0.00
                  swarm version:        Swarm 1.2.19 [Feb  8 2015 15:53:44]
                          gdata:        Installed.

QIIME config values
===================
For definitions of these settings and to learn how to configure QIIME, see here:
 http://qiime.org/install/qiime_config.html
 http://qiime.org/tutorials/parallel_qiime.html

                     blastmat_dir:      None
      pick_otus_reference_seqs_fp:      /panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime_default_reference-0.1.1-py2.7.egg/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                         sc_queue:      all.q
      topiaryexplorer_project_dir:      None
     pynast_template_alignment_fp:      /panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime_default_reference-0.1.1-py2.7.egg/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.fasta
                  cluster_jobs_fp:      start_parallel_jobs_torque.py
pynast_template_alignment_blastdb:      None
assign_taxonomy_reference_seqs_fp:      /panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime_default_reference-0.1.1-py2.7.egg/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                     torque_queue:      default
                    jobs_to_start:      1
            denoiser_min_per_core:      50
assign_taxonomy_id_to_taxonomy_fp:      /panfs/pfs.acf.ku.edu/cluster/6.2/qiime/1.9.0/lib/python2.7/site-packages/qiime_default_reference-0.1.1-py2.7.egg/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
                         temp_dir:      /work/samikbag/qiime_temp
                     slurm_memory:      None
                      slurm_queue:      None
                      blastall_fp:      blastall
                 seconds_to_sleep:      1

(1.8.0)print_qiime_config.py

System information
==================
         Platform:      linux2
   Python version:      2.7.3 (default, Oct 12 2012, 10:01:01)  [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)]
Python executable:      /tools/cluster/6.2/qiime/1.8.0/bin/python

Dependency versions
===================
                     PyCogent version:  1.5.3
                        NumPy version:  1.7.1
                        SciPy version:  0.13.3
                   matplotlib version:  1.3.1
                  biom-format version:  1.3.1
                         qcli version:  0.1.0
                         pyqi version:  0.3.1
                         bipy version:  0.0.0-dev
                QIIME library version:  1.8.0-dev
                 QIIME script version:  1.8.0-dev
        PyNAST version (if installed):  1.2.2
                      Emperor version:  0.9.3
RDP Classifier version (if installed):  Not installed.
          Java version (if installed):  1.7.0_09-icedtea

QIIME config values
===================
                     blastmat_dir:      None
   template_alignment_lanemask_fp:      None
                    jobs_to_start:      1
pynast_template_alignment_blastdb:      None
                         sc_queue:      all.q
            denoiser_min_per_core:      50
      topiaryexplorer_project_dir:      None
     pynast_template_alignment_fp:      None
assign_taxonomy_id_to_taxonomy_fp:      None
                         temp_dir:      /work/samikbag/qiime_temp
assign_taxonomy_reference_seqs_fp:      None
                     torque_queue:      friendlyq
                      blastall_fp:      blastall
                 seconds_to_sleep:      2
                  cluster_jobs_fp:      None




Thank you
Samik




--

Kind Regards

 
Samik Bagchi, PhD
Center for Metagenomic Microbial Community Analysis 
University of Kansas
Lawrence, KS 66045
Combined_otu_Early.biom

Jai Ram Rideout

Nov 3, 2015, 11:07:36 AM
to qiime...@googlegroups.com
Hi Samik,

Thanks for your input file and these details! I can reproduce the issue in QIIME 1.9.1 as well. I think it's a bug in the BIOM format software used by QIIME; I filed a bug report here with details. It looks like your input BIOM table (JSON based) has sample metadata that can't be written in the new BIOM file format (HDF5 based).

You could try uninstalling h5py to force QIIME to fall back to the older BIOM file format (JSON based). I tested this on my end and it worked. If you're unsure how to uninstall h5py, let me know how you installed QIIME and I can help you uninstall h5py.

Another workaround is removing sample metadata from the input file but I don't know of an easy way to do that.

Let me know how it goes,
Jai

SAMIK BAGCHI

Nov 3, 2015, 5:00:11 PM
to qiime...@googlegroups.com
Hi Jai, 

Thank you for taking up the issue on GitHub. As I mentioned earlier, the script runs fine in QIIME 1.8.0, and I'm fine with that at the moment. I raised the issue for you to notice, and I now understand the problem with the different versions of the BIOM format. We have both 1.9.0 and 1.8.0 installed on the core cluster, which is shared by many users, so if you can guide me through the procedure to uninstall h5py, that would be great.

Thank you
Samik
 

Jai Ram Rideout

Nov 4, 2015, 12:02:33 PM
to qiime...@googlegroups.com
Hi Samik,

Glad you have a workaround for now (using QIIME 1.8.0). To guide you through uninstalling h5py I need to know how QIIME was installed on your cluster. For example, was it installed with pip ("pip install qiime"), conda, or some other way? Note that you'll probably need your cluster administrator to uninstall h5py for you unless you have superuser privileges. I recommend contacting your cluster admin to see if they can uninstall h5py for you.

Jai

Jai Ram Rideout

Nov 5, 2015, 10:43:43 AM
to qiime...@googlegroups.com
Hi Samik,

I heard back from the biom-format devs and here's another workaround you can use if you do not want to uninstall h5py:

1. Use "biom convert" with --collapsed-samples to convert your BIOM table into a format compatible with the new file format:

biom convert -i Combined_otu_Early.biom --to-hdf5 --collapsed-samples -o Combined_otu_Early_converted.biom

2. Use this new table with filter_otus_from_otu_table.py:

filter_otus_from_otu_table.py -i Combined_otu_Early_converted.biom -n 1 -o Early_otus.biom

Hope this helps,
Jai

SAMIK BAGCHI

Nov 5, 2015, 11:02:16 AM
to qiime...@googlegroups.com
Hi Jai, 

Thank you. I think it's a great workaround, since I was worried that uninstalling h5py would break support for the new BIOM format tables.

Thank you for your help.

Samik 