IOError: Can't read data (Can't open directory)


Sarah Pohl

Oct 8, 2014, 8:09:00 AM
to h5...@googlegroups.com
Hello,

I'm new to the HDF5 data format and "just" wanted to have a look at a file I received from a colleague. I can open it with h5py.File() and go through the different groups and subgroups within. In fact, everything I have tried on the groups so far (e.g. .name or .values()) worked perfectly fine. Some of the groups also contain datasets, for which I can look at .shape and .dtype as expected. Only when I want to see some of the data, using simple grp["dset"][0] indexing, do I get this error message:

IOError                                   Traceback (most recent call last)
<ipython-input-45-509cebb66565> in <module>()
      1 print geno["matrix"].shape
      2 print geno["matrix"].dtype
----> 3 geno["matrix"][0]

/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.pyc in __getitem__(self, args)
    443         mspace = h5s.create_simple(mshape)
    444         fspace = selection._id
--> 445         self.id.read(mspace, fspace, arr, mtype)
    446
    447         # Patch up the output for NumPy

/home/sarah/anaconda/lib/python2.7/site-packages/h5py/h5d.so in h5py.h5d.DatasetID.read (h5py/h5d.c:2782)()

/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_proxy.so in h5py._proxy.dset_rw (h5py/_proxy.c:1709)()

/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_proxy.so in h5py._proxy.H5PY_H5Dread (h5py/_proxy.c:1379)()

IOError: Can't read data (Can't open directory)

Can somebody tell me what that means, and what I have to do to make this work?
By the way, I can open the dataset in HDFView.
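In case it helps, here is a minimal stand-in for what I am doing (the file and dataset names here are made up, since I can't share the real file; with my colleague's file it is the last indexing line that fails):

```python
import h5py

# Build a tiny stand-in file first, just to show the access pattern.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("matrix", data=[[1, 2], [3, 4]])

geno = h5py.File("demo.h5", "r")
print(geno["matrix"].shape)  # works on my real file too
print(geno["matrix"].dtype)  # also works
row = geno["matrix"][0]      # on my real file, this raises the IOError
geno.close()
```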

Bests,
Sarah

Andrew Collette

Oct 8, 2014, 12:23:50 PM
to h5...@googlegroups.com
Hi Sarah,

> IOError: Can't read data (Can't open directory)
>
> Can somebody tell me what that means, and what I have to do to make this
> work?
> By the way, I can open the dataset in HDFView.

I've not seen this error before, but from what I gather from Google
it's an error related to the HDF5 plug-in mechanism. The dataset you
have presumably uses a third-party "filter" which isn't available.
Your best bet is to ask the person you got the file from for an
"un-filtered" version, or for a copy of the filter library.
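If you want to see for yourself which filters the dataset was written with, something along these lines should work (a sketch; file and dataset names are placeholders, and I use h5py's built-in lzf filter for the demo):

```python
import h5py

# Demo file; on your file, just open it read-only and pick your dataset.
with h5py.File("filters_demo.h5", "w") as f:
    f.create_dataset("x", (10,), compression="lzf")

f = h5py.File("filters_demo.h5", "r")
dset = f["x"]
print(dset.compression)  # 'gzip', 'lzf', ... or None for unknown filters

# Third-party filters don't show up in .compression, but the dataset
# creation property list exposes the raw filter pipeline:
plist = dset.id.get_create_plist()
for i in range(plist.get_nfilters()):
    print(plist.get_filter(i))  # (filter code, flags, values, name)
f.close()
```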

Sorry I can't be more helpful, but I've not used the HDF5 auto-plugin
mechanism much. People on the HDF-Forum list might have more
practical experience with this issue.

Andrew

Sarah Pohl

Oct 9, 2014, 7:25:37 AM
to h5...@googlegroups.com
Hey Andrew,


On Wednesday, 8 October 2014 18:23:50 UTC+2, Andrew Collette wrote:

> I've not seen this error before, but from what I gather from Google
> it's an error related to the HDF5 plug-in mechanism. The dataset you
> have presumably uses a third-party "filter" which isn't available.
> Your best bet is to ask the person you got the file from for an
> "un-filtered" version, or for a copy of the filter library.
A filter? I really have to read more about this format; I don't understand what that could be.
I also have the raw data, but I don't know yet how to convert it to HDF5.

I'll just continue reading then, until I figure it out. Thanks!

Sarah

Sarah Pohl

Oct 10, 2014, 6:54:19 AM
to h5...@googlegroups.com
OK, so apparently the dataset in the file has been compressed with gzip. It looks like this is pretty much the standard, especially for h5py. Does that mean this can't be the problem, or did I just miss something when I installed the HDF5 library and h5py?

Andrew Collette

Oct 10, 2014, 11:17:10 AM
to h5...@googlegroups.com
Hmmm... gzip should work everywhere, since it's distributed with HDF5 itself.

Could I ask you to (1) let me know your anaconda version ("conda -V"),
and (2) run the attached script, and post the output?

Andrew
t.py

Sarah Pohl

Oct 13, 2014, 4:17:14 AM
to h5...@googlegroups.com

> Hmmm... gzip should work everywhere, since it's distributed with HDF5 itself.
>
> Could I ask you to (1) let me know your anaconda version ("conda -V"),

Sure, it's conda 3.7.0.

> and (2) run the attached script, and post the output?

 Summary of the h5py configuration
---------------------------------

h5py    2.3.1
HDF5    1.8.13
Python  2.7.8 |Anaconda 2.1.0 (32-bit)| (default, Aug 21 2014, 18:22:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
sys.platform    linux2
sys.maxsize     2147483647
numpy   1.9.0

encode 1 decode 2

shuffle True 3
f32 True 3
szip False
Traceback (most recent call last):
  File "t.py", line 19, in <module>
    print name, h5z.filter_avail(filter), h5z.get_filter_info(filter)
  File "h5z.pyx", line 91, in h5py.h5z.get_filter_info (h5py/h5z.c:883)
RuntimeError: Filter info not retrieved (Required filter is not registered)

OK, so I have no HDF5 filters available? I (re-)installed the HDF5 library last week, after I found out about the gzip compression, and the installation seemed to go fine.

Sarah

Andrew Collette

Oct 13, 2014, 10:36:35 AM
to h5...@googlegroups.com
Hi Sarah,

Whoops, I didn't realize that would produce a RuntimeError. Try the
attached script instead.
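The change is just to guard the per-filter query, roughly like this (the filter list here is a subset, for illustration):

```python
from h5py import h5z

# Same report as before, but one unregistered filter no longer
# kills the whole loop with a RuntimeError.
for name, code in [("gzip", h5z.FILTER_DEFLATE),
                   ("szip", h5z.FILTER_SZIP),
                   ("lzf", h5z.FILTER_LZF)]:
    avail = h5z.filter_avail(code)
    try:
        info = h5z.get_filter_info(code)
    except RuntimeError:
        info = "n/a"
    print(name, avail, info)
```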

Btw, it looks as if the problem may be that the Anaconda guys don't
build HDF5 with gzip support (!). I can hardly believe that's the
case, but I found this:

https://groups.google.com/a/continuum.io/forum/#!topic/anaconda/U9l7F4DmfIw

and a thread on this forum from April (I knew I remembered something
about this!):

https://groups.google.com/forum/#!topic/h5py/ThfXBR1wRWA

Unfortunately installing HDF5 yourself may not work, as I think
there's a private copy supplied with Anaconda.

Andrew
t.py

Sarah Pohl

Oct 14, 2014, 4:12:30 AM
to h5...@googlegroups.com
Hey Andrew,

here's the output of the new script:
Summary of the h5py configuration
---------------------------------

h5py    2.3.1
HDF5    1.8.13
Python  2.7.8 |Anaconda 2.1.0 (32-bit)| (default, Aug 21 2014, 18:22:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
sys.platform    linux2
sys.maxsize     2147483647
numpy   1.9.0

encode 1 decode 2

shuffle True 3
f32 True 3
szip False
gzip False
so True 3
lzf True 3

Traceback (most recent call last):
  File "t.py", line 25, in <module>
    dset = f.create_dataset('x', (10,), compression='gzip')
  File "/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 99, in make_new_dset
    shuffle, fletcher32, maxshape, scaleoffset)
  File "/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/filters.py", line 103, in generate_dcpl
    raise ValueError('Compression filter "%s" is unavailable' % compression)
ValueError: Compression filter "gzip" is unavailable

So, yes, gzip seems to be unavailable...
I hadn't considered Anaconda as the culprit so far... I only installed it on my Linux machine because I liked it so much on Windows! Do you think I should remove Anaconda? So far I have only seen this problem posted in combination with Windows systems; maybe I should ask first?

Thanks for all your help so far!
Sarah

Darren Dale

Oct 14, 2014, 9:30:46 AM
to h5py
I think the problem is that anaconda's hdf5 recipe does not list zlib as a build and run requirement.
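For reference, the fix amounts to listing zlib in the recipe's requirements, roughly like this (a sketch of a conda-recipe meta.yaml excerpt; the exact layout of the anaconda hdf5 recipe may differ):

```yaml
# conda recipe meta.yaml (excerpt, hypothetical layout)
requirements:
  build:
    - zlib
  run:
    - zlib
```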

Probably best to continue this on the anaconda mailing list, and post here with the resolution.

--
You received this message because you are subscribed to the Google Groups "h5py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h5py+uns...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sarah Pohl

Oct 16, 2014, 3:29:41 AM
to h5...@googlegroups.com
Are we discussing this in the right place over at the anaconda mailing list, though? I have that problem with Linux, not Windows...
Does the zlib library contain gzip also?

Andrew Collette

Oct 16, 2014, 11:10:43 AM
to h5...@googlegroups.com
Hi Sarah,

> Are we discussing this in the right place over at the anaconda mailing list,
> though? I have that problem with Linux, not Windows...
> Does the zlib library contain gzip also?

Sorry, my email on that list was misleading. It looks like Anaconda
doesn't build these filters on any platform, not just Windows.

Darren's pull request looks like it will fix the issue (if approved by
the Anaconda team), but it will likely not make its way into the world
until the next anaconda release.

In the meantime, you might want to consider using a different Python
environment. Ubuntu provides nearly all of the Python scientific
stack, including NumPy, ipython, etc. I know for a fact that libhdf5
on Ubuntu does have the gzip filter available.
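Either way, a quick check for whether the gzip (deflate) filter is actually available in a given environment:

```python
from h5py import h5z

# FILTER_DEFLATE is HDF5's id for the gzip filter; on a build
# linked against zlib this prints True.
print(h5z.filter_avail(h5z.FILTER_DEFLATE))
```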

Andrew

Sarah Pohl

Oct 16, 2014, 11:23:25 AM
to h5...@googlegroups.com
Hey Andrew,

I'm still not entirely sure that this is the problem.
Here's the thing: I'm working in a virtual Linux machine on my Windows computer. When I set it up, I had some specific requirements (libraries, packages) I wanted to install, which was difficult since I am quite inexperienced with Linux. That's why, once I had everything set up, I cloned the virtual machine to have a backup in case I ruined it later.
I now turned to the clone to check whether removing Anaconda would solve my problem, but before removing it, I tried the gzip filter one more time in a basic script. And it worked. I don't know what happened (yet), but it seems that I was the problem rather than Anaconda.
I will keep you posted in case I figure this out.

Sarah

Darren Dale

Oct 17, 2014, 7:29:57 AM
to h5py

There was definitely a problem with the anaconda hdf5 recipe: zlib is required for gzip support. The pull request was approved, and I expect you should now be able to do a "conda update hdf5". You don't have to wait for the next release of anaconda; they push updates for individual packages continually.
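After updating, a quick end-to-end check (with a throwaway file name) is to create a gzip-compressed dataset; on a zlib-less build this raises the ValueError about the gzip filter being unavailable:

```python
import h5py

# Succeeds only if the gzip filter is available in this HDF5 build.
with h5py.File("gzip_check.h5", "w") as f:
    dset = f.create_dataset("x", (10,), compression="gzip")
    print(dset.compression)
```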


Fabrizio Niro

Nov 3, 2014, 5:29:50 PM
to h5...@googlegroups.com
Hi Sarah and all,

I have exactly the same problem and the same error message when trying to access the .value content of an HDF5 file. As in your case, I can access the .shape and .dtype of the same file, and I can read the file with HDFView without any problem.
I'm running h5py on my MacPro, installed via MacPorts; see the output of the script below. I'm still puzzled: I have tried reinstalling hdf5 and h5py many times, and apparently I have no problem with gzip.

Does anyone have an idea??

Tx!!
Fabrizio

Summary of the h5py configuration
---------------------------------

h5py    2.3.1
HDF5    1.8.12
Python  2.7.6 (default, Nov 18 2013, 15:12:51)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]
sys.platform    darwin
sys.maxsize     9223372036854775807
numpy   1.8.0

encode 1 decode 2

shuffle True 3
f32 True 3
szip False
Traceback (most recent call last):
  File "testh5py.py", line 19, in <module>
    print name, h5z.filter_avail(filter), h5z.get_filter_info(filter)
  File "h5z.pyx", line 91, in h5py.h5z.get_filter_info (h5py/h5z.c:855)
