Re: Register new filter (snappy) for h5py


Marvin Albert

Nov 22, 2013, 7:32:56 AM
to h5...@googlegroups.com
I would also be very interested.
Best

On Thursday, August 23, 2012 3:24:44 PM UTC+2, Michael Rissi wrote:
Hello everybody,
I just wondered if it is possible to register a new filter in h5py. I write datasets using the HDF5 C API, compressed with
snappy (code.google.com/p/snappy/) within the HDF5 filter pipeline. Is there a way for users
to add a custom filter to h5py in order to be able to read this data back?
Thanks for your help!
Best regards,
Michael Rissi

Andrew Collette

Nov 22, 2013, 11:19:12 AM
to h5...@googlegroups.com
Hi Marvin, Michael,
There's a brand-new feature in HDF5 (1.8.11) for dynamically loaded
filters, which means that we can avoid needing to explicitly register
a filter through h5py. All that's necessary is that the filter
library be located in a particular place in the file system, and HDF5
will find it and decompress the data automatically, with nothing
special on the h5py side.
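
To find out which filter a given dataset actually needs, it can help to look at its filter pipeline. The following is only a sketch: list_filters is a hypothetical helper (not part of h5py), and the plugin directory in the comment is a placeholder. HDF5 searches the directories named in the HDF5_PLUGIN_PATH environment variable (or a built-in default location) for filter plugins; setting the variable before importing h5py is the safe choice.

```python
# Placeholder (an assumption): uncomment and point at the folder holding the
# compiled filter plugin if it is not in HDF5's default plugin directory.
# import os
# os.environ["HDF5_PLUGIN_PATH"] = "/path/to/hdf5/plugins"

import h5py


def list_filters(path, dataset_name):
    """Return one (filter_code, flags, values, name) tuple per filter on a dataset."""
    with h5py.File(path, "r") as f:
        # The dataset creation property list records the filter pipeline.
        dcpl = f[dataset_name].id.get_create_plist()
        return [dcpl.get_filter(i) for i in range(dcpl.get_nfilters())]
```

The first field of each tuple is the numeric filter ID that HDF5 will look for among its plugins. Once the plugin is found, reading is the usual f[dataset_name][...] with nothing filter-specific in the Python code.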

It's also a very small amount of (pure-Python) work to additionally
support creating datasets with such filters. I'd be happy to provide
guidance to people who would be interested in this feature.

Description of dynamic filters:
http://www.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf

Andrew

Marvin Albert

Jan 7, 2014, 11:18:01 AM
to h5...@googlegroups.com
Hi Andrew,

Thanks a lot for your answer. It took me a while to get back to you, but I now have a working custom filter that I would really like to be able to use with h5py.
As you pointed out, the dynamically loaded filter feature seems to be perfectly suited for this purpose. I just tried loading a file with a custom filter using h5py and it worked!
So for writing, you say some things need to be changed in the Python part of h5py? As far as I understand, at the moment the user can choose between gzip and lzf within h5py. Is there a straightforward way of generalizing this?

Thanks a lot!
Best,
Marvin

Andrew Collette

Jan 7, 2014, 2:53:00 PM
to h5...@googlegroups.com
Hi Marvin,

> So for writing you say some things need to be changed in the python part of
> h5py? As far as I understand, at the moment the user can choose between gzip
> and lzf within h5py. Is there a straight-forward way of generalizing this?

Yes, I think the place to start is here:

https://github.com/h5py/h5py/blob/master/h5py/_hl/filters.py

See also the contributors' guide:

http://www.h5py.org/docs/meta/contributing.html

Right now we have an enumerated list of filters which are supported.
This could be extended to allow the user to specify the filter code
manually, and any filter options, e.g.:

create_dataset('x', (10,), compression=32003, compression_opts=(whatever))

I think it could be as simple as modifying generate_dcpl() in the
above file to add an "else" statement.

You should also be prepared to write tests. For example, you could
test it by supplying the raw integer code for the GZIP filter
(h5py.h5z.FILTER_DEFLATE) and compression argument, and making sure
that dset.compression == 'gzip', etc.
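
A check along those lines can be sketched without the proposed high-level argument by going through the low-level API. This is an illustration under assumptions, not the test that belongs in h5py: write_with_filter is a hypothetical helper, and the dataset name and gzip level 4 are arbitrary choices.

```python
import os
import tempfile

import h5py
import numpy as np


def write_with_filter(path, name, data, filter_code, filter_opts):
    """Write `data` as a single chunk, compressed by the filter with the given raw code."""
    with h5py.File(path, "w") as f:
        space = h5py.h5s.create_simple(data.shape)
        dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
        dcpl.set_chunk(data.shape)  # filters require a chunked layout
        dcpl.set_filter(filter_code, h5py.h5z.FLAG_OPTIONAL, filter_opts)
        dtype_id = h5py.h5t.py_create(data.dtype)
        dsid = h5py.h5d.create(f.id, name.encode(), dtype_id, space, dcpl)
        dsid.write(h5py.h5s.ALL, h5py.h5s.ALL, data)


data = np.arange(1000, dtype=np.uint16)
path = os.path.join(tempfile.mkdtemp(), "raw_filter.h5")

# Pass the raw integer code of the built-in GZIP filter, with level 4.
write_with_filter(path, "x", data, h5py.h5z.FILTER_DEFLATE, (4,))

# Read back through the high-level API and record what h5py reports.
with h5py.File(path, "r") as f:
    dset = f["x"]
    compression = dset.compression
    compression_opts = dset.compression_opts
    roundtrip = dset[...]
```

Because GZIP is built into HDF5, this runs without any plugin, which makes it a convenient stand-in for a custom filter code in a test.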

I think this would be a great addition to h5py.

Andrew

Bernhard Kohn

Jul 22, 2014, 2:11:39 PM
to h5...@googlegroups.com
Hi Marvin,

Did you ever succeed in adding a custom filter? I would also like to add one, but so far I haven't found a solution.

Any hint would be great.

best regards
  Bernhard

Marvin Albert

Jul 22, 2014, 2:17:32 PM
to h5...@googlegroups.com
Hi Bernhard,

I did end up using the dynamically loaded filter feature but without actually changing the h5py code.
Look at the following example code:

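    import numpy as n
    import h5py

    # ar, filename, hierarchy (the dataset path) and quality are defined by the caller
    ar = n.ascontiguousarray(ar,dtype=n.uint16)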
    dims = ar.shape

    # define chunk as planes defined by the last two axes
    chunk = dims[-2:]
    if len(dims)==3: chunk = (1,)+chunk

    file = h5py.h5f.create(filename)

    # Create the dataspace.  
    space = h5py.h5s.create_simple(dims)

    # Create the dataset creation property list and set the chunk size.
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_chunk(chunk)
    dcpl.set_filter(307,h5py.h5z.FLAG_MANDATORY,(chunk[-2],chunk[-1],quality))

    # Create the chunked dataset.
    dset = h5py.h5d.create(file, hierarchy, h5py.h5t.STD_U16LE, space, dcpl)

    # Write the data to the dataset.
    dset.write(h5py.h5s.ALL, space, ar)

    # Force the objects to be closed.
    del dcpl
    del dset
    del space
    del file

In the line
dcpl.set_filter(307,h5py.h5z.FLAG_MANDATORY,(chunk[-2],chunk[-1],quality))
the filter number is set to 307 and as the last argument I pass the additional filter options.
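
With FLAG_MANDATORY, a filter that cannot be applied is an error rather than a silent fallback to uncompressed chunks, so it can be useful to ask HDF5 up front whether it knows the filter. A minimal sketch, assuming a hypothetical helper name require_filter; whether a plugin that has not been loaded yet is already reported as available may depend on the HDF5 version.

```python
import h5py


def require_filter(filter_code):
    """Raise a readable error if HDF5 does not report the filter as available."""
    if not h5py.h5z.filter_avail(filter_code):
        raise RuntimeError(
            "HDF5 filter %d is not available; check that the plugin is installed"
            % filter_code
        )
    return filter_code


# The built-in GZIP filter serves as a stand-in here; a custom filter would
# pass its own ID (307 in the example above).
gzip_code = require_filter(h5py.h5z.FILTER_DEFLATE)
```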

Let me know if that helps.
Cheers,
Marvin


Bernhard Kohn

Jul 22, 2014, 2:22:17 PM
to h5...@googlegroups.com
Hi Andrew, 

I tried to follow your approach for adding a dynamically loaded filter,
but I haven't succeeded. I started with an Anaconda distribution on Windows.
I downloaded HDF5 version 1.8.11 and tested whether I could get the filter loaded dynamically from plain C, which
worked. Then I replaced the HDF5 DLLs provided in the site-packages/h5py directory with the 1.8.11 ones,

On Tuesday, January 7, 2014 8:53:00 PM UTC+1, Andrew Collette wrote:
> Yes, I think the place to start is here:
>
> https://github.com/h5py/h5py/blob/master/h5py/_hl/filters.py

and added the new filter (lz4 compression) there accordingly. But it seems that the filter is not recognized by the
_gen_filter_tuples() function; that is, it is not present in the encode list.

Do I have to build h5py from scratch to get things working?

Andrew Collette

Jul 22, 2014, 4:29:36 PM
to h5...@googlegroups.com
Hi Bernhard,

> I downloaded the hdf5 version 1.8.11 and tested, if I could get the filter
> loaded dynamically on plain c, which
> worked. Then I replaced in the site-package/h5py directory the provided hdf5
> dll's with the 1.8.11

You should build h5py from source to do this (and if you want to
submit a pull request). To simplify this, on master there is a paver
file for building HDF5 (in "windows" directory), and a paver file in
the root directory for building h5py.

Andrew

Bernhard Kohn

Jul 23, 2014, 1:30:51 AM
to h5...@googlegroups.com
Hi Marvin,

Thanks a lot for this sample. With very few modifications it works right out of the box!

best regards
  Bernhard