New preliminary support for Blosc2


Francesc Alted

Aug 24, 2022, 5:47:46 AM
to pytabl...@googlegroups.com
Hi team,

Just a brief message to let you know how the integration of Blosc2 in PyTables is going. As you know, we are using the https://github.com/PyTables/PyTables/tree/direct-chunking-blosc2 branch for that.

All the basic functionality is there: chunks can be written and read using the new embedded blosc2 filter for HDF5. In addition, blosc2 chunks can be read more efficiently by introspecting into the chunk and doing *partial* reads while performing the I/O in parallel (a big improvement with respect to Blosc1). This should lead to much better performance for reads smaller than a chunk. See an example here: https://github.com/PyTables/PyTables/blob/direct-chunking-blosc2/src/H5TB-opt.c#L352
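
To illustrate the idea behind partial reads, here is a minimal sketch, not the PyTables or C-Blosc2 code. It assumes a chunk made of fixed-size blocks that are each compressed independently, which is what makes a partial read possible: to fetch a row range, only the blocks that overlap it need to be decompressed. zlib stands in for the Blosc2 codecs, a thread pool stands in for the parallel I/O, and the names `make_chunk`, `read_rows`, `ROWS_PER_BLOCK` and `ITEMSIZE` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

ITEMSIZE = 8        # hypothetical: bytes per row (one little-endian uint64)
ROWS_PER_BLOCK = 4  # hypothetical: rows per independently compressed block


def make_chunk(rows):
    """Split a chunk into blocks and compress each block on its own (zlib as a stand-in codec)."""
    blocks = []
    for start in range(0, len(rows), ROWS_PER_BLOCK):
        raw = b"".join(r.to_bytes(ITEMSIZE, "little") for r in rows[start:start + ROWS_PER_BLOCK])
        blocks.append(zlib.compress(raw))
    return blocks


def read_rows(blocks, start, stop):
    """Return (rows[start:stop], number of blocks decompressed), touching only overlapping blocks."""
    if stop <= start:
        return [], 0
    first = start // ROWS_PER_BLOCK
    last = (stop - 1) // ROWS_PER_BLOCK
    needed = range(first, last + 1)
    # Decompress the needed blocks concurrently; map() keeps them in block order.
    with ThreadPoolExecutor() as pool:
        raws = list(pool.map(lambda i: zlib.decompress(blocks[i]), needed))
    data = b"".join(raws)
    values = [int.from_bytes(data[i:i + ITEMSIZE], "little") for i in range(0, len(data), ITEMSIZE)]
    offset = start - first * ROWS_PER_BLOCK  # position of `start` inside the first needed block
    return values[offset:offset + (stop - start)], len(needed)


if __name__ == "__main__":
    chunk = make_chunk(list(range(20)))  # 20 rows -> 5 blocks
    print(read_rows(chunk, 5, 10))       # ([5, 6, 7, 8, 9], 2): only 2 of 5 blocks decompressed
```

The saving grows with the ratio of chunk size to read size: a read smaller than a chunk pays only for the blocks it covers instead of the whole chunk.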

The plan is to extend the optimized reads to arrays too, and also to allow creating tables and arrays via the H5Dwrite_chunk (direct chunking) mechanism instead of the existing blosc2 filter, which should provide a nice boost in write performance. For this, I expect my colleague Oscar Guiñón to continue the job (expect to see commits from him soon).
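
For readers unfamiliar with direct chunking, here is a rough sketch of what H5Dwrite_chunk does, using h5py's low-level wrapper `DatasetID.write_direct_chunk` rather than the PyTables C code. The application compresses the chunk itself and hands the finished bytes to HDF5, so the filter pipeline is skipped on write. As an assumption for the sake of a runnable example, the standard deflate filter (zlib) stands in for Blosc2; the dataset still declares the filter so that ordinary reads can decode the chunk. The function names and the file path are hypothetical.

```python
import os
import tempfile
import zlib

import h5py
import numpy as np


def write_precompressed(path):
    """Write one 4x4 chunk whose bytes were compressed by the application, not by HDF5."""
    arr = np.arange(16, dtype="<i4").reshape(4, 4)
    with h5py.File(path, "w") as f:
        # Declaring compression="gzip" records the deflate filter, so normal reads can decode it.
        dset = f.create_dataset("data", shape=(4, 4), chunks=(4, 4), dtype="<i4", compression="gzip")
        # Direct chunk write: the bytes go straight to the file at chunk offset (0, 0);
        # filter_mask=0 (the default) tells HDF5 that all declared filters were applied.
        dset.id.write_direct_chunk((0, 0), zlib.compress(arr.tobytes()))
    return arr


def read_back(path):
    """Read the dataset through the regular path, which runs the deflate filter on read."""
    with h5py.File(path, "r") as f:
        return f["data"][...]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "direct_chunk_example.h5")  # hypothetical file name
        expected = write_precompressed(path)
        return bool(np.array_equal(read_back(path), expected))


if __name__ == "__main__":
    print(demo())
```

The write-side gain comes from doing the compression once, in the application (possibly multithreaded, as Blosc2 can), instead of having HDF5 route every chunk through its filter callback.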

Finally, although we have already made quite a lot of progress, there is still plenty of room for improvement, but I think we can complete the intended job by next October. This coincides with the release of Python 3.11, so it would be nice to release wheels for it too (the plan is to release wheels for 3.8, 3.9, 3.10 and 3.11). Let's see how it goes.

Will keep you informed as we make more progress!

Best,

--
Francesc Alted