Why isn't the data storage based on HDF5?

Dana

Nov 26, 2012, 4:48:46 PM11/26/12
to openvd...@googlegroups.com
Hi everyone,

The data storage needs sound an awful lot like they'd be a good match for HDF5 (http://www.hdfgroup.org/HDF5/). Does anyone know whether the developers considered using it to store the data? If they considered it and decided against it, I'd be very curious why it didn't suit their needs. After all, you'd get platform independence, parallel I/O via MPI, etc. for free.

Just curious,

Dana

Ken Museth

Nov 30, 2012, 5:57:44 PM11/30/12
to openvd...@googlegroups.com
Hi Dana,

There are four primary reasons we chose not to use HDF5 as our file format. In no particular order, they were: performance, streaming, license concerns, and parallel extensions.


Performance and parallelism might seem like good reasons to select HDF5, but unfortunately HDF5 is not a good fit for us on either count.

While it is true that we are not currently exploiting parallelism in our I/O, adopting HDF5 wouldn't provide a solution for our needs: HDF5's parallel support doesn't map well onto our approach to parallelism. The model that parallel HDF5 addresses is multiple processes, not multiple threads. Parallel support was added to HDF5 for the National Labs' ASCI (Accelerated Strategic Computing Initiative) program, whose core applications were MPI-enabled large-scale weapons simulations on large distributed-memory machines (supercomputers), generally using techniques like AMR (adaptive mesh refinement). Our model of parallelism is shared memory (workstations and servers) rather than distributed memory.
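To illustrate the distinction, here is a minimal sketch (not OpenVDB code; the block layout and `serializeBlock` helper are hypothetical) of the shared-memory model: several threads serialize disjoint blocks of one grid into slices of a single in-memory buffer, with no inter-process communication. An MPI version would instead have to gather the slices across separate address spaces.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical fixed-size block of grid values.
constexpr std::size_t kBlockSize = 512;

// Serialize one block into its pre-assigned slice of a shared buffer.
// Threads share the address space, so each can write its slice directly.
void serializeBlock(const std::vector<float>& block,
                    std::vector<char>& out, std::size_t offset)
{
    const char* src = reinterpret_cast<const char*>(block.data());
    std::copy(src, src + block.size() * sizeof(float), out.begin() + offset);
}

// Serialize all blocks concurrently, one thread per block.
std::vector<char> serializeGrid(const std::vector<std::vector<float>>& blocks)
{
    std::vector<char> out(blocks.size() * kBlockSize * sizeof(float));
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        workers.emplace_back(serializeBlock, std::cref(blocks[i]),
                             std::ref(out), i * kBlockSize * sizeof(float));
    }
    for (auto& t : workers) t.join();
    return out;
}
```

In a real implementation one would use a thread pool (e.g. TBB) rather than one thread per block, but the point stands: the shared buffer makes this trivial in a threaded program, while MPI-style process parallelism requires a very different I/O design.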

Additionally, we found in other studio projects that when we took the time to produce custom file formats that exploit the specifics of the underlying data structures, we saw significant performance benefits in both I/O times and file footprints compared to the more general HDF5 file format.

This is also an opportunity to mention that we are continuing to improve our I/O. We expect a significant improvement over the compression factors and times currently observed in OpenVDB. To decrease I/O time and reduce the on-disk footprint of our data, we are currently implementing a parallel, blocked compression/streaming technique.
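A sketch of the general idea behind blocked compression (not the actual OpenVDB implementation): split the data into fixed-size blocks, compress each block independently on its own thread, and keep the compressed blocks separate so they can be written with a per-block size table and later streamed or decompressed independently. A trivial run-length encoder stands in here for a real codec such as zlib.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Trivial run-length encoder standing in for a real codec (e.g. zlib).
std::vector<uint8_t> rleCompress(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < in.size();) {
        uint8_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(run);      // run length
        out.push_back(in[i]);    // repeated byte
        i += run;
    }
    return out;
}

// Compress each block on its own thread. The per-block results can then
// be written out with a size table, so each block is independently
// streamable and decompressible.
std::vector<std::vector<uint8_t>>
compressBlocked(const std::vector<uint8_t>& data, std::size_t blockSize)
{
    const std::size_t nBlocks = (data.size() + blockSize - 1) / blockSize;
    std::vector<std::vector<uint8_t>> compressed(nBlocks);
    std::vector<std::thread> workers;
    for (std::size_t b = 0; b < nBlocks; ++b) {
        workers.emplace_back([&, b] {
            auto first = data.begin() + b * blockSize;
            auto last  = data.begin()
                       + std::min(data.size(), (b + 1) * blockSize);
            compressed[b] = rleCompress(std::vector<uint8_t>(first, last));
        });
    }
    for (auto& t : workers) t.join();
    return compressed;
}
```

Because each block compresses independently, the scheme parallelizes on shared-memory machines without any of HDF5's MPI machinery, and a reader can seek to and decompress only the blocks it needs.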

The ability to stream data easily was also a large factor in the design of our I/O. We need streaming support to enable full Houdini integration, for example writing OpenVDB grids directly into Houdini's BGEO files.

Finally, HDF5 is distributed with a proprietary compression scheme, SZIP, which has been enabled by default since HDF5 1.6.0. SZIP embodies patents held by the National Aeronautics and Space Administration and requires a license for commercial use. This license is incompatible with our Mozilla Public License Version 2.0 and with the open source nature and goals of our project.
http://www.hdfgroup.org/doc_resource/SZIP/

The OpenVDB development team

Kirill Lykov

Jan 30, 2013, 8:37:00 AM1/30/13
to openvd...@googlegroups.com
Dana, I'm using HDF5 (as well as ParaView formats) for storing grids, and I use it on clusters with MPI. My grid is a bit different from the one in OpenVDB, but I could write an extension for OpenVDB. So if you have ideas about this or related issues, you can write me for details: lykov[dot]kirill[at]gmail[dot]com
