parallel read/write from several machines


Zoltan Benedek

Mar 2, 2014, 12:39:42 PM
to h5...@googlegroups.com
Hi,

I recently discovered HDF5 and this great project, and I have a question:

What is the recommended way to use h5py from multiple machines on the same HDF5 file/group/dataset?
Can I use Parallel HDF5 from several machines, working on the same dataset?

Thanks
Zoltan Benedek

Andrew Collette

Mar 3, 2014, 2:59:20 PM
to h5...@googlegroups.com
Hi Zoltan,

> What is the recommended way to use h5py from multiple machines on the same
> HDF5 file/group/dataset?
> Can I use Parallel HDF5 from several machines, working on the same dataset?

Yes, you can (using mpi4py), although I suspect you'll need a proper
parallel filesystem.

You can find basic info on using MPI and h5py together here:

http://docs.h5py.org/en/latest/mpi.html

Andrew

Pablo Rozas-Larraondo

Mar 21, 2014, 2:54:30 AM
to h5...@googlegroups.com
Hello,

Following on from Zoltan's idea, I'm really interested in building a non-blocking HDF5 file server where different processes can read and write to the same files. I have read the basic info in the docs and the concurrency chapter of the Python and HDF5 book, but I would like to know if there is a project or example I can use as a base for this kind of HDF5 database.

Thanks,
Pablo

Lukas Solanka

Mar 21, 2014, 6:08:07 AM
to h5...@googlegroups.com
Hi Pablo,

I am not an expert, but couldn't a possible problem with such an approach be that it introduces a bottleneck, since all the processes have to synchronise at a barrier? Unless the server is distributed, but that is probably not a trivial solution to implement.

cheers,
Lukas

Pablo Rozas-Larraondo

Mar 21, 2014, 7:18:18 AM
to h5...@googlegroups.com
Hi Lukas,

I don't fully understand your point. The basic idea would be to take a lock on the file while a process is writing to it, but still allow other processes to read from it. What I would like to know is whether there are any implementations or examples of a concurrent system using parallel HDF5 based on h5py and mpi4py.
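
For illustration, that locking discipline could be sketched within a single process roughly like this (the `RWLock` class and all names here are hypothetical; coordinating across machines would additionally need a distributed lock or MPI):

```python
# Hypothetical sketch of "writers get exclusive access, readers may
# share" within one process, using Python's threading primitives.
# Cross-machine coordination would need a distributed lock instead;
# this only illustrates the locking discipline.
import threading

class RWLock:
    """Allow many concurrent readers; writers get exclusive access."""
    def __init__(self):
        self._readers = 0
        self._mutex = threading.Lock()   # guards _readers
        self._writer = threading.Lock()  # held while anyone writes

    def acquire_read(self):
        with self._mutex:
            self._readers += 1
            if self._readers == 1:
                self._writer.acquire()   # first reader blocks writers

    def release_read(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:
                self._writer.release()   # last reader admits writers

    def acquire_write(self):
        self._writer.acquire()

    def release_write(self):
        self._writer.release()

lock = RWLock()
data = []

def writer(value):
    lock.acquire_write()
    data.append(value)               # exclusive while writing
    lock.release_write()

threads = [threading.Thread(target=writer, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lock.acquire_read()
snapshot = list(data)                # readers can share this section
lock.release_read()
```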

Cheers,
Pablo

Andrew Collette

Mar 21, 2014, 12:37:41 PM
to h5...@googlegroups.com
Hi Pablo,

> Following on from Zoltan's idea, I'm really interested in building a
> non-blocking HDF5 file server where different processes can read and
> write to the same files. I have read the basic info in the docs and
> the concurrency chapter of the Python and HDF5 book, but I would like
> to know if there is a project or example I can use as a base for this
> kind of HDF5 database.

I'm not aware of a specific codebase which does this, but it's
certainly the sort of thing that MPI is supposed to handle. There are
some edge cases when using an HDF5 file as a shared data store,
related to the MPI semantics for atomic writes, but with a file opened
with the "mpio" driver it generally works the way you describe.

If (as it sounds) you want one process to "broadcast" information to
other processes, MPI (and mpi4py) actually has that built-in. My
recommendation would be to go through the mpi4py docs and see if
there's an MPI-native approach that might be easier to develop and
debug than using a shared HDF5 file. There's also a good description
in there of things like Barrier() that you'll need in any case for
synchronization.

Andrew

Jialin Liu

Jul 11, 2017, 3:44:03 PM
to h5py