HDF5 output on cluster

Igal Tsarfis

Nov 11, 2013, 7:38:53 AM11/11/13
to pflotra...@googlegroups.com
Hi

I'm having trouble writing HDF5 output on a cluster of 2 machines.
The cases I've tried:
a. running the same input deck with Tecplot output, it writes the files.
b. running the same input deck as a parallel run on 2 CPUs on EITHER one of the machines, it outputs the HDF5 & XMF files.
c. the HDF Group's test program seems to run successfully.

The errors are:
 --> creating hdf5 output file: /home/v/Desktop/work/Pflotran_b/b.h5
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) MPI-process 0:
  #000: H5F.c line 1500 in H5Fcreate(): unable to create file
    major: File accessability
    minor: Unable to open file

The output log is attached.
Any thoughts would be appreciated,

Igal
error_HDF5_WRITE.log

Hammond, Glenn E

Nov 11, 2013, 11:34:45 AM11/11/13
to pflotra...@googlegroups.com

Not sure what to tell you here.  I modified the non-legacy regression suite to attempt to write to an hdf5 file.  Can you pull the latest source and see if all regression tests pass?

 

cd $PFLOTRAN_DIR/src/pflotran

hg pull -u

make pflotran

make test

 

There should be 51 of 51 tests that pass.

Glenn

Gautam Bisht

Nov 11, 2013, 11:43:21 AM11/11/13
to pflotra...@googlegroups.com
Igal,

Did you install PETSc with support for HDF5 (--download-hdf5=1 or --with-hdf5-dir=your-local-hdf5-dir)?
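
One quick way to check, assuming the standard PETSC_DIR/PETSC_ARCH layout (a sketch, not a requirement):

grep PETSC_HAVE_HDF5 $PETSC_DIR/$PETSC_ARCH/include/petscconf.h

It should print "#define PETSC_HAVE_HDF5 1" if HDF5 support was built in.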

-Gautam.

Igal Tsarfis

Nov 11, 2013, 11:46:17 AM11/11/13
to pflotra...@googlegroups.com, geh...@sandia.gov
At the moment I'm working with the compiled legacy version (because of the MPHASE support), so I'll make a new copy and compile as you suggested.
I'll probably have that done by tomorrow and will upload the output.

Thank you,
Igal

Igal Tsarfis

Nov 11, 2013, 11:56:50 AM11/11/13
to pflotra...@googlegroups.com
Yes, I did install PETSc with support for HDF5 using --download-hdf5=1, and I had no prior HDF5 installation.

Igal Tsarfis

Nov 12, 2013, 8:42:16 AM11/12/13
to pflotra...@googlegroups.com
Hi
I've compiled a non-legacy version on the two machines, alongside the legacy one.
The regression test logs are attached (51 tests, 2 of which failed only on tolerance).
Again, each machine writes the HDF5 & XMF files when run individually, but not when the two run together.
I've noticed that the h5 and XMF files are created but stay empty, due to the same errors as in the legacy run.

Igal
pflotran-tests-2013-11-12_14-58-09.testlog
pflotran-tests-2013-11-12_15-12-50.testlog

Hammond, Glenn E

Nov 12, 2013, 11:05:22 AM11/12/13
to pflotra...@googlegroups.com, pflotr...@googlegroups.com

Igal,

 

Yes, clearly this is not an issue with HDF5 on a single-machine parallel run.  After a Google search on the HDF5 error messages sent yesterday (i.e. “open failed on a remote node”), I found a couple of posts suggesting that this is an issue with your MPI-IO and MPI installation.  You are using a very recent version of mpich (3.0.4).  I suggest installing an older but more stable version of mpich2 (e.g. mpich2-1.4.1); I have had issues with newer versions.  Other than that, I am not sure what to suggest.
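
A plain source build of mpich2-1.4.1 into your home directory should be enough for a test; the prefix below is only a placeholder:

tar xzf mpich2-1.4.1p1.tar.gz
cd mpich2-1.4.1p1
./configure --prefix=$HOME/mpich2-1.4.1p1 --enable-shared
make && make install

Then reconfigure PETSc against it, e.g. with --with-mpi-dir=$HOME/mpich2-1.4.1p1.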

 

From here on out, we need to move this conversation over to the pflotran-dev mailing list, as there are PETSc developers on that list who may be able to help us out.  But please try mpich2-1.4.1.  If that works, we can revisit the issue with mpich-3.0.4 with the scientists at Argonne.

 

Glenn

igal.t...@gmail.com

Nov 21, 2013, 11:36:30 AM11/21/13
to pflotr...@googlegroups.com, pflotra...@googlegroups.com, geh...@sandia.gov
Hi

I've tried using the older mpich2-1.4.1 by typing, on the master machine:
/usr/local/bin/mpiexec -n 4 .... machines.txt .. ./pflotran ...
and got the same errors on the HDF5 write.
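
(With MPICH2's Hydra launcher the machine file is passed with -f, so the general form was something like the line below; the input file name is just a placeholder.)

/usr/local/bin/mpiexec -f machines.txt -n 4 ./pflotran -pflotranin my_input.in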


Igal

Charlotte Barbier

Nov 25, 2013, 7:09:01 PM11/25/13
to pflotra...@googlegroups.com, pflotr...@googlegroups.com, geh...@sandia.gov
Hi Igal,

I have the same problem. I installed with openmpi using:
./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-openmpi=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --with-debugging=1 --with-c2html=0
and installed pflotran with "make pflotran".

The pflotran runs work (serial or parallel) as long as I don't use an h5 file for the mesh; otherwise I get the error below. I tested with the simple example given in pflotran-dev/example_problems/umesh/mixed/: it works on the login node but not when I submit to the cluster. The cluster is an SGI Altix machine...
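
(For reference, the interactive run on the login node is essentially of this form; the deck name is a placeholder for whichever input sits in that directory. The same command issued through the batch system is what produces the errors below.)

mpirun -np 2 ../../../src/pflotran/pflotran -pflotranin <input>.in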
 
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) MPI-process 0:
  #000: H5F.c line 1582 in H5Fopen(): unable to open file
    major: File accessability
    minor: Unable to open file
  #001: H5F.c line 1271 in H5F_open(): unable to open file: time = Thu Nov 21 10:40:05 2013
, name = 'mixed.h5', tent_flags = 0
    major: File accessability
    minor: Unable to open file
  #002: H5FD.c line 987 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDmpio.c line 1052 in H5FD_mpio_open(): MPI_File_open failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #004: H5FDmpio.c line 1052 in H5FD_mpio_open(): MPI_ERR_NO_SUCH_FILE: no such file or directory
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) MPI-process 0:
  #000: H5D.c line 316 in H5Dopen2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) MPI-process 0:
  #000: H5D.c line 437 in H5Dget_space(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) MPI-process 0:
  #000: H5S.c line 794 in H5Sget_simple_extent_ndims(): not a dataspace
    major: Invalid arguments to routine
    minor: Inappropriate type


Igal Tsarfis

Nov 28, 2013, 2:50:19 AM11/28/13
to pflotra...@googlegroups.com, pflotr...@googlegroups.com, geh...@sandia.gov
Charlotte,

Thank you for this information. I've contacted the PETSc support group about this issue, and I'll post any insight if we reach one.

Igal

Greg Lackey

Mar 23, 2015, 6:21:25 PM3/23/15
to pflotra...@googlegroups.com, pflotr...@googlegroups.com, geh...@sandia.gov
Igal,

I'm writing to ask whether you ever found a resolution to this problem.

I have recently been getting a similar HDF5-DIAG error on a remote machine. It occurs only when I use more than one processor and try to write an h5 output file.

Thanks

Greg Lackey

Greg Lackey

Jun 18, 2015, 10:27:49 AM6/18/15
to pflotra...@googlegroups.com, pflotr...@googlegroups.com, geh...@sandia.gov
Hello everyone,

I'm still experiencing issues with creating pflotran output files on a remote machine using more than one processor.

It appears to be the same issue that Igal had back in 2013. I get the following HDF5-DIAG error when I output an h5 file:

 --> appending to hdf5 output file: pflotran.h5

HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) MPI-process 0:
  #000: H5D.c line 141 in H5Dcreate2(): not a location ID

    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value

This conversation was moved to pflotran-dev, and I'm just wondering whether there was ever a resolution.

Thanks,

Greg Lackey

Hammond, Glenn E

Jun 18, 2015, 10:45:40 AM6/18/15
to Greg Lackey, pflotra...@googlegroups.com, pflotr...@googlegroups.com
To my knowledge this has not been addressed, and I doubt it ever will be until we can replicate the issue on one of our machines (I actually cannot remember whether anyone attempted to run your input deck on a developer machine.  Can someone remind me?).

Glenn

Greg Lackey

Jun 18, 2015, 6:27:05 PM6/18/15
to Hammond, Glenn E, pflotra...@googlegroups.com, pflotr...@googlegroups.com
I ended up reconfiguring PETSc and specifying my MPI directory (vs. --download-mpich=yes). This cleared up the issue with outputting HDF5 files using multiple processors. I'm not sure whether it was an issue specific to my remote machine.

Thanks



miru...@colorado.edu

Jun 14, 2017, 5:29:00 PM6/14/17
to pflotran-dev, geh...@sandia.gov, pflotra...@googlegroups.com, gdl...@gmail.com
Greg, how did you do this? Something like:

./configure --download-hdf5=yes --with-mpi-dir=<dir>

Greg Lackey

Jun 14, 2017, 5:33:37 PM6/14/17
to miru...@colorado.edu, pflotran-dev, Hammond, Glenn E, pflotra...@googlegroups.com
Mickey,

Yes, I believe that is what I did, but that was a while ago now, so I'm not 100% sure.

