HDF5 error during the parallel simulations


LIMING CHAO

Jul 13, 2022, 4:44:40 AM
to IBAMR Users
Dear all,

When I ran a 3D case in parallel with the command mpiexec -np 4 ./main3d input3d, an error occurred:

Writing visualization files...

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1498 in H5F_open(): unable to open file: time = Wed Jul 13 16:38:12 2022
, name = 'viz_3d/visit_dump.00000/processor_cluster.00000.samrai', tent_flags = 1
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = 'viz_3d/visit_dump.00000/processor_cluster.00000.samrai', errno = 2, error message = 'No such file or directory', flags = 1, o_flags = 2
    major: File accessibilty
    minor: Unable to open file
P=00002:Program abort called in file ``../../../../SAMRAI-2.4.4/source/toolbox/restartdb/HDFDatabase.C'' at line 2427
HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1498 in H5F_open(): unable to open file: time = Wed Jul 13 16:38:12 2022
, name = 'viz_3d/visit_dump.00000/processor_cluster.00000.samrai', tent_flags = 1
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = 'viz_3d/visit_dump.00000/processor_cluster.00000.samrai', errno = 2, error message = 'No such file or directory', flags = 1, o_flags = 2
    major: File accessibilty
    minor: Unable to open file
P=00001:Program abort called in file ``../../../../SAMRAI-2.4.4/source/toolbox/restartdb/HDFDatabase.C'' at line 2427
P=00001:ERROR MESSAGE:
P=00001:Unable to open HDF5 file viz_3d/visit_dump.00000/processor_cluster.00000.samrai
P=00001:
HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1498 in H5F_open(): unable to open file: time = Wed Jul 13 16:38:12 2022
, name = 'viz_3d/visit_dump.00000/processor_cluster.00000.samrai', tent_flags = 1
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = 'viz_3d/visit_dump.00000/processor_cluster.00000.samrai', errno = 2, error message = 'No such file or directory', flags = 1, o_flags = 2
    major: File accessibilty
    minor: Unable to open file
P=00003:Program abort called in file ``../../../../SAMRAI-2.4.4/source/toolbox/restartdb/HDFDatabase.C'' at line 2427
P=00003:ERROR MESSAGE:
P=00003:Unable to open HDF5 file viz_3d/visit_dump.00000/processor_cluster.00000.samrai
P=00003:
P=00002:ERROR MESSAGE:
P=00002:Unable to open HDF5 file viz_3d/visit_dump.00000/processor_cluster.00000.samrai
P=00002:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.13.2, Jun 02, 2020
[0]PETSC ERROR: ./main3d on a linux-opt named clm-Vostro-5490 by clm Wed Jul 13 16:37:46 2022
[0]PETSC ERROR: Configure options --CC=/home/clm/sfw/linux/openmpi/4.0.2/bin/mpicc --CXX=/home/clm/sfw/linux/openmpi/4.0.2/bin/mpicxx --FC=/home/clm/sfw/linux/openmpi/4.0.2/bin/mpif90 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --PETSC_ARCH=linux-opt --with-debugging=0 --download-hypre=1 --download-fblaslapack=1 --with-x=0
[0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
[clm-Vostro-5490:22701] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[clm-Vostro-5490:22701] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

How can I solve it?

Best,

Li-Ming

Boyce Griffith

Jul 13, 2022, 11:32:05 AM
to noreply-spamdigest via IBAMR Users

On Jul 13, 2022, at 4:44 AM, LIMING CHAO <chaoli...@gmail.com> wrote:

Dear all,

When I ran a 3D case in parallel with the command mpiexec -np 4 ./main3d input3d, an error occurred:

I think this means that you are not able to write the visualization file. Perhaps you are over your disk quota or otherwise out of space in the location where you are trying to write the file. You also might not have permission to write files in the specified location.
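Both possibilities can be checked quickly from the shell. A minimal sketch, using the viz_3d dump directory from the log above (adjust the name if your input file uses a different dump_dirname):

```shell
#!/bin/sh
# Sanity checks for the "unable to open file" failure:
# enough disk space, and write permission in the viz directory.
dir=viz_3d                 # dump_dirname from the input file
mkdir -p "$dir"            # the visualization writer normally creates this
df -h "$dir"               # is the filesystem full or over quota?
if touch "$dir/.write_test" 2>/dev/null; then
  rm -f "$dir/.write_test"
  echo "$dir is writable"
else
  echo "$dir is NOT writable"
fi
```

If this reports the directory as writable and the disk has free space, the cause is likely elsewhere.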


Li-Ming Chao

Jul 13, 2022, 12:06:00 PM
to IBAMR Users
Dear Prof. Boyce,

The code works well if I use only one core with the command ./main3d input3d, but the parallel simulation fails. Could there be a problem with MPI?

Best,

Li-Ming

Boyce Griffith

Jul 14, 2022, 9:05:01 AM
to noreply-spamdigest via IBAMR Users

On Jul 13, 2022, at 12:05 PM, Li-Ming Chao <clm0...@gmail.com> wrote:

Dear Prof. Boyce,

The code works well if I use only one core with the command ./main3d input3d, but the parallel simulation fails. Could there be a problem with MPI?

We don’t use parallel HDF5. This seems like it might not be a useful suggestion, but can you try “-np 1” and “-np 2” to see if either of those work? I am not sure, but I am a bit skeptical that this is related to parallelization in the code per se.

Wells, David

Jul 14, 2022, 9:27:46 AM
to IBAMR Users
Hi Li-Ming,

One possibility is that you used one version of MPI to compile IBAMR and an mpirun/mpiexec executable from another version of MPI to run it; that would cause exactly this error in parallel. Can you double-check that you are consistently using the same version of MPI throughout?
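One way to check this is to print where each MPI tool on your PATH lives and compare the install prefixes (the tool names below are the usual Open MPI ones; a sketch, not specific to any one installation):

```shell
#!/bin/sh
# Print the location of each MPI tool so the install prefixes can be
# compared. If mpiexec and the compiler wrappers (mpicc/mpicxx) resolve
# to different installations, mixed-MPI failures like the one in this
# thread are likely.
for tool in mpiexec mpirun mpicc mpicxx; do
  printf '%-8s -> %s\n' "$tool" "$(command -v "$tool" || echo 'not found')"
done
```

The prefixes printed here should all match the MPI used to configure PETSc and IBAMR (in the log above, /home/clm/sfw/linux/openmpi/4.0.2).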

Best,
David

From: ibamr...@googlegroups.com <ibamr...@googlegroups.com> on behalf of Li-Ming Chao <clm0...@gmail.com>
Sent: Wednesday, July 13, 2022 12:05 PM
To: IBAMR Users <ibamr...@googlegroups.com>
Subject: Re: [ibamr-users] HDF5 error during the parallel simulations
 

X A (Mike Lin)

Mar 31, 2023, 3:33:24 AM
to IBAMR Users
Hi everyone,

I was surprised to run into the same issue; here is my report:

I used the same optimised build of "main2d" for the eel2d case in the ConstraintIB folder. I have been trying to use
tagging_method = "REFINE_BOXES" 
instead of the one used in the original case
tagging_method = "GRADIENT_DETECTOR".

In other words, I was trying to use prescribed mesh refinement instead of automatically adaptive mesh refinement.

I have double-checked that:
  1. my disk is not full
  2. "mpiexec -np 1" runs OK for tagging_method = "REFINE_BOXES"
  3. the MPI version is most likely not the problem, since everything runs normally when I use tagging_method = "GRADIENT_DETECTOR"
So I suspect something is wrong with tagging_method = "REFINE_BOXES". Is this most likely an IBAMR issue rather than an HDF5 issue? Why is it unable to open the file?

Best,

Mike

X A (Mike Lin)

Mar 31, 2023, 3:44:31 AM
to IBAMR Users
It seems that something is wrong with SAMRAI?


Below is the full error log
--------------------------------------------------------------------------

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1567 in H5F_open(): unable to lock the file

    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 959 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed

P=00002:Program abort called in file ``../../../../SAMRAI-2.4.4/source/toolbox/restartdb/HDFDatabase.C'' at line 2427
P=00002:ERROR MESSAGE:
P=00002:Unable to open HDF5 file viz_eel2d_Str/visit_dump.00000/processor_cluster.00000.samrai
P=00002:

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1567 in H5F_open(): unable to lock the file

    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 959 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed

P=00003:Program abort called in file ``../../../../SAMRAI-2.4.4/source/toolbox/restartdb/HDFDatabase.C'' at line 2427
P=00003:ERROR MESSAGE:
P=00003:Unable to open HDF5 file viz_eel2d_Str/visit_dump.00000/processor_cluster.00000.samrai
P=00003:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD

with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[n02312:07853] 3 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[n02312:07853] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n02312:07853] 1 more process has sent help message help-mpi-api.txt / mpi-abort

Boyce Griffith

Mar 31, 2023, 5:23:17 AM
to ibamr...@googlegroups.com
This really looks like a file system issue. For this example, are you using the standard input files and just changing the tagging strategy?
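One file-system cause that matches this particular log: the "unable to lock the file, errno = 11" lines typically appear when the output directory is on a network filesystem that does not support POSIX file locking. If that is the case here, HDF5 can be told to skip locking at run time via an environment variable; note this is honored by HDF5 1.10.7 and later, while the log above shows 1.10.6, which would need an upgrade first. A sketch:

```shell
#!/bin/sh
# Disable HDF5 file locking for this shell session (HDF5 >= 1.10.7).
# Useful when the viz directory is on NFS or another filesystem
# without working POSIX locks.
export HDF5_USE_FILE_LOCKING=FALSE
echo "HDF5_USE_FILE_LOCKING=$HDF5_USE_FILE_LOCKING"
```

Then launch mpiexec from the same shell so the child processes inherit the variable.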

Amneet Bhalla

Mar 31, 2023, 2:02:52 PM
to ibamr...@googlegroups.com
Make sure that when you switch to "REFINE_BOXES", you are actually giving the correct boxes to tag. I suspect the manually defined boxes are out of range of the domain. 
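As an illustration, a REFINE_BOXES specification in a SAMRAI-style input file looks roughly like the sketch below; the box extents here are made up for this example, and each box must lie inside the index space of its level:

```
StandardTagAndInitialize {
   tagging_method = "REFINE_BOXES"
   RefineBoxes {
      // lower and upper cell indices of each box, per level
      level_0 = [ (0,0) , (31,31) ]
      level_1 = [ (16,16) , (47,47) ]
   }
}
```

A box whose indices fall outside the level's index space is a common way to trigger a crash with this tagging method.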

Here is one example where we use manual tagging of the domain:


Note that if your intent is to use a static mesh throughout the simulation (maybe because the body does not traverse beyond a certain region of the domain), you can stop the regridding process altogether. You can achieve this by setting the regridding interval to a large value (more than the total number of integrator steps).

You can replace this line, which specifies regrid_cfl_interval,

by

regrid_interval = MAX_INTEGRATOR_STEPS

However, stopping the regridding process is risky for IB methods like ConstraintIBMethod, because the Lagrangian points may still move out of a grid cell, and that information needs to be captured by the data structures of IBHierarchyIntegrator. This, however, is not a problem for IBLevelSetMethod, which has no Lagrangian markers, so the mesh can remain static throughout the simulation.
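As a sketch of the input-file change described above (comment syntax follows the SAMRAI input parser; MAX_INTEGRATOR_STEPS stands for whatever step limit your own input file defines):

```
// Before: regrid whenever structures have moved far enough
// regrid_cfl_interval = 0.5

// After: the regrid interval exceeds the total number of steps,
// so regridding effectively never happens
regrid_interval = MAX_INTEGRATOR_STEPS
```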


  



--
--Amneet 


