Issues using mpirun / mpiexec


Chuck Abolt

Apr 21, 2022, 3:48:58 PM
to Amanzi-ATS Users
Hi everyone,

I'm having issues running ATS in parallel on a cluster at LANL.

ATS works when I run it on a single core, but I get the error below when I try to run it with mpirun or mpiexec.

Has anyone seen this before / know what's going wrong?

Thanks,

Chuck

cabolt@es38:/lclscratch/cabolt/NGEE-IM1-campaign/Barrow/spinup/multi_dynamic_tweaked$ mpiexec -n 32 ats --xml_file=../inputfiles/multi_dynamic_tweaked.xml > stdout.out
[es38.lanl.gov:25486] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 206
[es38.lanl.gov:25486] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 147
[es38.lanl.gov:25486] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/client/pmix_client.c at line 223
[es38.lanl.gov:25486] OPAL ERROR: Error in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix2x_client.c at line 112
*** An error occurred in MPI_Init
[es38.lanl.gov:25484] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 206
[es38.lanl.gov:25484] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 147
[es38.lanl.gov:25484] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/client/pmix_client.c at line 223
[es38.lanl.gov:25484] OPAL ERROR: Error in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix2x_client.c at line 112
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[es38.lanl.gov:25486] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[es38.lanl.gov:25484] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[es38.lanl.gov:25488] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 206
[es38.lanl.gov:25488] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 147
[es38.lanl.gov:25488] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/client/pmix_client.c at line 223
[es38.lanl.gov:25488] OPAL ERROR: Error in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix2x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[es38.lanl.gov:25488] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Lipnikov, Konstantin

Apr 21, 2022, 4:45:59 PM
to Chuck Abolt, Amanzi-ATS Users

This is a PMIX error, not an Amanzi-ATS error. One cause I know of is a version incompatibility between the libraries used to compile different pieces of software, including MPI, e.g.


https://github.com/open-mpi/ompi/issues/5535


Konstantin
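
One way to test for this kind of mismatch, as a minimal sketch: the error paths above contain `openmpi-3.1.4-source`, which suggests ATS was built against Open MPI 3.1.4, so the launcher on `PATH` can be compared against that version. The helper names `built_version` and `launcher_version` are hypothetical, and the parsing assumes the launcher is Open MPI's, whose `mpirun --version` output begins with a line like `mpirun (Open MPI) 3.1.4`. Other MPI implementations print something different, in which case no version is detected.

```shell
#!/bin/sh
# Sketch only: compares the Open MPI version named in the PMIX error paths
# with the version of whichever mpirun is first on PATH.
# built_version and launcher_version are hypothetical helper names.

# Pull "X.Y.Z" out of a path component such as "openmpi-3.1.4-source".
built_version() {
  printf '%s\n' "$1" | sed -n 's/.*openmpi-\([0-9][0-9.]*[0-9]\).*/\1/p'
}

# Pull "X.Y.Z" out of the first line of Open MPI's `mpirun --version`,
# which looks like "mpirun (Open MPI) 3.1.4". Prints nothing for other MPIs.
launcher_version() {
  printf '%s\n' "$1" | sed -n '1s/.*(Open MPI) \([0-9][0-9.]*[0-9]\).*/\1/p'
}

# One of the error lines from the original post.
log_line='[es38.lanl.gov:25486] PMIX ERROR: UNPACK-PAST-END in file /project/ngee/cabolt/ats-master/build/openmpi/openmpi-3.1.4-source/opal/mca/pmix/pmix2x/pmix/src/mca/bfrops/v12/unpack.c at line 206'
built=$(built_version "$log_line")
echo "ATS appears to be built against Open MPI $built"

if command -v mpirun >/dev/null 2>&1; then
  found=$(launcher_version "$(mpirun --version 2>/dev/null)")
  echo "mpirun on PATH: $(command -v mpirun) (Open MPI version: ${found:-not detected})"
  if [ "$found" != "$built" ]; then
    echo "Versions differ: the launcher may not match the MPI library ATS links to."
  fi
else
  echo "No mpirun found on PATH."
fi
```

If the two versions differ, or the launcher turns out not to be Open MPI at all, the `mpirun` being picked up is probably not the one that belongs to the ATS build.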



Chuck Abolt

May 9, 2022, 9:27:31 PM
to Amanzi-ATS Users
Thanks Konstantin! Sorry I didn't respond to this earlier. I switched the mpirun executable I was using and it worked fine.

Chuck
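
Since switching launchers fixed the problem, it can help to see every `mpirun` and `mpiexec` visible on `PATH`: a cluster often has several MPI installations, and the first one found is not necessarily the one ATS was built with. The following is a minimal sketch with a hypothetical helper name, `list_launchers`. Which launcher was actually swapped in here is not stated in the thread.

```shell
#!/bin/sh
# Sketch only: list_launchers is a hypothetical helper name.
# Prints every executable file called "$1" found in the PATH directories,
# in the order the shell would search them.
list_launchers() {
  name=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    # An empty PATH entry means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  IFS=$old_ifs
}

echo "mpirun candidates:"
list_launchers mpirun
echo "mpiexec candidates:"
list_launchers mpiexec
```

The first path printed for each name is the one a bare `mpirun` or `mpiexec` command resolves to. Calling a different one by its full path, or reordering `PATH`, selects another launcher.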