OpenMPI-5.0.6: -x LD_LIBRARY_PATH not able to load shared objects


Sangam B

Feb 14, 2025, 7:22:40 AM
to Open MPI users
Hi,

OpenMPI-5.0.6 is compiled with UCX-1.18 and the Intel oneAPI 2024 v2.1 compilers. An MPI program is compiled with this OpenMPI-5.0.6.

While submitting a job through PBS on a Linux cluster, the Intel compiler environment is sourced and the library path is passed through OpenMPI's mpirun command option: " -x LD_LIBRARY_PATH=<lib path to intel compilers> ". But the job still fails with the following error:

prted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

PRTE has lost communication with a remote daemon.

  HNP daemon   : [prterun-cn19-2146925@0,0] on node cn19
  Remote daemon: [prterun-cn19-2146925@0,2] on node cn21

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.

However, if I put "source <path_of_intel_compiler>vars.sh" in ~/.bashrc, then the job works fine. But this is not the right way to do it.

My question here is: after passing -x LD_LIBRARY_PATH to the mpirun command, why is it not able to find "libimf.so" on all the nodes? Is this a bug in OpenMPI-5.0.6?
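
For reference, the launch looks roughly like this (the queue name, node/core counts and executable name below are placeholders, not my actual script; the Intel path is written the same way as above):

#!/bin/bash
#PBS -q workq                            # placeholder queue
#PBS -l select=2:ncpus=64:mpiprocs=64    # placeholder node/core counts
cd $PBS_O_WORKDIR
source <path_of_intel_compiler>vars.sh   # Intel oneAPI environment
mpirun -np 128 \
  -x LD_LIBRARY_PATH=<lib path to intel compilers> \
  ./my_app                               # placeholder executable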

Thanks

Patrick Begou

Feb 14, 2025, 8:00:37 AM
to us...@lists.open-mpi.org

Hi Sangam,

the "-x" option propagate your LD_LIBRARY_PATH as it is set on the
execution node. So may be you need only to set "-x LD_LIBRARY_PATH"
after sourcing your <path_of_intel_compiler>vars.sh in your PBS script ?
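
i.e. something along these lines (the executable name is just a placeholder):

source <path_of_intel_compiler>vars.sh   # sets LD_LIBRARY_PATH in the mpirun environment
mpirun -x LD_LIBRARY_PATH ./my_app       # -x forwards the value that is now set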

Patrick (not using PBS but Slurm, sorry)

Sangam B

Feb 14, 2025, 1:00:58 PM
to us...@lists.open-mpi.org
Hi Patrick,

Thanks for your reply. 
Of course, the Intel vars.sh is sourced inside the PBS script, and I've tried multiple ways to resolve this issue:

-x LD_LIBRARY_PATH
&
-x LD_LIBRARY_PATH=/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}

And then copied libimf.so to the job's working directory and set:
-x LD_LIBRARY_PATH=.:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}

But none of these worked.

Gilles Gouaillardet

Feb 14, 2025, 1:17:55 PM
to us...@lists.open-mpi.org
Sangam,

-x LD_LIBRARY_PATH won't do the trick here.

mpirun spawns prted daemons on the other nodes (via the tm interface, or whatever the latest PBS uses if support was built into Open MPI, or SSH otherwise), and the daemons fail to start because the Intel runtime cannot be found.
You can run "chrpath -l prted" to check whether prted was built with rpath. If so, make sure the runtime is available at the same location on every node.
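
For example (the Open MPI install prefix below is only an example; cn21 is the node from your error message):

# on the launch node: does prted carry an rpath/runpath?
chrpath -l /opt/openmpi-5.0.6/bin/prted
readelf -d /opt/openmpi-5.0.6/bin/prted | grep -Ei 'rpath|runpath'

# on a remote node: does the Intel runtime resolve for prted?
ssh cn21 'ldd /opt/openmpi-5.0.6/bin/prted | grep libimf'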


Another option is to rebuild prrte with the GCC compilers so it does not depend on the Intel runtime.
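
Roughly (the install prefix and UCX path are placeholders; this rebuilds the whole Open MPI stack, including prrte, with GCC):

./configure CC=gcc CXX=g++ FC=gfortran \
    --prefix=/opt/openmpi-5.0.6-gcc \
    --with-ucx=/opt/ucx-1.18
make -j 8 && make install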

Cheers,

Gilles

Patrick Begou

Feb 14, 2025, 1:19:42 PM
to us...@lists.open-mpi.org
Hi Sangam

Could you check that the install location of the library is the same on all the nodes? Maybe by checking LD_LIBRARY_PATH after sourcing the Intel vars.sh file?
I'm using OpenMPI 5.0.6, but in a Slurm context, and it works fine.
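
For example, something like this (using the node names from your error message and one of the library paths you posted):

for node in cn19 cn21; do
    ssh $node 'ls -l /opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib/libimf.so'
done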

Patrick

Patrick Begou

Feb 14, 2025, 1:23:23 PM
to us...@lists.open-mpi.org
Bad answer, sorry. I had not realized that prted is part of the Open MPI stack.

Sangam B

Feb 14, 2025, 2:00:02 PM
to us...@lists.open-mpi.org
Thanks Gilles & Patrick.

As Gilles mentioned, when Open MPI spawns the prted daemons on the compute nodes, they fail to launch because the Intel runtime is not available.

To resolve this issue, I loaded the Intel runtime in the terminal session before job submission and used #PBS -V in the job script.
That resolved it.
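
i.e. roughly (the script and executable names are placeholders):

# interactive shell, before submission:
source <path_of_intel_compiler>vars.sh
qsub job.pbs

# inside job.pbs:
#!/bin/bash
#PBS -V              # export the submission environment (incl. LD_LIBRARY_PATH) to the job
cd $PBS_O_WORKDIR
mpirun ./my_app      # placeholder executable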

Other solutions could be:
(1) If Open MPI is built with the Intel compilers, use a static build [link the Intel libraries statically].
(2) Or build Open MPI with the GCC compilers [OS default] and use OMPI_CC=icc etc. (see the sketch below).
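
A minimal sketch of option (2), assuming a GCC-built Open MPI is in the PATH and my_app.c is a placeholder source file:

OMPI_CC=icc mpicc -O2 -o my_app my_app.c   # the mpicc wrapper now invokes icc instead of gcc
# prted itself has no Intel runtime dependency anymore, so only the application
# needs libimf.so, which -x LD_LIBRARY_PATH can provide at run time.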

Thanks

