Hi everyone,
I've built a weather forecast model - OpenIFS - in a container and successfully run it on a computer cluster. The model is far too heavy to run on a single CPU, so parallelisation with MPI is a must, and preferably multi-threading with OpenMP as well.
I start with
Bootstrap: docker
From: almalinux:9
then I add the GNU C/C++/Fortran compilers etc. I also build Open MPI with PMIx support (matching the host) and some other dependencies.
Finally I build the model inside the container.
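In case the details matter, the build part of the definition file is roughly along these lines (package names, the Open MPI version and the paths are illustrative here, not my exact recipe):

%post
    # GNU compilers and basic build tools
    dnf install -y gcc gcc-c++ gcc-gfortran make wget tar
    # PMIx headers and libraries, intended to match the host installation
    # (pmix-devel may need an extra repo enabled)
    dnf install -y pmix pmix-devel
    # build Open MPI against that PMIx; version number and prefix are illustrative
    wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.6.tar.gz
    tar xf openmpi-4.1.6.tar.gz && cd openmpi-4.1.6
    ./configure --prefix=/usr/local --with-pmix=/usr
    make -j"$(nproc)" && make install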
It runs fine on multiple cores parallelised with MPI, but when I set OMP_NUM_THREADS=4 to also use multi-threading, it crashes.
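For reference, the launch is of roughly this form (Slurm plus Apptainer assumed here; the task count, image name and executable path are placeholders, not my exact job script):

# hybrid MPI + OpenMP run; numbers and names are placeholders
export OMP_NUM_THREADS=4
srun --ntasks=64 --cpus-per-task=4 \
    apptainer exec openifs.sif /path/to/model-executable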
The error from the system is
77: --------------------------------------------------------------------------
77: WARNING: Open MPI failed to TCP connect to a peer MPI process. This
77: should not happen.
77:
77: Your Open MPI job may now hang or fail.
77:
77: Local host: n36
77: PID: 143210
77: Message: connect() to 10.127.32.38:1031 failed
77: Error: Operation now in progress (115)
77: -------------------------------------------------------------------------
I also get this message from each MPI task:
85: --------------------------------------------------------------------------
85: By default, for Open MPI 4.0 and later, infiniband ports on a device
85: are not used by default. The intent is to use UCX for these devices.
85: You can override this policy by setting the btl_openib_allow_ib MCA parameter
85: to true.
85:
85: Local host: n36
85: Local adapter: hfi1_0
85: Local port: 1
85:
85: --------------------------------------------------------------------------
85: --------------------------------------------------------------------------
85: WARNING: There was an error initializing an OpenFabrics device.
85:
85: Local host: n36
85: Local device: hfi1_0
85: --------------------------------------------------------------------------
but this seems to be just a warning; without multi-threading the model runs fine anyway.
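For completeness, the override that the second message points to can be set either as an environment variable or on the mpirun command line; I'm only restating the hint from the message here, not something I've verified makes a difference:

# MCA parameter suggested by the warning text
export OMPI_MCA_btl_openib_allow_ib=1
# or, equivalently, on the command line:
mpirun --mca btl_openib_allow_ib 1 ...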
Is it possible to run an executable in a container using both MPI and OpenMP parallelisation? Or is only MPI possible?
Many thanks for any help or suggestions
/Joakim