Running a container with OpenMP and MPI


Joakim Kjellsson

Jan 13, 2026, 10:55:22 AM
to discuss
Hi everyone

I've built a weather forecast model - OpenIFS - in a container and successfully run it on a computer cluster. The model is far too heavy to run on a single CPU, so parallelisation with MPI is a must, and preferably multi-threading with OpenMP as well.

I start with
Bootstrap: docker
From: almalinux:9
then I add the GNU C/C++/Fortran compilers etc. I also build Open MPI with PMIx support (the same as on the host) and some other dependencies.
Finally, I build the model inside the container.
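The %post section is roughly along these lines (the package names, Open MPI version and configure flags here are a sketch rather than my exact recipe):

%post
    # compilers and build tools (exact package set may differ)
    dnf -y install gcc gcc-c++ gcc-gfortran make wget tar
    # PMIx headers/libraries matching the host (the -devel package may need extra repos such as CRB)
    dnf -y install pmix pmix-devel
    # build Open MPI against that PMIx
    wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.6.tar.gz
    tar xzf openmpi-4.1.6.tar.gz && cd openmpi-4.1.6
    ./configure --prefix=/usr/local --with-pmix=/usr
    make -j4 && make install
    # ... then fetch and build OpenIFS itself and its remaining dependencies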

It runs fine on multiple cores parallelised with MPI, but when I set OMP_NUM_THREADS=4 to also use multi-threading, it crashes. 
The error from the system is:
77: --------------------------------------------------------------------------
77: WARNING: Open MPI failed to TCP connect to a peer MPI process.  This
77: should not happen.
77: 
77: Your Open MPI job may now hang or fail.
77: 
77:   Local host: n36
77:   PID:        143210
77:   Message:    connect() to 10.127.32.38:1031 failed
77:   Error:      Operation now in progress (115)
77: -------------------------------------------------------------------------

I also get this message from each MPI task: 
85: --------------------------------------------------------------------------
85: By default, for Open MPI 4.0 and later, infiniband ports on a device
85: are not used by default.  The intent is to use UCX for these devices.
85: You can override this policy by setting the btl_openib_allow_ib MCA parameter
85: to true.
85: 
85:   Local host:              n36
85:   Local adapter:           hfi1_0
85:   Local port:              1
85: 
85: --------------------------------------------------------------------------
85: --------------------------------------------------------------------------
85: WARNING: There was an error initializing an OpenFabrics device.
85: 
85:   Local host:   n36
85:   Local device: hfi1_0
85: --------------------------------------------------------------------------
but that seems to be just a warning; without multi-threading the model runs fine anyway.
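From the text of that message it sounds like the transport selection can be steered with MCA parameters, for example (I have not verified that either of these actually helps in my case):

export OMPI_MCA_btl=^openib   # exclude the openib BTL, which should silence the OpenFabrics warning
export OMPI_MCA_pml=ucx       # or explicitly request UCX, if the container's Open MPI was built with it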

Is it possible to run an executable in a container using both MPI and OpenMP parallelisation? Or is only MPI possible? 

Many thanks for any help or suggestions
/Joakim 

Chris Hines

Jan 13, 2026, 5:16:55 PM
to Joakim Kjellsson, discuss
Hi Joakim,
I'm not an expert, and TBH by the time I had an MPI code and expected people to use it, I would probably stop using containerised solutions, because the interface between the kernel and the MPI libraries is just too complicated. In theory it should work; in practice I found it more effort than it was worth.

Likewise OpenMP + MPI should work, but I would ask: why? I know CPU architecture has changed a bit since I last played with this, but in my experience, running MPI with more ranks, each rank pinned to a core, performs better than one MPI rank per node with multiple OpenMP threads. The pure-MPI model just allows for better control of data locality and cache coherency (I think I'm talking about the L2 cache in particular).
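As a concrete (made-up) illustration on a Slurm system, with the container layer left out and ./openifs standing in for however you actually launch the model, the two layouts would be roughly:

# pure MPI: one rank per core, each rank pinned
srun --ntasks=128 --cpu-bind=cores ./openifs

# hybrid: fewer ranks, several OpenMP threads per rank
export OMP_NUM_THREADS=4
srun --ntasks=32 --cpus-per-task=4 --cpu-bind=cores ./openifs

In my experience the first layout is usually the one to beat.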

The obvious exception to this is if your finite element mesh or grid is "adaptive", in which case it may not be possible to distribute work evenly to each rank, and having "larger" ranks with threads might be more efficient.

It's also worth running perf to look at your hardware counters, and keeping an eye on things like the kernel settings for hugepages. I've personally been pinched by code running at half speed on a VM versus the hypervisor host because the VM did not have hugepages enabled, resulting in massively more TLB misses. Once the VM also had hugepages enabled, it ran at near native speed. We only figured out what was going on by using perf and noting that the TLB misses were massively higher on the VM than on the hypervisor. (Yes, I realise we are on the container mailing list rather than the virtualisation mailing list, I just thought you would appreciate the anecdote :-P)
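If you want to check for the same effect, something like this is a reasonable starting point (the event names vary by CPU, and ./openifs is again just a placeholder for your launch command):

# compare hardware counters for a run inside vs outside the container
perf stat -e cycles,instructions,dTLB-load-misses,iTLB-load-misses ./openifs

# check the kernel's transparent hugepage setting
cat /sys/kernel/mm/transparent_hugepage/enabled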

Regards,
--
Dr Chris Hines
Senior Research DevOps Engineer

Monash University
Monash eResearch Centre
15 Innovation Walk
Clayton campus, VIC 3800
Australia


We acknowledge and pay respects to the
Elders and Traditional Owners of the land on
which our four Australian campuses stand.
Information for Indigenous Australians




Dave Dykstra

Jan 14, 2026, 10:49:44 AM
to Joakim Kjellsson, discuss
Hi Joakim,

I don't know much about MPI or OpenMP, but I just want to make sure you have discovered the Apptainer documentation page about MPI:
https://apptainer.org/docs/user/latest/mpi.html
especially the section on `--sharens`.
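As a rough, untested sketch (the image name, executable and Slurm flags below are placeholders), a hybrid launch using that option would look something like:

export OMP_NUM_THREADS=4
srun --ntasks=32 --cpus-per-task=4 \
    apptainer exec --sharens openifs.sif ./openifs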

Dave