Odd mpi error


Tom Pace

unread,
Oct 22, 2020, 1:32:46 PM
to singularity

I have an image I've been using successfully on a few different hosts. However, there is one host where I get mpi errors from within singularity.

Note that other mpi applications work fine on this host outside of singularity, so it seems mpi is correctly configured on the host itself.

I can successfully run "mpirun -n 2 whoami" from inside the singularity image, and it works as expected.
However, when I try other applications that use MPI internally (without mpirun; for example, gmsh and fenics), I consistently get the following group of messages:

--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
  Local host:            myhost
  Device name:           i40iw0
  Device vendor ID:      0x8086
  Device vendor part ID: 14290

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
  Local host:           myhost
  Local device:         i40iw0
  Local port:           1
  CPCs attempted:       udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host:          [[12513,1],0] (PID 203581)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[myhost:203617] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found
[myhost:203617] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[myhost:203617] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port

Here is one additional piece of information, which may or may not be relevant: to use mpi outside of singularity, an environment module must be loaded. Of course, from inside the image I can't load that external module. But the whoami command above works anyway, so maybe the external module doesn't matter.
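One way to check what that module would change is to inspect it from a host shell before loading it. This is a sketch that assumes a Modules-style "module" command and that the module is named "openmpi" on this system:

```shell
# "module show" prints the environment modifications (paths, variables)
# a module would apply, without actually applying them. Guarded so the
# sketch degrades gracefully where the module command is unavailable.
if command -v module >/dev/null 2>&1; then
    module show openmpi 2>&1 | grep -Ei 'PATH|OMPI|MCA' \
        || echo "no matching lines in module output"
else
    echo "module command not available in this shell"
fi
```

If the module only prepends to PATH and LD_LIBRARY_PATH, it is unlikely to be setting MCA parameters.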

Any suggestions?

Thanks,
Tom Pace

Chris Wood

unread,
Oct 22, 2020, 1:41:04 PM
to singu...@lbl.gov
Hi Tom,

What does your mpirun command look like? I've had to bind /etc/libibverbs.d and specify the MCA parameters:

mpirun -n 8 --mca orte_base_help_aggregate 0 --mca btl_vader_single_copy_mechanism none --mca btl ^sm --mca btl_openib_allow_ib true --bind-to core singularity exec -B /etc/libibverbs.d my_container.sif /path/to/executable

Cheers
Chris


Tom Pace

unread,
Oct 22, 2020, 3:01:48 PM
to singularity, Chris Wood
The commands that give errors don't involve mpirun. For gmsh, I get this error from "gmsh --version", and for fenics I get it from a python prompt where I type "import fenics". It's the same error in both cases, and both applications do use MPI internally. So it seems the MCA parameters need to be set somewhere globally for them to work.

Is it possible that the environment module ("module load openmpi") is what would set those parameters ordinarily?

If so, does anyone know of any way to load an external module from within the image?
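For what it's worth, Open MPI also reads MCA parameters from the environment: any variable named OMPI_MCA_&lt;param&gt; acts like a --mca setting, so parameters can be set globally without mpirun. Singularity additionally forwards host variables prefixed with SINGULARITYENV_ into the container. A minimal sketch (the particular parameter values here are illustrative, not a recommendation):

```shell
# Open MPI picks up OMPI_MCA_<param> variables as if passed via --mca,
# which reaches applications that initialize MPI internally (no mpirun).
export OMPI_MCA_btl_openib_warn_no_device_params_found=0
export OMPI_MCA_mpi_warn_on_fork=0
# Disable the openib BTL entirely (falls back to TCP/shared memory):
export OMPI_MCA_btl="^openib"

# From the host side, Singularity injects SINGULARITYENV_-prefixed
# variables into the container environment:
export SINGULARITYENV_OMPI_MCA_btl="^openib"
```

Disabling openib sacrifices the fast interconnect, so this is more a diagnostic than a fix, but it should tell you whether the errors are openib-specific.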

Chris Wood

unread,
Oct 22, 2020, 5:33:17 PM
to Tom Pace, singularity
Good question. "module load <module>" can run whatever the sysadmins want it to. I've no experience with what "module load openmpi" specifically would normally set; on the cluster I'm currently logged into, "module load openmpi" (v4.0.3) doesn't seem to set any MCA parameters. What version of openmpi have you got installed inside the container vs. what's on the host?

Tom Pace

unread,
Oct 23, 2020, 11:01:20 AM
to singularity, Chris Wood
The host is running openmpi version 1.10.7, and the container has version 2.1.1.

In looking into setting mca parameters, I discovered the "ompi_info" command.
This shows lots of MCA parameters, and it works both on the host natively and from inside the container. So it seems the parameters are indeed being set, which probably means the module isn't the issue.
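A sketch of comparing the two installations with ompi_info, guarded so it degrades where the command is missing (run once natively and once via "singularity exec" against the image):

```shell
# Compare Open MPI version and openib BTL settings on host vs container.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep "Open MPI:"               # default output's version line
    ompi_info --param btl openib --level 9     # openib BTL parameters
else
    echo "ompi_info not found on this system"
fi
```

Differences in the openib parameter lists between the 1.10.7 host and the 2.1.1 container output would be a concrete place to look for the conflict.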

Is it possible there is some kind of conflict between the host and the container?

Adrian Jackson

unread,
Oct 23, 2020, 12:25:32 PM
to singu...@lbl.gov
There is definitely a chance of a conflict between the host and container. I think the key lines in the documentation are "The MPI in the container must be compatible with the version of MPI available on the host." and "Since the MPI implementation in the container must be compliant with the version available on the system, a standard approach is to build your own MPI container, including the target MPI implementation." Is the version of OpenMPI different between the different hosts you're using?

Tom Pace

unread,
Oct 23, 2020, 4:14:45 PM
to singularity, Adrian Jackson
Yes, the hosts each have different OpenMPI versions. I'm running simulations on various systems at two different universities (managed by 3 different IT groups) and my own personal hardware as well. I don't have the administrative privileges needed to build singularity images on some of these systems, including the one I've been discussing here. I was also hoping to provide a singularity recipe along with the published simulation results, to facilitate reproduction by other groups if desired.

Given all that, I'd prefer to avoid tweaking the recipe for each set of hardware. I'd prefer to figure out what settings the container OpenMPI needs once it's running. But if recipe changes are the only way to get this to work, I'm open to attempting it.

Here's some new information: I've been trying to figure out where the host openmpi defines the "device" in question. It doesn't seem to be part of the ini file where the device parameters are usually stored, so it's not clear to me where the host openmpi is getting those settings from. But if I could figure that out, maybe I could manually copy those device properties into the container.
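For reference, the openib device parameters normally ship as an .ini file under the installation's share directory (the file named by the btl_openib_device_param_files MCA parameter in the warning). A hedged sketch for locating it; MPI_PREFIX is an assumed install prefix, adjust to the host's or container's installation:

```shell
# Look for Open MPI's openib device-parameter file under a given prefix
# and check whether it has an entry for the i40iw / Intel (0x8086) device.
MPI_PREFIX="${MPI_PREFIX:-/usr}"
ini=$(find "$MPI_PREFIX/share" -name 'mca-btl-openib-device-params.ini' \
      2>/dev/null | head -n 1)
if [ -n "$ini" ]; then
    echo "found: $ini"
    grep -n 'i40iw\|0x8086' "$ini" || echo "no i40iw/Intel entry in $ini"
else
    echo "no device-params ini found under $MPI_PREFIX/share"
fi
```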

The device in question seems to be an Intel RDMA adapter (the i40iw in the error message is Intel's iWARP driver, i.e. RDMA over Ethernet rather than InfiniBand proper). Intel's readme for the hardware talks about OpenMPI settings, but only for a newer OpenMPI version than either the host or the container, and I don't really understand it on top of that.

Adrian Jackson

unread,
Oct 23, 2020, 4:29:51 PM
to Tom Pace, singularity
I'd have to defer to the Singularity team on the OpenMPI version issue. It used to be that you needed to exactly match the OpenMPI versions between host and container, but I know that restriction has been relaxed. I guess one option would be to install 3 different OpenMPI versions in the container, one matching each specific OpenMPI version on the hosts, and then choose at run time which OpenMPI version to use in the container based on which host you're on (i.e. how you invoke the container from the host). I've not fully thought through the mechanics of this one, but I think it _should_ be possible.
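A rough sketch of that idea, assuming the image carries builds under /opt/openmpi-&lt;version&gt; (all paths and names here are hypothetical) and the host exports SINGULARITYENV_HOST_OMPI_VERSION before invoking the container:

```shell
#!/bin/sh
# Hypothetical %runscript: select a container Open MPI build matching the
# host. Singularity passes SINGULARITYENV_HOST_OMPI_VERSION into the
# container as HOST_OMPI_VERSION; fall back to one build if unset.
HOST_OMPI_VERSION="${HOST_OMPI_VERSION:-2.1.1}"
MPI_ROOT="/opt/openmpi-${HOST_OMPI_VERSION}"    # assumed install layout
export PATH="${MPI_ROOT}/bin:${PATH}"
export LD_LIBRARY_PATH="${MPI_ROOT}/lib:${LD_LIBRARY_PATH:-}"
echo "using Open MPI from ${MPI_ROOT}"
```

On the host you'd then do something like "export SINGULARITYENV_HOST_OMPI_VERSION=1.10.7" before "singularity run".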
