Segmentation fault caused by "import numpy as np; from mpi4py import MPI"

chri...@grothesque.org

May 11, 2021, 11:19:15 AM
to mpi4py

Hello,

We have been using mpi4py without problems for many years on our old cluster, which has run successive versions of Debian stable.  It's currently running "stretch", i.e. oldstable.

Recently, we installed a new cluster running the current Debian stable, "buster", and we have been experiencing a strange problem for which we have not been able to find a solution.  Perhaps someone here has an idea?

Running the command
mpiexec -n 31 python3 -c "import numpy as np; from mpi4py import MPI"
fails with a segmentation fault (see the attached log of the output to stderr).

Strangely, the above problem does not occur when -n 30 or a smaller number is given.  The problem starts with -n 31 on both the frontend (32 cores) and the compute nodes (64 cores).  It also does not occur when numpy is not imported, or when it is imported after mpi4py.
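
For illustration, the import order that works looks roughly like this (a minimal sketch, not our actual code):

# sketch: importing mpi4py before numpy avoids the crash on our machines
from mpi4py import MPI   # MPI_Init() happens here
import numpy as np       # imported only afterwards
print(MPI.COMM_WORLD.Get_rank(), np.zeros(3))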

The above happens with mpi4py 2.0.0 as included in Debian buster.  We also tried the latest mpi4py: the problem persists.

I searched on this list for possible solutions.  One thing I tried was
mpiexec -n 31 python3 -c "import numpy as np; import mpi4py; mpi4py.rc.threads = False; from mpi4py import MPI"
but this does not help.

Any ideas?

Christoph
err.log

Ivan Raikov

May 11, 2021, 11:28:26 AM
to mpi4py
I don't know if this is related, but I have a Python C++ extension that loads numpy and mpi4py via the C interface.  I found that import_mpi4py must be invoked before import_array(); otherwise various MPI initialization problems occur.  You mention that this problem does not occur when mpi4py is imported first, so this seems to be consistent with my experience.

Christoph Groth

May 11, 2021, 4:13:02 PM
to Ivan Raikov, mpi4py
Ivan Raikov wrote:

> I don't know if this is related, but I have a Python C++ extension
> that loads numpy and mpi4py via the C interface, and I found that
> import_mpi4py must be invoked before import_array(), otherwise various
> initialization problems with MPI occurred. You mention that this
> problem does not occur when importing mpi4py first, so this seems to
> be consistent with my experience.

Sounds like this could be related. The various problems you mention can
lead to a segmentation fault in our case. Is anyone aware of a bug
report anywhere for this problem? (And is mpi4py or Open MPI to blame?)

By the way, we do not experience this problem on the older cluster that
runs Debian stretch.

Yury V. Zaytsev

May 11, 2021, 4:23:07 PM
to mpi4py, Ivan Raikov
On Tue, 11 May 2021, Christoph Groth wrote:

> Sounds like this could be related. The various problems you mention can
> lead to a segmentation fault in our case. Is any one aware of a bug
> report anywhere for this problem? (And is mpi4py or OpenMPI to blame?)
>
> By the way we do not experience this problem on the older cluster that
> runs Debian stretch.

Sorry, I didn't have a look at the log, but isn't it the good old
RTLD_GLOBAL gag?

You can comment it out if you're lucky, in that either you don't need MPI
plugins or you happen to have an MPI distribution whose maintainers
actually know a thing or two about dynamic linking...
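
If that is indeed the culprit, a quick way to test from Python, without
patching anything, is to preload libmpi with global symbol visibility before
importing mpi4py. A rough sketch; the soname below is a guess and may differ
on your system:

import ctypes
# load Open MPI's library with RTLD_GLOBAL so that its dlopen()ed plugins
# can resolve MPI symbols; adjust the soname to whatever is installed
ctypes.CDLL("libmpi.so.40", mode=ctypes.RTLD_GLOBAL)
import numpy as np
from mpi4py import MPI

If the segfault disappears with that preload, it's the dynamic linking issue;
otherwise it's something else.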

--
Sincerely yours,
Yury V. Zaytsev

Lisandro Dalcin

May 12, 2021, 2:47:25 AM
to mpi...@googlegroups.com
On Tue, 11 May 2021 at 18:19, chri...@grothesque.org wrote:

> The problem does not occur when numpy is not imported, or when it's imported after mpi4py.

Is this a NumPy build using MKL?
 
> The above happens with mpi4py 2.0.0 as included in Debian buster.  We also tried the latest mpi4py: the problem persists.

This is most likely not an mpi4py problem. The issue occurs deep down in MPI_Init(), and mpi4py has had no chance yet to do anything.
 
> I searched on this list for possible solutions.

Can you please set the following environment variable before running?

export PMIX_MCA_gds=hash

PS: it is a workaround, not a fix. Search for it in the Open MPI issue tracker.
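
If exporting it globally is inconvenient, the parameter can in principle also
be set from Python before MPI gets initialized. A sketch; note that this only
changes the environment of the Python ranks themselves, not of the launch
daemons, so the export above is the safer option:

import os
# must run before the first "from mpi4py import MPI", because MPI_Init()
# picks up the PMIx MCA parameters from the process environment
os.environ.setdefault("PMIX_MCA_gds", "hash")
import numpy as np
from mpi4py import MPI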

--
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

Christoph Groth

May 15, 2021, 4:49:57 PM
to mpi...@googlegroups.com
Lisandro Dalcin wrote:
> On Tue, 11 May 2021 at 18:19, chri...@grothesque.org wrote:
> >
> > The problem does not occur when numpy is not imported, or when it's
> > imported after mpi4py.
>
> Is this a NumPy build using MKL?

No, it’s using OpenBLAS. Debian does include MKL, but only in its
"non-free" section, so regular packages cannot depend on it.

> Can you please set the following environment variable before running?
>
> export PMIX_MCA_gds=hash
>
> PS: it is a workaround, not a fix. Search for it in the Open MPI issue
> tracker.

This indeed makes the error message disappear. Many thanks for the
pointer, I’ll read up on it.

Cheers
Christoph