I'm using mpi4py.MPI.Comm.scatter to scatter 96 2D numpy arrays that are then processed on different nodes. On almost every run, a few of the nodes (a different set each time) fail with an error similar to this:
Traceback (most recent call last):
  File "run.py", line 67, in <module>
    ws = comm.scatter(w,0)
  File "Comm.pyx", line 767, in mpi4py.MPI.Comm.scatter (src/mpi4py_MPI.c:48619)
  File "pickled.pxi", line 354, in mpi4py.MPI.PyMPI_scatter (src/mpi4py_MPI.c:20896)
  File "pickled.pxi", line 84, in mpi4py.MPI._p_Pickler.load (src/mpi4py_MPI.c:17922)
  File "pickled.pxi", line 27, in mpi4py.MPI.PyMPI_Load (src/mpi4py_MPI.c:17078)
cPickle.UnpicklingError: invalid load key, '¿'.
Timeout for rank 7 hostname 'node35'. Job is not finalized there.
Cleaning up all processes ...
Some rank on 'node24' exited without finalize.
done.
All other nodes work perfectly. Any ideas what is causing this?
Appreciate your help. Thanks,
Fadhel
CODE:
import rsf,numpy,pyct,window,unwindow,CreateThresh3D,bayesSolver
from mpi4py import MPI
from window import *
from unwindow import *
from CreateThresh3D import *
from bayesSolver import *
import os
comm = MPI.COMM_WORLD
W = WindowData(10,array([6,4,4]),array([20]))
w = []
n1 = 0
n2 = 0
n3 = 0
prims ="b1_3Dp.rsf"
mults ="b2_3Dp.rsf"
truep ="true.rsf"
niter = 1
mu = 1.2
lamb1 = 1.2
lamb2 = 0.8
x=[]
wsize =[];
if comm.Get_rank()==0:
    # .......code that gets arrays
S = Bayes(niter,mu,lamb1,lamb2)
if comm.Get_rank() == 0:
ws = comm.scatter(w,0)
A = fdct3(ws[0].shape,6,8,False)
if comm.Get_rank()>=0: #will be removed later
    print("solving...")
    x1,x2 = S.solve(A,ws[0],ws[1])
    print("solved!")
    x1 = x1;
    x2 = x2;
    x = [x1,x2]
g = comm.gather(x,0)
................. more unrelated code
MPI.Finalize()
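
For reference, here is a minimal sketch of the scatter pattern the script above is aiming for. It is not the actual run.py. It assumes that the list passed to comm.scatter on the root must contain exactly one entry per rank, and that every rank (not only rank 0) makes the call, since scatter is a collective operation. The names split_for_scatter and scatter_chunks are hypothetical helpers introduced only for illustration.

```python
def split_for_scatter(items, size):
    """Group `items` into exactly `size` chunks whose lengths differ by at most one.

    Hypothetical helper: comm.scatter expects one entry per rank on the root,
    so e.g. 96 arrays on 8 ranks become 8 chunks of 12 arrays each.
    """
    base, extra = divmod(len(items), size)
    chunks = []
    start = 0
    for rank in range(size):
        # The first `extra` ranks receive one additional item.
        stop = start + base + (1 if rank < extra else 0)
        chunks.append(items[start:stop])
        start = stop
    return chunks


def scatter_chunks(items, root=0):
    """Scatter `items` from `root`; must be called on ALL ranks (collective)."""
    # Imported here so split_for_scatter can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    if comm.Get_rank() == root:
        chunks = split_for_scatter(items, comm.Get_size())
    else:
        chunks = None  # non-root ranks contribute nothing to a scatter
    # Every rank calls scatter and gets back its own chunk.
    return comm.scatter(chunks, root=root)
```

In the script this would look roughly like `ws = scatter_chunks(w)` executed on every rank, followed by the unchanged `comm.gather(x,0)`. Whether a length mismatch or a rank-0-only call is actually behind the UnpicklingError is not established here; the sketch only shows the calling pattern that would rule those two causes out.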