cPickle.UnpicklingError: invalid load key


Fadhel Al-Hashim

Jul 10, 2009, 2:03:52 AM
to mpi...@googlegroups.com
Hi,

I'm using mpi4py.MPI.Comm.scatter to scatter 96 numpy 2D arrays that get processed on different nodes. Almost always, a few of the nodes (different ones every time) fail with something similar to this:

Traceback (most recent call last):
  File "run.py", line 67, in <module>
    ws = comm.scatter(w,0)
  File "Comm.pyx", line 767, in mpi4py.MPI.Comm.scatter (src/mpi4py_MPI.c:48619)
  File "pickled.pxi", line 354, in mpi4py.MPI.PyMPI_scatter (src/mpi4py_MPI.c:20896)
  File "pickled.pxi", line 84, in mpi4py.MPI._p_Pickler.load (src/mpi4py_MPI.c:17922)
  File "pickled.pxi", line 27, in mpi4py.MPI.PyMPI_Load (src/mpi4py_MPI.c:17078)
cPickle.UnpicklingError: invalid load key, '¿'.
Timeout for rank 7 hostname 'node35'. Job is not finalized there.
Cleaning up all processes ...
Some rank on 'node24' exited without finalize.
done.


All other nodes work perfectly. Any ideas what is causing this?

Appreciate your help. Thanks,

Fadhel





CODE:

import rsf, numpy, pyct, window, unwindow, CreateThresh3D, bayesSolver
from mpi4py import MPI
from window import *
from unwindow import *
from CreateThresh3D import *
from bayesSolver import *
import os


comm = MPI.COMM_WORLD


W = WindowData(10, array([6, 4, 4]), array([20]))
w = []
n1 = 0
n2 = 0
n3 = 0
prims = "b1_3Dp.rsf"
mults = "b2_3Dp.rsf"
truep = "true.rsf"
niter = 1
mu = 1.2
lamb1 = 1.2
lamb2 = 0.8
x = []
wsize = []

if comm.Get_rank() == 0:
    .......code that gets arrays

S = Bayes(niter, mu, lamb1, lamb2)

# scatter is collective, so every rank makes this call
ws = comm.scatter(w, 0)

A = fdct3(ws[0].shape, 6, 8, False)

if comm.Get_rank() >= 0:      # will be removed later
    print("solving...")
    x1, x2 = S.solve(A, ws[0], ws[1])
    print("solved!")

x = [x1, x2]

g = comm.gather(x, 0)

................. more unrelated code

MPI.Finalize()

Lisandro Dalcin

Jul 10, 2009, 2:00:08 PM
to mpi...@googlegroups.com
I've never gotten such failures... It seems like your nodes are
receiving garbage, so cPickle cannot reconstruct the object.

0) Are you using the latest release of mpi4py? What MPI implementation
are you using?

1) What kind of network does your cluster have? It could be some kind
of network failure...

2) Are your arrays very large? Because of MPI limits, mpi4py cannot
handle more than 2GB in a single "transaction". This means that all
your pickled arrays, added up at the root, cannot exceed 2GB.

3) Do your nodes have ALL the same Python version and NumPy version?

4) Could you try using a loop of send() calls at the root, with a
matching recv() at each worker?

BTW, remove the final call to MPI.Finalize(). mpi4py handles MPI
initialization/finalization for you. You should not call
MPI.Finalize() unless you really need it and are really sure of what
you are doing.


Finally, could you write a simpler script, using some random arrays,
that reproduces the failure, and send it to me for testing on my
side?
--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

Fadhel

Jul 12, 2009, 2:14:11 AM
to mpi4py
Sorry everyone, my bad!

I was scattering unclosed files.....

Everything is working fine now.

THANK YOU SO MUCH

Lisandro Dalcin

Jul 12, 2009, 7:29:49 PM
to mpi...@googlegroups.com
On Sun, Jul 12, 2009 at 3:14 AM, Fadhel<fad...@gmail.com> wrote:
>
> Sorry everyone, my bad!
>
> I was scattering unclosed files.....
>
> Everything is working fine now.
>
> THANK YOU SO MUCH
>

What do you mean by "scattering unclosed files" ??

Fadhel

Jul 12, 2009, 9:22:43 PM
to mpi4py
I was scattering a list of file names that are going to be read and
used on the other nodes. These files were created and written earlier
in my code, and I had forgotten to close the file streams.

Once I closed the streams, everything worked fine.

Thanks again!

toila...@gmail.com

Sep 18, 2016, 4:11:03 AM
to mpi4py
Hi, I am using mpi4py 2.0.0. I get the "UnpicklingError: invalid load key" when I use "comm.isend()". If I change it to "comm.send()", everything works fine. Can you have a look at the attachments to see what the error is?

Thanks,
Tung.

On Friday, July 10, 2009 at 20:00:08 UTC+2, Lisandro Dalcin wrote:
bugs.pbs
bugs.py

Lisandro Dalcin

Sep 18, 2016, 4:16:12 AM
to mpi4py

On 15 September 2016 at 19:02, <toila...@gmail.com> wrote:
Hi, I am using mpi4py 2.0.0. I get the "UnpicklingError: invalid load key" when I use "comm.isend()". If I change it to "comm.send()", everything works fine. Can you have a look at the attachments to see what the error is?


The "isend()" method returns a request object, you have to store these request instances somewhere and eventually wait for completion, i.e., something like:

requests = []

for ...:
    req = comm.isend(...)
    requests.append(req)

MPI.Request.Waitall(requests)




--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459

toila...@gmail.com

Sep 21, 2016, 11:07:54 AM
to mpi4py
It works for me now, but can you briefly explain the underlying logic of why I eventually have to wait for completion? Is it related to garbage collection?

On Sunday, September 18, 2016 at 10:16:12 UTC+2, Lisandro Dalcin wrote:

趙睿

Jul 6, 2017, 2:45:34 PM
to mpi4py
I think it is related to garbage collection, because one of the problems I was experiencing (a pickle-garbage issue) disappeared just by storing all the requests (not only the latest one).

Actually, I wasted many days on that "bug"... Maybe the documentation could state this more clearly...

On Wednesday, September 21, 2016 at 4:07:54 PM UTC+1, Tung Vu Xuan wrote:

Lisandro Dalcin

Jul 7, 2017, 2:55:46 PM
to mpi4py
On 21 September 2016 at 18:07, <toila...@gmail.com> wrote:
> It works for me now, but can you briefly explain the underlying logic
> of why I eventually have to wait for completion?

This requirement is explicitly stated in the MPI standard. Every MPI
book or tutorial out there should cover it.

> Is there anything related to
> garbage collection?
>

There are some connections. The MPI.Request instance holds a reference
to the memory buffer being communicated. The GC deallocates memory; if
that happens before the communication completes, you risk segfaulting
or communicating garbage.

Lisandro Dalcin

Jul 7, 2017, 3:06:37 PM
to mpi4py
On 6 July 2017 at 21:45, 趙睿 <renyu...@gmail.com> wrote:
>
> Actually I wasted many days on that "bug"... Maybe the document could state
> this more clearly...
>

The documentation is quite lacking, but unfortunately I don't have the
resources to spend more time on it. Do you have any clue how much time
I "wasted" developing and maintaining mpi4py over more than 10 years?