
Using "pickle" for interprocess communication - some notes and things that ought to be documented.


John Nagle

Jan 17, 2008, 2:28:26 PM
It's possible to use "pickle" for interprocess communication over
pipes, but it's not straightforward.

First, "pickle" output is self-delimiting.
Each dump ends with ".", and, importantly, "load" doesn't read
any characters after the "." So "pickle" can be used repeatedly
on the same pipe, and one can do repeated message-passing this way. This
is a useful, but undocumented, feature.
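The same behavior can be sketched in modern Python 3, with `io.BytesIO` standing in for the pipe:

```python
import io
import pickle

# Each dump() writes one self-delimiting record (protocol 0 ends with "."),
# and load() reads exactly one record and stops, so a single stream can
# carry a sequence of messages.
buf = io.BytesIO()
for msg in (1, "hello", [1, 2, 3]):
    pickle.dump(msg, buf, protocol=0)

buf.seek(0)
received = [pickle.load(buf) for _ in range(3)]
print(received)  # [1, 'hello', [1, 2, 3]]
```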

It almost works.

Pickle's "dump" function doesn't flush output after dumping, so
there's still some data left to be written. The sender has to
flush the underlying output stream after each call to "dump",
or the receiver will stall. The "dump" function probably ought to flush
its output file.
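The stall can be sketched like this (modern Python 3; an `io.BufferedWriter` over a `BytesIO` stands in for the buffered pipe end):

```python
import io
import pickle

raw = io.BytesIO()              # stands in for the pipe
out = io.BufferedWriter(raw)    # buffered, like a file object from Popen

pickle.dump({"seq": 1}, out)
unflushed = raw.getvalue()      # dump() alone leaves the bytes in the buffer
out.flush()                     # the sender must flush after each dump
flushed = raw.getvalue()

print(len(unflushed), len(flushed))
```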

It's also necessary to call Pickle's "clear_memo" before each "dump"
call, since objects might change between successive "dump" calls.
"Unpickle" doesn't have a "clear_memo" function. It should, because
if you keep reusing the "Unpickle" object, the memo dictionary
fills up with old objects which can't be garbage collected.
This creates a memory leak in long-running programs.
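A sketch of the `clear_memo` requirement in modern Python 3, where `Pickler.clear_memo()` does exist (the missing Unpickler counterpart is what the paragraph above describes):

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

state = {"count": 0}
p.dump(state)       # first message: full pickle of the dict
state["count"] = 1
p.clear_memo()      # forget the memoized dict; without this, the next
p.dump(state)       # dump would emit a stale memo back-reference

buf.seek(0)
first = pickle.load(buf)
second = pickle.load(buf)
print(first, second)  # {'count': 0} {'count': 1}
```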

Then, on Windows, there's a CR LF problem. This can be fixed by
launching the subprocess with

proc = subprocess.Popen(launchargs,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        universal_newlines=True)

Failure to do this produces the useful error message "Insecure string pickle".
Binary "pickle" protocol modes won't work at all in this situation; "universal
newline" translation is compatible, not transparent. On Unix/Linux, this
just works, but the code isn't portable.
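Why translation breaks the binary protocols can be demonstrated directly: binary pickle data may itself contain CR and LF bytes, and newline translation rewrites them. A Python 3 sketch, simulating the translation by hand:

```python
import pickle

original = "\r\n-separated"
data = pickle.dumps(original, protocol=2)  # binary pickle; payload contains \r\n
translated = data.replace(b"\r\n", b"\n")  # what CRLF->LF translation does

assert pickle.loads(data) == original      # the untranslated stream is fine
try:
    corrupted = pickle.loads(translated) != original
except Exception:
    corrupted = True                       # translation broke the stream
print(corrupted)  # True
```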

Incidentally, in the subprocess, it's useful to do

sys.stdout = sys.stderr

after setting up the Pickle objects. This prevents any stray print statements
from interfering with the structured Pickle output.
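The trick can be sketched like this in modern Python 3 (`io.BytesIO` stands in for the pipe; restoring `sys.__stdout__` afterwards is just for tidiness):

```python
import io
import pickle
import sys

pipe = io.BytesIO()              # stands in for the pipe to the parent
writer = pickle.Pickler(pipe)    # bound to the real output stream first
sys.stdout = sys.stderr          # stray prints now land on stderr...
print("debug chatter")           # ...instead of corrupting the pickle stream
writer.dump("payload")
sys.stdout = sys.__stdout__      # restore for anything that follows

pipe.seek(0)
result = pickle.load(pipe)
```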

Then there's end of file detection. When "load" reaches an end of
file, it properly raises EOFError. So it's OK to do "load" after
"load" until EOFError is raised.
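The receive loop, sketched in modern Python 3:

```python
import io
import pickle

stream = io.BytesIO()            # stands in for the pipe from the child
for msg in ("a", "b", "c"):
    pickle.dump(msg, stream)
stream.seek(0)

# Read messages until the sender closes its end and load() hits EOF.
messages = []
while True:
    try:
        messages.append(pickle.load(stream))
    except EOFError:
        break
print(messages)  # ['a', 'b', 'c']
```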

"pickle" and "cPickle" seem to be interchangeable in this application,
so that works.

It's a useful way to talk to a subprocess, but you need to know all the
issues above to make it work.

John Nagle

Christian Heimes

Jan 17, 2008, 3:55:52 PM
to pytho...@python.org
John Nagle wrote:
> It's possible to use "pickle" for interprocess communication over
> pipes, but it's not straightforward.

IIRC the processing module uses pickle for IPC. Maybe you can get some
idea by reading its code?

http://pypi.python.org/pypi/processing/0.40

Christian

Irmen de Jong

Jan 17, 2008, 7:24:57 PM

So does Pyro: http://pyro.sourceforge.net/

However, Pyro uses TCP/IP sockets for communication.

It uses a small header that contains the size of the message and a few other things,
and then the (binary by default) pickle stream.
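That framing scheme can be sketched as follows; the 4-byte big-endian length header is an illustrative assumption, not Pyro's actual wire format:

```python
import io
import pickle
import struct

def send_msg(stream, obj):
    payload = pickle.dumps(obj, protocol=2)
    # Fixed-size length header, then the binary pickle bytes.
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def recv_msg(stream):
    (size,) = struct.unpack(">I", stream.read(4))
    return pickle.loads(stream.read(size))

wire = io.BytesIO()              # stands in for the TCP socket
send_msg(wire, {"cmd": "ping"})
wire.seek(0)
reply = recv_msg(wire)
print(reply)  # {'cmd': 'ping'}
```

With an explicit size header the receiver never has to rely on the pickle stream being self-delimiting, which is why it also works for the binary protocols.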

--irmen


John Nagle

Jan 18, 2008, 1:32:25 AM
Irmen de Jong wrote:
> Christian Heimes wrote:
>> John Nagle wrote:
>>> It's possible to use "pickle" for interprocess communication over
>>> pipes, but it's not straightforward.
>>
>> IIRC the processing module uses pickle for IPC. Maybe you can get some
>> idea by reading its code?
>>
>> http://pypi.python.org/pypi/processing/0.40
>>
"Processing" is useful, but it uses named pipes and sockets,
not ordinary pipes. Also, it has C code, so all the usual build
and version problems apply.

> So does Pyro: http://pyro.sourceforge.net/
>
> However Pyro uses TCP-IP sockets for communication.
>
> It uses a small header that contains the size of the message and a few
> other things, and then the (binary by default) pickle stream.

I'd thought I might have to add another layer of encapsulation to
delimit "pickled" sections, but it turns out that's not necessary.
So it doesn't take much code to do this, and it's all Python.
I may release this little module.

John Nagle

John Nagle

Jan 18, 2008, 12:54:46 PM
John Nagle wrote:
> Irmen de Jong wrote:
>> Christian Heimes wrote:
>>> John Nagle wrote:
>>>> It's possible to use "pickle" for interprocess communication over
>>>> pipes, but it's not straightforward.

Another "gotcha". The "pickle" module seems to be OK with the
translations of "universal newlines" on Windows, but the "cPickle" module
is not. If I pickle

Exception("Test")

send it across the Windows pipe to the parent in universal newlines
mode, and read it with cPickle's

load()

function, I get

ImportError: No module named exceptions

If I read it with "pickle"'s "load()", it works. And if I read the input
one character at a time until I see ".", then feed that to cPickle's "loads()",
that works. So cPickle doesn't read the same thing Python does in "universal
newline" mode.

Is there any way within Python to get the pipe from a child process to the
parent to be completely transparent under Windows?

John Nagle

Carl Banks

Jan 18, 2008, 2:59:56 PM
On Jan 17, 2:28 pm, John Nagle <na...@animats.com> wrote:
> It's possible to use "pickle" for interprocess communication over
> pipes, but it's not straightforward.
>
> First, "pickle" output is self-delimiting.
> Each dump ends with ".", and, importantly, "load" doesn't read
> any characters after the "." So "pickle" can be used repeatedly
> on the same pipe, and one can do repeated message-passing this way. This
> is a useful, but undocumented, feature.
>
> It almost works.
>
> Pickle's "dump" function doesn't flush output after dumping, so
> there's still some data left to be written. The sender has to
> flush the underlying output stream after each call to "dump",
> or the receiver will stall. The "dump" function probably ought to flush
> its output file.


But... you can also write multiple pickles to the same file.

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cPickle
>>> f = open('xxx.pkl','wb')
>>> cPickle.dump(1,f)
>>> cPickle.dump('hello, world',f)
>>> cPickle.dump([1,2,3,4],f)
>>> f.close()
>>> f = open('xxx.pkl','rb')
>>> cPickle.load(f)
1
>>> cPickle.load(f)
'hello, world'
>>> cPickle.load(f)
[1, 2, 3, 4]

An automatic flush would be very undesirable there. Best to let those
worrying about IPC flush the output file themselves, which they
ought to be doing regardless (either by explicitly flushing or by using
an unbuffered stream).


> It's also necessary to call Pickle's "clear_memo" before each "dump"
> call, since objects might change between successive "dump" calls.
> "Unpickle" doesn't have a "clear_memo" function. It should, because
> if you keep reusing the "Unpickle" object, the memo dictionary
> fills up with old objects which can't be garbage collected.
> This creates a memory leak in long-running programs.

This is all good to know. I agree that this is a good use case for a
clear_memo on a pickle unloader.


> Then, on Windows, there's a CR LF problem. This can be fixed by
> launching the subprocess with
>
> proc = subprocess.Popen(launchargs,
>         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
>         universal_newlines=True)
>
> Failure to do this produces the useful error message "Insecure string pickle".
> Binary "pickle" protocol modes won't work at all in this situation; "universal
> newline" translation is compatible, not transparent. On Unix/Linux, this
> just works, but the code isn't portable.

I would think a better solution would be to use the -u switch to
launch the subprocess, or the PYTHONUNBUFFERED environment variable if
you want to invoke the Python script directly. It opens up stdin and
stdout in binary, unbuffered mode.
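This suggestion can be sketched in modern Python 3, where the binary pipe ends are reached through `sys.stdin.buffer` and `sys.stdout.buffer`; the inline child script here is hypothetical, for illustration only:

```python
import pickle
import subprocess
import sys

# Hypothetical child: unpickle one request from stdin, pickle a reply to stdout.
child_code = (
    "import pickle, sys\n"
    "msg = pickle.load(sys.stdin.buffer)\n"
    "pickle.dump(msg * 2, sys.stdout.buffer)\n"
    "sys.stdout.buffer.flush()\n"
)
# -u keeps the child's stdin/stdout unbuffered (on Python 2 it also
# forced binary mode, which was the point of this advice).
proc = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
)
out, _ = proc.communicate(pickle.dumps(21))
reply = pickle.loads(out)
print(reply)  # 42
```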

Using "universal newlines" in a non-text format seems like it's not a
good idea.

For text-format pickles it'd be the right thing, of course.


> Incidentally, in the subprocess, it's useful to do
>
> sys.stdout = sys.stderr
>
> after setting up the Pickle objects. This prevents any stray print statements
> from interfering with the structured Pickle output.

Nice idea.


> Then there's end of file detection. When "load" reaches an end of
> file, it properly raises EOFError. So it's OK to do "load" after
> "load" until EOFError is raised.
>
> "pickle" and "cPickle" seem to be interchangeable in this application,
> so that works.
>
> It's a useful way to talk to a subprocess, but you need to know all the
> issues above to make it work.

Thanks: this was an informative post.


Carl Banks

John Nagle

Jan 18, 2008, 4:19:12 PM
Carl Banks wrote:
> On Jan 17, 2:28 pm, John Nagle <na...@animats.com> wrote:

>
>> It's also necessary to call Pickle's "clear_memo" before each "dump"
>> call, since objects might change between successive "dump" calls.
>> "Unpickle" doesn't have a "clear_memo" function. It should, because
>> if you keep reusing the "Unpickle" object, the memo dictionary
>> fills up with old objects which can't be garbage collected.
>> This creates a memory leak in long-running programs.
>
> This is all good to know. I agree that this is a good use case for a
> clear_memo on a pickle unloader.

reader = pickle.Unpickler(self.datain) # set up reader
....
reader.memo = {} # no memory from cycle to cycle

>
>
>> Then, on Windows, there's a CR LF problem. This can be fixed by
>> launching the subprocess with
>>
>> proc = subprocess.Popen(launchargs,
>>         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
>>         universal_newlines=True)
>>
>> Failure to do this produces the useful error message "Insecure string pickle".
>> Binary "pickle" protocol modes won't work at all in this situation; "universal
>> newline" translation is compatible, not transparent. On Unix/Linux, this
>> just works, but the code isn't portable.
>
> I would think a better solution would be to use the -u switch to
> launch the subprocess, or the PYTHONUNBUFFERED environment variable if
> you want to invoke the Python script directly. It opens up stdin and
> stdout in binary, unbuffered mode.

Ah. That works. I wasn't aware that "unbuffered" mode also implied
binary transparency. I did that, and now cPickle works in both text (0)
and binary (2) protocol modes. Turned off "Universal Newline" mode.

> Thanks: this was an informative post

Thanks. We have this working well now. After a while, I'll publish
the module, which is called "subprocesscall.py".

John Nagle

Paul Boddie

Jan 18, 2008, 5:54:05 PM
On 18 Jan, 07:32, John Nagle <na...@animats.com> wrote:
>
> "Processing" is useful, but it uses named pipes and sockets,
> not ordinary pipes. Also, it has C code, so all the usual build
> and version problems apply.

The pprocess module uses pickles over sockets, mostly because the
asynchronous aspects of the communication only appear to work reliably
with sockets. See here for the code:

http://www.python.org/pypi/pprocess

Unlike your approach, pprocess employs the fork system call. In
another project of mine - jailtools - I use some of the pprocess
functionality with the subprocess module:

http://www.python.org/pypi/jailtools

I seem to recall that a few things are necessary when dealing with
subprocesses, especially those which employ the python executable:
running in unbuffered mode is one of those things.

Paul

John Nagle

Jan 19, 2008, 11:06:15 AM
Paul Boddie wrote:

> Unlike your approach, pprocess employs the fork system call.

Unfortunately, that's not portable. Python's "fork()" is
"Availability: Macintosh, Unix." I would have preferred
to use "fork()".

John Nagle

Paul Boddie

Jan 19, 2008, 11:11:32 AM

There was a discussion some time ago about providing a fork
implementation on Windows, since Cygwin attempts/attempted to provide
such support [1] and there's a Perl module which pretends to provide
fork (using threads if I recall correctly), but I'm not sure whether
anyone really believed that it was workable. I believe that on modern
releases of Windows it was the ZwCreateProcess function which was
supposed to be usable for this purpose, but you then apparently have
to add a bunch of other things to initialise the new process
appropriately.

Of course, for the purposes of pprocess - providing a multiprocess
solution which should be as easy to use as spawning threads whilst
having some shared, immutable state hanging around that you don't want
to think too hard about - having fork is essential, but if you're
obviously willing to split your program up into different components
then any of the distributed object technologies would be good enough.

Paul

[1] http://www.cygwin.com/ml/cygwin/2002-01/msg01826.html
