Safe Pickling

Heiko Wundram

May 23, 2003, 12:08:18 PM

Hi All!

I've read about safe/unsafe pickling in the documentation of Python 2.2,
which states that you need to make sure that find_global is set to None on
the Unpickler you get with cPickle.Unpickler().

The 2.3 documentation doesn't state anything on that topic (at least it
didn't two days ago). Can I assume that the default (and unchangeable)
behaviour is now to allow all pickles? (I haven't tried out the hint from
the 2.2 documentation yet... :))

Anyway, what I basically need is a module to pickle _basic_ Python
types, meaning string, int, float, list, tuple, dict, and boolean. As
the pickles that I get will be coming from a network, "unsafe" pickling
is not an option.
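
One way to get this with only the standard library, sketched here for current Python 3 rather than the 2.2/2.3 releases discussed in this thread, is to subclass pickle.Unpickler and refuse every global lookup in find_class. The names BasicTypesUnpickler and safe_loads are hypothetical, chosen for illustration. This only blocks classes and callables; it does not protect against oversized or otherwise resource-exhausting pickles.

```python
import io
import pickle


class BasicTypesUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup (hypothetical name)."""

    def find_class(self, module, name):
        # Every class, function or other global referenced by a pickle
        # is resolved through this method. Rejecting all of them leaves
        # only types with their own opcodes (str, int, float, bool,
        # list, tuple, dict) loadable.
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def safe_loads(data):
    """Unpickle bytes, allowing basic built-in types only."""
    return BasicTypesUnpickler(io.BytesIO(data)).load()


class Evil:
    """Stand-in for a hostile payload: asks the unpickler to call print()."""

    def __reduce__(self):
        return (print, ("this should never be printed",))


basic = safe_loads(pickle.dumps({"a": [1, 2.5, ("x", True)]}))

try:
    safe_loads(pickle.dumps(Evil()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Here basic comes back as an ordinary dict, while the Evil payload is rejected before print is ever looked up.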

I've read about the XML-Pickler and several other attempts at pickling
Python objects that are called "safe", but I really don't want to use
any non-standard module.

I've tried implementing my own pickler in the last few days, but that
seems like so much overhead for a small project that I'd be grateful
for any hint I can get...

Thanks in advance!

Heiko Wundram.

Paul Rubin

May 23, 2003, 10:12:17 PM

It's really problematic. I've been using marshal and carefully
examining anything that comes back from the unmarshaller. That looks
to be pretty safe, but isn't promised to be portable between versions.
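
A rough sketch of what "carefully examining" the result could look like, written for current Python 3. The helper names is_basic and checked_unmarshal and the depth limit of 50 are assumptions made for illustration, and the allowed types are the ones listed in the original question. The check runs after marshal.loads, so it says nothing about how the unmarshaller itself copes with malformed input.

```python
import marshal

# Scalar types accepted as-is; the list mirrors the original question.
_SCALARS = (str, int, float, bool)


def is_basic(obj, _depth=0, max_depth=50):
    """Return True if obj consists only of whitelisted basic types."""
    if _depth > max_depth:
        return False  # refuse suspiciously deep nesting
    kind = type(obj)  # exact type, so subclasses are rejected too
    if kind in _SCALARS:
        return True
    if kind in (list, tuple):
        return all(is_basic(item, _depth + 1, max_depth) for item in obj)
    if kind is dict:
        return all(
            is_basic(key, _depth + 1, max_depth)
            and is_basic(value, _depth + 1, max_depth)
            for key, value in obj.items()
        )
    return False  # code objects, sets, None and anything else


def checked_unmarshal(data):
    """Unmarshal data, then reject anything outside the whitelist."""
    obj = marshal.loads(data)
    if not is_basic(obj):
        raise ValueError("unexpected type in unmarshalled data")
    return obj
```

A marshalled code object, for example, loads without error but is then rejected by checked_unmarshal with a ValueError.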

Alex Martelli

May 24, 2003, 10:56:55 AM

Heiko Wundram wrote:
...

> Anyway, what I basically need is a module to pickle _basic_ Python
> types, meaning string, int, float, list, tuple, dict, and boolean. As

If that is all you need, then maybe module marshal, in the standard
Python library, may be sufficient?
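
For illustration, a minimal round trip through marshal with exactly those types, shown for current Python 3; the payload contents are made up. marshal refuses to serialize instances of user-defined classes, which is part of why it fits the "basic types only" requirement on the sending side.

```python
import marshal

# A made-up payload using only the basic types from the question.
payload = {
    "name": "spam",
    "count": 3,
    "ratio": 0.5,
    "tags": ["a", "b"],
    "pair": (1, 2),
    "ok": True,
}

blob = marshal.dumps(payload)   # bytes suitable for sending
restored = marshal.loads(blob)  # equal to payload, tuple stays a tuple


class Custom:
    """An arbitrary user-defined class, which marshal does not support."""


try:
    marshal.dumps(Custom())
    refused = False
except ValueError:
    # marshal raises ValueError for unsupported objects
    refused = True
```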

Alex

Heiko Wundram

May 24, 2003, 12:16:54 PM

On Sat, 2003-05-24 at 16:56, Alex Martelli wrote:
> If that is all you need, then maybe module marshal, in the standard
> Python library, may be sufficient?

The problem is that the marshal documentation explicitly states:

<quote>
Warning: The marshal module is not intended to be secure against
erroneous or maliciously constructed data. Never unmarshal data received
from an untrusted or unauthenticated source.
</quote>

I assume that you can cause e.g. a function or a module to be called if
you just send a .pyc file for unmarshalling...

I've started to create a stripped down pickler myself now, which just
pickles objects that are base Python objects; maybe this functionality
could be included in some future version of Python directly...

Heiko Wundram.

Paul Rubin

May 24, 2003, 6:25:09 PM

Heiko Wundram <he...@ph0enix.homelinux.org> writes:
> <quote>
> Warning: The marshal module is not intended to be secure against
> erroneous or maliciously constructed data. Never unmarshal data received
> from an untrusted or unauthenticated source.
> </quote>

I've examined the marshal source code from Python 2.2 (or was it 2.1)
and didn't see any obvious ways that merely unmarshalling malicious
data could hurt you. The danger is in what you DO with the data once
you've unmarshalled it. I.e. the marshalled data could contain nasty
compiled bytecode that will clobber you if you run it. But the
unmarshaller itself doesn't run the code. You're left with the
responsibility of checking the stuff that comes back from the
unmarshaller and making sure it only contains what you expect.
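
The distinction can be illustrated with a small experiment on current Python 3 (not the 2.1/2.2 sources examined above). It uses a well-formed payload built locally, so it shows only that loading a marshalled code object does not execute it; it is not evidence about how the unmarshaller handles deliberately malformed data. The variable names are made up for the example.

```python
import marshal

ran = []  # records whether the payload code ever executed

# Compile a statement with a visible side effect and marshal the
# resulting code object, roughly what the body of a .pyc file holds.
code = compile("ran.append('executed')", "<payload>", "exec")
blob = marshal.dumps(code)

obj = marshal.loads(blob)

# Loading only rebuilt a code object; nothing has run at this point.
ran_after_load = list(ran)
is_code = isinstance(obj, type(code))

# The side effect happens only if the receiver chooses to execute it.
exec(obj, {"ran": ran})
ran_after_exec = list(ran)
```

After the load, ran_after_load is still empty; only the explicit exec() call appends to ran. A receiver that checks for code objects and never executes them avoids this particular hazard.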

There is, of course, a danger that some future version of the
unmarshaller could actually run the nasty code, or use a data format
incompatible with the current versions, so that two network peers
running different versions of Python couldn't interoperate via
marshalled objects. It's also possible that I missed something when I
checked the 2.2 sources. But the unmarshal code is much simpler than
the unpickle code and has fewer places to go wrong.