TypeError: can't pickle HASH objects?

est

Oct 1, 2008, 3:50:05 PM
>>> import md5
>>> a=md5.md5()
>>> import pickle
>>> pickle.dumps(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\pickle.py", line 1366, in dumps
    Pickler(file, protocol).dump(obj)
  File "C:\Python25\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python25\lib\pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "C:\Python25\lib\copy_reg.py", line 69, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle HASH objects

Why can't I pickle an md5 object? Is it because the md5 algorithm needs to
read 512 bits at a time?

I need to md5() a stream, pause (python.exe quits), and resume
later. It seems that the md5 and hashlib objects in the standard library
cannot be serialized?

Do I have to implement the md5 algorithm again for this special case?

Or is there any way to assign a digest when creating md5 objects?
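
For reference, the same failure is easy to reproduce on a current interpreter. The sketch below assumes Python 3 and hashlib (the md5 module used in the session above was removed in Python 3). It also shows that copy() gives an in-process snapshot of a half-finished digest, but that snapshot is no more picklable than the original, so it does not survive the interpreter exiting.

```python
import hashlib
import pickle

h = hashlib.md5()
h.update(b"this is a ")

# Hash objects expose no picklable state, so pickling raises TypeError.
try:
    pickle.dumps(h)
    pickled_ok = True
except TypeError as exc:
    pickled_ok = False
    print("pickle failed:", exc)

# copy() clones the internal state, but only within the running process.
snapshot = h.copy()
h.update(b"short test")
snapshot.update(b"short test")
print(snapshot.hexdigest() == h.hexdigest())
```

There is no constructor argument that accepts a previous digest: the digest is the output of the final padding step, not the resumable internal state.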

mdsh...@gmail.com

Oct 1, 2008, 4:49:24 PM

I'm sure some of the regulars can correct me if I'm wrong, but looking
at the source code, it seems that this is the error that you'll see if
the object doesn't explicitly support pickling, or possibly isn't
composed of objects that do.

Examining the md5 and hashlib source files, it seems that they rely on
C implementations, and so have internal states opaque to Python. If
you feel confident, you could write your own MD5 class that would have
methods to dump and restore state, but I think you're out of luck when
it comes to the official module.

Mark Sherry
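
A minimal sketch of the "write your own MD5 class" route, assuming Python 3. The class name ResumableMD5 and its get_state/from_state methods are made up for illustration; the algorithm follows RFC 1321. Pure Python is far slower than the C implementation, so the point here is that the resumable state (four 32-bit words, a partial block of up to 63 bytes, and a byte count) is small and easy to serialize, not that this is fit for gigabytes as-is.

```python
import hashlib
import math
import pickle
import struct

# Per-step rotation amounts and sine-derived constants from RFC 1321.
_SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
_K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
_MASK = 0xFFFFFFFF


class ResumableMD5:
    """Pure-Python MD5 whose half-finished state is plain, picklable data."""

    def __init__(self):
        self.words = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
        self.tail = b""   # bytes not yet forming a full 64-byte block
        self.length = 0   # total number of bytes fed in so far

    def _compress(self, block):
        m = struct.unpack("<16I", block)
        a, b, c, d = self.words
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + _K[i] + m[g]) & _MASK
            rotated = ((f << _SHIFTS[i]) | (f >> (32 - _SHIFTS[i]))) & _MASK
            a, d, c, b = d, c, b, (b + rotated) & _MASK
        self.words = tuple((x + y) & _MASK for x, y in zip(self.words, (a, b, c, d)))

    def update(self, data):
        self.length += len(data)
        data = self.tail + data
        full = len(data) - len(data) % 64
        for offset in range(0, full, 64):
            self._compress(data[offset:offset + 64])
        self.tail = data[full:]

    def hexdigest(self):
        # Pad a throwaway copy so the real object can keep accepting data.
        clone = ResumableMD5.from_state(self.get_state())
        padding = b"\x80" + b"\x00" * ((55 - self.length) % 64)
        clone.update(padding + struct.pack("<Q", (self.length * 8) % 2**64))
        return struct.pack("<4I", *clone.words).hex()

    def get_state(self):
        return (self.words, self.tail, self.length)

    @classmethod
    def from_state(cls, state):
        obj = cls()
        obj.words, obj.tail, obj.length = state
        return obj


h = ResumableMD5()
h.update(b"this is a ")
saved = pickle.dumps(h.get_state())   # small enough to write to disk

resumed = ResumableMD5.from_state(pickle.loads(saved))
resumed.update(b"short test")
print(resumed.hexdigest() == hashlib.md5(b"this is a short test").hexdigest())
```

Pickling the state tuple rather than the object itself keeps the saved file independent of where the class is defined.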

Aaron "Castironpi" Brady

Oct 1, 2008, 11:27:05 PM

Can you just pickle the stream, the part of it you've read so far?

est

Oct 2, 2008, 1:34:42 AM
On Oct 2, 11:27 am, "Aaron \"Castironpi\" Brady"
> Can you just pickle the stream, the part of it you've read so far?

Wow, it's a gigabyte-sized file. I need to read it as a stream and md5 it
as I go. The process may be interrupted for a while.

James Mills

Oct 2, 2008, 1:51:47 AM
to est, pytho...@python.org
On Thu, Oct 2, 2008 at 3:34 PM, est <electr...@gmail.com> wrote:
> Wow, it's a gigabyte-sized file. I need to read it as a stream and md5 it
> as I go. The process may be interrupted for a while.

So use generators and consume the stream?

--JamesMills

--
--
-- "Problems are solved by method"
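
A minimal sketch of the generator approach, assuming Python 3 and hashlib; read_chunks and md5_of_stream are illustrative names, and the 64 KiB chunk size is arbitrary. It keeps memory use flat for a multi-gigabyte file, although the digest object still lives only as long as the process does.

```python
import hashlib
import io


def read_chunks(stream, chunk_size=64 * 1024):
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def md5_of_stream(stream):
    """Feed a stream into md5 chunk by chunk and return the hex digest."""
    digest = hashlib.md5()
    for chunk in read_chunks(stream):
        digest.update(chunk)
    return digest.hexdigest()


# An in-memory stream stands in for the large file here.
sample = b"abc" * 100000
print(md5_of_stream(io.BytesIO(sample)) == hashlib.md5(sample).hexdigest())
```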

est

Oct 2, 2008, 3:44:12 AM
On Oct 2, 1:51 pm, "James Mills" <prolo...@shortcircuit.net.au> wrote:

> On Thu, Oct 2, 2008 at 3:34 PM, est <electronix...@gmail.com> wrote:
> > Wow, it's a gigabyte-sized file. I need to read it as a stream and md5 it
> > as I go. The process may be interrupted for a while.
>
> So use generators and consume the stream?
>
> --JamesMills
>
> --
> --
> -- "Problems are solved by method"

No, I need to serialize the half-finished digest, not the file stream.

Does anyone have a solution?

Aaron "Castironpi" Brady

Oct 2, 2008, 4:22:52 AM

I am looking at '_hashopenssl.c'. If you can find the implementation
of EVP_DigestUpdate, I'll give it a shot to help you write a ctypes
hack to store and write its state.

est

Oct 2, 2008, 5:03:13 AM
On Oct 2, 4:22 pm, "Aaron \"Castironpi\" Brady" <castiro...@gmail.com>
wrote:
> hack to store and write its state.


http://cvs.openssl.org/fileview?f=openssl/crypto/evp/digest.c

int EVP_DigestUpdate(EVP_MD_CTX *ctx, const void *data,
                     size_t count)
{
#ifdef OPENSSL_FIPS
    FIPS_selftest_check();
#endif
    return ctx->digest->update(ctx,data,count);
}


Is this the one?

Gabriel Genellina

Oct 2, 2008, 5:07:09 AM
to pytho...@python.org
On Wed, 01 Oct 2008 16:50:05 -0300, est <electr...@gmail.com> wrote:

> >>> import md5
> >>> a=md5.md5()
> >>> import pickle
> >>> pickle.dumps(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python25\lib\pickle.py", line 1366, in dumps
>     Pickler(file, protocol).dump(obj)
>   File "C:\Python25\lib\pickle.py", line 224, in dump
>     self.save(obj)
>   File "C:\Python25\lib\pickle.py", line 306, in save
>     rv = reduce(self.proto)
>   File "C:\Python25\lib\copy_reg.py", line 69, in _reduce_ex
>     raise TypeError, "can't pickle %s objects" % base.__name__
> TypeError: can't pickle HASH objects
>
> Why can't I pickle an md5 object? Is it because the md5 algorithm needs to
> read 512 bits at a time?
>
> I need to md5() a stream, pause (python.exe quits), and resume
> later. It seems that the md5 and hashlib objects in the standard library
> cannot be serialized?

Yep, they're implemented in C and have no provision for serializing.
If you can use the old _md5 module, it is far simpler to serialize; a
md5object just contains a small struct with 6 integers and 64 chars, no
pointers.

With some help from ctypes (and a lot of black magic!) one can extract the
desired state, and restore it afterwards:

--- begin code ---
import _md5
import ctypes

# The offsets below assume the 96-byte object layout of this particular build.
assert _md5.MD5Type.__basicsize__==96

def get_md5_state(m):
    if type(m) is not _md5.MD5Type:
        raise TypeError('not a _md5.MD5Type instance')
    # In CPython, id(m) is the object's address. Skip the 8-byte object
    # header (96 - 88 on this build) and copy the 88 bytes of raw MD5 state.
    return ctypes.string_at(id(m)+8, 88)

def set_md5_state(m, state):
    if type(m) is not _md5.MD5Type:
        raise TypeError('not a _md5.MD5Type instance')
    if not isinstance(state,str):
        raise TypeError('state must be str')
    if len(state)!=88:
        raise ValueError('len(state) must be 88')
    a88 = ctypes.c_char*88
    pstate = a88(*list(state))
    # Overwrite m's internal state in place with the saved bytes.
    ctypes.memmove(id(m)+8, ctypes.byref(pstate), 88)

--- end code ---

py> m1 = _md5.new()
py> m1.update("this is a ")
py> s = get_md5_state(m1)
py> del m1
py>
py> m2 = _md5.new()
py> set_md5_state(m2, s)
py> m2.update("short test")
py> print m2.hexdigest()
95ad1986e9a9f19615cea00b7a44b912
py> print _md5.new("this is a short test").hexdigest()
95ad1986e9a9f19615cea00b7a44b912

The code above was only tested with Python 2.5.2 on Windows, not more than
you can see. It might or might not work with other versions or platforms.
It may even create a (small) black hole and eat your whole town. Use at
your own risk.

--
Gabriel Genellina
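
The 88 bytes and the 96-byte __basicsize__ check in the code above line up with the "6 integers and 64 chars" description. The sketch below, written for Python 3, spells out that arithmetic with a ctypes.Structure. The field names count, abcd and buf are an assumption based on the md5.h bundled with Python 2.5, and the sample blob is hand-built rather than taken from a live md5 object.

```python
import ctypes
import struct


class MD5State(ctypes.Structure):
    # Assumed layout: six 32-bit integers followed by a 64-byte block buffer.
    _fields_ = [
        ("count", ctypes.c_uint32 * 2),   # message length in bits
        ("abcd", ctypes.c_uint32 * 4),    # the four chaining words
        ("buf", ctypes.c_ubyte * 64),     # bytes waiting for a full block
    ]


STATE_SIZE = ctypes.sizeof(MD5State)   # 6 * 4 + 64
HEADER_SIZE = 96 - STATE_SIZE          # what remains for the object header

# Hand-built blob in native byte order: zero length, the standard MD5
# initial words, and an empty (zero-filled) buffer.
blob = struct.pack("=6I64s", 0, 0,
                   0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, b"")
state = MD5State.from_buffer_copy(blob)
print(STATE_SIZE, HEADER_SIZE, [hex(word) for word in state.abcd])
```

A header of a different size (for example on a 64-bit build) would shift every offset, which is one reason the original assert on __basicsize__ matters.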

Aaron "Castironpi" Brady

Oct 2, 2008, 5:19:07 AM

est

Oct 2, 2008, 6:06:38 AM
On Oct 2, 5:07 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> Gabriel Genellina

WOW! I never expected python could be coded like that! Thanks a lot!


On Oct 2, 5:19 pm, "Aaron \"Castironpi\" Brady" <castiro...@gmail.com>

> http://www.google.com/codesearch?hl=en&q=struct+EVP_MD_CTX+show:mV3VB...
>
> But does Gabriel's work for you?

I still need some hack for py2.5 on Linux. Maybe I just need to soft-link
/usr/lib/python2.4/lib-dynload/md5.so into py2.5 :-)
py2.4 is pre-installed on most of the servers, I think.