Why can't I pickle an md5 object? Is it because the md5 algorithm needs to
read 512 bits at a time?
I need to md5() some stream, pause (python.exe quits), and resume
later. It seems that md5 and hashlib objects from the standard library
can't be serialized?
Do I have to reimplement the md5 algorithm for this special occasion?
Or is there any way to assign a digest state when creating md5 objects?
I'm sure some of the regulars can correct me if I'm wrong, but looking
at the source code, it seems that this is the error that you'll see if
the object doesn't explicitly support pickling, or possibly isn't
composed of objects that do.
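The same limitation is still present in modern Python 3: `hashlib` objects raise `TypeError` when pickled. Their `copy()` method does clone the running state, which is enough to branch a hash within one process but not to survive the interpreter exiting. A small Python 3 sketch (`try_pickle` is a hypothetical helper, not part of any library):

```python
import hashlib
import pickle


def try_pickle(obj):
    """Return None if obj can be pickled, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


h = hashlib.md5(b"this is a ")
print(try_pickle(h))  # prints the TypeError message; wording varies by version

# copy() clones the running state, but only inside the current process
snapshot = h.copy()
h.update(b"short test")
snapshot.update(b"short test")
print(snapshot.hexdigest() == h.hexdigest())  # True
```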
Examining the md5 and hashlib source files, it seems that they rely on
C implementations, and so have internal states opaque to Python. If
you feel confident, you could write your own MD5 class that would have
methods to dump and restore state, but I think you're out of luck when
it comes to the official module.
Mark Sherry
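Mark's suggestion of a hand-written MD5 class that can dump and restore its state might be sketched in pure Python 3 as follows. The class name `ResumableMD5` and its `get_state`/`from_state` methods are hypothetical; the round functions, constants, and padding follow RFC 1321. The whole state (four 32-bit chaining values, the not-yet-processed tail of the input, and the byte count) lives in ordinary attributes, so instances also pickle by default, and `get_state()` returns a JSON-friendly dict.

```python
import math
import struct

# Per-step left-rotation amounts (RFC 1321)
_S = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
      [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Sine-derived constants: T[i] = floor(2**32 * |sin(i + 1)|)
_T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
_MASK = 0xFFFFFFFF


def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & _MASK


class ResumableMD5:
    """Pure-Python MD5 whose intermediate state can be saved and restored."""

    def __init__(self, data=b""):
        self._abcd = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
        self._buffer = b""   # bytes not yet forming a full 64-byte block
        self._length = 0     # total number of bytes fed so far
        self.update(data)

    def _compress(self, block):
        m = struct.unpack("<16I", block)
        a, b, c, d = self._abcd
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            # Masking keeps everything in 32 bits even when ~ made f negative
            f = (f + a + _T[i] + m[g]) & _MASK
            a, d, c = d, c, b
            b = (b + _rotl(f, _S[i])) & _MASK
        self._abcd = [(x + y) & _MASK for x, y in zip(self._abcd, (a, b, c, d))]

    def update(self, data):
        data = bytes(data)
        self._length += len(data)
        buf = self._buffer + data
        # Compress every complete 64-byte block; keep the remainder buffered
        full = len(buf) - len(buf) % 64
        for off in range(0, full, 64):
            self._compress(buf[off:off + 64])
        self._buffer = buf[full:]

    def digest(self):
        # Pad temporarily, then restore, so the object can keep accepting data
        saved = (self._abcd, self._buffer, self._length)
        bit_len = (self._length * 8) & 0xFFFFFFFFFFFFFFFF
        self.update(b"\x80" + b"\x00" * ((55 - self._length) % 64)
                    + struct.pack("<Q", bit_len))
        result = struct.pack("<4I", *self._abcd)
        self._abcd, self._buffer, self._length = saved
        return result

    def hexdigest(self):
        return self.digest().hex()

    def get_state(self):
        """Return a JSON-serializable snapshot of the running hash."""
        return {"abcd": list(self._abcd), "buffer": self._buffer.hex(),
                "length": self._length}

    @classmethod
    def from_state(cls, state):
        """Rebuild a hash object from a get_state() snapshot."""
        obj = cls()
        obj._abcd = list(state["abcd"])
        obj._buffer = bytes.fromhex(state["buffer"])
        obj._length = state["length"]
        return obj
```

To pause and resume, one could save `get_state()` together with the current file offset, then on restart seek to that offset and continue from `from_state()`. Being pure Python, it is far slower than `hashlib`, which matters for gigabyte-sized inputs.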
Can you just pickle the stream, the part of it you've read so far?
Wow. It's a gigabyte-sized file. I need to read it as a stream and md5 it
as I go, and the process may be interrupted for a while.
So use generators and consume the stream?
--JamesMills
-- "Problems are solved by method"
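James's suggestion, reading the file in fixed-size chunks through a generator, might look like this in Python 3 (`read_chunks`, `md5_of_stream`, and the 1 MiB default chunk size are illustrative choices). It keeps memory use flat even for a gigabyte-sized file, but the digest object still lives only in memory, so on its own it does not survive the interpreter exiting.

```python
import hashlib


def read_chunks(stream, chunk_size=1 << 20):
    """Yield successive chunks from a binary stream until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def md5_of_stream(stream, chunk_size=1 << 20):
    """Feed the stream to MD5 piece by piece; memory use stays ~chunk_size."""
    h = hashlib.md5()
    for chunk in read_chunks(stream, chunk_size):
        h.update(chunk)
    return h.hexdigest()
```

Typical use would be `with open(path, "rb") as f: print(md5_of_stream(f))`, where `path` names the large file.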
No, I need to serialize the half-finished digest, not the file stream.
Has anyone got a solution?
I am looking at '_hashopenssl.c'. If you can find the implementation
of EVP_DigestUpdate, I'll give it a shot and help you write a ctypes
hack to save and restore its state.
http://cvs.openssl.org/fileview?f=openssl/crypto/evp/digest.c
int EVP_DigestUpdate(EVP_MD_CTX *ctx, const void *data,
                     size_t count)
{
#ifdef OPENSSL_FIPS
    FIPS_selftest_check();
#endif
    return ctx->digest->update(ctx,data,count);
}
Is this the one?
>>>> import md5
>>>> a=md5.md5()
>>>> import pickle
>>>> pickle.dumps(a)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Python25\lib\pickle.py", line 1366, in dumps
> Pickler(file, protocol).dump(obj)
> File "C:\Python25\lib\pickle.py", line 224, in dump
> self.save(obj)
> File "C:\Python25\lib\pickle.py", line 306, in save
> rv = reduce(self.proto)
> File "C:\Python25\lib\copy_reg.py", line 69, in _reduce_ex
> raise TypeError, "can't pickle %s objects" % base.__name__
> TypeError: can't pickle HASH objects
Yep, they're implemented in C and have no provision for serializing.
If you can use the old _md5 module, it is far simpler to serialize; an
md5object just contains a small struct with 6 integers and 64 chars, no
pointers.
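The magic numbers used with ctypes can be checked with a `ctypes.Structure` that mirrors a state of "6 integers and 64 chars". This Python 3 sketch assumes 32-bit integers; the field names and their order are illustrative assumptions, not copied from the C source (the total is 88 bytes in any order, with no padding). On a 32-bit CPython build the object header is 8 bytes (reference count plus type pointer), giving the 96-byte `__basicsize__`. Since nothing in the struct is a pointer, its raw bytes form a complete snapshot that can be copied out and back in.

```python
import ctypes


class MD5State(ctypes.Structure):
    # Illustrative layout: 6 x 32-bit words plus a 64-byte block buffer.
    # Field names and order are assumptions, not taken from the C source.
    _fields_ = [
        ("abcd", ctypes.c_uint32 * 4),    # chaining values A, B, C, D
        ("count", ctypes.c_uint32 * 2),   # message length in bits (low, high)
        ("buffer", ctypes.c_ubyte * 64),  # partially filled input block
    ]


STATE_SIZE = ctypes.sizeof(MD5State)   # 4*4 + 2*4 + 64 = 88 bytes
HEADER_SIZE_32BIT = 4 + 4              # ob_refcnt + ob_type on a 32-bit build
print(STATE_SIZE, HEADER_SIZE_32BIT + STATE_SIZE)  # 88 96

# No pointers inside, so the raw bytes are a complete snapshot of the state
state = MD5State()
for i, word in enumerate((0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)):
    state.abcd[i] = word
blob = bytes(state)                     # 88 raw bytes, native byte order
restored = MD5State.from_buffer_copy(blob)
```

Because the bytes are in native byte order, such a snapshot is only safely portable between machines of the same architecture.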
With some help from ctypes (and a lot of black magic!) one can extract the
desired state, and restore it afterwards:
--- begin code ---
import _md5
import ctypes

# Assumes a 32-bit build: an 8-byte object header (refcount + type
# pointer) followed by the 88-byte MD5 state, 96 bytes in total.
assert _md5.MD5Type.__basicsize__==96

def get_md5_state(m):
    if type(m) is not _md5.MD5Type:
        raise TypeError, 'not a _md5.MD5Type instance'
    # Copy out the raw 88 bytes of state that follow the object header
    return ctypes.string_at(id(m)+8, 88)

def set_md5_state(m, state):
    if type(m) is not _md5.MD5Type:
        raise TypeError, 'not a _md5.MD5Type instance'
    if not isinstance(state,str):
        raise TypeError, 'state must be str'
    if len(state)!=88:
        raise ValueError, 'len(state) must be 88'
    a88 = ctypes.c_char*88
    pstate = a88(*list(state))
    # Overwrite the object's internal state in place (no safety net!)
    ctypes.memmove(id(m)+8, ctypes.byref(pstate), 88)
--- end code ---
py> m1 = _md5.new()
py> m1.update("this is a ")
py> s = get_md5_state(m1)
py> del m1
py>
py> m2 = _md5.new()
py> set_md5_state(m2, s)
py> m2.update("short test")
py> print m2.hexdigest()
95ad1986e9a9f19615cea00b7a44b912
py> print _md5.new("this is a short test").hexdigest()
95ad1986e9a9f19615cea00b7a44b912
The code above was only tested with Python 2.5.2 on Windows, and no further
than what you can see here. It might or might not work with other versions or platforms.
It may even create a (small) black hole and eat your whole town. Use at
your own risk.
--
Gabriel Genellina
Oops, I needed 'EVP_MD_CTX'. I went Googling and found it.
But does Gabriel's work for you?
WOW! I never expected Python could be coded like that! Thanks a lot!
On Oct 2, 5:19 pm, "Aaron \"Castironpi\" Brady" <castiro...@gmail.com>
> http://www.google.com/codesearch?hl=en&q=struct+EVP_MD_CTX+show:mV3VB...
>
> But does Gabriel's work for you?
I still need some hack for py2.5 on Linux. Maybe I just need to soft-link
/usr/lib/python2.4/lib-dynload/md5.so to py2.5 :-)
py2.4 is pre-installed on most of the servers, I think.