Click on http://groups.google.com/group/apam-python-users/web/pickling-data-persistence
- or copy & paste it into your browser's address bar if that doesn't
work.
Have you benchmarked cPickle.dump(npyobj, fileobj, 2), i.e., using
pickle's protocol version 2?
--
Lisandro Dalcin
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
On 21 February 2010 15:46, Ian <ianla...@gmail.com> wrote:
> I wrote a couple functions that are useful for storing generic python
> objects containing large numpy arrays.
-- - Ian
Still, the differences are noticeable... A quick test (with a single
32MB array) on my box shows me that ary.dump(filename) (basically,
cPickle.dump with protocol 2) is faster than numpy.save(filename, ary)
(basically, npy format).
> One disadvantage with
> pickleme/unpickleme is that they can't be used to pickle anything in python
> that doesn't have a __dict__ attribute (say an integer). This could be
> corrected of course...
>
Sorry, can you elaborate on this limitation? The pickle protocol lets
you serialize objects without __dict__ (like built-in types, or those
you get with 'cdef class' in Cython). You just have to implement some
special methods, like __reduce__ and __setstate__, or use the
'copy_reg' module...
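
For illustration, here is a minimal sketch of both routes. The classes
Point and Interval are made up for this example (they are not from
pickleme/unpickleme), and __slots__ is used only as a pure-Python way to
get instances without a __dict__. It is written for Python 3, where
cPickle and copy_reg are called pickle and copyreg.

```python
import copyreg  # named copy_reg in Python 2
import pickle   # cPickle in Python 2


class Point(object):
    """Hypothetical class whose instances have no __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # Tell pickle how to rebuild the object: call Point(x, y).
        return (Point, (self.x, self.y))


class Interval(object):
    """Hypothetical class pickled through copyreg instead of __reduce__."""
    __slots__ = ("lo", "hi")

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi


def reduce_interval(iv):
    # Same (callable, args) contract as __reduce__, but kept outside the class.
    return (Interval, (iv.lo, iv.hi))


# Register the reduction function for Interval.
copyreg.pickle(Interval, reduce_interval)

p = Point(1.5, -2.0)
iv = Interval(0, 10)

# Round-trip both objects using pickle protocol 2.
q = pickle.loads(pickle.dumps(p, 2))
jv = pickle.loads(pickle.dumps(iv, 2))
```

The copyreg route is mostly useful when you cannot (or do not want to)
modify the class itself.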
-- - Ian
OK.. now I understand your comments...
> First though I need to experiment to see what is
> causing the slowdown of cPickle on my system.
> If cPickle is indeed faster
> than numpy.load/save, then there is no reason for my functions.
>
No, sorry, I wrote it wrong... It was the other way around. See for
yourself, np.save() seems to be (a bit) faster on my box (using Python
2.6.2 and numpy 1.3.0).
In [1]: import numpy as np
In [2]: a = np.ones(4e6)
In [3]: a.nbytes
Out[3]: 32000000
In [4]: %timeit a.dump('/tmp/npyarray.tmp')
10 loops, best of 3: 195 ms per loop
In [5]: %timeit np.save('/tmp/npyarray.tmp', a)  # note: saves to '/tmp/npyarray.tmp.npy'
10 loops, best of 3: 136 ms per loop
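
If you want to repeat this comparison outside IPython, something along
these lines should work. It is only a rough sketch: the temporary file
names and the helper functions dump_pickle/dump_npy are made up for the
example, pickle.dump with protocol 2 stands in for a.dump(), and the
numbers will depend on your disk, Python and numpy versions. Note that
recent numpy releases want an integer size, so np.ones(4000000) is used
rather than np.ones(4e6).

```python
import os
import pickle
import tempfile
import timeit

import numpy as np

a = np.ones(4000000)  # 4e6 float64 values, 32 MB in memory

# Hypothetical scratch files in a fresh temporary directory.
tmpdir = tempfile.mkdtemp()
pkl_path = os.path.join(tmpdir, "npyarray.pkl")
npy_path = os.path.join(tmpdir, "npyarray.npy")


def dump_pickle():
    # Pickle the array with protocol 2.
    with open(pkl_path, "wb") as f:
        pickle.dump(a, f, 2)


def dump_npy():
    # Write the array in the npy format.
    np.save(npy_path, a)


# Best of 3 single runs for each method.
t_pickle = min(timeit.repeat(dump_pickle, number=1, repeat=3))
t_npy = min(timeit.repeat(dump_npy, number=1, repeat=3))
print("pickle protocol 2: %.0f ms" % (t_pickle * 1e3))
print("np.save:           %.0f ms" % (t_npy * 1e3))

# Read both files back to check that the data survived.
with open(pkl_path, "rb") as f:
    from_pickle = pickle.load(f)
from_npy = np.load(npy_path)
```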
On 22 February 2010 17:18, Ian Langmore <ianla...@gmail.com> wrote:
> I'm not sure what is causing these differences. The limitations I spoke of
> are due to the way I wrote pickleme/unpickleme. I have them search through
> the object's dictionary. This could be changed if someone wanted to.
-- - Ian