shelve vs pickle

hungj...@yahoo.com

Sep 14, 2001, 4:28:50 PM

to pytho...@cwi.nl

Hi,

I've read a few articles in the mailing list, but the advantages of
shelve over pickle are not apparent to me. (These are two modules in
Python.)

(1) Shelve does not write to disk immediately, at least on the Windows
platform. So, if you are storing things to the shelve, you are often
doing it in RAM, not on disk. And if your program crashes before
you close the shelve, all changes are gone. Which may or may not be
what you want. For a good database, you choose when and how you
commit transactions. For shelve, can you choose the transactional
behavior? (So that it saves things more often?)

(2) Shelve, then, very often stores things just in RAM. I don't know
the details, but I guess that if the file is large (10GB? 100GB?), it
may be smart not to load everything into RAM.

(3) Or is the advantage of Shelve in the open() and close()
statements? Is it smart enough to save only those items that have
been modified? So that open() and close() are fast enough if only a
few items have been touched?

(4) In another test, I closed the Python program while it was in a
loop writing items to shelve. The database file got corrupted enough
that the majority of items are now missing.
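
On point (1), a minimal sketch assuming the shelve API of current
Python 3: a Shelf object has a sync() method that can be called between
writes. How much sync() actually pushes to disk depends on the
underlying dbm backend, and it is not a transaction commit. The
function names and the sync_every parameter below are illustrative
only, not part of shelve.

```python
import os
import shelve
import tempfile


def store_with_periodic_sync(path, items, sync_every=100):
    """Write (key, value) pairs to a shelf, syncing every `sync_every` writes."""
    with shelve.open(path) as shelf:
        for count, (key, value) in enumerate(items, start=1):
            shelf[key] = value      # pickles and stores this one value
            if count % sync_every == 0:
                shelf.sync()        # asks the dbm backend to flush, if it can
    # Leaving the with-block closes the shelf, which syncs one last time.


def load_all(path):
    """Read every entry back into a plain dict."""
    with shelve.open(path, flag="r") as shelf:
        return dict(shelf)


def demo():
    """Round-trip 250 entries through a shelf in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_shelf")
        store_with_periodic_sync(path, ((str(i), i * i) for i in range(250)))
        return load_all(path)
```

Calling sync() narrows the window in which a crash loses recent writes,
but it does not make an interrupted write safe.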

All in all, I am just wondering, just like some other people have
asked before in the newsgroup (without getting any real answer): when
is it good to use shelve?

I guess shelve can only be used safely if extra precautions are
taken:

(1) Before opening a shelve file, make sure to make a backup copy.
(This can be done at the moment of saving, too, depending on each
person's preference.)
(2) Delete the backup copy only if the shelve has been closed
successfully.
(3) Close the shelve file often, to make sure that your changes are
recorded.
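
The three steps above can be sketched as follows, assuming Python 3.
The helper names and the separate backup directory are hypothetical
choices. Matching files by prefix is also an assumption: some dbm
backends add extensions such as .db, .dat or .dir to the name you pass
in, so one shelf may be several files on disk.

```python
import glob
import os
import shelve
import shutil
import tempfile


def backup_shelf_files(path, backup_dir):
    """Copy every file whose name starts with `path` into `backup_dir`."""
    os.makedirs(backup_dir, exist_ok=True)
    copies = []
    # Prefix matching can also pick up unrelated files with a similar name,
    # so give the shelf a distinctive file name.
    for name in glob.glob(glob.escape(path) + "*"):
        if os.path.isfile(name):
            copies.append(shutil.copy2(name, backup_dir))
    return copies


def update_with_backup(path, changes, backup_dir):
    """Apply `changes` to the shelf, keeping a backup until it closes cleanly."""
    copies = backup_shelf_files(path, backup_dir)       # step (1)
    with shelve.open(path) as shelf:
        shelf.update(changes)
    # Step (2): this point is reached only if the shelf closed without raising.
    for name in copies:
        os.remove(name)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store")
        backup_dir = os.path.join(tmp, "backups")
        with shelve.open(path) as shelf:
            shelf["a"] = 1
        update_with_backup(path, {"b": 2}, backup_dir)
        with shelve.open(path) as shelf:
            return dict(shelf), os.listdir(backup_dir)
```

If the update raises, the copies stay in the backup directory and can
be restored by hand.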

However, you folks can see a contradiction here: in order to use
shelve safely, you have to copy the big external file every time you
want to open (or save) it. The benefit of fast opening and fast
closing is gone.

So, I still don't see any benefit from shelve. I can only see that to
do things safely I would use:

(1) a transactional database, or

(2) pickle or XML-serialize my Python objects in small units, so that
I can safely access them. ("Safely accessing" means the standard
procedure of keeping a backup copy before overwriting the file, and
using file renaming schemes to minimize the time of possible data
inconsistency if the program crashes unexpectedly, for instance, due
to power failure.)
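
One way to sketch the backup-and-rename procedure from option (2),
assuming Python 3 (os.replace) and small objects. The function names
and the ".bak" suffix are illustrative. A brief window remains between
the two renames in which only the backup exists, which is why the
loader falls back to it.

```python
import os
import pickle
import tempfile


def save_with_backup(obj, path):
    """Pickle `obj` to `path`, keeping the previous version as path + '.bak'."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new data to a temporary file in the same directory first,
    # so a crash during the write never touches the existing file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())    # push the bytes to disk before renaming
        if os.path.exists(path):
            os.replace(path, path + ".bak")     # old version becomes the backup
        os.replace(tmp_path, path)              # new version takes its place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_with_fallback(path):
    """Load `path`, or the backup if a crash happened between the two renames."""
    for candidate in (path, path + ".bak"):
        if os.path.exists(candidate):
            with open(candidate, "rb") as f:
                return pickle.load(f)
    raise FileNotFoundError(path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.pkl")
        save_with_backup({"version": 1}, path)
        save_with_backup({"version": 2}, path)
        current = load_with_fallback(path)
        previous = load_with_fallback(path + ".bak")
        return current, previous
```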

And I just don't see when to use shelve. If I have 1 million items to
store, I wouldn't use shelve: I'd use a transactional database. If I
have 100 items to save, I wouldn't use shelve: I'd just use pickle.

There is a small range of applicability for shelve: situations
where you (1) have many items, (2) need to modify many things, but
in a transactional fashion, typically spending 5 minutes to an hour
in the process, and (3) don't mind losing the changes if the
transaction is not completed. I guess you can use shelve for report
logging, for non-critical web session data, etc.

regards,

Hung Jung

Sheila King

Sep 15, 2001, 12:59:43 AM

On Fri, 14 Sep 2001 20:28:50 -0000, hungj...@yahoo.com wrote in
comp.lang.python in article
<mailman.1000499358...@python.org>:

:(3) Or is the advantage of Shelve in the open() and close()
:statements? Is it smart enough to save only those items that have
:been modified? So that open() and close() are fast enough if only a
:few items have been touched?
:
:(4) In another test, I closed the Python program while it was in a
:loop writing items to shelve. The database file got corrupted enough
:that the majority of items are now missing.
:
:All in all, I am just wondering, just like some other people have
:asked before in the newsgroup (without getting any real answer): when
:is it good to use shelve?

I really liked your questions. Right now, I am working with some
programs that use shelve. It is promoted rather highly in a number of
the Python books that I have purchased, so I just thought it was a good
thing, and I would use it. But you raise very interesting questions.
(more below)

:I guess shelve can only be used safely if extra precautions are
:taken:
:
:(1) Before opening a shelve file, make sure to make a backup copy.
:(This can be done at the moment of saving, too, depending on each
:person's preference.)
:(2) Delete the backup copy only if the shelve has been closed
:successfully.
:(3) Close the shelve file often, to make sure that your changes are
:recorded.

Here's something I noticed with shelve tonight:

I have a shelve database, and I delete all of the entries from it. When
I ask it to print the keys() for the database, I get an empty list, as I
would expect. However, if I view the contents of the database file, I
can see that all the entries are still there, and that in fact it is
taking up 11K of space. How can I compact those empty entries out to
regain disk space?
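
A common workaround, sketched here for Python 3 and not specific to any
one backend, is to copy the surviving entries into a fresh shelf and
then swap the files. Whether deleted space is reused or reclaimed
differs between dbm backends, so the sketch checks only the contents,
not file sizes. The function names are illustrative.

```python
import os
import shelve
import tempfile


def copy_live_entries(old_path, new_path):
    """Copy every remaining entry into a brand-new shelf; return the count."""
    copied = 0
    with shelve.open(old_path, flag="r") as old:
        # flag="n" always starts from a new, empty database.
        with shelve.open(new_path, flag="n") as new:
            for key in old:
                new[key] = old[key]
                copied += 1
    return copied


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old_shelf")
        new_path = os.path.join(tmp, "new_shelf")
        with shelve.open(old_path) as shelf:
            for i in range(200):
                shelf[str(i)] = "x" * 50
            for i in range(2, 200):
                del shelf[str(i)]       # keys() shrinks; the file may not
        count = copy_live_entries(old_path, new_path)
        with shelve.open(new_path, flag="r") as shelf:
            return count, sorted(shelf.keys())
```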

Also, when I use whichdb to guess the type of database, it tells me that
it is a dbhash. However, I'm using Python on a Win98 machine, and I've
not installed the Sleepycat database on it. I was really surprised by
that, too. I would have thought that it would have been the dumbdbm
module.
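
For readers on Python 3, where the whichdb module became the function
dbm.whichdb(), the same check can be sketched like this. The wrapper
name and its return strings are illustrative, and the backend reported
depends on which dbm modules the local Python build includes.

```python
import dbm
import os
import shelve
import tempfile


def describe_backend(path):
    """Return the dbm module name that dbm.whichdb() guesses for `path`."""
    kind = dbm.whichdb(path)
    if kind is None:
        return "missing or unreadable"
    if kind == "":
        return "unrecognized format"
    return kind     # e.g. "dbm.gnu", "dbm.ndbm" or "dbm.dumb"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "which_shelf")
        with shelve.open(path) as shelf:
            shelf["key"] = "value"
        return describe_backend(path), describe_backend(path + "_absent")
```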

--
Sheila King
http://www.thinkspot.net/sheila/
http://www.k12groups.org/

ko...@aesaeion.com

Sep 15, 2001, 2:35:36 AM

to hungj...@yahoo.com, pytho...@cwi.nl

On Fri, 14 Sep 2001 hungj...@yahoo.com wrote:

> Hi,
>
<snipped for brevity>

> So, I still don't see any benefit from shelve. I can only see that to
> do things safely I would use:
>

What you might want to consider is using the ZODB. It is the object
database that backs Zope, but it can easily be used separately from it.
It would give you transactions on your objects, rollbacks, version
locking, etc. Overall it seems better than either shelve or pickle are
by default. The ZODB is pretty lightweight and seems to run pretty
fast. I see the ZODB as largely replacing what shelve does.

> (1) a transactional database, or
>
> (2) pickle or XML-serialize my Python objects in small units, so that
> I can safely access them. ("Safely accessing" means the standard
> procedure of keeping a backup copy before overwriting the file, and
> using file renaming schemes to minimize the time of possible data
> inconsistency if the program crashes unexpectedly, for instance, due
> to power failure.)
>
> And I just don't see when to use shelve. If I have 1 million items to
> store, I wouldn't use shelve: I'd use a transactional database. If I
> have 100 items to save, I wouldn't use shelve: I'd just use pickle.
>
> There is a small range of applicability of shelve: in situations
> where you: (1) have many items, (2) need to modify many things, but
> in a transactional fashion, typically spending 5 minutes to an hour
> in the process. (3) Don't mind losing the changes if the transaction
> is not completed. I guess you can use shelve for report logging, for
> non-critical web session data, etc.
>
> regards,
>
> Hung Jung

Sheila King

Sep 15, 2001, 3:28:17 AM

On Sat, 15 Sep 2001 00:35:36 -0600 (MDT), ko...@aesaeion.com wrote in
comp.lang.python in article
<mailman.1000535778...@python.org>:

:What you might want to consider is using the ZODB. It is the object
:database that backs Zope, but it can easily be used separately from it.
:It would give you transactions on your objects, rollbacks, version
:locking, etc. Overall it seems better than either shelve or pickle are
:by default. The ZODB is pretty lightweight and seems to run pretty
:fast. I see the ZODB as largely replacing what shelve does.

Doesn't using ZODB require some sort of server (a database server
and/or web server)?

If so, I don't see how it would "replace" a module that has no such
requirement. Not all of us would be able to run a database server for
our scripts.

Piet van Oostrum

Sep 15, 2001, 1:29:15 PM

>>>>> Sheila King <she...@spamcop.net> (SK) writes:

SK> On Sat, 15 Sep 2001 00:35:36 -0600 (MDT), ko...@aesaeion.com wrote in
SK> comp.lang.python in article
SK> <mailman.1000535778...@python.org>:

SK> :What you might want to consider is using the ZODB. It is the object
SK> :database that backs Zope, but it can easily be used separately from it.
SK> :It would give you transactions on your objects, rollbacks, version
SK> :locking, etc. Overall it seems better than either shelve or pickle are
SK> :by default. The ZODB is pretty lightweight and seems to run pretty
SK> :fast. I see the ZODB as largely replacing what shelve does.

SK> Doesn't using ZODB require using some sort of server (database server
SK> and/or web server?)?

Not if you use the database from only one program at a time. It is
just a collection of modules that you use in your own program.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van....@hccnet.nl

Tom Loredo

Sep 17, 2001, 4:04:09 PM

Sheila King wrote:
>
> Doesn't using ZODB require using some sort of server (database server
> and/or web server?)?

Sheila-

I have only the barest of experience with ZODB (and that on a Mac),
but here goes....

If you use the basic "FileStorage" storage class in ZODB, it just
reads and writes a file in your file system. That's all I've
used; I don't have any access to a database server. Also, I may be
wrong here, but I believe ZODB uses pickle to store objects, while
providing a sophisticated yet simple "front end" to help you automate
storage and retrieval of complicated objects.

Peace,
Tom Loredo

Wilson Yeung

Sep 18, 2001, 5:27:12 PM

I've had good results with the Sleepycat/Berkeley DB via the
Python interface pybsddb as my underlying store, together with
the pickle module.
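
The general pattern of pickling values into a dbm-style key/value store
by hand can be sketched as below. This is not pybsddb: the bsddb
bindings are not part of the current standard library, so the sketch
swaps in dbm.dumb purely as a portable stand-in, and put/get are
hypothetical helper names. It is essentially what shelve automates.

```python
import dbm.dumb
import os
import pickle
import tempfile


def put(db, key, obj):
    """Pickle `obj` and store it under `key` (dbm keys and values are bytes)."""
    db[key.encode("utf-8")] = pickle.dumps(obj)


def get(db, key):
    """Fetch and unpickle the object stored under `key`."""
    return pickle.loads(db[key.encode("utf-8")])


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "objects")
        with dbm.dumb.open(path, "c") as db:
            put(db, "point", {"x": 1, "y": [2, 3]})
        with dbm.dumb.open(path, "r") as db:    # reopen to prove it persisted
            return get(db, "point")
```

As with any use of pickle, only unpickle data from a store you trust.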

Wilson

Sheila King

Sep 18, 2001, 5:58:19 PM

On 18 Sep 2001 14:27:12 -0700, wil...@netvmg.com (Wilson Yeung) wrote in
comp.lang.python in article
<ffc650ae.01091...@posting.google.com>:

:I've had good results with the Sleepycat/Berkeley DB via the
:Python interface pybsddb as my underlying store, together with
:the pickle module.

I am also using Sleepycat as my underlying store, but I was using shelve
to access it (instead of pickle).

This thread has really given me pause, even though I thought I had spent
quite a bit of time beforehand reading Python books (which play up shelve
quite a bit...such as Programming Python). Now I'm not sure if I'm
handling my data in a safe way.

By the way, does your Sleepycat have multiple writer support? Or do you
have to do your own file locking? Or is it not an issue for your
program?

In a couple of weeks, I'm definitely going to take a look at this ZODB.
I hope it will be fairly easy!