changes to shelf items fail silently?

eic...@metacarta.com

Mar 17, 2003, 4:42:29 PM

shelftest.py:

#!/usr/bin/python
import shelve

complex_example = shelve.open("/tmp/complex_example_shelf")
complex_example["a"] = []
print complex_example["a"]
complex_example["a"].append("b")
print complex_example["a"]
complex_example["a"].extend(["b"])
print complex_example["a"]
complex_example["a"] = complex_example["a"] + ["b"]
print complex_example["a"]

$ python shelftest.py
[]
[]
[]
['b']

So, I see why the last case works (it can't help *but* work) but I
don't get why the append/extend mutators don't. Or rather, why if
they can't work, I don't get an error thrown so I notice it...

(this occurs in 2.1 and 2.2. Is there a "better" shelf-like thing
that I should use instead, for more completely persistent data?)

Steven Taschuk

Mar 17, 2003, 8:53:42 PM

Quoth eic...@metacarta.com:
[...]

> complex_example["a"].append("b")

> complex_example["a"].extend(["b"])

> complex_example["a"] = complex_example["a"] + ["b"]

[...]

> So, I see why the last case works (it can't help *but* work) but I
> don't get why the append/extend mutators don't. Or rather, why if
> they can't work, I don't get an error thrown so I notice it...

They're mutating the returned list (to which you retain no
references, and which therefore is immediately discarded).

This would work if you were working with a dict, since then the
returned list would be the same object as the object in the dict.
But since it's a shelve, the returned list is just an in-memory
copy of the object in the shelve, and changes to it do not affect
the shelve.

Your append example, e.g., is the same as doing
    foo = complex_example["a"]
    foo.append("b")
There's no assignment into complex_example, so the shelve doesn't
get updated. Doing
    foo = complex_example["a"]
    foo.append("b")
    complex_example["a"] = foo
will certainly work, and
    complex_example["a"] += ["b"]
might too.
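
A minimal sketch of the three cases, assuming Python 3 and a throwaway shelf in a temporary directory (the path and the `after_*` names are made up for illustration). Augmented assignment on a subscript fetches the value, applies the operation, and then stores the result back, so it ends with an assignment into the shelf just like the explicit version:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_shelf")  # hypothetical location
    with shelve.open(path) as shelf:
        shelf["a"] = []

        # Mutates a freshly unpickled copy, which is then thrown away.
        shelf["a"].append("b")
        after_append = shelf["a"]

        # Fetch, mutate, then assign back: the assignment re-pickles the list.
        foo = shelf["a"]
        foo.append("b")
        shelf["a"] = foo
        after_store = shelf["a"]

        # Augmented assignment also finishes with a store into the shelf.
        shelf["a"] += ["c"]
        after_iadd = shelf["a"]

print(after_append)  # []
print(after_store)   # ['b']
print(after_iadd)    # ['b', 'c']
```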

> (this occurs in 2.1 and 2.2. Is there a "better" shelf-like thing
> that I should use instead, for more completely persistent data?)

You want *every* change to the object to cause the shelve to be
updated immediately? Really?

That's pretty hard in general, since the shelve-like thing can't
know a priori what operations mutate an object and what operations
don't. (It could make an effort to detect changes after the fact
by checking for changes in __dict__ or in __getstate__ or
__reduce__'s return value... but you'd have to do this after
*every* method call on the object. Ick.)

--
Steven Taschuk stas...@telusplanet.net
"What I find most baffling about that song is that it was not a hit."
-- Tony Dylan Davis (CKUA)

eic...@metacarta.com

Mar 17, 2003, 11:56:20 PM

> They're mutating the returned list (to which you retain no
> references, and which therefore is immediately discarded).

Ah, so I'm missing the distinction that it's returning a list, and not
being a reference to one. (Knowable only by looking at the code to
shelve, since the syntax hides it, right?)

> You want *every* change to the object to cause the shelve to be
> updated immediately? Really?

I certainly want every change to happen to the in-memory copy, which
was what alerted me to the problem; having a sync() to flush it out to
disk would suffice. But yes, in this particular case I do want live
changes. If it were large, using a real database might make sense,
but only *after* benchmarking shelve and finding it slow, which I
don't expect to :-)

(Note that I'm coming from perl, and would have done this with a tie'd
hash there.)

> That's pretty hard in general, since the shelve-like thing can't
> know a priori what operations mutate an object and what operations
> don't. (It could make an effort to detect changes after the fact

Or just pass back wrapped-objects, whose methods are "like" the input
type but commit themselves on __setitem__ calls?

In the meantime, I've worked around it with explicit stores, but
eventually will want something cleaner.

Raymond Hettinger

Mar 18, 2003, 1:17:32 AM

> So, I see why the last case works (it can't help *but* work) but I
> don't get why the append/extend mutators don't. Or rather, why if
> they can't work, I don't get an error thrown so I notice it...
>
> (this occurs in 2.1 and 2.2. Is there a "better" shelf-like thing
> that I should use instead, for more completely persistent data?)

Yes.
Alex Martelli proposed a solution:
www.python.org/sf/553171
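
For readers on later Python releases: the standard library's `shelve.open` documents a `writeback` flag that keeps every accessed entry in an in-memory cache and writes the cache back on `sync()` or `close()`. A minimal sketch, assuming Python 3 and a made-up path in a temporary directory:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "writeback_shelf")  # hypothetical location

    with shelve.open(path, writeback=True) as shelf:
        shelf["a"] = []
        shelf["a"].append("b")  # mutates the cached object, not a throwaway copy
        shelf.sync()            # writes the cached entries back to disk

    # Reopen without writeback to check what actually reached the file.
    with shelve.open(path) as shelf:
        persisted = shelf["a"]

print(persisted)  # ['b']
```

The cost is that every entry touched stays in memory and is re-pickled on sync, whether or not it changed, so this suits small shelves better than large ones.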

Raymond Hettinger

Alex Martelli

Mar 18, 2003, 2:53:27 AM

eic...@metacarta.com wrote:

>> They're mutating the returned list (to which you retain no
>> references, and which therefore is immediately discarded).
>
> Ah, so I'm missing the distinction that it's returning a list, and not
> being a reference to one. (Knowable only by looking at the code to
> shelve, since the syntax hides it, right?)

Right. I consider this a bug in shelve, myself.

>> That's pretty hard in general, since the shelve-like thing can't
>> know a priori what operations mutate an object and what operations
>> don't. (It could make an effort to detect changes after the fact
>
> Or just pass back wrapped-objects, whose methods are "like" the input
> type but commit themselves on __setitem__ calls?

Unfortunately insufficient -- there are a zillion more ways to mutate
shelved objects in addition to __setitem__ calls. For lists, you could
go through the list of operations and methods and religiously wrap
every single one that's a mutator. But then you'd have to do it all
over again for dicts, and (the kicker...) for every mutable type (an
infinity thereof, mostly instances of classic and new-style classes).
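
To make the scale of the problem concrete, here is a rough sketch of the list-only case, assuming Python 3. `CommittingList` and `_LIST_MUTATORS` are hypothetical names, not part of shelve, and the mutator list is my best effort rather than a guaranteed-complete one:

```python
import os
import shelve
import tempfile

# Hypothetical list of the mutators we know about. Any mutator missing from
# this tuple would still fail silently, which is exactly the weakness.
_LIST_MUTATORS = (
    "append", "extend", "insert", "pop", "remove", "reverse", "sort", "clear",
    "__setitem__", "__delitem__", "__iadd__", "__imul__",
)


class CommittingList(list):
    """A list that re-stores itself in its shelf after each known mutator."""

    def __init__(self, items, shelf, key):
        super().__init__(items)
        self._shelf = shelf
        self._key = key


def _committing(name):
    plain = getattr(list, name)

    def method(self, *args, **kwargs):
        result = plain(self, *args, **kwargs)
        # Store a plain list so the pickle does not try to capture the shelf.
        self._shelf[self._key] = list(self)
        return result

    method.__name__ = name
    return method


for _name in _LIST_MUTATORS:
    setattr(CommittingList, _name, _committing(_name))

with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "wrapped_shelf")) as shelf:
        shelf["a"] = []
        wrapped = CommittingList(shelf["a"], shelf, "a")
        wrapped.append("b")
        wrapped.extend(["c"])
        wrapped[0] = "B"
        stored = shelf["a"]

print(stored)  # ['B', 'c']
```

Even this covers only one type, and only its top level: a list nested inside the wrapped list could still be mutated without a commit.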

> In the meantime, I've worked around it with explicit stores, but
> eventually will want something cleaner.

It _would_ be nice if there was a general way for objects to expose
an "I'm about to mutate" hook -- it would make such wrapping quite
feasible and also allow a general solution to the issue of making
immutable snapshots of something (wrap that thing with a wrapper that
raises a suitable exception when mutation would be about to happen).

Unfortunately, we have no design yet for such a feature :-(.

Alex

Steven Taschuk

Mar 18, 2003, 3:40:10 AM

Quoth eic...@metacarta.com:

>
> > They're mutating the returned list (to which you retain no
> > references, and which therefore is immediately discarded).
>
> Ah, so I'm missing the distinction that it's returning a list, and not
> being a reference to one. (Knowable only by looking at the code to
> shelve, since the syntax hides it, right?)

Erk? I've never looked at the code to shelve, so, no, you don't
need to look at that code.

The code
    container[key].mutate()
evaluates container[key] (which might mean calling the container's
__getitem__ method), obtaining thereby an object, and calls the
given method of that object. This is the case for dicts or
anything subscriptable, including shelves.

In the case of dicts, container[key] evaluates to a reference to
the very object which is in the dict, so mutating the value of
this expression mutates the object in the dict. In the case of
shelves, container[key] evaluates to a reference to an object
which is the result of unpickling the contents of a record found
in the backing dbm database; that object isn't the same thing as
the record in the database, so mutating it doesn't alter that
record.

No knowledge of the internals of shelve is necessary. It's just
expression evaluation plus the concept of shelve as a combination
of a key-value database on disk (dbm) and a generic object
serialization protocol (pickle).
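
The difference can be checked directly with an identity test. A small sketch, assuming Python 3 and a made-up shelf path; the variable names are only for illustration:

```python
import os
import shelve
import tempfile

plain = {"a": []}
# A dict hands back the very object it holds, every time.
dict_same_object = plain["a"] is plain["a"]

with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "identity_shelf")) as shelf:
        shelf["a"] = []
        # Each subscription unpickles the stored record into a new list.
        shelf_same_object = shelf["a"] is shelf["a"]

print(dict_same_object)   # True
print(shelf_same_object)  # False
```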

Am I skipping over some important assumption without realizing it?

> > You want *every* change to the object to cause the shelve to be
> > updated immediately? Really?
>
> I certainly want every change to happen to the in-memory copy, which
> was what alerted me to the problem; having a sync() to flush it out to
> disk would suffice. [...]

The change *is* happening to an in-memory copy when you do
    container[key].mutate()
It's just that the mutated copy is immediately discarded and not
written to the shelf.

> [...] But yes, in this particular case I do want live
> changes. If it were large, using a real database might make sense,
> but only *after* benchmarking shelve and finding it slow, which I
> don't expect to :-)

*chuckle* Good point. (I assume you also don't need multiple
users reading and writing simultaneously.)

> (Note that I'm coming from perl, and would have done this with a tie'd
> hash there.)

I don't know Perl, as it happens. What's a tied hash?

> > That's pretty hard in general, since the shelve-like thing can't
> > know a priori what operations mutate an object and what operations
> > don't. (It could make an effort to detect changes after the fact
>
> Or just pass back wrapped-objects, whose methods are "like" the input
> type but commit themselves on __setitem__ calls?

In the specific case of shelved lists and dicts, yes, that's a
good start. We'd have to trap __setslice__ calls and others as
well, but I'll agree that it's doable.

Now, what about shelved foobar.wombat objects? Which of their
methods should cause commits?

[...]
--
Steven Taschuk stas...@telusplanet.net
"The world will end if you get this wrong."
-- "Typesetting Mathematics -- User's Guide",
Brian Kernighan and Lorinda Cherry

eic...@metacarta.com

Mar 18, 2003, 6:08:14 PM

> Am I skipping over some important assumption without realizing it?

I think the distinction is that in any other context, and any other
use of that *syntax*, you don't especially care that it is a
reference, because if you mutate the object, the dict still contains a
reference to the mutated object. With shelf, you're *not* mutating
the object, you're mutating a temporary copy.

Also:

> of a key-value database on disk (dbm) and a generic object
> serialization protocol (pickle).

While the __doc__ does say that, it isn't clear why that should cause
this effect; serialization is just "how it is stored", and doesn't
inherently imply "so you only get temporaries out".

> *chuckle* Good point. (I assume you also don't need multiple
> users reading and writing simultaneously.)

Right - another way of describing my use case is "single daemon that
does stuff, keeps information about it in-memory in a dict - great
prototype, now it needs to be restartable." One could just pickle the
whole dict and write it out at every commit point, but (1) shelve
looks more efficient (2) using shelve is far easier to write
[literally, import it, and then use shelve.open() instead of {} as an
initializer.]

> I don't know Perl, as it happens. What's a tied hash?

A perl hash is basically a python dict; the main difference is that
it can hold only "scalars" (numbers, strings, references; not other
hashes, or lists.)

A "tied" hash is one that has an associated Class with
a standard set of methods (FETCH, DELETE, STORE, etc.) that get called
when the relevant thing is done. Defining __getitem__/__setitem__ has
pretty much the same effect in python.
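
A rough Python analogue, as a sketch only: `LoggingDict` is a hypothetical class whose hooks stand in for FETCH, STORE, and DELETE by recording each call (Python 3 assumed). It also shows why hooks on the container alone never see a mutation of a fetched value:

```python
class LoggingDict:
    """Hypothetical dict-like class that records which hook was called."""

    def __init__(self):
        self._data = {}
        self.log = []

    def __getitem__(self, key):
        self.log.append(("FETCH", key))
        return self._data[key]

    def __setitem__(self, key, value):
        self.log.append(("STORE", key))
        self._data[key] = value

    def __delitem__(self, key):
        self.log.append(("DELETE", key))
        del self._data[key]


tied = LoggingDict()
tied["a"] = []
tied["a"].append("b")  # only a FETCH: the append is invisible to the hooks

print(tied.log)  # [('STORE', 'a'), ('FETCH', 'a')]
```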

Hmm. I suspect it's actually harder to do this in perl (you pretty
much have to take the wrapper approach) since the only way to put
higher level objects in *is* to put in references - so that would have
led me to what the problem was much more quickly...

(On a side note, it occurs to me that the dict structure *isn't*
arbitrarily deep, it's only a dict-of-dicts, so I could instead
shelve.open each subdict. Then at startup, scan the values and open
all of the subshelves explicitly. A little tricky but may end up more
efficient once the subdicts start getting large (i.e., that's the way to
handle it if the overhead of repickling on inserts gets too big.))

> Now, what about shelved foobar.wombat objects? Which of their
> methods should cause commits?

Mmm, throw an exception if they don't "conform" to some interface.
The reason I went down this path isn't that it silently discarded the
changes, after all; I'd have been happy if it had non-silently told me
(perhaps by having the references be non-mutable and cause exceptions
to be thrown on changes, for example.)

Greg Ewing (using news.cis.dfn.de)

Mar 18, 2003, 7:54:07 PM

eic...@metacarta.com wrote:
> I certainly want every change to happen to the in-memory copy, which
> was what alerted me to the problem; having a sync() to flush it out to
> disk would suffice.

You could wrap something around shelve to do what you want.
The first time you retrieve a given key, it would fetch it
from the shelf, and after that just return a reference to
the previous object.
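
A minimal sketch of such a wrapper, assuming Python 3. `CachingShelf` is a hypothetical name, and it handles only item access plus an explicit `sync()`; deletion, iteration, and so on are left out:

```python
import os
import shelve
import tempfile


class CachingShelf:
    """Hypothetical wrapper: fetch each key once, then reuse the same object."""

    def __init__(self, shelf):
        self._shelf = shelf
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._shelf[key]  # first access: unpickle once
        return self._cache[key]                  # later accesses: same object

    def __setitem__(self, key, value):
        self._cache[key] = value
        self._shelf[key] = value

    def sync(self):
        # Re-store every cached object, since any of them may have been mutated.
        for key, value in self._cache.items():
            self._shelf[key] = value
        self._shelf.sync()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cached_shelf")  # hypothetical location

    with shelve.open(path) as raw:
        cached = CachingShelf(raw)
        cached["a"] = []
        cached["a"].append("b")
        cached["a"].extend(["b"])
        cached.sync()

    with shelve.open(path) as raw:
        reloaded = raw["a"]

print(reloaded)  # ['b', 'b']
```

Changes made after the last `sync()` are still lost if the process dies, so the commit points have to be chosen deliberately.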

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Steven Taschuk

Mar 19, 2003, 12:22:05 AM

Quoth eic...@metacarta.com:
[...]

> > of a key-value database on disk (dbm) and a generic object
> > serialization protocol (pickle).
>
> While the __doc__ does say that, it isn't clear why that should cause
> this effect; serialization is just "how it is stored", and doesn't
> inherently imply "so you only get temporaries out".

Mm... yes, I see your point. On the other hand, what I've
sketched is imho the obvious implementation. Set a value in the
shelf, object gets pickled and written to disk; read a value from
the shelf, object gets read from disk and unpickled. Very simple
to implement. It really is just a combination of dbm and pickle.

Transparent persistence would be a much bigger deal (as is being
discussed below) -- so much so that it wouldn't occur to me to
expect it unless the documentation had some discussion of the
major issue, how changes are detected.

(Incidentally, in this sketch, "temporaries" is not an appropriate
term for the unpickled objects; it implies that the pickled object
in the shelf is the primary object, and the unpickled object in
memory an imperfect copy which lacks the true object's permanence.
I'd say it's the reverse: the in-memory object is the primary, and
the pickled bytestream in the shelf the imperfect copy, and not an
object at all (though of course it is convenient to speak of it as
if it is).)

[...]

> > Now, what about shelved foobar.wombat objects? Which of their
> > methods should cause commits?
>
> Mmm, throw an exception if they don't "conform" to some interface.
> The reason I went down this path isn't that it silently discarded the
> changes, after all; I'd have been happy if it had non-silently told me
> (perhaps by having the references be non-mutable and cause exceptions
> to be thrown on changes, for example.)

Well, making an immutable view of an object is just as much
trouble as detecting changes for automatic commits. How will the
wrapper know which methods of foobar.wombat objects to raise
exceptions for?

You could have some interface for this kind of thing (i.e., a way
to subscribe to "imminent mutation" announcements from an object,
and to interfere in those mutations, as Martelli suggests), and
allow only objects conforming to that interface to be shelved; but
one of the nice things about shelves is that they accept anything
picklable, which is pretty much anything.

(Actually, detecting changes after the fact is easier than
rejecting them beforehand. A wrapper could pickle the object
after every method call or attribute change and compare the result
to what was read from the dbm database, committing if different.
This would be very expensive but would detect all changes which
the shelf is capable of storing at all.)
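
A sketch of that comparison, assuming Python 3. `commit_if_changed` is a hypothetical helper, and two things are simplified: the baseline is taken by pickling the freshly fetched object rather than by reading the raw bytes from the dbm file, and the check is called by hand instead of automatically after every method call:

```python
import os
import pickle
import shelve
import tempfile


def commit_if_changed(shelf, key, obj, original_bytes):
    """Store obj under key only if its pickle differs from original_bytes."""
    if pickle.dumps(obj) != original_bytes:
        shelf[key] = obj
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "compare_shelf")) as shelf:
        shelf["a"] = []
        obj = shelf["a"]
        snapshot = pickle.dumps(obj)  # baseline taken right after fetching

        len(obj)                      # a non-mutating call
        unchanged_commit = commit_if_changed(shelf, "a", obj, snapshot)

        obj.append("b")               # a mutating call
        changed_commit = commit_if_changed(shelf, "a", obj, snapshot)
        stored = shelf["a"]

print(unchanged_commit)  # False
print(changed_commit)    # True
print(stored)            # ['b']
```

A byte-for-byte comparison can report a change where none happened if equal objects pickle differently; that costs an extra write but never loses data.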

--
Steven Taschuk stas...@telusplanet.net
"I'm always serious, never more so than when I'm being flippant."
-- _Look to Windward_, Iain M. Banks