bug? dbm.dbm not iterable

C. Titus Brown

Aug 5, 2009, 12:05:05 PM
to pygr...@googlegroups.com
Hi all,

on my laptop, I don't have the bsddb module installed for python2.6, and
so I get this warning,

--
WARNING dbfile.open_index: Falling back to hash index: unable to import bsddb
--

which is understandable, but then I get these errors:

======================================================================
ERROR: test_headerfile_create (seqdb_test.PrefixUnionDict_Creation_Test)
----------------------------------------------------------------------
...
  File "/Users/t/dev/pygr/pygr/dbfile.py", line 104, in __iter__
    return iter(self.dict)
TypeError: 'dbm.dbm' object is not iterable

---

This appears to be this problem,

http://bugs.python.org/issue5736

caused in this case by anydbm returning a dbm.dbm object, which is not
iterable.

This is one of those errors that could cause serious grief to n00bies ;).

At least three possible solutions:

- require bsddb

- put in code to "test" to see if what anydbm returns can give an
iterator, and if not, give a very clear error message;

- try to write our own wrapper code to "fix" the problem.

I vote for #2 or #1. Note that bsddb is deprecated in 2.7 and going
away (gone?) in 3.x, but we can deal with that... later. I'd like to
get in something for 0.8 final if possible.
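
Option #2 could look roughly like the sketch below. It is only an illustration, not pygr code: the name `ensure_iterable`, its `backend_hint` parameter and the wording of the message are made up here, and it assumes that a non-iterable backend raises TypeError from iter(), as in the traceback above.

```python
def ensure_iterable(db, backend_hint="bsddb"):
    """Return db unchanged if iter() works on it, else raise a clear error.

    Hypothetical helper for illustration; not part of pygr.
    """
    try:
        iter(db)  # only asks for an iterator; the keys are not consumed here
    except TypeError:
        raise TypeError(
            "index object of type %r cannot be iterated; "
            "please install a backend that supports iteration (e.g. %s)"
            % (type(db).__name__, backend_hint))
    return db
```

Calling something like this right after the index is opened would move the failure from the first loop over the index to open time, with a message that says what to install.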

Opinions?

cheers,
--titus
--
C. Titus Brown, c...@msu.edu

Christopher Lee

Aug 5, 2009, 12:29:54 PM
to pygr...@googlegroups.com
Hi Titus,
we already try to handle these types of errors, specifically for gdbm
(here is our current dbfile.py code). Based on what you said, it
sounds like dbm.dbm manages to come up with yet another incompatible
iteration method. It shouldn't be hard to add support for that.

    try:
        return iter(self.dict)  # <---------- LINE 104
    except TypeError:  # gdbm lacks __iter__ method, so try iter_gdbm()
        exc_type, exc_value, exc_traceback = sys.exc_info()
        try:
            self.dict.firstkey
        except AttributeError:  # evidently not a gdbm dict
            raise exc_value, None, exc_traceback  # re-raise original error
        else:  # iterate using gdbm-specific method
            return iter_gdbm(self.dict)
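
For readers who have not seen the gdbm key-walking API: the snippet above calls iter_gdbm() but does not show it. A minimal sketch of what such a helper might look like follows; it relies only on gdbm's documented firstkey() and nextkey() methods, and the actual pygr implementation may differ.

```python
def iter_gdbm(db):
    """Yield keys one at a time using gdbm's firstkey()/nextkey() methods.

    Illustrative sketch only; assumes db follows the gdbm convention of
    returning None when there are no (more) keys.
    """
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)
```

Because this is a generator, only one key is held at a time, which is the property that iter(db.keys()) lacks.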

-- Chris


Christopher Lee

Aug 5, 2009, 5:25:51 PM
to pygr...@googlegroups.com

On Aug 5, 2009, at 9:05 AM, C. Titus Brown wrote:

>
> Hi all,
>
> on my laptop, I don't have the bsddb module installed for python2.6,
> and
> so I get this warning,
>

> ...
> File "/Users/t/dev/pygr/pygr/dbfile.py", line 104, in __iter__
> return iter(self.dict)
> TypeError: 'dbm.dbm' object is not iterable

Curious. anydbm tries its possible backends in the following order:

['dbhash', 'gdbm', 'dbm', 'dumbdbm']

Apparently neither dbhash nor gdbm is working on your laptop?

-- Chris

C. Titus Brown

Aug 5, 2009, 5:30:55 PM
to pygr...@googlegroups.com
On Wed, Aug 05, 2009 at 02:25:51PM -0700, Christopher Lee wrote:
-> On Aug 5, 2009, at 9:05 AM, C. Titus Brown wrote:
->
-> > Hi all,
-> >
-> > on my laptop, I don't have the bsddb module installed for python2.6,
-> > and
-> > so I get this warning,
-> >
-> > ...
-> > File "/Users/t/dev/pygr/pygr/dbfile.py", line 104, in __iter__
-> > return iter(self.dict)
-> > TypeError: 'dbm.dbm' object is not iterable
->
-> Curious. anydbm tries its possible backends in the following order:
->
-> ['dbhash', 'gdbm', 'dbm', 'dumbdbm']
->
-> Apparently neither dbhash nor gdbm is working on your laptop?

Yep, can't import either one. (Under py2.6. py2.5 works just fine ;)

--t

Christopher Lee

Aug 5, 2009, 5:37:15 PM
to pygr...@googlegroups.com

On Aug 5, 2009, at 9:05 AM, C. Titus Brown wrote:

> This appears to be this problem,
>
> http://bugs.python.org/issue5736
>
> caused in this case by anydbm returning a dbm.dbm object, which is not
> iterable.

Yuck. What a mess Python has inflicted on us: anydbm is supposed to
provide a transparent interface to multiple backends, but even basic
capabilities like iteration and __contains__ don't work for backends
like gdbm and dbm. As a result, the standard module shelve won't work
correctly either for those backends. And Python is taking away the
bsddb backend, the one backend that actually worked correctly with
shelve...

Pygr currently provides a workaround for gdbm's iterator failure.

For dbm there appears to be no way to access iteration (from Python)
except via iter(db.keys()), thereby loading the entire set of keys
into memory. Yuck. This will work, but it's not highly scalable. In
addition, asking "key in db" doesn't work on dbm objects, which again
will break shelve. This gets us into the business of subclassing
shelve.Shelf to provide improved has_key(), __contains__() and get()
methods (we already subclass it to provide an improved __iter__()
method for gdbm). All of those are easy, but it makes you wonder,
when will the madness end?
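
To make the dbm workaround concrete, here is a rough sketch. It wraps the raw dbm object rather than subclassing shelve.Shelf, and the class name `IterableDBM` is invented for this example. It assumes only that the wrapped object has keys() and supports item lookup that raises KeyError for a missing key.

```python
class IterableDBM(object):
    """Hypothetical wrapper adding __iter__ and __contains__ to a dbm-like object."""

    def __init__(self, db):
        self._db = db

    def __iter__(self):
        # Not scalable: keys() builds the complete key list in memory first.
        return iter(self._db.keys())

    def __contains__(self, key):
        # Use a plain lookup so no native membership test is required.
        try:
            self._db[key]
        except KeyError:
            return False
        return True

    def __getitem__(self, key):
        return self._db[key]

    def __setitem__(self, key, value):
        self._db[key] = value
```

The membership test costs one value fetch per call, which is wasteful for large values but avoids depending on has_key() being present.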


>
> This is one of those errors that could cause serious grief to
> n00bies ;).

Absolutely. Pygr is trying to clear away all these annoyances that
get between people and real work.

>
> At least three possible solutions:
>
> - require bsddb
>
> - put in code to "test" to see if what anydbm returns can give an
> iterator, and if not, give a very clear error message;
>
> - try to write our own wrapper code to "fix" the problem.
>
> I vote for #2 or #1. Note that bsddb is deprecated in 2.7 and going
> away (gone?) in 3.x, but we can deal with that... later. I'd like to
> get in something for 0.8 final if possible.

Option #1 puts us on a collision course with Python's avowed policy to
deprecate bsddb. It also means users won't be able to install Pygr
unless they are able to install bsddb. I am concerned that installing
bsddb may in some cases be tricky (I've heard of problems on various
platforms...).

Option #2 is not much of an improvement. The user will be able to
install Pygr without bsddb, but all the file storages (SequenceFileDB,
NLMSA etc.) use shelve. So if shelve doesn't actually work on their
box, they won't be able to do much of anything!

So I end up at Option #3, reluctantly. It looks quite easy to solve
both dbm's iteration and __contains__ bugs. But then there is the
fact that the iter() scalability will be poor (because your only
choice is to call iter(db.keys())). And we keep hitting new problems
with the backends (first gdbm, now dbm, next time ???). When will the
madness end? It would be great if Python would solve all these
backend problems in Python 2.7.

So I conclude that we should eventually move away from all these
bsddb / dbm style backends entirely, and use sqlite as the backend for
shelve. sqlite looks like it has a future in Python; the other
backends seem like throwbacks to the past. I believe Istvan looked at
sqlite performance some time ago; I think it was pretty good.
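
As a rough idea of what a sqlite-backed shelve could involve, here is a minimal sketch using only the standard sqlite3 and pickle modules. The class name `SqliteShelf` and the table layout are assumptions made for this example; this is not the code attached to any Python tracker issue, and it ignores concerns such as writeback caching and batching of commits.

```python
import pickle
import sqlite3


class SqliteShelf(object):
    """Hypothetical minimal shelve-like store on top of sqlite3."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB)")

    def __setitem__(self, key, value):
        blob = sqlite3.Binary(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
        self._conn.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, blob))
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(bytes(row[0]))

    def __contains__(self, key):
        row = self._conn.execute(
            "SELECT 1 FROM shelf WHERE key = ?", (key,)).fetchone()
        return row is not None

    def __iter__(self):
        # Rows are fetched from the cursor as the loop advances, so the
        # full key list is not built in memory up front.
        for (key,) in self._conn.execute(
                "SELECT key FROM shelf ORDER BY key"):
            yield key

    def close(self):
        self._conn.close()
```

Iteration and membership both map onto ordinary SQL queries here, so neither needs a per-backend workaround.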

What do other people think?

-- Chris

Christopher Lee

Aug 5, 2009, 5:54:52 PM
to Pygr Development Group
> So I conclude that we should eventually move away from all these
> bsddb / dbm style backends entirely, and use sqlite as the backend
> for shelve. sqlite looks like it has a future in Python; the other
> backends seem like throwbacks to the past. I believe Istvan looked
> at sqlite performance some time ago; I think it was pretty good.

Looks like the Pythonistas have developed this idea pretty far, but
are currently not doing anything further on it:
http://bugs.python.org/issue3783

Maybe this will go into a future version of Python? Maybe one of us
could offer to take over work on this branch?

-- Chris

Christopher Lee

Aug 5, 2009, 9:12:20 PM
to pygr...@googlegroups.com
I searched the Python tracker for "shelve iter" and "Shelf iter" and
found a few relevant things, e.g.

- Python issue 5736 is trying to make dbm and gdbm support the
standard iterator protocol.

I'm wondering if we should add the following as a bug report:

- iter(shelve) should not load the entire index into memory, but
should use the native iteration method of the backend (once 5736 makes
that possible).

What do you think?

-- Chris

Christopher Lee

Aug 5, 2009, 9:20:40 PM
to Pygr Development Group

On Aug 5, 2009, at 2:37 PM, Christopher Lee wrote:

> For dbm there appears to be no way to access iteration (from Python)
> except via iter(db.keys()), thereby loading the entire set of keys
> into memory. Yuck. This will work, but it's not highly scalable.

It looks like all we'd have to do is add support for iteration (in the
above, not-very-scalable way). Shelf.__contains__ seems to work fine
with dbm.

-- Chris

C. Titus Brown

Aug 5, 2009, 9:26:53 PM
to pygr...@googlegroups.com
-> [ ... ] All of those are easy, but it makes you wonder,
-> when will the madness end?
->
-> > At least three possible solutions:
-> >
-> > - require bsddb
-> >
-> > - put in code to "test" to see if what anydbm returns can give an
-> > iterator, and if not, give a very clear error message;
-> >
-> > - try to write our own wrapper code to "fix" the problem.
-> >
-> > I vote for #2 or #1. Note that bsddb is deprecated in 2.7 and going
-> > away (gone?) in 3.x, but we can deal with that... later. I'd like to
-> > get in something for 0.8 final if possible.
->
-> Option #1 puts us on a collision course with Python's avowed policy to
-> deprecate bsddb. It also means users won't be able to install Pygr
-> unless they are able to install bsddb. I am concerned that installing
-> bsddb may in some cases be tricky (I've heard of problems on various
-> platforms...).
->
-> Option #2 is not much of an improvement. The user will be able to
-> install Pygr without bsddb, but all the file storages (SequenceFileDB,
-> NLMSA etc.) use shelve. So if shelve doesn't actually work on their
-> box, they won't be able to do much of anything!
->
-> So I end up at Option #3, reluctantly. It looks quite easy to solve
-> both dbm's iteration and __contains__ bugs. But then there is the
-> fact that the iter() scalability will be poor (because your only
-> choice is to call iter(db.keys())). And we keep hitting new problems
-> with the backends (first gdbm, now dbm, next time ???). When will the
-> madness end? It would be great if Python would solve all these
-> backend problems in Python 2.7.
->
-> So I conclude that we should eventually move away from all these
-> bsddb / dbm style backends entirely, and use sqlite as the backend for
-> shelve. sqlite looks like it has a future in Python; the other
-> backends seem like throwbacks to the past. I believe Istvan looked at
-> sqlite performance some time ago; I think it was pretty good.
->
-> What do other people think?

Why not:

- remove support for dbm, leaving in gdbm support if necessary
- build in support for sqlite-based shelve (for next release)
- live happily ever after

?

We can leave bsddb support in for Python 2.3 and Python 2.4, which don't
have sqlite, and move to a default of sqlite support.

Christopher Lee

Aug 5, 2009, 9:38:42 PM
to pygr...@googlegroups.com
On Aug 5, 2009, at 6:26 PM, C. Titus Brown wrote:
> Why not:
>
> - remove support for dbm, leaving in gdbm support if necessary
> - build in support for sqlite-based shelve (for next release)
> - live happily ever after

It also depends on whether Python will fix issue 5736 (provide a real
iterator for dbm) and issue 3783 (dbm.sqlite implementation), and what
Python versions would provide these fixes. It seems like there are enough
platforms lacking bsddb that we would want to backport dbm.sqlite
support to handle those platforms...

-- Chris

C. Titus Brown

Aug 5, 2009, 9:45:45 PM
to pygr...@googlegroups.com
On Wed, Aug 05, 2009 at 06:38:42PM -0700, Christopher Lee wrote:
-> On Aug 5, 2009, at 6:26 PM, C. Titus Brown wrote:
-> > Why not:
-> >
-> > - remove support for dbm, leaving in gdbm support if necessary
-> > - build in support for sqlite-based shelve (for next release)
-> > - live happily ever after
->
-> It also depends on whether Python will fix issue 5736 (provide a real
-> iterator for dbm) and issue 3783 (dbm.sqlite implementation), and what
-> Python versions would provide these fixes. It seems like there are enough
-> platforms lacking bsddb that we would want to backport dbm.sqlite
-> support to handle those platforms...

Is Mac OS X the only mainstream distro that's a problem? Windows binary
dists of Python come with bsddb support, and with Linux you've either
got a good install of Python with your distro, or package managers can
install bsddb for you.

Relying on sqlite with bsddb as a backup for 2.3 and 2.4 distros should
only affect Mac OS X. But Python 2.3 doesn't compile on the more recent
Mac OS X now, so we're left to worry about Mac OS X users running
Python 2.4.

I really don't want to expand our business of supporting this kind of
Python obnoxiousness; it's hard to test, and it's relatively complicated
code to maintain.

I'm more strongly in favor of just putting a straightforward error
message in and leaving it at that for 0.8.

Istvan Albert

Aug 7, 2009, 8:58:06 AM
to pygr-dev


On Aug 5, 5:37 pm, Christopher Lee <l...@chem.ucla.edu> wrote:

> Yuck. What a mess Python has inflicted on us: anydbm is supposed to
> provide a transparent interface to multiple backends, but even basic
> capabilities like iteration and __contains__ don't work for backends
> like gdbm and dbm.

In all fairness to Python I think iterators came after anydbm, and for
a long time they were not as commonly understood/used as they are
today.

I think it may be better if the operations did not silently fall back to
some other backend - and then perform atrociously slowly. That
could just cause even more problems. The error message should be more
informative, and explicitly require bsddb.

> backends seem like throwbacks to the past. I believe Istvan looked at
> sqlite performance some time ago; I think it was pretty good.

Yes, I did, performance looked pretty good. And I think we could use
the code from

http://bugs.python.org/issue3783

right away, and update as necessary. Maybe this could be an independent
microrelease after 0.8 that does not add any new
functionality but swaps out one backend for another. I can take a
second look in a few weeks (need to finish something else first, but
that will happen soon)

Istvan

Christopher Lee

Aug 12, 2009, 10:09:32 PM
to pygr...@googlegroups.com

On Aug 7, 2009, at 5:58 AM, Istvan Albert wrote:

> I think it may be better if the operations did not silently fall back to
> some other backend - and then perform atrociously slowly. That
> could just cause even more problems. The error message should be more
> informative, and explicitly require bsddb.

I think your proposal is much more sensible than what I was
suggesting. In the first 0.8 release, let's just raise a very clear
error message if neither bsddb nor gdbm (which we handle well) is
available. Titus, what do you think?
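
Something along these lines is what I have in mind. It is only a sketch: the function name `require_backend` and the message text are placeholders, and the default module names are the Python 2 ones discussed in this thread.

```python
def require_backend(candidates=("bsddb", "gdbm")):
    """Return the name of the first importable backend, else raise ImportError.

    Hypothetical helper for illustration; not existing pygr code.
    """
    for name in candidates:
        try:
            __import__(name)
        except ImportError:
            continue  # this backend is missing; try the next one
        return name
    raise ImportError(
        "pygr needs one of these modules for its index files: %s. "
        "None could be imported; please install one of them."
        % ", ".join(candidates))
```

The point is that the user sees one message naming the missing modules, instead of a TypeError from deep inside iteration.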

>
>> backends seem like throwbacks to the past. I believe Istvan looked at
>> sqlite performance some time ago; I think it was pretty good.
>
> Yes, I did, performance looked pretty good. And I think we could use
> the code from
>
> http://bugs.python.org/issue3783
>
> right away, and update as necessary. Maybe this could be an independent
> microrelease after 0.8 that does not add any new
> functionality but swaps out one backend for another. I can take a
> second look in a few weeks (need to finish something else first, but
> that will happen soon)

Yes, I think this is a great solution -- it genuinely solves the
problem by providing a reliable backend.

-- Chris

C. Titus Brown

Aug 12, 2009, 10:11:54 PM
to pygr...@googlegroups.com
On Wed, Aug 12, 2009 at 07:09:32PM -0700, Christopher Lee wrote:
-> On Aug 7, 2009, at 5:58 AM, Istvan Albert wrote:
->
-> > I think it may be better if the operations did not silently fall back to
-> > some other backend - and then perform atrociously slowly. That
-> > could just cause even more problems. The error message should be more
-> > informative, and explicitly require bsddb.
->
-> I think your proposal is much more sensible than what I was
-> suggesting. In the first 0.8 release, let's just raise a very clear
-> error message if neither bsddb nor gdbm (which we handle well) is
-> available. Titus, what do you think?

Absolutely.

-> >> backends seem like throwbacks to the past. I believe Istvan looked at
-> >> sqlite performance some time ago; I think it was pretty good.
-> >
-> > Yes, I did, performance looked pretty good. And I think we could use
-> > the code from
-> >
-> > http://bugs.python.org/issue3783
-> >
-> > right away, and update as necessary. Maybe this could be an independent
-> > microrelease after 0.8 that does not add any new
-> > functionality but swaps out one backend for another. I can take a
-> > second look in a few weeks (need to finish something else first, but
-> > that will happen soon)
->
-> Yes, I think this is a great solution -- it genuinely solves the
-> problem by providing a reliable backend.

Yep -- let's put it on the list for 0.8.1.

--t
