Migrating to BTrees 4.x


David Glick

Nov 19, 2016, 3:21:44 PM
to zodb
I'm working on updating Plone to support ZODB 4 and 5.

So far most of the blockers are because BTrees 4.x no longer allows using keys whose ordering is not well-defined, including None, and there are various places where Plone was using None as a key. I understand the reason for the restriction and am making progress on making Plone no longer do this.

However I'm looking for suggestions on how to handle migrating existing BTrees to remove the None key. Plone users are used to first updating the software and then running upgrade steps to make any necessary changes to the database. However, because of the new restriction, existing BTrees with None as a key cannot be unpickled:

  File "/Users/davisagli/.buildout/eggs/ZODB-4.4.3-py2.7.egg/ZODB/Connection.py", line 899, in setstate
    self._setstate(obj, oid)
  File "/Users/davisagli/.buildout/eggs/ZODB-4.4.3-py2.7.egg/ZODB/Connection.py", line 956, in _setstate
    self._reader.setGhostState(obj, p)
  File "/Users/davisagli/.buildout/eggs/ZODB-4.4.3-py2.7.egg/ZODB/serialize.py", line 623, in setGhostState
    obj.__setstate__(state)
TypeError: Object has default comparison

Any ideas about how I can write a migration script to fix these BTree instances *after* the BTrees package has been updated? I guess I'm looking for a hook to adjust the state for a particular class before __setstate__ is called, since BTrees are extension types and I can't override __setstate__.

thanks,
David

Hanno Schlichting

Nov 19, 2016, 4:01:18 PM
to zo...@googlegroups.com
Hey David.

I don't remember the details, but you might have some luck with the copy_reg module [0].

Usually that only allows you to overwrite the per-type pickle function for extension types and not the unpickle one. But IIRC the actual bytes of an extension type pickle contain something like `copy_reg.__newobj__` as the actual function to invoke, with the type as an argument. So you might be able to monkey patch the `__newobj__` function in copy_reg.

Of course this is quite a horrible hack ;-)
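
A less invasive variant of the same idea, for plain `pickle` streams, is to intercept the class lookup rather than patch `copy_reg.__newobj__`. The sketch below is not the monkey patch described above: it swaps in the documented `pickle.Unpickler.find_class` hook under Python 3, and it uses a hypothetical stand-in class (`RecordedTree`) that records the state instead of validating keys. The pickle bytes are the `{None: 42}` example quoted elsewhere in this thread, and `pairs_without_none` assumes the single-bucket state layout seen there. Whether ZODB's own object reader can be pointed at such a hook is an assumption left unverified here.

```python
import io
import pickle

# Protocol-0 pickle of an OOBTree shaped like {None: 42}, as quoted in this thread.
LEGACY_PICKLE = (
    b"ccopy_reg\n__newobj__\np0\n(cBTrees.OOBTree\nOOBTree\np1\ntp2\nRp3\n"
    b"((((NI42\ntp4\ntp5\ntp6\ntp7\nb."
)


class RecordedTree(object):
    """Hypothetical stand-in: keeps the raw state, performs no key checks."""

    def __setstate__(self, state):
        self.state = state


class ShimUnpickler(pickle.Unpickler):
    """Redirect lookups of BTrees.OOBTree.OOBTree to the stand-in class."""

    def find_class(self, module, name):
        if (module, name) == ("BTrees.OOBTree", "OOBTree"):
            return RecordedTree
        return super().find_class(module, name)


def pairs_without_none(state):
    """Return (key, value) pairs from a single-bucket tree state, minus None keys.

    Assumes the small-tree layout ((((k1, v1, k2, v2, ...),),),).
    """
    flat = state[0][0][0]
    pairs = zip(flat[0::2], flat[1::2])
    return [(k, v) for k, v in pairs if k is not None]


tree = ShimUnpickler(io.BytesIO(LEGACY_PICKLE)).load()
print(tree.state)
print(pairs_without_none(tree.state))
```

The recovered pairs could then be fed to a freshly constructed tree; larger trees with several buckets have a different state layout that this sketch does not handle.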

Hanno


Jason Madden

Nov 19, 2016, 4:22:02 PM
to David Glick, zodb

> On Nov 19, 2016, at 14:21, David Glick <da...@glicksoftware.com> wrote:
>
> I'm working on updating Plone to support ZODB 4 and 5.
> ...
>
> Any ideas about how I can write a migration script to fix these BTree instances *after* the BTrees package has been updated? I guess I'm looking for a hook to adjust the state for a particular class before __setstate__ is called, since BTrees are extension types and I can't override __setstate__.

If you've got a way to override __setstate__ that will do what you want, then you may have an option.

BTrees 4 now ships with a pure-python implementation that should allow you to replace __setstate__. It will be used if the C extension isn't available, or, more helpfully, if the PURE_PYTHON environment variable is set at startup.

If that's not an option, then with a little bit of work, it can be swapped in at runtime and later swapped back out. If you replace __setstate__ on the pure-python implementation, you could swap it in, unpickle the BTrees in the database to perform your fixes, pickle them back, and reverse the swap.

Note that you'll need the BTrees 4.3.0 release to make sure that the pickles are correct.


from BTrees import OOBTree

# Patch __setstate__ on the pure-Python class
OOBTree.OOBTreePy.__setstate__ = patched_setstate  # XXX: your replacement function (not shown)

# Save original values
orig_values = dict(vars(OOBTree))

# Swap in Python implementation
OOBTree.OOBTree = OOBTree.OOBTreePy
...

# Unpickle and do stuff
...

# Restore original implementation
for k, v in orig_values.items():
    if getattr(OOBTree, k) is not v:
        setattr(OOBTree, k, v)

I haven't tried anything exactly like this, but it seems perfectly possible to me, despite being a hack. I know that the pickles are compatible and can work with each implementation because there are tests for that.
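
The swap-and-restore pattern can also be wrapped in a context manager so that the restore step runs even if the migration raises. This is only a sketch: `swapped_attrs` is a hypothetical helper, and a stand-in module object is used so it runs without BTrees installed.

```python
import contextlib
import types


@contextlib.contextmanager
def swapped_attrs(module, **replacements):
    """Temporarily replace module attributes, restoring them on exit."""
    missing = object()
    saved = {name: getattr(module, name, missing) for name in replacements}
    for name, value in replacements.items():
        setattr(module, name, value)
    try:
        yield module
    finally:
        # Put back whatever was there before, even after an exception.
        for name, original in saved.items():
            if original is missing:
                delattr(module, name)
            else:
                setattr(module, name, original)


# Stand-in module so the sketch runs without BTrees installed.
fake = types.ModuleType("fake_oobtree")
fake.OOBTree = "C implementation"
fake.OOBTreePy = "Python implementation"

with swapped_attrs(fake, OOBTree=fake.OOBTreePy):
    during = fake.OOBTree  # unpickle and fix trees here
after = fake.OOBTree
```

With the real package one would presumably pass the `BTrees.OOBTree` module and `OOBTree=...OOBTreePy`; whether unpickling then picks up the pure-Python class is as untested as the sketch above.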

Jason

Jim Fulton

Nov 19, 2016, 6:59:13 PM
to David Glick, zodb
On Sat, Nov 19, 2016 at 3:21 PM, David Glick <da...@glicksoftware.com> wrote:
I'm working on updating Plone to support ZODB 4 and 5.

So far most of the blockers are because BTrees 4.x no longer allows using keys whose ordering is not well-defined, including None, and there are various places where Plone was using None as a key. I understand the reason for the restriction and am making progress on making Plone no longer do this.

Hm. IMO, the restriction should be on adding items, not on unpickling. We should go to great lengths to avoid unpickling errors.

I think we should avoid raising this error in __setstate__. It should be easy to fix.

Jim


Jason Madden

Nov 19, 2016, 7:28:31 PM
to Jim Fulton, David Glick, zodb
Interestingly, the Python implementation currently *doesn't* raise an error in this case, at least in small trees.

I pickled an OOBTree from ZODB3 3.10.7 that looked like {None: 42}.

When I use the pure Python implementation, it loads:

In [9]: import pickle

In [10]: pickle.loads('ccopy_reg\n__newobj__\np0\n(cBTrees.OOBTree\nOOBTree\np1\ntp2\nRp3\n((((NI42\ntp4\ntp5\ntp6\ntp7\nb.')
Out[10]: <BTrees.OOBTree.OOBTree at 0x10849eed0>

In [11]: bt2 = pickle.loads('ccopy_reg\n__newobj__\np0\n(cBTrees.OOBTree\nOOBTree\np1\ntp2\nRp3\n((((NI42\ntp4\ntp5\ntp6\ntp7\nb.')

In [12]: bt2
Out[12]: <BTrees.OOBTree.OOBTree at 0x10863fa50>

In [13]: type(bt2)
Out[13]: BTrees.OOBTree.OOBTreePy

In [14]: bt2[None]
Out[14]: 42

In [15]: None in bt2
Out[15]: True

I can't set None to a new value:

In [17]: bt2[None] = 43
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)

But I can delete it:

In [30]: len(bt2)
Out[30]: 1

In [31]: del bt2[None]

In [32]: len(bt2)
Out[32]: 0


Iterating keys and values appears to work as expected, too.

Are these the semantics you would expect? If there are unorderable values in the tree, is that going to cause corruption on adding new keys or removing other keys?

Some such keys may not be removable, at least in extreme cases.

Here I am back in ZODB3 again:

In [6]: class Bad(object):
   ...:     def __eq__(self, other):
   ...:         return False # Corner case extreme example of broken object
   ...:


In [8]: bt[Bad()] = 42

And here's loading that pickle in BTrees 4 Python. Note that I can load it successfully, but I can't delete it, even though I'm using the identical instance:

In [35]: bt2 = pickle.loads('ccopy_reg\n__newobj__\np0\n(cBTrees.OOBTree\nOOBTree\np1\ntp2\nRp3\n((((ccopy_reg\n_reconstructor\np4\n(c__main__\nBad\np5\nc__builtin__\nobject\np6\nNtp7\nRp8\nI42\ntp9\ntp10\ntp11\ntp12\nb.')

In [36]: bt2
Out[36]: <BTrees.OOBTree.OOBTree at 0x1087d72d0>

In [37]: len(bt2)
Out[37]: 1

In [39]: list(bt2.keys())
Out[39]: [<__main__.Bad at 0x108964b90>]

In [40]: bad = list(bt2.keys())[0]

In [41]: bad in bt2
Out[41]: False

In [42]: del bt2[bad]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)

Hmm, then I tried that same thing again with the C implementation, and it behaves very differently (correctly?):

In [2]: class Bad(object):
   ...:     def __eq__(self, other): return False
   ...:


In [4]: bt = pickle.loads('ccopy_reg\n__newobj__\np0\n(cBTrees.OOBTree\nOOBTree\np1\ntp2\nRp3\n((((ccopy_reg\n_reconstructor\np4\n(c__main__\nBad\np5\nc__builtin__\nobject\np6\nNtp7\nRp8\nI42\ntp9\ntp10\ntp11\ntp12\nb.')

In [5]: bt
Out[5]: <BTrees.OOBTree.OOBTree at 0x1053d3e60>

In [6]: list(bt.keys())
Out[6]: [<__main__.Bad at 0x1055fc390>]

In [7]: bad = list(bt.keys())[0]

In [9]: bad in bt
Out[9]: True

In [10]: del bt[bad]

But wait! Checking even further, I find that even in BTrees 4, I can still insert Bad into instances of the C implementation, but I can't into the Python implementation:

In [10]: bt = BTrees.OOBTree.OOBTree()

In [11]: bt[Bad()] = 42

In [14]: bt2 = BTrees.OOBTree.OOBTreePy()

In [15]: bt2[Bad()] = 42
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)

TypeError: Can't use default __cmp__

Maybe this is an extreme and useless example of a broken object when it comes to ordering and testing what happens if we allow such objects into a BTree on unpickling. But it seems safe to say there are at least some discrepancies and some clear bugs in the handling of unorderable objects still.

Jason

David Glick (Glick Software)

Nov 19, 2016, 9:08:31 PM
to Jason Madden, Jim Fulton, zodb
Is the above with BTrees master or a release? I just noticed that you
did some work on the consistency of the two implementations in August,
which hasn't been released.

Jason Madden

Nov 19, 2016, 9:40:50 PM
to David Glick (Glick Software), Jim Fulton, zodb

> On Nov 19, 2016, at 20:08, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
>
> Is the above with BTrees master or a release? I just noticed that you did some work on the consistency of the two implementations in August, which hasn't been released.

That's with the 4.3.1 release. And you're quite right, those changes may make a difference here on the unpickled object, but I don't *think* so for the directly inserting case. I thought those changes had been released but they don't seem to have been (the most recent commit to the repository is about changing a password, which happens to be what the most recent CHANGES entry also is, so I didn't look any further). Sorry for the confusion.

Jason

David Glick (Glick Software)

Nov 19, 2016, 9:42:33 PM
to Jason Madden, Jim Fulton, zodb
Maybe you're looking at a working directory that isn't up to date?
https://github.com/zopefoundation/BTrees/blob/master/CHANGES.rst shows
the comparison fixes after the password change.

Jason Madden

Nov 19, 2016, 10:23:55 PM
to David Glick (Glick Software), Jim Fulton, zodb
> On Nov 19, 2016, at 8:42 PM, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
>
> Maybe you're looking at a working directory that isn't up to date? https://github.com/zopefoundation/BTrees/blob/master/CHANGES.rst shows the comparison fixes after the password change

I meant the changes shown for the release on PyPI. I quickly compared
that to the most recent commit comment in the repo, concluded they
were the same, and decided my changes had been released. Because
apparently checking the date would have been too much work 😀 I'll
repeat the tests with master and report back anything significant.

Jason

David Glick (Glick Software)

Nov 19, 2016, 11:05:51 PM
to Jim Fulton, zodb
I was hoping you might feel that way.

I started implementing the fix and am able to make it ignore the default comparison TypeError during __setstate__. However, even with the state loaded, `__getitem__(None)` and `__delitem__(None)` raise errors, because those operations are checking for keys with default comparison:

>>> t = BTrees.OOBTree.OOBTree()
>>> bucket_state = ((None, 42),)
>>> tree_state = ((bucket_state,),)
>>> t.__setstate__(tree_state)
>>> t[None]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: None
>>> del t[None]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

TypeError: Object has default comparison

So, still not clear how to remove the `None` key to fix the btree. Well, I guess I can at least construct a replacement BTree by filtering the items now:
>>> list(t.items())
[(None, 42)]
>>> cleantree = BTrees.OOBTree.OOBTree([(k, v) for k, v in t.items() if k is not None])
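
The filtering step can be wrapped in a small helper. This is a sketch only: `rebuild_without_none` is a hypothetical name, and a plain `dict` stands in for the tree so the example runs without BTrees installed.

```python
def rebuild_without_none(items, factory=dict):
    """Build a fresh mapping from (key, value) pairs, skipping None keys."""
    return factory([(k, v) for k, v in items if k is not None])


legacy_items = [(None, 42), ("a", 1), ("b", 2)]
clean = rebuild_without_none(legacy_items)
print(clean)
```

For a real tree one would presumably call `rebuild_without_none(t.items(), factory=BTrees.OOBTree.OOBTree)` and then replace the reference to the old tree on its parent object.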

David

Jason Madden

Nov 20, 2016, 9:40:43 AM
to David Glick (Glick Software), Jim Fulton, zodb

> On Nov 19, 2016, at 21:23, Jason Madden <jason....@nextthought.com> wrote:
>
> I'll repeat the tests with master and report back anything significant.

Using master, I get the same results (the Python BTree will unpickle None keys, and None keys can be deleted; keys with broken comparisons like Bad cannot be found or deleted in the Python implementation but can in the C implementation; I can put instances of Bad in C trees but not Python trees).

Aside from the different way the two implementations determine default comparison, the major problematic discrepancy seems to be that the C implementation compares pointers for equality before calling the __eq__/__cmp__ operator (PyObject_RichCompareBool[1]), while the Python code does not.
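
The effect of the identity shortcut can be reproduced without BTrees. In this sketch a linear scan stands in for the tree search, and the two helper functions are hypothetical; only the `Bad` class comes from the thread.

```python
class Bad(object):
    """Same broken key as in the thread: never equal to anything."""

    def __eq__(self, other):
        return False


def contains_eq_only(keys, key):
    """Lookup that relies on == alone."""
    return any(k == key for k in keys)


def contains_identity_first(keys, key):
    """Lookup that checks identity before ==, like PyObject_RichCompareBool."""
    return any(k is key or k == key for k in keys)


bad = Bad()
keys = [bad]
print(contains_eq_only(keys, bad))
print(contains_identity_first(keys, bad))
print(bad in keys)  # list membership also goes through the identity shortcut
```

The first lookup cannot find the very instance it was given, which matches the KeyError seen with the Python implementation; the second finds it, as the C implementation does.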

Jason

[1] https://github.com/python/cpython/blob/2.7/Objects/object.c#L997

Jim Fulton

Nov 20, 2016, 11:26:02 AM
to Jason Madden, Jim Fulton, David Glick, zodb
> Are these the semantics you would expect?

These are the semantics I would want.

> If there are unorderable values in the tree, is that going to cause corruption on adding new keys or removing other keys?

Potentially, Yes. It's a matter of luck.  

We want to protect people from putting themselves in that position. We don't want to go out of our way to punish them if they're already there.
 

> Some such keys may not be removable, at least in extreme cases.
>
> Here I am back in ZODB3 again:
>
>   In [6]: class Bad(object):
>      ...:     def __eq__(self, other):
>      ...:         return False # Corner case extreme example of broken object
>      ...:
>
>   In [8]: bt[Bad()] = 42

I'm surprised you can assign it.  Apparently Bad isn't bad enough. :)
Yup. :)
 
> But it seems safe to say there are at least some discrepancies and some clear bugs in the handling of unorderable objects still.

Yup. Partly I suspect this is related to the many ways one can express comparison in Python. (Talk about more than one way to do it.)

I think this may also be related to inserting into an empty BTree, where there's nothing to compare the new item to and thus no opportunity to fail. Perhaps, if we're paranoid, we should compare the first item to itself on insert.

In any case, I think the focus for this seat belt should be on insertion.
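
The "compare the item to itself on insert" idea can be sketched with a toy sorted container. `CheckedSortedKeys` is hypothetical and is not how BTrees implements the check; it also relies on Python 3 semantics, where `<` on unorderable objects raises TypeError.

```python
import bisect


class CheckedSortedKeys(object):
    """Toy sorted key list that rejects unorderable keys, even when empty."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Comparing the key with itself forces an ordering check even when
        # the container holds nothing else to compare against.
        if key < key:  # TypeError under Python 3 if the key has no ordering
            raise TypeError("key orders before itself")
        index = bisect.bisect_left(self._keys, key)
        if index == len(self._keys) or self._keys[index] != key:
            self._keys.insert(index, key)

    def keys(self):
        return list(self._keys)


ks = CheckedSortedKeys()
for k in (3, 1, 2, 1):
    ks.insert(k)
print(ks.keys())
```

Inserting `None` or a bare `object()` into an empty instance fails immediately under Python 3, instead of succeeding silently because there was nothing to compare against.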

Jason Madden

Nov 20, 2016, 12:11:45 PM
to Jim Fulton, David Glick, zodb

> On Nov 20, 2016, at 10:25, Jim Fulton <j...@jimfulton.info> wrote:
>
>> But it seems safe to say there are at least some discrepancies and some clear bugs in the handling of unorderable objects still.
>
> Yup. Partly I suspect this is related to the many ways one can express comparison in Python. (Talk about more than one way to do it.)
>
> I think this may also be related to inserting into an empty BTree, where there's nothing to compare the new item to and thus no opportunity to fail. Perhaps, if we're paranoid, we should compare the first item to itself on insert.

You're right, under Python 3, this only happens in an empty tree. Once there are other items, you get "TypeError: unorderable types: Bad() < ..." errors. Under Python 2, of course, no such luck.

> In any case, I think the focus for this seat belt should be on insertion.

I've opened a series of issues in the repository (https://github.com/zopefoundation/BTrees/issues) that I think capture the discussion here. I may be able to come up with some PRs over the next week for them.

One nice takeaway for None keys, at least, is that the Python implementation currently gets it more-or-less right.

Jason

Jim Fulton

Nov 20, 2016, 12:31:52 PM
to Jason Madden, Jim Fulton, David Glick, zodb
On Sun, Nov 20, 2016 at 12:11 PM, Jason Madden <jason....@nextthought.com> wrote:

> On Nov 20, 2016, at 10:25, Jim Fulton <j...@jimfulton.info> wrote:
>
>> But it seems safe to say there are at least some discrepancies and some clear bugs in the handling of unorderable objects still.
>
> Yup. Partly I suspect this is related to the many ways one can express comparison in Python. (Talk about more than one way to do it.)
>
> I think this may also be related to inserting into an empty BTree, where there's nothing to compare the new item to and thus no opportunity to fail. Perhaps, if we're paranoid, we should compare the first item to itself on insert.

You're right, under Python 3, this only happens in an empty tree. Once there are other items, you get "TypeError: unorderable types: Bad() < ..." errors. Under Python 2, of course, no such luck.

> In any case, I think the focus for this seat belt should be on insertion.

I've opened a series of issues in the repository (https://github.com/zopefoundation/BTrees/issues) that I think capture the discussion here.

Thanks.
 
I may be able to come up with some PRs over the next week for them.

It looks like David was going to try a fix. David, did this discussion help?

Jim

David Glick (Glick Software)

Nov 26, 2016, 8:58:59 PM
to Jim Fulton, Jason Madden, David Glick, zodb
I finally had time to look at this again today. I've got a branch (check-obj-cmp-on-insert-only) that makes the CPython implementation only do the check on insertion so it's more like the Python implementation. However, trying to delete None as a key still raises "TypeError: unorderable types: NoneType() < NoneType()" in Python 3 (both implementations; we weren't testing the pure-Python implementation on Python 3 except for PyPy3). This is presumably Python itself complaining when trying to search for the bucket. I suppose the workaround is: for the search during delete only, if the keys use default comparison, compare them using a function that mimics Python 2 comparison and thus skips the check for unorderable types. Gaaa...

Jim Fulton

Nov 27, 2016, 12:43:30 PM
to David Glick, Jim Fulton, Jason Madden, David Glick, zodb
On Sat, Nov 26, 2016 at 8:55 PM, David Glick <dgl...@gmail.com> wrote:
On 11/20/16 9:31 AM, Jim Fulton wrote:
I finally had time to look at this again today. I've got a branch (check-obj-cmp-on-insert-only) that makes the CPython implementation only do the check on insertion so it's more like the Python implementation. However, trying to delete None as a key still raises "TypeError: unorderable types: NoneType() < NoneType()" in Python 3 (both implementations; we weren't testing the pure-Python implementation on Python 3 except for PyPy3). This is presumably Python itself complaining when trying to search for the bucket. I suppose the workaround is: for the search during delete only, if the keys use default comparison, compare them using a function that mimics Python 2 comparison and thus skips the check for unorderable types. Gaaa...

Well

a) We don't have a way to use databases created in Python 2 in Python 3 (do we?).

b) It's impossible to insert None as a key in Python 3.

If a & b, then this seems to be a non-issue and the tests should be Python version dependent.

Jim
 

David Glick (Glick Software)

Nov 27, 2016, 4:43:31 PM
to Jim Fulton, David Glick, Jason Madden, zodb
I was thinking the same thing, and I've opened a pull request: https://github.com/zopefoundation/BTrees/pull/54

There is a tool to help with converting databases from Python 2 to Python 3: https://pythonhosted.org/zodb.py3migrate/ which was created at a sprint in Germany last year. It presumably could be enhanced to find and warn about unorderable keys.

Michael Howitz

Nov 28, 2016, 2:07:05 AM
to David Glick (Glick Software), Jim Fulton, David Glick, Jason Madden, zodb
On 27.11.2016 at 22:43, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
[…]
> There is a tool to help with converting databases from Python 2 to Python 3: https://pythonhosted.org/zodb.py3migrate/ which was created at a sprint in Germany last year. It presumably could be enhanced to find and warn about unorderable keys.

I have created an issue in zodb.py3migrate: https://github.com/gocept/zodb.py3migrate/issues/7
Feel free to comment there if I misunderstood the result of the discussion.


Kind regards
--
Michael Howitz · m...@gocept.com · Software Developer
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · Tel +49 345 1229889-8
Python, Pyramid, Plone, Zope · Consulting and Development

Jim Fulton

Jan 5, 2017, 12:03:03 PM
to David Glick (Glick Software), Jim Fulton, David Glick, Jason Madden, zodb
On Sun, Nov 27, 2016 at 4:43 PM, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
On 11/27/16 9:43 AM, Jim Fulton wrote:


On Sat, Nov 26, 2016 at 8:55 PM, David Glick <dgl...@gmail.com> wrote:
On 11/20/16 9:31 AM, Jim Fulton wrote:


On Sun, Nov 20, 2016 at 12:11 PM, Jason Madden <jason....@nextthought.com> wrote:

... 
It looks like David was going to try a fix. David, did this discussion help?

I finally had time to look at this again today. I've got a branch (check-obj-cmp-on-insert-only) that makes the CPython implementation only do the check on insertion so it's more like the Python implementation. However, trying to delete None as a key still raises "TypeError: unorderable types: NoneType() < NoneType()" in Python 3 (both implementations; we weren't testing the pure-Python implementation on Python 3 except for PyPy3). This is presumably Python itself complaining when trying to search for the bucket. I suppose the workaround is: for the search during delete only, if the keys use default comparison, compare them using a function that mimics Python 2 comparison and thus skips the check for unorderable types. Gaaa...

Well

a) We don't have a way to use databases created in Python 2 in Python 3 (do we?).

b) It's impossible to insert None as a key in Python 3.

If a & b, then this seems to be a non-issue and the tests should be Python version dependent.

I was thinking the same thing, and I've opened a pull request: https://github.com/zopefoundation/BTrees/pull/54

This got merged a couple of weeks ago.

I just made a release with this fix.  Thanks!

Jim

Jim Fulton

Jan 5, 2017, 1:01:52 PM
to Jim Fulton, David Glick (Glick Software), David Glick, Jason Madden, zodb
So, now the problem will move up through the application stack. :)

And perhaps become harder.

But surely people have built apps with BTrees 4, because BTrees 4 was released a loooooooong time ago (late 2012), and this seatbelt was introduced in BTrees 4.0.

When indexing content, you will very often encounter content without a value set, typically defaulting to None.

When such values are indexed, you'll get an error.  I don't see any guard against this error in, for example, zope.index.

Have people built newer apps with indexing on BTrees 4? If so, how have you dealt with this issue?

Jim

David Glick (Glick Software)

Jan 5, 2017, 1:11:06 PM
to Jim Fulton, Jason Madden, zodb
On 1/5/17 10:01 AM, Jim Fulton wrote:
> So, now the problem will move up through the application stack. :)
>
> And perhaps become harder.
>
> But surely people have built apps with BTrees 4, because BTrees 4 was
> released a loooooooong time ago (late 2012), and this seatbelt was
> introduced in BTrees 4.0.
>
> When indexing content, you will very often encounter content without a
> value set, typically defaulting to None.
>
> When such values are indexed, you'll get an error. I don't see any
> guard against this error in, for example, zope.index.
>
> Have people built newer apps with indexing on BTrees 4? If so, how
> have you dealt with this issue?
>
Thanks for the release.

Products.ZCatalog was updated some time ago to work with BTrees 4. It
explicitly checks for None and silently skips indexing it, to avoid
erroring in that scenario:
https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/PluginIndexes/unindex.py#L246

Hanno Schlichting

Jan 5, 2017, 1:13:11 PM
to zo...@googlegroups.com
On Thu, Jan 5, 2017, at 19:01, Jim Fulton wrote:
But surely people have built apps with BTrees 4, because BTrees 4 was released a loooooooong time ago (late 2012), and this seatbelt was introduced in BTrees 4.0.

When indexing content, you will very often encounter content without a value set, typically defaulting to None.

When such values are indexed, you'll get an error.  I don't see any guard against this error in, for example, zope.index.

Have people built newer apps with indexing on BTrees 4? If so, how have you dealt with this issue?

I encountered this when working on Products.ZCatalog back in 2014. At first I passed on the TypeError, and reraised it to get a more application-specific error message (https://github.com/zopefoundation/Products.ZCatalog/commit/c378cdab2fb8997af1a17261458a1528e7131243).

Later on this was deemed to be too cumbersome, so in mid-2016 I changed this to simply ignore any call that tried to put None values into a ZCatalog Unindex (https://github.com/zopefoundation/Products.ZCatalog/commit/1b078be1e336998f87ecdfbfd04944ebc33d0af7).

Neither ZCatalog nor Zope2 has a data migration framework, so migration of existing databases is left to each application. In the case of Plone, there is a migration framework, and while working on that, David ran into this problem.

There's probably a Plone version being released that uses BTrees 4+ and ZCatalog 3.1+ later this or next year.

I think this is all pretty much as expected, as the main feature everyone wants from a new Zope2 version is Python 3 compatibility and that is only very slowly making its way through the stack.

Hanno

Jim Fulton

Jan 5, 2017, 2:19:08 PM
to David Glick (Glick Software), Jim Fulton, Jason Madden, zodb
Thanks David (and Hanno).

Good to know. It's sad that zope.index hasn't gotten a fix.

In Postgres (and I assume other RDBMSs), you can specify how nulls are handled, so null values are still handled.

I wonder if that's something that should be done here as well.  That is, I wonder if indexes should have some special None accommodation.

Jim

David Glick (Glick Software)

Jan 5, 2017, 2:25:20 PM
to Jim Fulton, Jason Madden, zodb
It certainly seems like a valid use case to support querying for items that don't have a value set. That could be done by storing a separate treeset of ids with no value, outside the BTree, of course.
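
A minimal sketch of that layout follows. `ValueIndex` is hypothetical and is not the zope.index or ZCatalog API; a `dict` and a `set` stand in for the BTree and the treeset so that it runs with the standard library alone.

```python
class ValueIndex(object):
    """Toy field index: values map to doc ids; None is tracked separately."""

    def __init__(self):
        self._by_value = {}      # stand-in for an OOBTree of value -> ids
        self._no_value = set()   # stand-in for a treeset of ids with no value

    def index_doc(self, docid, value):
        self.unindex_doc(docid)
        if value is None:
            self._no_value.add(docid)  # never reaches the ordered tree
        else:
            self._by_value.setdefault(value, set()).add(docid)

    def unindex_doc(self, docid):
        self._no_value.discard(docid)
        for value in list(self._by_value):
            ids = self._by_value[value]
            ids.discard(docid)
            if not ids:
                del self._by_value[value]

    def with_value(self, value):
        return set(self._by_value.get(value, ()))

    def without_value(self):
        return set(self._no_value)


index = ValueIndex()
index.index_doc(1, "a")
index.index_doc(2, None)
index.index_doc(3, "a")
index.index_doc(3, None)  # reindexing moves doc 3 to the "no value" set
print(index.with_value("a"), index.without_value())
```

The ordered structure never sees None, so the key restriction is not triggered, while "has no value" remains queryable.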

Jim Fulton

Jan 5, 2017, 2:31:09 PM
to David Glick (Glick Software), Jim Fulton, Jason Madden, zodb
The simplest way to do this would be to add special handling of None when handling object keys, treating None as always greater than everything but itself. This would be similar to Postgres' NULLS LAST in CREATE INDEX (or less than everything, as with NULLS FIRST).
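
Both orderings can be illustrated with ordinary sort keys. This is a sketch of the ordering rule only, with hypothetical helper names; it says nothing about how BTrees would implement it internally.

```python
def nulls_last(key):
    """Sort key: None sorts after every other value."""
    return (key is None, key)


def nulls_first(key):
    """Sort key: None sorts before every other value."""
    return (key is not None, key)


values = ["b", None, "a", None]
print(sorted(values, key=nulls_last))
print(sorted(values, key=nulls_first))
```

The leading boolean decides every comparison between None and a real value, so Python never has to evaluate `None < "a"`, which would raise TypeError under Python 3.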

David Glick (Glick Software)

Jan 5, 2017, 2:35:31 PM
to Jim Fulton, Jason Madden, zodb
Not sure I agree that's simpler than doing nothing, but that does sound nicer than rejecting None just because Python doesn't know how to order it, if we can enforce a reasonable ordering ourselves. Ordering None first would be the way to go if we care about backwards compatibility with how it got ordered in BTrees 3 on Python 2.

Jim Fulton

Jan 5, 2017, 2:40:27 PM
to David Glick (Glick Software), Jim Fulton, Jason Madden, zodb
On Thu, Jan 5, 2017 at 2:35 PM, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
On 1/5/17 11:30 AM, Jim Fulton wrote:


On Thu, Jan 5, 2017 at 2:25 PM, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
On 1/5/17 11:18 AM, Jim Fulton wrote:


On Thu, Jan 5, 2017 at 1:11 PM, David Glick (Glick Software) <da...@glicksoftware.com> wrote:
On 1/5/17 10:01 AM, Jim Fulton wrote:
So, now the problem will move up through the application stack. :)

And perhaps become harder.

But surely people have built apps with BTrees 4, because BTrees 4 was released a loooooooong time ago (late 2012), and this seatbelt was introduced in BTrees 4.0.

When indexing content, you will very often encounter content without a value set, typically defaulting to None.

When such values are indexed, you'll get an error.  I don't see any guard against this error in, for example, zope.index.

Have people built newer apps with indexing on BTrees 4? If so, how have you dealt with this issue?

Thanks for the release.

Products.ZCatalog was updated some time ago to work with BTrees 4. It explicitly checks for None and silently skips indexing it, to avoid erroring in that scenario: https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/PluginIndexes/unindex.py#L246


Thanks David (and Hanno).

Good to know. It's sad that zope.index hasn't gotten a fix.

In Postgres (and I assume other RDBMSs), you can specify how nulls are handled, so null values are still handled.

I wonder if that's something that should be done here as well.  That is, I wonder if indexes should have some special None accommodation.

It certainly seems like a valid use case to support querying for items that don't have a value set. That could be done by storing a separate treeset of ids with no value, outside the BTree, of course.

The simplest way to do this would be to add special handling of None when handling object keys, treating None as always greater than everything but itself. This would be similar to Postgres' NULLS LAST in CREATE INDEX (or less than everything, as with NULLS FIRST).
Not sure I agree that's simpler than doing nothing,

I didn't consider doing nothing a solution to the problem of indexing nulls. Of course, we could choose to ignore that use case.
 
but that does sound nicer than rejecting None just because Python doesn't know how to order it,

Yup. I feel a bit bad about my tunnel vision on this, just thinking about the "default comparison" bug magnet and not thinking about the higher level concern of dealing with "null" (None) values at the indexing level.
 
if we can enforce a reasonable ordering ourselves. Ordering None first would be the way to go if we care about backwards compatibility with how it got ordered in BTrees 3 on Python 2.

Ooops, yeah. Good point.
 
Jim

Jason Madden

Jan 5, 2017, 2:56:37 PM
to Jim Fulton, zodb

> On 1/5/17 10:01 AM, Jim Fulton wrote:
>
> But surely people have built apps with BTrees 4, because BTrees 4 was released a loooooooong time ago (late 2012), and this seatbelt was introduced in BTrees 4.0.
>
> When indexing content, you will very often encounter content without a value set, typically defaulting to None.
>
> When such values are indexed, you'll get an error. I don't see any guard against this error in, for example, zope.index.
>
> Have people built newer apps with indexing on BTrees 4? If so, how have you dealt with this issue?

We've built some largish new applications beginning with BTrees 4 and using zc.catalog/zope.catalog/zope.index.

Our solution to the problem was to subclass Catalog and override the relevant methods where objects are added/updated in indexes and catch-and-ignore this TypeError[1]. Overall it wasn't really that onerous and was easy to fix once it cropped up, though it would be nice if it was handled out of the box. (Of course we wound up needing to sometimes handle other edge case exceptions for legacy reasons--yes, even in a new application; testers can eventually get very attached to their large, old, test DBs that have had botched migrations and botched distributed GCs---so we'd still have had to subclass eventually.)
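
The catch-and-ignore approach might look roughly like the sketch below. All three classes are hypothetical stand-ins, and `index_all` is a simplified method that does not match the real zope.catalog signatures.

```python
class StrictIndex(object):
    """Stand-in for an index backed by a BTree that rejects None keys."""

    def __init__(self):
        self.data = {}

    def index_doc(self, docid, value):
        if value is None:
            raise TypeError("Object has default comparison")
        self.data[docid] = value


class BaseCatalog(object):
    """Stand-in catalog that feeds every document to one index."""

    def __init__(self, index):
        self.index = index

    def index_all(self, docs):
        for docid, value in docs:
            self.index.index_doc(docid, value)


class TolerantCatalog(BaseCatalog):
    """Catch and ignore the TypeError so one bad value does not abort indexing."""

    def index_all(self, docs):
        for docid, value in docs:
            try:
                self.index.index_doc(docid, value)
            except TypeError:
                continue  # value cannot be used as a key; leave it unindexed


catalog = TolerantCatalog(StrictIndex())
catalog.index_all([(1, "a"), (2, None), (3, "b")])
print(catalog.index.data)
```

Catching a bare TypeError can also hide unrelated bugs in an index, so a real override would probably want to log what it skipped.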

Jason

[1] Mainly this means updateIndex. This is called from the IObjectAdded subscriber, and typically our objects have contracts that prevent a field from becoming None once it's been created, so it wasn't necessary to override index_doc which is called from the IObjectModified subscriber. YMMV, of course.

Jim Fulton

Jan 10, 2017, 12:09:51 PM
to Jim Fulton, David Glick (Glick Software), Jason Madden, zodb
I'm going to work on a PR to special case None, treating it as less than anything but itself.

Jim

David Glick (Glick Software)

Jan 10, 2017, 12:13:45 PM
to Jim Fulton, Jason Madden, zodb
Yay!

One nice side effect of this is that I think it will allow us to use ZODB 4 with existing releases of Plone rather than waiting for a new major release. (Maybe ZODB 5 too, but I haven't gotten as far as evaluating it yet, and I know for 5.1 we'll need to fix Zope's transaction notes.)

David

Jim Fulton

Jan 11, 2017, 5:34:00 PM
to Jim Fulton, David Glick (Glick Software), Jason Madden, zodb
On Tue, Jan 10, 2017 at 12:09 PM, Jim Fulton <j...@jimfulton.info> wrote:
> I'm going to work on a PR to special-case None, treating it as less than anything but itself.

Done and released as BTrees 4.4.0.

Thanks David for the excellent review!

Jim

Bill Janssen

Jan 24, 2017, 3:23:44 PM
to zodb, j...@jimfulton.info, da...@glicksoftware.com, jason....@nextthought.com
Jim, there's a junk file in BTrees-4.4.0 (also in 4.3.2, I see) which is causing me problems with Windows packaging: #BTreeTemplate.c#.

Bill

Jim Fulton

Jan 24, 2017, 3:28:07 PM
to Bill Janssen, zodb, Jim Fulton, David Glick (Glick Software), Jason Madden
On Tue, Jan 24, 2017 at 3:23 PM, Bill Janssen <bill.j...@gmail.com> wrote:
> Jim, there's a junk file in BTrees-4.4.0 (also in 4.3.2, I see) which is causing me problems with Windows packaging: #BTreeTemplate.c#.

Gaaa. Someone configured packaging to include everything and then added an (inevitably incomplete) blacklist.

Thanks. I'll make a 4.4.1 release that fixes this.

Jim

Bill Janssen

Jan 24, 2017, 3:55:17 PM
to zodb, bill.j...@gmail.com, j...@jimfulton.info, da...@glicksoftware.com, jason....@nextthought.com
Also saw a lot of junk files in the ZEO 5.0.4 tar file.  A couple of checkpoint files, and a lot of Emacs backup ~ files.

Bill

Jim Fulton

Jan 24, 2017, 4:04:31 PM
to Bill Janssen, zodb, Jim Fulton, David Glick (Glick Software), Jason Madden
On Tue, Jan 24, 2017 at 3:55 PM, Bill Janssen <bill.j...@gmail.com> wrote:
> Also saw a lot of junk files in the ZEO 5.0.4 tar file. A couple of checkpoint files, and a lot of Emacs backup ~ files.

Yeah, this is a pattern.  I'm trying to unscrew this but distutils is an evil black art.

Jim

 

--
You received this message because you are subscribed to the Google Groups "zodb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zodb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jim Fulton

Jan 24, 2017, 4:14:25 PM
to Bill Janssen, zodb, Jim Fulton, David Glick (Glick Software), Jason Madden
On Tue, Jan 24, 2017 at 3:23 PM, Bill Janssen <bill.j...@gmail.com> wrote:
> Jim, there's a junk file in BTrees-4.4.0 (also in 4.3.2, I see) which is causing me problems with Windows packaging: #BTreeTemplate.c#.

I've released 4.4.1, which should be free of Emacs junk files.

Jim

 


Marius Gedminas

Jan 25, 2017, 3:28:56 AM
to zodb
On Tue, Jan 24, 2017 at 04:04:10PM -0500, Jim Fulton wrote:
> On Tue, Jan 24, 2017 at 3:55 PM, Bill Janssen <bill.j...@gmail.com>
> wrote:
>
> Also saw a lot of junk files in the ZEO 5.0.4 tar file.  A couple of
> checkpoint files, and a lot of Emacs backup ~ files.
>
>
> Yeah, this is a pattern.  I'm trying to unscrew this but distutils is an evil
> black art.

https://pypi.python.org/pypi/check-manifest was made for this purpose:
to make sure sdists don't contain junk (and don't omit real files).

https://pypi.python.org/pypi/zest.releaser runs it by default, if both
are installed and you run `fullrelease`.
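
For a quick manual look at an existing tarball, independent of the tools above, something like the following standard-library sketch may help. The pattern list is an assumption covering only the Emacs autosave, lock, and backup names mentioned in this thread; `find_junk` and `junk_in_sdist` are hypothetical helper names, and this is not how check-manifest works internally.

```python
import fnmatch
import posixpath
import tarfile

# Assumed patterns: Emacs autosave (#name#), lock (.#name), backup (name~).
JUNK_PATTERNS = ("#*#", ".#*", "*~")


def find_junk(names, patterns=JUNK_PATTERNS):
    """Return the archive member names whose basename matches a junk pattern."""
    junk = []
    for name in names:
        base = posixpath.basename(name)
        if any(fnmatch.fnmatchcase(base, pattern) for pattern in patterns):
            junk.append(name)
    return junk


def junk_in_sdist(path):
    """List junk-looking members of the sdist tarball at the given path."""
    with tarfile.open(path) as archive:
        return find_junk(archive.getnames())


example_names = [
    "BTrees-4.4.0/BTrees/#BTreeTemplate.c#",
    "BTrees-4.4.0/BTrees/BTreeTemplate.c",
    "ZEO-5.0.4/setup.py~",
]
flagged = find_junk(example_names)
```

Unlike check-manifest, this only flags junk by name and says nothing about real files that are missing from the sdist.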

Cheers!
Marius Gedminas
--
The citizens of classical Athens used to vote by inscribing the name of their
favored candidate onto a bullet. They would then all gather together in the
public square, shout “Υι-χα”, and fire into the air. Election officials would
tally the fallen bullets to determine the winner.
-- agrumer's reaction to "guns are a necessary precondition for democracy"