Feel free to test queryset-refactor branch

Malcolm Tredinnick

Apr 13, 2008, 7:23:34 AM
to django...@googlegroups.com
We're getting pretty close to merging queryset-refactor into trunk and
would like to do this as soon as practical. There are still a couple of
enhancements to add (#5420, mostly), one bug to fix (#5937) and some
internal tweaking to do, but all the main stuff is ready to be used.

So if anybody wants to test it out, go ahead. Read the wiki page[1] if
you've got code that does any slightly unusual stuff, but for existing
code that works on trunk, there shouldn't be any real changes required.

[1] http://code.djangoproject.com/wiki/QuerysetRefactorBranch

File any bug reports in Trac against the queryset-refactor
"version" (please do NOT put the qs-rf keyword on the ticket; I'm using
that for other purposes). Bug reports that are regressions from existing
functionality are more interesting and important at the moment than
feature enhancements to the new features, since the latter case can be
dealt with at our leisure (they're not features that people are already
relying upon).

If you see any different results testing against the branch compared to
trunk, it would be interesting to know about them. Reduce it to a small
example before opening a ticket, wherever possible. Please don't make me
wade through dozens of lines of code just to get to one query that is
relevant. Bear in mind, though, that the difference could be because a
bug existed in trunk and the branch is now giving the correct result. So
make sure your test case is valid (even I got bitten by that in a
project I wrote).

Regards,
Malcolm

--
The sooner you fall behind, the more time you'll have to catch up.
http://www.pointy-stick.com/blog/

Julien

Apr 13, 2008, 7:51:52 AM
to Django users
Hi Malcolm,

I've been using the branch on a project in development for a few weeks
now, and haven't come across any issue yet - although I can't say I've
pushed it to its limits.

Glad to hear it's close to being merged into trunk.

Thanks so much for this massive contribution!

Best,

Julien

Ivan Illarionov

Apr 13, 2008, 9:08:59 AM
to Django users
Glad to hear that queryset-refactor is almost ready.

So far I've noticed a few SQL portability issues and a few leftover
uses of the old queryset API (e.g. in admin). I have already filed #6956
and #6957. I'm doing some heavy testing of this branch and will report
anything that goes wrong.

Regards,
--
Ivan

mrts

Apr 14, 2008, 10:12:16 AM
to Django users
Big cheers and big thanks!

Has anyone tried merging qs-rf and nf-admin already?

MS

Justin Fagnani

Apr 17, 2008, 5:53:59 PM
to django...@googlegroups.com
Hey Malcolm,

I've been using qs-rf for a while now with basically no problems. Excellent work.

There's one thing that may be a little odd that I stumbled on while trying to get some primitive polymorphism working:

The first thing is that there seems to be no way to tell if an instance of a parent class has a child instance without trying the child reference and catching the DoesNotExist exception. For a class with multiple subclasses, this is cumbersome, so I've been adding a _type field to parent classes that gets set in save() of the subclasses.

Is there a better way to do this, or is this something that could be included? I know there's no way to determine whether or not a class will be subclassed in the future, so I wouldn't be surprised if the answer is no. But maybe there should be a documented pattern.
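
A rough sketch of the _type approach described above. This is a plain-Python stand-in rather than real Django models, so it runs without a database. The names Parent, Child1, Child2 and has_child are illustrative assumptions, and setting _type from type(self) in a shared save() is one possible variant of "set it in save() of the subclasses"; in an actual model, _type would be a column on the parent table.

```python
# Plain-Python stand-in for the "_type field" pattern; not real Django models.
# All class, attribute and function names here are hypothetical.

class Parent:
    def __init__(self, name):
        self.name = name
        self._type = ""  # would be a character column on the parent table

    def save(self):
        # Record the most derived class name, so the parent row "knows"
        # which child it belongs to without probing every child accessor
        # and catching DoesNotExist.
        self._type = type(self).__name__.lower()
        # A real model would also write the row to the database here.


class Child1(Parent):
    pass


class Child2(Parent):
    pass


def has_child(obj):
    """Return True if the saved row was created through a subclass."""
    return obj._type not in ("", "parent")


rows = [Parent("a"), Child1("b"), Child2("c")]
for row in rows:
    row.save()
types = [row._type for row in rows]
```

With this in place, a single attribute check replaces one try/except per subclass.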

The odd part is what happens with the child reference. parent.child obviously works as expected, and returns either an instance of Child or raises DoesNotExist. But for an instance of Child, .child always returns a reference to itself, so that c.child == c is always True. This makes sense on one hand, because c is also an instance of Parent, but on the other, Child doesn't have a subclass, so should .child be None?

I haven't actually encountered this in any real life situation, because it's hard to end up with a collection in Django where you have a mix of parent and child instances, so maybe it'll never be a problem.

One additional thing is that in one case, I know which subclasses I'm interested in, and it'd be great to have a way to specify that a queryset should return polymorphic results by specifying the subclasses for the join. Something like:

Parent.objects.all().select_subclasses('Child1','Child2')

Cheers and thanks,
  Justin

scott lewis

Apr 17, 2008, 6:34:01 PM
to django...@googlegroups.com


This is a dirty hack, but it came in handy for me...

If you add this method to your parent class:

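    def canonical(self):
        # Name of the implicit parent-link field on child models.
        attr_name = '%s_ptr' % self._meta.module_name
        # Accessor names of every child model that links back to this class.
        children_fields = [r.get_accessor_name() for r in
                           self._meta.get_all_related_objects()
                           if r.field.name == attr_name]
        for f in children_fields:
            try:
                # Recurse so multi-level hierarchies resolve to the leaf.
                return getattr(self, f).canonical()
            except models.ObjectDoesNotExist:
                pass
        return self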

You can then convert a queryset to a list of child classes:

child_classes = [c.canonical() for c in Parent.objects.all()]

Basically, canonical() tries to grab a list of descendant classes,
then cycles through those until it finds one that exists. If it can't
find an instance of a descendant class, it just hands back the parent
since that's what you have. It's also recursive so it will traverse n
levels of inheritance.


scott.

Malcolm Tredinnick

Apr 17, 2008, 11:44:57 PM
to django...@googlegroups.com

On Thu, 2008-04-17 at 14:53 -0700, Justin Fagnani wrote:
[...]

> The first thing is that there seems to be no way to tell if an
> instance of a parent class has a child instance without trying the
> child reference and catching the DoesNotExist exception. For a class
> with multiple subclasses, this is cumbersome, so I've been adding a
> _type field to parent classes that gets set in save() of the
> subclasses.

That's a reasonable way to do it.

Another way is to have an extra table that maps object id and content
type to "most derived content type" or something. If you had third-party
models you were subclassing, that isn't too hard to implement either.
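
As a rough illustration of the extra-table idea, here is a plain-Python sketch in which a dictionary stands in for the mapping table. The function names register and lookup and the sample type names are assumptions made for the example; in Django the table would be its own model, so third-party tables stay untouched.

```python
# Stand-in for an extra lookup table; in Django this would be a separate
# model holding content-type and object-id columns. Names are hypothetical.

# (base content type, object id) -> most derived content type
most_derived = {}


def register(base_type, object_id, derived_type):
    """Record which concrete type a given base row really is."""
    most_derived[(base_type, object_id)] = derived_type


def lookup(base_type, object_id):
    """Return the most derived type, falling back to the base type itself."""
    return most_derived.get((base_type, object_id), base_type)


register("place", 1, "restaurant")
register("place", 2, "italianrestaurant")
```

Rows that were never registered simply resolve to their own base type.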

> Is there a better way to do this, or is this something that could be
> included? I know there's no way to determine whether or not a class
> will be subclassed in the future, so I wouldn't be surprised if the
> answer is no.

That's right. There's no way to tell if something's going to be
subclassed and we don't want to change any third-party database tables
(a design feature is that you *must* be able to subclass third-party
models transparently). Also, it's quite fiddly to keep such fields up to
date if you think about the manipulation required for A subclassed by B
subclassed by C. You end up with type fields everywhere.

> But maybe there should be a documented pattern.

I'll add something. It seems kind of obvious, though, and given that it
isn't a particularly common pattern to query the parents and descend to
the children (if you care about the differences, you'll usually be
querying the children directly; if you care about the common stuff, it's
all on the parent), I'd kind of hope people needing this already had the
skills to connect A to B.

A couple of sentences won't confuse things too much, though. We can fix
that.

> The odd part is what happens with the child reference. parent.child
> obviously works as expected, and returns either an instance of Child
> or raises DoesNotExist. But for an instance of Child, .child always
> returns a reference to itself, so that c.child == c is always True.
> This makes sense on one hand, because c is also an instance of Parent,
> but on the other, Child doesn't have a subclass, so should .child be
> None?

Hmm, hadn't noticed that, although it's not too surprising. My first
reaction is "well, don't do that."

It's quite possibly fiddly to fix, since reverse relations (which is
what the "child" attribute is) should be transparently accessible on any
child class even when they exist on the parent. I think I tried to
prevent the "traversing down to yourself" case at one point and it trips
up when you have multiple subclasses. The structure in the
tests/model_inheritance/models.py test file is a bit of a medium-level
stress test for this behaviour, since multiple unrelated things inherit
from Place and Restaurant (and are related to them and each other) and
there are more twisted cases out there, too. They're not in the test
because they rapidly become almost opaque to comprehension and we try to
keep the tests reasonably clean.

It's probably overkill to add in lots of extra processing to avoid the
case of "things on the path that lead to myself, but don't stop too soon
in the hierarchy" when it's probably easier just to not access that
attribute in the code.

The hard cases are always multi-layer hierarchies (trees of inheritance,
not just linear or dual-level cases). I'll have another look now that
it's been a while since I last looked at it and maybe there's an easy fix,
but try to avoid doing that. It doesn't make sense.

> I haven't actually encountered this in any real life situation,
> because it's hard to end up with a collection in Django where you have a
> mix of parent and child instances, so maybe it'll never be a problem.

Indeed (hopefully).

We're not actually just making this up as we go along and I suspect
you'll find that the cases that are harder are also relatively rare.
Quite a few of the people involved in the design of this stuff in Django
(particularly 18 months or so ago, when we were doing some heavy lifting
in the design area), including myself, have a fair bit of experience with
other OO and inheritance-based systems such as C++, Java, relational
database design and CORBA interfaces, in addition to Python. Duck
typing is handy quite often in Python, but inheritance at the data layer
level doesn't seem to be one of the places where it helps. Every situation
we could come up with, or knew about from experience, is possible and I hope we've found a
nice middle-ground in order to make the common stuff easy and the rarer
stuff possible (of course, one man's common is another man's rare, but
that's life in a crowded space).

> One additional thing is that in one case, I know which subclasses I'm
> interested in, and it'd be great to have a way to specify that a
> queryset should return polymorphic results by specifying the
> subclasses for the join. Something like:
>
>
> Parent.objects.all().select_subclasses('Child1','Child2')

Maybe one day.

It's not significantly slower to just create two querysets in this
case and then merge the two iterators to get a combined iterator of the
results (if you care about them being ordered properly, merge sort is
your friend; otherwise just use itertools.chain).
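
A minimal sketch of both options, using plain lists as stand-ins for the two querysets so that it runs without Django. The Row class and its kind and created attributes are assumptions made for the example, and heapq.merge is used here as one standard-library way to do the merge step of a merge sort; each input must already be sorted on the merge key.

```python
import heapq
import itertools
from operator import attrgetter


class Row:
    """Stand-in for a model instance; 'kind' and 'created' are hypothetical."""

    def __init__(self, kind, created):
        self.kind = kind
        self.created = created


# Two lists standing in for two querysets, each already ordered by 'created'.
child1_rows = [Row("child1", 1), Row("child1", 4)]
child2_rows = [Row("child2", 2), Row("child2", 3)]

# Order does not matter: just concatenate the iterators lazily.
unordered = list(itertools.chain(child1_rows, child2_rows))

# Order matters: merge the two sorted inputs without re-sorting everything.
ordered = list(heapq.merge(child1_rows, child2_rows, key=attrgetter("created")))
```

Both itertools.chain and heapq.merge are lazy, so wrapping them in list() is only needed when the combined results have to be materialised.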

Combining disparate sets of output columns into the appropriate models
is quite a bit of extra processing and querysets are already not the
fastest things in the world. Adding bells-and-whistles features like this
that are always on the common processing path -- every time we construct
some results we need to check if special classes have been specified and
act accordingly -- when it's only a few lines of Python to do it in your
own code might not be worth it.

Regards,
Malcolm

--
I just got lost in thought. It was unfamiliar territory.
http://www.pointy-stick.com/blog/

Chris Hoeppner

Apr 18, 2008, 9:05:36 AM
to django...@googlegroups.com
I wonder if anyone has tried coming up with some sort of wannabe-backend
for the GAE Datastore?

~ Chris
