ORM refactoring, part 2

276 views
Skip to first unread message

Anssi Kääriäinen

unread,
Aug 25, 2012, 3:35:02 PM8/25/12
to Django developers
I have done some more ORM refactoring work. I thought it would be a
good idea to post a summary of what is going on.

First, I haven't committed the utils.tree refactoring patch I was
planning to commit [https://github.com/akaariai/django/commits/
refactor_utils_tree]. The reason is that I now feel that add_filter()
and deeper levels of the Query class need some refactorings first.

There is some work going on in the refactoring of the deeper levels of
the ORM. I committed join promotion improvement patch today: [https://
code.djangoproject.com/ticket/16715#comment:25]. I have more planned
improvements to commit:


* Remove "dupe avoidance" logic from the ORM (#18748).
This is removing some code I strongly believe isn't needed any more.
This is still pending Malcolm's review.

* Fix join promotion in disjunctive filters (#18854).
This one should make join promotion in disjunction cases cleaner and
more efficient (as in less OUTER JOINS). In addition, this will make
add_q/add_filter a bit easier to understand. The problem currently is
that the first ORed filter is added to the query with connector=AND,
the rest of the filters in the disjunction with connector=OR. This is
done for join promotion reasons. However, this is very confusing when
trying to work with add_q and add_filter, and doesn't actually work
that well (see the added tests in the ticket's patch).

* Remove "trim" argument from add_filter (#18816).
The trim argument is only needed for split_exclude (that is, pushing
negated m2m filters to subqueries). So, add_filter (and then also
trim_joins) needs to handle split_exclude's special case. Handling
this case inside split_exclude() turned out to be a little ugly, but
IMO less ugly than the trim flag. This also allows further cleanups in
the following item.

* A biggie: complete refactoring of setup_joins (#10790, a separate
ticket could be good for this).
Best described in the last post in the ticket [https://
code.djangoproject.com/ticket/10790#comment:41]. The patch isn't
completely ready (there are still stale comments floating around, the
commit history is ugly).


Applying the above patches should make the ORM code easier to follow.
Further features like custom lookups should be easier to add. The
produced queries should be of higher quality (less joins, less left
joins). And, it should be easier to do cleanups in the ORM.

A note about extra_filters: The setup_joins() refactoring removes the
"extra_filters" system used for generic relations. The system adds new
filters to the query for any joins generated by generic relations.
However, these are pushed into the WHERE clause, and this doesn't work
nicely with LOUTER JOINS. The refactoring adds the extra condition
directly into the join's ON clause, thus producing correct left outer
join conditions.

The extra_filters is (as far as I know) private API, but I'd like to
know if this is heavily used by the community. If so, it is easy
enough to leave this API in place (while still fixing the generic
relations stuff).

I hope I can get reviews for the above tickets. Getting reviews from
people who do not know the ORM is valuable, too, as one of the goals
is to make the ORM easier to understand. As the author I can't easily
see if my attempts to make the code easier to follow actually improve
anything.

Even if I do not get any reviews, I think it is a good idea to push
these patches in. Thus far it has been hard to get reviews for larger
ORM patches, and I am afraid that the refactoring work will stall if I
have to wait for a full review for every patch. If you want to review
the patches, but don't have time just now, please make a note in the
tickets about this. There is no hurry.

If pushing these patches without reviews seems like a bad idea to you,
then please say so (preferably before I commit anything)...

I am sorry if I haven't worked on other patches I thought I had time
to work on. The core ORM refactorings are IMO really important to work
on, and thus they have bypassed some other items in my admittedly too
long TODO list.

- Anssi

Alex Gaynor

unread,
Aug 25, 2012, 3:57:50 PM8/25/12
to django-d...@googlegroups.com

 - Anssi

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Anssi,

Thanks for all your awesome work on this, of the patches I've reviewed so far they've been really good. I don't think I'll have time to review until DjangoCon, so feel free to either commit now or wait, whichever you'd prefer!

Alex


--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

Luke Plant

unread,
Aug 27, 2012, 9:06:42 AM8/27/12
to django-d...@googlegroups.com
Anssi,

I just got back from holiday. I hope to be able to review at least some
of these patches within the next 2 weeks. If you don't hear from me in
that time, I'd encourage you to carry on anyway.

Thanks,

Luke
--
Sometimes I wonder if men and women really suit each other. Perhaps
they should live next door and just visit now and then. (Katherine
Hepburn)

Luke Plant || http://lukeplant.me.uk/

Anssi Kääriäinen

unread,
Sep 25, 2012, 4:23:48 AM9/25/12
to Django developers
It seems it is time to push these patches in.

The only question I have is if I should do that pre-feature freeze or
is it enough to do that early next month? These aren't new features,
but there are some changes to internal APIs I suspect are used at
least somewhat outside of django-core. Namely query.setup_joins(),
query.join() and add_filter() APIs will have some changes to them
and .extra_filters() for fields will be gone.

The #10790 one is a big change to how setup_joins() work, and I
guarantee there will be some regressions. I am committed to fixing
what I break.

I'd like to postpone these to early next month so that I have more
time to help in reviews and pushing some new features in. Other
options are postponing review work, and postponing these patches to
1.6.

- Anssi

Jacob Kaplan-Moss

unread,
Sep 25, 2012, 10:25:39 AM9/25/12
to django-d...@googlegroups.com
On Tue, Sep 25, 2012 at 3:23 AM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> I'd like to postpone these to early next month so that I have more
> time to help in reviews and pushing some new features in. Other
> options are postponing review work, and postponing these patches to
> 1.6.

As long as the changes are in before the beta I'm fine with it. We
need to give some time for people who're relying on those internals
(other db backends, perhaps?) to update their code, but since they're
internals and not public APIs we can relax the feature-freeze deadline
a bit.

So as long as you can get it done before the beta, have at it.

Jacob

Alex Gaynor

unread,
Sep 25, 2012, 10:48:35 AM9/25/12
to django-d...@googlegroups.com
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.


My view is these things are 100% undocumented, 100% internal, anyone using them is 100% on their on. Simply put, if we can't make changes to these APIs without having to worry, what's the point in having a backwards compatibility policy?

Jacob Kaplan-Moss

unread,
Sep 25, 2012, 12:16:19 PM9/25/12
to django-d...@googlegroups.com
On Tue, Sep 25, 2012 at 9:48 AM, Alex Gaynor <alex....@gmail.com> wrote:
> My view is these things are 100% undocumented, 100% internal, anyone using
> them is 100% on their on. Simply put, if we can't make changes to these APIs
> without having to worry, what's the point in having a backwards
> compatibility policy?

Oh, we totally can make changes, and we're pretty clear about what's
supported and what's not. But given the choice, if we can not be
assholes about those changes that's generally a good thing. Being nice
to users is generally thought of as a good thing :)

Jacob

Jonas Obrist

unread,
Sep 26, 2012, 9:48:08 AM9/26/12
to django-d...@googlegroups.com
This thread caught my interest (because of potential implications for django-hvad).

While the proposed changes break django-hvad, I've already written a simple patch for django-hvad, so this is a non-issue. (The problem was the cleanup of Query.where. master and 1.4 had lots of calls to start_subtree which added WhereNodes into Query.where.children, those children are now tuples which broke our code. However the tuples make a lot more sense).

Regarding the actual patch, I like the cleanup in add_q, it's a lot nicer to read now (although needs_having is a bit crazy).
Reply all
Reply to author
Forward
0 new messages