Changes to ORM expressions API

143 views
Skip to first unread message

Anssi Kääriäinen

unread,
Jun 18, 2014, 2:49:13 AM6/18/14
to django-d...@googlegroups.com
This is a heads-up post about some planned changes to the ORM and specifically to the expressions API. This affects how the following features work inside the ORM:
  - F-expressions (and other ExpressionNode subclasses)
  - aggregates
  - anything using SQLEvaluator (django.db.models.sql.expressions)

While the changes target private APIs, these APIs have remained stable for a long time. I expect there to be significant amount of users of the above APIs. The main concern is that the planned changes will break existing code. I am looking for feedback from 3rd party library developers - does the planned changes break existing code for you, and if yes, why? If we find some common cases we might be able to add backwards compatibility code paths for those cases.

There are two main reasons for doing the changes. First, the change allows for a lot of nice new features - doing conditional aggregates, aggregates using expressions and writing custom expressions, all this using public APIs. The second reason is that the current coding is somewhat complex, and that complexity makes it hard to write custom aggregates or expressions.

Currently the expressions and aggregates are built up from two classes. The first one is public facing API (for example Sum in django.db.models.aggregates), the second is how the public facing API is executed in the ORM (Sum in django.db.models.sql.aggregates). The idea is that we have one public facing component users should use for different queries. Then different Query implementations can have different implementation classes. Thus the same public facing class can be executed in different ways depending on the used Query class. Unfortunately this leads to cases where it is hard to extend expressions or aggregates - while it is easy to add a new public facing API class, it isn't easy to add an implementation for that class - the implementation belongs to the used Query class, but that class isn't under user control.

In addition to the extensibility problem the current implementation is somewhat complex to follow. Still, aggregation implementation doesn't share code with expressions, but after all expressions are just a special kind of expression.

The new way is simplified - there is just public facing classes. The classes know how to implement themselves. The new expressions know how to add themselves to the query, and they know how to generate a query against different database backends. Different database backends are handled with as_vendorname() syntax. Aggregates are a subclass of certain kind of expression (Func class), so aggregates use the same code paths as other expressions. The end results is simplified code, ability to use Sum('foo') + Sum('bar') style aggregations, and the ability to write new expressions and aggregates using a public stable API.

A patch exists that implements the new way. It is written by Josh Smeaton. The patch also implements a way to annotate non-aggregates to queries (.annotate(Coalesce(F('foo'), F('bar'))). The patch can be used as basis for other improvements to the ORM, for example the ability to queries like .order_by(Lower('somecol').desc()) has been discussed on this list recently.

The only big problem with introducing the new way is backwards compatibility. The current coding is implemented the way it is because the aim was allowing writing different kinds  of backends (NoSQL). The NoSQLQuery would just need to contain different implementation class than the normal Query class had, and then you could do whatever you want. I claim that it is possible to do the exact same thing with addition of a rewriter to the NoSQLQuery class - it inspects the new-style classes, and creates different implementation classes on the fly.

The bigger problem seems to be existing 3rd party aggregates and expressions - while technically we are changing only private APIs, I don't see it as a good idea to break existing code if we can avoid that.

I have written a bit about this also in DEP format, but ran out of interest of writing a DEP as the DEP process doesn't seem to be doing that well. This seems like a good candidate for DEP, but before I start finishing the DEP there must be some guarantees that we have a working DEP process to handle it. I want to avoid the situation where this feature is stalled because of the DEP process. You can see the half-written DEP at https://github.com/akaariai/deps/blob/master/drafts/expressions.rst. The most interesting part is about the current implementation.

The most important thing now is to find backwards incompatibilities caused by the planned change. So, if you depend on the current implementation of expressions, aggregates or SQLEvaluator, please check if the new way breaks your code. If so, report that in ticket #14030, and lets see if there is something we can do to help ease the transition. Of course, other feedback is also welcome.

 - Anssi

Josh Smeaton

unread,
Jun 18, 2014, 8:31:55 PM6/18/14
to django-d...@googlegroups.com
manfre has raised some issues about registering implementations in IRC. I had been assuming that AppConfigs could be used for backends to register their changes, but that would require users to modify their INSTALLED_APPS for that to work. Otherwise, backends never really get an opportunity to run start up code. That is obviously not a good approach to take.

Perhaps we could implement a setup hook for each of the backends where startup code could be run. Maybe this could even be an implicit way of registering an AppConfig for the backend? I'm not familiar enough with AppLoading to know if an implicit AppConfig is a good idea or not - but I assume it probably isn't.

As a concept though, would a setup hook be useful to backends? It would allow patching of aggregates and functions for this refactor, but would it find other uses?

Josh

Michael Manfre

unread,
Jun 18, 2014, 9:51:44 PM6/18/14
to django-d...@googlegroups.com
After spending a bit more time looking over the patch, I think the best and most logical place for a database backend to hook in to the as_vendor override is in SQLCompiler.compile, which is where Django checks the node for the vendor specific override and then calls it.

Database backends are already required to use what is deemed internal API, which includes SQLCompiler, so that is the most logical place for us to make the changes we need. Projects and 3rd party apps would still use the public API (either register or monkey patch). This also avoids the need to try to make the public API work for two conflicting needs.

Regards,
Michael Manfre


 


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7efa709d-5b4c-435d-a6db-f66e09fe23d7%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Anssi Kääriäinen

unread,
Jun 27, 2014, 7:11:26 AM6/27/14
to django-d...@googlegroups.com
It seems the biggest worry right now is how 3rd party backends and users can alter the generated query string per backend. The current way is the same used for Lookups and Transforms, and those are in 1.7. So, I don't think we need to invent some other way for expressions. And, as Michael Manfre said 3rd party backends can do this in custom SQLCompiler. Adding an official hook instead of overriding SQLCompiler per backend should likely be done (see https://groups.google.com/d/msg/django-developers/HggiPzwkono/lP0qygZv_lQJ).

As for private API changes, there have been no complaints. I take this as a sign that we can proceed with the planned changes. Past experience tells me that when we get to alpha and beta stage for 1.8 we will get complaints about the breakage. When those are reported we can check if there are some common patterns we want to support or at least offer an easy upgrade path.

I wonder if we need to have a DEP for this? If not, any other objections/questions/comments?

 - Anssi
Reply all
Reply to author
Forward
0 new messages