This is a heads-up post about some planned changes to the ORM and specifically to the expressions API. This affects how the following features work inside the ORM:
- F-expressions (and other ExpressionNode subclasses)
- aggregates
- anything using SQLEvaluator (django.db.models.sql.expressions)
While the changes target private APIs, these APIs have remained stable for a long time. I expect there to be a significant number of users of the above APIs. The main concern is that the planned changes will break existing code. I am looking for feedback from 3rd party library developers - do the planned changes break existing code for you, and if yes, why? If we find some common cases we might be able to add backwards compatibility code paths for those cases.
There are two main reasons for doing the changes. First, the change allows for a lot of nice new features - doing conditional aggregates, aggregates using expressions and writing custom expressions, all this using public APIs. The second reason is that the current coding is somewhat complex, and that complexity makes it hard to write custom aggregates or expressions.
Currently the expressions and aggregates are built up from two classes. The first one is public facing API (for example Sum in django.db.models.aggregates), the second is how the public facing API is executed in the ORM (Sum in django.db.models.sql.aggregates). The idea is that we have one public facing component users should use for different queries. Then different Query implementations can have different implementation classes. Thus the same public facing class can be executed in different ways depending on the used Query class. Unfortunately this leads to cases where it is hard to extend expressions or aggregates - while it is easy to add a new public facing API class, it isn't easy to add an implementation for that class - the implementation belongs to the used Query class, but that class isn't under user control.
In addition to the extensibility problem, the current implementation is somewhat complex to follow. Also, the aggregation implementation doesn't share code with expressions, even though aggregates are after all just a special kind of expression.
The new way is simpler - there are just public facing classes, and the classes know how to implement themselves. The new expressions know how to add themselves to the query, and they know how to generate a query for different database backends. Backend differences are handled with as_vendorname() methods. Aggregates are a subclass of a certain kind of expression (the Func class), so aggregates use the same code paths as other expressions. The end result is simplified code, the ability to use Sum('foo') + Sum('bar') style aggregations, and the ability to write new expressions and aggregates using a public stable API.
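To make the shape of this design concrete, here is a toy sketch in plain Python. It is not the code from the patch and it does not use Django at all: the compile() method, the Column and Combined classes, the Length function and the "mssql" vendor name are all made up for illustration. It only shows the idea of an expression that compiles itself, looks for an as_vendorname() override first, and lets aggregates reuse the Func code path.

```python
# Toy model only. None of these classes are Django's real implementation;
# names other than Func, Sum and as_<vendor>() are assumptions.

class Expression:
    def compile(self, vendor):
        # Prefer a vendor-specific method (e.g. as_mssql) when one exists,
        # otherwise fall back to the generic as_sql().
        method = getattr(self, "as_" + vendor, self.as_sql)
        return method(vendor)

    def as_sql(self, vendor):
        raise NotImplementedError

    def __add__(self, other):
        # Expressions combine into new expressions: Sum('a') + Sum('b').
        return Combined(self, "+", other)


class Column(Expression):
    def __init__(self, name):
        self.name = name

    def as_sql(self, vendor):
        return '"%s"' % self.name, []


class Combined(Expression):
    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs

    def as_sql(self, vendor):
        lhs_sql, lhs_params = self.lhs.compile(vendor)
        rhs_sql, rhs_params = self.rhs.compile(vendor)
        sql = "(%s %s %s)" % (lhs_sql, self.op, rhs_sql)
        return sql, lhs_params + rhs_params


class Func(Expression):
    function = None

    def __init__(self, *args):
        # Plain strings are treated as column names.
        self.args = [Column(a) if isinstance(a, str) else a for a in args]

    def as_sql(self, vendor, function=None):
        parts, params = [], []
        for arg in self.args:
            sql, arg_params = arg.compile(vendor)
            parts.append(sql)
            params.extend(arg_params)
        return "%s(%s)" % (function or self.function, ", ".join(parts)), params


class Sum(Func):
    # An aggregate is just a Func with a different SQL function name.
    function = "SUM"


class Length(Func):
    function = "LENGTH"

    def as_mssql(self, vendor):
        # Hypothetical vendor override: this backend spells the function LEN.
        return self.as_sql(vendor, function="LEN")


sql, params = (Sum("foo") + Sum("bar")).compile("postgresql")
print(sql)  # (SUM("foo") + SUM("bar"))
```

In this sketch a third party can add a new function or aggregate by subclassing Func, and handle a backend quirk by adding one as_<vendor>() method, without touching any Query class.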
A patch implementing the new way exists; it was written by Josh Smeaton. The patch also implements a way to annotate non-aggregates to queries (.annotate(Coalesce(F('foo'), F('bar')))). The patch can be used as a basis for other improvements to the ORM; for example, the ability to write queries like .order_by(Lower('somecol').desc()) has been discussed on this list recently.
The only big problem with introducing the new way is backwards compatibility. The current code is implemented the way it is because the aim was to allow writing different kinds of backends (NoSQL). The NoSQLQuery would just need to contain a different implementation class than the normal Query class, and then you could do whatever you want. I claim that it is possible to do the exact same thing by adding a rewriter to the NoSQLQuery class - it inspects the new-style classes and creates different implementation classes on the fly.
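As a rough illustration of the rewriter idea, here is a toy sketch in plain Python. Everything in it is hypothetical: there is no NoSQLQuery class like this in Django, the "documents" are just dicts, and the Sum and Max classes are stand-ins for new-style public aggregates. The point is only that a query class can inspect the public expression and pick its own implementation class at execution time.

```python
# Toy model only; all names here are assumptions made for illustration.

class Sum:
    """Stand-in for a new-style public aggregate."""
    def __init__(self, field):
        self.field = field


class Max:
    """Stand-in for a new-style public aggregate."""
    def __init__(self, field):
        self.field = field


class SumImpl:
    """Backend-specific implementation chosen by the rewriter."""
    def __init__(self, expression):
        self.field = expression.field

    def evaluate(self, documents):
        return sum(doc[self.field] for doc in documents)


class MaxImpl:
    def __init__(self, expression):
        self.field = expression.field

    def evaluate(self, documents):
        return max(doc[self.field] for doc in documents)


class NoSQLQuery:
    # The rewriter table: public class -> this backend's implementation class.
    implementations = {Sum: SumImpl, Max: MaxImpl}

    def __init__(self, documents):
        self.documents = documents

    def rewrite(self, expression):
        # Inspect the new-style expression and build the matching
        # implementation object on the fly.
        impl_class = self.implementations.get(type(expression))
        if impl_class is None:
            raise NotImplementedError(
                "%s is not supported by this backend" % type(expression).__name__
            )
        return impl_class(expression)

    def aggregate(self, expression):
        return self.rewrite(expression).evaluate(self.documents)


query = NoSQLQuery([{"price": 2}, {"price": 5}])
print(query.aggregate(Sum("price")))  # 7
```

The public classes stay the same for every backend; only the table inside the query class changes, which is what the old two-class split was trying to achieve.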
The bigger problem seems to be existing 3rd party aggregates and expressions - while technically we are changing only private APIs, I don't see it as a good idea to break existing code if we can avoid that.
I have also written a bit about this in DEP format, but lost interest in finishing it, as the DEP process doesn't seem to be working that well. This seems like a good candidate for a DEP, but before I finish the DEP there must be some guarantee that we have a working DEP process to handle it. I want to avoid the situation where this feature is stalled because of the DEP process. You can see the half-written DEP at
https://github.com/akaariai/deps/blob/master/drafts/expressions.rst. The most interesting part is about the current implementation.
The most important thing now is to find backwards incompatibilities caused by the planned change. So, if you depend on the current implementation of expressions, aggregates or SQLEvaluator, please check if the new way breaks your code. If so, report that in ticket #14030, and let's see if there is something we can do to help ease the transition. Of course, other feedback is also welcome.
- Anssi