I'm interested in tackling ticket#14030, ideally before the Jan 20th cutoff, but I'm looking for some guidance. There have been a lot of references to Anssi Kääriäinen attempting an ORM refactor before trying to fix aggregates, but I'm not sure if this is still a goal or not.
The work done by nateb at https://github.com/django/django/pull/46 was very similar to the structure I believed would be the right way forward - refactoring Aggregates to inherit from ExpressionNode. Unfortunately, the patch submitted has been closed because it no longer applies cleanly. There was also discussion regarding type coercion and a lack of real world tests that should be considered.What I'd like to do is attempt to rewrite the above PR so that it applies cleanly. The reviews that had already happened on that patch would then mostly apply to the rewrite, eliminating the need for a start to finish review of the concept in general. Is this something that I should go through with, or are we better off waiting for the ORM refactor then reevaluating the way forward?
So I've gone ahead and tried to remove SQLEvaluator by pushing its functions down into ExpressionNode. For the most part, it's working well. I rebased on akaariai/lookups_3 to make sure that any changes to lookups would be reflected in this change as they are *somewhat* related. The diff between lookups and my patch is below:There is still a lot missing (I haven't touched GIS yet), and some of the tests are failing. I'm running tests against SQLITE and Postgres at the moment, but I'll add in Oracle and MySQL once I've got something closer to finished.I'm having some trouble with the test that is currently failing, and if someone could point out where I'm going wrong that'd be awesome. Specifically, ExpressionTests.test_filter() is failing on updates spanning relationships (line 179). The problem is that relabeled_clone isn't being called or used at the appropriate time. I did some debugging, and I *think* that Lookup.process_rhs() is being called before Lookup.relabeled_clone(), which results in the ExpressionNode using the incorrect alias. I assume this is a problem with how I've refactored ExpressionNode since the same tests pass when I checkout lookups_3. ExpressionNode.relabeled_clone is called, but after the SQL has already been generated. Should I be providing a method or attribute for Lookup to inspect maybe?
I'd appreciate some feedback on the patch as it stands too if possible. I think this is the right direction to head towards, as it simplifies the existing F() evaluation, and should allow a straightforward implementation of Aggregates on top. I'll be away until the 2nd of January, but will pick this up again then.
The big problem is that if we drop the SQLEvaluator and similarly the models/aggregates.py <-> models/sql/aggregates.py ways of writing Expressions and Aggregates, then there will be a lot of broken 3rd party aggregates and expressions. Even if the APIs are private I am not willing to just drop the old way. So, some backwards compatibility is required.
I now have all tests passing on Postgres and SQLite (See here). I still haven't touched GIS, but I believe it should be as simple as changing the query to directly call prepare on the expression, and removing references to SQLEvaluator.The big problem is that if we drop the SQLEvaluator and similarly the models/aggregates.py <-> models/sql/aggregates.py ways of writing Expressions and Aggregates, then there will be a lot of broken 3rd party aggregates and expressions. Even if the APIs are private I am not willing to just drop the old way. So, some backwards compatibility is required.What level of backwards compatibility would be required? I like the idea of using the @add_implementation decorator to extend Expressions and Aggregates, but would that be enough? I've not used django-nonrel or any other non-official backend, so I'm really not sure what would be required or where a line should be drawn. The SQLEvaluator API (mostly) has moved down to ExpressionNode, so the callers only change by calling .prepare on the value rather than wrapping it in an Evaluator.
class SQLArrayAgg(SQLAggregate): sql_function = 'array_agg' class ArrayAgg(Aggregate): name = 'ArrayAgg' def add_to_query(self, query, alias, col, source, is_summary): klass = SQLArrayAgg aggregate = klass( col, source=source, is_summary=is_summary, **self.extra) aggregate.field = DecimalField() # vomit query.aggregates[alias] = aggregate
Then doing:
qs = SomeModel.objects.annotate(ArrayAgg('id'))
should produce results using the ArrayAgg class. This is using private API, but there is a lot of code relying on this. We should support this if at all possible.
Another case you should check is how GIS works *without* rewriting anything in django.contrib.gis. If that works, then that shows fairly good backwards compatibility.
To write a custom aggregate one needed two classes, one for user facing API, one for implementing that API for SQL queries.
Another case you should check is how GIS works *without* rewriting anything in django.contrib.gis
I'll try to check your work in detail as soon as possible. Unfortunately I am very busy at the moment, so I can't guarantee anything.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/518c79cd-050c-49f7-8420-1e53bfa4b78e%40googlegroups.com.--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
Why were the aggregate specific fields, properties, etc. added to ExpressionNode, instead of being on django.db.models.aggregates?
- Why does an ExpressionNode containing any Aggregates need to pollute the entire tree with that knowledge?
Instead of using is_ordinal and is_computed (confusing name, why not is_float), you could eliminate those and define output_type for each Aggregate that does something special.
What does is_summary mean?
- Custom backend implementation of aggregates and annotations
- Annotations that aren't aggregates. I think this should be extremely easy, but I might not be aware of some underlying difficulties.
Things that should be done in the future, as I don't think there will be time for this just now:
- Change order_by (and other functions) to accept Expressions.
Any idea how to do this?
The new way in custom lookups is trying to first call as_vendorname(qn, connection), if that doesn't exist then call the standard implementation, that is as_sql(). For example on sqlite Django will first try to call as_sqlite(), if that doesn't exist then as_sql() will be called.
I don't think it is that easy. Django assumes that if any annotation exist, then the query is an aggregation query. But of course that doesn't hold if the above is done.
- Annotations that aren't aggregates. I think this should be extremely easy, but I might not be aware of some underlying difficulties.
order_by(RefSQL("case when {{author__birthdate__year}} > %s then 1 else 0 end", [1981])). Joins generated by Django, and you can use any custom lookups you have. No need for .extra() any more!
Any idea how to do this?
The new way in custom lookups is trying to first call as_vendorname(qn, connection), if that doesn't exist then call the standard implementation, that is as_sql(). For example on sqlite Django will first try to call as_sqlite(), if that doesn't exist then as_sql() will be called.I've been following your PR and noted the change (and made some comments too) away from @add_implementation. I'm having a bit of an internal debate about how I've structured expressions with respect to as_sql though. Each expression also has a get_sql() method that as_sql() calls internally on leaf nodes. I'm thinking it'd be best for the extension point to be on get_sql(), but it'd be inconsistent with the Lookups API. The problem with overriding as_sql() would be that the extender would also need to know to recurse the entire tree which is not a good idea at all. Would the inconsistency be acceptable? When I spend some time actually trying out swappable implementations, this may all become clearer to me.
Maybe both Aggregate and Expression could implement the query expression API (that is, those things Col class has in the lookups patch). This way F('author__age') + F('author__books_sold') would translate to:
Expression(+):
Col(author, age)
Col(author, books_sold)
Internally this would mean that Expression just generates the connector between nodes, but doesn't produce any SQL for the nodes itself. It seems Expression is trying to do too much currently. It produces connector between nodes when it is internal node and is also generates SQL for the leaf nodes, too.
--
You received this message because you are subscribed to a topic in the Google Groups "Django developers" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/8vEAwSwJGMc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7c5a1866-a414-48c6-af4a-6477b8b5b79a%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/026da964-8066-49f6-a762-1d8cc8bd1ef2%40googlegroups.com.