Window expressions, #26608

347 views
Skip to first unread message

Mads Jensen

unread,
Nov 22, 2016, 7:19:10 AM11/22/16
to django-d...@googlegroups.com
Hi,

I got somewhat stuck on progress with this ticket, and as I'd like to
get it merged eventually (and avoid an abundant amount of fixes), I have
a few things I like a bit of input about.

1. Since this is specific to postgres, I'm looking for a better place to
put the actual Window-expression class, as well as axillary helpers
(Order By, Partition and the frame).
django/db/models/sql/expressions.sql may be one such place, but perhaps
introducing another file for this? Eventually, something like WITHIN
GROUP could be added, which shares some traits with window expressions.
This also goes the tests.

2. Tests, and if the initial test-cases seem fair? I took a small sample
of test cases related to some made-up salaries in a few departments. I
tried to introduce some wrapper to guard against testing functions that
aren't available (Oracle is the most complete in this sense, PostgreSQL
doesn't support all of the aggregate-functions, such as Variance).

3. The Order By-clause proposed in the initial code in the code only
leaves room for one column (no expressions). A regular query has
"add_ordering" to add more than one ordering-clause to a query, but this
just works for an ordering in SELECT, and I didn't spot anything that
could be used for this. Maybe abstracting the code from add_ordering so
it could be used also for something more? Something similar goes for
Partition By. Again, I'm sure it's a common use case to partition and
order by more than one expression. I currently commented out the
OrderByWrapper.

4. Would it be ok to add code for this in the base backend? I'm thinking
of constants for CURRENT ROW etc. which are shared among backends. I had
a look at SQLalchemy to see how things are arranged there, and what it
supports. SQLalchemy is feature-rich, but leaves out more
expressive-style ranges, and only supports unpreceding following/current
row, unbound following. I guess this is a reasonable limitation to make.

5. Will a 2.0 release notes be introduced soon, as I'm certain I won't
manage to make the final PR before the 1.11 feature freeze as it needs
to be finished, reviewed and merged. I started some time ago, and had it
lying around till I understood window functions better.

As pointed out in #26608, I have some rough, drafty code that's quite
far from a PR. I'm more interested in input for the above.

I'm relatively new to Django internals (it's fun and educational hacking
on them), and it took me some time to read through the code and
documentation for the ORM, and there are many facets of it I can still
understand. If someone is willing to guide me a bit, I'd be quite
helpful. Thank you.
--
Med venlig hilsen / Kind regards,
Mads Jensen

Saajan Fernandes: I think we forget things if there is nobody to
tell them.
-- The Lunchbox (2013)




Josh Smeaton

unread,
Nov 22, 2016, 5:50:17 PM11/22/16
to Django developers (Contributions to Django itself)
Hi Mads,

Thanks for picking this up. I've been wondering if Window expressions would be possible, and what limitations we might have to make based on our ORM.

> 1. Since this is specific to postgres, I'm looking for a better place to 
put the actual Window-expression class, as well as axillary helpers 
(Order By, Partition and the frame). 

I would create a new file beside expressions.py called django/db/models/window.py.


> 2. Tests, and if the initial test-cases seem fair? I took a small sample 
of test cases related to some made-up salaries in a few departments.

Yes, it's a good start, but I haven't thoroughly reviewed them yet either. It'll be easier to do when there's a proper PR.

> 3. The Order By-clause proposed in the initial code in the code only 
leaves room for one column (no expressions).

Wouldn't the existing OrderBy type be sufficient? Really though, windowing should support a list of expressions (then call asc() or desc() on them).

> 4. Would it be ok to add code for this in the base backend?

Yes, if it makes the implementation nicer/easier for django and 3rd party backends. If having the code in the backend prevents *users* from creating their own windowing functions or modifying existing ones, then that's probably not ok. Django, backends, and users all need to be able to construct their own windowing expressions. You can certainly share code in the backend though, with things like PRECEDING ROW etc.

> 5. Will a 2.0 release notes be introduced soon,

The 1.11 alpha will be cut late January, which will trigger the creation of the 2.0 pages. Don't worry too much about release notes just yet.

> I'm relatively new to Django internals (it's fun and educational hacking 
on them), and it took me some time to read through the code and 
documentation for the ORM, and there are many facets of it I can still 
understand.

None of us understand all of the ORM, and we're more than happy to have more people helping out, so thanks! If you create a PR (even if it's not ready, just mark is [WIP] for Work In Progress) it'll be easier to comment on specific things within the code and see all of the commits tied together. You're also more likely to get feedback on a PR and some guidance there too.

Cheers

Mads Jensen

unread,
Nov 25, 2016, 7:22:58 AM11/25/16
to Django developers (Contributions to Django itself)
On Tuesday, November 22, 2016 at 11:50:17 PM UTC+1, Josh Smeaton wrote:
Thanks for picking this up. I've been wondering if Window expressions would be possible, and what limitations we might have to make based on our ORM.

I was fortunate that the reporter posted a snippet to work from :) Django's ORM is quite flexible, and adaptable. I remember seeing a video with Alex Gaynor where he complained about its shortcomings, but that seemed to have been sorted out.

https://www.youtube.com/watch?v=GxL9MnWlCwo
 
> 1. Since this is specific to postgres, I'm looking for a better place to 
put the actual Window-expression class, as well as axillary helpers 
(Order By, Partition and the frame). 

I would create a new file beside expressions.py called django/db/models/window.py.


Seems like a good starting point. I have been contemplating that grouping.py could be a good name as well, in case somehow would like to add WITHIN GROUP, or support for grouping sets (a bigger fish, since that means appending stuff after GROUP BY etc.)
Anyway, I have some window.py locally.
 
> 2. Tests, and if the initial test-cases seem fair? I took a small sample 
of test cases related to some made-up salaries in a few departments.

Yes, it's a good start, but I haven't thoroughly reviewed them yet either. It'll be easier to do when there's a proper PR.

I wouldn't call it proper, but I posted a pull request, and marked it [WIP].

> 3. The Order By-clause proposed in the initial code in the code only 
leaves room for one column (no expressions).

Wouldn't the existing OrderBy type be sufficient? Really though, windowing should support a list of expressions (then call asc() or desc() on them).

Maybe I'm reading the code wrong, but doesn't OrderBy just refer to a column, or are types of Expression/ExpressionWrapper also valid input?
 
> 4. Would it be ok to add code for this in the base backend?

Yes, if it makes the implementation nicer/easier for django and 3rd party backends. If having the code in the backend prevents *users* from creating their own windowing functions or modifying existing ones, then that's probably not ok. Django, backends, and users all need to be able to construct their own windowing expressions. You can certainly share code in the backend though, with things like PRECEDING ROW etc.
 
I'm cleaning up things in the pull request. At least elements that are mandatory are there, such as docs, tests and some code.
 
> I'm relatively new to Django internals (it's fun and educational hacking 
on them), and it took me some time to read through the code and 
documentation for the ORM, and there are many facets of it I can still 
understand.

None of us understand all of the ORM, and we're more than happy to have more people helping out, so thanks! If you create a PR (even if it's not ready, just mark is [WIP] for Work In Progress) it'll be easier to comment on specific things within the code and see all of the commits tied together. You're also more likely to get feedback on a PR and some guidance there too.

Good. I tend to have several small things in my commits, but I'll try to work on individual things in this PR.

Thanks for input.

Kind regards,
Mads

Josh Smeaton

unread,
Nov 25, 2016, 7:51:05 PM11/25/16
to Django developers (Contributions to Django itself)
OrderBy takes an expression - it doesn't have to be a single column. For example, this is valid:

(Lower('last_name') + Lower('first_name)).desc() == OrderBy(Lower('last_name') + Lower('first_name), descending=True)

Mads Jensen

unread,
Nov 29, 2016, 3:56:47 PM11/29/16
to Django developers (Contributions to Django itself)
On Saturday, November 26, 2016 at 1:51:05 AM UTC+1, Josh Smeaton wrote:
OrderBy takes an expression - it doesn't have to be a single column. For example, this is valid:

(Lower('last_name') + Lower('first_name)).desc() == OrderBy(Lower('last_name') + Lower('first_name), descending=True)

Yes. However, I was more interested in being able to add a list of expressions. The OrderBy-class does not evaluate more than
one expression. The Case-expression right above iterates over all of the cases, and parses them. This is what I had in mind. I don't writing something for it, but
I'm trying to use as much of the existing API as possible.

What I had in mind was offering the flexibility to write this (maybe offering a list of OrderBy-clauses as valid input would be a way to remediate this):

I read over the Query-class code which has add_ordering.

qs = WindowTestModel.objects.annotate(sum=Window(
expression=Sum('salary'),
order_by=[ExtractYear('hiredate'), F('department')])
)
print(qs.query)

Kind regards,
Mads

Adam Johnson

unread,
Nov 29, 2016, 4:35:34 PM11/29/16
to Django developers (Contributions to Django itself)
As to your point 1:
 
Since this is specific to postgres

Well, Window expressions aren't specific to Postgres:
I thus think this should all be part of Django's main QuerySet API.

Josh Smeaton

unread,
Nov 29, 2016, 5:04:49 PM11/29/16
to Django developers (Contributions to Django itself)
Adam, are you thinking we should be adding something like Model.objects.window(), or just allowing Window-type expressions on backends that have a specific feature flag? Does the compiler need to get involved at all, or can we handle the full range of window expressions with the expressions API?

Mads, there's nothing that currently handles a list of expressions, and certainly nothing specific to OrderBy. Your proposed syntax is basically what would be required `order_by=[ .. ]`.

Adam Johnson

unread,
Nov 29, 2016, 6:37:48 PM11/29/16
to django-d...@googlegroups.com
I'm not sure about that detail atm, I need to review the patch. I just think we shouldn't be putting anything in postgres-only land.

On 29 November 2016 at 22:04, Josh Smeaton <josh.s...@gmail.com> wrote:
Adam, are you thinking we should be adding something like Model.objects.window(), or just allowing Window-type expressions on backends that have a specific feature flag? Does the compiler need to get involved at all, or can we handle the full range of window expressions with the expressions API?

Mads, there's nothing that currently handles a list of expressions, and certainly nothing specific to OrderBy. Your proposed syntax is basically what would be required `order_by=[ .. ]`.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/f4999522-0a99-4e47-b499-bfba19cf3920%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Adam

Michael Manfre

unread,
Nov 29, 2016, 8:10:41 PM11/29/16
to django-d...@googlegroups.com
On Tue, Nov 29, 2016 at 6:37 PM Adam Johnson <m...@adamj.eu> wrote:
I'm not sure about that detail atm, I need to review the patch. I just think we shouldn't be putting anything in postgres-only land.

On 29 November 2016 at 22:04, Josh Smeaton <josh.s...@gmail.com> wrote:
Adam, are you thinking we should be adding something like Model.objects.window(), or just allowing Window-type expressions on backends that have a specific feature flag? Does the compiler need to get involved at all, or can we handle the full range of window expressions with the expressions API?

I agree that this should be behind a database feature and not locked away in contrib.postgres or with a vendor check. Regarding the specifics, I haven't had time to dig in to see how difficult it would be to add window expressions to the expressions API, but that would be the way I'd like to see it happen. Doing something like Model.objects.window() seems like it would require getting the compiler involved and be more of a pain trying to implement the various window expressions for each backend.

 
Mads, there's nothing that currently handles a list of expressions, and certainly nothing specific to OrderBy. Your proposed syntax is basically what would be required `order_by=[ .. ]`.

We (or anyone) could make an ExpressionList class that is essentially an Expression that is a list of expressions. It would basically be a Combinable with comma as the connector.

Regards,
Michael Manfre

Shai Berger

unread,
Nov 30, 2016, 1:30:15 AM11/30/16
to django-d...@googlegroups.com
On Wednesday 30 November 2016 03:10:23 Michael Manfre wrote:
> > On 29 November 2016 at 22:04, Josh Smeaton <josh.s...@gmail.com>
> > wrote:
>
> > Mads, there's nothing that currently handles a list of expressions, and
> > certainly nothing specific to OrderBy. Your proposed syntax is basically
> > what would be required `order_by=[ .. ]`.
>
> We (or anyone) could make an ExpressionList class that is essentially an
> Expression that is a list of expressions. It would basically be a
> Combinable with comma as the connector.
>
I don't believe you can override comma, and I'm also not sure an
ExpressionList is of general utility -- so far, all the places where we needed
such lists were method calls (like order_by()). I'd try harder to look for
APIs which keep it that way -- just off the top of my head,

Window(expression, *order_by)

(this probably doesn't work, Window() is more complex than that, but the
order_by=[...] suggested by Mads probably does)

Mads Jensen

unread,
Nov 30, 2016, 5:06:57 AM11/30/16
to Django developers (Contributions to Django itself)


On Tuesday, November 29, 2016 at 10:35:34 PM UTC+1, Adam Johnson wrote:
As to your point 1:
 
Since this is specific to postgres

Well, Window expressions aren't specific to Postgres:
I thus think this should all be part of Django's main QuerySet API.

This is also my plan. I left out an important "not" in that particular phrase. If you notice, I also added feature-flags. But still, the code is quite messy.

~ Mads

Mads Jensen

unread,
Nov 30, 2016, 12:33:19 PM11/30/16
to Django developers (Contributions to Django itself)
On Tuesday, November 29, 2016 at 11:04:49 PM UTC+1, Josh Smeaton wrote:
Adam, are you thinking we should be adding something like Model.objects.window(), or just allowing Window-type expressions on backends that have a specific feature flag? Does the compiler need to get involved at all, or can we handle the full range of window expressions with the expressions API?

I'd prefer the expression-api. I think it's more transparent what happens, and unless you're thinking of stating the windows explicitly after FROM/GROUP BY/HAVING, I don't see much reason.

I'm fairly new, but it seems to complicate things a lot more to need to write code inside the compiler, unless stricly necessary (supporting grouping sets could be one such example).
 
Mads, there's nothing that currently handles a list of expressions, and certainly nothing specific to OrderBy. Your proposed syntax is basically what would be required `order_by=[ .. ]`.

I actually like the idea of introducing ExpressionList. It would provide a bit more power, such as offering ASC/DESC for each ordering expression. Anyway, I'll work a bit with this, and post some
commits in the weekend.

~ Mads

Mads Jensen

unread,
Jan 2, 2017, 9:40:28 AM1/2/17
to Django developers (Contributions to Django itself)
    I think this has cleaned up to request input (I'm a bit zealous to make something that will be merged). I left a note in 1.11.txt but when 2.0.txt is introduced, I'll move it there. Also, now everything is in places that are not PostgreSQL-only. I realized in the process that the only thing the list of functions in the features-list of a backend is to prevent the test suite to run tests for unsupported functions. Mostly because MariaDB won't include FIRST_VALUE and LAST_VALUE and a few more. Perhaps a single list would be better?

I have a few questions to clear this up before asking for a more formal review:

1. PostgreSQL and MariaDB don't include support for RESPECT|IGNORE NULLS for certain functions (details are provided in
https://www.postgresql.org/docs/current/static/functions-window.html). Any thoughts if adding ignore_nulls=False|True to the constructor would be an idea, and then specify it's only possible for Oracle, or "as_oracle" with a special template? I noticed that nulls_last and nulls_first was added to OrderBy, however that is valid in all implementations. The Oracle virtual box was too big to fit on my machine, so I haven't tested locally for more than PostgreSQL, and on Python 3.

2. About __repr__ and __str__, in general, I'm in doubt what would be good ways to represent things, and how to test them? Also, I am in doubt what are intuitive input values that translate to CURRENT ROW, UNBOUNDED FOLLOWING etc.? I had a look at sqlalchemy which uses a mix of null-values and integers as valid input. In general, the frames-part is a bit of headache, and still slightly messy (the generated SQL is fine, so I suppose the current test data is adequate).

And probably, django/db/backends/mysql/features.py should be left untouched?

I know 1.11 is in the works, so I'm fine with this being on hold.

Thank you,
~ Mads

Mads Jensen

unread,
Feb 21, 2017, 1:36:04 AM2/21/17
to Django developers (Contributions to Django itself)
I hit somewhat of a wall, and would like some input/help.

* Import of window functions from `django.db.model.functions` seems to be the
  convention - `DEFAULT_INDEX_TABLESPACE` causes some headaches, because of
  XField being explicitly set in _output_field variable. Probably the functions need some extra fine-tuning in some way. So
  far, most of the functions don't have explicit arguments in the constructor,
  because they either have a varying number of arguments (Lead and Lag, for
  instance) or don't take any (rank and row_number; however, those can take an
  argument in case somebody wants to add support for WITHIN GROUP-expression). I
  deleted the "_output_field = Field" from the functions that do take an
  argument, such as Lead and Lag, would it be required to explicit about the
  output_field, such as overriding the resolve_output_field-field? In those
  cases, it's always the type of the first argument or the default-argument in
  case it's provided/available. Tests are running locally without the explicit
  _output_field. And just to avoid shuffling around things later, any arguments
  against django.db.models.functions.window being the right place for this?

* `__repr__` and `__str__` methods and testing. What's a good general convention
  for something like this? I don't have access to the connection in __repr__,
  and __str__-methods, and thus just imported from `django.db` in the tests,
  which I suppose is not the ideal solution for this. At least, it doesn't
  really feel that way.

* Are there more cases which should fail apart from filtering (supported by
  filterable) and use in UPDATE (can_be_reffed_in_update)?

Thank you.

Kind regards,
Mads

Josh Smeaton

unread,
Feb 22, 2017, 6:00:45 AM2/22/17
to Django developers (Contributions to Django itself)
First off, great work on this patch. I haven't given it much of a look over just yet unfortunately, but I'm trying to figure out where your issues are to help unblock.

- I'm unsure how DEFAULT_INDEX_TABLESPACE is causing issues with your patch, can you explain a bit?

- Explicit arguments in functions are a nice to have, and definitely help when you know exactly what should and shouldn't be allowed, but are not required. For functions that take no arguments, it's better to override the `__init__` method and make that explicit. You could also define the `arity` class property if you know exactly how many expression arguments are needed for a Func.

- Overriding resolve_output_field is a fine idea, and we should probably make use of that a little better in other types. You could also just override an `output_field` property to extract the output_field of the first expression in self.get_source_expressions.

- django.db.models.functions.windows is fine

- __str__ and __repr__ methods don't need to and shouldn't return database specific representations. They're there for debugging support mostly, and users aren't going to care (too much) if the PRECEDING language is slightly different from their particular backend. Hard code what you think will be the most useful representation in the __str__ and __repr__ methods. Tests are there to encourage implementors to add str/repr methods to their own expressions, that is all.

- are there other cases that should fail? Perhaps aggregating a window clause? I don't actually know, but it'd be worth testing. I think this will be caught by virtue of window functions deriving from Aggregate, but double check:

WindowTestModel.objects.annotate(lead=Window(
      expression=Lead('salary'),
      partition_by='department')
).aggregate(Sum('lead'))

Happy to clarify anything above if I wasn't too clear.

Cheers

Mads Jensen

unread,
Feb 24, 2017, 4:02:26 AM2/24/17
to django-d...@googlegroups.com
On 02/22/2017 12:00 PM, Josh Smeaton wrote:
> - I'm unsure how DEFAULT_INDEX_TABLESPACE is causing issues with your
> patch, can you explain a bit?

It appears that it's necessary to have functions defined
django/db/models/functions/X.py be importable from
django.db.models.functions, which means that they should be in
django/db/models/functions/__init__.py. I tried to experiment with this,
but I get a traceback about DEFAULT_INDEX_TABLESPACE not being set,
which again comes back Field referencing settings.DEFAULT_INDEX_TABLESPACE.

I had to revert the commit again:
https://github.com/django/django/pull/7611/commits
/0568902abadafa25fdc81b2ec5df50e5b7c11be7

> - Explicit arguments in functions are a nice to have, and definitely
> help when you know exactly what should and shouldn't be allowed, but are
> not required. For functions that take no arguments, it's better to
> override the `__init__` method and make that explicit. You could also
> define the `arity` class property if you know exactly how many
> expression arguments are needed for a Func.

The window functions are a bit of an odd case, as they are both
applicable in X() OVER(), but also in X() WITHIN GROUP(ORDER BY ...)
where X can take different arguments. I posted a couple links to some
blogs that explain this with a few examples. I'd like them also to
support WITHIN GROUP, where they can accept an argument; the
output-field is always for some functions either an integer or a float,
regardless of the input, and for others, such as last_value, it depends
on the type of the first argument, i.e., last_value(column with a
date-type) should output a date-type.

https://www.depesz.com/2014/01/11/waiting-for-9-4-support-ordered-set-within-group-aggregates/
https://sqlandplsql.com/2012/04/18/oracle-rank-function/

> - __str__ and __repr__ methods don't need to and shouldn't return
> database specific representations. They're there for debugging support
> mostly, and users aren't going to care (too much) if the PRECEDING
> language is slightly different from their particular backend. Hard code
> what you think will be the most useful representation in the __str__ and
> __repr__ methods. Tests are there to encourage implementors to add
> str/repr methods to their own expressions, that is all.

OK. Good.

> - are there other cases that should fail? Perhaps aggregating a window
> clause? I don't actually know, but it'd be worth testing. I think this
> will be caught by virtue of window functions deriving from Aggregate,
> but double check:
>
> WindowTestModel.objects.annotate(lead=Window(
> expression=Lead('salary'),
> partition_by='department')
> ).aggregate(Sum('lead'))

This doesn't crash on Postgres, and sums the result of lead together. It
adds a group by, though.

I suppose an exception should also be raised in case there's no ORDER BY
inside the window, at least some of the tests failed on Oracle because
they didn't specify the ordering. For instance, Oracle wasn't thrilled
about ROW_NUMBER() OVER(), although Postgres could understand it (I
haven't yet installed MariaDB locally to get an idea what works and what
doesn't).

Thank you. I got a few clues to work from here, but I'll ask again at
some point (I spent quite a lot of time on this, so I want to see it
merged) :)

> Cheers
Reply all
Reply to author
Forward
0 new messages