[Django] #28422: Allow adding joins to other querysets (or models) to a queryset with extra join conditions

7 views
Skip to first unread message

Django

unread,
Jul 21, 2017, 5:16:53 AM7/21/17
to django-...@googlegroups.com
#28422: Allow adding joins to other querysets (or models) to a queryset with extra
join conditions
------------------------------------------+------------------------
Reporter: Debanshu Kundu | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 1.11
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------+------------------------
In one of our project we had a need to join our sub-queries to our main
query. As Django ORM doesn't support this we had to write SQL queries. But
after some time those SQL queries became difficult to maintain and our
project was under active development and we were doing additions/changes
to models and query logic.

So we started to look for alternatives and found some ways to hack Django
ORM using which we can add joins to queryset. I have created this gist
with utility functions and related helper code which are we using to add
join to another queryset (or model) to a queryset:
https://gist.github.com/debanshuk/6fd9398cff0fab59e7093fe98b8a9152.

These functions are named `join_to_queryset()` and `join_to_table()`
respectively. They also allow adding extra conditions to the join added by
them using `get_active_extra_restriction()` helper function.

Following is an example showing use of `join_to_queryset()` function:

{{{
class Snake(models.Model):
name = models.TextField()
age = models.PositiveIntegerField()
length = models.FloatField()
sex = models.TextField()

class Egg(models.Model):
snake = models.ForeignKey(Snake)

class Kill(models.Model):
snake = models.ForeignKey(Snake)

queryset = Snake.objects.filter(pk__in=snake_pks)

queryset = join_to_queryset(
table=Snake,
subquery=Egg.objects.values('snake').annotate(egg_count=Count('pk')),
table_field='id',
subquery_field='snake_id',
queryset=queryset,
alias='SnakeEggAggr'
).extra(select={'egg_count': 'SnakeEggAggr.egg_count'})

queryset = join_to_queryset(
table=Snake,
subquery=Kill.objects.values('snake').annotate(kill_count=Count('pk')),
table_field='id',
subquery_field='snake_id',
queryset=queryset,
alias='SnakeKillAggr'
).extra(select={'kill_count': 'SnakeKillAggr.kill_count'})

print queryset.values('name', 'age', 'length', 'sex', 'egg_count',
'kill_count')
}}}

Output of above code would be something like:
{{{
[{'name': 'John', 'age': 5, 'length': 20.1, 'sex': 'male', 'egg_count':
10, 'kill_count': 5}, {'name': 'Jane', 'age': 8, 'length': 25.5, 'sex':
'female', 'egg_count': 5, 'kill_count': 1}, {'name': 'Jack', 'age': 2,
'length': 10.2, 'sex': 'trans', 'egg_count': 0, 'kill_count': 0}]
}}}


Above result can also be obtained by doing following query:

{{{
print
Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),
kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',
'egg_count', 'kill_count''egg_count', 'kill_count')
}}}

But this query will take more time to execute than previous one as 'name',
'age', 'length' and 'sex' all four fields would be in the GROUP BY clause
of SQL query and the time will increase more and more as the number of
such fields increases (this is the vary reason due to which we had to use
sub-queries for aggreagation).

It would be nice if such functionality can be added to the Django ORM
itself. It seems doable to as we were able to hack the ORM to do the same.

--
Ticket URL: <https://code.djangoproject.com/ticket/28422>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 21, 2017, 6:54:42 AM7/21/17
to django-...@googlegroups.com
#28422: Allow adding joins to other querysets (or models) to a queryset with extra
join conditions
--------------------------------+--------------------------------------

Reporter: Debanshu Kundu | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 1.11
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Description changed by Debanshu Kundu:

Old description:

New description:

queryset = Snake.objects.filter(pk__in=snake_pks)

'length': 10.2, 'sex': 'hermaphrodite', 'egg_count': 0, 'kill_count': 0}]
}}}


Above result can also be obtained by doing following query:

{{{
print
Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),
kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',
'egg_count', 'kill_count''egg_count', 'kill_count')
}}}

But this query will take more time to execute than previous one as 'name',
'age', 'length' and 'sex' all four fields would be in the GROUP BY clause
of SQL query and the time will increase more and more as the number of
such fields increases (this is the vary reason due to which we had to use
sub-queries for aggreagation).

It would be nice if such functionality can be added to the Django ORM
itself. It seems doable to as we were able to hack the ORM to do the same.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/28422#comment:1>

Django

unread,
Jul 21, 2017, 8:02:09 AM7/21/17
to django-...@googlegroups.com
#28422: Allow adding joins to other querysets (or models) to a queryset with extra
join conditions
-------------------------------------+-------------------------------------

Reporter: Debanshu Kundu | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution: duplicate

Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* status: new => closed
* resolution: => duplicate
* component: Uncategorized => Database layer (models, ORM)
* type: Uncategorized => New feature


Comment:

Duplicate of #26426 and/or #27332? If not, perhaps you can make a more
specific API you're proposal.

--
Ticket URL: <https://code.djangoproject.com/ticket/28422#comment:2>

Django

unread,
Jul 21, 2017, 9:34:20 AM7/21/17
to django-...@googlegroups.com
#28422: Allow adding joins to other querysets (or models) to a queryset with extra
join conditions
-------------------------------------+-------------------------------------

Reporter: Debanshu Kundu | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Description changed by Debanshu Kundu:

Old description:

> In one of our project we had a need to join our sub-queries to our main

> 'length': 10.2, 'sex': 'hermaphrodite', 'egg_count': 0, 'kill_count': 0}]
> }}}
>

> Above result can also be obtained by doing following query:
>
> {{{
> print
> Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),
> kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',
> 'egg_count', 'kill_count''egg_count', 'kill_count')
> }}}
>
> But this query will take more time to execute than previous one as
> 'name', 'age', 'length' and 'sex' all four fields would be in the GROUP
> BY clause of SQL query and the time will increase more and more as the
> number of such fields increases (this is the vary reason due to which we
> had to use sub-queries for aggreagation).
>
> It would be nice if such functionality can be added to the Django ORM
> itself. It seems doable to as we were able to hack the ORM to do the
> same.

New description:

queryset = Snake.objects.filter(pk__in=snake_pks)

print(queryset.values('name', 'age', 'length', 'sex', 'egg_count',
'kill_count'))
}}}

Output of above code would be something like:
{{{
[{'name': 'John', 'age': 5, 'length': 20.1, 'sex': 'male', 'egg_count':
10, 'kill_count': 5}, {'name': 'Jane', 'age': 8, 'length': 25.5, 'sex':
'female', 'egg_count': 5, 'kill_count': 1}, {'name': 'Jack', 'age': 2,

'length': 10.2, 'sex': 'hermaphrodite', 'egg_count': 0, 'kill_count': 0}]
}}}


Above result can also be obtained by doing following query:

{{{
print(Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),


kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',

'egg_count', 'kill_count''egg_count', 'kill_count'))
}}}

But this query will take more time to execute than previous one as 'name',
'age', 'length' and 'sex' all four fields would be in the GROUP BY clause
of SQL query and the time will increase more and more as the number of
such fields increases (this is the vary reason due to which we had to use
sub-queries for aggreagation).

It would be nice if such functionality can be added to the Django ORM
itself. It seems doable to as we were able to hack the ORM to do the same.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/28422#comment:3>

Django

unread,
Jul 21, 2017, 9:57:12 AM7/21/17
to django-...@googlegroups.com
#28422: Allow adding joins to other querysets (or models) to a queryset with extra
join conditions
-------------------------------------+-------------------------------------

Reporter: Debanshu Kundu | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Old description:

> print(queryset.values('name', 'age', 'length', 'sex', 'egg_count',
> 'kill_count'))


> }}}
>
> Output of above code would be something like:
> {{{
> [{'name': 'John', 'age': 5, 'length': 20.1, 'sex': 'male', 'egg_count':
> 10, 'kill_count': 5}, {'name': 'Jane', 'age': 8, 'length': 25.5, 'sex':
> 'female', 'egg_count': 5, 'kill_count': 1}, {'name': 'Jack', 'age': 2,

> 'length': 10.2, 'sex': 'hermaphrodite', 'egg_count': 0, 'kill_count': 0}]
> }}}
>

> Above result can also be obtained by doing following query:
>
> {{{

> print(Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),


> kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',

> 'egg_count', 'kill_count''egg_count', 'kill_count'))


> }}}
>
> But this query will take more time to execute than previous one as
> 'name', 'age', 'length' and 'sex' all four fields would be in the GROUP
> BY clause of SQL query and the time will increase more and more as the
> number of such fields increases (this is the vary reason due to which we
> had to use sub-queries for aggreagation).
>
> It would be nice if such functionality can be added to the Django ORM
> itself. It seems doable to as we were able to hack the ORM to do the
> same.

New description:

queryset = Snake.objects.filter(pk__in=snake_pks)

'length': 10.2, 'sex': 'hermaphrodite', 'egg_count': 0, 'kill_count': 0}]
}}}


Above result can also be obtained by doing following query:

{{{
print
Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),
kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',
'egg_count', 'kill_count''egg_count', 'kill_count')
}}}

But this query will take more time to execute than previous one as 'name',
'age', 'length' and 'sex' all four fields would be in the GROUP BY clause
of SQL query and the time will increase more and more as the number of
such fields increases (this is the vary reason due to which we had to use
sub-queries for aggreagation).

It would be nice if such functionality can be added to the Django ORM
itself. It seems doable to as we were able to hack the ORM to do the same.

--

Comment (by Debanshu Kundu):

@Tim, As I understood #27332 and #26426 both are only about specifying
extra conditions in a join, which is a part of this ticket but isn't the
main feature. The main feature here is to be able to join to sub-queries.
The example query (using `join_to_queryset()` function) stated in the
ticket will look something like following query in SQL:

{{{
SELECT name, age, length, sex, SnakeKillAggr.egg_count AS egg_count,
SnakeKillAggr.kill_count AS kill_count
FROM snakes_snake
LEFT OUTER JOIN (
SELECT snake_id, COUNT(id) AS egg_count
FROM snakes_egg
GROUP BY snake_id
) AS SnakeEggAggr ON (snakes_snake.id = SnakeEggAggr.snake_id)
LEFT OUTER JOIN (
SELECT snake_id, COUNT(id) AS kill_count
FROM snakes_kill
GROUP BY snake_id
) AS SnakeKillAggr ON (snakes_snake.id = SnakeKillAggr.snake_id)
}}}

And if we want to add an extra condition in the second join such that a
snake's kill count is only included if it's greater than it's age, we can
do it as follows:

{{{


queryset = join_to_queryset(
table=Snake,
subquery=Kill.objects.values('snake').annotate(kill_count=Count('pk')),
table_field='id',
subquery_field='snake_id',
queryset=queryset,

alias='SnakeKillAggr',
extra_restriction_func=lambda where_class, alias, related_alias: (
where_class([ExtraWhere([f'{alias}.kill_count >
{related_alias}.age'], ())])
)
).extra(select={'kill_count': 'SnakeKillAggr.kill_count'})
}}}

About example code's API is not very clean as it was just a hack to make
things work at the time, but it can be cleaned up and made a proper API.
Following can be the API:

`join_to_queryset()` method for queryset with following arguments:

`base_queryset_field`: Field of base queryset which will be used in
primary join condition.
`subqueryset`: Sub-queryset to join to.
`subqueryset_field`: Field of base sub-queryset which will be used in
primary join condition.
`extra_restrictions`: List of extra restriction for the join (format of an
object in this list needs to be decided).

And `join_to_model()` method for model with following argumets:

`base_queryset_field`: Field of base queryset which will be used in
primary join condition.
`model`: Model to join to.
`model_field`: Field of base model which will be used in primary join
condition.
`extra_restrictions`: List of extra restriction for the join (format of an
object in this list needs to be decided).

So the example in the ticket will look something like (format of
extra_restrictions needs to be decided):
{{{
queryset = queryset.join_to_queryset(
base_queryset_field='id',
subqueryset=Kill.objects.values('snake').annotate(kill_count=Count('pk')),
subqueryset_field='snake_id',
extra_restrictions=['{subqueryset}.kill_count > {base_queryset}.age']
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28422#comment:4>

Django

unread,
Jul 21, 2017, 11:38:01 AM7/21/17
to django-...@googlegroups.com
#28422: Allow adding joins to other querysets (or models) to a queryset with extra
join conditions
-------------------------------------+-------------------------------------

Reporter: Debanshu Kundu | Owner: nobody
Type: New feature | Status: closed
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Chris Connelly):

I was actually in the process of writing a similar ticket when this one
popped up! As with what Debanshu Kundu stated above, my goal was to add a
join to a subquery. My solution was a lot lower fidelity - I used
`QuerySet.extra` to inject a placeholder table (e.g. `{{JOIN}}`) which I
then substituted with the generated subquery before execution.

This does seem like this could be done if
https://github.com/django/django/pull/8238 was merged, which is an
offshoot of #27332.

--
Ticket URL: <https://code.djangoproject.com/ticket/28422#comment:5>

Reply all
Reply to author
Forward
0 new messages