Adding custom join support to Django

559 views
Skip to first unread message

Josh Smeaton

unread,
Mar 28, 2018, 3:54:06 AM3/28/18
to Django developers (Contributions to Django itself)
Someone has again brought up the lack of custom join support in django - specifically LEFT OUTER joins: https://code.djangoproject.com/ticket/29262

I figure it's probably something that we'd like the ORM to support but I don't think we've ever really landed on a design that's palatable. I'd like to try and get a rough consensus for a design and flesh out any likely issues. We can then either provide a ready to go syntax that someone could implement, or I could put together a DEP. At a minimum, we can point future askers towards a design and "PR Welcome" them. Best case someone (possibly myself) runs with the design and provides an implementation.

What I'm interested in seeing are specific use cases for customising joins as that will drive any kind of design.

I've been playing around with some syntaxes, and this is the kind of thing I'm currently leaning toward:

qs = Book.objects.annotate(
my_reviews=joins.Left(
Review.objects.filter(user=u1),
book=OuterRef('pk')
)
)

This supports customising both the join type, and the join conditions (ON book.id = review.book_id). One concern with this design is that now the result set contains multivalues - where a book might be repeated if the number of reviews that match the filter is greater than 1. We could ask the user to specify whether the result is potentially multi-valued and convert the result into a list, but that would require ordering and comparing the current row to the previous row to check if it's a multivalue. These are the kind of issues I want to flesh out, and that's only going to be possible given a sufficient corpus of examples.

So I'll kick off with an example that seems to be rather common:

- Add filtered results from a reverse relation / m2m relation to the result set without eliminating results using a LEFT JOIN.

Sassan Haradji

unread,
Apr 3, 2018, 5:52:39 PM4/3/18
to Django developers (Contributions to Django itself)
I'm the one who reported that issue. This is my usecase:
I have a table T like this:

pk: Primary Key (btree indexed)
value: double (btree indexed)
group1_id: Integer (btree indexed)
group2_id: Integer (btree indexed)
...

I'm querying the table for `MIN(pk), ARRAY_AGG(ARRAY(group_id, value,t1,t2))` group by `group1_id, group2_id` for example but i need to order it by value of column value in rows that meat some condition (group_id1=x, group_id2=y). So I need to left join the main query with a subquery that extracts above mentioned rows (group_id1=x, groupid2=y) and order the main query by the value column of this query.

The table contains around ~170,000,000 rows and counting and this is the only method I found that I can get the result in reasonable time.

Currently I'm doing it in Django like this:
1. Generate main query via Django ORM.
2. Filter and order it with Rest Framework Filters.
3. Extract the sql code via `sql_with_params()`
4. Generate the subquery via Django ORM too.
5. Filter it with Rest Framework Fitlers too.
6. Extract the sql code of the subquery too.
7. Use `re.sub` to inject the join into main query (`re.sub(' WHERE ', ' LEFT OUTER JOIN ({}) AS "order_query" ON ("t"."id"="order_query"."id" etc`
8. Use `re.sub` to inject the order among other orders in main query, I have other orders too and the order of these orders are dynamic based on user request so I have to find the position of this order among the orders RestFramework generated via Django ORM.
9 And at the end I have:
```
            queryset.__class__ = type(
                type(queryset).__name__,
                (type(queryset),),
                {'__len__': lambda self: count},
            )

```

So that rest framework doesn't complain about the lake of length in the returned query.

Even if Django had a way so that I could promise my query generates compatible with its ORM results this process would be easier.
In the next level it would be great if I could have the whole query structure in Django ORM api.

Alexandr Artemyev

unread,
Nov 28, 2019, 9:03:04 AM11/28/19
to Django developers (Contributions to Django itself)
I tried to describe my case with custom joins in django issue tracker.
I hope that I described it quite clearly. And that was helpful.
I would especially like to note that https://stackoverflow.com/questions/22902263/django-orm-joining-subquery/42816689#42816689 this solution completely suited me.

среда, 28 марта 2018 г., 10:54:06 UTC+3 пользователь Josh Smeaton написал:
Reply all
Reply to author
Forward
0 new messages