[Django] #33706: Implement annotations as (lateral) joins

224 views
Skip to first unread message

Django

unread,
May 13, 2022, 8:11:41 AM5/13/22
to django-...@googlegroups.com
#33706: Implement annotations as (lateral) joins
-------------------------------------+-------------------------------------
Reporter: Bálint | Owner: nobody
Balina |
Type: | Status: new
Cleanup/optimization |
Component: Database | Version: 4.0
layer (models, ORM) | Keywords:
Severity: Normal | annotate,performance,subquery,postgresql
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
The ORM annotations are a very good and flexible tool for dynamic data.
However there is a huge performance loss because of the way they are
implemented. See the following example:


{{{
Product.objects.annotate(
on_stock=Subquery(
Stock.objects.filter(
product_id=OuterRef('id'),
).values('product_id').annotate(
on_stock=Sum('quantity'),
).values('on_stock').order_by(),
output_field=DecimalField(),
),
outgoing=Subquery(
OrderItem.objects.filter(
product_id=OuterRef('id'),
).values('product_id').annotate(
outgoing=Sum('quantity'),
).values('outgoing').order_by(),
output_field=DecimalField(),
),
available=F('on_stock') - F('outgoing'),
).filter(
available__gt=0,
).values('id', 'on_stock', 'outgoing', 'available')
}}}

When aggregates are needed from multiple tables annotation can be used as
subqueries. Annotations can also be filtered against and reused. However
django implements this query with the following SQL:


{{{
SELECT "product"."id",
(SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE ( u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") AS
"on_stock",
(SELECT SUM(u0."quantity") AS "outgoing"
FROM "order_item" u0
WHERE u0."product_id" = ("product"."id")
GROUP BY u0."product_id") AS
"outgoing",
((SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE (u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") - (SELECT SUM(u0."quantity") AS
"outgoing"
FROM "order_item" u0
WHERE u0."product_id" =
("product"."id")
GROUP BY u0."product_id")) AS
"available"
FROM "product"
WHERE (((SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE (u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") - (SELECT SUM(u0."quantity") AS
"outgoing"
FROM "order_item" u0
WHERE u0."product_id" =
("product"."id")
GROUP BY u0."product_id")) > 0)
}}}

Both the on_stock and outgoing annotations are inlined 3 times and
calculated 3 times by the database, which is 6 subquery instead of 2.

See the following PostgreSQL query adjusted by hand:

{{{
SELECT "product"."id",
"on_stock",
"outgoing",
"on_stock" - "outgoing" as available
FROM "product"
, LATERAL (SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE (u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") AS "on_stock"
, LATERAL (SELECT SUM(u0."quantity") AS "outgoing"
FROM "order_item" u0
WHERE u0."product_id" = ("product"."id")
GROUP BY u0."product_id") AS "outgoing"
WHERE (("on_stock" - "outgoing") > 0)
}}}

The exact same subquery can be joined with the lateral keyword, producing
the exact same results, but reusing calculations. The performance benefit
is huge, it can be hundreds of milliseconds even for this simple query,
even more so the more complicated and reused these calculations are.

I like the way annotate works, there are many small pieces to put together
to get a complex result. I'm planning to look into implementing this
behaviour, but wanted to check first about your opinions:

- do you think it is possible to patch the ORM to generate sql like above?
- do you see any pitfalls or caveats to this method of performing the
query?
- if there are no pitfalls, or they can be resolved do you think that this
could be the default way of annotating calculations in django? The
examples are in postgresql, not sure about the other supported database
engines compatibility.

--
Ticket URL: <https://code.djangoproject.com/ticket/33706>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 13, 2022, 8:13:08 AM5/13/22
to django-...@googlegroups.com
#33706: Implement annotations as (lateral) joins
-------------------------------------+-------------------------------------
Reporter: Bálint Balina | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
annotate,performance,subquery,postgresql| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Bálint Balina:

Old description:

New description:

Some links:
- https://www.sisense.com/blog/reuse-calculations-in-the-same-query-with-
lateral-joins/
- https://stackoverflow.com/questions/58094851/lateral-join-in-django-
queryset-in-order-to-use-jsonb-to-recordset-postgresql-f

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33706#comment:1>

Reply all
Reply to author
Forward
0 new messages