{{{
Product.objects.annotate(
on_stock=Subquery(
Stock.objects.filter(
product_id=OuterRef('id'),
).values('product_id').annotate(
on_stock=Sum('quantity'),
).values('on_stock').order_by(),
output_field=DecimalField(),
),
outgoing=Subquery(
OrderItem.objects.filter(
product_id=OuterRef('id'),
).values('product_id').annotate(
outgoing=Sum('quantity'),
).values('outgoing').order_by(),
output_field=DecimalField(),
),
available=F('on_stock') - F('outgoing'),
).filter(
available__gt=0,
).values('id', 'on_stock', 'outgoing', 'available')
}}}
When aggregates are needed from multiple tables annotation can be used as
subqueries. Annotations can also be filtered against and reused. However
django implements this query with the following SQL:
{{{
SELECT "product"."id",
(SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE ( u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") AS
"on_stock",
(SELECT SUM(u0."quantity") AS "outgoing"
FROM "order_item" u0
WHERE u0."product_id" = ("product"."id")
GROUP BY u0."product_id") AS
"outgoing",
((SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE (u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") - (SELECT SUM(u0."quantity") AS
"outgoing"
FROM "order_item" u0
WHERE u0."product_id" =
("product"."id")
GROUP BY u0."product_id")) AS
"available"
FROM "product"
WHERE (((SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE (u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") - (SELECT SUM(u0."quantity") AS
"outgoing"
FROM "order_item" u0
WHERE u0."product_id" =
("product"."id")
GROUP BY u0."product_id")) > 0)
}}}
Both the on_stock and outgoing annotations are inlined 3 times and
calculated 3 times by the database, which is 6 subquery instead of 2.
See the following PostgreSQL query adjusted by hand:
{{{
SELECT "product"."id",
"on_stock",
"outgoing",
"on_stock" - "outgoing" as available
FROM "product"
, LATERAL (SELECT SUM(u0."quantity") AS "on_stock"
FROM "stock" u0
WHERE (u0."product_id" = ("product"."id"))
GROUP BY u0."product_id") AS "on_stock"
, LATERAL (SELECT SUM(u0."quantity") AS "outgoing"
FROM "order_item" u0
WHERE u0."product_id" = ("product"."id")
GROUP BY u0."product_id") AS "outgoing"
WHERE (("on_stock" - "outgoing") > 0)
}}}
The exact same subquery can be joined with the lateral keyword, producing
the exact same results, but reusing calculations. The performance benefit
is huge, it can be hundreds of milliseconds even for this simple query,
even more so the more complicated and reused these calculations are.
I like the way annotate works, there are many small pieces to put together
to get a complex result. I'm planning to look into implementing this
behaviour, but wanted to check first about your opinions:
- do you think it is possible to patch the ORM to generate sql like above?
- do you see any pitfalls or caveats to this method of performing the
query?
- if there are no pitfalls, or they can be resolved do you think that this
could be the default way of annotating calculations in django? The
examples are in postgresql, not sure about the other supported database
engines compatibility.
--
Ticket URL: <https://code.djangoproject.com/ticket/33706>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
New description:
Some links:
- https://www.sisense.com/blog/reuse-calculations-in-the-same-query-with-
lateral-joins/
- https://stackoverflow.com/questions/58094851/lateral-join-in-django-
queryset-in-order-to-use-jsonb-to-recordset-postgresql-f
--
--
Ticket URL: <https://code.djangoproject.com/ticket/33706#comment:1>