[Django] #33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?

102 views
Skip to first unread message

Django

unread,
May 30, 2022, 4:28:56 AM5/30/22
to django-...@googlegroups.com
#33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?
-------------------------------------+-------------------------------------
Reporter: Thorben | Owner: nobody
Luepkes |
Type: | Status: new
Uncategorized |
Component: Utilities | Version: 4.0
Severity: Normal | Keywords: orm, django, join
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I think I am running into a similar problem here, where I would like to
append multiple `conditions` onto my join:
I will try to be precise with this as much as possible.
Imagine these two models. whose relation was set up years ago:


{{{

class Event(models.Model):
instance_created_date = models.DateTimeField(auto_now_add=True)
car = models.ForeignKey(Car, on_delete=models.CASCADE,
related_name="car_events")
...
a lot of normal text fields here, but they dont matter for this
problem.
}}}


and


{{{

class Car(models.Model):
a lot of text fields here, but they dont matter for this problem.
hide_from_company_search = models.BooleanField(default=False)
images = models.ManyToManyField(Image, through=CarImage)
}}}

Lets say I want to query the amount of events for a given car:
{{{
def get_car_events_qs() -> QuerySet:
six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
microsecond=0) - timedelta(days=6))
cars = Car.objects.prefetch_related(
'car_events',
).filter(
some_conditions_on_fields=False,
).annotate(
num_car_events=Count(
'car_events',
filter=Q(car_events__instance_created_date__gt=six_days_ago),
distinct=True)
)

return cars
}}}


The really tricky part for this is the performance of the query: `Cars`
has `450.000` entries, and `Events` has `156.850.048`. All fields that I
am using to query are indexed. The query takes around 4 minutes to
complete, depending on the db load. It took 18 minutes before adding the
indicies.

This above ORM query will result in the following sql:


{{{
SELECT
"core_car"."id",
COUNT("analytics_carevent"."id") FILTER (WHERE
("analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date"
>= '2022-05-10T07:45:16.672279+00:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <
'2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
"analytics_carevent"."car_id")
WHERE
... some conditions that dont matter
GROUP BY
"core_car"."id"
}}}

I somehow suspect this `FILTER` to be a problem.
I tried with

{{{
.annotate(num_car_events=Count('car_events'))
}}}

and moving the `car_events__instance_created_date__gt=six_days_ago` into
the `filter`:

{{{
.filter(some_conditions_on_fields=False,
car_events__instance_created_date__gt=six_days_ago)
}}}


But of course this would filter out Cars with no Events, which is not what
we want - but it is super fast!
I fiddled a bit with it in raw sql and came to his nice working example,
that I now would like to write into ORM, since we dont really want to use
rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
faaaaar less than the 18minutes.


{{{
SELECT
"core_car"."id",
COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
FROM
"core_car"
LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
"analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date" >
'2022-05-14T00:00:00+02:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <=
'2022-05-15T00:00:00+02:00'::timestamptz)

WHERE (some conditions that dont matter)
GROUP BY "core_car"."id";
}}}

My question now is:
How can I make the above query into the ORM?
I need to put the "filter" or conditions onto the `left join`. If I just
use `filter()` it will just put it into the `where` clause, which is
wrong.
I tried:


{{{
two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
microsecond=0) - timedelta(days=2))
cars = Car.objects.prefetch_related(
'car_events',
).filter(some_filters,)
}}}

and

{{{
cars =
cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
flat=True), car_events__instance_created_date__gt=six_days_ago)
}}}


But I dont think this is quite correct. I also need the count of the
annotation.

Using Django 4 and latest python release as of this writing. :)

Thanks a lot!

**TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
of `queryset.filter()`**

--
Ticket URL: <https://code.djangoproject.com/ticket/33751>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 30, 2022, 4:31:22 AM5/30/22
to django-...@googlegroups.com
#33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?
-----------------------------------+--------------------------------------
Reporter: Thorben Luepkes | Owner: nobody
Type: Uncategorized | Status: new
Component: Utilities | Version: 4.0
Severity: Normal | Resolution:
Keywords: orm, django, join | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------------+--------------------------------------
Description changed by Thorben Luepkes:

Old description:

> and
>

New description:


{{{


and


{{{

return cars
}}}

{{{
.annotate(num_car_events=Count('car_events'))
}}}

{{{
.filter(some_conditions_on_fields=False,
car_events__instance_created_date__gt=six_days_ago)
}}}

and

I also tried with

{{{
cars = cars.annotate(events=FilteredRelation('car_events',
condition=Q(car_events__car_id__in=ads.values_list("id",
flat=True)))).filter(events__instance_created_date__gt=six_days_ago)
}}}

But this results in an inner join, which i dont think is wanted here

Using Django 4 and latest python release as of this writing. :)

Thanks a lot!

**TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
of `queryset.filter()`**

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33751#comment:1>

Django

unread,
May 30, 2022, 4:41:39 AM5/30/22
to django-...@googlegroups.com
#33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?
-----------------------------------+--------------------------------------
Reporter: Thorben Luepkes | Owner: nobody
Type: Uncategorized | Status: closed
Component: Utilities | Version: 4.0
Severity: Normal | Resolution: duplicate

Keywords: orm, django, join | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------------+--------------------------------------
Changes (by Carlton Gibson):

* status: new => closed
* resolution: => duplicate


Comment:

Cross-posted from https://code.djangoproject.com/ticket/29262#comment:19
(Please don't do that.)

--
Ticket URL: <https://code.djangoproject.com/ticket/33751#comment:2>

Reply all
Reply to author
Forward
0 new messages