Trouble expressing a query in the ORM

32 views
Skip to first unread message

Daniel Gagnon

unread,
Sep 8, 2011, 4:04:52 PM9/8/11
to django-users
Hi.

I have a query I am trying to perform in the ORM and I'm not sure how to do it.

I have two models: Target and Property with a one-to-many relationship. One Target can have many Properties.

I want to list all targets along with their latest properties (as stored under Property.export_date) sorted by fields on Property.

So if I have this simplified model:

Target
  - name
  - t1
  - t2

Property
  - fk_to_target
  - export_date
  - p1
  - p2

I want to have a list of record each having [name, t1, t2, export_date, p1, p2] with every target only being once in the list with its latest property.

How do I make this happen in the ORM?

Tim Shaffer

unread,
Sep 9, 2011, 7:59:37 AM9/9/11
to django...@googlegroups.com
Maybe something like this...

for target in Target.objects.all():
    property = target.property_set.latest("export_date")
    print target.name, target.t1, target.t2, property.export_date, property.p1, property.p2

Pewpewarrows

unread,
Sep 9, 2011, 9:28:26 AM9/9/11
to django...@googlegroups.com
Tim Shaffer's response would have you doing N+1 queries, and having to loop through all of your Target objects in-memory. Technically it would work, but as soon as you have a decently sized amount of data in there it'd slow to a crawl.

The best way I can see to do this straight with the Django ORM is:

prop_dates = Target.objects.annotate(latest_property=Max('property__export_date')).values_list('latest_property', flat=True)
properties = Property.objects.filter(export_date__in=prop_dates)

Which comes out to two queries. If you absolutely had to, you could execute some raw SQL to narrow it down to one query. Please note that there is a (very) small possibility that the second query might return extra Properties for a given Target. Because it's operating based on date-timestamps, there could be two Properties that happen to have the exact same export_date, one of which happens to be the most recent for a given Target. Eliminating those duplicates, if you want to account for that scenario, can be done in-memory or I believe through some adjustments to the second line above.

Christophe Pettus

unread,
Sep 9, 2011, 11:33:01 AM9/9/11
to django...@googlegroups.com

On Sep 9, 2011, at 6:28 AM, Pewpewarrows wrote:

> prop_dates = Target.objects.annotate(latest_property=Max('property__export_date')).values_list('latest_property', flat=True)
> properties = Property.objects.filter(export_date__in=prop_dates)

Note that if prop_dates has more than 10-15 entries, it's going to perform badly (at least on PostgreSQL, and almost certainly on MySQL too). I think this particular situation is definitely a .raw() opportunity.

--
-- Christophe Pettus
x...@thebuild.com

Daniel Gagnon

unread,
Sep 9, 2011, 11:36:23 AM9/9/11
to django...@googlegroups.com
On Fri, Sep 9, 2011 at 9:28 AM, Pewpewarrows <marco....@gmail.com> wrote:
Tim Shaffer's response would have you doing N+1 queries, and having to loop through all of your Target objects in-memory. Technically it would work, but as soon as you have a decently sized amount of data in there it'd slow to a crawl.


Indeed and I have to sort and pick a subset of the sort after that. I can't run a loop through thousands and thousands of items to only select a hundred.
 
The best way I can see to do this straight with the Django ORM is:

prop_dates = Target.objects.annotate(latest_property=Max('property__export_date')).values_list('latest_property', flat=True)
properties = Property.objects.filter(export_date__in=prop_dates)

Let say I have two targets which I'll call target1 and target2.

Target1 have a property for yesterday and one for today. Target2 only have a property for yesterday. Both yesterday and today will be included in prop_dates since they are the latest for at least one target. And then Target1 will have two entries in properties while it should only have one for today.
 

Which comes out to two queries. If you absolutely had to, you could execute some raw SQL to narrow it down to one query. Please note that there is a (very) small possibility that the second query might return extra Properties for a given Target. Because it's operating based on date-timestamps, there could be two Properties that happen to have the exact same export_date, one of which happens to be the most recent for a given Target. Eliminating those duplicates, if you want to account for that scenario, can be done in-memory or I believe through some adjustments to the second line above.

I just designed a SQL query, I am wondering if I couldn't translate the logic to the ORM. It looks like this:

select * from Target_Mgmt_target as t
inner join (select *,max(export_date) as max_export_date from Target_Mgmt_property group by target_id) as p on p.target_id = t.id and p.export_date = p.max_export_date;

Simon Riggs

unread,
Sep 9, 2011, 11:41:56 AM9/9/11
to django-users

Not sure I understand the limitation to 10-15 entries. Please explain?
Or EXPLAIN?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Thorsten Sanders

unread,
Sep 9, 2011, 12:35:05 PM9/9/11
to django...@googlegroups.com
--
from django.db.models import Max
Target_Mgmt_target.objects.annotate(export_date=Max('Target_Mgmt_property__export_date')).values(<vals you want>)

This should work

Pewpewarrows

unread,
Sep 9, 2011, 12:44:20 PM9/9/11
to django...@googlegroups.com
I don't understand the 10-15 limit either. I've done giant "in" queries in the past that perform fine with large data sets and proper indexing.

Pewpewarrows

unread,
Sep 9, 2011, 12:48:43 PM9/9/11
to django...@googlegroups.com



The best way I can see to do this straight with the Django ORM is:

prop_dates = Target.objects.annotate(latest_property=Max('property__export_date')).values_list('latest_property', flat=True)
properties = Property.objects.filter(export_date__in=prop_dates)

Let say I have two targets which I'll call target1 and target2.

Target1 have a property for yesterday and one for today. Target2 only have a property for yesterday. Both yesterday and today will be included in prop_dates since they are the latest for at least one target. And then Target1 will have two entries in properties while it should only have one for today.

I don't follow. The first instruction fetches all the Target objects, along with the latest Property related to each Target, and only selects those Properties' dates. You're saying Target1 has a Property A from yesterday, and B from today. Target 2 has a property C from yesterday. Only B and C's dates would be returned, not A's.

Christophe Pettus

unread,
Sep 9, 2011, 2:20:05 PM9/9/11
to django...@googlegroups.com

On Sep 9, 2011, at 8:41 AM, Simon Riggs wrote:

> Not sure I understand the limitation to 10-15 entries.

That's my rule of thumb as to when I prefer to re-express a query as a join rather than an IN. Larger values can work just fine, but __in in Django tends to be abused with gigantic inclusion sets.

Christophe Pettus

unread,
Sep 9, 2011, 2:21:28 PM9/9/11
to django...@googlegroups.com

On Sep 9, 2011, at 9:44 AM, Pewpewarrows wrote:

> I don't understand the 10-15 limit either. I've done giant "in" queries in the past that perform fine with large data sets and proper indexing.

You can set the rule of thumb whereever you like; eventually, the curves between an IN and a join will cross, however.

Stuart

unread,
Sep 12, 2011, 2:13:10 PM9/12/11
to Django users
On Sep 9, 8:28 am, Pewpewarrows <marco.cho...@gmail.com> wrote:
> Tim Shaffer's response would have you doing N+1 queries, and having to loop
> through all of your Target objects in-memory. Technically it would work, but
> as soon as you have a decently sized amount of data in there it'd slow to a
> crawl.
>

I just wanted to chime in and point out that select_related() is a
helpful tool for avoiding n+1 queries in django.

https://docs.djangoproject.com/en/1.3/ref/models/querysets/#select-related

It may not be particularly suitable for this particular problem (due
to the requirement of bringing back only the most recent property on
each target) but I wanted to mention it for the archives since no one
else had.


--Stuart
Reply all
Reply to author
Forward
0 new messages