Django ORM - Am I trying to do the impossible?

31 views
Skip to first unread message

Paolo

unread,
Oct 20, 2011, 4:24:57 PM10/20/11
to django...@googlegroups.com
Hi all,

I'm developing a Django app that has a slightly complex model, as follows:

Teams have Assignments, which in turn have Assignment Updates.  The status of the Assignment (and therefore the Team) is based on the latest Assignment Update, which has a status of ACTIVE, STANDBY etc.

My issue is I've found it impossible so far to retrieve the current Status for a group of Teams/Assignments without resorting to loading everything into the app and processing, or (as I've actually done to get around the impending speed issues) have fields which act as 'pointers' to the latest Status Report for each Assignment.

The problem comes when I try and pull the assignment update id of the record.  It seems that when I use the "annotate" method, I can only get the value I'm grouping by (assignment_id) and the max value - if I try and pull out the assignment_update_id (e.g. the thing I'm interested in), the ORM just pulls back all records?

Broken code is something like the following:

Assignment.objects.filter(id__in=AssignmentUpdate.objects.filter(id__in=AssignmentUpdate.objects.filter(actual_report_time__isnull=False).values('assignment_id').annotate(actual_report_time=Max('actual_report_time')).values_list('id', flat=True), status=ASSIGNMENT_STANDBY).values_list('assignment_id', flat=True)) 

Any pointers would be appreciated - I'm not too excited about storing the latest_assignment_update_id in my Assignment model!!

Thanks,

Paul

Venkatraman S

unread,
Oct 20, 2011, 11:28:28 PM10/20/11
to django...@googlegroups.com

On Fri, Oct 21, 2011 at 1:54 AM, Paolo <pau...@gmail.com> wrote:
Any pointers would be appreciated - I'm not too excited about storing the latest_assignment_update_id in my Assignment model!!

Why need to store this?

If i understand the requirement right,  the status of the Assignement is based on the last-assignment-update. So, every update has an associated 'status' field? - right?  If yes(which looks wrong to me, as the status should be for the assignment and not for the update), then you need order by desc on the last-update, and pick the top record.

I will wait to hear more from you before i jump the gun :)

Paolo

unread,
Oct 21, 2011, 3:20:17 AM10/21/11
to django...@googlegroups.com
An additional requirement for assignments is to keep status history, so we can see "assignment 1 was active mon-wed, standby thu-fri then active again" etc.

As for the query, yes that was my initial idea but I thought it was going to be inefficient for times when I wanted to query for all assignments in a current status.  So essentially we have:

'Load all assignments into Django. Loop through each one doing a "select top record from assignment_update where assignment_id = this assignment", add it to a list of matched_assignments then pass back to the caller to display"

Where what I'm trying to ask the database for is:

"Group all assignment updates by assignment and select the one with the latest date and status X, giving me the assignment id from this record back so I can load it into a list of filtered assignments without looking at each one in turn".

Does that make sense? Mountain out of molehill you think? Interesting anyway, right? ;)

Cheers,

Paul

Matthias Kestenholz

unread,
Oct 21, 2011, 3:28:19 AM10/21/11
to django...@googlegroups.com

I think you should store _all_ fields you need both on the Assignment
and the AssignmentUpdate. In AssignmentUpdate.save, you do something
like this:

def save(self, *args, **kwargs):
super(AssignmentUpdate, self).save(*args, **kwargs)
self.assignment.field1 = self.field1
self.assignment.field2 = self.field2
self.assignment.field3 = self.field3
self.assignment.save()


If you need the current values, you don't have to know anything about
the AssignmentUpdate table, and it's quite easy to write efficient
code for searching and showing all these values.

If you need the history, sort the AssignmentUpdate instances by their
creation time and compare all fields' values for differences, f.e.
like this: http://dpaste.com/hold/638491/

(Note: You have to create the first AssignmentUpdate when creating the
Assignment itself, otherwise you'll be unable to compare the first
update to the initial values of all Assignment's fields.)


Hope it helps,
Matthias

--
https://github.com/feincms/ Want to build your own CMS?

Venkatraman S

unread,
Oct 21, 2011, 3:59:35 AM10/21/11
to django...@googlegroups.com
Actually, thinking about this thread(and other one which needs something similar), i think the best way is not to confuse the 'history/update' with the actual 'data'; i mean, that history/updates are just for tracking the revisions, the individual attributes of the entity should be present in the parent table; and only this parent table should be referred for any 'top' or 'recent' data, as it in its entirety contains the 'present' state of things.

If at all you have to show the revisions/updates/history of the data, then with the pk from the parent table, query the history table and show the records.

-V

Paolo

unread,
Oct 21, 2011, 4:28:07 AM10/21/11
to django...@googlegroups.com
Thanks guys, I think you're essentially saying the same thing in different ways - 'denormalise' the data so I can make use of the Assignment model for querying without looking at the Update models.

This clears things up for me, cheers!


Jirka Vejrazka

unread,
Oct 21, 2011, 6:50:01 AM10/21/11
to django...@googlegroups.com
Hi there,

I may be missing something obvious here, but have you tried thinking
about it the other way round? The below is just an untested hint:

class Team(models.Model):
def get_status(self):
team_status =
AssignmentUpdate.objects.filter(assignment__team=self).latest().status
return team_status.

Obviously you'd have to handle some edge cases (like non-existing
AssignmentUpdate etc.).

So rather that working your way down from Team, work your way up from
AssignmentUpdate, filter by foreign keys (check documentation for
filters that span relationships) and get your latest() set up properly
(hint: class Meta). You should not really need values_list for this
type of request.

Hope it helps

Jirka

Paolo

unread,
Oct 21, 2011, 6:57:16 AM10/21/11
to django...@googlegroups.com
Hi Jirka,

Thanks for the response, but if I understand your post then the issue is still the need to iterate over every Team (as opposed to every assignment) in order to construct a collection of Teams in a certain Status... if that makes sense.

I think the crux of the issue is a limitation of Django's ORM to retrieve anything except the fields you are grouping by/working with with the "annotate" and "values" methods.  E.g. I'm grouping by the assignment_id in assignment_update and getting the max_date from each of the groups, but Django is unable to give me the assignment_update_id of the max_date record it finds (in this case it just returns all records...).

Cheers,

Paul

Jirka Vejrazka

unread,
Oct 21, 2011, 8:29:59 AM10/21/11
to django...@googlegroups.com
I thought I missed something :)

First of all, nothing stops you from writing raw SQL query if you had
one in mind. It might be more sensible than trying to massage the ORM
too much.

If you insist on ORM, but something along these lines *might* work (I
really don't have the opportunity to test it right now, need to catch
flight :)

AssignmentUpdate.objects.order_by('-actual_report_time').values_list('assignment__team',
'status').distinct()
and then you can simply select the first entry for each team from the list.

HTH

Jirka

creecode

unread,
Oct 21, 2011, 12:34:25 PM10/21/11
to django...@googlegroups.com
I recently ran across a similar sounding technique < http://www.databasejournal.com/features/mssql/article.php/3712571/SqlCredit-150-Part-11-Change-Tracking-Using-History-Records.htm > which I'm attempting to use in a current project.  It's not Django specific but it the techniques discussed may be of use.

Toodle-loooooooooo...........
creecode
Reply all
Reply to author
Forward
0 new messages