Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
For QuerySet qs, list(qs)[k] does not always equal qs[k]
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  14 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
David Gouldin  
View profile  
 More options Apr 24 2009, 10:09 am
From: David Gouldin <dgoul...@gmail.com>
Date: Fri, 24 Apr 2009 07:09:45 -0700 (PDT)
Local: Fri, Apr 24 2009 10:09 am
Subject: For QuerySet qs, list(qs)[k] does not always equal qs[k]
I've run into a bit of an odd problem with the ORM.  I have a model
without an order_by in its Meta.  When I create a queryset and iter
through it, I get the items in a sensical fashion.  However, when I
access the the queryset items by index (qs[0], qs[1], etc), not only
do I get the items back in a different order from iter, I occasionally
get the same item multiple times!

So I dug into the QuerySet class, and I think I understand why this is
happening.  On __getitem__ (i.e. qs[k]), a limit/offset is applied to
a clone of the queryset.  While I think the intent is to retrieve the
kth item from the queryset just as it would have been if reached in
the process of an iter (in other words, list(qs)[k] == qs[k] for all
k), the limit/offset method of retrieval is inherently different due
to a lack of ordering in the database.

Lest you criticize accessing all queryset items by index, Django is
currently doing just that in
django.forms.models.BaseModelFormSet._construct_form.  This means that
when creating an inline for a model without an order_by, Django can
(and has for me) paired forms in the inline formset with the wrong
model instance!

As a solution to this problem, I propose that all querysets include an
order by on their primary key.  This will not only guarantee that
duplicate items are not retrieved from getitem; it will also mean that
the answer from iter will match getitem's answer.

For those turned off by the idea of adding an order_by to all
querysets, perhaps a compromise can be reached by only adding it to
the cloned queryset created by getitem.  That way, all items returned
by iter(qs) would also be returned in qs[k] for all k, even if we
could not guarantee that it would match iter's ordering.

In any case, I think it's necessary to ensure qs[k] consistently
returns the same object for a given qs.  I've opened a ticket for the
issue:

http://code.djangoproject.com/ticket/10915


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alex Gaynor  
View profile  
 More options Apr 24 2009, 10:14 am
From: Alex Gaynor <alex.gay...@gmail.com>
Date: Fri, 24 Apr 2009 10:14:26 -0400
Local: Fri, Apr 24 2009 10:14 am
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

This is only ever an issue if you are using an unsorted queryset, and by
definition if it's unsorted there is no defined ordereding.  Indeed
BaseModelFormset was changed just yesterday to ensure there is always an
ordering on the queryset to avoid just this issue.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David Gouldin  
View profile  
 More options Apr 24 2009, 10:24 am
From: David Gouldin <dgoul...@gmail.com>
Date: Fri, 24 Apr 2009 07:24:33 -0700 (PDT)
Local: Fri, Apr 24 2009 10:24 am
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]
There's a more subtle way to introduce the same bug.  You can order on
a field that is not unique and run into the same issue: an undefined
return order due to duplicate values on the order column.

On Apr 24, 9:14 am, Alex Gaynor <alex.gay...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jacob Kaplan-Moss  
View profile  
 More options Apr 24 2009, 11:05 am
From: Jacob Kaplan-Moss <jacob.kaplanm...@gmail.com>
Date: Fri, 24 Apr 2009 10:05:39 -0500
Local: Fri, Apr 24 2009 11:05 am
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

On Fri, Apr 24, 2009 at 9:09 AM, David Gouldin <dgoul...@gmail.com> wrote:
> Lest you criticize accessing all queryset items by index, Django is
> currently doing just that in
> django.forms.models.BaseModelFormSet._construct_form.  This means that
> when creating an inline for a model without an order_by, Django can
> (and has for me) paired forms in the inline formset with the wrong
> model instance!

I actually fixed that yesterday: http://code.djangoproject.com/changeset/10625

1.1 now includes a ``QuerySet.ordered`` property which FormSet uses to
artificially apply ordering, if needed. You can -- and should -- do
the same.

Jacob


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Apr 24 2009, 3:01 pm
From: Karen Tracey <kmtra...@gmail.com>
Date: Fri, 24 Apr 2009 15:01:59 -0400
Local: Fri, Apr 24 2009 3:01 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

On Fri, Apr 24, 2009 at 10:24 AM, David Gouldin <dgoul...@gmail.com> wrote:

> There's a more subtle way to introduce the same bug.  You can order on
> a field that is not unique and run into the same issue: an undefined
> return order due to duplicate values on the order column.

This is true, and that could lead to your model formset processing during
POST incorrectly matching up POST data with model instances, still, even
with the recent fix in place.  The recent fix only addresses a subset
(hopefully a large one) of the problem.

If we are going to consider fixing remaining problems in this area, I just
want to note that there is an entirely different way to trigger mismatching
POST data with model instances when using model formsets, that will continue
to exist no matter how perfectly we solve this ordering issue.  That is if
the affected QuerySet changes between the time the formset is originally
constructed during GET and when it is eventually POSTed back.

Say, for example, I am going to add my rating for a movie in some
hypothetical Django app that lets people rate movies.  I get the movie's
rating page, which happens to have a model formset containing all existing
ratings for that movie.  Say there are already 5 ratings.  However before I
manage to fill in my data (maybe I'm interrupted by a question, go out to
lunch, whatever) and hit "Save", some number of (lets say 4) other people
have added their ratings.

Now when I come back and hit "Save" my POST data may not match up properly
with the instance data from the current QuerySet. I post data for 6 entries
(the 5 that existed plus mine), the QuerySet computed during POST and passed
in to construct the formset will have 9.  Things may turn out OK (though I
haven't actually tried this) -- the data I post may be matched up properly
with instances, if the ordering is set such that the 'new' ratings that have
no matching POST data come last in the QuerySet.  If not, though, my POSTed
data may get matched to entirely different instances.

I believe the situation is worse if items are deleted in the interim.  Maybe
my app is tracking unresolved issues, and while I was off having lunch
someone updated things such that when I get back there are only 2 unresolved
issues plus the one I was adding, whereas when I got the page originally
there were 5.  In this case (though again I haven't tried it) I think the
model formset creation code will index off the end of the QuerySet and cause
a server error.

Essentially the model formset code seems to assume the QuerySet using during
get and post processing is going to be exactly the same, but I think
applications will have a hard time guaranteeing that.  Unfortunately the
failure modes when that assumption doesn't hold are likely to be pretty
cryptic and hard to track down, at least at the moment.

I don't see how to solve the general problem at the framework level, but
there may be some additional sanity checks we could add to help out here.
For instance we could make the formset validation return False if the number
of items in the passed QuerySet is not what we expect based on the data
dictionary, and/or check (during validation?) that the pk values in the post
data match up to the instance pk values after we've matched them up, etc.
To anyone with a better knowledge of model fomsets than me: does this sort
of additional validation sound reasonable?

Sorry for the tangent -- but this issue with indexing and ordering I think
has only come up as many times as it has because of the model formset's use
of this construct.  If people are running into this index/order problem I
fear they will also start running into these other potential problems with
how model formsets match up data to instances.  Also, if we look at
addressing this other problem it may not be as necessary to worry about
fixing whatever subset of cases isn't covered by the recent ordering fix --
that is, if the resulting error was clear enough, we could leave it up to
the programmer to fix the situation (provide a unique ordering) rather than
having to figure out inside model formset creation whether the ordering that
exists is unique.

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joseph Kocherhans  
View profile  
 More options Apr 24 2009, 3:24 pm
From: Joseph Kocherhans <jkocherh...@gmail.com>
Date: Fri, 24 Apr 2009 14:24:55 -0500
Local: Fri, Apr 24 2009 3:24 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

On Fri, Apr 24, 2009 at 2:01 PM, Karen Tracey <kmtra...@gmail.com> wrote:

> I don't see how to solve the general problem at the framework level, but
> there may be some additional sanity checks we could add to help out here.
> For instance we could make the formset validation return False if the number
> of items in the passed QuerySet is not what we expect based on the data
> dictionary, and/or check (during validation?) that the pk values in the post
> data match up to the instance pk values after we've matched them up, etc.
> To anyone with a better knowledge of model fomsets than me: does this sort
> of additional validation sound reasonable?

I might as well put this out there in case someone crazy like Alex wants to
try it. :) My thought for the hard way to fix the issue for formsets is to
use the primary key as part of the form prefix and/or put the primary keys
into hidden fields then use the data from the form to look up the objects.
It's obviously going to be more complicated than that (was the object
already deleted? is it allowed to be changed?), but it's an approach that
could work I think.

Joseph


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alex Gaynor  
View profile  
 More options Apr 24 2009, 4:41 pm
From: Alex Gaynor <alex.gay...@gmail.com>
Date: Fri, 24 Apr 2009 16:41:04 -0400
Local: Fri, Apr 24 2009 4:41 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

On Fri, Apr 24, 2009 at 3:24 PM, Joseph Kocherhans <jkocherh...@gmail.com>wrote:

We already have most of the pieces in place for this, we need to play with
the architecture a bit though.  Any form in a ModelFormset already stores
the pk for it's object:
http://code.djangoproject.com/browser/django/trunk/django/forms/model....
 However we don't use that when providing the instance variable:
http://code.djangoproject.com/browser/django/trunk/django/forms/model....
 The issue is that that pk field doens't exist on the form until it's
been
constructed, so we need to duplicate a tiny bit of the logic to pull the pk
out of the data before the form has been constructed.  I can take a look at
it this weekend(time permitting) unless someone else wants to do it.  There
shouldn't be too much work that needs doing, but writing tests for this
might be a bit messy.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Apr 24 2009, 5:49 pm
From: Karen Tracey <kmtra...@gmail.com>
Date: Fri, 24 Apr 2009 17:49:56 -0400
Local: Fri, Apr 24 2009 5:49 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

We already have most of the pieces in place for this, we need to play with

> the architecture a bit though.  Any form in a ModelFormset already stores
> the pk for it's object:
> http://code.djangoproject.com/browser/django/trunk/django/forms/model....  However we don't use that when providing the instance variable:
> http://code.djangoproject.com/browser/django/trunk/django/forms/model....  The issue is that that pk field doens't exist on the form until it's been
> constructed, so we need to duplicate a tiny bit of the logic to pull the pk
> out of the data before the form has been constructed.  I can take a look at
> it this weekend(time permitting) unless someone else wants to do it.  There
> shouldn't be too much work that needs doing, but writing tests for this
> might be a bit messy.

Note if we go with this approach (if I'm understanding it correctly) we are
slightly changing the way in which we expect model formsets to be used.  We
currently document that the same QuerySet should be passed in during POST as
GET:

http://docs.djangoproject.com/en/dev/topics/forms/modelforms/#using-a...

Switching to an approach that uses the pk ids in the POST data for lookup
means we'd:

- silently ignore a queryset argument when we're given a data dictionary?
- issue some sort of deprecation warning about the now-unnecessary and
unused parameter?

Also, it would be nice if we could accomplish the lookup in one DB call
rather than looking up each individual item in a separate query.  Having a
model formset with hundreds of entries suddenly require hundreds of DB hits
during POST processing instead of one isn't too nice.  Not sure if you were
planning that?

These two concerns pushed me more towards thinking that leaving the DB
lookup as-is and adding some validation sanity checks for when the passed
queryset doesn't match up with the POST data might be a "good enough"
approach.

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alex Gaynor  
View profile  
 More options Apr 24 2009, 5:52 pm
From: Alex Gaynor <alex.gay...@gmail.com>
Date: Fri, 24 Apr 2009 17:52:38 -0400
Local: Fri, Apr 24 2009 5:52 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

My approach would be to use only items in the iniital queryset argument.

> - issue some sort of deprecation warning about the now-unnecessary and
> unused parameter?

> Also, it would be nice if we could accomplish the lookup in one DB call
> rather than looking up each individual item in a separate query.  Having a
> model formset with hundreds of entries suddenly require hundreds of DB hits
> during POST processing instead of one isn't too nice.  Not sure if you were
> planning that?

My appraoch was basically going to be

pk = something
kwargs['instance'] = [o for o in self.queryset if o.pk == pk][0]

Which would only be 1 query, and basically has the same behavior as we do
now, but correctly :).

> These two concerns pushed me more towards thinking that leaving the DB
> lookup as-is and adding some validation sanity checks for when the passed
> queryset doesn't match up with the POST data might be a "good enough"
> approach.

> Karen

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joseph Kocherhans  
View profile  
 More options Apr 24 2009, 5:59 pm
From: Joseph Kocherhans <jkocherh...@gmail.com>
Date: Fri, 24 Apr 2009 16:59:35 -0500
Local: Fri, Apr 24 2009 5:59 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

My thought on this was to continue to use the queryset argument and do
something like qs.filter(pk__in=(list_of_pks)) for access restriction in
case someone decided it would be fun to play with the POST data and try to
change objects they shouldn't be allowed to.

Joseph


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Apr 24 2009, 6:02 pm
From: Karen Tracey <kmtra...@gmail.com>
Date: Fri, 24 Apr 2009 18:02:11 -0400
Local: Fri, Apr 24 2009 6:02 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

Ah, OK I see now (I think). So you would still use the queryset arg, just
make sure the results got matched up by pk correctly.  (And still index off
the end if an item had been deleted in between GET and POST....)  The n
repetitions of a linear search through the n-item queryset to find the right
item doesn't thrill me, but I suppose it is better than n DB hits.

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alex Gaynor  
View profile  
 More options Apr 24 2009, 6:05 pm
From: Alex Gaynor <alex.gay...@gmail.com>
Date: Fri, 24 Apr 2009 18:05:46 -0400
Local: Fri, Apr 24 2009 6:05 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

Yeah, that was a pretty lazy O(n^2) algorithm, the smart way to do it would
be:

if not hasattr(self, '_queryset_dict'):
    self._queryset_dict = dict([(o.pk, o) for o in self.queryset])
pk = something
kwargs['instance'] = self._queryset_dict[pk]

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Apr 24 2009, 6:06 pm
From: Karen Tracey <kmtra...@gmail.com>
Date: Fri, 24 Apr 2009 18:06:41 -0400
Local: Fri, Apr 24 2009 6:06 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

On Fri, Apr 24, 2009 at 5:59 PM, Joseph Kocherhans <jkocherh...@gmail.com>wrote:

It'd be nice if that construct were guaranteed to return the objects is the
order in list_of_pks...but I don't think that's guaranteed?

At any rate you're both continuing to use the queryset arg and doing the
lookup in one query, so my concerns as to this approach were unfounded.

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Karen Tracey  
View profile  
 More options Apr 24 2009, 6:27 pm
From: Karen Tracey <kmtra...@gmail.com>
Date: Fri, 24 Apr 2009 18:27:16 -0400
Local: Fri, Apr 24 2009 6:27 pm
Subject: Re: For QuerySet qs, list(qs)[k] does not always equal qs[k]

On Fri, Apr 24, 2009 at 6:05 PM, Alex Gaynor <alex.gay...@gmail.com> wrote:
> Yeah, that was a pretty lazy O(n^2) algorithm, the smart way to do it would
> be:

> if not hasattr(self, '_queryset_dict'):
>     self._queryset_dict = dict([(o.pk, o) for o in self.queryset])
> pk = something
> kwargs['instance'] = self._queryset_dict[pk]

OK, I'm convinced you can make it more efficient for large n.  (Still don't
want to deal with items that have been deleted though, huh?)  Anyway I
opened a ticker for this -- #10922.  Not sure it needs to be solved for
1.1...we've got a bunch of other open tickets still to fix and I'm not sure
how common running into this problem is....

Karen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »