Why doesn't ModelChoiceField.queryset support slicing?

640 views
Skip to first unread message

Kamil Śliwak

unread,
Oct 21, 2014, 12:41:29 PM10/21/14
to django-d...@googlegroups.com
Hi

I have just stumbled upon a problem with how ModelChoiceField handles its 'queryset' argument and I'd like to ask whether it's a deliberate design choice or if I should report it as a bug.


Let's say you have a model called Book:

    class Book(models.Model):
        rating = models.IntegerField()


 You want to create a form that lets user select one of tghe top rated books. So you try:

    class BookForm(forms.Form):
        book = forms.ModelChoiceField(queryset = Book.objects.order_by('-rating')[:100])


It appears to work - the form can be rendered and you can choose one of a hundered top-rated books. But when you submit, you get the following error:

    AssertionError: Cannot filter a query once a slice has been taken.

The error is caused by ModelChoiceField.to_python() validating the existence of the selected item by calling get() on the queryset:

    value = self.queryset.get(**{key: value})

And this is not supported for sliced querysets as the error above states. It might have been be a deliberate choice not to support this use case in ModelChoiceField though I don't see any comments about that in the code or any mentions in ModelChoiceField.queryset docs. I would also expect it to be a common need. This and the fact that it can be trivially solved by replacing get() with SQL INTERSECT (implemented as operator & in Django) makes me suspect that there would be some issue with that.

Can anyone shed some light on this? Are there any performance or database-combatibility issues involved? Or is it just a bug?

Marc Tamlyn

unread,
Oct 22, 2014, 3:51:11 AM10/22/14
to django-d...@googlegroups.com

Personally I don't feel that it's a particularly common need, but I can see your use case. More common is that a reasonable filter is applicable - say books with a rating over 80.

As far as I know we don't have any support for SQL intersects and I don't believe there are any plans to introduce it.

I do agree that this should be both documented and preferably raise an error - please open a ticket.

As for a work around, using a ChoiceField directly shouldn't be particularly difficult in this case. ModelChoiceField has some nice features but it is limited.

Marc

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/e264b254-4208-4e39-a50b-6657f8817996%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shai Berger

unread,
Oct 22, 2014, 5:15:49 AM10/22/14
to django-d...@googlegroups.com
On Wednesday 22 October 2014 10:50:51 Marc Tamlyn wrote:
>
> As for a work around, using a ChoiceField directly shouldn't be
> particularly difficult in this case. ModelChoiceField has some nice
> features but it is limited.
>

Using a ChoiceField as a substitute for ModelChoiceField isn't hard, but one
should be aware of the pitfalls. The obvious nicety handled by
ModelChoiceField is the transformation from ids back to objects; the less
obvious one is that a ModelChoiceField runs the query anew for every form;
your ChoiceField cannot be initialized by a query in its definition, and needs
to have its choices set in the form's __init__().

HTH,
Shai.

Thomas

unread,
Oct 22, 2014, 8:27:19 AM10/22/14
to django-d...@googlegroups.com
I believe the problem is related to the fact that a sliced queryset cannot be filtered (it currently raises an AssertionError when trying to do so). If you need to produce a "top n results" queryset which can be filtered — which is implicitely required for the queryset you pass to a ModelChoiceField — you can base your condition on some subquery, for instance :

class BookForm(forms.Form):
        book = forms.ModelChoiceField(queryset=Book.objects.filter(pk__in=Book.objects.order_by('-rating')[:100].values_list('pk')))

I don't know if there is a better way to do that...

Kamil Śliwak

unread,
Nov 6, 2014, 6:16:38 AM11/6/14
to django-d...@googlegroups.com
Sorry, for a late reply.


Personally I don't feel that it's a particularly common need, but I can see your use case. More common is that a reasonable filter is applicable - say books with a rating over 80.

Yeah, I thought about using some other filter but it's a bit harder to control the number of results that way. You can get more or less than you want if you can't predict the contents of the database. This makes sense in some situations but in others, like in mine, you just want top N results, no matter what they are and getting that with a filter is just a cumbersome workaround for something that can be easily expressed in SQL.


As far as I know we don't have any support for SQL intersects and I don't believe there are any plans to introduce it.

OK. My bad. I was under impression that you can get an INTERSECT with something like

Book.objects.filter(id__gt = 100) & Book.objects.filter(id = 300)

But looking at the SQL now I see that this simply combines the conditions with AND and that something like this fails:

Book.objects.filter(id__gt = 100)[:10] & Book.objects.filter(id = 300)

In that case the solution is not as straightforward as I originally thought.


I do agree that this should be both documented and preferably raise an error - please open a ticket.

OK.

The obvious nicety handled by
ModelChoiceField is the transformation from ids back to objects; the less
obvious one is that a ModelChoiceField runs the query anew for every form;
your ChoiceField cannot be initialized by a query in its definition, and needs
to have its choices set in the form's __init__().

Thanks, good to know. I actually do want the instance to make this query, not the class as since the content of my table is not entirely constant.

I can also add another gotcha to your list - if you fetch objects in class definition, the query is performed before the test runner can switch database to TEST_DATABASE so the data comes from the default database. I ran into this when trying to set 'initial' and it was breaking my tests. I have to move it to __init__().


you can base your condition on some subquery, for instance :

class BookForm(forms.Form):
        book = forms.ModelChoiceField(queryset=Book.objects.filter(pk__in=Book.objects.order_by('-rating')[:100].values_list('pk')))

I don't know if there is a better way to do that...

Thanks, that's a really nifty workaround. I thought that something like this would make two queries but I see that Django is smart enough to combine it into a single query with a subquery. Nice. A great thing is that I don't see any downsides to it except that it's a bit less clear than doing the limit in the main query. And I think that it shouldn't really degrade performance as the outer query is trivial.

Maybe having the from detect LIMIT and do something like this internally would be a good way to handle this in the Form class?

Marc Tamlyn

unread,
Nov 6, 2014, 3:34:53 PM11/6/14
to django-d...@googlegroups.com
The dangers of fetching objects in the class definition is precisely why the extra `.all()` calls exist in `ModelChoiceField` and part of the reason why this does not work. The proposed code from Thomas is dangerous, and even putting that code in the `__init__` will result in rather inefficient queries, although it will work.

To be more explicit as to why this is dangerous: django queries executed at module level will be executed on start of the server, and not rerun as the data or database underneath them changes until the server is restarted. Depending on where they are, they may completely break the `migrate` command needed to create the tables!

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Thomas

unread,
Nov 7, 2014, 4:33:22 AM11/7/14
to django-d...@googlegroups.com
It's true that the generated queries might be inefficient (I'm not sure all DBMS can optimize something like "SELECT ... FROM foo WHERE id IN (SELECT id FROM foo ORDER BY bar LIMIT 10)"). I would not use such a construction on a big production dataset and if performance really matters I would do something else.

However, I don't think this is subject to the kind of problem you describe as the inner queryset should be turned into a subquery of the main queryset.

Kamil Śliwak

unread,
Nov 7, 2014, 5:10:54 AM11/7/14
to django-d...@googlegroups.com
However, I don't think this is subject to the kind of problem you describe as the inner queryset should be turned into a subquery of the main queryset.

Yes, I can confirm that I used this solution and it returns a QuerySet. It does not perform query at module load time in my case.


It's true that the generated queries might be inefficient (I'm not sure all DBMS can optimize something like "SELECT ... FROM foo WHERE id IN (SELECT id FROM foo ORDER BY bar LIMIT 10)"). I would not use such a construction on a big production dataset and if performance really matters I would do something else.

My table is not very large (5-10k rows) so the maintenance cost of a solution that would require me to periodically check if the query is not returning too little or to many rows is much bigger concern than performance. But I guess it does matter if it were to be implemented like this internally in Django as I proposed. You're right, the version with a subqery is much slower than the plain one, at least in PostgreSQL. i tried it on one of my tables (unrelated to my example) that contains 1906 rows:

Plain query


EXPLAIN ANALYZE SELECT
   
"baseapp_award"."id",
   
"baseapp_award"."profile_awards_id",
   
"baseapp_award"."name",
   
"baseapp_award"."date_achieved",
   
"baseapp_award"."recognition_level"
FROM
"baseapp_award"
ORDER BY
"baseapp_award"."id" DESC
LIMIT
100;



                                                                       QUERY PLAN                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------
 
Limit  (cost=0.28..4.51 rows=100 width=38) (actual time=0.024..0.118 rows=100 loops=1)
   
->  Index Scan Backward using baseapp_award_pkey on baseapp_award  (cost=0.28..80.94 rows=1906 width=38) (actual time=0.022..0.097 rows=100 loops=1)
 
Total runtime: 0.171 ms



Subquery

EXPLAIN ANALYZE SELECT
   
"baseapp_award"."id",
   
"baseapp_award"."profile_awards_id",
   
"baseapp_award"."name",
   
"baseapp_award"."date_achieved",
   
"baseapp_award"."recognition_level"
FROM
"baseapp_award"
WHERE
   
"baseapp_award"."id" IN (
        SELECT U0
."id"
        FROM
"baseapp_award" U0
        ORDER BY U0
."id" DESC
        LIMIT
100
   
);

                                                                                QUERY PLAN                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Hash Semi Join  (cost=6.76..48.94 rows=100 width=38) (actual time=0.763..0.841 rows=100 loops=1)
   
Hash Cond: (baseapp_award.id = u0.id)
   
->  Seq Scan on baseapp_award  (cost=0.00..36.06 rows=1906 width=38) (actual time=0.007..0.324 rows=1908 loops=1)
   
->  Hash  (cost=5.51..5.51 rows=100 width=4) (actual time=0.112..0.112 rows=100 loops=1)
         
Buckets: 1024  Batches: 1  Memory Usage: 4kB
         
->  Limit  (cost=0.28..4.51 rows=100 width=4) (actual time=0.009..0.083 rows=100 loops=1)
               
->  Index Only Scan Backward using baseapp_award_pkey on baseapp_award u0  (cost=0.28..80.94 rows=1906 width=4) (actual time=0.009..0.063 rows=100 loops=1)
                     
Heap Fetches: 100
 
Total runtime: 0.904 ms

The plain version runs in about 0.2 sec on my development machine while the one with subquery requires more than 4x times as much. Mostly because of the extra sequential scan and a semi-join.
Reply all
Reply to author
Forward
0 new messages