'too many SQL variables' error with ModelMultipleChoiceField


Lukas Zilka

Apr 22, 2012, 9:38:52 PM
to Django users
Hello,

I have a form with ModelMultipleChoiceField on it, and I am getting
'DatabaseError: too many SQL variables' (using SQLite) when the user
picks more than 1000 entries in the selection widget and posts the
form.

The problem seems to be the clean() method of ModelMultipleChoiceField,
which selects the chosen objects from the database with a simple SQL IN
clause (e.g. SELECT * FROM projects WHERE id IN (1, 2, 3)). When the
number of values in the IN argument rises above 1000, the 'too many SQL
variables' error occurs.

I am using the most recent version of Django (1.5.dev17922), though I
think that is irrelevant, because similar issues happened with older
versions too.

I have always worked around this problem with custom temporary models
that I used for joins in more complex queries. But it increasingly
seems to me that either there is already a systematic solution to this
problem that I am missing (which I hope somebody can point out to me),
or there is at least a need for one -- how do you address similar
situations? -- in which case I can start thinking about proposing
something general that could later be incorporated into Django.

Thanks for any suggestions or help.

Russell Keith-Magee

Apr 22, 2012, 10:56:19 PM
to django...@googlegroups.com
Hi Lukas,

I haven't looked into the problem in detail, but it doesn't surprise me that it exists.

It's also probable that it's something that is new to 1.4 (or, at least, that it manifests in slightly different ways in 1.4). One of the features added in 1.4 is bulk insertion of data. This means that you can use a single SQL statement to insert all your m2m relations; however, the downside is that on SQLite, there is a limit of 500 insertions that can be done at any one time (at 2 values per insertion).

This is logged as ticket #17788. There have been some discussions about a fix for this problem that will break the bulk insertion into batches. I suspect that when this problem is fixed, your problem will go away.
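The batching idea under discussion can be sketched with plain sqlite3 (a hypothetical helper, not the actual patch for #17788; the table name and batch size are made up):

```python
import sqlite3

def bulk_insert_batched(conn, rows, batch_size=400):
    """Insert (a_id, b_id) pairs in chunks so that each statement stays
    under SQLite's host-parameter limit (each row binds 2 variables)."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values_sql = ", ".join("(?, ?)" for _ in chunk)
        params = [value for row in chunk for value in row]
        conn.execute(
            "INSERT INTO m2m (a_id, b_id) VALUES %s" % values_sql, params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m2m (a_id INTEGER, b_id INTEGER)")
bulk_insert_batched(conn, [(i, i * 2) for i in range(2500)])
```

With batch_size=400 each statement binds at most 800 variables, which is safe even under the historical 999 limit.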

That said, I would also suggest you have a think about how you represent this widget in your UI. The approach you're describing is going to be very slow for the end user -- every time they load the page, they have to download the HTML for a widget with 1000 choices; when they send their selections, they have to POST up to 1000 choices back to the server. This all takes time, and won't result in a particularly responsive web page.

If you've got a situation where there are 1000 options, but usually only a small number -- say, O(10) -- are selected, then perhaps an AJAX autocomplete widget of some kind would be a better approach. If the user is going to select a lot more options, then you might need to use a more exotic widget that avoids the need to transfer full lists back and forth.

Yours,
Russ Magee %-)

> --
> You received this message because you are subscribed to the Google Groups "Django users" group.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to django-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

akaariai

Apr 23, 2012, 8:58:19 AM
to django...@googlegroups.com
On Monday, April 23, 2012 1:56:19 AM UTC+3, Russell Keith-Magee wrote:
> It's also probable that it's something that is new to 1.4 (or, at least, that it manifests in slightly different ways in 1.4). One of the features added in 1.4 is bulk insertion of data. This means that you can use a single SQL statement to insert all your m2m relation; however, the downside is that on SQLite, there is a limit of 500 insertions that can be done at any one time (2 values per insertion).
>
> This is logged as ticket #17788. There have been some discussions about a fix for this problem that will break the bulk insertion into batches. I suspect that when this problem is fixed, your problem will go away.

There are at least three different instances of the 1000-variables problem. One is bulk_create(), which is likely to get fixed. Then there is delete(), which might get fixed. The last one is qs.filter(id__in=large_list), which will not get fixed.

The reason for fixing bulk_create() is that it is somewhat easy to do: just split the batch of objects into smaller batches and iterate. delete() isn't as easy, but the same split-and-iterate approach could work. If it gets too complicated it will not be fixed; if it is easy it might be. The last one, qs.filter(id__in=large_list), is pretty much impossible to fix. The reason is that you can't split that query into multiple parts -- at least not for any non-trivial query. For example, ORDER BY is enough to break any attempt to split the query into parts.

While Django's ORM should hide limitations of the underlying database, it can't do so perfectly. My viewpoint is that Django should hide a limitation when it is easy to do so. It is all about the cost-benefit ratio.

For your particular problem you have at least these choices:
  1. Use a different database. I guess you have a reason to use SQLite, so this might not be an option.
  2. Compile SQLite yourself with a higher parameter limit. If you need to deploy your software on multiple machines, this might be hard to do.
  3. Work around the issue in your code. Maybe a ModelMultipleChoiceField subclass which does the .clean() in a safe way would work. Or, better yet, do as Russell said: try to remove the need to have 1000 options in one field altogether.
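To illustrate option 3, here is a rough framework-free sketch of what a "safe" clean() could do internally: validate the selected ids in chunks instead of via one giant IN clause (the function name, table, and batch size are invented for the example; a real ModelMultipleChoiceField subclass would apply the same chunking to its queryset filter):

```python
import sqlite3

def filter_existing_ids(conn, ids, batch_size=500):
    """Return the subset of `ids` present in the projects table,
    issuing one query per chunk so that no single statement exceeds
    SQLite's host-parameter limit."""
    found = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        placeholders = ", ".join("?" for _ in chunk)
        rows = conn.execute(
            "SELECT id FROM projects WHERE id IN (%s)" % placeholders,
            chunk)
        found.extend(row[0] for row in rows)
    return found

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO projects VALUES (?)",
                 [(i,) for i in range(2000)])
valid = filter_existing_ids(conn, list(range(1500, 2500)))
```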

 - Anssi

Lukas Zilka

Apr 23, 2012, 10:16:28 AM
to django...@googlegroups.com
Thanks for your reply Anssi.

My previous post probably got lost somewhere (or maybe it is just awaiting moderation?), so I am going to repeat the relevant part here. It relates to your point about qs.filter(id__in=large_list).

Why do you use SQL variables when constructing the query for the IN operator? It does not make sense to me in either case -- bulk inserts (which you are currently solving) or the IN operator -- at least not for integer arguments. I think there is no limit on the size of the IN clause argument in SQLite, and for databases where there is one, it could be solved by breaking it into OR clauses, as is already done. A simple solution for the IN operator problem would be to change the Django code in db/models/sql/where.py so that it puts the parameters into the SQL query directly instead of as bind variables, e.g.:

                return ('%s IN (%s)' % (field_sql,
                                        ', '.join(repeat('%s', len(params))) % tuple(params)), ())

This worked for my particular case. The only limit that can then be reached is SQLite's 1 MB cap on query length. Is this going to break anything?
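If one does inline values, restricting the trick to integers and coercing each value explicitly would at least avoid the injection risk of raw string interpolation (a hypothetical standalone variant of the snippet above, not actual Django code):

```python
def int_in_clause_sql(field_sql, params):
    """Build '<field> IN (1, 2, 3)' with the values inlined as integer
    literals instead of '?' placeholders. int() raises ValueError for
    anything that is not a plain integer, so no untrusted string can
    reach the SQL text."""
    literals = ", ".join(str(int(p)) for p in params)
    return ("%s IN (%s)" % (field_sql, literals), ())
```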

Best, 
Lukas

Javier Guerra Giraldez

Apr 23, 2012, 10:43:51 AM
to django...@googlegroups.com
On Mon, Apr 23, 2012 at 3:58 AM, akaariai <akaa...@gmail.com> wrote:
> The last one, qs.filter(id__in=large_list) is pretty much impossible to fix.

what's wrong with:

qs.filter((Q(id__in=large_list[:n])|Q(id__in=large_list[n:]))) ?

>   1. Use different database. I guess you have a reason to use SQLite so this
> might not be an option.

Oracle has a 1000-element limit for the IN (....) clause. Of course,
it does work with (IN(<1sthalf>) or IN(<2ndhalf>))

--
Javier

akaariai

Apr 23, 2012, 10:54:44 AM
to Django users
On Apr 23, 1:43 pm, Javier Guerra Giraldez <jav...@guerrag.com> wrote:
> On Mon, Apr 23, 2012 at 3:58 AM, akaariai <akaar...@gmail.com> wrote:
> > The last one, qs.filter(id__in=large_list) is pretty much impossible to fix.
>
> what's wrong with:
>
> qs.filter((Q(id__in=large_list[:n])|Q(id__in=large_list[n:])))  ?
>
> >   1. Use different database. I guess you have a reason to use SQLite so this
> > might not be an option.
>
> Oracle has a 1000-element limit for the IN (....)  clause.  Of course,
> it does work with (IN(<1sthalf>) or IN(<2ndhalf>))

SQLite has the limit for the whole query, Oracle for just one IN
clause. For Oracle we already do that split. For SQLite we would need
to split the _whole query_ into two parts (that is, run two
.execute()s) and then "join" the parts in Python. The joining in
Python is the hard part.
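For the simplest cases the "join in Python" step is doable -- e.g. a single-column ORDER BY can be reassembled by merging the sorted partial results -- which also hints at why it breaks down for anything more complex (slicing, aggregation, etc.). A rough sqlite3 sketch, with invented table and column names:

```python
import heapq
import sqlite3

def select_in_chunks_ordered(conn, ids, chunk_size=999):
    """Run 'WHERE id IN (...) ORDER BY name' once per chunk of ids,
    then merge the sorted runs in Python. This works because the chunks
    are disjoint; it would not survive e.g. LIMIT or aggregates."""
    runs = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        runs.append(conn.execute(
            "SELECT id, name FROM projects WHERE id IN (%s) ORDER BY name"
            % placeholders, chunk).fetchall())
    return list(heapq.merge(*runs, key=lambda row: row[1]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO projects VALUES (?, ?)",
                 [(i, "p%05d" % (9999 - i)) for i in range(3000)])
rows = select_in_chunks_ordered(conn, list(range(2500)))
```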

- Anssi

Lukas Zilka

Apr 23, 2012, 7:48:01 AM
to Django users
Hey Russ,

thanks for your reply. I looked at the bulk insertion problem, and
that made me think that the variables themselves are probably the
problem. It does not really make sense to me why anybody bothers to
use bind variables in a long SQL query for the IN clause, let alone
for a bulk insertion. Does that benefit anything?

Anyway, I conducted an experiment and tried a simple SQL query with
lots of items in the IN clause, to prove that it is the variables
themselves, not a limitation of the IN clause in SQLite, that is
causing the problem. I was able to successfully execute a query like
"SELECT * FROM projects WHERE id IN (...)" where in place of `...`
there were 15,000 numbers. So SQLite has no problem with a long
argument list for the IN operator. The only remaining limitation is
the length of the SQL query itself. According to the SQLite
documentation it is 1 megabyte, and that suffices for 100k+ elements.
With a query that big, you are right, it would be very impractical and
slow for end users to interact with the application (sending 1 MB of
data to the webserver will probably be very unresponsive), so I think
this is a fair limitation that should never be exceeded in this use
case.
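The experiment described above can be reproduced directly with sqlite3 (a sketch; the table name follows the example in the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO projects VALUES (?)",
                 [(i,) for i in range(20000)])

# 15,000 integer literals in the SQL text itself -- zero bound variables,
# so only the query-length limit applies, not the variable-count limit.
big_in = ", ".join(str(i) for i in range(15000))
count = conn.execute(
    "SELECT count(*) FROM projects WHERE id IN (%s)" % big_in
).fetchone()[0]
```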

In the Django code, it seems it would suffice to make a change in the
file 'db/models/sql/where.py': specifically, condition on the number
of elements in the IN clause and, if there are more than, say, 100 of
them, put them into the SQL query directly as a string (e.g. '1,2,3')
rather than as bind variables ('?, ?, ?').

Though, for the future, I still believe there should be a better
solution than relying on not reaching this limit. I would propose that
a temporary table be created and filled with the right-hand arguments
of the IN operator, and that the query be rewritten as a JOIN between
the original left-hand argument of IN and this temporary table (or,
for simplicity and perhaps worse performance, a nested query) -- of
course, only if the number of elements on the right side of IN exceeds
some threshold. But that is for another discussion.
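In sqlite3 terms, the temporary-table idea looks roughly like this (a sketch only; note that executemany binds just one variable per row, so the selection list can be arbitrarily long without hitting the parameter limit):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO projects VALUES (?)",
                 [(i,) for i in range(5000)])

selected = list(range(1000, 4000))
conn.execute("CREATE TEMP TABLE selected_ids (id INTEGER PRIMARY KEY)")
# One row (and one bound variable) at a time: never hits the limit.
conn.executemany("INSERT INTO selected_ids VALUES (?)",
                 [(i,) for i in selected])

# The IN clause becomes a JOIN against the temp table.
rows = conn.execute(
    "SELECT p.id FROM projects p"
    " JOIN selected_ids s ON s.id = p.id ORDER BY p.id").fetchall()
```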

My question is therefore: will the change in the Django code that I
propose have any bad consequences, or do you think it might actually
work satisfactorily?

Thanks.

Best,
Lukas


akaariai

Apr 23, 2012, 12:06:12 PM
to Django users
First, about writing the query as "1, 2, 3" rather than as "%s, %s,
%s" with parameters (1, 2, 3): the problem is SQL injection. You could
do that easily for integer parameters, but on the whole it is not a
nice approach.

The create-temp-table-plus-join approach seems hard. But you could do
an "exists" query instead. A query like "select * from tbl where id in
(a list)" could be rewritten as "select * from tbl where exists
(select 1 from temp_table where temp_table.id = tbl.id)". This could
be handled in sql/where.py somewhat easily.
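The rewrite can be tried directly in sqlite3 (a sketch with tiny made-up data, using the table names from the quoted query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(i,) for i in range(10)])
conn.execute("CREATE TEMP TABLE temp_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temp_table VALUES (?)",
                 [(i,) for i in (2, 5, 7)])

# "id IN (a list)" rewritten as a correlated EXISTS against the temp table
rows = conn.execute(
    "SELECT id FROM tbl WHERE EXISTS"
    " (SELECT 1 FROM temp_table WHERE temp_table.id = tbl.id)"
    " ORDER BY id").fetchall()
```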

For me this issue isn't that important; I don't use SQLite except for
testing. If somebody wants to work on this issue, I must warn that it
is possible (if not likely) that some core developer will say "too
ugly" to this solution. I might be willing to accept the solution if
it were clean enough, as it would nicely abstract away this SQLite
limit.

So, in short: this idea is definitely worth more investigation.

- Anssi

Daniel Sokolowski

Apr 23, 2012, 3:54:36 PM
to Django users
+1

I use SQLite3 on lower-traffic production sites due to its ease of
installation and deployment. Even though I have only once hit this
1000-variable limit, I would like to see some Django workaround.




Daniel Sokolowski
Web Engineer
KL Insight
http://klinsight.com/
Tel: 613-344-2116 | Fax: 613.634.7029
993 Princess Street, Suite 212
Kingston, ON K7L 1H3, Canada

akaariai

Apr 23, 2012, 8:18:09 PM
to Django users
On Apr 23, 6:54 pm, "Daniel Sokolowski"
<daniel.sokolow...@klinsight.com> wrote:
> +1
>
> I use SQlite3 on lower traffic production sites due to it's ease of install
> and deployment. Even though I only once I reached this 1000 variable limit I
> would like to see some django work around.

I thought a little more about this, and I find it unlikely that the
qs.filter() issue will ever be solved. The idea presented upthread
might work, but it would be a mess to implement. It adds some serious
complexity to the query generation: temp table creation, insertion
into the temp table, the where clause generation, and finally cleaning
up the temp table. All this to work around a limitation in a single
database -- an issue you can work around by recompiling SQLite, or by
just using a different database.

The implementation would need to be extremely good to have any chance
of getting in. The work needed seems huge compared to what is gained.

So, as said upthread: there is only so much Django can hide about the
underlying database. The 1000-parameter limit for a single query seems
to be one of those issues. If there is some easy trick, Django will
use it. But the temp table workaround seems just too messy. Still, I
would be happy to be proven wrong here.

- Anssi