Query with huge number of parameters is not handled correctly

818 views
Skip to first unread message

Jonathan Marshall

unread,
May 29, 2009, 10:50:11 AM5/29/09
to sqlalchemy
I have an automatically generated query that occasionally generates a
huge number of parameters (50,000+). Sometimes Sqlalchemy does not
correctly serialise the parameters. If I have 50,000 parameters often
one is incorrect, for 100,000 parameters usually 3 are incorrect. More
parameters seems to correlate with a higher chance of a problem
occurring.

Code to reproduce:

from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import text

PARAMETERS = 100000

Base = declarative_base()
class Value(Base):
__tablename__ = 'value'

id = Column(Integer, primary_key=True)
value = Column(Integer, unique=True)

db_engine = create_engine('sqlite:///?', echo=True)
Base.metadata.create_all(db_engine)

Session = sessionmaker(db_engine)
session = Session()

print 'Seeding database with %d values' % PARAMETERS
for i in range(PARAMETERS):
session.add(Value(value=i))
session.commit()

params = {}
in_clause = []
for i in range(PARAMETERS):
params['in%d' % i] = i
in_clause.append(':in%d' % i)

sql = 'select value from value where value in (%s)' % ', '.join
(in_clause)
print 'Querying'
print sql
results = session.execute(text(sql), params=params).fetchall()

expected = set(range(PARAMETERS))
got = set((row[0] for row in results))

missing = expected - got
if missing:
print 'Missing values %s' % missing
r = dict(((v, k) for k, v in params.iteritems()))
for i in missing:
print '%s was parameter %s' % (i, r[i])


When run the output contains:
Missing values set([97490, 28181, 29654])
97490 was parameter in97490
28181 was parameter in28181
29654 was parameter in29654

Looking through the output I see that 97490 was in my query. The
output from for Sqlalchemy for the query parameters however contains
this:
..., 97488, 97489, 93435, 97491, 97492, ...

My test machine is a MacPro running SQLAlchemy 0.5.4p2 on python 2.5.
I have observed this issue with both sqlite and mysql backends

Cheers,
Jon.

Michael Bayer

unread,
May 29, 2009, 11:34:44 AM5/29/09
to sqlal...@googlegroups.com
There is nothing special about SQLAlchemy's handling of many parameters
versus a few, and it's likely a limiation of the IN clause as implemented
on those backends. IN does not support arbitrarily large numbers of
parameters. On Oracle for example the limit is 1000. If you need to do
a large number of IN parameters, you need to batch your queries such that
the size of the IN clause is limited.

Jonathan Marshall

unread,
May 29, 2009, 12:27:23 PM5/29/09
to sqlalchemy
How is the sqlalchemy's outptut of parameters generated? It has the
wrong value in it.

I find it interesting that 99,997 values are correct on a 100,000
parameter query and 50,000 parameters works sometimes but not always.
It implies that this is a bug somewhere, like a race condition, rather
than a hard limit that is being breached.

For what it's worth MySQL has no limit on IN queries (other than max
packet size).

Anyway I shall leave it here. It's taken me a lot longer to create a
test case and investigate this issue than it took me to work around
it!

Ta,
Jon.

Michael Bayer

unread,
May 29, 2009, 12:49:11 PM5/29/09
to sqlal...@googlegroups.com
Jonathan Marshall wrote:
>
> How is the sqlalchemy's outptut of parameters generated? It has the
> wrong value in it.

there is a loop on line 227 of sqlalchemy/engine/default.py.

> I find it interesting that 99,997 values are correct on a 100,000
> parameter query and 50,000 parameters works sometimes but not always.
> It implies that this is a bug somewhere, like a race condition,

there's no threads being spawned within SQLAlchemy, and if your app were
multithreaded, neither the connection or cursor object is shared outside
of the callstack of the statement execution. If you are sharing your
connection among threads yourself (which you shouldn't) and MySQLdb has an
issue with that, that is something different.

> Anyway I shall leave it here. It's taken me a lot longer to create a
> test case and investigate this issue than it took me to work around
> it!

your test case does not produce the results you illustrate on SQLite.
pysqlite does not accept that many bind parameters:


sqlalchemy.exc.OperationalError: (OperationalError) too many SQL variables
u'select value from value where value in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, < goes on and on with 100K
parameters >

here's a pure SQLite test illustrating the same thing:

import sqlite3

PARAMETERS = 100000

conn = sqlite3.connect(":memory:")

cursor = conn.cursor()

cursor.execute("CREATE TABLE value (id INTEGER PRIMARY KEY, value INTEGER)")

cursor.executemany("INSERT INTO value (value) values(?)", [[x] for x in
range(PARAMETERS)])

assert cursor.execute("SELECT count(1) FROM value").fetchone()[0] ==
PARAMETERS

sql = 'select value from value where value in (%s)' % ', '.join("?" for x
in range(PARAMETERS))
cursor.execute(sql, [x for x in range(PARAMETERS)])


Assuming you are having the "random values" failure only on MySQL and
assuming MySQL does in fact have no limit on the size of an IN clause then
the issue is very likely caused by the MySQLdb DBAPI since MySQLdb has
string manipulation logic that handles the rendering of bind parameters -
try creating a pure MySQLdb test case and posting on their bugtracker.

Michael Bayer

unread,
May 29, 2009, 1:39:29 PM5/29/09
to sqlal...@googlegroups.com
Jonathan Marshall wrote:
>
> How is the sqlalchemy's outptut of parameters generated? It has the
> wrong value in it.

hold that thought, I've reproduced the bug.


Michael Bayer

unread,
May 29, 2009, 2:13:28 PM5/29/09
to sqlal...@googlegroups.com

its a hash collision resulting from a hopefully undocumented usage of the
"params" dict you send to execute, whereby you can place a bindparam()
construct as a key in that dict. the bindparam() objects are created by
the text() construct, the hash values compare because after a lot of
values, you get a hash collision against the integer values being sent in,
and __eq__() returns True since all clauseelements produce an object
that's not False.

Since we had 0.5.4p3 scheduled for launch very soon, I'll look for a
workaround for this or remove the behavior altogether if I can get a sense
that nobody is using this feature.


Michael Bayer

unread,
May 29, 2009, 3:02:39 PM5/29/09
to sqlal...@googlegroups.com

the behavior is removed in r5991. 0.5.4p3 is probably out today which
was already planned.

I'd be very surprised if anyone was using this behavior since it was never
documented, it essentially looks like:

bar = bindparam("bar")
s = table.select().where(table.c.foo == bar)
conn.execute(s, {bar:5})

All of our documentation would write this as:

conn.execute(s, {"bar":5})

So its the {bar:5} feature that is removed.

Jonathan Marshall

unread,
Jun 1, 2009, 7:39:25 AM6/1/09
to sqlalchemy
Thanks for all your hard work tracking down this issue.
Jon.
Reply all
Reply to author
Forward
0 new messages