engine.execute() slow with parameterized queries?

722 views
Skip to first unread message

Seth P

unread,
May 12, 2014, 3:48:44 PM5/12/14
to sqlal...@googlegroups.com
After tracking down some extreme slowness in loading a one-to-many relationship (e.g. myobject.foobars), I seem to have isolated the issue to engine.execute() being much slower with parameterized queries than with explicit queries. The following is actual code and output for loading 10,971 rows from a database table. (The actual table has more columns than I'm including here, and is not designed by me.) Note that each explicit query (where I explicitly set the WHERE clause parameter and pass the resulting SQL statement to engine.execute()) runs in under 0.1 seconds, whereas each parameterized query (where I let SQLAlchemy bind the WHERE clause parameter) takes over 8 seconds.

The difference in runtimes is smaller when the number of rows returned is smaller, which seems odd since I would have thought that the binding of the WHERE clause parameters is just done once and would be virtually instantaneous.

Any thoughts?

Thanks,

Seth


import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import time

engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat')
session = sessionmaker(bind=engine, autoflush=False, expire_on_commit=False)()

class FooBar(declarative_base()):
    __tablename__ = 'sec_dprc'
    gvkey = sa.Column(sa.String(6), primary_key=True)
    datadate = sa.Column(sa.DateTime, primary_key=True)
    value = sa.Column(sa.Float, name='prcod')

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        rows = func(*arg)
        t2 = time.time()
        print("%30s() len=%d, last=%s, runtime=%0.3fs" % (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1))
        return t2 - t1
    return wrapper

if __name__ == '__main__':

    key = '001045'
    query = session.query(FooBar.datadate, FooBar.value).filter(sa.and_(FooBar.gvkey == key)).order_by(FooBar.datadate)
    sql_select_statement_base = str(query)
    print(sql_select_statement_base)

    @print_timing
    def execute_explicit_query():
        sql_select_statement_explicit = sql_select_statement_base.replace(":gvkey_1", "'%s'" % key)
        rows = engine.execute(sa.text(sql_select_statement_explicit)).fetchall()
        return rows

    @print_timing
    def execute_parameterized_query():
        rows = engine.execute(sa.text(sql_select_statement_base), {'gvkey_1':key}).fetchall()
        return rows

    num_iterations = 5
    explicit_runtime = 0.0
    parameterized_runtime = 0.0
    for i in range(num_iterations):
        explicit_runtime += execute_explicit_query()
        parameterized_runtime += execute_parameterized_query()
    print("Total runtime for %d explicit queries = %0.3fs." % (num_iterations, explicit_runtime))
    print("Total runtime for %d parameterized queries = %0.3fs." % (num_iterations, parameterized_runtime))


SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS sec_dprc_prcod
FROM sec_dprc
WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate
        execute_explicit_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.082s
   execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.852s
        execute_explicit_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.032s
   execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.754s
        execute_explicit_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.039s
   execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.182s
        execute_explicit_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.028s
   execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.416s
        execute_explicit_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.080s
   execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.425s
Total runtime for 5 explicit queries = 0.260s.
Total runtime for 5 parameterized queries = 44.629s.

Seth P

unread,
May 12, 2014, 4:11:15 PM5/12/14
to sqlal...@googlegroups.com
Forgot to mention that I'm running SQLAlchemy 0.9.4  on 64-bit Python 3.4.0 on Windows.

Michael Bayer

unread,
May 12, 2014, 5:00:27 PM5/12/14
to sqlal...@googlegroups.com

well there’s only one parameter being processed here so there is clearly negligible difference in time spent within Python as far as getting the statement ready to execute and then executing it.

So the time is either in what SQL Server spends fetching the rows, or the number of rows being fetched (which seems to be the same).   Which leaves pretty much that SQL Server is making a different choice about the query plan for this SELECT statement, this is typically due to an INDEX being used or not.    You’d need to analyze the plan being used.   With SQL Server, the option to get a plan within programmatic execution seems to be per this answer http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan to execute “SET SHOWPLAN_TEXT ON” ahead of time.

Besides that, you can confirm where the time is being spent exactly using Python profiling.   A description on how to achieve that is here: http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Seth P

unread,
May 12, 2014, 6:33:31 PM5/12/14
to sqlal...@googlegroups.com
Is it possible that the (primary key index (which is a composite index that begins with gvkey, and is the only index on the table) isn't being used because the the gvkey parameter is somehow passed as an integer rather than as a string?
The first EXEC below is pretty much instantaneous, whereas the second takes about 8 seconds (and produces the same results).

EXEC sp_executesql
N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS sec_dprc_prcod
FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate',
N'@gvkey VARCHAR(6)', '001045'

EXEC sp_executesql
N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS sec_dprc_prcod
FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate',
N'@gvkey INT', 001045

Michael Bayer

unread,
May 12, 2014, 6:40:48 PM5/12/14
to sqlal...@googlegroups.com
On May 12, 2014, at 6:33 PM, Seth P <spad...@gmail.com> wrote:

Is it possible that the (primary key index (which is a composite index that begins with gvkey, and is the only index on the table) isn't being used because the the gvkey parameter is somehow passed as an integer rather than as a string?

There’s nothing in SQLAlchemy that coerces strings to integers.  If the actual type of the column on the DB is an integer, then there might be some conversion within pyodbc or the ODBC driver.

if you’ve got it narrowed down this much the next step is to figure out a raw pyodbc script that illustrates what the problem is. 

Seth P

unread,
May 12, 2014, 7:09:08 PM5/12/14
to sqlal...@googlegroups.com
Yep, it's not a SQLAlchemy issue. The following code demonstrates the problem with direct pyodbc access.

import pyodbc
import time


def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        rows = func(*arg)
        t2 = time.time()
        print("%30s() len=%d, last=%s, runtime=%0.3fs" % (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1))
        return t2 - t1
    return wrapper

if __name__ == '__main__':
    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=Compustat;DATABASE=Compustat')
    cursor = cnxn.cursor()
    sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc WHERE gvkey = ? ORDER BY datadate"
    key = '001045'

    @print_timing
    def execute_explicit_query():
        sql_select_statement_explicit = sql_select_statement_base.replace("?", "'%s'" % key)
        rows = cursor.execute(sql_select_statement_explicit).fetchall()

        return rows

    @print_timing
    def execute_parameterized_query():
        rows = cursor.execute(sql_select_statement_base, key).fetchall()

        return rows

    num_iterations = 5
    explicit_runtime = 0.0
    parameterized_runtime = 0.0
    for i in range(num_iterations):
        explicit_runtime += execute_explicit_query()
        parameterized_runtime += execute_parameterized_query()
    print("Total runtime for %d explicit queries = %0.3fs." % (num_iterations, explicit_runtime))
    print("Total runtime for %d parameterized queries = %0.3fs." % (num_iterations, parameterized_runtime))

Seth P

unread,
May 12, 2014, 7:35:35 PM5/12/14
to sqlal...@googlegroups.com
Looks like other people have encountered similar problems with indices being ignored by prepared sql statements: http://www.postgresql.org/message-id/43250AFA...@arbash-meinel.com. (If the diagnosis there is correct, then I'm guessing the server would use a unique index where all the columns of the index are specified.) Also, Thierry Florac's post https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds similar.
(I suspect my earlier hypothesis about int vs varchar is a red herring.)

I think it would be useful (albeit risky, if not careful) to have an option to plug in parameters client-side. I presume not trivial to add to SQLAlchemy? I don't see such an option for pyodbc.

Michael Bayer

unread,
May 12, 2014, 7:50:03 PM5/12/14
to sqlal...@googlegroups.com
On May 12, 2014, at 7:35 PM, Seth P <spad...@gmail.com> wrote:

Looks like other people have encountered similar problems with indices being ignored by prepared sql statements: http://www.postgresql.org/message-id/43250AFA...@arbash-meinel.com. (If the diagnosis there is correct, then I'm guessing the server would use a unique index where all the columns of the index are specified.) Also, Thierry Florac's post https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds similar.
(I suspect my earlier hypothesis about int vs varchar is a red herring.)

I think it would be useful (albeit risky, if not careful) to have an option to plug in parameters client-side. I presume not trivial to add to SQLAlchemy? I don't see such an option for pyodbc.

there’s mechanisms for this but they aren’t very widely advertised since as you know allowing people to do such would be an *enormous* security hole, and I don’t have the resources to be responsible for parameter escaping.   It would be better if you could try pymssql (much more actively maintained than pyodbc from what i can tell) and/or file a bug with pyodbc.

Seth P

unread,
May 12, 2014, 8:06:08 PM5/12/14
to sqlal...@googlegroups.com
Fair enough. I'll take a look at pymssql, though I suspect it may be a SQL Server rather than a driver issue.



On Monday, May 12, 2014 7:50:03 PM UTC-4, Michael Bayer wrote:
On May 12, 2014, at 7:35 PM, Seth P <spad...@gmail.com> wrote:

Looks like other people have encountered similar problems with indices being ignored by prepared sql statements: http://www.postgresql.org/message-id/43250AFA.7010900@arbash-meinel.com. (If the diagnosis there is correct, then I'm guessing the server would use a unique index where all the columns of the index are specified.) Also, Thierry Florac's post https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds similar.

Seth P

unread,
May 12, 2014, 8:23:56 PM5/12/14
to sqlal...@googlegroups.com
pymssql produces the same results as pyodbc. So it looks like a SQL Server issue.

Michael Bayer

unread,
May 12, 2014, 8:38:37 PM5/12/14
to sqlal...@googlegroups.com
well or a FreeTDS issue, more likely, if that’s what you’re using.

the SQL compiler has a parameter called “literal_binds” that will make it render a bound parameter as an inline string, but it only supports a few very basic types.    As far as getting this parameter set for a general class of queries, it depends on when you’d want it to happen and how.   It likely would require some subclassing and possibly monkey patching.

Michael Bayer

unread,
May 13, 2014, 6:15:39 AM5/13/14
to sqlal...@googlegroups.com
OK well of course also, as we have the exact same thing being asked in regards to Oracle right now in another thread, you can of course always bypass a “bound” value in the most direct way, using text() or literal_column():

q = s.query(Something).filter(Something.foo = literal_column(“‘my value’”))

Seth P

unread,
May 13, 2014, 8:44:49 PM5/13/14
to sqlal...@googlegroups.com
Yep, literal_column() fixed my performance problem.

Thanks again for all your help
Reply all
Reply to author
Forward
0 new messages