Speeding up inserts of rows with the fast_executemany option of pyODBC


Jevgenij Kusakovskij

Jan 2, 2018, 6:46:04 AM
to sqlalchemy
I would like to send a large pandas.DataFrame to a remote server running MS SQL. I am using pandas-0.20.3, pyODBC-4.0.21 and sqlalchemy-1.1.13.

My first attempt at tackling this problem can be reduced to the following code:

 import sqlalchemy as sa
 
 engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
 data_frame.to_sql(table_name, engine, index=False)

Simple, but very slow... Took about 2.5 minutes to insert 1000 rows.

Using the following code, which does not involve SQLAlchemy, the same task is performed in less than a second:

 import pyodbc as pdb

 list_of_tuples = convert_df(data_frame)

 connection = pdb.connect(cnxn_str)

 cursor = connection.cursor()
 cursor.fast_executemany = True
 cursor.executemany(sql_statement, list_of_tuples)
 connection.commit()

 cursor.close()
 connection.close()

Is there a way to flip the fast_executemany switch on when using SQLAlchemy?


Mike Bayer

Jan 2, 2018, 9:27:53 AM
to sqlal...@googlegroups.com
On Tue, Jan 2, 2018 at 6:46 AM, Jevgenij Kusakovskij <kus...@gmail.com> wrote:
> I would like to send a large pandas.DataFrame to a remote server running MS
> SQL. I am using pandas-0.20.3, pyODBC-4.0.21 and sqlalchemy-1.1.13.
>
> My first attempt of tackling this problem can be reduced to following code:
>
> import sqlalchemy as sa
>
>
>
> engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
> data_frame.to_sql(table_name, engine, index=False)
>
>
> Simple, but very slow... Took about 2.5 minutes to insert 1000 rows.

that's really weird, 1000 rows is very few. I'm pretty sure if I ran
1000 rows over pyodbc into SQL server here it would take about 300 ms
tops. 2.5 minutes is more like you're trying to send 800K rows.
that alone is kind of concerning, and if pandas is not sending all the
rows to connection.execute() at once and is instead running one row at
a time, then the fast_executemany flag will have no effect for you.

>
> Using the following code, that does not involve SQLAlchemy, the same task is
> performed in less than a second:
>
> import pyodbc as pdb
>
> list_of_tuples = convert_df(data_frame)
>
> connection = pdb.connect(cnxn_str)
>
> cursor = connection.cursor()
> cursor.fast_executemany = True
> cursor.executemany(sql_statement, list_of_tuples)
> connection.commit()
>
> cursor.close()
> connection.close()
>
>
> Is there a way to flip the fast_executemany switch on when using SQLAlchemy?

easiest would be to use cursor execution events:

http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=before_cursor_execute#sqlalchemy.events.ConnectionEvents.before_cursor_execute

you get the cursor right there, set the flag. You can set a custom
execution_option:

conn = conn.execution_options(pyodbc_fast_execute=True)

then in your event you can look for it:

@event.listens_for(SomeEngine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, parameters,
                                  context, executemany):
    if context.execution_options.get('pyodbc_fast_execute', False):
        cursor.fast_executemany = True
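
putting the two pieces together, a rough sketch of the whole thing would be
(SomeEngine above is whatever Engine you've created; my_table and
list_of_row_dicts below are just placeholders):

from sqlalchemy import create_engine, event

engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, parameters,
                                  context, executemany):
    # set the pyodbc flag only when the option was requested
    if context.execution_options.get('pyodbc_fast_execute', False):
        cursor.fast_executemany = True

with engine.connect() as conn:
    conn = conn.execution_options(pyodbc_fast_execute=True)
    conn.execute(my_table.insert(), list_of_row_dicts)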


but....2.5 minutes for 1000 rows is much more wrong than that, you
should figure out what's happening there.

Jevgenij Kusakovskij

Jan 2, 2018, 9:54:57 AM
to sqlalchemy
@event.listens_for(SomeEngine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, parameters,
                                  context, executemany):
    if context.execution_options.get('pyodbc_fast_execute', False):
        cursor.fast_executemany = True

Maybe I am missing something, but should it be:

    if context.execution_options.get('pyodbc_fast_execute', True):
        cursor.fast_executemany = True

Mike Bayer

Jan 2, 2018, 10:15:52 AM
to sqlal...@googlegroups.com
On Tue, Jan 2, 2018 at 9:54 AM, Jevgenij Kusakovskij <kus...@gmail.com> wrote:
>> @event.listens_for(SomeEngine, 'before_cursor_execute')
>> def receive_before_cursor_execute(conn, cursor, statement, parameters,
>> context, executemany):
>> if context.execution_options.get('pyodbc_fast_execute', False):
>> cursor.fast_executemany = True
>
>
> Maybe I am missing something, but should it be:
>
> if context.execution_options.get('pyodbc_fast_execute', True):
> cursor.fast_executemany = True

dict.get('some_key', True) means that if the key is not found, you get
True back, i.e. True is the default.

if you want the default to be False, so that the flag is only set when
the key is actually present, it's

dict.get('some_key', False)

if the key is present, you get its value back (assuming it's True, you
get True); if the key is not present, you get the default back, i.e. False.
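
a quick illustration of the difference (opts here is just a throwaway dict):

opts = {'pyodbc_fast_execute': True}

opts.get('pyodbc_fast_execute', False)   # True: key present, its value wins
{}.get('pyodbc_fast_execute', False)     # False: key missing, default is used
{}.get('pyodbc_fast_execute', True)      # True: key missing, default is used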

Jevgenij Kusakovskij

Jan 2, 2018, 11:24:25 AM
to sqlalchemy
I see... I should have warned that I am new to Python and that questions of this caliber could be expected.

If I may ask one more thing, I would like to check with you whether it is possible to achieve
the same effect without any custom options, by simply using the executemany flag in the if clause. It would be:

@event.listens_for(SomeEngine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, parameters,
                                  context, executemany):
    if executemany:
        cursor.fast_executemany = True

If I use this code, then there is no need for any custom options, and the code runs almost as fast as
the raw connection, i.e. in 2 seconds instead of 2.5 minutes.
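
For reference, putting it all together with the engine from my first message,
the whole setup now looks roughly like this:

import sqlalchemy as sa
from sqlalchemy import event

engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, parameters,
                                  context, executemany):
    # enable pyodbc's fast path for every executemany on this engine
    if executemany:
        cursor.fast_executemany = True

data_frame.to_sql(table_name, engine, index=False)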

Thank you kindly for the clarifications, for the quick responses and for your patience!

Mike Bayer

Jan 2, 2018, 11:48:35 AM
to sqlal...@googlegroups.com
On Tue, Jan 2, 2018 at 11:24 AM, Jevgenij Kusakovskij <kus...@gmail.com> wrote:
> I see... I should have warned that I am new to Python and that questions of
> this caliber could be expected.
>
> If I may ask one more thing, I would like to check with you if it is
> possible to achieve the same effect
> without any custom options by simply the executemany flag in the if clause.
> It would be:
>
> @event.listens_for(SomeEngine, 'before_cursor_execute')
> def receive_before_cursor_execute(conn, cursor, statement, parameters,
> context, executemany):
> if executemany:
> cursor.fast_executemany = True
>
>
> If I use this code, then there is not need for any custom options and the
> code runs almost as fast as the
> raw connection, i.e. in 2 seconds instead of 2.5 minutes.

oh yes, absolutely, do it that way. good call.

we have a built-in option for a similar feature in the psycopg2 driver
and support for this option can be added as a feature to the pyodbc
dialect. Feel free to add a ticket.
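
just as a sketch, a built-in option on the pyodbc dialect could eventually look
something like this (the flag does not exist yet; the name here is hypothetical):

engine = sa.create_engine(
    "mssql+pyodbc:///?odbc_connect=%s" % cnxn_str,
    fast_executemany=True,  # hypothetical dialect-level flag
)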

Jevgenij Kusakovskij

Jan 2, 2018, 11:54:51 AM
to sqlalchemy
Great! Thanks a lot for all your help!