Difference between insert().values() and connection.execute(insert, values) using psycopg2

311 views
Skip to first unread message

viktor....@gmail.com

unread,
Dec 16, 2015, 8:33:18 AM12/16/15
to sqlalchemy
Hi,

Today we increased the performance of one of our services eightfold by changing the row:

connection.execute(insert_statement, values=[...]) 

into

insert_statement = table.insert().values(values)
connection.execute(insert_statement)

We are using the db driver psycopg2.

The reason for this behaviour is that in the first case, the executemany method of the underlying psycopg2 cursor will be used wich uses a naive looping approach instead of a multirow insert (http://www.postgresql.org/message-id/CA+mi_8Zeeg93nHDXiZo0U16y...@mail.gmail.com). In the second case, the proper sql (which will be multirow insert with the postgres dialect) will be generated by SQLAlchemy and then sent to the execute method of the underlying cursor.
 
As an application developer you would expect that these code snippets would have the same performance characteristics and we were very surprised at first at the results. Also, in the example in the documentation http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements a multirow insert is generated which made us belive that is what we would get (allthough it does state clearly that the underlying db driver's executemany method will be used).

We think this is an issue, since it is a surprising behavoir from a application developers point of view (preferbly you should not need to know the details of the underlaying db driver in the application layer) but we don't have a clear solution to the problem. Maybe add a note in the executing-multiple-statements section of the documenation warning about this behavour with psycodb2. And/or even log a warning about this when using the multiple values api with psycodb2 driver.

/Viktor

   


Mike Bayer

unread,
Dec 16, 2015, 10:42:05 AM12/16/15
to sqlal...@googlegroups.com
The multiple "values" syntax to my knowledge is not standard SQL (e.g.
is not available on any other database) and the bug in psycopg2 is that
they do not use prepared statements for executemany(). Looping over the
same INSERT statement many times is extremely fast when prepared
statements are used.

The multiple "values" syntax will also no longer exhibit the great
performance you see if you need to insert hundreds of thousands of rows.
Postgresql will be tasked with parsing a many-megabyte SQL string on
every invocation which is very wasteful both in terms of memory and
time.

Basically the multiple-values syntax is not at all portable and is very
awkward at every level. For the average SQL developer would be
extremely surprising if SQLAlchemy's use of normal DBAPI executemany
(https://www.python.org/dev/peps/pep-0249/#id18) were arbitrarily
circumvented in this very specific case, leading to a new set of
behavioral quirks and issues. You should definitely contact the
psycopg2 developers and ask them as well why they don't switch to
multiple-VALUES within an executemany() as well as why they aren't on
prepared statements for this operation (if that's still the case).








>
> /Viktor
>
>
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Jonathan Vanasco

unread,
Dec 16, 2015, 11:02:30 AM12/16/15
to sqlalchemy


On Wednesday, December 16, 2015 at 8:33:18 AM UTC-5, viktor....@gmail.com wrote:
As an application developer you would expect that these code snippets would have the same performance characteristics and we were very surprised at first at the results.

No, you shouldn't -  especially when SQL is involved.  Incredibly slight changes to a query can drastically alter how a database performs a task, and each database behaves differently.


We think this is an issue, since it is a surprising behavoir from a application developers point of view (preferbly you should not need to know the details of the underlaying db driver in the application layer)

 SqlAlchemy provides a unified way to use multiple databases with multiple drivers, and drivers are subject to constantly change.  You don't need to know the details of the underlying driver to write functional code, but you're not talking about that -- you're talking about optimizing how sqlalchemy handles two similar (but different) situations with the psycopg2 driver.

This is also basically stated in the docs already:


    It is essential to note that passing multiple values is NOT the same as using traditional executemany() form. The above syntax is a special syntax not typically used. To emit an INSERT statement against mutliple rows, the normal method is to pass a mutiple values list to theConnection.execute() method, which is supported by all database backends and is generally more efficient for a very large number of parameters.

    See also

    Executing Multiple Statements - an introduction to the traditional Core method of multiple parameter set invocation for INSERTs and other statements.


there are typos in there, so I should do a PR on the docs!


Reply all
Reply to author
Forward
0 new messages