[Q] Bulk insert with session

1,185 views
Skip to first unread message

Ladislav Lenart

unread,
Sep 8, 2012, 10:33:25 AM9/8/12
to sqlal...@googlegroups.com
Hello.

I would like to convert the following

# bulk insert of several rows
new_rows = [{'foo': 1, 'bar': 2}, ...]
cls.__table__.insert().execute(new_rows)

to session.execute(...) style.

I came up with

ins = cls.__table__.insert()
for each in new_rows:
ins = ins.values(each)
session.execute(ins)

but this only executes the last insert.

My assumption is that the bulk insert will be faster than several independent
one-row inserts.

I think I need to execute it via session because my entire transaction is as
follows:
* I create a new object in the session (using ORM layer).
* Flush the session so the new object gets id.
* Try to bulk-insert several rows to some table with ForeignKey to the newly
created object.

When I use the form without session

cls.__table__.insert().execute(new_rows)

I got IntegrityError saying that the new object is not in DB.

When I use

session.commit()
cls.__table__.insert().execute(new_rows)

everything works as expected except I don't like this. One user action must be
one atomic DB transaction.


Thank you in advance,

Ladislav Lenart

Michael Bayer

unread,
Sep 9, 2012, 1:15:03 AM9/9/12
to sqlal...@googlegroups.com

On Sep 8, 2012, at 10:33 AM, Ladislav Lenart wrote:

> Hello.
>
> I would like to convert the following
>
> # bulk insert of several rows
> new_rows = [{'foo': 1, 'bar': 2}, ...]
> cls.__table__.insert().execute(new_rows)
>
> to session.execute(...) style.
>
> I came up with
>
> ins = cls.__table__.insert()
> for each in new_rows:
> ins = ins.values(each)
> session.execute(ins)
>
> but this only executes the last insert.
>
> My assumption is that the bulk insert will be faster than several independent
> one-row inserts.
>
> I think I need to execute it via session because my entire transaction is as
> follows:
> * I create a new object in the session (using ORM layer).
> * Flush the session so the new object gets id.
> * Try to bulk-insert several rows to some table with ForeignKey to the newly
> created object.
>
> When I use the form without session
>
> cls.__table__.insert().execute(new_rows)

My first suggestion would be to remove ".bind" from your MetaData object, which is what allows the usage of insert().execute() here. I've been discouraging this usage for several years now, and I'm probably going to remove some more mentions of it from the documentation. The binding of engines to schema constructs is not of general use and it's seeming convenience is often overshadowed by the confusion it introduces to new users.

The next aspect of this is to understand what insert.values() does. In standard SQL, an INSERT statement has a VALUES clause, which in most cases specifies one and only one set of parameters, corresponding to exactly one row. These days, there are extensions to this syntax, most notably on Postgresql, whereby VALUES can support multiple rows in one statement. The SQLAlchemy insert() construct models its values() method around the more well known "single row" form. So calling values() multiple times will discard the previous set of values which you've assigned to those keys, as this was not the intended usage of the method. It may be better off emitting a warning or error in this use case, but this usage was never anticipated.

So calling "ins = ins.values(params)" multiple times will discard each set of "params" on a per-key basis as new keys come in to replace the previous ones of the same name, and that's not the method we're looking for here.

The DBAPI, which is standard Python database interface SQLAlchemy and all other Python database tools rely upon to emit commands to the database, offers two kinds of execute() calls; "execute()" and "executemany()". The "bulk insert" you're looking for here would be using the DBAPI executemany() method, which passes a set of parameter lists of arbitrary size to the DBAPI where it can pass them to the database in an optimized way.

All versions of execute() in SQLAlchemy, including engine.execute(), connection.execute(), statement.execute() (however discouraged), and Session.execute(), offer the ability to invoke either of these variants based on the type of parameter datastructure passed; if a list of dictionaries, executemany() is used, and if just a dictionary, execute() is used.

Your rows can be passed to Session.execute() in conjunction with an insert() to achieve executemany() as follows:

session.execute(table.insert(), new_rows)

I checked the docs for Session.execute() and noticed that the description for the "params" argument referred to it only as a "dictionary"; this is inaccurate and I've rewritten the docstring for Session.execute() entirely. Please see http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html?highlight=session.execute#sqlalchemy.orm.session.Session.execute for the new documentation. If anything is still unclear let me know as I'd like to get these right.









>
> I got IntegrityError saying that the new object is not in DB.
>
> When I use
>
> session.commit()
> cls.__table__.insert().execute(new_rows)
>
> everything works as expected except I don't like this. One user action must be
> one atomic DB transaction.
>
>
> Thank you in advance,
>
> Ladislav Lenart
>
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Reply all
Reply to author
Forward
0 new messages