inserted_primary_key in version 1.4.

555 views
Skip to first unread message

Thorsten von Stein

unread,
May 13, 2021, 8:49:30 PM5/13/21
to sqlalchemy

I use SQLAlchemy with MySQL. In one application, I need to obtain the server-generated primary key values of newly inserted rows by obtaining the value of the first inserted row with the property inserted_primary_key. However, after upgrading to version 1.4.15, this property return only (None,).

See the following trivial example:

from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, Text, MetaData, Table

​engine = create_engine('mysql+pymysql://testuser:xyz6789@localhost/testdb?charset=utf8')


metadata = MetaData()
messages = Table(
    'messages', metadata,
    Column('id', Integer, primary_key=True),
    Column('message', Text),
)

# on first run:
# messages.create(bind=engine)


values = [
    {'message': 'Hello World'},
    {'message': 'Hallo Welt'},         
         ]

with engine.begin() as conn:
    result = conn.execute(messages.insert().values(values))
    first_id = result.inserted_primary_key[0]

    
print(f'first_id = {first_id}')


Result in SQLAlchemy-1.3.15:

first_id = 1


But in SQLAlchemy-1.4.15:

first_id = None


Converting to new-style querying does not help. The new property inserted_primary_key_rows returns just nested tuples of None whenever more than one row is inserted. This is metioned in the documentation for backends that do not support returned primary keys -- but MySQL obviously does!

Any help highly appreciated. Right now I cannot think of any other solution than sticking with 1.3.

Mike Bayer

unread,
May 13, 2021, 10:29:42 PM5/13/21
to noreply-spamdigest via sqlalchemy
So the use case you are doing there was never something supported, and in particular the value you are getting on 1.3 is not really universally reliable, as it is relying upon cursor.lastrowid that is not defined for multiple-row inserts, even though for InnoDB specifically the behavior is likely deterministic.

Per MySQLs docs at https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_last-insert-id, "If you insert multiple rows using a single INSERT statement, LAST_INSERT_ID() returns the value generated for the first inserted row only. The reason for this is to make it possible to reproduce easily the same INSERT statement against some other server.".  

now can you assume that if you inserted five rows and last_insert_id() gave you the number 1, the rows will be numbered 1, 2, 3, 4, 5?  won't concurrent inserts possibly interfere with this?   the consensus seems to be "the values probably are predictable, for a single server MySQL database, but it's not documented anywhere", see  https://stackoverflow.com/questions/6895679/mysqls-auto-increment-behavior-in-a-multiple-row-insert .    that MySQL won't document this I would say is pretty telling that this is not a long term / generalized guarantee (like what happens if this is on Galera for example?) even though it might be the current behavior.    

For SQLAlchemy, which gets cursor.lastrowid from other database engines also, it was definitely never assumed that "cursor.lastrowid" is useful for an INSERT statement that has multiple rows, and this has always been an undefined use case.   The spec itself documents this: https://www.python.org/dev/peps/pep-0249/#lastrowid  "The semantics of .lastrowid are undefined in case the last executed statement modified more than one row, e.g. when using INSERT with .executemany()."
    
This is also not what we refer to as a "returned primary keys", which usually refers to the RETURNING SQL syntax.  Only MariaDB as of 10.5 supports that syntax, which we have not yet added support for - other than that MySQL DBs only give us cursor.lastrowid.

1.4's architecture is more explicit about how things are handled so when you send an insert().values() with  a list, that's automatically a multi-row insert and we don't collect cursor.lastrowid in this case.  this caused no regressions because cursor.lastrowid was never considered to be defined in this case and no tests have covered it.

if you send just one row at a time, you get the inserted_primary_key back:

with engine.begin() as conn:
    result = conn.execute(messages.insert().values({'message': 'Hello World'}))
    first_id = result.inserted_primary_key[0]

to bypass the result and go straight to the pymysql cursor if you'd like to rely upon the driver-specfic behavior, you can get to it as result.context.cursor.lastrowid:

with engine.begin() as conn:
    result = conn.execute(messages.insert().values(values))
    first_id = result.context.cursor.lastrowid
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Thorsten von Stein

unread,
May 14, 2021, 12:10:50 PM5/14/21
to sqlalchemy
Mike, thank you very much for this very thorough answer. I now understand that my use case cannot be generally supported by SQLAlchemy, given the limitations of the backends.

Yet, I still wonder how to efficiently perform inserts of large numbers of rows where the server-generated ids are subsequently needed as foreign key values. Replacing one multi-row insert by many single-row inserts must impose an enormous performance penalty, whereas retrieving the ids via a subsequent SELECT would have to rely on the specifics of the inserted data (which may not be unique).

Mike Bayer

unread,
May 15, 2021, 1:28:21 PM5/15/21
to noreply-spamdigest via sqlalchemy
What we are starting to support now, which we currently support for psycopg2 only but will be adding more support soon, is a single INSERT with VALUES and RETURNING, we can then get all the autogenned primary keys back in the result set for a single INSERT statement that has multiple rows inside of it.

For MySQL we'd only be able to do this with MariaDB 10.5 since that's the only backend that supports RETURNING.
Reply all
Reply to author
Forward
0 new messages