How can I use bulk_save_objects when I know the primary key?

1,276 views
Skip to first unread message

James Fennell

unread,
Mar 12, 2020, 9:51:39 PM3/12/20
to sqlal...@googlegroups.com
Hey all! I'm having a problem using the bulk_save_objects function.

Background: I'm working on an application that updates state in a database from an external feed. The process is: I read the feed, convert the data into SQLAlchemy objects which are detached from the session and then call session.merge on each one. The merge can result in an insert or an update, depending on if the entity has been seen before.

Given the number of objects I'm working with, this has turned out to be very CPU intensive in production. I've profiled it, and most of the work is in the merge operation. So, I want to use bulk_save_objects to speed things up and in my case the tradeoffs (lack of cascading etc.) are 100% worth it.

Problem: I can't get bulk_save_objects to do updates; no matter what, it tries to insert and this results in primary key constraint failures:

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "trip_pkey"

Having read the documentation, I'm pretty sure the problem relates to this comment in the docs:

For each object, whether the object is sent as an INSERT or an UPDATE is dependent on the same rules used by the Session in traditional operation; if the object has the InstanceState.key attribute set, then the object is assumed to be “detached” and will result in an UPDATE. Otherwise, an INSERT is used.

In all cases I'm not playing with the instance state. I'm essentially manually stamping primary keys on detached objects, so I'm guessing SQLAlchemy thinks it needs to insert? Any suggestions for how I can proceed?

Thanks!
James

Mike Bayer

unread,
Mar 12, 2020, 10:54:18 PM3/12/20
to noreply-spamdigest via sqlalchemy
you could try using bulk_update_mappings  (https://docs.sqlalchemy.org/en/13/orm/session_api.html?highlight=bulk_update_mappings#sqlalchemy.orm.session.Session.bulk_update_mappings) so that you don't have to play with object state at all.  Otherwise, use make_transient_to_detached (https://docs.sqlalchemy.org/en/13/orm/session_api.html?highlight=bulk_update_mappings#sqlalchemy.orm.session.make_transient_to_detached) to give them a key.  you can see that the key is there by using inspect(object).key.   However, if you're looking for speed, using the objects and making keys and such is all overhead that you wouldn't get with bulk_update_mappings.




Thanks!
James


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages