Hey all! I'm having a problem using the bulk_save_objects function.
Background: I'm working on an application that updates state in a database from an external feed. The process is: I read the feed, convert the data into SQLAlchemy objects that are detached from the session, and then call session.merge on each one. The merge results in either an INSERT or an UPDATE, depending on whether the entity has been seen before.
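Roughly, the current approach looks like this (heavily simplified; the Trip model, column names, and connection URL below are made up for illustration, not my real schema):

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Trip(Base):
    __tablename__ = "trip"

    trip_id = Column(String, primary_key=True)
    status = Column(String)


engine = create_engine("postgresql+psycopg2://...")  # real URL elided


def sync_feed(feed_rows):
    # Build objects from the feed outside the session, then merge each one;
    # merge() decides per object whether to emit an INSERT or an UPDATE.
    with Session(engine) as session:
        for row in feed_rows:
            session.merge(Trip(trip_id=row["trip_id"], status=row["status"]))
        session.commit()
```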
Given the number of objects I'm working with, this has turned out to be very CPU-intensive in production. I've profiled it, and most of the work is in the merge operation. So, I want to use bulk_save_objects to speed things up; in my case the tradeoffs (lack of cascading, etc.) are 100% worth it.
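My replacement is basically the same loop with the per-object merge swapped for a single bulk_save_objects call (reusing the same Trip model and engine as the sketch above):

```python
def sync_feed_bulk(feed_rows):
    # Construct the same objects, but hand them to the session in one batch
    # instead of merging them one at a time.
    with Session(engine) as session:
        trips = [
            Trip(trip_id=row["trip_id"], status=row["status"])
            for row in feed_rows
        ]
        session.bulk_save_objects(trips)
        session.commit()
```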
Problem: I can't get bulk_save_objects to do updates; no matter what I try, it always emits an INSERT, which results in primary key constraint failures:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "trip_pkey"
Having read the documentation, I'm pretty sure the problem relates to this passage:
"For each object, whether the object is sent as an INSERT or an UPDATE is dependent on the same rules used by the Session in traditional operation; if the object has the InstanceState.key attribute set, then the object is assumed to be “detached” and will result in an UPDATE. Otherwise, an INSERT is used."
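And indeed, if I inspect one of my objects, the identity key is never set (values here are made up):

```python
from sqlalchemy import inspect

trip = Trip(trip_id="trip-123", status="SCHEDULED")
print(inspect(trip).key)  # None, so bulk_save_objects falls back to an INSERT
```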
I'm not playing with the instance state anywhere; I'm essentially constructing the objects outside the session and manually stamping primary keys on them, so I'm guessing that's why SQLAlchemy thinks it needs to insert? Any suggestions for how I can proceed?
Thanks!
James