Handling IntegrityError gracefully


Milind Vaidya

Jul 23, 2014, 2:15:03 PM
to sqlal...@googlegroups.com
I have a script which updates the DB periodically. There are some base tables which may undergo changes, but very rarely. In such a case, if this script tries to insert an already-present value, it will throw an IntegrityError.
Is there any way of dealing with this other than first fetching the value manually and only inserting if it is not present?

Jonathan Vanasco

Jul 23, 2014, 3:00:08 PM
to sqlal...@googlegroups.com
If this happens rarely and your database supports savepoints, you can just do the insert within one. If you catch an IntegrityError, do a rollback and then select.

`begin_nested` is the SQLAlchemy Session method that handles savepoints.

Michael Bayer

Jul 23, 2014, 3:01:18 PM
to sqlal...@googlegroups.com
use a savepoint:

from sqlalchemy import exc

try:
    # the savepoint means a failed INSERT rolls back only this block,
    # not the enclosing transaction
    with session.begin_nested():
        session.add(MyObject())
        session.flush()
except exc.IntegrityError:
    pass
session.commit()


Personally, I prefer just to emit a SELECT first, which may be a per-row SELECT or may be a prefetch of all the rows from the table in question.
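
For illustration, a minimal sketch of the prefetch variant, assuming a hypothetical MyObject mapped class with a unique "name" column and an incoming_names list (both names are made up):

# prefetch the existing keys once, then insert only what is missing
existing = {name for (name,) in session.query(MyObject.name)}
new_objects = [
    MyObject(name=n) for n in incoming_names if n not in existing
]
session.add_all(new_objects)
session.commit()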




Milind Vaidya

Jul 23, 2014, 4:52:35 PM
to sqlal...@googlegroups.com
Well, the only caveat here is that I am inserting a list of objects in one go using session.add_all().

Michael Bayer

Jul 23, 2014, 7:19:16 PM
to sqlal...@googlegroups.com


If you do an add_all(), then say a flush(), it will emit INSERT statements for all of them. If any INSERT fails, the whole operation is rolled back.

There is a tradeoff here between atomicity and the ability to skip failures (as opposed to preventing them ahead of time). You have to make that choice.
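
If you want to keep the batch but skip individual duplicates, a sketch of the per-row savepoint variant (looping instead of one add_all(); objects_to_insert is a stand-in for your list):

from sqlalchemy import exc

for obj in objects_to_insert:
    try:
        # each row gets its own savepoint, so a duplicate
        # rolls back only this one INSERT
        with session.begin_nested():
            session.add(obj)
            session.flush()
    except exc.IntegrityError:
        pass  # already present; skip it
session.commit()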

Milind Vaidya

Jul 29, 2014, 7:32:29 AM
to sqlal...@googlegroups.com

With reference to your comment above, "Personally, I prefer just to emit a SELECT first, which may be a per-row SELECT or may be a prefetch of all the rows from the table in question":

This approach is faster, as it allows me to create a batch of only the new objects and use add_all() to push them to the DB.

But my peers are unhappy about fetching ALL the records every time, which may also be a performance hit. I had to add logic to extract the names from the objects selected from the DB, take the set difference with the newly supplied names, and then create objects for the NEW names only and push them.

Can __hash__ be used for mapped classes, or is there any way to take advantage of the PK to check for new objects in an efficient manner?

Michael Bayer

Jul 29, 2014, 9:25:39 AM
to sqlal...@googlegroups.com
You can load just primary key attributes, sure, or you can load batches of objects that correspond to the set of identifiers you're working with. Sort the input in PK order so that you only need to fetch a chunk at a time.

Mapped objects can be hashed normally too, though the default hash isn't on the primary key; it's just on object identity in memory. You can override __hash__() if you want.
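
A sketch of both ideas (MyObject, its columns, and the session are illustrative assumptions, not your actual model):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyObject(Base):
    __tablename__ = 'my_objects'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), unique=True)

    # hash and compare on the primary key instead of object identity;
    # note that a pending object has id None until it is flushed
    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        return isinstance(other, MyObject) and self.id == other.id

# load just the identifying column rather than full objects:
existing = {name for (name,) in session.query(MyObject.name)}

# or, for a large table, fetch in primary-key order one chunk at a time:
existing = set()
last_id = 0
while True:
    rows = (session.query(MyObject.id, MyObject.name)
            .filter(MyObject.id > last_id)
            .order_by(MyObject.id)
            .limit(1000).all())
    if not rows:
        break
    existing.update(name for _, name in rows)
    last_id = rows[-1].id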

Jonathan Vanasco

Jul 29, 2014, 11:00:46 AM
to sqlal...@googlegroups.com
If you're doing a huge SELECT or batched INSERTs/UPDATEs, the operation may be more efficient -- but the transactions will be longer, which means you will create issues with other connections. Unless you grab exclusive locks on the tables, you also run a higher risk of integrity errors.

In my experience, if the DB is in active use, you'll be better off with the shortest transactions possible.

I'm a big fan of loading things in parallel instead of serial, but I do that for display -- not to select data for a long write operation.

Milind Vaidya

Aug 5, 2014, 12:11:45 PM
to sqlal...@googlegroups.com
Is there any way to achieve an

INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO

query to solve this problem through SQLAlchemy?

Milind Vaidya

Aug 5, 2014, 7:07:39 PM
to sqlal...@googlegroups.com
Can you please comment on the implementation from the following link:

http://stackoverflow.com/questions/6611563/sqlalchemy-on-duplicate-key-update

Michael Bayer

Aug 6, 2014, 10:47:09 PM
to sqlal...@googlegroups.com
It seems OK, though I am not a huge fan of INSERT ... ON DUPLICATE KEY UPDATE. First, it's a MySQL-only thing. Second, the behavior of insert/update defaults and the like is difficult to predict when you don't know whether a given row will be an insert or an update, not to mention that DBAPIs are all going to act awkwardly with it as well. The issue of MERGE and upsert is thorny enough that SQLAlchemy hasn't gotten into it yet, not even for Core. ORM integration with it isn't even on the radar for now; it would be a very hard problem to solve.
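
If you do need the MySQL statement despite those caveats, one workaround is to emit it literally through text() and skip the ORM for that step (a sketch; the table and column names here are made up):

from sqlalchemy import text

stmt = text(
    "INSERT INTO my_objects (name, value) "
    "VALUES (:name, :value) "
    "ON DUPLICATE KEY UPDATE value = :value"
)
# executes as plain SQL; the ORM identity map is bypassed entirely
session.execute(stmt, {"name": "foo", "value": 42})
session.commit()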


