Conditional insert in one transaction

2,718 views
Skip to first unread message

Moch Ramis

unread,
Jun 13, 2011, 1:23:14 PM6/13/11
to sqlalchemy
Hello.

I am trying to create a transaction that should insert a value
depending of the result of a select. This should be done at once by
the database in order to avoid some conflicts or duplicates.
An equivalent sql version of the transaction would be :

IF EXISTS (SELECT * FROM Table1 WHERE Column1='SomeValue')
INSERT INTO Table1 VALUES (...)

Is there a way to do that using sqlalchemy ?
Having the "if" done by python is easy but not safe and i can't find
out how to do this using some begin/commit statements.

Thanks to help !

Michael Bayer

unread,
Jun 15, 2011, 5:17:48 PM6/15/11
to sqlal...@googlegroups.com
Assuming you're talking about the ORM. You can of course emit the exact SQL you refer to any time using execute() from an engine, connection, or session. The ORM works within a transaction at all times. so an expression like:

myobject = MyObject(id=5)
myobject = Session.merge(myobject)
Session.commit()

will perform the equivalent steps, subject to the transaction isolation in effect.

To lock the row during the SELECT, you can use query.with_lockmode(True).filter(criterion) to SELECT the row with FOR UPDATE, will return None if the row does not exist.

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Moch Ramis

unread,
Jun 17, 2011, 8:41:19 AM6/17/11
to sqlal...@googlegroups.com
2011/6/15 Michael Bayer <mik...@zzzcomputing.com>
Assuming you're talking about the ORM.  You can of course emit the exact SQL you refer to any time using execute() from an engine, connection, or session.   The ORM works within a transaction at all times.    so an expression like:

myobject = MyObject(id=5)
myobject = Session.merge(myobject)
Session.commit()

will perform the equivalent steps, subject to the transaction isolation in effect.
 
I'm not sure this will do the trick as i don't know if the object i'm looking for already exist or not. thus, i don't have it's id ( i don't even know if a range of id is free or not).
 

To lock the row during the SELECT, you can use query.with_lockmode(True).filter(criterion) to SELECT the row with FOR UPDATE, will return None if the row does not exist.

 I'm not sure a lockMode is necessary. However, I'm far from being an expert or to clearly know the internal mechanisms of DBMS, but reading the Deadlock example in the part 12.3.3 of this page of the postgresql documentation, I thought, that sending a bunch of requests in one transaction would allow to do it as if it was all at once, avoiding any conflicts that could occurs if two almost identical requests were overlapping (like one executing its first statement, the other the second ect..).

Thanks

PS: Please forgive me the amount of time i took to answer. i honestly missed your answer. (at a point i thought my message was lost and wanted to re-post it.)


Michael Bayer

unread,
Jun 17, 2011, 10:17:01 AM6/17/11
to sqlal...@googlegroups.com
On Jun 17, 2011, at 8:41 AM, Moch Ramis wrote:

2011/6/15 Michael Bayer <mik...@zzzcomputing.com>
Assuming you're talking about the ORM.  You can of course emit the exact SQL you refer to any time using execute() from an engine, connection, or session.   The ORM works within a transaction at all times.    so an expression like:

myobject = MyObject(id=5)
myobject = Session.merge(myobject)
Session.commit()

will perform the equivalent steps, subject to the transaction isolation in effect.
 
I'm not sure this will do the trick as i don't know if the object i'm looking for already exist or not. thus, i don't have it's id ( i don't even know if a range of id is free or not).

OK then similar,

myobject = s.query(MyClass).filter(...).first()

if myobject is None:
# ... etc


 

To lock the row during the SELECT, you can use query.with_lockmode(True).filter(criterion) to SELECT the row with FOR UPDATE, will return None if the row does not exist.

 I'm not sure a lockMode is necessary.

right actually this doesn't apply here since we are talking about an INSERT not an UPDATE, sorry.

However, I'm far from being an expert or to clearly know the internal mechanisms of DBMS, but reading the Deadlock example in the part 12.3.3 of this page of the postgresql documentation, I thought, that sending a bunch of requests in one transaction would allow to do it as if it was all at once, avoiding any conflicts that could occurs if two almost identical requests were overlapping (like one executing its first statement, the other the second ect..).

yah Session is in a transaction which completes when you call rollback() or commit().   I don't think there's any additional atomicity to an "IF NOT EXISTS...SELECT" type of phrase versus a SELECT then an INSERT (if there is that would be surprising).


Eric Ongerth

unread,
Jun 18, 2011, 4:32:30 PM6/18/11
to sqlalchemy
Just curious and learning here -- There are two separate issues here,
aren't there? (1.) Atomicity of the transaction, taken care of by the
above discussion, and (2.) what if there was a need to have it be not
only atomic but consume as little time as possible between the read
and write, let's say for financial purposes?

All I really know about (2.) so far is that it's a fine concern in
theory, but in practice there are so many things affecting the timing
that it's impractical to worry about it beyond just coding for atomic
transactions and trying to minimize bottlenecks in the system.



On Jun 17, 7:17 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Jun 17, 2011, at 8:41 AM, Moch Ramis wrote:
>
> > 2011/6/15 Michael Bayer <mike...@zzzcomputing.com>

Michael Bayer

unread,
Jun 19, 2011, 10:14:46 AM6/19/11
to sqlal...@googlegroups.com

On Jun 18, 2011, at 4:32 PM, Eric Ongerth wrote:

> Just curious and learning here -- There are two separate issues here,
> aren't there? (1.) Atomicity of the transaction, taken care of by the
> above discussion, and (2.) what if there was a need to have it be not
> only atomic but consume as little time as possible between the read
> and write, let's say for financial purposes?

Right, so the "executing two statements at once instead of two statements" thing in this kind of case is a fleeting optimization. That is, its a micro optimization that is easily nullified by the surrounding context. Such as, if the way the app works in reality is that the row already exists 95% of the time, the optimization saves negligible time. Or if it's trivial to just pre-insert the rows in question, or a whole selection of 100 rows can be selected at once and just the ones that aren't present can be INSERTed in one multi-row statement, would provide even better performance.

Taking a SELECT then an INSERT and making the choice to turn it into a non-ORM, single statement, database-specific call is something you'd do once the app is up and running, and the separate SELECT/INSERT pair has been observed to be a definite bottleneck with no feasible workaround. I.e. a non-premature optimization.


Vitaly Kruglikov

unread,
Feb 3, 2021, 8:49:04 PM2/3/21
to sqlalchemy
How can the query/merge or query/add be performed atomically? What would happen if between the two calls another process inserted a row with the same unique key? Wouldn't the `s.merge()` then trigger a `psycopg2.errors.UniqueViolation` exception (in postgres case) when the new row insert is flushed? Is there a way to make the execution of the ORM atomic so that the `s.merge()` would be guaranteed to succeed (barring network errors, etc.)?

Mike Bayer

unread,
Feb 3, 2021, 9:43:10 PM2/3/21
to noreply-spamdigest via sqlalchemy
for inserts, not really.   if you want to guard against duplicate, the pattern is use a savepoint (begin_nested()), catch integrity error, then use the new row if someone else inserted it.    this is the more performant option as well as opposed to pessimistic locking.



The *other* way to do it is to use INSERT..ON CONFLICT (PostgreSQL, SQLite) or INSERT..ON DUPLICATE KEY (MySQL, MariaDB) statements.  These are popular choices as well but you need to construct them as statement objects.   




On Wed, Feb 3, 2021, at 8:49 PM, Vitaly Kruglikov wrote:
How can the query/merge or query/add be performed atomically? What would happen if between the two calls another process inserted a row with the same unique key? Wouldn't the `s.merge()` then trigger a `psycopg2.errors.UniqueViolation` exception (in postgres case) when the new row insert is flushed? Is there a way to make the execution of the ORM atomic so that the `s.merge()` would be guaranteed to succeed (barring network errors, etc.)?


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Vitaly Kruglikov

unread,
Feb 4, 2021, 8:42:40 PM2/4/21
to sqlalchemy
Thanks Mike! I followed up on your suggestion concerning savepoints and came up with the following code snippet. Does this look right to you?

```
try:
    with orm_session.begin_nested():
        orm_session.add(record)
except psycopg2.errors.UniqueViolation as error: # or is it `sqlalchemy.exc.IntegrityError`???
    # Matching metadata row already inserted by someone else
    # Fetch and return existing record

# We inserted it!
return record
```

Vitaly Kruglikov

unread,
Feb 4, 2021, 8:49:20 PM2/4/21
to sqlalchemy
Additional background: In the proposed code snippet, I placed the try/except around the `begin_nested()` block instead of just the `orm_session.add()` statement because the `add()` might not get flushed to database until the savepoint is released in the event the session is not auto-flushing.

I also came across a nice graphical illustration of the race condition associated with concurrent INSERT here: http://rachbelaid.com/handling-race-condition-insert-with-sqlalchemy/

Mike Bayer

unread,
Feb 5, 2021, 9:38:24 AM2/5/21
to noreply-spamdigest via sqlalchemy
SQLAlchemy will wrap that psycopg2 exception underneath IntegrityError (or at least DBAPIError at the bottommost) so you need to catch the SQLAlchemy-specific expression.   of course you should also test it with an existing row to make sure it actually catches and works!
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages