Refresh session: rollback() or commit()?

685 views
Skip to first unread message

Pedro Werneck

unread,
Jan 25, 2013, 4:33:13 PM1/25/13
to sqlal...@googlegroups.com
I'm having a problem with many concurrent scripts, workers and uwsgi instances writing and reading the same tables and rows almost simultaneously, and sometimes one of them seems to get an older state, even from an object it never touched in the first place and I'm querying for the first time. I find that weird, but I assume it has to do with the database isolation level.

The problem is, how to adequately deal with that and make sure it never happens? I added a session.commit() before doing anything and it works, I assume rollback would work too. Is there any better solution? 

Michael Bayer

unread,
Jan 25, 2013, 4:42:48 PM1/25/13
to sqlal...@googlegroups.com

On Jan 25, 2013, at 4:33 PM, Pedro Werneck wrote:

> I'm having a problem with many concurrent scripts, workers and uwsgi instances writing and reading the same tables and rows almost simultaneously, and sometimes one of them seems to get an older state, even from an object it never touched in the first place and I'm querying for the first time. I find that weird, but I assume it has to do with the database isolation level.

sure, if the updates to that row are still pending in an uncommitted transaction, the outside world would still see the old data.


>
> The problem is, how to adequately deal with that and make sure it never happens? I added a session.commit() before doing anything and it works, I assume rollback would work too. Is there any better solution?

You should be committing *after* you've done some work. Then when a new request comes in, it should start out with a brand new Session which will make a new database connection as soon as the database is accessed. When the request completes, the Session should be closed out. The documentation at http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#session-frequently-asked-questions discusses this, and continues the discussion at http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#using-thread-local-scope-with-web-applications .


Pedro Werneck

unread,
Jan 25, 2013, 5:12:19 PM1/25/13
to sqlal...@googlegroups.com
Well... I'm afraid it's not as simple as that. I'll give an example:

I have a webservice A, which triggers a callback and calls webservice B, creating a new row in the database with status = 0 and commiting the transaction.

Then I have a script which finds all rows with status = 0, and sends their id, one by one, to a worker, who is supposed to get lots of data from many sources and then send that to another webservice C.

Now, sometimes, especially when things happen too fast, the query the worker does for the row with that id returns empty, even though that isn't in an uncommited transaction, and the script who called the worker itself found it. In principle, if things are running smoothly, that isn't supposed to happen.

Get the problem? The worker doesn't have uncommitted changes, actually it never does any changes at all. It got the id from a script who got the row, so it exists for someone who just started a new session.

So, how can I be sure the worker will see that new row? I'm doing a commit with the empty transaction the worker has, as soon as it's called, and it seems to be working, but is there any better way?






--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.





--
---
Pedro Werneck

Michael Bayer

unread,
Jan 25, 2013, 5:24:07 PM1/25/13
to sqlal...@googlegroups.com
On Jan 25, 2013, at 5:12 PM, Pedro Werneck wrote:

Well... I'm afraid it's not as simple as that. I'll give an example:

I have a webservice A, which triggers a callback and calls webservice B, creating a new row in the database with status = 0 and commiting the transaction.

Then I have a script which finds all rows with status = 0, and sends their id, one by one, to a worker, who is supposed to get lots of data from many sources and then send that to another webservice C.

Now, sometimes, especially when things happen too fast, the query the worker does for the row with that id returns empty, even though that isn't in an uncommited transaction, and the script who called the worker itself found it. In principle, if things are running smoothly, that isn't supposed to happen.

Is there some kind of distribution to the database, like master/slave?  Otherwise, once data is committed, it is readable to all new transactions subsequent to that commit.  If the script that is searching for status=0 is finding rows that are committed, then the worker that is querying for those rows should be able to see them, unless the worker has been holding open a long running transaction.   Long running transactions here are more of the antipattern.   The worker should ensure it responds to new messages from the status=0 script with a brand new transaction to read the message in question.


So, how can I be sure the worker will see that new row? I'm doing a commit with the empty transaction the worker has, as soon as it's called, and it seems to be working, but is there any better way?

The worker should wait for a request from the script in a non-transactional state, without a Session.  A request from the script comes in- the worker starts a new Session to respond to that request, hence new transaction.   Thinking about transaction demarcation in reverse still seems to suggest that this worker is leaving a dormant connection open as it waits for new jobs.

All of that said, this is only based on what you're telling me so far.  There may be many more details here that entirely change how this might have to work.




Pedro Werneck

unread,
Jan 25, 2013, 5:35:36 PM1/25/13
to sqlalchemy
If the script that is searching for status=0 is finding rows that are committed, then the worker that is querying for those rows should be able to see them, unless the worker has been holding open a long running transaction.  
Exactly.
 
Long running transactions here are more of the antipattern.   The worker should ensure it responds to new messages from the status=0 script with a brand new transaction to read the message in question.

That's the point. What's the best way to do that, considering the worker is never updating anything, only reading? Should I commit in the end of every task then, even without anything to commit? Should I start a new session on every call? The commit does that automatically if I'm not using autocommit=True, right?

 
The worker should wait for a request from the script in a non-transactional state, without a Session.  A request from the script comes in- the worker starts a new Session to respond to that request, hence new transaction.   Thinking about transaction demarcation in reverse still seems to suggest that this worker is leaving a dormant connection open as it waits for new jobs.

I'm pretty sure it does. I'm using Flask SQLAlchemy and Celery for the workers. The workers reach the global app for the session and are keeping the connection open, but they do have work almost all the time and never sleep for more than a few secs.


---
Pedro Werneck

Michael Bayer

unread,
Jan 25, 2013, 5:45:19 PM1/25/13
to sqlal...@googlegroups.com
On Jan 25, 2013, at 5:35 PM, Pedro Werneck wrote:

If the script that is searching for status=0 is finding rows that are committed, then the worker that is querying for those rows should be able to see them, unless the worker has been holding open a long running transaction.  
Exactly.
 
Long running transactions here are more of the antipattern.   The worker should ensure it responds to new messages from the status=0 script with a brand new transaction to read the message in question.

That's the point. What's the best way to do that, considering the worker is never updating anything, only reading? Should I commit in the end of every task then, even without anything to commit? Should I start a new session on every call? The commit does that automatically if I'm not using autocommit=True, right?

just do it like this:


def receive_some_request(args):
    session = Session(some_engine)   # "connect" to the database (in reality, pulls a connection from a pool as soon as the Session is used to emit SQL)
    try:
       .. do things with session ...
        session.commit()    # if you have data to commit
    finally:
       session.close()   # close what was opened above.

just like it were a plain database connection.  that's per request received by your worker.




 
The worker should wait for a request from the script in a non-transactional state, without a Session.  A request from the script comes in- the worker starts a new Session to respond to that request, hence new transaction.   Thinking about transaction demarcation in reverse still seems to suggest that this worker is leaving a dormant connection open as it waits for new jobs.

I'm pretty sure it does. I'm using Flask SQLAlchemy and Celery for the workers. The workers reach the global app for the session and are keeping the connection open, but they do have work almost all the time and never sleep for more than a few secs.


---
Pedro Werneck

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To post to this group, send email to sqlal...@googlegroups.com.

Pedro Werneck

unread,
Jan 25, 2013, 8:02:41 PM1/25/13
to sqlal...@googlegroups.com
That works, but now I'll have to change how my models use the session.

Would this all be solved if I just use READ COMMITTED transaction isolation?
--
---
Pedro Werneck

Michael Bayer

unread,
Jan 25, 2013, 9:04:50 PM1/25/13
to sqlal...@googlegroups.com
On Jan 25, 2013, at 8:02 PM, Pedro Werneck wrote:

That works, but now I'll have to change how my models use the session.

hmm, is that because your model objects themselves are controlling the scope of the transaction ?    That's another pattern I don't really recommend...



Would this all be solved if I just use READ COMMITTED transaction isolation?

maybe?   If the problem is really just exactly those rows needing to be visible.  But the long running "dormant" transaction thing is still kind of an antipattern that will generally have negative effects.




To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.

Pedro Werneck

unread,
Jan 26, 2013, 8:16:30 AM1/26/13
to sqlalchemy
hmm, is that because your model objects themselves are controlling the scope of the transaction ?    That's another pattern I don't really recommend...


As I mentioned, I'm using Flask-SQLAlchemy, where I have a global db instance of the SQLAlchemy class, which holds the engine, current session, session factory, etc.

All models have high-level methods for save, delete, revert, etc, which use the global db.session. This isn't an issue for the web part of the application, because Flask creates a new session for each request context, but the workers are outside the request context and the tasks use the db.session directly, either through the models or by themselves. The session created by Flask on the request is a subclass of the scoped session with some signal handling extras.

So, your solution works, but to implement that without changing everything, I have to replicate whatever the high-level methods do using that local session created when the task is called. I think it might be possible to create a request context on each task call, so everyone will have a fresh session on the global db.session, as if it were a web request, but I'll have to go into Flask internals to figure how to do that.
 

Would this all be solved if I just use READ COMMITTED transaction isolation?

maybe?   If the problem is really just exactly those rows needing to be visible.  But the long running "dormant" transaction thing is still kind of an antipattern that will generally have negative effects.  


Well... I do realize that, but unfortunately it's an application with many bad design decisions, but it has to work somehow until we can afford fixing everything.



Thanks a lot!
 
---
Pedro Werneck

Pedro Werneck

unread,
Jan 26, 2013, 9:08:42 AM1/26/13
to sqlalchemy

Well... the solution really is to use Flask context...

Instead of:

def receive_some_request(args):
    session = Session(some_engine)   # "connect" to the database (in reality, pulls a connection from a pool as soon as the Session is used to emit SQL)
    try:
       .. do things with session ...
        session.commit()    # if you have data to commit
    finally:
       session.close()   # close what was opened above.


It just needs:

def receive_some_request(args):
    with app.app_context()
       .. do things with session ...


And it manages the session on the background, like it does for a web request.


Problem solved. Thanks a lot for the help.

--
---
Pedro Werneck
Reply all
Reply to author
Forward
0 new messages