scoped_session, sessions and thread safety

1,308 views
Skip to first unread message

Mehdi GMIRA

unread,
Mar 30, 2016, 12:37:50 PM3/30/16
to sqlalchemy
I've read a lot of stuff on scoped_session, thread safety, and sessions, and i just don't get it.
For me, a session is just a "wrapper" around the actual database behind it. And databases do not like concurrent updates of the same row within multiple transactions.
So, it is my understanding that you should never try to change the same row within multiple threads. 
And this is a limitation that is more strict than just having thread safe sessions.
For example, i think that something like this is unsafe, even though the session used is thread safe (because item n°3 is present in two threads):

import threading


def worker1():
    items
= scoped_session.query(Item).filter(Item.id.in_([1,2,3]))
   
# do some stuff with items


def worker2():
    items
= scoped_session.query(Item).filter(Train.id.in_([3,4,5]))
   
# do some stuff with items


threads
= []


t1
= threading.Thread(target=worker1)
t2
= threading.Thread(target=worker2)
t1
.start()
t2
.start()

So, i don't see the point of keeping one session per thread. There is no harm in sharing the session, as long as the rows behind it are not used by multiple threads.

I'm a little bit confused and some explanations/examples would be much appreciated :)

Mike Bayer

unread,
Mar 30, 2016, 1:00:16 PM3/30/16
to sqlal...@googlegroups.com


On 03/30/2016 12:37 PM, Mehdi GMIRA wrote:
> I've read a lot of stuff on scoped_session, thread safety, and sessions,
> and i just don't get it.
> For me, a session is just a "wrapper" around the actual database behind
> it. And databases do not like concurrent updates of the same row within
> multiple transactions.
> So, it is my understanding that you should never try to change the same
> row within multiple threads.
> And this is a limitation that is more strict than just having thread
> safe sessions.
> For example, i think that something like this is unsafe, even though the
> session used is thread safe (because item n°3 is present in two threads):
>
> |
> importthreading
>
>
> defworker1():
> items =scoped_session.query(Item).filter(Item.id.in_([1,2,3]))
> # do some stuff with items
>
>
> defworker2():
> items =scoped_session.query(Item).filter(Train.id.in_([3,4,5]))
> # do some stuff with items
>
>
> threads =[]
>
>
> t1 =threading.Thread(target=worker1)
> t2 =threading.Thread(target=worker2)
> t1.start()
> t2.start()
> |
>
> So, i don't see the point of keeping one session per thread. There is no
> harm in sharing the session, as long as the rows behind it are not used
> by multiple threads.
>
> I'm a little bit confused and some explanations/examples would be much
> appreciated :)

The Session object itself is not thread safe. It is a stateful object
and it does not use any mutexing when it manipulates this internal
state. If you run multiple threads on it without applying mutexing to
all Session operations (which includes all queries and all attribute
access on all objects, since these trigger lazy loads and flushes), this
state will be corrupted. At the very least you will get lots of
warnings and errors since the Session does check for a few concurrency
situations that can occur even without multiple threads in use, see
example below.

Additionally, the database transaction itself can only do one operation
at a time, so the DBAPI connection also has to mutex appropriately in
order to ensure SQL operations run correctly; if the backend database
requires a separate "fetch the last inserted ID" step for an INSERT,
this can also be corrupted by concurrent threads.


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class A(Base):
__tablename__ = 'a'
id = Column(Integer, primary_key=True)

e = create_engine("postgresql://scott:tiger@localhost/test")
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

s = Session(e)

import threading
import time
import random
import warnings

warnings.simplefilter("always")

def go():
while True:
time.sleep(1.5 * random.random())
try:
s.add_all([A() for j in range(5)])
s.flush()
except Exception as e:
print "Exception! %s" % e

workers = [threading.Thread(target=go) for i in range(10)]
for worker in workers:
worker.start()

for worker in workers:
worker.join()



output:

SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
SAWarning: Usage of the 'Session.add_all()' operation is not currently
supported within the execution stage of the flush process. Results may
not be consistent. Consider using alternative event listeners or
connection-level operations instead.
Exception! Session is already flushing
Exception! Session is already flushing







>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Mehdi GMIRA

unread,
Mar 31, 2016, 3:30:53 AM3/31/16
to sqlalchemy
Thanks for the reply !
Ok, so if i understood what you said, the advice to use only one session per thread is related to the fact that the session's internals are not thread safe. It has nothing to do with the backend database's concurrency constraint. 

Some more questions:

1) One limitation that i find to the scoped_session is that you're limited to exactly one session by thread. For example if i have a script update.py that is scheduled in a crontab, and that does some work in the database (multiple commits). Suppose I want to make sure that only one script update.py runs at a time. What i would want to do is start a transaction a the beginning of update.py and lock a row in some table. When the script is over i would commit. This way, if an other update.py starts it will fail because it cannot get the lock.
The problem is that if i use a scoped sessions, I cannot have independant transactions (any commit in update.py will unlock the row). Can a scoped session work in a case like this ?

2) when i use a session, it's usually within a context manager:
with session_ctx_manager() as s:
    do_some_work
(s)

where session_ctx_manager is smth like this:
@contextmanager
def session_ctx_manager():
    s
= Session()
   
try:
       
yield s
   
finally:
        s
.remove()

       

Is it a bad idea to design an application where I make sure not to spawn/switch thread when i'm inside the session_ctx_manager ? 
This would allow me to solve problem #1 because I can create as many independant sessions as i want within the same thread and commit them when i want to.


Mike Bayer

unread,
Mar 31, 2016, 9:40:15 AM3/31/16
to sqlal...@googlegroups.com


On 03/31/2016 03:30 AM, Mehdi GMIRA wrote:
> Thanks for the reply !
> Ok, so if i understood what you said, the advice to use only one session
> per thread is related to the fact that the session's internals are not
> thread safe. It has nothing to do with the backend database's
> concurrency constraint.

the database DBAPI itself might also not be thread safe, and as
mentioned before, all DB operations have to be serialized for one
transaction anyway.

>
> Some more questions:
>
> 1) One limitation that i find to the scoped_session is that you're
> limited to exactly one session by thread.

that's not true at all, make as many sessions as you want from the
sessionmaker() inside of it (or just use Session())

For example if i have a script
> update.py that is scheduled in a crontab, and that does some work in the
> database (multiple commits). Suppose I want to make sure that only one
> script update.py runs at a time. What i would want to do is start a
> transaction a the beginning of update.py and lock a row in some table.
> When the script is over i would commit. This way, if an other update.py
> starts it will fail because it cannot get the lock.
> The problem is that if i use a scoped sessions, I cannot have
> independant transactions (any commit in update.py will unlock the row).
> Can a scoped session work in a case like this ?
>
> 2) when i use a session, it's usually within a context manager:
> |
> withsession_ctx_manager()ass:
> do_some_work(s)
> |
>
> where session_ctx_manager is smth like this:
> |
> @contextmanager
> defsession_ctx_manager():
> s =Session()
> try:
> yields
> finally:
> s.remove()
> |
>
>
> Is it a bad idea to design an application where I make sure not to
> spawn/switch thread when i'm inside the session_ctx_manager ?


i dont see any intrinsic issue with spawning threads inside a context
manager any more than if you did so inside of an "if:" statement
> > an email to sqlalchemy+...@googlegroups.com <javascript:>
> > <mailto:sqlalchemy+...@googlegroups.com <javascript:>>.
> > To post to this group, send email to sqlal...@googlegroups.com
> <javascript:>
> > <mailto:sqlal...@googlegroups.com <javascript:>>.
> <https://groups.google.com/group/sqlalchemy>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.

Mehdi GMIRA

unread,
Mar 31, 2016, 10:38:29 AM3/31/16
to sqlalchemy

>
> 1) One limitation that i find to the scoped_session is that you're
> limited to exactly one session by thread.

that's not true at all, make as many sessions as you want from the
sessionmaker() inside of it (or just use Session())


Yes but then you'd have to import explicitly the engine and sessionmaker(), which seems a little bit awkward. What i meant is that once you start using scoped_session instead of Session(), it becomes awkward to create multiple sessions within the same thread.

 

i dont see any intrinsic issue with spawning threads inside a context
manager any more than if you did so inside of an "if:" statement


Isn't there a risk that you'd be passing objects that are session-dependant to other threads ? 

Mike Bayer

unread,
Mar 31, 2016, 8:30:21 PM3/31/16
to sqlal...@googlegroups.com


On 03/31/2016 10:38 AM, Mehdi GMIRA wrote:
>
> >
> > 1) One limitation that i find to the scoped_session is that you're
> > limited to exactly one session by thread.
>
> that's not true at all, make as many sessions as you want from the
> sessionmaker() inside of it (or just use Session())
>
>
> Yes but then you'd have to import explicitly the engine and
> sessionmaker(), which seems a little bit awkward. What i meant is that
> once you start using scoped_session instead of Session(), it becomes
> awkward to create multiple sessions within the same thread.

the session_factory provides this:

session = my_scoped_session.session_factory()



>
>
> i dont see any intrinsic issue with spawning threads inside a context
> manager any more than if you did so inside of an "if:" statement
>
>
> Isn't there a risk that you'd be passing objects that are
> session-dependant to other threads ?

Using a context manager for a block of code doesn't have any impact on
what the code is doing with objects inside that block so I don't see how
those two things are related.

Mehdi GMIRA

unread,
Apr 1, 2016, 7:32:55 AM4/1/16
to sqlalchemy


Le vendredi 1 avril 2016 02:30:21 UTC+2, Mike Bayer a écrit :


On 03/31/2016 10:38 AM, Mehdi GMIRA wrote:
>
>      >
>      > 1) One limitation that i find to the scoped_session is that you're
>      > limited to exactly one session by thread.
>
>     that's not true at all, make as many sessions as you want from the
>     sessionmaker() inside of it (or just use Session())
>
>
> Yes but then you'd have to import explicitly the engine and
> sessionmaker(), which seems a little bit awkward. What i meant is that
> once you start using scoped_session instead of Session(), it becomes
> awkward to create multiple sessions within the same thread.

the session_factory provides this:

session = my_scoped_session.session_factory()


Ok, I wasn't aware of this function. Thanks !
 

>
>
>     i dont see any intrinsic issue with spawning threads inside a context
>     manager any more than if you did so inside of an "if:" statement
>
>
> Isn't there a risk that you'd be passing objects that are
> session-dependant to other threads ?

Using a context manager for a block of code doesn't have any impact on
what the code is doing with objects inside that block so I don't see how
those two things are related.



I think I'm not being clear. What i mean is that If you only use a session as a context manager, you don't have to think about Session() not being thread safe, unless you're inside that context manager. 
But since scoped_session has session_factory(), I think that as you said, scoped_session is indeed the way to go if the application is multithreaded.
Thank you for your patience :)
 

>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages