Databases and threading

Éloi Rivard

Nov 19, 2019, 5:44:00 AM

to zodb

Hello, there are a few things I wonder about ZODB and concurrency, and I am not sure what answers the documentation gives.

The documentation states that database connections, transaction managers and transactions are not thread safe. It also states that we cannot share databases between processes. But can we share databases and storages between threads? Is this code safe?

import ZODB, threading
db = ZODB.DB(None)
def work():
    with db.transaction() as conn:
        do_stuff()

threading.Thread(target=work).start()
threading.Thread(target=work).start()

Thank you for your answers

Éloi

Jason Madden

Nov 19, 2019, 7:58:45 AM

to zodb

> On Nov 19, 2019, at 04:44, Éloi Rivard <azm...@gmail.com> wrote:
>
> Hello, there are a few things I wonder about ZODB and concurrency, and I am not sure what answers the documentation gives.
> The documentation states that database connections, transaction managers and transactions are not thread safe. It also states that we cannot share databases between processes. But can we share databases and storages between threads?

A Connection instance, and all the persistent objects accessed through it, may be used from exactly one thread (or greenlet!) at a time; the same goes for the transaction/manager associated with the connection.

A DB instance is there to vend Connection instances, and it can do so from any number of threads concurrently. How the DB, Connection, and storage cooperate to make this work is an implementation detail.

So this example should be fine:

> import ZODB, threading
> db = ZODB.DB(None)
> def work():
>     with db.transaction() as conn:
>         do_stuff()
>
> threading.Thread(target=work).start()
> threading.Thread(target=work).start()

But this would be broken:

def process_chunk(container, chunk_number):
    chunk = container[chunk_number]
    # Do stuff with chunk

db = get_the_db()
conn = db.open()
root = conn.root()

# Both threads receive the same root object, and with it the same Connection.
threading.Thread(target=process_chunk, args=(root, 1)).start()
threading.Thread(target=process_chunk, args=(root, 2)).start()

That's broken because it shares the same persistent object (the root) between threads without any sort of locking to ensure that the underlying Connection (hidden in _p_jar) is only used from one thread at a time.

The cross-thread sharing is pretty blatant here. A more subtle trap is shared caches of persistent objects.

cache = {}
db = get_the_db()
def process_chunk(chunk_number):
    result = []
    conn = db.open()
    root = conn.root()
    chunk = root[chunk_number]
    for part in chunk:
        answer = cache.get(part)
        if answer is None:
            answer = part.find_or_compute_answer()
            if answer._p_jar is None:
                conn.add(answer) # could be new, could be previously saved
            cache[part] = answer
        result.append(answer)
    return result

threading.Thread(target=process_chunk, args=(1,)).start()
threading.Thread(target=process_chunk, args=(2,)).start()

If any of the chunks use the same part, we could be accessing objects from different connections in different threads simultaneously.

Sometimes a shared cache like that can be found at a module level. That's destined to eventually result in a ConnectionStateError, but because of ZODB's caching, it may not be until the application is under significant load in a production setting. (Don't do that.)

Speaking of ZODB's caching, very often many of the cross-threading issues can be hidden by it: if an object is already in memory, there will be no need to use the _p_jar at all and ZODB is taken out of the equation entirely. Then you're just left with the normal concerns about using objects from multiple threads. (All of the examples above are probably that way.) That's all fine and good until an object gets ghosted (usually at the most inopportune time) or your access pattern changes just enough that an object that was usually in memory before, or was never accessed at all before, suddenly has to be loaded. I suggest having your tests frequently make use of DB.cacheMinimize() (e.g., between requests/transactions) to help smoke out issues like that; you can even make adjustments to the DB's connection pool so that sequential requests don't get the same Connection object.

Is all of that to say there's no way to parallelize work using a single Connection and transaction? Not at all. It can be done, but (like anything involving concurrency) it takes careful design. It's certainly easier to stick to the one-thread-per-connection/one-connection-per-thread/nothing-shared-except-the-DB model that's encouraged by default.

HTH,
Jason

Christopher Lozinski

Nov 19, 2019, 8:05:51 AM

to Éloi Rivard, zodb

It is an interesting question.

I am running uwsgi. That supports multiple threads.

In the configuration file, I specify how many threads to support.

How does uwsgi do it? I do not know. So I read the source.

When my application starts, it reads a config file and creates a db object.

On each request uwsgi opens a new connection. I am using the cromlech framework.

https://github.com/Cromlech/cromlech.zodb/blob/b91d3b7b384cd107e7ea8423f4850fd5fd6749ad/src/cromlech/zodb/middleware.py#L38

So I think that if you open a new connection in each thread, you should be fine.

Furthermore, I think your code has a problem: you have to create a connection first, before trying to invoke the transaction machinery.

Hope that helps.

Did I get that right?

Chris
