server-less multi-process concurrency for durus


Matthew Scott

Sep 15, 2009, 12:28:13 AM
to durus...@mems-exchange.org, Schevo List
David (and anyone else interested),

I'd like you to nitpick this idea I have to add server-less,
multi-process concurrency to Durus.

Goal: Allow more than one process and/or thread on a single machine
to reliably access and write to a Durus file, without having to
maintain a separate process for a Durus server, and without having to
install more Python packages.

I'll present several actions that a process or thread might take, and
how Durus would behave to support those actions.

Please let me know if you think this will work, any details or quirks
you can think of that might get in the way, etc. In return I will put
some time into this to see if I can produce a working patch.

== Lock file ==
Kept as a separate file, e.g. "mydatabase.durus.lock", locked only
during write operations.
This allows all processes to keep reading up to a "known good" EOF
marker while another process is writing a transaction.
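
As a rough sketch of the locking piece, assuming a POSIX system where
fcntl.flock() is available (Windows would need msvcrt.locking or a
portable locking package instead); the helper name is mine, not Durus
API:

    import fcntl
    from contextlib import contextmanager

    @contextmanager
    def write_lock(path):
        # Hold an exclusive lock on path + '.lock' for the duration of a
        # write.  Readers never touch the lock file, so they are free to
        # read up to the last known-good EOF at any time.
        lock_file = open(path + '.lock', 'a')   # create if missing, never truncate
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX)   # blocks until the lock is free
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
            lock_file.close()

    # Usage: wrap only the write portion of a commit.
    # with write_lock('mydatabase.durus'):
    #     ...append transaction records...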

== Packing ==
I haven't thought through this part as of yet, but I'm not worried
about it at the moment.

== Initial opening of a file ==
1. seek(SEEK_END), tell(), keep as current EOF offset
2. read initial state from file, create in-memory state
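
A minimal sketch of these two steps (build_index() is a hypothetical
stand-in for whatever Durus code reads the existing records into
memory; it is not Durus API):

    import os

    def open_database_file(path):
        f = open(path, 'r+b')              # read, plus append for writers
        f.seek(0, os.SEEK_END)
        known_good_eof = f.tell()          # 1. remember the current EOF offset
        f.seek(0)
        index = build_index(f, known_good_eof)   # 2. create in-memory state
        return f, known_good_eof, index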

== Retrieve from a file that has not been updated ==
1. seek(SEEK_END), tell(), offset is the same as the current EOF offset
2. read requested object

== Retrieve from a file that has been updated ==
1. seek(SEEK_END), tell(), offset is different from the current EOF
offset
2. read records from current EOF offset to new EOF, update in-memory
state
3. read requested object
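
Both retrieval cases share steps 1-2, so a single check can cover them.
A hedged sketch (read_records() and state.apply() are hypothetical
helpers, and state.known_good_eof is the offset remembered at open
time):

    import os

    def refresh_if_grown(f, state):
        # Compare the file's real EOF with the EOF we last saw, and
        # absorb any records another process appended in the meantime.
        f.seek(0, os.SEEK_END)
        new_eof = f.tell()
        if new_eof != state.known_good_eof:
            f.seek(state.known_good_eof)
            for oid, record in read_records(f, new_eof):
                state.apply(oid, record)       # update the in-memory index
            state.known_good_eof = new_eof
        # After this, reading the requested object is the same in both cases.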

== commit() objects to a file that has not been updated ==
1. acquire exclusive lock
2. seek(SEEK_END), tell(), offset is the same as the current EOF offset
3. write records for new objects, update current EOF to new EOF
4. upon commit or rollback, release lock

== commit() objects to a file that has been updated ==
1. acquire exclusive lock
2. seek(SEEK_END), tell(), offset is different from current EOF offset
3. read records from current EOF offset to new EOF, update in-memory
state
a. if conflict, raise WriteConflictError
b. otherwise, write records for new objects, update current EOF to
new EOF
4. always, release lock
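
Putting both commit cases together, here is a sketch of what the locked
section might look like, reusing write_lock() from the lock-file
section.  The import path for WriteConflictError, the definition of
"conflict" (an oid changed by another writer that is also loaded in our
cache), and the read_records()/write_record()/state helpers are all
assumptions:

    import os
    from durus.error import WriteConflictError   # assumed import path

    def commit(f, state, pending_records):
        with write_lock(state.path):
            f.seek(0, os.SEEK_END)
            new_eof = f.tell()
            if new_eof != state.known_good_eof:
                # Another process committed; catch up on the tail first.
                f.seek(state.known_good_eof)
                changed = set()
                for oid, record in read_records(f, new_eof):
                    state.apply(oid, record)
                    changed.add(oid)
                state.known_good_eof = new_eof
                conflicts = changed & state.loaded_oids()
                if conflicts:
                    raise WriteConflictError(list(conflicts))
            for oid, record in pending_records:
                write_record(f, oid, record)
            f.flush()
            os.fsync(f.fileno())
            state.known_good_eof = f.tell()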

== Perform a consistent read across a data set ==
1. call pause() on the Durus connection
2. seek(SEEK_END), tell(), if offset is different from current EOF
offset, read records and update in-memory state
3. read requested objects, never doing the seek/tell dance
... some time later ...
4. call continue() on the Durus connection
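
From client code, the proposed pause()/continue() pair might be used
like this, assuming `connection` is an open Durus connection.  The
methods are part of the proposal, not existing Durus API, and since
"continue" is a reserved word in Python the second call is spelled
resume() in this sketch:

    connection.pause()          # freeze the view at the current known-good EOF
    try:
        # Read as many objects as needed; no seek/tell check happens,
        # so every read sees the same snapshot of the database.
        root = connection.get_root()
        total = sum(item.total for item in root['items'])   # 'items' is illustrative
    finally:
        connection.resume()     # catch up with writes made in the meantime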


Thanks!

- Matt

Matthew Scott

Sep 15, 2009, 12:48:26 AM
to Schevo List
To elaborate on this in the context of how Schevo 3.1 would use this
in a multi-threaded or multi-process environment:

- Each thread or process would open the file for itself, and keep its
own information about state.
- Two new methods would be added to a Schevo database:
  - pause() would disable transaction execution but allow consistent
    reads.
  - continue() would re-enable transaction execution and catch up with
    writes made by other processes.
- execute() would attempt to execute a transaction, retrying several
times (configurable) if a write lock or conflict occurs.
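
A sketch of that retry behaviour for execute(), assuming Schevo's
db.execute() and Durus's WriteConflictError (import path assumed); a
failure to acquire the write lock would be handled the same way.
make_tx is a callable that rebuilds the transaction so a fresh one is
executed on each attempt:

    import time
    from durus.error import WriteConflictError   # assumed import path

    def execute_with_retry(db, make_tx, attempts=5, delay=0.1):
        for attempt in range(attempts):
            try:
                return db.execute(make_tx())
            except WriteConflictError:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)   # back off briefly, then try again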

Matthew Scott

Sep 15, 2009, 1:05:23 AM
to durus...@mems-exchange.org, Schevo List
FYI

I will be trying this out soon, using these branches:



On Mon, Sep 14, 2009 at 21:28, Matthew Scott <gldn...@gmail.com> wrote:
> David (and anyone else interested),
>
> I'd like you to nitpick this idea I have to add server-less, multi-process concurrency to Durus.


 
--
Matthew R. Scott

Etienne Robillard

Sep 15, 2009, 8:57:30 AM
to sch...@googlegroups.com

Matthew,

Here are some comments in response to your latest post about Schevo's
concurrency issues! :-)

1. epoll: Linux (2.6+) supports the epoll() system call, so that might
be useful for implementing some kind of polling/non-blocking I/O
mechanism for accessing the Durus database file across multiple
threads/processes.

2. context managers: Can we define a custom context manager to use for
opening the database file more safely? (A sketch follows after this list.)

3. Durus user/group: Add support for a Schevo or Durus user/group on
Linux. Maybe not immediately related to implementing a thread-safe
Schevo, but it could be used to enforce Unix permissions, etc.

4. Allow Schevo to use epoll/whatever at configure time if threads
are enabled. However, this would perhaps require some changes to the
Schevo build system.

http://kovyrin.net/2006/04/13/epoll-asynchronous-network-programming/
http://docs.python.org/reference/datamodel.html#context-managers
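
Regarding item 2, one possible shape for such a context manager; the
Durus import paths are taken as assumptions from the Durus releases of
that era, and the helper name is hypothetical:

    from contextlib import contextmanager

    from durus.connection import Connection       # assumed import paths
    from durus.file_storage import FileStorage

    @contextmanager
    def open_database(path):
        # Open a Durus file storage and make sure it is closed even if
        # the block using it raises.
        storage = FileStorage(path)
        connection = Connection(storage)
        try:
            yield connection
        finally:
            storage.close()

    # with open_database('file1.db') as conn:
    #     root = conn.get_root()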

Cheers,

Etienne
--
Etienne Robillard <robillar...@gmail.com>
Green Tea Hackers Club <http://gthc.org/>
Blog: <http://gthc.org/blog/>
PGP Fingerprint: 178A BF04 23F0 2BF5 535D 4A57 FD53 FD31 98DC 4E57

Matthew Scott

Sep 15, 2009, 12:02:24 PM
to Binger David, durus...@mems-exchange.org, Schevo List
David,

Thanks for the feedback.


On Tue, Sep 15, 2009 at 03:28, Binger David <dbi...@mems-exchange.org> wrote:
> The net improvement, then, is that you don't need to maintain a separate process for the durus server.
> Maintaining the process seems like no inconvenience at all:  am I right that it is having to *start* the
> separate process that is the requirement that we would like to eliminate?

Somewhat correct.

Specifically, I'd like to be able to work with one or more Durus databases directly as files during development, but have the same concurrency semantics as one would expect to get with a client/server arrangement.

For instance, the way I perform tests with Durus databases is to create and destroy a database for each test case.  This works rather well for me.  Having to start a server in a separate thread or process, connect to it, use it, shut it down, for each test case seems like it would add a bit of overhead.

Additionally, in some of my tests I reset some objects to an initial state and keep other objects around, rather than a full destroy/recreate.  This dictates the use of multiple Durus files which from my understanding is only feasible with multiple Durus servers.

 
> An alternative approach would be to have every client attempt to start a server process whenever
> none is present, and this attempt will just fail for all except the first.  Then the human management can avoid
> thinking about the server process.

This could pose a "handoff" problem, wherein process A (an application) starts the Durus server, process B (a Python shell inspecting some things under the hood) connects to the server started by process A, and then A shuts down, taking the Durus server with it and leaving process B hanging without Durus.

Not a problem with a deployment where one would have a long-running server process, but a potential problem with a more chaotic development or desktop-app environment.

 
>> == Packing ==
>> I haven't thought through this part as of yet, but I'm not worried about it at the moment.

> I seem to recall that this is one of the more difficult issues to solve.
> It is important, too, so I'd suggest worrying about it now.

You got me there.  :)  It would be quite a dance indeed, and I still haven't thought of what the sequence of operations could be.

 
> The other hard issue is oid allocation.  ShelfStorage gets a big advantage
> from having the entire space of oids be contained within a compact range
> of values, so you can't just spread oids out.  And of course, there must be
> some way to make sure that no two clients use the same new oid.

This is a good point.  Thank you for reminding me of these kinds of details.  :)
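
(Purely as a hypothetical sketch of one way to handle this: new oids
could be handed out only while the exclusive write lock is held and
after the tail has been read, so that the allocator always sees the
highest committed oid.  That keeps the oid space compact and avoids two
clients picking the same new oid, at the cost of doing allocation
inside the locked section.)

    def allocate_oids(state, count):
        # Called only while holding the write lock and after catching up
        # on the tail, so state.last_oid reflects every committed record.
        first = state.last_oid + 1
        state.last_oid += count
        return range(first, first + count)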

 
>> == commit() objects to a file that has been updated ==
>> 1. acquire exclusive lock
>> 2. seek(SEEK_END), tell(), offset is different from current EOF offset
>> 3. read records from current EOF offset to new EOF, update in-memory state
>>        a. if conflict, raise WriteConflictError

> This can be done, but it sounds easier than it really is.  You'll need to
> read the tail, find the oids, and make sure that none of them have states
> loaded in your cache during this transaction.  It isn't enough just to
> look for conflict with the oids you are writing.  This could potentially be
> a slow operation, and it has a cost that the server-based durus avoids.

I'm guessing that using a Durus server avoids this because the Durus server keeps such information in memory?

I can see where this would be a slowdown across several processes.  If four processes are working on the same file and one of them writes, the other three now have to read and process the tail.

 
>> == Perform a consistent read across a data set ==
>> 1. call pause() on the Durus connection
>> 2. seek(SEEK_END), tell(), if offset is different from current EOF offset, read records and update in-memory state

> You must also check for conflicts here and raise an exception if any of the loaded objects
> have a new state.
>
> Any time that the length of the file has changed, you must read the oids,
> process them as invalidations, and raise the conflict exception if any of
> the changed or new oids have state that is loaded into your memory.

This scenario was more to support a situation where you are querying the database in a read-only manner, but where the query might last for a period of time during which a write occurs.

So, rather than read and invalidate on each file length change, we'd just "pretend" that the file isn't growing at all, and perform an analysis on a snapshot of the database.  When the client code was finished, it would call continue(), at which point the database state would be allowed to sync with latest changes -- client code wouldn't care though, since it is done with its analysis.

However, as this would add new semantics and API to Durus, perhaps this is something that would be best tackled some other time. :)
 

--
Matthew R. Scott

Matthew Scott

Sep 16, 2009, 8:12:42 PM
to sch...@googlegroups.com

On Tue 2009-09-15, at 05:57, Etienne Robillard wrote:

> 1. epoll: Linux (2.6+) supports the epoll() system call, so that might
> be useful to implement some kind of polling/non-blocking io mecanism
> for
> accessing the Durus database file across multiple threads/processes.

The server side of Duruses uses cogen under the hood, which takes
advantage of epoll, kqueue, and iocp depending on platform and
availability.
http://code.google.com/p/cogen/

I tested various client/server combinations using Mac OS X 10.6,
Ubuntu 9.04, and Windows XP SP3, and was pleased to see that it works
fine across the whole spectrum.

For now, the client side is synchronous using built-in sockets, since
it matches the synchronous nature of the Schevo client API.

Multiple threads will be served by having thread-local connections,
e.g. maintaining a worker thread pool where each thread has its own
connection. This fits in nicely with the current breed of WSGI servers.
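
A small sketch of that pattern (connect is whatever callable creates a
new client connection; nothing here is Schevo or Durus API):

    import threading

    _local = threading.local()

    def get_connection(connect):
        # Each worker thread lazily opens its own connection the first
        # time it asks for one, then reuses it for the rest of its life.
        if not hasattr(_local, 'connection'):
            _local.connection = connect()
        return _local.connection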

Multiple processes will simply need to create a connection to the
server and use it.


> 3. Durus user/group: Add support for a Schevo or Durus user/group on
> Linux. Maybe not immediately related to implementing a thread-safe
> Schevo but could be used to enforce Unix permissions, etc.

Because it's over the wire, I'm going to add some basic authentication
to the Duruses protocol before I do any public deployments.


Thanks for your input. I hope you'll be a beneficiary of this work. :)

Etienne Robillard

Sep 16, 2009, 9:34:16 PM
to sch...@googlegroups.com

Thanks Matthew,

Looking forward to checking this out in Schevo. Will this change
anything in the Schevo transaction methods or in how database files
are opened?

Cheers,

Etienne

--
Etienne Robillard <robillar...@gmail.com>
Green Tea Hackers Club <http://gthc.org/>
Blog: <http://gthc.org/blog/>
PGP Fingerprint: 178A BF04 23F0 2BF5 535D 4A57 FD53 FD31 98DC 4E57



Matthew Scott

Sep 16, 2009, 11:47:04 PM
to sch...@googlegroups.com
On Wed, Sep 16, 2009 at 18:34, Etienne Robillard <robillar...@gmail.com> wrote:
> Will this change anything in the Schevo transaction methods or in how
> database files are opened?


There will be no changes to the API for working with an open database.


There is a change to how you specify the name of a database. Instead of using the concept of a filename, an optional backend name, and optional backend arguments, a URL is now used.

For example, consider creating a new database:
   schevo db create --app=myapp file1.db

With the new URL-based scheme, you'd use this instead:
   schevo db create --app=myapp durus:///file1.db

This opens the door for handling connections to a server.  Here's how you'd create a database named "data1" on a Duruses server running on host "dbserver":

   schevo db create --app=myapp duruses://dbserver/data1


That said, there are some conveniences for backwards compatibility; for instance, when using just a filename as in the first example, "file1.db" is translated behind the scenes to "durus:///file1".
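
For illustration only (the helper name is hypothetical, and this is not
necessarily how Schevo parses the URL internally), the URL form splits
cleanly into backend, host, and database name with the standard
library:

    import urlparse   # urllib.parse on Python 3

    # Older Pythons only parse the host portion for registered schemes.
    for scheme in ('durus', 'duruses'):
        if scheme not in urlparse.uses_netloc:
            urlparse.uses_netloc.append(scheme)

    def split_database_url(url):
        parts = urlparse.urlparse(url)
        return parts.scheme, parts.netloc, parts.path.lstrip('/')

    # split_database_url('durus:///file1.db')        -> ('durus', '', 'file1.db')
    # split_database_url('duruses://dbserver/data1') -> ('duruses', 'dbserver', 'data1')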


--
Matthew R. Scott