pushed multi-user changes

Mark Hammond

Aug 23, 2010, 5:01:42 AM

to raindr...@googlegroups.com

I pushed quite a few changes relating to multi-user mode. The key
features are:

* Every user gets their own set of queues.
* All cmd-line scripts support a --userid option so you can specify a
specific user to use. The default (of None) means the cmdline scripts
use the default single-user databases as before.
* Storm works fully in multi-user mode; all tests pass.

This has been done with some raindropalchemy.

* A new module raindrop.context defines two magic 'contexts': a
'UserContext' and a 'StorageContext'. Each thread has its own
UserContext, but all threads whose UserContext specifies the same user
share a StorageContext. This was all done in the flavor of
sqlalchemy's scoped_session implementation.

* A UserContext stores the user-id (a string/int) and the user-config
(the location of the blob stores).

* A StorageContext owns a sqlalchemy 'engine', blob storage and a queue
manager.
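A minimal sketch of the per-thread / per-user split described above, using only the standard library. The names `set_user` and `current_storage`, and the placeholder engine and queue-manager strings, are hypothetical; the real raindrop.context module presumably creates a sqlalchemy engine, blob store and queue manager instead.

```python
import threading


class StorageContext:
    """Hypothetical per-user storage shared by every thread for that user."""

    def __init__(self, user_id, config):
        self.user_id = user_id
        # Placeholders: real code would create a sqlalchemy engine,
        # the blob storage and a queue manager here.
        self.engine = f"engine-for-{user_id}"
        self.blob_root = config.get("blob_root")
        self.q_mgr = f"queues-for-{user_id}"


_thread_state = threading.local()   # holds each thread's UserContext data
_storage_by_user = {}               # user id -> shared StorageContext
_storage_lock = threading.Lock()    # guards creation of StorageContexts


def set_user(user_id, config):
    """Bind the calling thread to user_id; return that user's shared storage."""
    _thread_state.user_id = user_id
    _thread_state.config = config
    with _storage_lock:
        storage = _storage_by_user.get(user_id)
        if storage is None:
            storage = StorageContext(user_id, config)
            _storage_by_user[user_id] = storage
    return storage


def current_storage():
    """Return the StorageContext for the user bound to the calling thread."""
    return _storage_by_user[_thread_state.user_id]
```

Several threads calling `set_user("alice", ...)` all get back the same StorageContext object (and hence the same engine and queues), while `set_user("bob", ...)` gets a different one.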

Thus - all threads which have the same user-id share the same engine and
the same set of queues. I believe they could later share the metadata,
removing the need for ThreadLocalMetaData.

* Just like sqlalchemy, these Contexts are hidden behind some magic
'global variables'. E.g., any thread can say 'UserContext.id' to get the
user-id it is working with, or StorageContext.engine and
StorageContext.q_mgr to get the engine and queue manager for that user.
As a result, the get_q_mgr() global has been removed (replaced with
StorageContext.q_mgr).
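One way such a "magic global" can work is a metaclass whose `__getattr__` looks the attribute up in the calling thread's state. This is a hypothetical sketch, not the actual raindrop.context code; `setup` and the module-level `_local` are illustrative names.

```python
import threading

_local = threading.local()


class _ThreadBoundMeta(type):
    """Metaclass that resolves unknown class attributes against the
    calling thread's state, so `UserContext.id` reads like a global."""

    def __getattr__(cls, name):
        # Only invoked when normal class attribute lookup fails.
        state = getattr(_local, cls.__name__, None)
        if state is None or name not in state:
            raise AttributeError(f"{cls.__name__}.{name} is not set in this thread")
        return state[name]


class UserContext(metaclass=_ThreadBoundMeta):
    @classmethod
    def setup(cls, user_id, config):
        """Store this thread's user id and config."""
        setattr(_local, cls.__name__, {"id": user_id, "config": config})
```

After `UserContext.setup(42, {...})` in one thread, `UserContext.id` is `42` there, while another thread that never called `setup` gets an `AttributeError`.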

* When storm is run in single-user mode, a lightweight 'database'
middleware is used in place of the multi-user aware one. This
middleware unconditionally associates each new request with the
single-user user-ID, under the assumption that the UserContext has
already been set up (i.e., no database bootstrapping etc. is done as
part of the request).

* When storm is run in multi-user mode, Shane's database middleware
layer is used, which makes no assumptions about the state of the user
database for the request; if necessary, the StorageContext for the user
is configured as part of the request.
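By contrast, a multi-user layer has to resolve the user per request and create their storage on first use. This is a hypothetical sketch: `lookup_user` and `create_storage` stand in for whatever auth and DB setup the real middleware performs, and holding the lock while creating storage is a simplification.

```python
import threading


class MultiUserMiddleware:
    """WSGI middleware sketch: resolve the user for each request and
    configure their storage lazily, assuming nothing exists beforehand."""

    def __init__(self, app, lookup_user, create_storage):
        self.app = app
        self.lookup_user = lookup_user        # hypothetical: environ -> user id
        self.create_storage = create_storage  # hypothetical: user id -> storage
        self._storage = {}
        self._lock = threading.Lock()

    def __call__(self, environ, start_response):
        user_id = self.lookup_user(environ)
        with self._lock:
            storage = self._storage.get(user_id)
            if storage is None:
                # First request for this user: configure storage now.
                storage = self.create_storage(user_id)
                self._storage[user_id] = storage
        environ["raindrop.userid"] = user_id
        environ["raindrop.storage"] = storage
        return self.app(environ, start_response)
```

Repeated requests for the same user reuse the storage created on their first request.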

Note that the intent above is for a massively-scaled raindrop to use a
lighter-weight multi-user aware middleware layer, under the assumption
that the auth layer and DB creation layer are handled outside raindrop
(i.e., by Gozer's middleware layers using memcached etc. to look up the
per-user databases).

* Some minor 'bootstrap' issues remain; e.g., if you manually delete a
user's DB while the record remains in the central DB, it doesn't get
bootstrapped.

There might be a few loose ends I've missed - please let me know if I
broke anything...

Cheers,

Mark

Shane Caraveo

Aug 23, 2010, 5:05:51 PM

to raindr...@googlegroups.com, Mark Hammond

On 10-08-23 2:01 AM, Mark Hammond wrote:
> I pushed quite a few changes relating to multi-user mode. The key
> features are:

Cool! I still need a session init or something similar to configure
the user when the user database is not prefilled.

> Thus - all threads which have the same user-id share the same engine and
> the same set of queues. I believe they could later share the metadata,
> removing the need for ThreadLocalMetaData.

I'd have to dig through the code again, but if I recall correctly,
metadata can only be global or per thread, much like Session.configure
will configure all new sessions rather than just the sessions for that
thread.

> * Just like sqlalchemy, these Contexts are hidden behind some magic
> 'global variables'. E.g., any thread can say 'UserContext.id' to get the
> user-id it is working with, or StorageContext.engine and
> StorageContext.q_mgr to get the engine and queue manager for that user.
> As a result, the get_q_mgr() global has been removed (replaced with
> StorageContext.q_mgr).
>
> * When storm is run in single-user mode, a lightweight 'database'
> middleware is used in place of the multi-user aware one. This
> middleware unconditionally associates each new request with the
> single-user user-ID, under the assumption that the UserContext has
> already been set up (i.e., no database bootstrapping etc. is done as
> part of the request).
>
> * When storm is run in multi-user mode, Shane's database middleware
> layer is used, which makes no assumptions about the state of the user
> database for the request; if necessary, the StorageContext for the user
> is configured as part of the request.
>
> Note that the intent above is for a massively-scaled raindrop to use a
> lighter-weight multi-user aware middleware layer, under the assumption
> that the auth layer and DB creation layer are handled outside raindrop
> (i.e., by Gozer's middleware layers using memcached etc. to look up the
> per-user databases).

The auth layer will not run on Gozer's side, but MySQL DB creation
will happen outside raindrop. Those DBs will be empty, so raindrop is
still the driver for table creation/migration.
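Since the externally created databases start out empty, the app has to create (and later migrate) its own tables. A minimal sketch of that check, using sqlite3's `PRAGMA user_version` as a stand-in for whatever versioning the real MySQL setup uses; `ensure_schema`, `SCHEMA_VERSION` and the `items` table are hypothetical.

```python
import sqlite3

SCHEMA_VERSION = 1  # hypothetical current schema version


def ensure_schema(conn):
    """Create tables in an empty per-user database, or do nothing if the
    schema is already current. Returns True if tables were created."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version >= SCHEMA_VERSION:
        return False
    # A real migration path would step from `version` up to SCHEMA_VERSION;
    # this sketch only handles the empty-database case.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.commit()
    return True
```

Calling it on every connection is cheap: after the first call it only reads the version number.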

> * Some minor 'bootstrap' issues remain; e.g., if you manually delete a
> user's DB while the record remains in the central DB, it doesn't get
> bootstrapped.

I fixed DB initialization for multi-user mode in storm, and I think
bootstrapping is only necessary when importing accounts from the old
.raindrop file.

Shane

Mark Hammond

Aug 23, 2010, 6:39:38 PM

to Shane Caraveo, raindr...@googlegroups.com

On 24/08/2010 7:05 AM, Shane Caraveo wrote:
>> Thus - all threads which have the same user-id share the same engine and
>> the same set of queues. I believe they could later share the metadata,
>> removing the need for ThreadLocalMetaData.
>
> I'd have to dig through the code again, but if I recall correctly,
> metadata can only be global or per thread, much like Session.configure
> will configure all new sessions rather than just the sessions for that
> thread.

That may turn out to be true, but my reading of the thread-local
implementation is that the only real difference between a normal metadata
and a thread-local metadata is that the latter keeps its binding in a
threading.local object, so a ThreadLocalMetaData starts off unbound in
each thread. That implies to me that it should be possible to set up
something very similar, but instead of binding per thread we could bind
per StorageContext.
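A stdlib-only toy illustrating the distinction above. This is not SQLAlchemy's code; the class and method names are hypothetical, and the engines are plain strings.

```python
import threading


class ThreadLocalBinding:
    """Toy version of the ThreadLocalMetaData idea: the bind is kept in a
    threading.local, so each new thread starts out unbound."""

    def __init__(self):
        self._local = threading.local()

    def set_bind(self, engine):
        self._local.bind = engine

    def get_bind(self):
        return getattr(self._local, "bind", None)


class PerContextBinding:
    """Alternative: the bind is keyed by a per-user context, so every thread
    working for that user sees the same bind."""

    def __init__(self):
        self._binds = {}
        self._lock = threading.Lock()

    def set_bind(self, context_key, engine):
        with self._lock:
            self._binds[context_key] = engine

    def get_bind(self, context_key):
        return self._binds.get(context_key)
```

A bind set in one thread is invisible to a second thread under `ThreadLocalBinding`, but visible to it under `PerContextBinding` as long as both threads use the same context key.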

OTOH, I'm not sure it would actually buy us that much: binding seems
inexpensive, and not creating a new engine per thread is probably the
biggest win.

>> Note that the intent above is for a massively-scaled raindrop to use a
>> lighter-weight multi-user aware middleware layer, under the assumption
>> that the auth layer and DB creation layer are handled outside raindrop
>> (i.e., by Gozer's middleware layers using memcached etc. to look up the
>> per-user databases).
>
> The auth layer will not run on Gozer's side, but MySQL DB creation
> will happen outside raindrop. Those DBs will be empty, so raindrop is
> still the driver for table creation/migration.

Can you explain in more detail why this is the case? I would have
thought it might make sense in large installations to have the API
servers do only the API serving and leave the auth layers - and even the
DB selection - to higher parts in the stack.

It makes sense to me that these higher layers can do the memcache work
and DB failover/rotation work and avoid adding that complexity to the
API servers. It also allows more flexibility in scaling just the parts
of the system which are the bottlenecks. Or do you envisage all of that
becoming part of storm itself?

Cheers,

Mark

Shane Caraveo

Aug 23, 2010, 9:24:36 PM

to raindr...@googlegroups.com, Mark Hammond

On 10-08-23 3:39 PM, Mark Hammond wrote:

>>> Note that the intent above is for a massively-scaled raindrop to use a
>>> lighter-weight multi-user aware middleware layer, under the assumption
>>> that the auth layer and DB creation layer are handled outside raindrop
>>> (i.e., by Gozer's middleware layers using memcached etc. to look up the
>>> per-user databases).
>>
>> The auth layer will not run on Gozer's side, but MySQL DB creation
>> will happen outside raindrop. Those DBs will be empty, so raindrop is
>> still the driver for table creation/migration.
>
> Can you explain in more detail why this is the case? I would have
> thought it might make sense in large installations to have the API
> servers do only the API serving and leave the auth layers - and even the
> DB selection - to higher parts in the stack.
>
> It makes sense to me that these higher layers can do the memcache work
> and DB failover/rotation work and avoid adding that complexity to the
> API servers. It also allows more flexibility in scaling just the parts
> of the system which are the bottlenecks. Or do you envisage all of that
> becoming part of storm itself?

The same work has to be done regardless of where it is handled: the
auth dance and tying the auth to a DB. The beaker session, which is the
important part of the auth implementation here, can be backed by
memcache and a couple of other backends.

Having authentication built into the app simplifies local testing and
development, and makes a multi-user raindrop easy for others to install.
It's also one of the least of our concerns with regard to scalability.

Shane
