a quick status

Shane Caraveo

Aug 17, 2010, 2:52:35 PM

to raindr...@googlegroups.com

Since we'll be traveling, I thought I'd mention what I've been working on.

Yesterday I got raindrop working with multiple databases and a
separation of sqlalchemy metadata and sessions per request.

Some notes on what is happening:

- The api server will work in either single user mode or multi user
mode, with the default as single user mode.

- there is an openid middleware and a database middleware, the database
middleware is responsible for setting per-request the session/metadata,
the BaseController is no longer involved.

- The auth APIs are not in a controller, so they will not show up in the
/docs api. We'll have to solve that somehow.

- There is a separate central database with a single table that maps a
user id (generally the openid identifier) to a json string. The json
string can contain whatever, but two items I currently use are the dburl
and bloburl. The database middleware gets that data and initializes the
session. The table can be pre-populated with the json config data
allowing Gozer to pre-create a bunch of MySQL databases. When using
sqlite, the databases are automatically created.

- I'm thinking sync data (e.g. last sync per service for a user) should
be in the central database as well to make it easier for the backend to
schedule sync.

- the recent queue changes have me temporarily broken (*explanation
below), I think we need separate application init and session init at
this point, or the model initialization for the extensions needs to
otherwise be separated from extension loading. Another option (since
the queue isn't really necessary for the api server), is that we just
have separate initialization code for the command line and the api
server. I'll be working out a quick fix so I can move on.

- I have to make some adjustments to the command line processes to work
with the multiple database setup before I push any changes, otherwise
everyone would likely be hosed for a bit.
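
To illustrate the central-database lookup described in the bullets above, here is a minimal sketch, not the actual raindrop code. It assumes a table named user_config, an environ key REMOTE_USER set by the openid layer, and environ keys raindrop.dburl / raindrop.bloburl; all of those names are hypothetical. It also uses the stdlib sqlite3 module instead of SQLAlchemy, purely to stay self-contained.

```python
import json
import sqlite3


def make_central_db():
    """Create an in-memory stand-in for the central database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_config (user_id TEXT PRIMARY KEY, config TEXT)")
    return conn


def load_user_config(conn, user_id):
    """Return the decoded JSON config for user_id, or None if the user is unknown."""
    row = conn.execute(
        "SELECT config FROM user_config WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


class DatabaseMiddleware:
    """WSGI middleware that attaches per-request database settings to environ."""

    def __init__(self, app, central_conn, default_config):
        self.app = app
        self.central_conn = central_conn
        self.default_config = default_config  # used when no user is authenticated

    def __call__(self, environ, start_response):
        # Assumption: an earlier (openid) layer stores the user id in REMOTE_USER.
        user_id = environ.get("REMOTE_USER")
        if user_id is None:
            config = self.default_config  # single-user mode
        else:
            config = load_user_config(self.central_conn, user_id)
            if config is None:
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"unknown user\n"]
        # Hypothetical environ keys; a real version would build the session here.
        environ["raindrop.dburl"] = config["dburl"]
        environ["raindrop.bloburl"] = config["bloburl"]
        return self.app(environ, start_response)
```

Pre-populating the table, as mentioned above, would then just be a matter of inserting one JSON row per user before the first request arrives.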

* raindrop.runner.initialize is initializing with both db models and
queue, db may be initialized more than once in a single process for
multiple databases, but the queue cannot be. Q: does the queue deal
with multiple different users for send?
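
One way the split between application init and session init might look, as a rough sketch only: the function names init_application and init_session are hypothetical and are not what raindrop.runner provides. The queue is built at most once per process, while model setup can run once per database URL.

```python
import threading

_lock = threading.Lock()
_queue = None          # process-wide, created at most once
_models_by_dburl = {}  # one entry per database, created on demand


def init_application(queue_factory):
    """Process-level init: build the queue once and reuse it afterwards."""
    global _queue
    with _lock:
        if _queue is None:
            _queue = queue_factory()
        return _queue


def init_session(dburl, models_factory):
    """Per-database init: may run for many databases in the same process."""
    with _lock:
        if dburl not in _models_by_dburl:
            _models_by_dburl[dburl] = models_factory(dburl)
        return _models_by_dburl[dburl]
```

With this shape the api server could call only init_session per request, and the command line tools could call both.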

Shane

Mark Hammond

Aug 17, 2010, 7:50:28 PM

to raindr...@googlegroups.com, Shane Caraveo

On 18/08/2010 4:52 AM, Shane Caraveo wrote:
> Since we'll be traveling, I thought I'd mention what I've been working on.
>
> Yesterday I got raindrop working with multiple databases and a
> separation of sqlalchemy metadata and sessions per request.

Excellent!

> - The api server will work in either single user mode or multi user
> mode, with the default as single user mode.

For interest, why is this? Is there a significant penalty in multi-user
mode? If not, I wonder if multi-user mode should be the only mode,
simply to keep fewer code paths we don't hit in development?

> - there is an openid middleware and a database middleware, the database
> middleware is responsible for setting per-request the session/metadata,
> the BaseController is no longer involved.

Does the fact openid is in middleware imply that the pylons part of
raindrop will not do the openid stuff (ie, that require_auth param will
be deleted)? A problem with the term middleware is that it means
different things to different people/problems, so maybe I'm missing
something here.

I'm also interested to know how this will interact with the middleware
Gozer will be using for the hosted version. Indeed, I'm quite in the
dark about what pieces will be where in the current thinking - maybe
Gozer could write up a short summary of his view?

> - I'm thinking sync data (e.g. last sync per service for a user) should
> be in the central database as well to make it easier for the backend to
> schedule sync.

I've no problem with that, but am still trying to get my head around
things - Gozer's middleware will still be responsible for initializing
the sync using a kind of 'round robin' approach but also taking the
current load into account?

> - the recent queue changes have me temporarily broken (*explanation
> below), I think we need separate application init and session init at
> this point, or the model initialization for the extensions need to
> otherwise be separated from extension loading. Another option (since the
> queue isn't really necessary for the api server), is that we just have
> separate initialization code for the command line and the api server.
> I'll be working out a quick fix so I can move on.
>
> - I have to make some adjustments to the command line processes to work
> with the multiple database setup before I push any changes, otherwise
> everyone would likely be hosed for a bit.
>
>
> * raindrop.runner.initialize is initializing with both db models and
> queue, db may be initialized more than once in a single process for
> multiple databases, but the queue cannot be. Q: does the queue deal with
> multiple different users for send?

We need to think about this some more. I expect we want one set of
queues per user, otherwise one user with a huge amount of messages may
wind up monopolizing the queues - eg, if synching one user's email
causes 100000 items to be queued, we don't want the next user with only
100 items to be forced to wait.
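
To make the fairness concern concrete, here is a small in-process sketch of per-user queues drained in round-robin order. The class name PerUserQueues and its methods are made up for illustration; it shows only the scheduling idea and says nothing about how the real queues are built.

```python
from collections import OrderedDict, deque


class PerUserQueues:
    """Keep one FIFO per user and serve users in rotation."""

    def __init__(self):
        self._queues = OrderedDict()  # user_id -> deque of pending items

    def put(self, user_id, item):
        self._queues.setdefault(user_id, deque()).append(item)

    def get(self):
        """Pop one item from the next user in rotation, or return None if idle."""
        if not self._queues:
            return None
        user_id, pending = next(iter(self._queues.items()))
        item = pending.popleft()
        if pending:
            # This user still has work: send them to the back of the rotation.
            self._queues.move_to_end(user_id)
        else:
            del self._queues[user_id]
        return user_id, item
```

A user with 100000 queued items gets one item processed per turn, so a second user with 100 items is served on every other call instead of waiting for the first user's backlog to drain.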

Indirectly, the APIs *do* cause items to be queued. So assuming a set
of queues per user, the database and the queues are a single unit - if
the API needs to set up/use a different database for a single request, it
must also set up/use a different set of queues for that request. Thus,
the API *will* need to deal with the fact that both the DB *and* the
queues need to be initialized more than once in a single process.

Of course, this doesn't work very well while our queues are in-process -
which probably means we need to jump on the rabbitmq bandwagon ASAP.
I'm thinking that given rabbitmq is slightly painful to set up, we only
use rabbitmq when in "multi-user" mode, but have a fallback to inproc
queues for single-user mode?

Cheers,

Mark

Shane Caraveo

Aug 18, 2010, 5:56:10 PM

to Mark Hammond, raindr...@googlegroups.com

On 10-08-17 4:50 PM, Mark Hammond wrote:
> On 18/08/2010 4:52 AM, Shane Caraveo wrote:
>> Since we'll be traveling, I thought I'd mention what I've been working
>> on.
>>
>> Yesterday I got raindrop working with multiple databases and a
>> separation of sqlalchemy metadata and sessions per request.
>
> Excellent!
>
>> - The api server will work in either single user mode or multi user
>> mode, with the default as single user mode.
>
> For interest, why is this? Is there a significant penalty in multi-user
> mode? If not, I wonder if multi-user mode should be the only mode,
> simply to keep fewer code paths we don't hit in development?

When you disable the use of openid, there is no authentication data to
tie a user to a database. That is essentially single-user mode; I just
strengthened that by enforcing via config that you must either use
openid or not. The penalty is actually on the single-user mode, in that
it still has to init the db on every request. There is more we can
(should) do in that area to reduce the penalty, but even now it's not
that big.

>> - there is an openid middleware and a database middleware, the database
>> middleware is responsible for setting per-request the session/metadata,
>> the BaseController is no longer involved.
>
> Does the fact openid is in middleware imply that the pylons part of
> raindrop will not do the openid stuff (ie, that require_auth param will
> be deleted)? A problem with the term middleware is that it means
> different things to different people/problems, so maybe I'm missing
> something here.

I'm talking about wsgi middleware, which is the common term used when
talking about wsgi. You can have multiple middleware objects that wrap
the actual app in layers. Pylons uses a couple middleware objects by
default, one being an error handler.
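
For anyone unfamiliar with the layering, a minimal sketch in plain WSGI follows. AuthMiddleware is only a stand-in for an openid layer (it trusts a made-up X-Demo-User header), and ErrorMiddleware is a toy error handler, not the one Pylons ships.

```python
def app(environ, start_response):
    """Innermost WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    user = environ.get("REMOTE_USER", "anonymous")
    return [("hello %s\n" % user).encode("utf-8")]


class AuthMiddleware:
    """Stand-in for an openid layer: copies a demo header into REMOTE_USER."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user = environ.get("HTTP_X_DEMO_USER")  # hypothetical header, demo only
        if user is not None:
            environ["REMOTE_USER"] = user
        return self.app(environ, start_response)


class ErrorMiddleware:
    """Outer layer: turn uncaught exceptions into a 500 response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"internal error\n"]


# Each layer wraps the next; requests pass through from the outside in.
wrapped = ErrorMiddleware(AuthMiddleware(app))
```

Because every layer is itself a WSGI callable, the server only ever sees wrapped, and layers can be added or removed without touching the controllers.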

> I'm also interested to know how this will interact with the middleware
> Gozer will be using for the hosted version. Indeed, I'm quite in the
> dark about what pieces will be where in the current thinking - maybe
> Gozer could write up a short summary of his view?
>
>> - I'm thinking sync data (e.g. last sync per service for a user) should
>> be in the central database as well to make it easier for the backend to
>> schedule sync.
>
> I've no problem with that, but am still trying to get my head around
> things - Gozer's middleware will still be responsible for initializing
> the sync using a kind of 'round robin' approach but also taking the
> current load into account?

The central database would be shared by both Gozer's processes and
raindrop. Gozer may have additional tables or fields that he will add,
but that shouldn't have any effect on the use inside raindrop.

I was thinking that in the interim, to get things running, the backend
processes can set up the session/metadata single-threaded. This solves
a number of issues (e.g. with thread-safe metadata, each thread has to
initialize the metadata, which is a pain). Each backend script has to
take a parameter identifying which user it is being run for.
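
A rough sketch of what that parameter could look like, using argparse. The flag name --user is an assumption, not an existing raindrop option.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line for a backend script run on behalf of one user."""
    parser = argparse.ArgumentParser(description="Run a backend task for one user.")
    # Hypothetical flag: the user id, generally the openid identifier.
    parser.add_argument("--user", required=True, help="user id to run for")
    return parser.parse_args(argv)
```

The script would then use the parsed user id to look up that user's dburl in the central database before setting up its single session.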

What I'm not clear about is what queues (other than send) are being
initiated by the api; that will need to be dealt with somehow.

With AMQP, having the multiple queues per user is easy. It might be a
nice project to create an in-process queue system that uses the same api
as the python amqp library, allowing us (and anyone else in a similar
situation) to easily abstract the queue system.
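
As a starting point for that idea, here is a minimal sketch of a queue abstraction with an in-process backend. The interface (publish, consume) is invented for illustration and is not copied from any AMQP library; an AMQP-backed class would have to implement the same two methods.

```python
import queue
import threading
from abc import ABC, abstractmethod


class QueueBackend(ABC):
    """Hypothetical interface that both backends would implement."""

    @abstractmethod
    def publish(self, queue_name, message):
        """Append message to the named queue."""

    @abstractmethod
    def consume(self, queue_name):
        """Return the next message from the named queue, or None if empty."""


class InProcessBackend(QueueBackend):
    """Single-process fallback: named queues held in memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}

    def _get_queue(self, queue_name):
        with self._lock:
            if queue_name not in self._queues:
                self._queues[queue_name] = queue.Queue()
            return self._queues[queue_name]

    def publish(self, queue_name, message):
        self._get_queue(queue_name).put(message)

    def consume(self, queue_name):
        try:
            return self._get_queue(queue_name).get_nowait()
        except queue.Empty:
            return None
```

Per-user queues then fall out of the naming scheme, for example one queue name per user and task, and the calling code does not need to know which backend is in use.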