MongoDB setup

Chris

Sep 11, 2009, 9:47:16 PM
to pylons-discuss
I saw Ben's blog app 'minger' has been built with MongoDB. Thanks for
sharing that. A few questions about mongodb usage with pylons:
- is setting up the mongo connection in lib/app_globals.py the
recommended way to go? (like minger)
- it looks like pymongo has a built-in connection pool (defaults to
1), should that be higher for production sites? Should it match the
number of threads that Paste (or others) will spawn?

Thanks!

Ben Bangert

Sep 12, 2009, 2:37:28 AM
to pylons-...@googlegroups.com
On Sep 11, 2009, at 6:47 PM, Chris wrote:

> I saw Ben's blog app 'minger' has been built with MongoDB. Thanks for
> sharing that. A few questions about mongodb usage with pylons:
> - is setting up the mongo connection in lib/app_globals.py the
> recommended way to go? (like minger)

For connection pools and such, I'd consider app_globals the best place
to hang it. Since app_globals is global to the application, the object
there should be thread-safe of course.

> - it looks like pymongo has a built-in connection pool (defaults to
> 1), should that be higher for production sites? Should it match the
> number of threads that Paste (or others) will spawn?

Ah yes, I suppose I could up it. I actually hadn't realized at the
time that it was using its own connection pool. Ideally it should
probably match however many connections your MongoDB server can handle,
and of course, less than or equal to the number of threads that are
being used for the application.
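
A minimal sketch of that sizing rule, hung on an app_globals-style object. The `connect` factory and the `server_connection_limit` / `worker_threads` parameters are hypothetical stand-ins, not pymongo or Pylons API; substitute whatever your driver version actually accepts for its pool size.

```python
class Globals(object):
    """Created once at application start-up, like lib/app_globals.py."""

    def __init__(self, connect, server_connection_limit, worker_threads):
        # Cap the pool at whichever is smaller: what the server is expected
        # to handle, or the number of threads that could use a connection.
        self.pool_size = max(1, min(server_connection_limit, worker_threads))
        # `connect` is a hypothetical factory standing in for the driver call.
        self.mongo_conn = connect(pool_size=self.pool_size)


def fake_connect(pool_size):
    # Stand-in used only so the sketch runs without a MongoDB server.
    return {"pool_size": pool_size}


app_globals = Globals(fake_connect, server_connection_limit=100, worker_threads=10)
print(app_globals.pool_size)
```

Because the object is built once at start-up and only read afterwards, the request threads all share the same `mongo_conn` reference.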

Cheers,
Ben

Chris

Sep 14, 2009, 11:18:34 PM
to pylons-discuss
Ok. Thanks Ben. I have a follow up question about the thread safety
setup.

    def __call__(self, environ, start_response):
        """base.py controller. Get a site-context-aware MongoDB object
        based on the subdomain."""
        mongo_conn = environ['pylons.pylons'].app_globals.mongo_conn
        site = get_site_context(environ)
        # Mongo is a subclass of pymongo.database, with site context awareness
        self.db = Mongo(mongo_conn, 'myproject', site)
        return WSGIController.__call__(self, environ, start_response)

Here the Mongo(mongo_conn, 'myproject', site) is just a simple
subclass of the pymongo.database, but is aware of the requesting site/
subdomain. This way, if I say self.db.events.save({'name': 'Concert'}),
self.db knows how to save that document to the requesting site's
collection. (my mongo collections are segregated by site/subdomain).
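
A rough sketch of the site-aware idea, for readers who want something concrete. It wraps a database object rather than subclassing it, and the `SiteDatabase` and `FakeDatabase` names and the "site.collection" naming scheme are assumptions for illustration, not the actual Mongo class described above.

```python
class FakeDatabase(dict):
    """Stand-in for a database object that supports db[collection_name]."""

    def __missing__(self, name):
        # Create an empty "collection" (a plain list) on first access.
        self[name] = []
        return self[name]


class SiteDatabase(object):
    """Hypothetical wrapper that prefixes collection names with the site."""

    def __init__(self, database, site):
        self._database = database
        self._site = site

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. collections.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._database["%s.%s" % (self._site, name)]


shared = FakeDatabase()
alpha_db = SiteDatabase(shared, "alpha")
beta_db = SiteDatabase(shared, "beta")
alpha_db.events.append({"name": "Concert"})
print(sorted(shared))
```

Each wrapper carries its own site name, so two requests for different subdomains can share one underlying database object without writing into each other's collections.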

From my questionable testing, it seems to be thread safe, that is,
docs are getting saved to proper collections. From googling around
though, it seems I may need to use something like threading.local()?
Can anybody point me in the right direction on this? Sorry.. not very
well versed in threading.

Also, if anybody has a better approach or tips on this, I'm all
ears :)

Thanks,
Chris

Hans Lellelid

Sep 15, 2009, 1:44:26 PM
to pylons-...@googlegroups.com
Hi Chris,


>     def __call__(self, environ, start_response):
>         """base.py controller. Get a site-context-aware MongoDB object
>         based on the subdomain."""
>         mongo_conn = environ['pylons.pylons'].app_globals.mongo_conn
>         site = get_site_context(environ)
>         # Mongo is a subclass of pymongo.database, with site context awareness
>         self.db = Mongo(mongo_conn, 'myproject', site)
>         return WSGIController.__call__(self, environ, start_response)
>
> Here the Mongo(mongo_conn, 'myproject', site) is just a simple
> subclass of the pymongo.database, but is aware of the requesting site/
> subdomain. This way, if I say self.db.events.save({'name': 'Concert'}),
> self.db knows how to save that document to the requesting site's
> collection. (my mongo collections are segregated by site/subdomain).
>
> From my questionable testing, it seems to be thread safe, that is,
> docs are getting saved to proper collections. From googling around
> though, it seems I may need to use something like threading.local()?
> Can anybody point me in the right direction on this? Sorry.. not very
> well versed in threading.

I am by no means an expert in thread-safety, but I'll take a stab at
answering here. I'm also not a Pylons expert, so hopefully if I say
something wrong here someone will correct me :)

Basically the thing you need to be concerned about with thread safety is
*shared data* -- things like module-level globals are dangerous, for
example, since it's very possible they could be referenced by different
threads concurrently. *local* variables within functions -- e.g. inside
your __call__() method -- do not need to be thread safe, because only one
thread will ever be in that particular execution context at a time. (Sure
multiple threads could be invoking __call__() concurrently, but they'll
each have their own local vars.)
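
To make the point about local variables concrete, here is a small sketch (the `handle_request` function and the site names are made up for illustration). Several threads run the same function at once, and each call keeps its own `collection` value.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(site):
    # `collection` is a local variable: every call, on every thread, gets
    # its own copy, so concurrent calls cannot overwrite each other's value.
    collection = site + ".events"
    return site, collection


sites = ["alpha", "beta", "gamma", "delta"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, sites))

print(results)
```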

So, in looking at that code, I would mention a few things. First a couple
assumptions:

1) that the mongo_conn object itself is thread-safe (does not have internal
state that could get overridden by concurrent calls) and
2) that app_globals.mongo_conn was set in a thread-safe way. I assume that
this is happening in some module-level initialization code, which I
*believe* is thread-safe in python (not positive on that point).

Also, you are setting something into the controller's instance:

self.db = Mongo(mongo_conn, 'myproject', site)

Is that sub-class also thread safe? It seems likely that you could have
two different threads in __call__() that are each setting different
yourapp.Mongo instances to the same controller instance var (self.db).
Here's where my knowledge of Pylons is a bit thin: does pylons guarantee a
new instance of your controller object for handling every request (and
hence, per thread)? It may very well, in which case you can ignore my
concerns :) If it doesn't, though, you should probably consider using
thread-local storage for your Mongo instance. You could probably use the
SQLAlchemy setup in Pylons as a model for that.

Hans

Chris

Sep 15, 2009, 7:41:47 PM
to pylons-discuss
Thanks Hans. Good tips. I'm slowly learning.. :)

> 1) that the mongo_conn object itself is thread-safe (does not have internal
> state that could get overridden by concurrent calls) and

I'll have to look at the pymongo Connection class and see if it is
thread safe in order to verify that mongo_conn is thread-safe. I may
post over on the mongodb user group to verify.


> 2) that app_globals.mongo_conn was set in a thread-safe way. I assume that
> this is happening in some module-level initialization code, which I
> *believe* is thread-safe in python (not positive on that point).

Yes, mongo_conn is initialized in the lib/app_globals.py Globals
__init__, which should be called just on the application load/launch.
Ben mentioned above that this area is thread-safe.

> does pylons guarantee a
> new instance of your controller object for handling every request (and
> hence, per thread)?

I think that is a key question ... so I searched around a bit more - I
found this (see Mike Orr's post)
http://groups.google.com/group/paste-users/browse_thread/thread/bcf2ed96f6581f52
"... Pylons assumes its native controllers are not thread
safe, and instantiates one for each request. ..."

So, from that I assume Pylons instantiates a new controller instance
per request.
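
A sketch of why per-request instantiation matters here. The `dispatch` function below is a hypothetical stand-in, not Pylons' actual dispatch code; it only shows that when every request gets a fresh controller, assigning to `self.db` cannot collide across threads.

```python
from concurrent.futures import ThreadPoolExecutor


class BaseController(object):
    def __call__(self, environ):
        # Safe only because this instance serves exactly one request.
        self.db = "db-for-" + environ["site"]
        return self.respond()

    def respond(self):
        return self.db


def dispatch(controller_class, environ):
    # Hypothetical dispatcher: build a fresh controller for each request.
    controller = controller_class()
    return controller(environ)


requests = [{"site": "alpha"}, {"site": "beta"}, {"site": "gamma"}]
with ThreadPoolExecutor(max_workers=3) as pool:
    responses = list(pool.map(lambda env: dispatch(BaseController, env), requests))

print(responses)
```

If a single controller instance were reused across threads instead, `self.db` would be shared data and one request could see another request's value.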

What I've gathered thus far: each controller will get its own instance
of Mongo assigned to self.db. However, all self.db Mongo instances
share the same reference to the connection object mongo_conn. So, if
something is not thread safe with the connection then I may see thread
related problems (with saving/updating etc). Does that sound right?

I guess my next step is to determine if pymongo's connection is
thread safe or if I need to use thread local.

(I've been looking at sqlalchemy/pylons code to figure out how they do
threading and whatnot with the model, but for the uninitiated it can
be a little confusing if you don't know what you're looking for.)

Thanks,
Chris

Hans Lellelid

Sep 15, 2009, 8:23:30 PM
to pylons-...@googlegroups.com
Hi Chris,

> Thanks Hans. Good tips. I'm slowly learning.. :)

No prob -- me too! :) On the topic of concurrency, I strongly recommend
Brian Goetz' book: Java Concurrency in Practice. While it's certainly
about Java, a lot of the principles apply pretty directly to Python. Of
course, if you never plan to write a line of Java, you may be able to find
the key points in free online sources. Anyway, I had to implement a java
network server recently, and that was an invaluable guide.



>> 1) that the mongo_conn object itself is thread-safe (does not have internal
>> state that could get overridden by concurrent calls) and
>
> I'll have to look at the pymongo Connection class and see if it is
> thread safe in order to verify that mongo_conn is thread-safe. I may
> post over on the mongodb user group to verify.

Yeah. I would /imagine/ that it's threadsafe; that seems to be the trend
for database API libraries in python, but certainly worth checking with
them.

>> 2) that app_globals.mongo_conn was set in a thread-safe way. I assume that
>> this is happening in some module-level initialization code, which I
>> *believe* is thread-safe in python (not positive on that point).
>
> Yes, mongo_conn is initialized in the lib/app_globals.py Globals
> __init__, which should be called just on the application load/launch.
> Ben mentioned above that this area is thread-safe.

Ok -- I guess I missed that comment. I suspected, but that's good to know
for sure.

>> does pylons guarantee a
>> new instance of your controller object for handling every request (and
>> hence, per thread)?
>
> I think that is a key question ... so I searched around a bit more - I
> found this (see Mike Orr's post)
> http://groups.google.com/group/paste-users/browse_thread/thread/bcf2ed96f6581f52
> "... Pylons assumes its native controllers are not thread
> safe, and instantiates one for each request. ..."
>
> So, from that I assume Pylons instantiates a new controller instance
> per request.

Good research; that's great to know for the future. So, yes, you should be
all set with your Mongo class given that the controller gets instantiated
for every request.

> What I've gathered thus far: each controller will get its own instance
> of Mongo assigned to self.db. However, all self.db Mongo instances
> share the same reference to the connection object mongo_conn. So, if
> something is not thread safe with the connection then I may see thread
> related problems (with saving/updating etc). Does that sound right?

Yup - you got it.

> I guess my next step is to determine if pymongo's connection is
> thread safe or if I need to use thread local.
>
> (I've been looking at sqlalchemy/pylons code to figure out how they do
> threading and whatnot with the model, but for the uninitiated it can
> be a little confusing if you don't know what you're looking for.)

Yeah, I concur. Frankly, the Paste internals are really hard to follow,
and SQLAlchemy is not much simpler. Paste adds additional complexity by
using the StackedObjectProxy, which is more than simply a thread local.

The basic idea with thread local (threading.local class in Python) is that
the instance will be created per thread. So, if in your module you put:

    from threading import local as ThreadLocal

    container = ThreadLocal()

    # And then in functions/classes in your code you add stuff to your
    # container, like:

    def function():
        container.db = Mongo()
        # etc.

Those will all be stored in a thread-local context; i.e. you won't have to
worry about concurrent access to anything you put in your threading.local
instance.
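
A small runnable sketch of that isolation, using strings in place of real Mongo objects (the `worker` function and the `seen` dict are only there for the demonstration). Each thread sets `container.db`, and the main thread, which never set it, does not see the workers' values.

```python
import threading

container = threading.local()
seen = {}


def worker(name):
    # Each thread stores its own value under the same attribute name.
    container.db = "db-for-" + name
    # Each thread writes a distinct key, only so the result can be inspected.
    seen[name] = container.db


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned container.db, so the attribute is absent here.
main_has_db = hasattr(container, "db")
print(seen, main_has_db)
```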

This is certainly getting a bit off-Pylons topic, but another approach
would be to use a lock with global data (e.g. data in app_globals). For
example an RLock (reentrant lock, allows same thread to enter the locked
area but blocks other threads) to manage access to module globals:

    # network.py:
    import threading

    servers = []
    servers_lock = threading.RLock()

    # othermodule.py:
    # And then when you want to select or change network.servers (from some
    # other module), you do this:
    import network

    with network.servers_lock:
        network.servers = enumerate_my_servers()

    # Or:
    with network.servers_lock:
        if serverobj in network.servers:
            do_something()

One thing to note with concurrency is that you have to lock on both read
and write (otherwise you could be reading dirty values). Sometimes you
really do need values that are global across threads; this is for those
cases.
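
The same idea as a self-contained sketch, with the lock and the data kept together in one object so every read and write has to go through it. The `ServerRegistry` class and its method names are hypothetical, chosen just for this example.

```python
import threading


class ServerRegistry(object):
    def __init__(self):
        self._lock = threading.RLock()
        self._servers = []

    def contains(self, server):
        with self._lock:  # reads take the lock too
            return server in self._servers

    def snapshot(self):
        with self._lock:
            return list(self._servers)

    def add_if_missing(self, server):
        with self._lock:  # the check and the write happen as one step
            # Re-entrant: contains() acquires the same lock on this thread.
            if not self.contains(server):
                self._servers.append(server)
                return True
            return False


registry = ServerRegistry()
workers = [threading.Thread(target=registry.add_if_missing, args=("db1",))
           for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(registry.snapshot())
```

With a plain Lock instead of an RLock, `add_if_missing` would deadlock when it calls `contains`, which is the reason for choosing the reentrant variant here.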

Hope that was helpful, not confusing. I do think concurrency needs more
discussion in python. It seems like a lot of people ignore the issue.
Frameworks like Pylons certainly help keep it out of sight, but it is
certainly there behind the scenes and I think deserves some recognition.

Good luck!

Hans

Mike Orr

Sep 16, 2009, 1:31:04 AM
to pylons-...@googlegroups.com
On Tue, Sep 15, 2009 at 5:23 PM, Hans Lellelid <ha...@velum.net> wrote:
>>> does pylons guarantee a
>>> new instance of your controller object for handling every request (and
>>> hence, per thread)?
>>
>> I think that is a key question ... so I searched around a bit more - I
>> found this (see Mike Orr's post)
>> http://groups.google.com/group/paste-users/browse_thread/thread/bcf2ed96f6581f52
>> "... Pylons assumes its native controllers are not thread
>> safe, and instantiates one for each request. ..."
>>
>> So, from that I assume Pylons instantiates a new controller instance
>> per request.

Yes.


>> I guess my next step is to determine if pymongo's connection is
>> thread safe or if I need to use thread local.
>>
>> (I've been looking at sqlalchemy/pylons code to figure out how they do
>> threading and whatnot with the model, but for the uninitiated it can
>> be a little confusing if you don't know what you're looking for.)
>
> Yeah, I concur.  Frankly, the Paste internals are really hard to follow,
> and SQLAlchemy is not much simpler.  Paste adds additional complexity by
> using the StackedObjectProxy, which is more than simply a thread local.

The PylonsExecutionAnalysis goes through it step by step.
http://wiki.pylonshq.com/display/pylonscookbook/Pylons+Execution+Analysis+0.9.6

There are other dimensions besides threads. For instance, two Pylons
applications, or two instances of the same application, mounted under
different URL prefixes in the same process. StackedObjectProxy
handles both the thread dimension and the application dimension.

I don't know MongoDB, but most database connections are not thread
safe, and SQLAlchemy sessions are not either. The ``meta.Session``
object in the default Pylons/SQLAlchemy configuration is a scoped
session, meaning it's automatically thread-local. We're not sure if
it's safe with multiple application instances in the same process, but
those are rare and nobody has complained. You could put it on
``pylons.app_globals`` to be extra safe, but that makes the model
dependent on the rest of the application.

You can put a connection in ``self`` or ``pylons.c`` and it will be
local to the request. You can do this in the base controller's
.__before__ or .__call__ . This would create a connection for every
request.

If that's too much overhead, you can put a threadlocal object on
``pylons.app_globals``, but you would have to create the threadlocal
yourself. There's a threadlocal constructor somewhere in the Python
stdlib. And you would have to create the connection if it doesn't
exist (i.e., if this is the first request for the thread).
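
A minimal sketch of that last option, assuming the thread-local constructor meant here is `threading.local` from the standard library. The `Globals` class, the `get_connection` method and the `connect` factory are hypothetical names; `object` stands in for a real connection so the example runs on its own.

```python
import threading


class Globals(object):
    def __init__(self, connect):
        self._connect = connect          # hypothetical connection factory
        self._local = threading.local()  # one attribute namespace per thread

    def get_connection(self):
        # Create the connection on this thread's first request, then reuse it.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn


app_globals = Globals(connect=object)  # object() stands in for a connection

first = app_globals.get_connection()
second = app_globals.get_connection()

other = []
t = threading.Thread(target=lambda: other.append(app_globals.get_connection()))
t.start()
t.join()

print(first is second, other[0] is first)
```

The first two calls run on the same thread and get the same object back; the call made from the second thread creates and keeps its own.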

--
Mike Orr <slugg...@gmail.com>

Chris

Sep 17, 2009, 12:03:06 PM
to pylons-discuss
Very good info on the execution details, thanks for putting that
together, super helpful.

I posted over on the mongodb-user group and the developer of pymongo
stated that it should be ok to share the global connection across
threads. I suppose that means it is thread-safe, although it was not
explicitly stated as such.
http://groups.google.com/group/mongodb-user/browse_thread/thread/9e98b2c41845c9da

Thanks for yall's help.



Mike Orr

Sep 17, 2009, 5:32:00 PM
to pylons-...@googlegroups.com
On Thu, Sep 17, 2009 at 9:03 AM, Chris <fractal...@gmail.com> wrote:
>
> I posted over on the mongodb-user group and the developer of pymongo
> stated that it should be ok to share the global connection across
> threads.  I suppose that means it is thread-safe, although it was not
> explicitly stated as such.
> http://groups.google.com/group/mongodb-user/browse_thread/thread/9e98b2c41845c9da

It may be that key-value databases / document-oriented databases don't
have the threading problems relational databases do. From my little
understanding of CouchDB, all writes modify entire records atomically,
and all reads generate a snapshot of data which is then off-line as
far as the database is concerned.

Whereas in a relational database, writes may modify individual fields
in a record, and reads may keep a stateful pointer into the database
(if you use that USE_RESULT option, which few people do). Plus a
transaction may be held open for an arbitrarily long time, and you
have to join tables, and SQLAlchemy has to synchronize database
records to Python instances.

On the other hand, Durus is not thread safe without a daemon, and Zope
needs an extra layer in multithreaded environments.

--
Mike Orr <slugg...@gmail.com>
