Understanding Pyramid Thread Generation


icook

Jul 8, 2013, 2:33:44 AM
to pylons-...@googlegroups.com
Hello all,

I've been working on a site I'm building with Pyramid for a while now, and I'm hitting a bit of an understanding stumbling block. I've read the docs that mention thread locals and how they interact with requests, but what I'm still fuzzy on is how and when does Pyramid spawn new threads? Is there a new thread for each request? Is there a single master thread that dispatches view callables to run in a new thread? This is how I imagine it working, but short of reading through the code thoroughly I'm at a bit of a loss. While it's probably not critical to know this to reach my end goal, I like to know how stuff ticks.

Oh, and a bit of info on my end goal for those interested. In a nutshell, I'm trying to set up some way for my views to send tasks that are too time-consuming (big DB updates, etc.) to a worker thread to be done asynchronously.

Also, if contributors read this I wanted to say thanks. Pyramid is a great framework.

Isaac

Chris McDonough

Jul 8, 2013, 2:39:33 AM
to pylons-...@googlegroups.com
On Sun, 2013-07-07 at 23:33 -0700, icook wrote:
> Hello all,
>
>
> I've been working on a site I'm building with Pyramid for a while now,
> and I'm hitting a bit of an understanding stumbling block. I've read
> the docs that mention thread locals and how they interact with
> requests, but what I'm still fuzzy on is how and when does Pyramid
> spawn new threads? Is there a new thread for each request? Is there a
> single master thread that dispatches view callables to run in a new
> thread? This is how I imagine it working, but short of reading through
> the code thoroughly I'm at a bit of a loss. While it's probably not
> critical to know this to reach my end goal, I like to know how stuff
> ticks.


Pyramid itself doesn't handle any of that. All of the threading is done
by the WSGI server, so it depends which WSGI server you're using. In
general, though, most WSGI web servers are either multiprocess or
multithreaded. The multithreaded ones generally either spawn a new
thread for each request or keep a thread pool around where threads are
reused to perform work.
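
A minimal sketch of the thread-pool model described above, using only the standard library. The `handle` function and the `wsgi-worker` pool are stand-ins for what a real WSGI server does internally; they are assumptions for illustration, not any particular server's implementation. The app itself never creates a thread, it only reports which one it was called on.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def app(environ, start_response):
    """A tiny WSGI app that reports which thread handled the request."""
    body = threading.current_thread().name.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def handle(path):
    """Hypothetical per-request server step: build environ, call the app."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    chunks = app(environ, lambda status, headers: None)
    return b"".join(chunks).decode("utf-8")


# The stand-in "server": a fixed pool of two threads reused across six requests.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="wsgi-worker") as pool:
    thread_names = list(pool.map(handle, ["/a", "/b", "/c", "/d", "/e", "/f"]))

# Six requests were served, but by at most two distinct threads.
print(sorted(set(thread_names)))
```

A thread-per-request server would differ only in the "server" part: it would start a fresh thread for each call to `handle` instead of reusing pooled ones.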
>
>
> Oh, and a bit of info on my end goal for those interested. In a
> nutshell I'm trying to setup some way for my views to send tasks that
> are too time consuming (big db updates, etc) to a worker thread to be
> done asynchronously.

I'd probably suggest an existing message queue system for this. There
are several; Celery seems to be popular.
>
>
> Also, if contributors read this I wanted to say thanks. Pyramid is a
> great framework.

Thank you!

- C

>
>
> Isaac
>
> --
> You received this message because you are subscribed to the Google
> Groups "pylons-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to pylons-discus...@googlegroups.com.
> To post to this group, send email to pylons-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/pylons-discuss.
> For more options, visit https://groups.google.com/groups/opt_out.


Jonathan Vanasco

Jul 8, 2013, 11:42:46 AM
to pylons-...@googlegroups.com
On a related note, there are also forking concerns when you use a WSGI container like uwsgi.

I never dug deep enough into Pyramid's or uwsgi's internals to learn more about forking, but you may need a snippet like this if you use pycrypto with Pyramid:

try:
    import uwsgi
    from Crypto.Random import atfork
    def post_fork_hook():
        atfork()
    uwsgi.post_fork_hook = post_fork_hook
except:
    pass

Mike Orr

Jul 8, 2013, 2:12:51 PM
to pylons-...@googlegroups.com
Is that bare "except:" intentional? It's usually best to catch as few
exceptions as possible, or at most to do "except Exception:" which
lets a few system-exiting exceptions and non-errors through.
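
A short sketch of the difference, using `sys.exit` as the system-exiting case. The function names are made up for illustration. A bare `except:` catches `BaseException`, so it swallows `SystemExit`; `except Exception:` lets it propagate.

```python
import sys


def swallow_everything():
    try:
        sys.exit(1)
    except:  # bare except: catches BaseException, including SystemExit
        return "swallowed"


def swallow_errors_only():
    try:
        sys.exit(1)
    except Exception:  # SystemExit is not a subclass of Exception
        return "swallowed"


bare_result = swallow_everything()

try:
    narrow_result = swallow_errors_only()
except SystemExit:
    # The exit request passed straight through the "except Exception" handler.
    narrow_result = "propagated"

print(bare_result, narrow_result)
```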

--
Mike Orr <slugg...@gmail.com>

Jonathan Vanasco

Jul 8, 2013, 2:56:15 PM
to pylons-...@googlegroups.com

Not sure. I literally copy/pasted that from the uwsgi site way back.

Looking at the code, it would probably be better to do something like:

if RUN_UNDER_UWSGI:
    import uwsgi
    from Crypto.Random import atfork
    def post_fork_hook():
        atfork()
    uwsgi.post_fork_hook = post_fork_hook

and then let errors surface. I'm guessing whoever first wrote it wrapped everything in the try/except to handle situations where you're not running under uwsgi (i.e., dev).
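
One possible way to get that flag, sketched under the assumption that the `uwsgi` module is only importable inside a uwsgi worker. `RUN_UNDER_UWSGI` is the hypothetical name from the snippet above, not something uwsgi or Pyramid defines. Only the import of `uwsgi` is guarded, so a missing or broken pycrypto install still raises loudly when running under uwsgi.

```python
try:
    import uwsgi  # assumed to be importable only inside a uwsgi worker
except ImportError:
    uwsgi = None

# Hypothetical flag: true only when the uwsgi module could be imported.
RUN_UNDER_UWSGI = uwsgi is not None

if RUN_UNDER_UWSGI:
    # Outside the try block on purpose: errors here should not be hidden.
    from Crypto.Random import atfork

    def post_fork_hook():
        atfork()

    uwsgi.post_fork_hook = post_fork_hook
```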

icook

Jul 8, 2013, 3:00:03 PM
to pylons-...@googlegroups.com
On Sunday, July 7, 2013 11:39:33 PM UTC-7, Chris McDonough wrote:
> On Sun, 2013-07-07 at 23:33 -0700, icook wrote:
> > Hello all,
> >
> > I've been working on a site I'm building with Pyramid for a while now,
> > and I'm hitting a bit of an understanding stumbling block. I've read
> > the docs that mention thread locals and how they interact with
> > requests, but what I'm still fuzzy on is how and when does Pyramid
> > spawn new threads? Is there a new thread for each request? Is there a
> > single master thread that dispatches view callables to run in a new
> > thread? This is how I imagine it working, but short of reading through
> > the code thoroughly I'm at a bit of a loss. While it's probably not
> > critical to know this to reach my end goal, I like to know how stuff
> > ticks.
>
> Pyramid itself doesn't handle any of that.  All of the threading is done
> by the WSGI server, so it depends which WSGI server you're using.  In
> general, though, most WSGI web servers are either multiprocess or
> multithreaded.  The multithreaded ones generally either spawn a new
> thread for each request or keep a thread pool around where threads are
> reused to perform work.
 
Ah, that makes a lot of sense, thanks. Google is a great tool, but it doesn't prevent you from looking in the wrong place for what you think you need. For those searching this is very helpful:
 
> > Oh, and a bit of info on my end goal for those interested. In a
> > nutshell I'm trying to setup some way for my views to send tasks that
> > are too time consuming (big db updates, etc) to a worker thread to be
> > done asynchronously.
>
> I'd probably suggest an existing message queue system for this.  There
> are several; Celery seems to be popular.

So, after a bit more reading I've developed these rough conclusions:

1. pserve, being based on BaseHTTPServer, is a single process, single thread server by default.
2. In a system like Apache's worker MPM, memory is by default shared among threads. Is this why Pyramid uses thread locals, to prevent changes to the global registry from within a request?
3. Something like a Celery app instance will be shared by all request threads, and if Apache is running multiple processes then each will have their own instance of Celery, or whatever global var.

Are these correct, or close to correct?

And lastly another question. If I plop my own interface into the registry at initialization time (with the configurator), all request threads will be accessing a single instance of this interface? However their actual instance of the registry is a thread local duplicate?

Thanks for all the replies.

Chris McDonough

Jul 8, 2013, 3:49:40 PM
to pylons-...@googlegroups.com
"pserve" is not a server. It's a command to launch a server. By
default, if you've used the Pyramid scaffolding, it launches a server
named Waitress. Waitress is a multithreaded server.

> 2. In a system like Apache's worker MPM, memory is by default shared
> among threads. Is this why Pyramid uses thread locals, to prevent
> changes to the global registry from within a request?

Using thread locals doesn't prevent the mutation of shared state.
Pyramid creates thread locals for developer convenience, so a user can
obtain the request or registry in parts of code where he doesn't have
access to the request. It also allows the user to get "the right"
request object (based on the current thread).
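
A rough standard-library sketch of that convenience. The helper names (`get_current_request`, `render_footer`, `handle`) are illustrative stand-ins rather than Pyramid's actual implementation. A helper deep in the call stack, which was never handed the request, still finds the one belonging to its own thread.

```python
import threading

_local = threading.local()


def get_current_request():
    """Return the request bound to the calling thread, or None."""
    return getattr(_local, "request", None)


def render_footer():
    # A deep helper with no request argument; it looks the request up instead.
    return "footer for " + get_current_request()


def handle(request, out):
    """Stand-in for per-request handling: bind, do the work, unbind."""
    _local.request = request
    try:
        out[request] = render_footer()
    finally:
        del _local.request


out = {}
threads = [
    threading.Thread(target=handle, args=(name, out))
    for name in ("req-a", "req-b", "req-c")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(out)
```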

> 3. Something like a Celery app instance will be shared by all request
> threads, and if Apache is running multiple processes then each will
> have their own instance of Celery, or whatever global var.

More like Celery will be running in a separate process, and the Pyramid
processes will communicate with it through some persistence system like
redis or whatever.

> Are these correct, or close to correct?
>
Negative ;-)
>
> And lastly another question. If I plop my own interface into the
> registry at initialization time (with the configurator), all request
> threads will be accessing a single instance of this interface? However
> their actual instance of the registry is a thread local duplicate?

Making something thread local does not copy it. It just makes it
accessible in places that don't have access to a reference.
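
To illustrate with plain `threading.local` (a sketch only; the `registry` dict here is a stand-in, not Pyramid's registry class): storing an object in a thread local stores a reference, so every thread sees the same object and a write from one thread is visible to all.

```python
import threading

registry = {"settings": {"debug": False}}  # one shared object
_local = threading.local()
same_object = []


def worker(name):
    _local.registry = registry  # binds a reference, no copy is made
    same_object.append(_local.registry is registry)
    # Mutating through the thread local mutates the shared object.
    _local.registry["last_writer"] = name


names = ["t1", "t2", "t3"]
threads = [threading.Thread(target=worker, args=(n,)) for n in names]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(same_object), "last_writer" in registry)
```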

- C

Michael Merickel

Jul 8, 2013, 3:49:47 PM
to Pylons
On Mon, Jul 8, 2013 at 2:00 PM, icook <is...@simpload.com> wrote:
> So, after a bit more reading I've developed these rough conclusions:
>
> 1. pserve, being based on BaseHTTPServer, is a single process, single thread server by default.

No, pserve is a "server runner": a CLI that can run any conforming WSGI server, depending on which [server] section you have defined in your ini file. If you "use = egg:waitress" then pserve will run waitress, a thread-based server. If you "use = egg:pyramid#wsgiref" then you're using wsgiref.simple_server, which is a single-threaded server. Other WSGI servers such as gunicorn (and maybe uwsgi?), cherrypy, etc. also support pserve's ini syntax.
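
For illustration, here is what such an ini file can look like, read with the standard library's `configparser`. This is only a sketch to show where the server choice lives; it is not how pserve itself loads the file, and `egg:myproject` and the option values are assumed example values.

```python
import configparser

# Example ini contents; "egg:myproject" is a placeholder application name.
INI_TEXT = """
[app:main]
use = egg:myproject

[server:main]
use = egg:waitress#main
host = 0.0.0.0
port = 6543
threads = 4
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# The "use" line names the WSGI server to run; the rest are its options.
server = dict(parser["server:main"])
print(server["use"], server["threads"])
```

Swapping the `use` line for a different server changes the threading model without touching the application code.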
 
> 2. In a system like Apache's worker MPM, memory is by default shared among threads. Is this why Pyramid uses thread locals, to prevent changes to the global registry from within a request?

Pyramid does not prevent changes to the global registry or make any assumptions about anything global except that request-handling will be done in some request-local context, and most WSGI apps tend to define this as a thread. Gevent will monkey patch this to change it to a greenlet-local instance instead.
 
> 3. Something like a Celery app instance will be shared by all request threads, and if Apache is running multiple processes then each will have their own instance of Celery, or whatever global var.

Celery is a separate process for handling the asynchronous tasks and should have nothing to do with apache.
 
> Are these correct, or close to correct?
>
> And lastly another question. If I plop my own interface into the registry at initialization time (with the configurator), all request threads will be accessing a single instance of this interface? However their actual instance of the registry is a thread local duplicate?

No, there is no duplicate of the registry, it is a shared instance and should be treated as read-only.

icook

Jul 8, 2013, 4:14:15 PM
to pylons-...@googlegroups.com
Thanks guys, this helps a lot, glad I'm getting my misunderstandings corrected.

For my last question about Celery it appears that I wasn't terribly clear. It's my understanding at this point that Celery can basically be split into two realms: one adding tasks and one removing them. Certainly the side that takes tasks and processes them will run in a separate worker process. Say for a second that Celery is using RabbitMQ. When I ask Celery to add a task it must have a connection of some kind to RabbitMQ, right? I'm assuming that this connection is managed by some object in Celery, but is this object (and consequently the connection) going to be created for every request, or will it persist as part of the process?

Whit Morriss

Jul 8, 2013, 4:45:46 PM
to <pylons-discuss@googlegroups.com>


On Jul 8, 2013, at 3:14 PM, icook <is...@simpload.com>
wrote:

> Thanks guys, this helps a lot, glad I'm getting my misunderstandings corrected.
>
> For my last question about Celery it appears that I wasn't terribly clear. It's my understanding at this point that Celery can basically be split into two realms: one adding tasks and one removing them. Certainly the side that takes tasks and processes them will run in a separate worker process. Say for a second that Celery is using RabbitMQ. When I ask Celery to add a task it must have a connection of some kind to RabbitMQ, right? I'm assuming that this connection is managed by some object in Celery, but is this object (and consequently the connection) going to be created for every request, or will it persist as part of the process?
>

I can't speak to exactly what Celery does by default, but you should be able to manage this for your use case (say, by adding the client object to the registry at startup, provided the client object is thread safe).
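
A sketch of that idea with stand-ins only: `TaskClient` and `Registry` are hypothetical classes, not Celery's or Pyramid's. In a real Pyramid app the equivalent would presumably be setting an attribute on `config.registry` at startup and reading it from `request.registry` in views. The client is created once per process and shared by all request threads, with a lock making `send` thread safe.

```python
import threading


class TaskClient:
    """Hypothetical queue client; counts how many times it is constructed."""

    opened = 0

    def __init__(self):
        TaskClient.opened += 1
        self._lock = threading.Lock()
        self.sent = []

    def send(self, task):
        # The lock stands in for whatever makes a real client thread safe.
        with self._lock:
            self.sent.append(task)


class Registry:
    """Stand-in for an application registry: a plain attribute holder."""


registry = Registry()
registry.task_client = TaskClient()  # created once, at startup


def view(request_id):
    # Every request thread reuses the same client object.
    registry.task_client.send(("resize", request_id))


threads = [threading.Thread(target=view, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(TaskClient.opened, len(registry.task_client.sent))
```

With a multiprocess server, each worker process would run the startup code and hold its own client.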

-w


d. "whit" morriss
Platform Codemonkey
wh...@surveymonkey.com


Jonathan Vanasco

Jul 10, 2013, 12:53:25 PM
to pylons-...@googlegroups.com

Celery is typically integrated with a few moving parts.  If you're using RabbitMQ, it probably works like this:

1. The Celery daemon is a "worker" and a long-running process.  It pulls things out of a queue and does work.  Sometimes it will save a result.  Sometimes it will create new work.  By default it uses a connection pool to the queue.
2. Your Python app typically adds tasks to Celery by sending messages through a queue.

I've never used Celery under Pyramid, so I'm not sure how one would typically set it up.  There might be a connection pool that re-uses a handful of connections, or there might be a new connection on every request, or that new connection on every request might be a 'lazy' object that doesn't really do anything unless you trigger it.  That would all depend on how you integrate Celery.

In any event, your Python app isn't actually connected to the Celery daemon -- your Python app sends the daemon a request through RabbitMQ, and each side manages its own connections to RabbitMQ.  So it looks like this:

    Pyramid <<>> RabbitMQ <<>> Celery

And as far as Celery is concerned, it doesn't know anything about "requests" in Pyramid.  
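
The shape of that arrangement can be sketched in-process with the standard library, assuming a `queue.Queue` as a stand-in for RabbitMQ and a thread as a stand-in for the Celery worker (in a real deployment both would be separate processes). The `view` function only enqueues a message and returns; it never calls the worker directly.

```python
import queue
import threading

broker = queue.Queue()  # stand-in for RabbitMQ
results = []


def worker():
    """Stand-in for the Celery daemon: pull messages and do the work."""
    while True:
        task = broker.get()
        if task is None:  # sentinel used here to stop the sketch
            break
        name, payload = task
        results.append((name, payload * 2))


def view(payload):
    """Stand-in for a Pyramid view: enqueue the task and return at once."""
    broker.put(("double", payload))
    return "accepted"


consumer = threading.Thread(target=worker)
consumer.start()

responses = [view(n) for n in (1, 2, 3)]

broker.put(None)
consumer.join()

print(responses, results)
```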

icook

Jul 10, 2013, 8:25:51 PM
to pylons-...@googlegroups.com
Thanks for these replies, I think I've got it straightened out. I realized my confusion had more to do with how WSGI process management affected memory sharing across requests than it did with Celery.