Reminder: Threading is hard

Ben Darnell

Jan 12, 2011, 3:53:13 PM
to Tornado Mailing List
There have been a lot of questions about threading on the mailing list lately so I just wanted to take this opportunity to remind everyone that multithreaded programming is notoriously difficult and should generally be a last resort.  This is doubly true in python, where the GIL limits the benefits of multithreading (i.e. it doesn't help you take advantage of multiple cores).

If you use multiple threads, it is your responsibility to understand the thread-safety semantics of any libraries you use (this can be tricky as the relative rarity of threads in the python community means that thread-safety semantics are often undocumented).  In the absence of specific evidence of thread-safety, assume that it's not safe to access shared objects directly or indirectly from multiple threads.  In tornado, this often means the IOLoop.  Nothing that refers to the IOLoop even indirectly can be used from other threads, except to use IOLoop.add_callback to transfer control back to the main thread.
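
A minimal sketch of that hand-off rule, using only the standard library. It assumes Tornado 5.0 or later on Python 3, where the IOLoop runs on top of asyncio; asyncio's `loop.call_soon_threadsafe` is used here as a stand-in for `IOLoop.add_callback`. `blocking_lookup` and `handle_result` are hypothetical names for illustration.

```python
import asyncio
import threading
import time


def blocking_lookup(key):
    # Hypothetical stand-in for a slow, synchronous library call.
    time.sleep(0.05)
    return key.upper()


async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()
    results = []

    def handle_result(value):
        # Runs on the event-loop thread, so touching loop-owned state is safe.
        results.append(value)
        done.set()

    def worker():
        # Runs on a separate thread: it never touches the loop's objects directly,
        # it only schedules a callback through the thread-safe entry point.
        value = blocking_lookup("hello")
        loop.call_soon_threadsafe(handle_result, value)

    threading.Thread(target=worker, daemon=True).start()
    await done.wait()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['HELLO']
```

The worker thread only ever calls one method on the loop, and that method is the one documented as safe to call from other threads; everything else happens back on the loop's own thread.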

In general, you should think about IO strategies for tornado apps in this order:

1) Do it synchronously and block the IOLoop.  This is most appropriate for things like memcache and database queries that are under your control and should always be fast.  If it's not fast, make it fast by adding the appropriate indexes to the database, etc.  

2) Use an async library if available (e.g. AsyncHTTPClient).  This comes below the synchronous option even though it would be kind of silly to use a synchronous HTTPClient in tornado, because the best outcome is for everything necessary to render a page to be so fast you don't mind blocking the IOLoop for it.  

3) Move the work out of the tornado process.  If you're sending email, for example, just write it to the database and let another process (whose latency doesn't matter) read from the queue and do the actual sending.

4) Do the work in a separate thread.  Keep your threads' work units small - do a single synchronous operation and then hand the result back to the main thread (remember that you're not getting any CPU parallelism by doing more work on the other threads).  
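
Option 4 can be sketched with a small thread pool: each submitted job is a single synchronous call, and the awaiting coroutine resumes on the event-loop thread with the result. This again uses plain asyncio (`loop.run_in_executor`) as a stand-in; in Tornado 5.0+ the corresponding method is `IOLoop.run_in_executor`. `resize_image` and `render_page` are hypothetical placeholders.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A small, bounded pool: in CPython these threads buy concurrency for blocking IO,
# not CPU parallelism, because the GIL still serializes Python bytecode.
executor = ThreadPoolExecutor(max_workers=4)


def resize_image(name):
    # Hypothetical single blocking operation (e.g. a synchronous library call).
    return f"{name}-thumb"


async def render_page(names):
    loop = asyncio.get_running_loop()
    # One small work unit per job: only the blocking call runs on a pool thread...
    thumbs = await asyncio.gather(
        *(loop.run_in_executor(executor, resize_image, n) for n in names)
    )
    # ...and everything else (templating, loop access) continues on the loop thread.
    return list(thumbs)


if __name__ == "__main__":
    print(asyncio.run(render_page(["a.png", "b.png"])))  # ['a.png-thumb', 'b.png-thumb']
```

Keeping the pooled function to one synchronous operation makes its thread-safety easy to reason about: it receives its inputs as arguments and returns a value, and never reaches into shared objects.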

-Ben

Phil Plante

Jan 12, 2011, 8:04:19 PM
to python-...@googlegroups.com
Thank you for taking the time to post this.  There seems to be a lot of discussion surrounding this topic lately, much of it stemming from confusion about the blocking/non-blocking design of Tornado.

Wish this could be made a sticky in the forum.

Otávio Souza

Jan 12, 2011, 8:18:29 PM
to python-...@googlegroups.com
Ben, what about the multiprocessing module, using the fork option?

I have a new project on my desk: a kind of storage solution, with image processing, file uploads, moving files across servers, multiple servers, and so on.

I'm wondering how to handle the blocking part of the work.

So, should I stay with threading or use the multiprocessing module?

Thanks in advance.

--
Otávio Souza
* KinuX Linux Creator <http://kinuxlinux.org>
* Linux-SE participant
* Creator of lunatiKos (KDE user group of the Northeast)
* Linux User #415774

David P. Novakovic

Jan 12, 2011, 8:21:11 PM
to python-...@googlegroups.com

Use something like celery. :)

Didip Kerabat

Jan 12, 2011, 8:08:30 PM
to python-...@googlegroups.com
One suggestion: it could go on the GitHub wiki.

- Didip -

Ben Darnell

Jan 12, 2011, 9:59:48 PM
to python-...@googlegroups.com
I've put it up on the wiki.  It's kind of undiscoverable right now, but this will give us something to link to in the future.


-Ben

Ben Darnell

Jan 12, 2011, 10:08:05 PM
to python-...@googlegroups.com
I don't have any firsthand experience with the multiprocessing module.  Processes are tricky too, but in different ways than threads.  In general I would recommend running multiple truly independent processes that communicate over HTTP or some sort of message queue, instead of using the multiprocessing module to manage child processes as part of each tornado process.  Among other things, this lets you size the two jobs independently, instead of requiring the number of child processes to be an integer multiple of the number of tornado servers.

-Ben

Peter Bengtsson

Jan 13, 2011, 6:57:17 AM
to python-...@googlegroups.com
Why not use a message queue?

Ben has clearly pointed out that threading is risky and multiprocessing is no child's play either. Message queueing is child's play once set up. It's much easier to debug and to scale.

Jeremy Kelley

Jan 13, 2011, 1:51:00 PM
to python-...@googlegroups.com

I second the message queue. I've successfully used beanstalkd
(http://kr.github.com/beanstalkd/) in other projects and it was
braindead simple to use and very fast.

There are others (zeromq, rabbitmq) that have a ton more features if
beanstalkd doesn't do everything you need it to.

-j


--
The Christian ideal has not been tried and found wanting;
it has been found difficult and left untried – G. K. Chesterton

drewpvogel

Jan 13, 2011, 3:29:29 PM
to Tornado Web Server
Using a message queue (like Qpid, rabbitmq, et al) is a great
approach. It falls under "3) Move the work out of the tornado
process".
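
As a rough illustration of option 3 without a broker, here is the "write it to the database" variant Ben describes: the web process only inserts a row, and a separate worker process drains the table. SQLite, the `outbox` table, and the functions `enqueue_email` and `drain_outbox` are all hypothetical choices for this sketch; a real deployment would more likely use one of the queues mentioned in this thread.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recipient TEXT NOT NULL,
    body TEXT NOT NULL,
    sent INTEGER NOT NULL DEFAULT 0
)
"""


def enqueue_email(conn, recipient, body):
    # Called from the tornado handler: one fast local write, no SMTP round-trip.
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO outbox (recipient, body) VALUES (?, ?)", (recipient, body)
        )


def drain_outbox(conn, send):
    # Called from a separate worker process whose latency doesn't matter.
    rows = conn.execute(
        "SELECT id, recipient, body FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, recipient, body in rows:
        send(recipient, body)  # hypothetical sender, e.g. a wrapper around smtplib
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


if __name__ == "__main__":
    # Both roles share one in-memory connection only for this demo;
    # in practice they would be separate processes using a shared database file.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    enqueue_email(conn, "user@example.com", "Welcome!")
    print(drain_outbox(conn, lambda r, b: None))  # 1
```

Marking a row as sent only after `send` returns gives at-least-once delivery: a crash between the two steps resends that email rather than losing it.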

paolo.losi

Jan 13, 2011, 5:58:03 PM
to Tornado Web Server
We are successfully using tornado and AMQP in many scenarios.
Up to now we've been using the kludgy tornado-amqp [0] integration
library. But I'm almost ready to announce a pure
tornado (ioloop based) amqp 0.9.1 fully compliant library [1].

The library needs small cleanups but it's 95% ready.
I'd be very glad to receive some feedback before tagging the 0.1
version.

The results are really encouraging in terms of performance:
it's almost 4 times faster than node.js + node-amqp in
examples/benchmark.py

Paolo

[0] http://code.google.com/p/tornado-amqp/
[1] https://github.com/paolo-losi/stormed-amqp

Didip Kerabat

Jan 17, 2011, 12:40:04 PM
to python-...@googlegroups.com
About message queues:

I found HotQueue the other day. It's a small queue library built on top of Redis. (http://richardhenry.github.com/hotqueue/tutorial.html)

It's quite stable and the codebase is quite small. I'm happy with it.

- Didip -
