Run Tornado servers on all available CPUs with pre-forking


Bret Taylor

Dec 9, 2009, 3:41:25 AM
to python-...@googlegroups.com
I just checked in an (experimental and backwards-compatible) change to enable "pre-forking" Tornado servers, and I would love your feedback:

http://github.com/facebook/tornado/commit/6fb90ae694190fcedc48d9fb98b...

Previously, every Tornado server ran on a single thread within a single process. If your server had 8 cores, you would have to run 8 separate Tornado processes on 8 different ports to utilize all CPUs and put a reverse proxy like nginx in front to balance requests between all those processes (see http://www.tornadoweb.org/documentation#running-tornado-in-production). With this change, you can easily run a single Tornado server listening to a single port on all available CPUs of a server.

To try out this change, change your call from listen(port) to bind(port) followed by start(), e.g.,

    http_server = tornado.httpserver.HTTPServer(application)
    http_server.bind(options.port)
    http_server.start() # Pre-forks multiple child processes
    tornado.ioloop.IOLoop.instance().start()

start() detects the number of CPUs on the current machine and "pre-forks" that number of child processes so that we have one Tornado process per CPU, each with its own IOLoop. To override this auto-detection, pass the number of child processes you want to start(). listen(port) is simply a shortcut for bind(port); start(1).
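
A minimal sketch of the "auto-detect unless overridden" rule described above, using only the standard library. The helper name detect_num_processes is hypothetical and not part of Tornado; the sysconf lookup is an assumption about how the detection could be done.

```python
import os


def detect_num_processes(requested=None):
    """Return how many worker processes to fork.

    Hypothetical helper for illustration only, not Tornado's API.
    """
    if requested is not None:
        # An explicit count overrides auto-detection.
        return requested
    try:
        # Ask the OS for the number of configured processors (Unix only).
        count = os.sysconf("SC_NPROCESSORS_CONF")
    except (AttributeError, ValueError, OSError):
        # sysconf is unavailable or does not know this name.
        count = os.cpu_count() or 1
    return count if count and count > 0 else 1


if __name__ == "__main__":
    print(detect_num_processes())   # auto-detected CPU count
    print(detect_num_processes(2))  # explicit override
```

Passing 1 corresponds to the single-process behavior that listen(port) gives you.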

Background
The "pre-forking" phrase is used to mean a couple of different things, so it is worth clarifying what it means for Tornado. Essentially, we bind a single server socket to a port, and we fork() one process per CPU. Each of those child processes calls accept() on the shared server socket, and the Linux kernel gives new connections to one of the child processes on a first-come, first-served basis. Each of the child processes has its own epoll-based IO loop, and a single request is handled entirely within one of the child processes. There is no shared memory or shared state between the forked child processes, only a shared port.
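
The bind-then-fork pattern described above can be sketched with plain sockets. This is a toy illustration under stated assumptions, not Tornado's implementation: the function name prefork_demo is made up, each child serves exactly one connection with a blocking accept() instead of an epoll loop, and it needs a Unix system because it uses os.fork().

```python
import os
import socket


def prefork_demo(num_children=2):
    """Fork workers that all accept() on one shared listening socket.

    Returns (child_pids, pids_reported_by_the_workers_that_answered).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
    server.listen(16)
    port = server.getsockname()[1]

    children = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: inherited the listening socket across fork().
            try:
                conn, _addr = server.accept()
                conn.sendall(str(os.getpid()).encode())
                conn.close()
            finally:
                os._exit(0)  # never fall through into the parent's code
        children.append(pid)

    server.close()  # the children keep their own copies open

    answered = []
    for _ in range(num_children):
        client = socket.create_connection(("127.0.0.1", port))
        data = b""
        while True:
            chunk = client.recv(64)
            if not chunk:  # worker closed the connection
                break
            data += chunk
        client.close()
        answered.append(int(data))

    for pid in children:
        os.waitpid(pid, 0)  # reap the workers
    return children, answered


if __name__ == "__main__":
    pids, served_by = prefork_demo(2)
    print("forked:", pids, "connections served by:", served_by)
```

The only thing the workers share is the listening socket; which worker gets a given connection is decided by the kernel.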

This technique is used by a number of high performance servers, including Unicorn (see http://tomayko.com/writings/unicorn-is-unix).

Apache pre-forking means something different (a pool of child processes, each handling one connection at a time), so please don't get confused if you Google "pre-forking" and see people talking about the performance characteristics of Apache.

Bret

Stanislav

Dec 9, 2009, 4:11:48 AM
to Tornado Web Server
Thank you very much =)
This is going to help a lot!

On Dec 9, 12:41 am, Bret Taylor <btay...@gmail.com> wrote:
> I just checked in a (experimental and backwards-compatible) change to enable
> "pre-forking" Tornado servers, and I would love your feedback:
>
> http://github.com/facebook/tornado/commit/6fb90ae694190fcedc48d9fb98b...
>
> Previously, every Tornado server ran on a single thread within a single
> process. If your server had 8 cores, you would have to run 8 separate
> Tornado processes on 8 different ports to utilize all CPUs and put a reverse
> proxy like nginx in front to balance requests between all those processes
> (see http://www.tornadoweb.org/documentation#running-tornado-in-production).
> With this change, you can easily run a single Tornado server listening to a
> single port on all available CPUs of a server.
>
> To try out this change, change your call from listen(port) to bind(port)
> followed by start(), e.g.,
>
>     http_server = tornado.httpserver.HTTPServer(application)
>     http_server.bind(options.port)
>     http_server.start() # Pre-forks multiple child processes
>     tornado.ioloop.IOLoop.instance().start()
>
> start() detects the number of CPUs on the current machine and "pre-forks"
> that number of child processes so that we have one Tornado process per
> CPU, all with their own IOLoop. You can also pass in the specific number of
> child processes you want to run with if you want to override
> this auto-detection. listen(port) is simply a shortcut for bind(port);
> start(1).
>
> *Background*
> The "pre-forking" phrase is used to mean a couple of different things, so it
> is worth clarifying what it means for Tornado. Essentially, we bind a single
> server socket to a port, and we fork() one process per CPU. Each of those
> CPUs calls accept() on the shared server socket, and the Linux kernel gives
> new requests to one of the child processes on a first-come-first-serve
> basis. Each of the child processes has their own epoll-based IO loop, and a
> single request is handled entirely within one of the child processes. There
> is no shared memory or shared state between the forked child processes, only
> a shared port.
>
> This technique is used by a number of high performance servers, including
> Unicorn (see http://tomayko.com/writings/unicorn-is-unix).

Dave Fowler

Dec 9, 2009, 4:34:34 AM
to Tornado Web Server
Awesome! Thanks!

On Dec 9, 12:41 am, Bret Taylor <btay...@gmail.com> wrote:
> I just checked in a (experimental and backwards-compatible) change to enable
> "pre-forking" Tornado servers, and I would love your feedback:
>
> http://github.com/facebook/tornado/commit/6fb90ae694190fcedc48d9fb98b...
>
> Previously, every Tornado server ran on a single thread within a single
> process. If your server had 8 cores, you would have to run 8 separate
> Tornado processes on 8 different ports to utilize all CPUs and put a reverse
> proxy like nginx in front to balance requests between all those processes
> (seehttp://www.tornadoweb.org/documentation#running-tornado-in-production).
> With this change, you can easily run a single Tornado server listening to a
> single port on all available CPUs of a server.
>
> To try out this change, change your call from listen(port) to bind(port)
> followed by start(), e.g.,
>
>     http_server = tornado.httpserver.HTTPServer(application)
>     http_server.bind(options.port)
>     http_server.start() # Pre-forks multiple child processes
>     tornado.ioloop.IOLoop.instance().start()
>
> start() detects the number of CPUs on the current machine and "pre-forks"
> that number of child processes so that we have one Tornado process per
> CPU, all with their own IOLoop. You can also pass in the specific number of
> child processes you want to run with if you want to override
> this auto-detection. listen(port) is simply a shortcut for bind(port);
> start(1).
>
> *Background*
> The "pre-forking" phrase is used to mean a couple of different things, so it
> is worth clarifying what it means for Tornado. Essentially, we bind a single
> server socket to a port, and we fork() one process per CPU. Each of those
> CPUs calls accept() on the shared server socket, and the Linux kernel gives
> new requests to one of the child processes on a first-come-first-serve
> basis. Each of the child processes has their own epoll-based IO loop, and a
> single request is handled entirely within one of the child processes. There
> is no shared memory or shared state between the forked child processes, only
> a shared port.
>
> This technique is used by a number of high performance servers, including
> Unicorn (seehttp://tomayko.com/writings/unicorn-is-unix).

Stanislav

Dec 9, 2009, 4:37:13 AM
to Tornado Web Server
[E 091209 01:36:31 httpserver:168] Cannot run in multiple processes:
IOLoop instance has already been initialized. You cannot call
IOLoop.instance() before calling start_multi_cpu()
Traceback (most recent call last):
  File "application.py", line 124, in <module>
    main()
  File "application.py", line 120, in main
    http_server.start() # Pre-forks multiple child processes
  File "/home/stanislav/workspace/guildwork-tornado/src/tornado/httpserver.py", line 184, in start
    ioloop.IOLoop.READ)
  File "/home/stanislav/workspace/guildwork-tornado/src/tornado/ioloop.py", line 126, in add_handler
    self._impl.register(fd, events | self.ERROR)
IOError: [Errno 17] File exists

I keep getting this error.

Stanislav

Dec 9, 2009, 4:40:48 AM
to Tornado Web Server
Ignore that message; I hadn't changed listen to bind. Now, however, it
just says the following:

ERROR:root:Cannot run in multiple processes: IOLoop instance has
already been initialized. You cannot call IOLoop.instance() before
calling start_multi_cpu()


Bret Taylor

Dec 9, 2009, 4:46:56 AM
to python-...@googlegroups.com
Basically, you should not call IOLoop.instance() before calling start(). The reason is somewhat technically complex, but the short version is that each child process needs to have its own IOLoop instance, and if you call it before forking child processes, the loop will be shared and lead to incorrect behavior.

Are you implicitly or explicitly calling IOLoop.instance() before the call to start()?
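
A rough sketch of the guard behind that error message, assuming the loop is a lazily created per-process singleton. ToyLoop and choose_num_processes are stand-ins invented for illustration; they are not Tornado's IOLoop or its actual code.

```python
class ToyLoop:
    """Stand-in for a lazily created, per-process event loop singleton."""

    _instance = None

    @classmethod
    def instance(cls):
        # Create the loop on first use, then keep returning the same object.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def initialized(cls):
        # True once instance() has been called in this process.
        return cls._instance is not None


def choose_num_processes(requested):
    """Fall back to one process if the loop already exists before forking."""
    if requested > 1 and ToyLoop.initialized():
        # A loop created in the parent would be inherited by every child,
        # so forking is refused and the server runs in a single process.
        return 1
    return requested


if __name__ == "__main__":
    print(choose_num_processes(4))  # loop not created yet
    ToyLoop.instance()              # something touches the loop too early
    print(choose_num_processes(4))  # falls back to a single process
```

Anything that touches the singleton before start(), even indirectly through another module, triggers the fallback.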

Bret

Stanislav

Dec 9, 2009, 4:52:59 AM
to Tornado Web Server
Uhh I tried to run it on our server rather than my local Ubuntu VM,
and it works perfectly fine on the server and says

INFO:root:Pre-forking 4 server processes

However locally I continue to get

Cannot run in multiple processes: IOLoop instance has already been
initialized. You cannot call IOLoop.instance() before calling
start_multi_cpu()

Not that I need it to run in multiple processes on my local
machine anyway. Thank you for the update. =)

Bret Taylor

Dec 9, 2009, 5:48:37 AM
to python-...@googlegroups.com
If you can write a short program to reproduce and let me know what your local dev environment is, I would love to debug the issue.

Bret

Neil

Dec 9, 2009, 6:18:08 AM
to Tornado Web Server
The cause of the error may be that autoreload.py is initializing
IOLoop in advance of your call to http_server.start() on your local
machine. This would be the case if in your settings, you have
debug=True.

Neil

Bret Taylor

Dec 9, 2009, 6:36:31 AM
to python-...@googlegroups.com
Thanks for the analysis - I will take a look at this issue, as we tested in production mode, not debug mode.

Bret

Creotiv

Dec 15, 2009, 1:18:06 PM
to Tornado Web Server
I think the calls must be in this order:

http_server = httpserver.HTTPServer(App)
http_server.listen(options.port)
tornado.ioloop.IOLoop.instance().start()
http_server.start()

Because if I call http_server.start() before starting the IOLoop, it
causes the following error:

Traceback (most recent call last):
  File "helloworld.py", line 78, in <module>
    main()
  File "helloworld.py", line 73, in main
    http_server.start(2)
  File "/home/creotiv/tornado-app/branches/andrew/lib/tornado/httpserver.py", line 160, in start
    assert not self._started
AssertionError
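
One plausible reading of this traceback, given the earlier statement that listen(port) is a shortcut for bind(port); start(1): calling start() again after listen() trips the "already started" assertion. The toy class below only models that bookkeeping. ToyServer is a made-up name, and nothing here opens a socket or forks.

```python
class ToyServer:
    """Models only the started flag; not Tornado's HTTPServer."""

    def __init__(self):
        self._started = False
        self.port = None
        self.num_processes = None

    def bind(self, port):
        # Remember the port; a real server would bind a socket here.
        self.port = port

    def start(self, num_processes=1):
        # Same check as the assertion shown in the traceback.
        assert not self._started
        self._started = True
        self.num_processes = num_processes

    def listen(self, port):
        # Shortcut described in the thread: bind(port); start(1).
        self.bind(port)
        self.start(1)


if __name__ == "__main__":
    ok = ToyServer()
    ok.bind(8888)
    ok.start(2)                # bind + start: fine
    print(ok.num_processes)

    broken = ToyServer()
    broken.listen(8888)        # already calls start(1) internally
    try:
        broken.start(2)        # second start() hits the assertion
    except AssertionError:
        print("AssertionError: server was already started by listen()")
```

Under this reading, replacing listen(port) with bind(port) and keeping a single start() call avoids the assertion.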

Creotiv

Dec 15, 2009, 1:45:02 PM
to Tornado Web Server
There is some strange behavior in the http_server.start method. It is
always called with 1 process as the parameter first, and only then is it
called with the number of processes equal to the number of CPUs.

For example, if you change the start method to this:

    num_processes = 2
    if(not self._started):
        self._started = True
        if num_processes is None:
            # Use sysconf to detect the number of CPUs (cores)
            try:
                num_processes = os.sysconf("SC_NPROCESSORS_CONF")
            except ValueError:
                logging.error("Could not get num processors from sysconf; "
                              "running with one process")
                num_processes = 2
        if num_processes > 1 and ioloop.IOLoop.initialized():
            logging.error("Cannot run in multiple processes: IOLoop instance "
                          "has already been initialized. You cannot call "
                          "IOLoop.instance() before calling start()")
            num_processes = 1
        if num_processes > 1:
            logging.info("Pre-forking %d server processes", num_processes)
            for i in range(num_processes):
                print i
                if os.fork() == 0:
                    ioloop.IOLoop.instance().add_handler(
                        self._socket.fileno(), self._handle_events,
                        ioloop.IOLoop.READ)
                    return
            os.waitpid(-1, 0)
        else:
            io_loop = self.io_loop or ioloop.IOLoop.instance()
            io_loop.add_handler(self._socket.fileno(), self._handle_events,
                                ioloop.IOLoop.READ)

It would fork 2 processes. But if you remove the hard-coded number of
processes, it will always create only a single process.