Problems running Tornado behind nginx - Connection leak


Jan v. E.

Aug 10, 2015, 11:47:37 AM
to Tornado Web Server
Hi everyone!

I have 2 Tornado instances (started by supervisord) running behind nginx as a load balancer. While testing a coroutine handler of my application with apache bench I noticed a problem where one of the Tornado instances gets stuck and stops responding. Sometimes this happens, sometimes it doesn't. This issue has me grinding my teeth (it's also my first Tornado project).

I inspected the connections with lsof. While apache bench is still running but making no more progress, I can see the established connections from apache bench to nginx, and from nginx an established connection to the stuck Tornado instance.
When apache bench finishes with a timeout, all connections from apache bench to nginx are dropped, but the Tornado instance still holds all of its connections in CLOSE_WAIT and I can't connect to that specific instance anymore. At first I suspected a Future that never finishes, but the other instance is still reachable and has no connections in CLOSE_WAIT, so there are absolutely no errors. That's why I think a configuration error is more likely.

I have already set the maximum number of open files for the executing user to 4096. As a base for my nginx configuration I use the example configuration from the Tornado documentation. Here it is: http://pastebin.com/fkx8FqJf
I also call self.request.connection.close() in my on_finish() and on_connection_close() methods. Is that necessary and correct?
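
Roughly, the handler follows this pattern (a simplified sketch with placeholder names, not my actual code):

from tornado import gen, web


class ExampleHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        # placeholder for the real coroutine work the handler does
        result = yield self.do_backend_call()
        self.write(result)

    @gen.coroutine
    def do_backend_call(self):
        raise gen.Return("ok")

    def on_finish(self):
        # this is the part I'm unsure about
        self.request.connection.close()

    def on_connection_close(self):
        self.request.connection.close()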

Thanks for any help and hints!
Jan

aliane abdelouahab

Aug 10, 2015, 5:31:17 PM
to Tornado Web Server
The way nginx works is that it takes the request from the user, holds it, sends it to Tornado, and once Tornado has finished it sends the response back to the user, without the user noticing any of that.
And since nginx isn't releasing one of the Tornado processes, that means you should change the load-balancing algorithm rather than using the default one. Here is the link:

Kevin LaTona

Aug 10, 2015, 5:56:28 PM
to python-...@googlegroups.com

On Aug 10, 2015, at 8:47 AM, Jan v. E. <j...@figo.me> wrote:

I have already set the maximum number of open files for the executing user to 4096.


Just a hunch here, but I would look at the worker_connections 4096; setting to see if it's creating issues.

For just 2 Tornado instances, 4096 feels higher than it needs to be, but without knowing what the code is really doing, that's more a gut-level hunch than a solid answer as to why you should change it.

2 worker_processes and 4096 worker_connections seems out of balance to me.

Maybe tweak your 4096 down to 256, 512, or 1024 and see if that shifts anything.

And/or Google nginx worker_connections and see what others say about that setting, to see if it sheds any light on your problem.


The other thing: are you 100% sure it's not your code that is choking somehow?

Briefly looking at the core Nginx config settings it seems like they are correct, but again I am no Nginx config expert.


If I were working through the problem on my workbench, I would strip the Nginx config down to an absolute minimum server setup and see how it reacts.

And then start adding the other settings back in, or increasing the size numbers, to see if one of those is choking things on you somehow.


-Kevin

Ben Darnell

Aug 10, 2015, 11:10:00 PM
to Tornado Mailing List
On Mon, Aug 10, 2015 at 11:47 AM, Jan v. E. <j...@figo.me> wrote:
I inspected the connections with lsof. While apache bench is still running but making no more progress, I can see the established connections from apache bench to nginx, and from nginx an established connection to the stuck Tornado instance.
When apache bench finishes with a timeout, all connections from apache bench to nginx are dropped, but the Tornado instance still holds all of its connections in CLOSE_WAIT and I can't connect to that specific instance anymore. At first I suspected a Future that never finishes, but the other instance is still reachable and has no connections in CLOSE_WAIT, so there are absolutely no errors. That's why I think a configuration error is more likely.

CLOSE_WAIT means that the "remote" side of the socket (nginx) has closed, but the local side (tornado) has not. Tornado will normally notice all connections in this state and close them on the next iteration of the IOLoop, so my first guess would be that you've got something preventing the IOLoop from iterating - a deadlock or an infinite loop. If that's the case, then calling IOLoop.current().set_blocking_log_threshold(1.0) when your program starts should show you what's going on. 
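
For example, assuming a typical single-process startup (the handler and port here are just placeholders, not your code):

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")


if __name__ == "__main__":
    app = tornado.web.Application([(r"/", MainHandler)])
    app.listen(8000)  # placeholder port
    io_loop = tornado.ioloop.IOLoop.current()
    # log a warning with a stack trace whenever a single callback
    # blocks the IOLoop for more than one second
    io_loop.set_blocking_log_threshold(1.0)
    io_loop.start()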
 

I have already set the maximum number of open files for the executing user to 4096. As a base for my nginx configuration I use the example configuration from the Tornado documentation. Here it is: http://pastebin.com/fkx8FqJf

Looks fine to me.
 
I also call self.request.connection.close() in my on_finish() and on_connection_close() methods. Is that necessary and correct?

It's incorrect (you should let Tornado handle the closing of the connection), but it would not cause this particular problem.
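
In other words, the lifecycle hooks can stay for your own cleanup, but they should not touch the connection (a minimal sketch with a placeholder name, not your code):

from tornado import web


class ExampleHandler(web.RequestHandler):
    def on_finish(self):
        # application-level cleanup only; Tornado closes (or keeps alive)
        # the connection itself
        pass

    def on_connection_close(self):
        # called if the client drops the connection early; again, no need
        # to touch self.request.connection here
        pass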

-Ben
 


Kevin LaTona

Aug 11, 2015, 12:18:05 AM
to python-...@googlegroups.com

On Aug 10, 2015, at 2:56 PM, Kevin LaTona <li...@studiosola.com> wrote:

2 worker_processes and 4096 worker_connections.


Ben 

Do you know of any suggested guidelines for how many worker_connections to create per process, and how that inter-relates to Tornado in these kinds of setups?

Or does this really not matter much, and Nginx just deals with it on its side with no impact on the Tornado side of things?


-Kevin

Jan v. E.

Aug 11, 2015, 6:47:35 AM
to Tornado Web Server
Thanks for your input. I hadn't had the error for a while, but now it's back again. I was wrong to point my finger at nginx as the cause of this issue, because the dead CLOSE_WAIT connections also happen when querying Tornado directly.
IOLoop.current().set_blocking_log_threshold(1.0) didn't produce any new output, even when the error occurs. Do I have to call it before starting my IOLoop or afterwards? Would that also catch a non-finishing Future yield?

Jan v. E.

Aug 11, 2015, 7:31:09 AM
to Tornado Web Server
I forgot to mention another important detail, maybe the most important one: I'm running my Tornado instances with PyPy. After changing the interpreter to CPython, everything works. I will submit a bug report to PyPy when I've found a good way to reproduce this issue.

Best Regards and Thanks!
Jan

Ben Darnell

Aug 11, 2015, 9:32:24 AM
to Tornado Mailing List
On Tue, Aug 11, 2015 at 6:47 AM, Jan v. E. <j...@figo.me> wrote:
IOLoop.current().set_blocking_log_threshold(1.0) didn't produce any new output, even when the error occurs. Do I have to call it before starting my IOLoop or afterwards?

Either way works.
 
Would that also catch a non-finishing Future yield?

No, it only catches things that completely block the IOLoop. But a Future that never finishes should not be able to lead to CLOSE_WAIT connections.
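
To illustrate the difference, a rough sketch of the two failure modes (handler names and the sleep are made up, not your code):

import time

from tornado import gen, web
from tornado.concurrent import Future


class BlockingHandler(web.RequestHandler):
    def get(self):
        # blocks the whole IOLoop: nothing else is served, peer-closed
        # sockets pile up in CLOSE_WAIT, and set_blocking_log_threshold()
        # would log a stack trace for this
        time.sleep(30)
        self.write("done")


class NeverFinishingHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        # the IOLoop keeps running, so Tornado still notices and closes
        # dropped connections; this request just never completes
        yield Future()  # a Future that is never resolved
        self.write("unreachable")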

Ben Darnell

Aug 11, 2015, 9:40:01 AM
to Tornado Mailing List
This mostly just affects the nginx side. The nginx worker_processes setting is unrelated to the number of tornado backends (1 or 2 should be plenty for proxying; you may need more if you're having nginx do more work). worker_connections includes both client and backend connections; you may want a lot more worker_connections than you have tornado connections. Increasing this is cheap, but you may need to raise the open-file rlimit to match.

-Ben
 



Kevin LaTona

Aug 11, 2015, 11:05:05 AM
to python-...@googlegroups.com


Ben, thanks for the thoughts.

I didn't realize how cheap these connections are to maintain. I did some more research and one person suggested that 10,000 inactive keep-alive connections take only about 2.5 MB of RAM, so yeah, pretty cheap and no reason not to raise it if and when one needs to.

I also came across using 'ulimit -a' to see what the current limit settings are before tweaking them. Surprisingly, a stock Ubuntu 15.04 server is set to 1024 open files, and for fun I looked at my Mac dev machine and it's limited to 256 open files.


I came across a blog post which provides a rather in-depth look at tuning an Nginx server for larger traffic loads.



Finally, I came upon a web server a few weeks back that looks promising for those times when someone is looking for max speed and needs to eke out huge numbers of connections per second.

The server is written in x86-64 assembly, and for what it does I doubt anyone will find a web server to outpace it any time soon.


For Tornado users, right now this server could make sense for static content delivery. Or if one wanted to connect the two together, it looks like that would have to happen via a Unix socket.

-Kevin