Hello!
On Thu, Dec 19, 2013 at 7:18 AM, Ron Schwartz wrote:
> I have some more information on this problem.
> I'm using Apache Bench (ab) for load testing in our testing environment, and
> every time I call the OpenResty server with more than 100 concurrent
> connections (ab -n 2000 -c 100 http://our_server) I get "lua tcp socket read
> timed out" on the red:eval_sha(....) command.
>
The problem with tools like ab is that they always try to send
requests on all the concurrent connections as fast as possible,
driving the server to its throughput limit (or request rate limit).
When you're at the throughput limit, increasing the concurrency level
will surely sacrifice the request latency. This is a natural
consequence of basic math: by Little's law, latency = concurrency /
throughput, so once throughput is capped, more concurrency means
proportionally more latency.
Try tools that can generate a constant request rate that is not near
the throughput limit when experimenting with different concurrency
levels.
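For example, the wrk2 fork of wrk can hold a fixed request rate via
its -R option (the command line below is just an assumed sketch; tune
the numbers for your own environment):

```shell
# wrk2 (https://github.com/giltene/wrk2) sends requests at a constant
# rate instead of flooding the server the way ab does.
#   -c 100 : 100 concurrent connections
#   -R 500 : hold 500 requests/sec, chosen below the throughput limit
wrk -t4 -c100 -d30s -R500 --latency http://our_server/
```

Then you can compare the latency distributions at different -c values
while the request rate stays constant.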
> I've checked the redis slowlog and it seems the redis operation takes
> around 0.01 seconds.
10ms is already a very long latency for redis. Consider that every
redis server is single-threaded, so all the requests have to queue up
within the redis server for processing, and the queuing time grows
dramatically when the request rate is high.
Try optimizing your redis-side Lua script to reduce this processing
time within redis.
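To see why 10ms per operation hurts at high concurrency, here is a
back-of-envelope estimate (the numbers are illustrative assumptions
taken from your slowlog figure, not measurements of your system):

```python
# Back-of-envelope queuing estimate for a single-threaded server.
# Assumption: redis serves requests one at a time, FIFO; 10 ms per
# operation is the slowlog figure quoted above.

service_time = 0.01                # seconds per redis operation
max_throughput = 1 / service_time  # ~100 ops/sec for one redis process

def queue_wait(concurrent_requests, service_time):
    """Time the last request waits if all of them are queued at once."""
    return concurrent_requests * service_time

# At 100 concurrent connections the tail request already waits ~1 s;
# at 200 it waits ~2 s, and real queues (multiple nginx workers, ab
# keeping every connection busy) can be deeper still.
print(queue_wait(100, service_time))
print(queue_wait(200, service_time))
```

Cutting the per-operation time inside redis shrinks every term in this
estimate, which is why optimizing the redis-side script helps more
than raising timeouts.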
> After raising the connection timeout to 10 seconds
> (red:set_timeout(10000)), it appears that there are no "lua tcp socket read
> timed out" errors with 100 concurrent connections, but when I try with 200
> concurrent connections, they are back.
>
Requests can queue up both in the individual nginx worker processes
and in the redis server process. You can use tools like tcp-recv-queue
and epoll-loop-blocking-distr on both sides to measure the queuing
effects if you're on Linux:
https://github.com/agentzh/nginx-systemtap-toolkit#tcp-recv-queue
https://github.com/agentzh/stapxx#epoll-loop-blocking-distr
You can also generate the on-CPU and off-CPU flamegraphs for both your
nginx workers and redis servers under load to look for spots that can
be optimized to raise the throughput limit:
https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt
https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt-off-cpu
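Following the sample-bt section of that README, the workflow looks
roughly like this (the PID and sampling time are placeholders, and
stackcollapse-stap.pl / flamegraph.pl come from Brendan Gregg's
FlameGraph repository):

```shell
# Sample user-space backtraces of one nginx worker for 5 seconds
# (sketch based on the nginx-systemtap-toolkit README; 8736 is a
# placeholder PID for your nginx worker or redis-server process).
./sample-bt -p 8736 -t 5 -u > a.bt

# Fold the stacks and render the on-CPU flamegraph with the
# FlameGraph tools (https://github.com/brendangregg/FlameGraph).
stackcollapse-stap.pl a.bt > a.cbt
flamegraph.pl a.cbt > a.svg
```

The widest frames in the resulting SVG are the first candidates for
optimization.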
For high-performance web apps, the throughput limit is everything.
Once you push past the throughput limit, the request latency shoots up
immediately (and timeouts follow), because the excess requests can
only pile up in queues. It is even more so when you use tools like ab
that always push the server right to its throughput limit.
Best regards,
-agentzh