Re: About the meaning of "Nginx sub-request is synchronous but non-blocking"


Yichun Zhang (agentzh)

Aug 28, 2015, 8:25:34 PM
to Zhe Zhang, openresty-en
Hello!

On Sat, Aug 29, 2015 at 3:31 AM, Zhe Zhang wrote:
> I have been studying OpenResty these days (in order to develop a
> highly-scalable image-download service for my company). While I’m really
> impressed by the power OpenResty adds to Nginx,

Thank you for trying out OpenResty and glad you like it :)

> I got some confusion in
> understanding the true meaning of “Nginx sub-request is synchronous but
> non-blocking”.
>

"Synchronous" refers to the coding paradigm: you write code in a
synchronous style instead of an asynchronous one (i.e., you do not have
to deal with a lot of callbacks, for example).

"Non-blocking" refers to the nature of the I/O. That is, your I/O
operations do not block *any* OS threads, so a single OS thread can
handle a lot of concurrent connections at the same time.
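To illustrate, here is a contrived sketch (the location names are made
up): the Lua code reads top to bottom in a synchronous style, yet the
worker process is free to serve other connections while the subrequest
is in flight.

```nginx
location = /demo {
    content_by_lua_block {
        -- This call suspends only the current Lua coroutine while the
        -- subrequest runs; the nginx worker process is never blocked.
        local res = ngx.location.capture("/backend")
        ngx.say("backend returned status ", res.status)
    }
}
```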

> I’m listing my questions below. Can you help take a look? Thanks a lot in
> advance!
>

I'd recommend posting such questions to the openresty-en mailing
list in the future. Please see https://openresty.org/#Community for
more details. I'm cc'ing the list in the hope of helping other
users with similar questions.

> I’m using the following code block as an example:
[...]
> location / {
>     content_by_lua '
>         res = ngx.location.capture("/fast_route")
>         if res.status >= 400 then
>             res = ngx.location.capture("/slow_route")
[...]
>
> This example basically lets Nginx download from a fast route first (by
> issuing a sub-request) and, if that fails, try a slow route (by issuing
> another sub-request).
>

One caveat with ngx.location.capture is that it always fully buffers
the response of the subrequest and loads it into the Lua space. So it's
not ideal for large responses. Maybe the standard try_files directive
of nginx is a better fit for this particular use case? Please see

http://nginx.org/en/docs/http/ngx_http_core_module.html#try_files
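A rough sketch of what a try_files based fallback could look like (the
paths and the backend host here are made-up placeholders, not part of
your setup):

```nginx
location / {
    root /data/fast_cache;
    # serve the file from the fast local path if it exists,
    # otherwise fall through to the named location below
    try_files $uri @slow_route;
}

location @slow_route {
    # nginx streams the proxied response instead of buffering it
    # fully in Lua space
    proxy_pass http://slow.backend.example.com;
}
```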

> - Question 1
> So, by "synchronous but non-blocking", does it mean that: once the Nginx
> single-threaded worker sends out the request to www.fastroute.com, then,
> rather than waiting for the response from www.fastroute.com and rather than
> proceeding to send the sub-request to /slowroute, the worker process will
> immediately leave the current stack to handle the next event?
>

The worker process never leaves the current C stack. Instead it yields
the current running Lua coroutine and gives control back to the nginx
event loop in case of an IO operation that cannot complete
immediately, for example. You can say that it immediately leaves the
current *Lua* stack (which resides on the heap rather than the C
stack).
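For instance, in the contrived snippet below, ngx.sleep yields the
current Lua coroutine back to the nginx event loop; the worker keeps
serving other connections during that second and resumes the coroutine
afterwards:

```nginx
location = /yield_demo {
    content_by_lua_block {
        ngx.say("before yield")
        ngx.flush(true)
        -- yields the current *Lua* coroutine; the worker's C stack is
        -- untouched and the worker handles other connections meanwhile
        ngx.sleep(1)
        ngx.say("after resume")
    }
}
```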

> - Question 2
>
> Where is the continuation(i.e. stack) of the current request saved before
> the single thread moves to the next task?
>

The current request's state is on the heap rather than on the C stack.
So there's really nothing to save explicitly. Basically:

1. On the Lua side, we simply yield the current Lua coroutine and the
Lua VM just "freezes" the execution state of the current Lua coroutine
naturally.
2. On the Nginx side, the request state is just on the heap, anchored by
the event handlers' data structures registered with epoll/kqueue/etc.

There's nothing to save on the C stack or CPU registers.

>
> When Nginx worker process comes back to execute the continuation after the
> response from fastroute is available, the worker process has to grab the
> continuation object from somewhere.
>

The suspended Lua coroutine objects contain the "continuation"
objects. Basically we just need to anchor the Lua coroutines into the
Lua VM's registry (which is a "GC root") so that the Lua GC will not
collect these coroutine objects prematurely.

>
> -Question 3
>
> How does Nginx help reduce context-switch?
>

Nginx worker processes are single-threaded and are usually bound to
individual (logical) CPU cores (via the worker_cpu_affinity
directive). So there is no need to do context switching among the
worker processes at all. Context switching only happens when multiple
OS threads (or OS processes) compete for the same CPU core.
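For example, with four logical cores (the core count here is
hypothetical), a typical setup pins one single-threaded worker to each
core so that no two workers ever compete for the same core:

```nginx
# one worker per logical CPU core
worker_processes  4;
# bitmask per worker: worker N runs only on core N
worker_cpu_affinity 0001 0010 0100 1000;
```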

>
> To me, a worker process leaving the current stack to handle the next
> event (which is another stack) is essentially context-switching: a single
> thread switching through a list of stacks from the event queue.

Apparently we have different definitions of context switching. Context
switching is usually defined as what the operating system performs to
implement transparent multitasking (transparent to the userland). It is
this OS-level context switching that is expensive (the OS has to save
and restore CPU register values, virtual memory page tables, and so on).

> Based on
> this, it seems each Nginx worker process is doing context-switching ALL THE
> TIME (if there're lots of requests going on), which makes it hard for me to
> understand why Nginx would help reduce context-switching (compared to
> Apache), as quoted below from
> https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/:
>

Nginx uses I/O multiplexing exclusively for concurrent request
handling, while Apache usually performs blocking I/O and relies on
multiple OS threads to handle concurrency.

>
> When an NGINX server is active, only the worker processes are busy. Each
> worker process handles multiple connections in a non-blocking fashion,
> reducing the number of context switches.
>

Exactly :)

Best regards,
-agentzh

Yichun Zhang (agentzh)

Aug 31, 2015, 1:33:00 AM
to Zhe Zhang, openresty-en
Hello!

On Mon, Aug 31, 2015 at 1:15 PM, Zhe Zhang wrote:
> The server I’m building will be an origin server for our media-CDN: the
> media-CDN nodes (that back our website) will fetch images from this server.
> The image server is expected to receive around 800,000 download requests per
> day from our CDN.

Thanks is not much :)

> The images range from tens of kilobytes to several
> megabytes in size.
> Upon receiving a download request, the server will first try fetching it
> from our AWS S3 instances in China. If not found, it will try fetching it
> from a more-remote server in US. In either case, if the image is found, then
> my image server will add watermark to it and send it to our media-CDN.
> I’ve also attached the Nginx conf file that summarizes its logic.
>

ngx.location.capture always fully buffers the response data (in the
Lua space), which is not suitable for large responses. There are two
options:

1. You can use cosockets (or a library like lua-resty-http that
supports streaming) to stream the response data with a constant-size
buffer.
2. You can use try_files and access_by_lua together with ngx_proxy to
control the flow.
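Option 1 could look roughly like the sketch below with the
lua-resty-http library (the upstream host, path, and buffer size are
made-up placeholders; check the library's documentation for the exact
API details):

```nginx
location = /stream_image {
    content_by_lua_block {
        local http = require "resty.http"  -- lua-resty-http
        local httpc = http.new()

        local ok, err = httpc:connect("upstream.example.com", 80)
        if not ok then
            ngx.log(ngx.ERR, "connect failed: ", err)
            return ngx.exit(502)
        end

        local res, err = httpc:request{
            path = "/img.jpg",
            headers = { ["Host"] = "upstream.example.com" },
        }
        if not res then
            ngx.log(ngx.ERR, "request failed: ", err)
            return ngx.exit(502)
        end

        -- stream the body with a fixed-size buffer instead of
        -- loading the whole response into Lua space
        local reader = res.body_reader
        repeat
            local chunk, err = reader(8192)
            if chunk then
                ngx.print(chunk)
                ngx.flush(true)
            end
        until not chunk

        httpc:set_keepalive()
    }
}
```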

But the watermarking step might defeat the streaming-processing effort.

> 1. Since I need to get the images watermarked, it seems that each image
> has to be fully loaded into memory anyway, which seems similar in memory
> cost to ngx.location.capture fully buffering the images. What do you think?

Yes.

> Do you think there’s any way to reduce the memory footprint of
> downloading-and-watermarking large photos? Would try_files directive help
> here?
>

No, try_files has nothing to do with image processing.

> 2. Do you know roughly how much memory overhead each Nginx connection
> costs in general? Is there any command that I can use to get the answer? The
> reason I’m asking this is to estimate the memory cost of the image
> server: for instance, assuming every image is 1MB, the memory cost for
> 1000 concurrent connections would be 1000 * (overhead_per_conn + 1MB).
>

Yeah, that's why we need streaming processing in the first place.

Because 800,000 requests per day is really not much (assuming the
traffic distributes relatively evenly), you might never run into that
many concurrent connections in normal conditions, and you can always
use the standard ngx_limit_conn module to limit the concurrency level
automatically for you (and maybe the request rate too, via the standard
ngx_limit_req module).
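For example (the zone sizes and the limits below are arbitrary
placeholders to adjust for your traffic):

```nginx
http {
    # at most 100 concurrent connections per client IP
    limit_conn_zone $binary_remote_addr zone=perip:10m;
    # at most 50 requests/second per client IP, with a small burst
    limit_req_zone  $binary_remote_addr zone=perip_req:10m rate=50r/s;

    server {
        location / {
            limit_conn perip 100;
            limit_req  zone=perip_req burst=20;
        }
    }
}
```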

> 3. If there’s no good way to limit memory cost for download-and-watermark
> large photos, then maybe I can cache the large watermarked photos in Nginx?
> Would it be easy to code this logic?
>

Yes, it's easy to configure. See above.
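A minimal sketch of caching the (already watermarked) responses with
nginx's standard proxy_cache, where the paths, sizes, and the internal
watermarking upstream are all made-up placeholders:

```nginx
proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=img_cache:10m
                 max_size=10g inactive=7d;

server {
    location /images/ {
        proxy_cache       img_cache;
        proxy_cache_key   $uri;
        # serve repeat requests from the cache for up to 7 days
        proxy_cache_valid 200 7d;
        # proxy to the internal service that does the watermarking
        proxy_pass http://127.0.0.1:8080;
    }
}
```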

Regards,
-agentzh

P.S. It's required to subscribe to the mailing list before sending mails to it.

Yichun Zhang (agentzh)

Aug 31, 2015, 1:34:12 AM
to Zhe Zhang, openresty-en
Hello!

On Mon, Aug 31, 2015 at 1:32 PM, Yichun Zhang (agentzh) wrote:
> On Mon, Aug 31, 2015 at 1:15 PM, Zhe Zhang wrote:
>> The server I’m building will be an origin server for our media-CDN: the
>> media-CDN nodes(that back our website) will fetch images from this server.
>> The image server is expected to receive around 800,000 download requests per
>> day from our CDN.
>
> Thanks is not much :)
>

Sorry, typo here. It should read

That is not much :)

Regards,
-agentzh

Zhe Zhang

Sep 1, 2015, 1:39:05 PM
to Yichun Zhang (agentzh), openresty-en
Thanks!

For cosockets: I learned about them before my previous reply, and indeed they look nice. I would switch to them if I didn’t need to watermark the images.

For rate-limiting: this is not a follow-up to your suggestion of using ngx_limit_req to limit requests to my server, but it seems that ngx_limit_req does not support per-URI request-limiting (i.e. limiting requests based only on the URI and not on the remote IP). The reason I mentioned URI-level request limiting is that I need to limit per-URI requests to the /us_cdn location so that, for each image that’s not found in /china_s3, only a single request in any second will be allowed to hit /us_cdn. That way, images found in /us_cdn will only be uploaded once to our S3 (our /us_cdn service has the functionality of uploading to S3 so that future requests can be handled by /china_s3).

I tried the following directive for per-URI limiting, but it doesn’t seem to work. So I guess I’ll point /us_cdn to a Jetty service and implement the rate-limiting logic in Java.

limit_req_zone $uri zone=one:3m rate=1r/s;
Zhe



Yichun Zhang (agentzh)

Sep 1, 2015, 10:55:08 PM
to Zhe Zhang, openresty-en
Hello!

On Wed, Sep 2, 2015 at 1:39 AM, Zhe Zhang wrote:
> For rate-limiting, not being a follow-up comment for your suggestion of
> using ngx_limit_req to limit requests to my server, but it seems that
> ngx_limit_req does not support per-URI request-limiting (i.e. limit requests
> only based on URI and not based on remote IP).


Not true. The limit_req_zone directive accepts any nginx variable as the key:

http://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req_zone

The official documentation just uses $binary_remote_addr as an example.

> I tried the following directive for per-URI limiting, but it doesn’t seem to work.
> So, I guess I’ll point /us_cdn to a Jetty service and implement the rate-limiting logic using Java.
>
> limit_req_zone $uri zone=one:3m rate=1r/s;

Note that $uri excludes any query string in the URL. Maybe you
need $request_uri (or better, an nginx variable holding a canonical
form of the URL) here instead.
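A per-URI variant could then look like this (the zone name, zone size,
and the upstream below are just placeholders for illustration):

```nginx
# key on the full request URI (including any query string);
# allow one request per second per distinct URI
limit_req_zone $request_uri zone=peruri:10m rate=1r/s;

server {
    location /us_cdn {
        # with no burst configured, excess requests for the same URI
        # are rejected immediately with 503
        limit_req zone=peruri;
        proxy_pass http://us.cdn.example.com;  # made-up upstream
    }
}
```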

BTW, we're already off topic here. Please note that the topic of this
thread is "About the meaning of 'Nginx sub-request is synchronous but
non-blocking'". Please create a separate topic in the openresty-en
mailing list for any further (unrelated) questions. Thank you.

Regards,
-agentzh