Request taking too long to get a response


Alejandro Gonzalez

Jul 15, 2021, 4:32:40 PM
to openresty-en
I have an OpenResty instance used mainly as a reverse proxy for a couple of upstream services. It has a simple endpoint used for health checks:

    location /ping { return 200 '{"ping":"successful"}'; }

The problem we're having is that this ping takes too long to get a response:

    $ cat /proc/loadavg; date ; httpstat localhost/ping?foo=bar
    2.93 1.98 1.94 8/433 16725
    Thu Jul 15 15:25:08 UTC 2021
    Connected to from

    HTTP/1.1 200 OK
    Date: Thu, 15 Jul 2021 15:26:24 GMT
    X-Request-ID: b8d276b0b3828113cfee3bf2daa01293

      DNS Lookup   TCP Connection   Server Processing   Content Transfer
    [     4ms    |       0ms      |      76032ms      |        0ms       ]
                 |                |                   |                  |
        namelookup:4ms            |                   |                  |
                            connect:4ms               |                  |
                                          starttransfer:76036ms          |

That ^ tells me the load average was low at the time of the request (a 1-minute load average of 2.93 on an 8-core server is fine).
curl/httpstat initiated the request at 15:25:08 and the response arrived at 15:26:24.
The connection was established quickly and the request was sent, but then it took 76s for the server to respond.
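The gap also matches the Date header in the response. A quick sanity check on the timestamps above (just arithmetic, nothing OpenResty-specific):

```python
from datetime import datetime

# Timestamps taken from the httpstat run above.
start = datetime.fromisoformat("2021-07-15T15:25:08")        # when curl fired the request
date_header = datetime.fromisoformat("2021-07-15T15:26:24")  # Date: header in the response

gap_s = (date_header - start).total_seconds()
print(gap_s)  # 76.0, in line with the ~76032 ms "Server Processing" phase
```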

If I look at the access log for this ping I see "req_time":"0.000" (this is the $request_time variable).

  {"t":"2021-07-15T15:26:24+00:00","id":"b8d276b0b3828113cfee3bf2daa01293","cid":"18581172","pid":"13631","host":"localhost","req":"GET /ping?foo=bar HTTP/1.1","scheme":"","status":"200","req_time":"0.000","body_sent":"21","bytes_sent":"373","content_length":"","request_length":"85","stats":"","upstream":{"status":"","sent":"","received":"","addr":"","conn_time":"","resp_time":""},"client":{"id":"#","agent":"curl/7.58.0","addr":","},"limit_status":{"conn":"","req":""}}                                                                 

This is the access log format, in case anybody wonders what the rest of the values are:

  log_format main escape=json '{"t":"$time_iso8601","id":"$ring_request_id","cid":"$connection","pid":"$pid","host":"$http_host","req":"$request","scheme":"$http_x_forwarded_proto","status":"$status","req_time":"$request_time","body_sent":"$body_bytes_sent","bytes_sent":"$bytes_sent","content_length":"$content_length","request_length":"$request_length","stats":"$location_tag","upstream":{"status":"$upstream_status","sent":"$upstream_bytes_sent","received":"$upstream_bytes_received","addr":"$upstream_addr","conn_time":"$upstream_connect_time","resp_time":"$upstream_response_time"},"client":{"id":"$http_x_auth_appid$http_x_ringdevicetype#$remote_user$http_x_auth_userid","agent":"$http_user_agent","addr":"$http_x_forwarded_for,$remote_addr:$remote_port"},"limit_status":{"conn":"$limit_conn_status","req":"$limit_req_status"}}';
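Since escape=json makes each access-log line a standalone JSON object, it's easy to check whether other /ping requests show the same pattern. A sketch (the log path here is hypothetical):

```python
import json

def ping_times(path="/var/log/nginx/access.log"):
    """Collect (timestamp, req_time) pairs for /ping requests from a JSON-per-line access log."""
    out = []
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("req", "").startswith("GET /ping"):
                out.append((entry["t"], float(entry["req_time"] or 0)))
    return out
```

Every /ping entry I've checked this way reports a req_time of 0.000.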

My question is: where could nginx have spent these 76s if the request took only 0s to be processed and responded to?

Something worth mentioning is that the server is timing out a lot of connections with the upstreams at that moment as well: we see a lot of "upstream timed out (110: Connection timed out) while reading response header from upstream" and "upstream server temporarily disabled while reading response header from upstream".
So the two are probably related; what I can't see is why upstream timeouts would lead to a /ping taking 76s to be attended and responded to when both CPU and load are low/acceptable.
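To correlate the two, the upstream-timeout errors can be bucketed per minute and lined up against the slow pings. A sketch, assuming the default error_log path and timestamp format:

```python
import re
from collections import Counter

# Sketch: count "upstream timed out" lines per minute in the nginx error log,
# to line the bursts up against slow /ping responses. Assumes the default
# error_log timestamp format, e.g. "2021/07/15 15:26:24 [error] ...".
def timeouts_per_minute(path="/var/log/nginx/error.log"):
    pat = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}.*upstream timed out")
    buckets = Counter()
    with open(path) as f:
        for line in f:
            m = pat.match(line)
            if m:
                buckets[m.group(1)] += 1
    return buckets
```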

Any idea?

