HTTP client timeout


envee

May 4, 2023, 6:33:12 PM
to golang-nuts
Hi All,
I have an application which sends HTTP requests at a rate of about 6000 per second. The application also runs about 100K goroutines, and these 6000 requests per second are issued by a subset of those goroutines.

When I increase the rate beyond 6000 per second, I get the error:
"context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

From a PCAP trace, I have seen that the server does respond within the timeout (which I've set to 7s), but the application is probably not able to finish reading the response.
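
For reference, the timeout is set on the http.Client itself; here is a simplified sketch of the setup (the URL, payload and logging are placeholders, not my actual code):

package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"time"
)

// Simplified sketch of the client setup; the URL and payload are
// placeholders, not the actual application code.
var client = &http.Client{
	// This is the 7s timeout mentioned above; when it fires, net/http reports
	// "context deadline exceeded (Client.Timeout exceeded while awaiting headers)".
	Timeout: 7 * time.Second,
}

// sendOne is roughly what each requesting goroutine does.
func sendOne(payload []byte) {
	resp, err := client.Post("https://server.example/endpoint", "application/json", bytes.NewReader(payload))
	if err != nil {
		log.Printf("request failed: %v", err)
		return
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
}

func main() {
	sendOne([]byte(`{"hello":"world"}`))
}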

From the code of the HTTP client, I see that there is a timer/ticker which is run in the background for every request.

My understanding is that the goroutine which issued the HTTP request could not get scheduled onto a processor/thread in time and hence the request timed out, possibly because many goroutines became runnable at once but could not all be scheduled.

Is my understanding of why I see the timeouts correct?

Also, if a goroutine starts a timer/ticker and is then taken off its P by the scheduler while it waits for a network I/O response, does the timer pause while the goroutine is off the P?

envee

May 5, 2023, 8:45:07 AM
to golang-nuts
An update. 
The VM on which I am running has 40 vCPUs.
I changed my GOMAXPROCS to a lower value of 8.
I also changed my client to use 90 HTTP/2 connections towards the HTTP/2 server.
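
To make the setup concrete, here is a simplified sketch of what I mean (a pool of clients, one Transport per connection, is just one way to hold ~90 HTTP/2 connections; the settings and round-robin scheme are illustrative, not my production code):

package main

import (
	"net/http"
	"runtime"
	"sync/atomic"
	"time"
)

const numConns = 90 // target number of HTTP/2 connections towards the server

var (
	clients [numConns]*http.Client
	next    uint64
)

func init() {
	// Lowered from the machine default of 40 vCPUs (could equally be set
	// via the GOMAXPROCS environment variable).
	runtime.GOMAXPROCS(8)

	// Each http.Transport keeps its own connection pool, so one client per
	// Transport yields roughly one HTTP/2 connection per pool entry.
	for i := range clients {
		clients[i] = &http.Client{
			Transport: &http.Transport{
				ForceAttemptHTTP2:   true,
				MaxIdleConnsPerHost: 1,
				IdleConnTimeout:     90 * time.Second,
			},
			Timeout: 7 * time.Second,
		}
	}
}

// pick spreads requests across the pool in round-robin order.
func pick() *http.Client {
	return clients[atomic.AddUint64(&next, 1)%numConns]
}

func main() {
	resp, err := pick().Get("https://server.example/health") // placeholder URL
	if err == nil {
		resp.Body.Close()
	}
}
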
With this setup, I can now achieve about 9000 Txns per second.
However, if I go any higher than that, my client starts throwing "context deadline exceeded" errors, even though I am sure the responses are received by the client well within the 7-second timeout.
When I observe the CPU utilization of my client process, it only uses about 300% vCPU, i.e. about 3 out of 40 vCPUs.
Still, I don't understand why I cannot achieve higher throughput than 9000 per second.
Here is a sample output from the atop utility. As seen below, it shows only about 257% CPU utilization.

Is this because goroutines are not getting context-switched onto a logical processor or thread, thus leading to the timeouts? I thought that with so much CPU available, context switching should be very fast and not an issue at all.

ATOP - myserver                                      2023/05/05  22:16:19                                      y--------P----                                      10s elapsed
PRC | sys    6.63s | user  21.56s |              | #proc    543 | #trun      4 | #tslpi   639 |               | #tslpu     0 | #zombie    0 | clones    11 |              | no  procacct |
CPU | sys      61% | user    213% | irq       4% | idle   3724% | wait      0% | steal     0% |  guest     0% |              | ipc notavail | cycl unknown | curf 3.00GHz | curscal   ?% |
CPL | avg1    2.28 | avg5    2.40 |              | avg15   2.45 |              |              |  csw   656705 | intr  515996 |              |              | numcpu    40 |              |
MEM | tot   503.7G | free  468.6G | cache  26.9G | dirty   0.1M | buff    1.0G | slab    1.4G |  slrec   1.2G | shmem   4.1G | shrss 117.0M | vmbal   0.0M | hptot   0.0M | hpuse   0.0M |
SWP | tot     4.0G | free    4.0G |              |              |              |              |               |              |              |              | vmcom   6.0G | vmlim 255.8G |
LVM | g_vd0-lv_var | busy      0% | read       0 |              | write    158 | KiB/r      0 |  KiB/w      4 | MBr/s    0.0 | MBw/s    0.1 |              | avq     5.00 | avio 0.01 ms |
DSK |          sda | busy      0% | read       0 |              | write    122 | KiB/r      0 |  KiB/w      5 | MBr/s    0.0 | MBw/s    0.1 |              | avq     2.00 | avio 0.01 ms |
NET | transport    | tcpi  181070 | tcpo  289884 | udpi      18 | udpo      18 | tcpao      0 |  tcppo     26 | tcprs    556 | tcpie      0 | tcpor      1 | udpnp      0 | udpie      0 |
NET | network      | ipi   181092 | ipo   201839 |              | ipfrw      0 | deliv 181091 |               |              |              |              | icmpi      0 | icmpo      0 |
NET | bond1     0% | pcki  181057 | pcko  290688 | sp   20 Gbps | si   16 Mbps | so  190 Mbps |  coll       0 | mlti       0 | erri       0 | erro       0 | drpi       0 | drpo       0 |
NET | net2      0% | pcki  181056 | pcko  290688 | sp   20 Gbps | si   16 Mbps | so  190 Mbps |  coll       0 | mlti       0 | erri       0 | erro       0 | drpi       0 | drpo       0 |

  PID         TID       SYSCPU        USRCPU        VGROW        RGROW        RUID           EUID           ST        EXC        THR       S        CPUNR        CPU       CMD         1/1
40753           -        4.08s        21.52s           0K       -11.7M        mtx            mtx            --          -         15       S           20       257%       5gclient
40753       40753        0.00s         0.00s           0K       -11.7M        mtx            mtx            --          -          1       S           20         0%       5gclient          
40753       40754        0.00s         0.00s           0K       -11.7M        mtx            mtx            --          -          1       S            5         0%       5gclient          
40753       40755        0.00s         0.00s           0K       -11.7M        mtx            mtx            --          -          1       S            3         0%       5gclient          
40753       40756        0.00s         0.00s           0K       -11.7M        mtx            mtx            --          -          1       S           21         0%       5gclient          
40753       40757        0.45s         2.35s           0K       -11.7M        mtx            mtx            --          -          1       S            6        28%       5gclient          
40753       40758        0.44s         2.31s           0K       -11.7M        mtx            mtx            --          -          1       S           20        28%       5gclient          
40753       40759        0.44s         2.52s           0K       -11.7M        mtx            mtx            --          -          1       S            0        30%       5gclient          
40753       40760        0.36s         1.80s           0K       -11.7M        mtx            mtx            --          -          1       S           28        22%       5gclient          
40753       40761        0.41s         2.04s           0K       -11.7M        mtx            mtx            --          -          1       S            9        25%       5gclient          
40753       40762        0.48s         2.46s           0K       -11.7M        mtx            mtx            --          -          1       R            1        30%       5gclient          
40753       40763        0.00s         0.00s           0K       -11.7M        mtx            mtx            --          -          1       S           23         0%       5gclient          
40753       40764        0.45s         2.48s           0K       -11.7M        mtx            mtx            --          -          1       S            2        29%       5gclient          
40753       40765        0.21s         1.17s           0K       -11.7M        mtx            mtx            --          -          1       S            1        14%       5gclient          
40753       40766        0.45s         2.46s           0K       -11.7M        mtx            mtx            --          -          1       S            5        29%       5gclient          
40753       41140        0.39s         1.90s           0K       -11.7M        mtx            mtx            --          -          1       R           29        23%       5gclient          
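
For what it's worth, goroutine scheduling latency can also be observed directly. Below is a minimal sketch using the runtime/metrics package (the rough p99 summarisation is illustrative only); running the process with GODEBUG=schedtrace=1000 prints scheduler state (run-queue lengths, idle Ps) every second, which is another way to check this.

package main

import (
	"fmt"
	"math"
	"runtime/metrics"
	"time"
)

// Minimal sketch (illustrative, not production code): periodically read the
// runtime's goroutine scheduling-latency histogram to see how long runnable
// goroutines wait before they get onto a P. The histogram is cumulative
// since process start.
func main() {
	samples := []metrics.Sample{{Name: "/sched/latencies:seconds"}}

	for {
		metrics.Read(samples)
		hist := samples[0].Value.Float64Histogram()

		var total uint64
		for _, c := range hist.Counts {
			total += c
		}

		// Rough p99: walk buckets until 99% of observations are covered.
		// Buckets has len(Counts)+1 entries; Buckets[i+1] is the upper bound
		// of bucket i, and the outermost bounds may be +/-Inf.
		var running uint64
		p99 := math.Inf(1)
		for i, c := range hist.Counts {
			running += c
			if total > 0 && float64(running) >= 0.99*float64(total) {
				p99 = hist.Buckets[i+1]
				break
			}
		}

		if math.IsInf(p99, 1) {
			fmt.Printf("sched latency p99: beyond last bucket (samples: %d)\n", total)
		} else {
			fmt.Printf("sched latency ~p99: %v (samples: %d)\n",
				time.Duration(p99*float64(time.Second)), total)
		}
		time.Sleep(10 * time.Second)
	}
}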

Robert Engels

May 5, 2023, 8:56:58 AM
to envee, golang-nuts
Have you tried using curl as the client in order to narrow down the problem?

On May 5, 2023, at 7:45 AM, envee <neeraj....@gmail.com> wrote:

An update. 

Jan Mercl

May 5, 2023, 9:01:22 AM
to envee, golang-nuts
On Fri, May 5, 2023 at 2:45 PM envee <neeraj....@gmail.com> wrote:

> Still, I don't understand why I cannot achieve higher throughput than 9000 per second.

What is the max bitrate of the network interface used in the
measurement and what is the size of the payload that has to get
through for every connection?

envee

May 5, 2023, 10:21:12 AM
to golang-nuts
Thanks Jan.
My interface speed is 20 Gb/s.
The payload size is about 2.8 KB.
So at 9000 Txn/s this works out to roughly 9000 × 2.8 KB ≈ 25 MB/s ≈ 200 Mb/s, which matches the so (socket out) value shown in my atop output.
I guess the interface speed is sufficient?
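
As a quick back-of-the-envelope check (the per-transaction size is an approximation):

package main

import "fmt"

// Back-of-the-envelope check that the 20 Gb/s NIC is not the bottleneck
// (the per-transaction size is approximate).
func main() {
	const (
		txnPerSec   = 9000.0
		bytesPerTxn = 2800.0 // ~2.8 KB payload
		nicBps      = 20e9   // 20 Gb/s interface
	)
	loadBps := txnPerSec * bytesPerTxn * 8 // ≈ 2.0e8 bit/s ≈ 200 Mb/s
	fmt.Printf("offered load: %.0f Mb/s (%.1f%% of the NIC)\n",
		loadBps/1e6, 100*loadBps/nicBps)
}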

envee

May 5, 2023, 10:23:38 AM
to golang-nuts
Hi Robert, yes, I have tried an HTTP request via curl; that yields a response in about 200 ms.

Robert Engels

May 5, 2023, 10:35:04 AM
to envee, golang-nuts
I mean you can use curl to simulate the thousands of clients (depending on protocol dependencies).

On May 5, 2023, at 9:23 AM, envee <neeraj....@gmail.com> wrote:

Hi Robert, yes, I have tried an HTTP request via curl; that yields a response in about 200 ms.

Jan Mercl

May 5, 2023, 10:49:57 AM
to envee, golang-nuts
On Fri, May 5, 2023 at 4:21 PM envee <neeraj....@gmail.com> wrote:
>
> Thanks Jan.
> My interface speed is 20 Gb/s.
> The payload size is about 2.8 KB.
> So at 9000 Txn/s this works out to roughly 9000 × 2.8 KB ≈ 25 MB/s ≈ 200 Mb/s, which matches the so (socket out) value shown in my atop output.
> I guess the interface speed is sufficient?

Yes it is.

Next question: what's the speed of the slowest segment en route? ;-)