I have a question about target burst capacity that I hope somebody can help with. It seems to not be working as I would expect, or maybe I have the wrong understanding and need clarification.
I'm using Knative 0.19 on OCP with Kourier. My worker pod is a web server that can only handle 1 request at a time, so containerConcurrency is set to 1. Each request is a web socket connection. minScale is 1, so there is always 1 pod running initially. I have experimented with setting targetBurstCapacity: '-1' so that the activator is always in the request path. I hoped this would solve the buffering problem I'm seeing in the following scenario, but so far it does not:
When only 1 pod is running (Pod1), I make 2 web socket requests (reqA and reqB) from the command line. Each request should connect, wait 30 seconds, then disconnect.
What I expect:
reqA should be sent to Pod1, at which point the containerConcurrency is reached and a second pod begins to scale (Pod2). reqB is held in the activator.
Pod2 is now scaled and ready, so reqB is sent to Pod2. Both web socket connections are successful and complete after 30s and (30s + time to scale Pod2) respectively.
What I actually observe:
reqA is sent to Pod1. Pod2 begins to scale.
Pod2 becomes ready, but reqB is still not run; it seems to be buffered for Pod1 instead. Both requests are handled by Pod1, taking 60 seconds in total. Pod2 is not used at all.
From reading the docs, it sounds like targetBurstCapacity should solve this problem, but it seems not to. Am I misunderstanding it, or is there something else I could try?
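For reference, here is a minimal sketch of a Service with the settings described above. The name and image are placeholders, not my real values, and the layout assumes the serving.knative.dev/v1 API with the per-revision autoscaling annotations:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ws-worker                                  # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Always keep at least one pod running.
        autoscaling.knative.dev/minScale: "1"
        # -1 is meant to keep the activator in the request path at all times.
        autoscaling.knative.dev/targetBurstCapacity: "-1"
    spec:
      # Hard limit: each pod handles one request at a time.
      containerConcurrency: 1
      containers:
        - image: example.com/ws-worker:latest      # placeholder image
```

With this configuration I would expect the activator to hold reqB until a pod with free capacity is available, rather than queueing it behind reqA on Pod1.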