Hello Stéphane,
The problem can be reproduced with a stock nginx with SSL and HTTP/2 enabled, because by default nginx recycles an HTTP/2 connection every 1000 requests.
I've just modified the standard nginx config (from the standard Ubuntu package) to enable HTTP/2 and SSL:
server {
    listen 443 ssl http2 default_server;
    include snippets/snakeoil.conf;
    root /var/www/html;
    server_name localhost;
}
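To trigger the connection recycling more often, and so shorten the reproduction, the request limit could presumably be lowered. This is a minimal sketch, assuming an nginx older than 1.19.7, where the http2_max_requests directive (default 1000) sets this limit; on newer versions the same limit is set with keepalive_requests. The value 100 is only an illustrative assumption:

```nginx
server {
    listen 443 ssl http2 default_server;
    include snippets/snakeoil.conf;
    root /var/www/html;
    server_name localhost;
    # Assumption: recycle each HTTP/2 connection after 100 requests
    # instead of the default 1000, so GOAWAY is sent ten times as often.
    http2_max_requests 100;
}
```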
To test, I used a slightly modified version of the Gatling BasicSimulation:
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class BasicSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://localhost") // Here is the root for all relative URLs
    .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8") // Here are the common headers
    .doNotTrackHeader("1")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0")
    .enableHttp2
    .disableCaching // caching disabled to reproduce the problem with static resources

  val scn = scenario("Scenario Name") // A scenario is a chain of requests and pauses
    .forever(
      exec(http("request_1")
        .get("/"))
        .pause(10.milliseconds)
        .exitHereIfFailed
    )

  setUp(scn.inject(constantConcurrentUsers(1) during (5.minutes)).protocols(httpProtocol))
}
Whether the problem reproduces depends on the pause duration, the latency between client and server, and probably the load on the different systems:
- with a remote system and a slow network, the problem can sometimes occur with a 100 ms pause
- with a local/fast server, I had to reduce the pause to 10 ms
- in my real test I use pace, so when the system starts to slow down, I believe the effective pause time shrinks a lot, which would explain why the problem occurs
When the error occurs, the following messages can be seen:
---- Errors --------------------------------------------------------------------
> i.n.h.c.h.Http2Exception$StreamException: Cannot create stream 1 (100,0%)
and then:
> i.g.h.c.i.RequestTimeoutException: Request timeout to localhos 1 (20,00%)
t/127.0.0.1:443 after 60000 ms
What I suspect is a kind of race condition between Gatling starting to use a connection (from the Netty pool, as I understand it) and Netty processing a GOAWAY frame just after.
In addition:
- I think this kind of problem may also occur if, by bad luck, a full GC happens on the Gatling side (I know, we should avoid full GCs at any cost ...)
- The problem disappears if you switch back to HTTP/1.1
Thanks for your help, and tell me if you need more info on the problem.
Regards,
Emmanuel