Strange WS connection timeouts when ramping users

394 views
Skip to first unread message

Constantin Gerstberger

unread,
Feb 29, 2016, 12:41:18 PM2/29/16
to Gatling User Group
Hi,

i've run into a (to me) strange problem while load-testing the WebSocket endpoint of a SockJs-server:

Whenever i set the scenario to spawn all users at once, everything works fine (there has been only one rare case in which the server did not accept any connection at all).
However, when spawning 10% of the user immediately and ramping up the rest over time, in most (~50%-80%) of the runs, 3-8% of the ws-open requests time-out and subsequent actions for the involved connections fail. 

Some additional observations:
  • Which connections are affected, seems rather random
  • It seems to happen less likely when the ramp duration is <= 5 seconds
  • The server does not report any errors (unless they are buried within the debug log ...)
  • The client reports a timeout of the HTTP-Upgrade request after ~60 seconds
  • Disabling the firewall on my machine did not change anything
  • I regularly see gatling.http.action.ws.CheckTimeout's while running the simulation, though their numbers does not match the one of failed connections (see below)
  • I've run multiple successful load-tests using 500 users for ~3 mins resulting in ~5k requests/s, so i do not think file handles or the like at the server side are causing the issue
  • I've run a mostly equivalent load-test using HTTP request-response instead of WS messages and did not have any problems in about 20 runs
Here's the simulation code (client and server simply ping-pong mostly the same json content):

val numAtOnceUsers = 10
val numRampUsers = 90
val rampUpDuration = 9.seconds

val numberOfMessagesSentPerUser = 10
val pingPongTimeout = 2.seconds
val timeBetweenPings = 0.1.seconds

val data = Data(numberOfTargetIds = 1, messages = Set("foo"))

val httpConf = http
  .wsBaseURL("ws://localhost:9000")

val incrementalUserId = new AtomicInteger

val scn = scenario("WsLoadSimulation")
 
.exec(
    _
.set("clientId", incrementalUserId.getAndIncrement())
 
)
 
.exec(ws("Open WebSocket")
   
.open("/session/websocket?clientId=${clientId}")
 
)
 
.repeat(numberOfMessagesSentPerUser, "i") {
    feed
(data.pingPongFeeder, Feeds.oneExpression)
     
.exec(
       
ws("Send Ping Message")
         
.sendText("${ping}")
         
.check((wsAwait within pingPongTimeout until 1).message is "${pong}")
     
)
     
.pause(timeBetweenPings)
 
}
 
.exec(ws("Close WebSocket")
   
.close
 
)

setUp
(
 
scn.inject(
    atOnceUsers
(numAtOnceUsers),
    rampUsers(numRampUsers) over rampUpDuration
  )
) protocols httpConf

Environment
  • Gatling 2.1.7
  • play2-sockjs 0.5.0-SNAPSHOT
  • Play framework  2.5.0-M2
I cannot change the Play version at this point because i cannot downgrade the play2-sockjs (plugin) version. Given that everything works fine when not ramping up users, i do not think the server is to blame, too.

CheckTimeout example log:

INFO] [02/29/2016 18:32:20.012] [GatlingSystem-akka.actor.default-dispatcher-9] [akka://GatlingSystem/deadLetters] Message [io.gatling.http.action.ws.CheckTimeout] from Actor[akka://GatlingSystem/deadLetters] to Actor[akka://GatlingSystem/deadLetters] was not delivered. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.


Netty timeout log



18:36:21.308 [DEBUG] c.n.h.c.p.n.r.t.TimeoutTimerTask - Request timed out to localhost/127.0.0.1:9000 of 60000 ms for NettyResponseFuture{currentRetry=0,

       isDone=false,

       isCancelled=false,

       asyncHandler=com.ning.http.client.ws.WebSocketUpgradeHandler@1ae11f07,

       nettyRequest=com.ning.http.client.providers.netty.request.NettyRequest@5fe3923f,

       content=null,

       uri=ws://localhost:9000/session/websocket?clientId=64,

       keepAlive=true,

       httpHeaders=null,

       exEx=null,

       redirectCount=0,

       timeoutsHolder=com.ning.http.client.providers.netty.request.timeout.TimeoutsHolder@402ef78,

       inAuth=false,

       statusReceived=false,

       touch=49009365} after 60007 ms


18:36:21.308 [DEBUG] c.n.h.c.p.n.c.ChannelManager - Closing Channel [id: 0x529af2fe, /127.0.0.1:64137 => localhost/127.0.0.1:9000]


18:36:21.308 [DEBUG] c.n.h.c.p.n.r.NettyRequestSender - Aborting Future NettyResponseFuture{currentRetry=0,

       isDone=false,

       isCancelled=false,

       asyncHandler=com.ning.http.client.ws.WebSocketUpgradeHandler@1ae11f07,

       nettyRequest=com.ning.http.client.providers.netty.request.NettyRequest@5fe3923f,

       content=null,

       uri=ws://localhost:9000/session/websocket?clientId=64,

       keepAlive=true,

       httpHeaders=null,

       exEx=null,

       redirectCount=0,

       timeoutsHolder=com.ning.http.client.providers.netty.request.timeout.TimeoutsHolder@402ef78,

       inAuth=false,

       statusReceived=false,

       touch=49009365}


18:36:21.308 [DEBUG] c.n.h.c.p.n.h.Processor - Channel Closed: [id: 0x529af2fe, /127.0.0.1:64137 :> localhost/127.0.0.1:9000] with attribute INSTANCE


18:36:21.313 [DEBUG] c.n.h.c.p.n.r.NettyRequestSender - Request timed out to localhost/127.0.0.1:9000 of 60000 ms


java.util.concurrent.TimeoutException: Request timed out to localhost/127.0.0.1:9000 of 60000 ms

       at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:47) [async-http-client-1.9.30.jar:na]

       at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:48) [async-http-client-1.9.30.jar:na]

       at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) [netty-3.10.4.Final.jar:na]

       at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) [netty-3.10.4.Final.jar:na]

       at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) [netty-3.10.4.Final.jar:na]

       at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:na]

       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]


18:36:21.313 [DEBUG] i.g.h.a.w.WsActor - Websocket 'gatling.http.webSocket' failed to open: Request timed out to localhost/127.0.0.1:9000 of 60000 ms




Stéphane LANDELLE

unread,
Feb 29, 2016, 12:59:21 PM2/29/16
to gat...@googlegroups.com
Hi,

Frankly, the only way to investigate this is to provide a sample application to reproduce.

Cheers,

Stéphane Landelle
GatlingCorp CEO


--
You received this message because you are subscribed to the Google Groups "Gatling User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gatling+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Constantin Gerstberger

unread,
Mar 2, 2016, 5:58:27 AM3/2/16
to Gatling User Group
Hi,

thanks for your quick response.
Took me a while to get feedback on this but unfortunately, i cannot provide the code for now.

Regards,
Constantin
...
Reply all
Reply to author
Forward
0 new messages