To try to reach 10k req/s, I've been running Gatling 2.2.1 in parallel, distributed over 8 AWS m4.xlarge instances.
The tests keep failing well before I reach that mark, with many connect and read timeouts.
Output: "(...)j.n.ConnectException: Connection timed out" (and read timeouts)
All the instances' kernel parameters are tuned according to http://gatling.io/docs/2.2.1/general/operations.html (see below for limits and sysctl configuration I'm using).
Any clues as to what might be causing this limitation? One thing I noticed is that once Gatling starts reporting large numbers of these exceptions, the active connections drop to almost 0 (measured on the server side).
The simulation looks something like this:
import scala.concurrent.duration._
import scala.math.ceil
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._
import io.gatling.core.structure.ScenarioBuilder
import io.gatling.core.structure.PopulationBuilder
import io.gatling.http.request.builder.HttpRequestBuilder

class TestFooBar extends Simulation {
  var httpConfig = http.disableWarmUp.baseURL("https://www.foobar.com")
  val date_format = new java.text.SimpleDateFormat("yyyy-MM-dd")
  var date_string = date_format.format(new java.util.Date())
  val full_duration = 20 minutes

  val rps_scale = 1f

  def createSimpleUrlScenario(url: String, users: Int) : PopulationBuilder = {
    val scn = scenario(url).exec(
        http(url)
          .get(url)
      )
      .inject(
        rampUsersPerSec(1) to(ceil(users * rps_scale).toInt) during(full_duration)
      )
      .throttle(
        jumpToRps(1),
        reachRps(ceil(users * rps_scale).toInt) in (full_duration)
      )
      .protocols(
        httpConfig
      )
    return scn
  }

  def createSimpleUrlScenario(urlBuilder: HttpRequestBuilder, users: Int) : PopulationBuilder = {
    val scn = scenario(urlBuilder.toString).feed(csv("foo/tokens.csv").circular).exec(
        urlBuilder
      )
      .inject(
        rampUsersPerSec(1) to(ceil(users * rps_scale).toInt) during(full_duration)
      )
      .throttle(
        jumpToRps(1),
        reachRps(ceil(users * rps_scale).toInt) in (full_duration)
      )
      .protocols(
        httpConfig
      )
    return scn
  }

  setUp(
    createSimpleUrlScenario("/ajax/service1", 5),
    createSimpleUrlScenario("/ajax/service2", 46),
    createSimpleUrlScenario("/ajax/service3", 32),
    createSimpleUrlScenario("/ajax/service4", 29),
    createSimpleUrlScenario(
      http("/ajax/service5")
        .get("/ajax/service5")
        .headers(Map(
          "Cookie" -> "${token}")), 15)
  )
  .maxDuration(120 minutes)
}
sysctl
---
fs.file-max=300000
fs.nr_open=300000
net.core.netdev_max_backlog=300000
net.core.rmem_default=8388608
net.core.rmem_max=134217728
net.core.somaxconn=40000
net.core.wmem_default=8388608
net.core.wmem_max=134217728
net.ipv4.ip_local_port_range=1025 65535
net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_max_syn_backlog=40000
net.ipv4.tcp_mem=134217728 134217728 134217728
net.ipv4.tcp_moderate_rcvbuf=1
net.ipv4.tcp_rmem=4096 277750 134217728
net.ipv4.tcp_sack=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_wmem=4096 277750 134217728
limits
---
* soft nofile 300000
* hard nofile 300000
Thanks,
Joao
Wed 8 Jun 17:08:06 BST 2016
70901 active connections openings
38 passive connection openings
Wed 8 Jun 17:08:08 BST 2016
70901 active connections openings
38 passive connection openings
my_server$ while true;do date;netstat -s|grep openings;sleep 1;done
Wed 8 Jun 16:54:34 BST 2016
169 active connections openings
17524 passive connection openings
Wed 8 Jun 16:54:35 BST 2016
169 active connections openings
17524 passive connection openings
....
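The figures printed by netstat -s are cumulative counters, so what matters is the difference between two samples, not the absolute value. Here is a minimal Scala sketch of that arithmetic using the values quoted above. The object and method names are made up for illustration, and I am assuming the first pair of samples (70901 active openings) was taken on a load generator.

```scala
object ConnectionRate {
  // New connections per second between two samples of a cumulative counter.
  def openingsPerSecond(before: Long, after: Long, seconds: Long): Double =
    if (seconds <= 0) 0.0 else (after - before).toDouble / seconds

  def main(args: Array[String]): Unit = {
    // Samples at 17:08:06 and 17:08:08, both showing 70901 active openings.
    val clientRate = openingsPerSecond(70901L, 70901L, 2L)
    // Server samples at 16:54:34 and 16:54:35, both showing 17524 passive openings.
    val serverRate = openingsPerSecond(17524L, 17524L, 1L)
    println(s"client: $clientRate new connections/s, server: $serverRate accepted/s")
  }
}
```

Both deltas are zero, so during those intervals no new connections were opened on either side.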
Another cross-check is to reason about how many concurrent (active) connections you would expect from your Gatling simulation:
average response time of the URLs: for example, 20 ms
total inject rate: 100 users per second (about what is in your simulation)
average concurrent (active) users = 0.02 * 100 = 2, i.e. not a lot.
With 1 request per user and an inject rate of around 100 users per second, you are only going to reach about 100 rps.
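To make that estimate concrete, here is a rough sketch in plain Scala (no Gatling needed) that adds up the peak injection rates from the setUp(...) call and applies the same rate-times-response-time arithmetic (Little's law). The 20 ms response time is the example figure from above, not a measurement. I am also assuming that every one of the 8 instances runs this same simulation with rps_scale = 1. The object and method names are made up.

```scala
object LoadEstimate {
  // Peak users/sec per scenario, copied from the setUp(...) call in the simulation.
  val peakUsersPerSec: Seq[Int] = Seq(5, 46, 32, 29, 15)

  // Little's law: average concurrency = arrival rate * average time in the system.
  def concurrentUsers(arrivalsPerSec: Double, avgResponseSec: Double): Double =
    arrivalsPerSec * avgResponseSec

  def main(args: Array[String]): Unit = {
    val perInstance = peakUsersPerSec.sum // 127 users/sec at the end of the ramp
    val instances = 8                     // from the question
    val avgResponseSec = 0.020            // assumed 20 ms, not measured
    println(s"peak rate per instance: $perInstance req/s")
    println(s"peak rate over $instances instances: ${perInstance * instances} req/s")
    println(f"concurrent users per instance: ${concurrentUsers(perInstance, avgResponseSec)}%.2f")
  }
}
```

Under those assumptions the whole fleet peaks at 127 * 8 = 1016 req/s, so the simulation as posted would not get near 10k req/s even with no timeouts at all.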
Also check the output of ss -nat|wc -l
against your port range, net.ipv4.ip_local_port_range=1025 65535
I think you have done this, but you mentioned only open connections, not all connections.
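As a rough illustration of that comparison, the sketch below works out how many ephemeral ports the configured range provides and what fraction a given socket count would consume. The socket count of 60000 is a made-up placeholder for whatever ss -nat|wc -l reports on your load generators, and the names are hypothetical. Keep in mind that the ss count also includes a header line and listening sockets, and that the port limit applies per source IP and destination address/port pair, so treat this as an approximation.

```scala
object PortBudget {
  // Parses a value such as "1025 65535" (the net.ipv4.ip_local_port_range setting).
  def ephemeralPorts(range: String): Int = {
    val parts = range.trim.split("\\s+").map(_.toInt)
    parts(1) - parts(0) + 1 // both ends of the range are usable
  }

  // Fraction of the ephemeral range consumed by `sockets` connections to one destination.
  def usage(sockets: Int, ports: Int): Double = sockets.toDouble / ports

  def main(args: Array[String]): Unit = {
    val ports = ephemeralPorts("1025 65535") // 64511 ports
    val sockets = 60000                      // hypothetical count from ss -nat|wc -l
    println(f"$sockets sockets use ${usage(sockets, ports) * 100}%.1f%% of $ports ports")
  }
}
```

If the count of sockets in all states (including TIME_WAIT) gets close to the size of the range, new outgoing connections can start to fail even though few connections are actually established.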


