TL;DR: I have 16 open connections, but only one looks like it's sending keep-alive traffic. My load balancer thinks these are idle after 5 minutes and trashes them.
I have an application that talks to my graph database through a load balancer. The load balancer will shut down any connection that has had no traffic for more than 5 minutes. Any traffic counts toward this, including keep-alive packets.
To test a theory, I'm using the gremlin console configured the same as my application (many connections in a pool).
I'm watching traffic to my load balancer box with tcpdump (the dst filter captures only packets sent to it, not the replies):
sudo tcpdump -i eth0 dst loadbalancer.company.com
(Hostnames here and below are anonymized.)
For the uninitiated, this is a sample line from tcpdump:
19:33:13.653394 IP client.company.com.65503 > loadbalancer.company.com.http: Flags [S], seq 269990012, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 1164606407 ecr 0,sackOK,eol], length 0
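In a nutshell, at 7:33 pm, client.company.com (my laptop) opens a connection from local port 65503 to port 80 on the remote host loadbalancer.company.com (shown as "http" since it's a well-known port). The [S] flag marks this as the SYN packet that starts the connection.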
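For anyone who wants to pull those fields out programmatically, here is a minimal Python sketch. The regular expression and the parse_line name are my own illustration, not part of tcpdump, and the pattern is only an assumption based on the line format shown above (IPv4, hostnames resolved, default output).

```python
import re

# Assumed pattern for default tcpdump output such as:
#   HH:MM:SS.ffffff IP src.port > dst.port: Flags [..], ..., length N
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+)\.(?P<sport>[\w-]+) > "
    r"(?P<dst>\S+)\.(?P<dport>[\w-]+): "
    r"Flags \[(?P<flags>[^\]]*)\].*length (?P<length>\d+)"
)

def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["length"] = int(fields["length"])  # payload length in bytes
    return fields
```

Calling parse_line on the sample line above gives the source host, the source port, the destination, the TCP flags, and the payload length as separate values. Ports stay strings because tcpdump prints well-known ports by name ("http") unless it is run with -n.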
I turned on tcpdump and left it running while I started a gremlin command-line and connected to my remote loadbalancer. My graph database is on the other side of the load balancer.
I see a bunch of lines, all with different source ports on client.company.com, which I'm interpreting as all the separate connections being opened. For testing, I've configured the gremlin console with both the minimum and maximum number of connections set to 16.
I see source ports 65503, 65504, 65505, 65507, 65508, and so on.
When the flurry of start-up packets is over, it settles down into a cadence of sending this every 10 seconds:
19:39:24.151451 IP client.company.com.49174 > loadbalancer.company.com.http: Flags [P.], seq 612:618, ack 223, win 4121, options [nop,nop,TS val 1164959312 ecr 73550223], length 6: HTTP
19:39:24.228576 IP client.company.com.49174 > loadbalancer.company.com.http: Flags [.], ack 225, win 4121, options [nop,nop,TS val 1164959385 ecr 73552723], length 0
Every time I see this, it's always from source port 49174. I'm interpreting this as a keep-alive on one of the open connections.
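To check this more rigorously than by eyeballing the output, something like the following Python sketch could record when each source port last sent a packet and list the ports that have been quiet longer than the 5-minute limit. The function names and the regular expression are hypothetical, it assumes the default tcpdump line format shown above, and it ignores captures that run past midnight because tcpdump prints only the time of day.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern: timestamp, then "IP host.port > ..." as in the lines above.
SRC_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) IP \S+\.([\w-]+) > ")

# The load balancer's idle timeout described in this post.
IDLE_LIMIT = timedelta(minutes=5)

def last_seen_by_port(lines):
    """Map each source port to the timestamp of its most recent packet."""
    last_seen = {}
    for line in lines:
        m = SRC_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            last_seen[m.group(2)] = ts
    return last_seen

def idle_ports(last_seen, now, limit=IDLE_LIMIT):
    """Return the ports whose last packet is older than the limit, sorted."""
    return sorted(port for port, ts in last_seen.items() if now - ts > limit)
```

Feeding it the saved tcpdump output and a "now" parsed with the same time format would show whether 49174 really is the only port still sending, and which of the other connections are already past the timeout.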
So, this smells funny!
Shouldn't every open connection be sending keep-alives every 10 seconds instead of only this one connection?
Thanks for lending your brain to this,
- Rich