Advice about TCP Keep-Alive settings for raw TCP Socket API?

Andrew Sutherland

Feb 3, 2015, 9:15:38 AM

I raised an issue on the TCP and UDP socket web API spec (http://www.w3.org/2012/sysapps/tcp-udp-sockets/) about exposing TCP keep-alive settings: https://github.com/sysapps/tcp-udp-sockets/issues/80. I did this because, in the Firefox OS Gaia email app, we were seeing connections that were effectively dead, but without reasonable TCP keep-alive settings we weren't noticing this (and were then, presumably, at the mercy of the TCP retransmit timeouts).

Especially since we have so far worked around the problem in the email app by more aggressively closing otherwise-idle connections, I do not feel I have sufficient wisdom/experience to confidently suggest the right defaults, or whether it's best to expose the raw Linux TCP stack settings or something else.

So, if those on this list have opinions, especially those backed up by data, it would be great if you could chime in on https://github.com/sysapps/tcp-udp-sockets/issues/80. It's probably worth noting that, from the perspective of Firefox OS, the TCP socket API is a legacy API that exists to support existing pre-web technologies, while WebSockets and its origin-based security model are the way forward (or WebRTC's peer-to-peer mechanisms, etc.). So the API doesn't have to be perfect.

Andrew

Patrick McManus

Feb 3, 2015, 10:27:35 AM

to Andrew Sutherland, Steve Workman, dev-tech-network

cc: Steve, who did the Firefox work around this.

One of the issues you will run into is OS portability - you might very well
standardize something that isn't portably implementable on standard
kernels.
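
To make the portability concern concrete, here is a minimal sketch in Python of enabling keep-alive and then applying the Linux-style tuning knobs only where the platform exposes them. The function name `enable_keepalive` and its default values are hypothetical, chosen for illustration; they are not taken from any spec or from Firefox. Only `SO_KEEPALIVE` is assumed to be available everywhere; `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are looked up at runtime because not every OS defines all three.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keep-alive and apply tuning where the platform allows it.

    Returns the names of the tuning options that were actually applied.
    The defaults are illustrative assumptions, not recommended values.
    """
    # The on/off switch is the portable part.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    applied = []
    for name, value in wanted:
        option = getattr(socket, name, None)
        if option is None:
            continue  # this platform does not define the constant
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # defined, but rejected by this kernel
        applied.append(name)
    return applied
```

A caller can inspect the returned list to see how much of the requested tuning took effect; an API that promised all three knobs would have nothing sensible to do on a platform where the list comes back short.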

Keeping sessions open on the order of 2 or 3 minutes with a tiny bit of
keep-alive, and closing them after that point, seems to work pretty well.
That's the HTTP/2 strategy right now. We've looked at the distribution of
timeouts in the past - there is a notable spike around 60 seconds, but it
still only made up a small fraction of the total distribution.

If you have some kind of no-op application-layer method (such as SMTP
NOOP :)) you can pipeline that just before your real operation and put a
timer on that response, creating a new session if it fails. Because of the
pipelining, this doesn't add any latency when everything is fine. That's a
bit better than open-ended keep-alive in my opinion, particularly if you
really hold that connection open for a long duration. But some protocols
(HTTP/1, e.g.) aren't well suited to that.
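
A rough sketch of that pipelined-probe idea, in Python, assuming a line-oriented protocol with CRLF-terminated replies. The function name `send_with_probe`, the timeout value and the return shape are all hypothetical; a real client would also need to parse the reply codes and handle the response to the real command.

```python
import socket
import time


def send_with_probe(sock, probe, command, probe_timeout_s=5.0):
    """Pipeline a no-op ahead of the real command and time the no-op's reply.

    Returns (first_reply_line, leftover_bytes) if a complete reply line
    arrives within probe_timeout_s, otherwise None. On None the caller
    should treat the session as dead, open a new one and retry.
    """
    # One write carries both requests, so a healthy connection pays no
    # extra round trip for the probe.
    sock.sendall(probe + command)

    deadline = time.monotonic() + probe_timeout_s
    buf = b""
    try:
        while b"\r\n" not in buf:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # the timer on the probe reply expired
            sock.settimeout(remaining)
            chunk = sock.recv(4096)
            if not chunk:
                return None  # peer closed the connection
            buf += chunk
    except OSError:
        return None  # timeout or connection error
    line, leftover = buf.split(b"\r\n", 1)
    return line, leftover
```

The timer covers only the probe's reply, so a slow real operation is not mistaken for a dead connection; any bytes that arrived after the first line are handed back rather than dropped.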

There are of course severe tradeoffs in terms of battery and network
overhead in using TCP keep-alives. Most keep-alive transmissions are going
to be full radio wakeups. I would think that those tradeoffs would be
especially painful for Firefox OS. Legacy protocols using that API might be
better off just taking the performance hit of establishing new connections.

On Tue, Feb 3, 2015 at 9:15 AM, Andrew Sutherland <somb...@gmail.com>
wrote:

> _______________________________________________
> dev-tech-network mailing list
> dev-tech...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-network
>
>

Jason Duell

Feb 3, 2015, 7:31:37 PM

to Patrick McManus, Andrew Sutherland, Steve Workman, Fabrice Desré, Sicking, Jonas, dev-tech-network

+1 to everything Patrick said.

Note that our TCP keepalive strategy on B2G at the moment is (in the
absence of other network activity) to ping every 10 seconds for the 1st
minute a connection is alive, and then we drop off sharply to pinging every
10 minutes. Keepalive pings are disabled for long-lived XHRs (and IIRC
websockets too? Steve, do you remember?) and also IIRC HTTP/2 (since HTTP/2
has its own ping).
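
The schedule just described (every 10 seconds for the first minute, then every 10 minutes) can be modelled in a few lines of Python to see how many probes an idle connection costs. This is a simplified model, not the Firefox implementation: the function names are hypothetical, and the exact behaviour at the one-minute boundary is an assumption.

```python
def keepalive_interval_s(connection_age_s, short_window_s=60,
                         short_interval_s=10, long_interval_s=600):
    """Seconds until the next probe for a connection of the given age."""
    if connection_age_s < short_window_s:
        return short_interval_s
    return long_interval_s


def probes_in_idle_period(idle_duration_s):
    """Count probes sent while a new connection sits idle for this long."""
    elapsed = 0
    count = 0
    while True:
        elapsed += keepalive_interval_s(elapsed)
        if elapsed > idle_duration_s:
            return count
        count += 1
```

Under this model an idle hour costs 11 probes, 6 of them in the first minute. Each probe after the first minute is likely to be its own radio wakeup, which is where the battery cost comes from.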

There's an inherent latency/power-use tradeoff here. I've long wondered
what the right tradeoff is for B2G (I've leaned towards stopping keepalive
pings after the 1st minute, to save battery). If someone has thoughts here
I'm interested to hear them. Perhaps it'll be moot if we teach apps to
close connections that have been idle.
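
For the "close connections that have been idle" option, a minimal sketch of the bookkeeping an app would need, in Python. The class name `IdleTracker` is hypothetical, and the 180-second default is only an assumption borrowed from the "2 or 3 minutes" figure mentioned earlier in the thread, not a measured recommendation.

```python
import time


class IdleTracker:
    """Tracks the last activity on a connection and says when to close it."""

    def __init__(self, max_idle_s=180.0, clock=time.monotonic):
        self._max_idle_s = max_idle_s
        self._clock = clock  # injectable so the policy can be tested
        self._last_activity = clock()

    def touch(self):
        """Record that data was just sent or received."""
        self._last_activity = self._clock()

    def should_close(self):
        """True once the connection has been idle for max_idle_s or longer."""
        return self._clock() - self._last_activity >= self._max_idle_s
```

An app would call `touch()` on every send and receive and check `should_close()` from a timer, closing the socket and reconnecting on the next real operation instead of sending keepalive pings.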

Jason

Steve Workman

Feb 3, 2015, 8:13:04 PM

to Jason Duell, Andrew Sutherland, Patrick McManus, Fabrice Desré, Sicking, Jonas, dev-tech-network

Another +1 from me.

You mentioned effectively dead connections. It's also possible that
https://bugzilla.mozilla.org/show_bug.cgi?id=1008091 has fixed some of your
issues. That bug closes many connections when the network changes, e.g.
from wifi to cellular, or when an IP address changes.

That being said, Jason and Pat's comments are right re TCP keepalives.

Answering some of Jason's questions on TCP Keepalive Config:

The current keepalive settings in mozilla-central are here:
https://dxr.mozilla.org/mozilla-central/source/modules/libpref/init/all.js?from=all.js#1342
1st minute: ping every 10 seconds
Subsequently: ping every 10 minutes

I don't see any overrides in b2g.js, nor any other pref file.

Keepalives are disabled for SPDY and HTTP/2, but not for XHR and
Websockets. I think you're referring to the abandoned HTTP Response
Timeout; both XHR and Websockets have their own timeout mechanisms.

Steve.