Got a situation where a thread hung on a socket read (old-school
blocking socket I/O code). One side was in TCP ESTABLISHED while the
other was in FIN_WAIT_2. The customer was "upgrading" the switches at
the time this happened.
The thread will never complete. It should get a timeout exception, but
it doesn't. There is a call to Socket#setSoTimeout in the code, which
should do the job. My first thought was that there must be a bug in
setSoTimeout. I never had much faith in SO_TIMEOUT, so I was not
surprised to find a lot of bug reports related to socketRead0 hangs. It
reminded me of this blog post about a hung Postgres connection [1].
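For reference, here is a minimal sketch of what the setSoTimeout call is supposed to do: a read from a peer that never sends anything should end in a SocketTimeoutException. The class and method names (SoTimeoutDemo, readTimesOut) are made up for illustration. It only shows the documented contract on a healthy loopback connection and does not reproduce the hang.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not taken from the legacy code.
class SoTimeoutDemo {
    /** Returns true if reading from a silent peer ends in SocketTimeoutException. */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            // The peer is accepted but never writes, so the read below has nothing to return.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or EOF arrived, no timeout
            } catch (SocketTimeoutException expected) {
                return true;  // the behaviour setSoTimeout promises
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```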
I'd rather use NIO and app-level timeouts, but it is legacy code that I
can't/don't want to touch.
Been thinking of using a custom SocketFactory that wraps the sockets
with some monitoring code. Pretty ugly. It doesn't feel right.
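To make the idea concrete, here is a rough sketch of the kind of wrapper I mean. All names (WatchdogSocketFactory, WatchedSocket, maxReadMillis) are hypothetical. It assumes that closing the socket from another thread actually unblocks the stuck read, which is what the Socket#close Javadoc promises but which I have not verified against the hang described above. It also only watches reads, not writes or connects.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import javax.net.SocketFactory;

// Hypothetical sketch: a factory whose sockets are closed if a single read blocks too long.
class WatchdogSocketFactory extends SocketFactory {
    // One daemon thread that closes sockets whose reads overrun the deadline.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "socket-watchdog");
            t.setDaemon(true);
            return t;
        });

    private final long maxReadMillis;

    WatchdogSocketFactory(long maxReadMillis) { this.maxReadMillis = maxReadMillis; }

    @Override public Socket createSocket() { return new WatchedSocket(maxReadMillis); }

    // Shared helper: optionally bind, then connect; close the socket if either step fails.
    private Socket open(InetSocketAddress remote, InetSocketAddress local) throws IOException {
        Socket s = createSocket();
        try {
            if (local != null) s.bind(local);
            s.connect(remote);
            return s;
        } catch (IOException e) {
            s.close();
            throw e;
        }
    }

    @Override public Socket createSocket(String h, int p) throws IOException {
        return open(new InetSocketAddress(h, p), null);
    }
    @Override public Socket createSocket(InetAddress h, int p) throws IOException {
        return open(new InetSocketAddress(h, p), null);
    }
    @Override public Socket createSocket(String h, int p, InetAddress la, int lp) throws IOException {
        return open(new InetSocketAddress(h, p), new InetSocketAddress(la, lp));
    }
    @Override public Socket createSocket(InetAddress h, int p, InetAddress la, int lp) throws IOException {
        return open(new InetSocketAddress(h, p), new InetSocketAddress(la, lp));
    }

    /** Socket whose input stream arms a watchdog around every blocking read. */
    static final class WatchedSocket extends Socket {
        private final long maxReadMillis;

        WatchedSocket(long maxReadMillis) { this.maxReadMillis = maxReadMillis; }

        @Override public InputStream getInputStream() throws IOException {
            return new FilterInputStream(super.getInputStream()) {
                @Override public int read() throws IOException {
                    ScheduledFuture<?> w = arm();
                    try { return super.read(); } finally { w.cancel(false); }
                }
                @Override public int read(byte[] b, int off, int len) throws IOException {
                    ScheduledFuture<?> w = arm();
                    try { return super.read(b, off, len); } finally { w.cancel(false); }
                }
            };
        }

        // Schedules a task that closes this socket unless the read finishes first.
        private ScheduledFuture<?> arm() {
            return TIMER.schedule(() -> {
                try { close(); } catch (IOException ignored) { }
            }, maxReadMillis, TimeUnit.MILLISECONDS);
        }
    }
}
```

The legacy code would then see a SocketException instead of hanging forever, provided it can be made to obtain its sockets from the factory in the first place.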
Found quite a few discussions about this, but no real solutions that
don't require app-level changes.
Any thoughts? Anybody in a similar boat?
[1] https://tech.zalando.com/blog/hack-to-terminate-tcp-conn-postgres/