Hi all,
The Loggregator Traffic Controller in CF v200 does not always close its websocket connections to doppler properly.
This has two problematic outcomes:
1) In general, the Traffic Controller leaks goroutines whenever it fails to close a connection. Over time this degrades logging performance: memory usage and the number of open connections grow, and garbage collection slows down.
2) This behavior also appears to aggravate a bug related to hardware interrupts in Linux kernels v3.12 through v3.15 (present in several stemcells prior to 2859). Loggregator/doppler and loggregator_trafficcontroller VMs may become unresponsive during shutdown. A hung VM must then be killed manually through your underlying infrastructure's interface (AWS: console; VMware: vSphere) and redeployed.
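The leak described in item 1 follows a common Go pattern: a goroutine blocked reading from a connection only returns once that connection is closed or fails. The sketch below is a hypothetical illustration, not Traffic Controller code. It assumes `net.Pipe` as a stand-in for the websocket to doppler, and `forward` and `session` are made-up names. The doppler end of the pipe is never closed, mimicking a doppler that stays up after the client goes away.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// forward stands in for a hypothetical goroutine that reads from a
// doppler websocket. It blocks in Read until the connection is closed
// or fails, so it only exits when someone closes upstream.
func forward(upstream net.Conn, done chan<- struct{}) {
	defer close(done)
	buf := make([]byte, 512)
	for {
		if _, err := upstream.Read(buf); err != nil {
			return // io.ErrClosedPipe once upstream is closed
		}
	}
}

// session simulates one client stream ending while doppler (the other
// end of the pipe, deliberately discarded and never closed) stays up.
// It reports whether forward's goroutine is still running afterwards.
func session(closeUpstream bool) (leaked bool) {
	upstream, _ := net.Pipe()
	done := make(chan struct{})
	go forward(upstream, done)

	// The client has disconnected; the session is over.
	if closeUpstream {
		upstream.Close() // unblocks the pending Read
	}

	select {
	case <-done:
		return false
	case <-time.After(100 * time.Millisecond):
		return true // goroutine is still blocked in Read
	}
}

func main() {
	fmt.Println("without Close, leaked:", session(false))
	fmt.Println("with Close, leaked:", session(true))
}
```

Each session that skips the `Close` leaves one goroutine (plus its buffer and connection) behind for the lifetime of the process, which is how the cost accumulates over time.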
Work on these issues is being captured in tracker [1].
Resolution(s):
A fix for the goroutine leak in the Traffic Controller is being tested today and is expected to ship in CF v201.
We also recommend upgrading as soon as possible to stemcell 2859, which contains Linux kernel v3.16.
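To check whether a given VM is running one of the affected kernels, you can compare the output of `uname -r` against the v3.12 through v3.15 range. A minimal Go sketch, assuming the range is inclusive and that the release string begins with `<major>.<minor>`; `affectedKernel` is a hypothetical helper, not part of any CF tool.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// affectedKernel reports whether a kernel release string, as printed by
// `uname -r` (e.g. "3.13.0-32-generic"), falls in the 3.12 through 3.15
// range. Assumption: the range is inclusive.
func affectedKernel(release string) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized kernel release %q", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// The minor field may carry a suffix (e.g. "15-rc1") when there is
	// no patch level, so keep only its leading digits.
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, err
	}
	return major == 3 && minor >= 12 && minor <= 15, nil
}

func main() {
	for _, r := range []string{"3.13.0-32-generic", "3.16.0-25-generic", "4.4.0"} {
		hit, err := affectedKernel(r)
		fmt.Println(r, "affected:", hit, "err:", err)
	}
}
```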
Many Thanks,
Erik Jasiak (PM) and John Tuley (Anchor)
CF Logging and Metrics team