CF v200 Loggregator issues: Traffic Controller degrade; Linux kernel bug causes VM lockup


Erik Jasiak

Mar 2, 2015, 12:37:15 AM
to vcap...@cloudfoundry.org
Hi all,

   In CF v200, the Loggregator Traffic Controller does not always close its websocket connections to doppler properly.

   This has two problematic outcomes:
1) In general, the Traffic Controller leaks goroutines when it does not properly close a connection.  Over time this degrades logging performance: memory usage grows, open connections accumulate, and garbage collection slows.

2) This behavior appears to aggravate a bug around hardware interrupts in Linux kernel v3.12 - v3.15 (present in several stemcells prior to 2859).  Loggregator/doppler and loggregator_trafficcontroller VMs may become unresponsive during a shutdown.  A hung VM then needs to be killed manually through the infrastructure your deployment runs on (AWS: console; VMware: vSphere) and re-deployed.

Work on these issues is being captured in tracker [1].

Resolution(s):
A fix for the goroutine leak in Traffic Controller is being tested today, and should be included in CF v201.

Upgrading to stemcell 2859, which contains Linux kernel v3.16, is also recommended as soon as possible.
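
If you want to check whether a given VM is running a kernel in the v3.12 - v3.15 range mentioned above, a small Go sketch along these lines may help. The helper name inAffectedRange is hypothetical, and reading /proc/sys/kernel/osrelease assumes a Linux host. It only compares the major and minor version numbers, so it does not account for distribution kernels that may carry a backported fix.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inAffectedRange reports whether a kernel release string such as
// "3.13.0-46-generic" falls in the 3.12 to 3.15 range named in this
// announcement. Only the major and minor numbers are inspected.
func inAffectedRange(release string) bool {
	var major, minor int
	if _, err := fmt.Sscanf(release, "%d.%d", &major, &minor); err != nil {
		return false
	}
	return major == 3 && minor >= 12 && minor <= 15
}

func main() {
	// Assumes Linux: this file holds the same string as `uname -r`.
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("could not read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	fmt.Printf("kernel %s in affected range: %v\n", release, inAffectedRange(release))
}
```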

Many Thanks,
Erik Jasiak (PM) and John Tuley (Anchor)
CF Logging and Metrics team
