Docker docker0 bridge dies, causing 502 errors on App Engine Managed VMs


Matt Self

Oct 20, 2015, 3:35:20 PM
to Google App Engine
Has anyone hit this Docker issue on App Engine Managed VMs, where the container becomes unreachable and requests fail with 502 (Bad Gateway) errors? Around 5 times a day, for roughly 3-5 minutes each time, the container's veth link on the docker0 bridge stops working, so my app stops serving traffic. After about 3 minutes it usually recovers (after multiple attempts, or so it appears from the logs) and things start functioning properly again.

Here are the logs from a container on an instance having trouble:


gcm-Heartbeat:1445317195000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317225000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317255000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317285000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317315000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317345000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
[32872.191195] docker0: port 3(veth116458d) entered disabled state
[32872.198830] device veth116458d left promiscuous mode
[32872.292800] docker0: port 3(veth116458d) entered disabled state
Oct 20 05:02:48 gae-default-20151019t135046-o2s4 kernel: [32872.191195] docker0: port 3(veth116458d) entered disabled state
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.198830] device veth116458d left promiscuous mode
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.292800] docker0: port 3(veth116458d) entered disabled state
[32872.480267] device vethe94c175 entered promiscuous mode
[32872.610827] IPv6: ADDRCONF(NETDEV_UP): vethe94c175: link is not ready
[32872.734730] docker0: port 3(vethe94c175) entered forwarding state
[32872.749925] docker0: port 3(vethe94c175) entered forwarding state
[32872.771080] IPv6: ADDRCONF(NETDEV_CHANGE): vethe94c175: link becomes ready
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.480267] device vethe94c175 entered promiscuous mode
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.610827] IPv6: ADDRCONF(NETDEV_UP): vethe94c175: link is not ready
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.734730] docker0: port 3(vethe94c175) entered forwarding state
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.749925] docker0: port 3(vethe94c175) entered forwarding state
Oct 20 05:02:49 gae-default-20151019t135046-o2s4 kernel: [32872.771080] IPv6: ADDRCONF(NETDEV_CHANGE): vethe94c175: link becomes ready
...
[a few minutes later]
...
Oct 20 05:07:18 gae-default-20151019t135046-o2s4 kernel: [33141.570136] docker0: port 3(vethca972ab) entered forwarding state
Oct 20 05:07:18 gae-default-20151019t135046-o2s4 kernel: [33141.606546] IPv6: ADDRCONF(NETDEV_CHANGE): vethca972ab: link becomes ready
gcm-Heartbeat:1445317645000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
[33156.576281] docker0: port 3(vethca972ab) entered forwarding state
Oct 20 05:07:33 gae-default-20151019t135046-o2s4 kernel: [33156.576281] docker0: port 3(vethca972ab) entered forwarding state
gcm-Heartbeat:1445317675000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317705000
gcm-StatusUpdate:TIME=1445284556000;STATUS=ALL_COMMANDS_SUCCEEDED
gcm-Heartbeat:1445317735000
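To put numbers on each episode, a small log-parsing sketch can help. It is not from the original thread, and the function names `bridge_outage_seconds` and `heartbeat_times_utc` are hypothetical. The bracketed kernel timestamps are seconds since boot, so the span from the first "entered disabled state" on docker0 port 3 to the last "entered forwarding state" after it gives a rough outage window. Because the "..." in the pasted logs hides intermediate lines, this is the overall window, not a precise downtime. The sketch also assumes that the `gcm-Heartbeat` value is a Unix epoch in milliseconds, which would let those lines be lined up against the syslog wall-clock times.

```python
import re
from datetime import datetime, timezone

# dmesg-style bridge port transition, e.g.
# "[32872.191195] docker0: port 3(veth116458d) entered disabled state"
PORT_STATE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] docker0: port (?P<port>\d+)\((?P<veth>\w+)\) "
    r"entered (?P<state>disabled|forwarding) state"
)

# Heartbeat line, e.g. "gcm-Heartbeat:1445317195000".
# Assumption: the value is a Unix epoch timestamp in milliseconds.
HEARTBEAT = re.compile(r"^gcm-Heartbeat:(?P<ms>\d+)$")


def bridge_outage_seconds(lines, port="3"):
    """Hypothetical helper: seconds (kernel uptime clock) from the first
    'disabled' transition on the given docker0 port to the last 'forwarding'
    transition after it. Returns None if no such pair is found."""
    disabled, forwarding = [], []
    for line in lines:
        m = PORT_STATE.search(line)
        if m and m.group("port") == port:
            t = float(m.group("uptime"))
            (disabled if m.group("state") == "disabled" else forwarding).append(t)
    if not disabled:
        return None
    start = min(disabled)
    after = [t for t in forwarding if t > start]
    return max(after) - start if after else None


def heartbeat_times_utc(lines):
    """Hypothetical helper: convert heartbeat values (assumed epoch ms) to UTC datetimes."""
    times = []
    for line in lines:
        m = HEARTBEAT.match(line.strip())
        if m:
            times.append(datetime.fromtimestamp(int(m.group("ms")) / 1000, tz=timezone.utc))
    return times


# A few lines copied from the logs above.
SAMPLE = """\
gcm-Heartbeat:1445317345000
[32872.191195] docker0: port 3(veth116458d) entered disabled state
[32872.734730] docker0: port 3(vethe94c175) entered forwarding state
[33156.576281] docker0: port 3(vethca972ab) entered forwarding state
gcm-Heartbeat:1445317645000
""".splitlines()

if __name__ == "__main__":
    print(f"docker0 port 3 window: {bridge_outage_seconds(SAMPLE):.1f} s")
    for t in heartbeat_times_utc(SAMPLE):
        print("heartbeat at", t.isoformat())
```

On these sample lines it prints a window of 284.4 s (about 4.7 minutes, in line with the 3-5 minutes described above) and heartbeats at 2015-10-20T05:02:25+00:00 and 2015-10-20T05:07:25+00:00. Those times bracket the 05:02:48 to 05:07:33 syslog entries, assuming the host clock is in UTC. Taking the minimum and maximum, rather than relying on line order, keeps the result stable when console and syslog copies of the same kernel message are interleaved, as they are in the paste.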

Any help/thoughts/advice is appreciated. While I'm all for treating my servers like cattle and not pets, the frequency of this issue is less than ideal. :) Help!
Thanks again.

Nick (Cloud Platform Support)

Oct 22, 2015, 7:00:00 PM
to Google App Engine
Hey Matt,

From looking at the logs and reading your description, this seems like an intermittent issue that would be best looked into and solved by Google. I'm not ruling out that other users might have ideas or have seen this before, but this would make a good post for the Cloud Platform Public Issue Tracker.

The App Engine issue tracker had become something of a catch-all for issues that weren't directly related to App Engine, so we (not so) recently set up the Cloud Platform issue tracker. I mention this so that you aren't alarmed by the small number of issues in its list.