Have I understood how hot restart is supposed to work?

Tom Steavenson

Feb 25, 2021, 6:33:12 AM
to envoy-users

Hi,


> Envoy can fully reload itself (both code and configuration) without dropping any connections.

I understood that to mean "connections are transferred to the new instance", e.g. TCP state being replicated over, and wanted to know what happens when you do two hot restarts in quick succession.

What I found from testing long lived TCP connections that go via an envoy proxy:
  • Doing one hot restart:
    • The connection remained open immediately after the hot restart,
    • Once 'parent-shutdown-time-s' had elapsed, the connection was dropped.
      • The log line for shutting down the parent immediately preceded the connection being dropped.
  • Doing two hot restarts in quick succession (interval << parent-shutdown-time-s)
    • The connection remained open immediately after the first hot restart,
    • The 2nd hot restart caused the original envoy instance to be killed and the connection to be dropped.

From this, it appears to me the way "hot restart" works is:

  • The new instance comes up with the new config and starts accepting traffic.
  • The old instance is left running so that in-flight requests can complete and TCP connections can close in their own time (provided that takes less than the configured parent-shutdown-time-s).
  • During the drain-time-s the old instance tries to "gracefully" reject new traffic.
  • After a time parent-shutdown-time-s the parent is shut down and any connections it still has are dropped.
  • Only two instances of the same "logical envoy instance" are allowed to exist simultaneously, so if you perform a 2nd hot restart less than parent-shutdown-time-s after the first, the original parent is killed.
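
To make the timing in the list above concrete, here is a minimal sketch of it as a small model. It is not Envoy code: the function name `parent_exit_times` and its parameters are made up for illustration, and it only encodes the behaviour I observed (a parent exits parent-shutdown-time-s after the restart that replaced it, or immediately when a further hot restart arrives first).

```python
from typing import List


def parent_exit_times(restart_times: List[float], parent_shutdown_s: float) -> List[float]:
    """For each hot restart, return the time at which the instance it replaced exits.

    restart_times: ascending timestamps (seconds) of successive hot restarts.
    parent_shutdown_s: the configured parent-shutdown-time-s value.
    Hypothetical model of the observed behaviour, not Envoy's implementation.
    """
    exits = []
    for i, restarted_at in enumerate(restart_times):
        # Normal case: the parent is shut down after the configured delay.
        graceful_exit = restarted_at + parent_shutdown_s
        if i + 1 < len(restart_times):
            # A later hot restart kills the parent early, because only two
            # instances may exist at once.
            exits.append(min(graceful_exit, restart_times[i + 1]))
        else:
            exits.append(graceful_exit)
    return exits


# One restart at t=0 with a 900 s shutdown time: the parent lives until t=900.
single = parent_exit_times([0.0], 900.0)
# Two restarts 10 s apart: the original parent is killed at t=10.
double = parent_exit_times([0.0, 10.0], 900.0)
```

Any connection still held by a parent is dropped at that parent's exit time, which matches the two test cases I ran.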

Is this right?

Is there a way to prevent the sudden killing of the original parent on a 2nd hot restart?

If the above is accurate and there isn't a way to change this behavior, should the docs perhaps not claim that no connections are dropped? They could also point out that whatever triggers the hot restarts needs to be aware of the value of parent-shutdown-time-s, and either ensure it doesn't perform hot restarts at an interval shorter than that time or be prepared for the consequences of doing so.
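
As a rough illustration of what "being aware of parent-shutdown-time-s" could look like on the triggering side, here is a minimal sketch of a guard that reports how long to wait before the next hot restart. The function name `seconds_until_safe_restart` and its arguments are hypothetical; nothing like this is provided by Envoy as far as I know.

```python
def seconds_until_safe_restart(last_restart_at: float, now: float,
                               parent_shutdown_s: float) -> float:
    """Return how long to wait so the previous parent has already exited.

    last_restart_at: timestamp (seconds) of the most recent hot restart.
    now: current timestamp (seconds).
    parent_shutdown_s: the configured parent-shutdown-time-s value.
    Hypothetical helper for whatever process triggers the restarts.
    """
    earliest_safe = last_restart_at + parent_shutdown_s
    # Never return a negative wait; 0.0 means a restart is safe right now.
    return max(0.0, earliest_safe - now)


# 30 s after a restart with a 60 s shutdown time: wait another 30 s.
wait = seconds_until_safe_restart(100.0, 130.0, 60.0)
```

The caller could sleep for the returned duration, or go ahead anyway and accept that the older parent's remaining connections will be dropped.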

Thanks!

Matt Klein

Feb 25, 2021, 7:07:38 PM
to Tom Steavenson, envoy-users
Yes, that is correct. Documentation clarification PRs appreciated!