This sounds like a reasonable feature to me. In many ways, it's similar to Postgres' Smart Shutdown mode:
After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally. It shuts down only after all of the sessions terminate.
Implementing this in Swarm via a signal also seems reasonable. It would work well with, e.g., systemd, which could send either SIGTERM or SIGINT to the Swarm client as appropriate (just like in the Postgres systemd unit file).
On receiving the signal, the client would need to communicate with the server to start the graceful shutdown, which requires a new backend endpoint. When called, this endpoint would invoke the API equivalent of the "Mark this node temporarily offline" feature in the UI (which stops new tasks from being scheduled on the node while letting the current task complete). The endpoint would live in plugin/src/main/java/hudson/plugins/swarm/PluginImpl.java and would look something like this:
Node node = getNodeByName(name, rsp);
Computer computer = node.toComputer(); // null if the node has no computer
if (computer != null) computer.setTemporarilyOffline(true);
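On the client side, the signal handling could be sketched roughly as follows. This is a minimal sketch, not existing Swarm client code: the class name `GracefulShutdownHook` and the `onShutdown` callback (which would call the new endpoint) are hypothetical. It relies on a standard JVM shutdown hook, which runs on both SIGTERM and SIGINT but also on a normal exit, so it cannot tell the two signals apart.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical client-side helper; not part of the existing Swarm client. */
final class GracefulShutdownHook {
    // Assumed callback that would ask the server to start the graceful shutdown.
    private final Runnable onShutdown;
    private final AtomicBoolean triggered = new AtomicBoolean(false);

    GracefulShutdownHook(Runnable onShutdown) {
        this.onShutdown = onShutdown;
    }

    /** Registers a JVM shutdown hook and returns it so the caller can remove it later. */
    Thread install() {
        Thread hook = new Thread(this::trigger, "swarm-graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    /** Runs the shutdown action at most once, even if called from several threads. */
    void trigger() {
        if (triggered.compareAndSet(false, true)) {
            onShutdown.run();
        }
    }

    boolean wasTriggered() {
        return triggered.get();
    }
}
```

Distinguishing SIGTERM from SIGINT (as Postgres does for smart versus fast shutdown) would need something beyond a shutdown hook, such as the unsupported `sun.misc.Signal` API, so that part is left out of the sketch.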
Once the graceful shutdown has been initiated, the client would need to wait for the node to be unused. This could be done with another backend endpoint:
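Node node = getNodeByName(name, rsp);
Computer computer = node.toComputer(); // null if the node has no computer
// isOffline() can already be true once the node is marked temporarily offline, so also require isIdle()
boolean isOffline = computer == null || (computer.isOffline() && computer.isIdle());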
The client would then poll this endpoint in a loop until it reports that the node is offline. Once the node is offline, the client could terminate. I welcome any PRs to implement this and would be happy to review them.
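The client-side polling loop could look something like the sketch below. The class `NodeDrainPoller`, the method `awaitDrained`, and the `isDrained` supplier are all hypothetical names; in a real implementation the supplier would wrap an HTTP call to the new status endpoint. A timeout is included as an assumption so that a stuck task cannot block shutdown forever.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical client-side polling helper; names are illustrative only. */
final class NodeDrainPoller {
    private NodeDrainPoller() {}

    /**
     * Polls isDrained until it returns true or the timeout elapses.
     *
     * @return true if the node was reported as drained, false on timeout or interruption
     */
    static boolean awaitDrained(BooleanSupplier isDrained, Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isDrained.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMillis = Math.min(interval.toMillis(), Math.max(1, remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller might use `NodeDrainPoller.awaitDrained(check, Duration.ofSeconds(5), Duration.ofMinutes(30))` and exit once it returns; the interval and timeout values here are placeholders, not recommendations.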