[JIRA] [swarm-plugin] (JENKINS-31084) Swarm slaves should allow some level of control interaction

regs@akom.net (JIRA)

Oct 21, 2015, 10:57:01 AM
to jenkinsc...@googlegroups.com
Alexander Komarov created an issue
 
Jenkins / Improvement JENKINS-31084
Swarm slaves should allow some level of control interaction
Issue Type: Improvement
Assignee: Kohsuke Kawaguchi
Components: swarm-plugin
Created: 21/Oct/15 2:56 PM
Labels: plugin
Priority: Minor
Reporter: Alexander Komarov

This is a feature idea that I'm willing to implement but I'd like to hear maintainers' thoughts on this first.

Case:

I manage slaves with Puppet. Bringing them up is easy: configure, run java. Shutting down (say, for a reboot or VM teardown) is not so easy, because I'm very likely to kill a running job. So I have a bash loop that counts the java processes, not including the swarm process itself. If the count is 0, I can shut down. But that's crude and unreliable: not all subprocesses will be java, and there is certainly a chance of a new job starting in the time it takes to kill the swarm instance.

The proper way is of course to interact with the master: mark offline, wait, reboot. But this requires the swarm nodes to have extensive knowledge of the master, which seems to contradict the purpose of swarm (it's managed from the slave side without any master interaction, and thus should be able to dynamically come and go).

Things I'm considering:

SOME COMMAND may be "go offline", "shutdown", "block till idle", etc., but it may also be something that returns status, e.g. "is idle?", "is offline?", etc. (obviously not for the Signals approach).

There are definitely some problems with these solutions, so I'm curious what others think. It's also possible that I'm overlooking a simpler way.

This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

o.v.nenashev@gmail.com (JIRA)

Feb 26, 2018, 3:28:28 AM
to jenkinsc...@googlegroups.com
Oleg Nenashev assigned an issue to Unassigned
 

KK (Kohsuke Kawaguchi) does not maintain this plugin anymore. Moving to Unassigned to set expectations.

Change By: Oleg Nenashev
Assignee: Kohsuke Kawaguchi -> Unassigned
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

me@basilcrow.com (JIRA)

Jun 1, 2019, 2:05:06 PM
to jenkinsc...@googlegroups.com
Basil Crow updated an issue
 
Jenkins / New Feature JENKINS-31084
Change By: Basil Crow
Issue Type: Improvement -> New Feature
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

me@basilcrow.com (JIRA)

Jun 3, 2019, 1:06:02 PM
to jenkinsc...@googlegroups.com

> This is a feature idea that I'm willing to implement but I'd like to hear maintainers' thoughts on this first.

This sounds like a reasonable feature to me. In many ways, it's similar to Postgres' Smart Shutdown mode:

> After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally. It shuts down only after all of the sessions terminate.

Implementing this in Swarm via a signal also seems reasonable. This also would work well with, e.g. systemd, which could send either SIGTERM or SIGINT to the Swarm client as appropriate (just like in the Postgres systemd unit file).
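
To make the signal idea a bit more concrete, here is a minimal sketch of the client side, assuming the usual JVM behaviour of running shutdown hooks when the process receives SIGTERM or SIGINT. The class name GracefulShutdownSketch and the drainAction callback are hypothetical and not part of the Swarm client; drainAction stands in for whatever would ask the server to take the node offline and wait for it to become unused.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names exist in the Swarm client. */
public class GracefulShutdownSketch {
    // Guards against running the drain more than once.
    private final AtomicBoolean draining = new AtomicBoolean(false);

    // Assumption: blocks until the node has been taken offline and is unused.
    private final Runnable drainAction;

    public GracefulShutdownSketch(Runnable drainAction) {
        this.drainAction = drainAction;
    }

    /** Runs the drain action at most once; returns true if this call ran it. */
    public boolean drainOnce() {
        if (draining.compareAndSet(false, true)) {
            drainAction.run();
            return true;
        }
        return false;
    }

    /**
     * Registers a JVM shutdown hook. The JVM waits for shutdown hooks to
     * finish before it exits, so the process stays alive while
     * drainAction blocks.
     */
    public void install() {
        Runtime.getRuntime().addShutdownHook(
                new Thread(this::drainOnce, "swarm-graceful-shutdown"));
    }
}
```

One limitation of this approach is that a shutdown hook cannot tell which signal triggered it, so distinguishing SIGTERM from SIGINT (as the Postgres modes do) would need a different mechanism.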

On receiving the signal, the client would need to communicate with the server to do the graceful shutdown. A new backend endpoint would need to be created. When this endpoint is called, it would need to invoke the API equivalent of the "Mark this node temporarily offline" feature in the UI (which waits for the current task to complete, then takes the node offline). The endpoint would be in plugin/src/main/java/hudson/plugins/swarm/PluginImpl.java and would look something like this:

Node node = getNodeByName(name, rsp);
Computer computer = node.toComputer(); // null if the node has no executors
if (computer != null) computer.setTemporarilyOffline(true);

Once the graceful shutdown has been initiated, the client would need to wait for the node to be unused. This could be done with another backend endpoint:

Node node = getNodeByName(name, rsp);
Computer computer = node.toComputer(); // null if the node has no executors
boolean isOffline = computer == null || computer.isOffline();

The client would then have to wait in a loop, polling this endpoint for the node to be offline. Once the node is offline, the client could terminate.
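
The polling loop could look roughly like the sketch below. It is only an illustration: OfflinePoller and waitUntil are hypothetical names, and the check argument stands in for an HTTP call to the proposed (not yet existing) status endpoint. Whether that endpoint should report isOffline() or an idleness check is left open here and would need verifying against Jenkins' Computer API. The timeout is an addition of mine so that the client cannot wait forever.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; not part of the Swarm client. */
public class OfflinePoller {
    /**
     * Calls check repeatedly, sleeping for interval between attempts,
     * until it returns true or timeout has elapsed.
     *
     * @return true if check returned true, false if the timeout expired first
     */
    public static boolean waitUntil(BooleanSupplier check,
                                    Duration interval,
                                    Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so that nanoTime wrap-around is handled.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

The client would call something like waitUntil(check, Duration.ofSeconds(5), Duration.ofHours(1)) after requesting the graceful shutdown and terminate once it returns true; the interval and timeout values here are arbitrary.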

I welcome any PRs to implement this and would be happy to review them.
