In case it is useful to anyone, here is the "planned upgrade" process we use for Jenkins at my company. It relies on a custom quiet-mode implemented in an internal plugin, which lets already running builds (including Pipelines) terminate, but forbids starting new builds (except when they are needed for the already running builds to terminate). The overall process is automated (we have many Jenkins instances), and goes like this:
- activate the custom quiet-down mode (forbid starting new builds)
- poll Jenkins until it's idle, for up to X minutes, and then do the upgrade (including an actual restart)
- if this polling times out, cancel the planned upgrade (i.e. cancel the custom quiet-mode) and retry it all later (sometimes we have to make arrangements with users so that they don't launch their freaking 18-hour test suite on the day we are planning an upgrade)
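The steps above boil down to a small control loop on the automation side. Here is a minimal sketch of that loop, assuming a hypothetical JenkinsControl interface that wraps the calls made against one instance (enabling and cancelling the custom quiet-mode, polling for idleness, running the upgrade). The interface, the class names and the timeout/poll-interval parameters are illustrative, not our actual tooling:

```java
import java.time.Duration;

// Hypothetical abstraction over one Jenkins instance: in practice each method
// would wrap an HTTP call to the plugin, or the upgrade/restart tooling.
interface JenkinsControl {
    void quietDown(String cause);   // forbid starting new builds
    void cancelQuietDown();         // allow new builds again
    boolean isIdle();               // true when no executor is busy
    void upgradeAndRestart();       // the actual upgrade, including the restart
}

final class PlannedUpgrade {
    private PlannedUpgrade() {
    }

    /**
     * Returns true if the upgrade was performed, false if it was cancelled because
     * the instance did not become idle within {@code timeout} (or the wait was interrupted).
     */
    static boolean attempt(JenkinsControl jenkins, String cause, Duration timeout, Duration pollInterval) {
        jenkins.quietDown(cause);
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (jenkins.isIdle()) {
                jenkins.upgradeAndRestart();
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                break; // timed out
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                break;
            }
        }
        // give up for now: cancel the planned upgrade and let the caller retry later
        jenkins.cancelQuietDown();
        return false;
    }
}
```

Cancelling on interruption as well as on time-out is a deliberate choice in this sketch: leaving an instance in quiet-down mode by accident blocks every new build until someone notices.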
We don't have the plans or the time to publish and maintain this as a community plugin, but in case someone wants to do something similar, here is the code; feel free to reuse whatever you want. Note that we would probably never have written this code if we had not been bitten many times by JENKINS-34256. A few years ago, we were simply using the standard Jenkins quiet-mode, but then Pipelines getting stuck (when the upgrade was cancelled) really became an issue. Now that JENKINS-34256 is fixed, we might consider going back to the standard solution. But I think our users prefer having their Pipelines finish before the upgrade rather than being paused and resumed, mainly because the "resume" part is not always smooth (for instance, some plugin upgrades might break compatibility of the serialized data).

Anyway, here is the "interesting" part of the code: the QuietDownQueueTaskDispatcher, which decides which new Queue.Item can actually be started while the (custom) quiet-mode is active:
```java
import hudson.Extension;
import hudson.model.Cause;
import hudson.model.Cause.UpstreamCause;
import hudson.model.Queue;
import hudson.model.Queue.NonBlockingTask;
import hudson.model.Run;
import hudson.model.queue.CauseOfBlockage;
import hudson.model.queue.QueueTaskDispatcher;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import javax.annotation.CheckForNull;
import javax.annotation.Nonnull;
import javax.inject.Inject;
import org.jenkinsci.plugins.durabletask.executors.ContinuedTask;
import org.jenkinsci.plugins.workflow.job.WorkflowRun;

@Extension
public class QuietDownQueueTaskDispatcher extends QueueTaskDispatcher {

    @Inject
    QuietDownStateManager quietDownStateManager;

    // key: upstreamProject + ':' + upstreamBuild from an UpstreamCause
    // value: true if children builds should be allowed to run
    private final ConcurrentHashMap<String, Boolean> knownUpstreamCauses = new ConcurrentHashMap<>();

    // timestamp of the quiet-down state the cache belongs to, used to decide when to flush it
    private final AtomicLong quietDownTimestamp = new AtomicLong(0L);

    @Override
    public @CheckForNull CauseOfBlockage canRun(Queue.Item item) {
        QuietDownState currentState = quietDownStateManager.getState();
        if (!currentState.isDown()) {
            return null;
        }
        // flush the cache if the quiet-down state has changed
        if (quietDownTimestamp.getAndSet(currentState.since()) != currentState.since()) {
            knownUpstreamCauses.clear();
        }
        Queue.Task task = item.task;
        // always allow some kinds of tasks
        if (task instanceof NonBlockingTask || task instanceof ContinuedTask) {
            return null;
        }
        // allow a build task because of its upstream cause
        if (hasAllowingCause(item.getCauses())) {
            return null;
        }
        // not allowed, let's explain why
        return new QuietDownBlockageCause(currentState);
    }

    // true if at least one UpstreamCause in the list allows the build to run
    private boolean hasAllowingCause(@Nonnull List<Cause> causes) {
        boolean result = false;
        for (Cause parentCause : causes) {
            if (!(parentCause instanceof UpstreamCause)) {
                continue;
            }
            result = result || isAllowingUpstreamCause((UpstreamCause) parentCause);
        }
        return result;
    }

    // recursive, memoized check of an upstream cause and of its own upstream causes
    private boolean isAllowingUpstreamCause(@Nonnull UpstreamCause cause) {
        String runKey = cause.getUpstreamProject() + ':' + cause.getUpstreamBuild();
        Boolean decisionFromCache = knownUpstreamCauses.get(runKey);
        if (decisionFromCache != null) {
            return decisionFromCache;
        }
        boolean newDecision = hasAllowingCause(cause.getUpstreamCauses())
                || isRunAllowingDownstreamBuilds(cause.getUpstreamRun());
        knownUpstreamCauses.put(runKey, newDecision);
        return newDecision;
    }

    private boolean isRunAllowingDownstreamBuilds(@CheckForNull Run<?, ?> run) {
        if (run == null || !run.isBuilding()) {
            return false;
        }
        // a running WorkflowRun or MatrixBuild may wait for its children to complete
        // Note: assume there exists no MatrixBuild subclass, it saves an optional plugin dependency
        return (run instanceof WorkflowRun || "hudson.matrix.MatrixBuild".equals(run.getClass().getName()));
    }

    public static class QuietDownBlockageCause extends CauseOfBlockage {

        private final @Nonnull QuietDownState quietDownState;

        private QuietDownBlockageCause(@Nonnull QuietDownState quietDownState) {
            this.quietDownState = quietDownState;
        }

        public static @CheckForNull QuietDownBlockageCause from(QuietDownState quietDownState) {
            if (!quietDownState.isDown()) {
                return null;
            }
            return new QuietDownBlockageCause(quietDownState);
        }

        @Override
        public String getShortDescription() {
            return quietDownState.toShortDescriptionString();
        }
    }
}
```
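The recursive walk in hasAllowingCause() and isAllowingUpstreamCause() can be exercised without a Jenkins instance. Below is a simplified, hypothetical model of the same decision: an Upstream object stands in for an UpstreamCause (its run key, whether the upstream run is a still-building Pipeline or Matrix build, and that run's own upstream causes). It only illustrates the memoized lookup and is not plugin code; it uses a plain HashMap because the model is single-threaded:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Cause.UpstreamCause (not the Jenkins API).
final class Upstream {
    final String key;                 // like "project:42" in the dispatcher's cache
    final boolean runningParentBuild; // upstream run is a still-building Pipeline/Matrix build
    final List<Upstream> parents;     // the upstream run's own upstream causes

    Upstream(String key, boolean runningParentBuild, List<Upstream> parents) {
        this.key = key;
        this.runningParentBuild = runningParentBuild;
        this.parents = parents;
    }
}

final class UpstreamPolicy {
    // run key -> "children of this run may start"
    final Map<String, Boolean> cache = new HashMap<>();
    int evaluations = 0; // number of cache misses, to show the memoization

    boolean hasAllowingCause(List<Upstream> causes) {
        for (Upstream cause : causes) {
            if (isAllowing(cause)) {
                return true; // same short-circuit as "result || ..." in the dispatcher
            }
        }
        return false;
    }

    private boolean isAllowing(Upstream cause) {
        Boolean cached = cache.get(cause.key);
        if (cached != null) {
            return cached;
        }
        evaluations++;
        // ancestors first, then the upstream run itself, as in isAllowingUpstreamCause()
        boolean decision = hasAllowingCause(cause.parents) || cause.runningParentBuild;
        cache.put(cause.key, decision);
        return decision;
    }
}
```

A build triggered by a child of a running Pipeline is allowed through its grandparent, and a second lookup for the same upstream run is answered from the cache.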
The currently implemented policy only allows tasks which are:
- NonBlockingTask or Pipeline ContinuedTask instances (I can't remember the specific details; I wrote that a long time ago)
- children of an already running Pipeline or Matrix build (this is necessary to let these builds terminate, because they may wait for their children to complete; it could be refined, though: for instance, we don't really need to allow builds launched by a Pipeline build step with wait=false)
Any other new build is blocked and stays in the queue. To avoid spending too much time walking the UpstreamCause chains of the candidate tasks, we keep a cache of decisions already made (whether or not a specific build is a legitimate reason for allowing its children builds to run); the cache is flushed whenever the quiet-down state changes.

A QuietDownState has a State (an AVAILABLE or QUIET_DOWN enumeration value), a starting timestamp, and a cause message:
```java
import hudson.Util;
import javax.annotation.CheckForNull;
import javax.annotation.Nonnull;
import org.apache.commons.lang.StringUtils;

public class QuietDownState {

    private final String cause;
    private final State state;
    private final long timestamp;

    private QuietDownState(@Nonnull State state) {
        this(state, null);
    }

    private QuietDownState(@Nonnull State state, @CheckForNull String cause) {
        this.cause = cause;
        this.state = state;
        this.timestamp = System.currentTimeMillis();
    }

    public static @Nonnull QuietDownState available() {
        return new QuietDownState(State.AVAILABLE);
    }

    public static @Nonnull QuietDownState quietDown(@Nonnull String cause) {
        return new QuietDownState(State.QUIET_DOWN, cause);
    }

    public boolean is(State state) {
        return this.state == state;
    }

    public boolean isDown() {
        return state.down;
    }

    public @CheckForNull String why() {
        return cause;
    }

    public long since() {
        return timestamp;
    }

    public @Nonnull String toApiString() {
        StringBuilder sb = new StringBuilder();
        sb.append(state);
        sb.append(" since ");
        sb.append(Util.XS_DATETIME_FORMATTER.format(timestamp));
        if (StringUtils.isNotEmpty(cause)) {
            sb.append(" - ").append(cause);
        }
        return sb.toString();
    }

    // FIXME: better message/formatting
    public @Nonnull String toUserString() {
        StringBuilder sb = new StringBuilder();
        sb.append("Jenkins has been ");
        sb.append(state.label);
        sb.append(" for ");
        sb.append(Util.getTimeSpanString(System.currentTimeMillis() - timestamp));
        if (StringUtils.isNotEmpty(cause)) {
            sb.append(" - ").append(cause);
        }
        return sb.toString();
    }

    // FIXME: make it shorter?
    public @Nonnull String toShortDescriptionString() {
        return toUserString();
    }

    @Override
    public @Nonnull String toString() {
        return toApiString();
    }

    @Override
    public int hashCode() {
        // <snip>
    }

    @Override
    public boolean equals(Object obj) {
        // <snip>
    }

    public enum State {
        AVAILABLE(false, "available"), QUIET_DOWN(true, "sleeping");

        private final boolean down;
        private final String label;

        private State(boolean down, String label) {
            this.down = down;
            this.label = label;
        }
    }
}
```
The (global) current state can be changed via a QuietDownStateManager, which is a Guice singleton:
```java
import java.util.concurrent.atomic.AtomicReference;

public class QuietDownStateManager {

    private final AtomicReference<QuietDownState> currentState =
            new AtomicReference<>(QuietDownState.available());

    public QuietDownState getState() {
        return currentState.get();
    }

    // enter quiet-down mode; if already down, keep the current state (and its initial timestamp)
    public QuietDownState quietDown(String cause) {
        final QuietDownState newState = QuietDownState.quietDown(cause);
        // TODO: updating the cause (when already down) could be nice (while still preserving the initial timestamp)
        return currentState.updateAndGet(
                state -> state.is(QuietDownState.State.QUIET_DOWN) ? state : newState);
    }

    // leave quiet-down mode; a no-op if already available
    public QuietDownState cancelQuietDown() {
        final QuietDownState newState = QuietDownState.available();
        return currentState.updateAndGet(
                state -> state.is(QuietDownState.State.AVAILABLE) ? state : newState);
    }
}
```
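```java
import com.google.inject.AbstractModule;
import com.google.inject.Singleton;
import hudson.Extension;

@Extension
public class GuiceBindings extends AbstractModule {
    @Override
    protected void configure() {
        //...
        bind(QuietDownStateManager.class).in(Singleton.class);
    }
}
```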
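The TODO in quietDown() (updating the cause while already down, but keeping the initial timestamp) could be handled within the same updateAndGet() call. A minimal sketch, assuming simplified hypothetical SimpleState/SimpleStateManager classes rather than the real QuietDownState (whose private constructors always use the current time); the explicit now parameter only makes the sketch testable:

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for QuietDownState: immutable, with an explicit timestamp.
final class SimpleState {
    final boolean down;
    final String cause;
    final long since;

    SimpleState(boolean down, String cause, long since) {
        this.down = down;
        this.cause = cause;
        this.since = since;
    }
}

// Hypothetical variant of QuietDownStateManager implementing the TODO.
final class SimpleStateManager {
    private final AtomicReference<SimpleState> current =
            new AtomicReference<>(new SimpleState(false, null, 0L));

    SimpleState get() {
        return current.get();
    }

    // Enters quiet-down mode, or only replaces the cause when already down.
    // The initial timestamp is kept, so a cache keyed on it (like the dispatcher's) is not flushed.
    // updateAndGet() may retry the function under contention, so it must stay side-effect free.
    SimpleState quietDown(String cause, long now) {
        return current.updateAndGet(state -> {
            if (!state.down) {
                return new SimpleState(true, cause, now);
            }
            return Objects.equals(state.cause, cause) ? state : new SimpleState(true, cause, state.since);
        });
    }

    SimpleState cancelQuietDown(long now) {
        return current.updateAndGet(state -> state.down ? new SimpleState(false, null, now) : state);
    }
}
```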
We control the QuietDownStateManager through a few simple HTTP methods:
- doQuietDown(): enable quiet-down mode (with a cause message)
- doCancelQuietDown(): disable quiet-down mode
- doGetQuietDownStatus(): get current quiet-down status
We also have a method (doActivity() below) which we poll to know whether Jenkins is BUSY or IDLE. That's what we use to wait for Jenkins to be idle before triggering the actual restart. This too could be refined: for instance, we could consider Jenkins idle when the only Pipelines still running are blocked on input steps.
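On the automation side, these methods are reachable under the action's URL name (somethingAPI), following Stapler's convention of dropping the do prefix (doQuietDown becomes quietDown, doActivity becomes activity). Below is a sketch of a client built on the JDK's java.net.http API (Java 11+). The base URL, HTTP basic authentication with a user name and API token, and the class and method names are assumptions for illustration; depending on your security configuration, POST requests may also need a CSRF crumb:

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client for the somethingAPI endpoints (names and auth scheme are assumptions).
final class SomethingApiClient {
    enum Activity { BUSY, IDLE, UNKNOWN }

    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;       // Jenkins root URL, e.g. "https://jenkins.example.com/"
    private final String authorization; // basic auth with an admin user and API token (assumption)

    SomethingApiClient(String baseUrl, String user, String apiToken) {
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String credentials = user + ":" + apiToken;
        this.authorization = "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    HttpRequest quietDownRequest(String cause) {
        return post("somethingAPI/quietDown?cause=" + URLEncoder.encode(cause, StandardCharsets.UTF_8));
    }

    HttpRequest cancelQuietDownRequest() {
        return post("somethingAPI/cancelQuietDown");
    }

    HttpRequest activityRequest() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "somethingAPI/activity"))
                .header("Authorization", authorization)
                .GET()
                .build();
    }

    private HttpRequest post(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Authorization", authorization)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Sends a request and returns the trimmed plain-text body (e.g. the toApiString() output).
    String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> rsp = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (rsp.statusCode() != 200) {
            throw new IOException("HTTP " + rsp.statusCode() + " from " + request.uri());
        }
        return rsp.body().trim();
    }

    // Anything other than a 200 with BUSY/IDLE (error status, unexpected body) counts as UNKNOWN.
    static Activity parseActivity(int status, String body) {
        if (status != 200) {
            return Activity.UNKNOWN;
        }
        switch (body.trim()) {
            case "BUSY": return Activity.BUSY;
            case "IDLE": return Activity.IDLE;
            default: return Activity.UNKNOWN;
        }
    }

    Activity activity() throws IOException, InterruptedException {
        HttpResponse<String> rsp = http.send(activityRequest(), HttpResponse.BodyHandlers.ofString());
        return parseActivity(rsp.statusCode(), rsp.body());
    }
}
```

Treating anything unexpected as UNKNOWN (rather than IDLE) keeps the upgrade loop on the safe side: an instance is only restarted after an explicit IDLE answer.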
```java
import static org.apache.commons.lang.StringUtils.defaultString;

import hudson.Extension;
import hudson.model.AbstractModelObject;
import hudson.model.Computer;
import hudson.model.UnprotectedRootAction;
import java.io.PrintWriter;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.inject.Inject;
import javax.servlet.http.HttpServletResponse;
import jenkins.model.Jenkins;
import org.kohsuke.stapler.HttpResponse;
import org.kohsuke.stapler.interceptor.RequirePOST;

@Extension
public class SomethingRemoteAPI extends AbstractModelObject implements UnprotectedRootAction {

    private static final Logger LOGGER = Logger.getLogger(SomethingRemoteAPI.class.getName());

    @Inject
    QuietDownStateManager quietDownStateManager;

    @Override
    public String getDisplayName() {
        return "SomethingAPI";
    }

    @Override
    public String getSearchUrl() {
        return getUrlName();
    }

    @Override
    public String getIconFileName() {
        return null;
    }

    @Override
    public String getUrlName() {
        return "somethingAPI";
    }

    // <snip> other unrelated methods

    @RequirePOST
    public HttpResponse doQuietDown() {
        Jenkins.get().checkPermission(Jenkins.ADMINISTER);
        return (req, rsp, node) -> {
            final QuietDownState state = quietDownStateManager.quietDown(defaultString(req.getParameter("cause")));
            rsp.setStatus(HttpServletResponse.SC_OK);
            rsp.setContentType("text/plain");
            PrintWriter w = rsp.getWriter();
            w.println(state.toApiString());
        };
    }

    @RequirePOST
    public HttpResponse doCancelQuietDown() {
        Jenkins.get().checkPermission(Jenkins.ADMINISTER);
        return (req, rsp, node) -> {
            final QuietDownState state = quietDownStateManager.cancelQuietDown();
            rsp.setStatus(HttpServletResponse.SC_OK);
            rsp.setContentType("text/plain");
            PrintWriter w = rsp.getWriter();
            w.println(state.toApiString());
        };
    }

    public HttpResponse doGetQuietDownStatus() {
        return (req, rsp, node) -> {
            final QuietDownState state = quietDownStateManager.getState();
            rsp.setStatus(HttpServletResponse.SC_OK);
            rsp.setContentType("text/plain");
            PrintWriter w = rsp.getWriter();
            w.println(state.toApiString());
        };
    }

    public HttpResponse doActivity() {
        String activity;
        int status;
        try {
            activity = countBusyExecutors() > 0 ? "BUSY" : "IDLE";
            status = HttpServletResponse.SC_OK;
        } catch (RuntimeException e) {
            LOGGER.log(Level.WARNING, "failed to count busy executors: " + e.getMessage(), e);
            activity = "UNKNOWN";
            status = HttpServletResponse.SC_INTERNAL_SERVER_ERROR;
        }
        // a blank final cannot be assigned in both try and catch, so copy into effectively final locals for the lambda
        final String body = activity;
        final int httpStatus = status;
        return (req, rsp, node) -> {
            rsp.setStatus(httpStatus);
            rsp.setContentType("text/plain");
            PrintWriter w = rsp.getWriter();
            w.println(body);
        };
    }

    private int countBusyExecutors() {
        // see hudson.model.ComputerSet.getBusyExecutors()
        int r = 0;
        for (Computer c : Jenkins.get().getComputers()) {
            if (c.isOnline()) {
                r += c.countBusy();
            }
        }
        return r;
    }
}
```
Finally, we also have some bits of code to display a message in the Jenkins GUI when our quiet-mode is enabled (that's part of a more general-purpose system we have for pushing notification messages to our Jenkins users, but it could of course be implemented differently in a dedicated plugin).