Fwd: Request to host plugin "hierarchy-killer"


Thorsten Möllers

Sep 17, 2015, 7:19:37 AM
to jenkin...@googlegroups.com
Hi,

I would like to contribute a new plugin that allows complex trees of CI jobs to fail early and/or to be aborted by simply aborting the root of the tree.

In a complex CI environment, dozens or even hundreds of different builds and tests are triggered for each commit to verify it. The purpose is to give the developer feedback on whether the commit is valid. There are two opposing views on what the developer expects:

Either the developer wants as much feedback as possible for each commit, or they want feedback as fast as possible.

If the target is to provide feedback as fast as possible, all testing can be stopped on the first failed build/test job. This is where this plugin comes into play: If configured accordingly, the plugin will kill all upstream- and downstream jobs of a failing job.



The plugin is hosted on https://github.com/thors/hierarchy-killer-plugin
My github username is "thors"
My jenkins-ci username is also "thors"

A more detailed description of what the plugin does is available on https://github.com/thors/hierarchy-killer-plugin/wiki

BTW: I'm not yet very experienced in Java programming, so if anyone would care to review the plugin for obvious flaws, I would be happy.

Best regards,
Thorsten





Daniel Beck

Sep 17, 2015, 8:02:37 AM
to jenkin...@googlegroups.com

On 17.09.2015, at 13:19, Thorsten Möllers <thor...@thorzten.de> wrote:

> if anyone would care to review the plugin for obvious flaws, I would be happy.

Some of the following (especially the one on variable naming) is probably subjective...

- I don't think there's a reason for you to subclass Plugin. It doesn't really do anything here. You could move the static stuff into the listener directly.
- You're missing the resource file that contains the plugin description for the plugin manager (src/main/resources/index.jelly).
- Don't reinvent logging levels (especially if they cannot be changed once compiled), use java.util.logging consistently.
- Don't write in the build log of jobs that you're not enabled for. That can break things like log parsing etc., as well as look weird. AFAIU this is added the moment another (upstream) job is aborted.
- Don't prefix instance variables with 'i'.
- I doubt you really need the RunData, why not query all running jobs for what their (upstream) causes are when you get notified about a run abort? Not sure what's saved/loaded in the Plugin class, but I expect the current design with the static HashMap doesn't handle a Jenkins restart well (the queue is persisted and restored on restart).
- Don't rely on the JENKINS_URL env var to be set.
- Given how many env vars you check, I think at least ENABLE_HIERARCHY_KILLER can be replaced by a job config option (e.g. a JobProperty). This would also have the advantage that the plugin would have some UI, and could have an inline help text explaining how it works. A non-library plugin without any UI is not a great idea as admins may easily forget it exists, and users don't even know it's there.
- It likely doesn't work with Workflow plugin, whose Runs aren't AbstractBuilds. Build Flow plugin probably has the same problem.
- While I understand that Parameterized Trigger allows having an upstream project that still runs, I doubt it's always only one, which seems to be the assumption in your code. While that feature may not consider quiet periods, the node may be missing, so two identically parameterized builds could be collapsed in the queue while waiting for the node to free up.
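To illustrate the logging point: the usual pattern is one static java.util.logging Logger per class, named after the class, so an admin can change its level at runtime via Manage Jenkins > System Log instead of having levels baked in at compile time. A minimal, self-contained sketch (class name is illustrative, not from the plugin):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingSketch {
    // One static logger per class, named after the class so administrators
    // can target it from the Jenkins log recorder UI at runtime.
    private static final Logger LOGGER =
            Logger.getLogger(LoggingSketch.class.getName());

    public static boolean debugVisible() {
        // FINE is below the default INFO level, so debug output stays
        // hidden until an admin explicitly lowers the logger's level.
        return LOGGER.isLoggable(Level.FINE);
    }

    public static void main(String[] args) {
        LOGGER.fine("debug detail, hidden at the default level");
        LOGGER.warning("visible at the default INFO level");
    }
}
```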

Those remarks out of the way:

When I first saw the plugin name, I thought of the process tree killing feature of Jenkins (https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller) -- as in 'process hierarchy'. Wouldn't something like downstream-killer or abort-related-jobs be better?

Thorsten Möllers

Sep 17, 2015, 11:04:27 AM
to jenkin...@googlegroups.com
Thanks for the feedback. There are lots of relevant things to keep me
busy for today

> - I don't think there's a reason for you to subclass Plugin. It
> doesn't really do anything here. You could move the static stuff into
> the listener directly.
You are probably right. I will look into it.

> - You're missing the resource file that contains the plugin
> description for the plugin manager (src/main/resources/index.jelly).
Ok

> - Don't reinvent logging levels (especially if they cannot be changed
> once compiled), use java.util.logging consistently.
Ok

> - Don't write in the build log of jobs that you're not enabled for.
> That can break things like log parsing etc., as well as look weird.
> AFAIU this is added the moment another (upstream) job is aborted.
I don't. If the variable "HIERARCHY_KILLER_ENABLED" is not defined, it
will ignore the upstream job.

> - Don't prefix instance variables with 'i'.
Ok

> - I doubt you really need the RunData, why not query all running jobs
> for what their (upstream) causes are when you get notified about a run
> abort? Not sure what's saved/loaded in the Plugin class, but I expect
> the current design with the static HashMap doesn't handle a
> Jenkins restart well (the queue is persisted and restored on restart).

Because it is not only supposed to kill upstream. If I kill the upstream
build, I want to kill other downstream builds as well. The problem is I
couldn't find a way to get a list of all downstream builds for a
running build.
Therefore I collect this information at runtime.

Jenkins restart: I have to make sure that no persistent data is stored.


> - Don't rely on the JENKINS_URL env var to be set.
Ok

> - Given how many env vars you check, I think at least
> ENABLE_HIERARCHY_KILLER can be replaced by a job config option (e.g. a
> JobProperty). This would also have the advantage that the plugin would
> have some UI, and could have an inline help text
> explaining how it works.
But it wouldn't automatically be propagated downstream, and if it were,
the downstream job could not easily overwrite it again.

> A non-library plugin without any UI is not a great idea as admins may
> easily forget it exists, and users don't even know it's there.
I agree, you have a point there. Not sure how to align with my previous
comment though.

> - It likely doesn't work with Workflow plugin, whose Runs aren't
> AbstractBuilds. Build Flow plugin probably has the same problem.
These jobs would be ignored by my plugin. (I just updated it to ignore
the notifications for everything but AbstractBuilds.)

> - While I understand that Parameterized Trigger allows having an
> upstream project that still runs, I doubt it's always only one,
> which seems to be the assumption in your code. While that feature may
> not consider quiet periods, the node may be missing, so two
> identically parameterized builds could be collapsed in the queue while
> waiting for the node to free up.
Isn't this something only the Workflow plugin does? I saw it with
Workflow, but never with Parameterized Trigger. If this is really an
issue for normal builds as well, I will replace iUpstream with a
list (vector) too.

Again, thanks for your feedback. For the naming: "Downstream-killer" is
incomplete, because to fail early it also needs to be able to kill
upstream. abort-related-jobs might work. However, I somehow like the
name; in our company we are often talking about job-hierarchies. Maybe
some others can chime in and give a comment?

Br,
Thorsten

Daniel Beck

Sep 17, 2015, 7:20:01 PM
to jenkin...@googlegroups.com

On 17.09.2015, at 17:04, Thorsten Möllers <thor...@thorzten.de> wrote:

>
>> - I doubt you really need the RunData, why not query all running jobs for what their (upstream) causes are when you get notified about a run abort? Not sure what's saved/loaded in the Plugin class, but I expect the current design with the static HashMap doesn't handle a Jenkins restart well (the queue is persisted and restored on restart).
>
> Because it is not only supposed to kill upstream. If I kill the upstream build, I want to kill other downstream builds as well. Problem is I couldn't find a way to get a list of all downstream builds for a running build.
> Therefore I collect this information at runtime.

Right. That's why I wrote you'd need to look at all running builds' causes to determine whether they are downstream from the "reference build". I think that should work and allow you to do this without RunData.
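Roughly like this, as a sketch against the core API (untested; `referenceProjectName` and `referenceBuildNumber` are placeholders for the aborted build's coordinates):

```java
// Sketch only: walk all busy executors and abort any running build whose
// UpstreamCause points back at the aborted "reference" build.
for (Computer c : Jenkins.getInstance().getComputers()) {
    for (Executor e : c.getExecutors()) {
        Queue.Executable exe = e.getCurrentExecutable();
        if (!(exe instanceof AbstractBuild)) continue;
        AbstractBuild<?, ?> b = (AbstractBuild<?, ?>) exe;
        for (Cause cause : b.getCauses()) {
            if (cause instanceof Cause.UpstreamCause) {
                Cause.UpstreamCause uc = (Cause.UpstreamCause) cause;
                if (uc.getUpstreamProject().equals(referenceProjectName)
                        && uc.getUpstreamBuild() == referenceBuildNumber) {
                    e.interrupt(Result.ABORTED);
                }
            }
        }
    }
}
```

To follow a whole chain rather than one level, this would have to be repeated transitively for each build it aborts.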

BTW, it looks like you don't look at queue items. If a build is still in the queue when you abort whatever is upstream from it, it'll not be canceled.
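Queue items can be listed and cancelled through the core Queue API; a rough sketch with the same placeholders as above:

```java
// Sketch only: cancel queued (not yet started) items whose
// UpstreamCause points back at the aborted reference build.
Queue queue = Jenkins.getInstance().getQueue();
for (Queue.Item item : queue.getItems()) {
    for (Cause cause : item.getCauses()) {
        if (cause instanceof Cause.UpstreamCause) {
            Cause.UpstreamCause uc = (Cause.UpstreamCause) cause;
            if (uc.getUpstreamProject().equals(referenceProjectName)
                    && uc.getUpstreamBuild() == referenceBuildNumber) {
                queue.cancel(item);
            }
        }
    }
}
```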

> Jenkins restart: I have to make sure that no persistent data is stored.

If you have no persistent data, and the queue gets restored on restart, how can your plugin behave correctly afterwards?

>
>> A non-library plugin without any UI is not a great idea as admins may easily forget it exists, and users don't even know it's there.
> I agree, you have a point there. Not sure how to align with my previous comment though.

Maybe provide static checkboxes in the job config for all the options, and treat them like env vars, giving users the option to use one or the other? Checking whether an option is set becomes a bit more effort, but you get full UI (static) config plus the alternative of per-build configuration through env vars.
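A minimal sketch of such a JobProperty (all names illustrative; it would still need a config.jelly with the actual checkbox, plus imports from hudson.model et al.):

```java
// Sketch only: one boolean job-level option, editable in the job config UI.
public class HierarchyKillerJobProperty extends JobProperty<AbstractProject<?, ?>> {
    private final boolean enabled;

    @DataBoundConstructor
    public HierarchyKillerJobProperty(boolean enabled) {
        this.enabled = enabled;
    }

    public boolean isEnabled() {
        return enabled;
    }

    @Extension
    public static class DescriptorImpl extends JobPropertyDescriptor {
        @Override
        public String getDisplayName() {
            // Shown on the job configuration page, with room for inline help.
            return "Abort related builds when this hierarchy fails";
        }
    }
}
```

The listener would then consult the property first and fall back to the env var only when the property is absent.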

>
>> - While I understand that Parameterized Trigger allows having an upstream project that still runs, I doubt it's always only one, which seems to be the assumption in your code. While that feature may not consider quiet periods, the node may be missing, so two identically parameterized builds could be collapsed in the queue while waiting for the node to free up.
> Isn't this something only the Workflow plugin does? I saw it with workflow, but never with Parameterized Trigger. If this is really an issue for normal builds as well, I will replace iUpstream also with a list (vector).

- You have the build step provided by Parameterized Trigger that triggers a build while the current build is still running.
- A build waits in the queue until a required node ("Restrict where this can run") becomes available.
- Two queue items with identical parameters are collapsed into one and their causes merged.

I think it should be fairly straightforward to test this.

Thorsten Möllers

Sep 18, 2015, 1:54:22 AM
to jenkin...@googlegroups.com


On 18/09/15 01:19, Daniel Beck wrote:
> On 17.09.2015, at 17:04, Thorsten Möllers <thor...@thorzten.de> wrote:
>
>>> - I doubt you really need the RunData, why not query all running jobs for what their (upstream) causes are when you get notified about a run abort? Not sure what's saved/loaded in the Plugin class, but I expect the current design with the static HashMap doesn't handle a Jenkins restart well (the queue is persisted and restored on restart).
>> Because it is not only supposed to kill upstream. If I kill the upstream build, I want to kill other downstream builds as well. Problem is I couldn't find a way to get a list of all downstream builds for a running build.
>> Therefore I collect this information at runtime.
> Right. That's why I wrote you'd need to look at all running builds' causes to determine whether they are downstream from the "reference build". I think that should work and allow you to do this without RunData.
I don't think this is an option. We have a huge set of running jobs, a
lot of them short-lived, so I try to keep the cost of a single job
exiting minimal. As I see it, the memory currently allocated in my
plugin for a single build is O(1), since the hierarchy structure is more
or less static. The execution time to handle a single notification is
also O(1), for the same reason. Going through all builds would add an
O(n) operation, n being the number of currently running builds, which is
quite large in our system. Since a lot of our builds are short-lived,
this task would be executed quite frequently.
Also, the log info in the console output is invaluable for us: it shows
the full chain of killed jobs leading back to the most upstream one.
This data is collected in RunData.reason.

> BTW, it looks like you don't look at queue items. If a build is still in the queue when you abort whatever is upstream from it, it'll not be canceled.

This would be a good improvement for the next version. But from what I
see, Jenkins does not provide notifications for triggered builds, just
for started ones. Where do I get a list of queued builds?

>
>> Jenkins restart: I have to make sure that no persistent data is stored.
> If you have no persistent data, and the queue gets restored on restart, how can your plugin behave correctly afterwards?
Jenkins restart would be the exception. Usually, Jenkins runs for weeks
in our setup without a restart, and when it restarts, it is usually a
scheduled maintenance break where no jobs are active anyway. In the rare
case of an unexpected Jenkins restart I could live with some builds
being continued instead of aborted, it would be the least of our problems.
>>> A non-library plugin without any UI is not a great idea as admins may easily forget it exists, and users don't even know it's there.
>> I agree, you have a point there. Not sure how to align with my previous comment though.
> Maybe provide both static checkboxes in job config for all the options, and treat them like env-vars, giving users the option to use one or the other? Makes checking whether an option is set a bit more effort, but you have the option to do both full UI (static) config, and allow the alternative of per-build configuration through env vars.
I will look into it, but not for a first release probably.

For the naming: How about "BuildHierarchyKiller"? I think that would be
unambiguous...

Br,
Thorsten

Thorsten Möllers

Sep 18, 2015, 3:05:06 AM
to jenkin...@googlegroups.com


On 18/09/15 07:54, Thorsten Möllers wrote:
> I don't think this is an option. We have a huge set of running jobs, a
> lot of them short lived, so I try to keep the cost for a single job
> exiting minimal. As I see it, currently the memory allocated in my
> plugin for a single build is O(1), since the hierarchy structure is
> more or less static. The execution time to handle a single
> notification is also O(1), for the same reason. Going through all
> builds would add an operation O(n), n being the number of currently
> running builds, which is quite huge in our system. Since a lot of our
> builds are short-lived, this task would be executed quite frequently.
Here I might have to correct myself: the additional memory allocation
could easily add more complexity than running through the list of all
running jobs. I should probably run performance tests for both
approaches in a staging environment.

Br,
Thorsten
