[JIRA] (JENKINS-43889) ssh-agent-plugin leaking some ssh-agent processes

Re: ssh-agent-plugin leaking some ssh-agent processes

I had kind of forgotten about this issue, because we've been using a RunListener to work around it, similar to the one I had attached already, but if anyone is interested, the report is still relevant (just checked with ssh-agent plugin code from master, and Jenkins 2.190.2).

Here is a failing test case one can try, to be added in SSHAgentBuildWrapperTest:

 
                                                                    @Issue("JENKINS-43889")
    @Test
    public void sshAgentStoppedOnEarlyBuildFailure() throws Exception {
        List<String> credentialIds = new ArrayList<String>();
        credentialIds.add(CREDENTIAL_ID);

        SSHUserPrivateKey key = new BasicSSHUserPrivateKey(CredentialsScope.GLOBAL, credentialIds.get(0), "cloudbees",
                new BasicSSHUserPrivateKey.DirectEntryPrivateKeySource(getPrivateKey()), "cloudbees", "test");
        SystemCredentialsProvider.getInstance().getCredentials().add(key);
        SystemCredentialsProvider.getInstance().save();

        FreeStyleProject job = r.createFreeStyleProject("I_will_die_during_SCM_checkout");
        job.setAssignedNode(r.createSlave());

        SSHAgentBuildWrapper sshAgent = new SSHAgentBuildWrapper(credentialIds, false);
        job.getBuildWrappersList().add(sshAgent);

        // make sure this job fails during SCM checkout
        job.setScm(new FailingSCM());

        Future<? extends FreeStyleBuild> build = job.scheduleBuild2(0);
        r.assertBuildStatus(Result.FAILURE, build);
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Started(), build.get());
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Stopped(), build.get());
    }

    static class FailingSCM extends SCM {
        @Override
        public ChangeLogParser createChangeLogParser() {
            return null;
        }
        // default implementation of checkout(...) method will fail, that's what we want
    }

(you will then have come `ssh-agent` processes to kill after running this test)

 
                                                            

I'm still not sure where this should get fixed:

either in core, by moving the Environment.tearDown calls up from BuildExecution.doRun to AbstractBuildExecution.run
or in the ssh-agent plugin itself, if what it does (ie., adding an Environment to the build from its BuildWrapper.preCheckout implementation, rather than from BuildWrapper.setUp, so that its already set up during SCM checkout) is really bad/unsupported

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)

tom.gl@free.fr (JIRA)

unread,

Nov 8, 2019, 9:58:02 AM11/8/19

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour commented on

Re: ssh-agent-plugin leaking some ssh-agent processes

To be extra clear in my explanations, here is how the ssh-agent gets launched, starting from AbstractBuild.AbstractBuildExecution#run:

AbstractBuild.AbstractBuildExecution#run(...) - AbstractBuild.java#L498
SCMCheckoutStrategy#preCheckout(...) - SCMCheckoutStrategy.java#L76
SSHAgentBuildWrapper#preCheckout(...) - SSHAgentBuildWrapper.java#L228
SSHAgentBuildWrapper#createSSHAgentEnvironment(...) - SSHAgentBuildWrapper.java#L248
SSHAgentBuildWrapper.SSHAgentEnvironment#SSHAgentEnvironment(...) - SSHAgentBuildWrapper.java#L363

And here is how it gets stopped (when it does), again starting from AbstractBuild.AbstractBuildExecution#run (a few lines below):

AbstractBuild.AbstractBuildExecution#run(...) - AbstractBuild.java#L504
Build#doRun(...) - Build.java#L174
SSHAgentBuildWrapper.SSHAgentEnvironment#tearDown(...) - SSHAgentBuildWrapper.java#L417

tom.gl@free.fr (JIRA)

unread,

Nov 8, 2019, 9:59:03 AM11/8/19

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour edited a comment on

Re: ssh-agent-plugin leaking some ssh-agent processes

I had kind of forgotten about this issue, because we've been using a {{RunListener}} to work around it, similar to the one I had attached already, but if anyone is interested, the report is still relevant (just checked with ssh-agent plugin code from master, and Jenkins 2.190.2).

Here is a failing test case one can try, to be added in {{SSHAgentBuildWrapperTest}}:

{code:java}

    @Issue("JENKINS-43889")
    @Test
    public void sshAgentStoppedOnEarlyBuildFailure() throws Exception {
        List<String> credentialIds = new ArrayList<String>();
        credentialIds.add(CREDENTIAL_ID);

        SSHUserPrivateKey key = new BasicSSHUserPrivateKey(CredentialsScope.GLOBAL, credentialIds.get(0), "cloudbees",
                new BasicSSHUserPrivateKey.DirectEntryPrivateKeySource(getPrivateKey()), "cloudbees", "test");
        SystemCredentialsProvider.getInstance().getCredentials().add(key);
        SystemCredentialsProvider.getInstance().save();

        FreeStyleProject job = r.createFreeStyleProject("I_will_die_during_SCM_checkout");
        job.setAssignedNode(r.createSlave());

        SSHAgentBuildWrapper sshAgent = new SSHAgentBuildWrapper(credentialIds, false);
        job.getBuildWrappersList().add(sshAgent);

        // make sure this job fails during SCM checkout
        job.setScm(new FailingSCM());

        Future<? extends FreeStyleBuild> build = job.scheduleBuild2(0);
        r.assertBuildStatus(Result.FAILURE, build);
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Started(), build.get());
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Stopped(), build.get());
    }

    static class FailingSCM extends SCM {
        @Override
        public ChangeLogParser createChangeLogParser() {
            return null;
        }
        // default implementation of checkout(...) method will fail, that's what we want
    }

{code}

(you will then have come `ssh-agent` processes to kill after running this test)

{code}

I'm still not sure where this should get fixed:

- either in core, by moving the {{Environment.tearDown}} calls up from {{BuildExecution.doRun}} to {{AbstractBuildExecution.run}}
- or in the ssh-agent plugin itself, if what it does (ie., adding an {{Environment}} to the build from its {{BuildWrapper.preCheckout}} implementation, rather than from {{BuildWrapper.setUp}}, so that its already set up during SCM checkout) is really bad/unsupported

tom.gl@free.fr (JIRA)

unread,

Nov 11, 2019, 4:08:02 PM11/11/19

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour updated an issue

Jenkins /

Thomas de Grenier de Latour

Change By:

Environment:	Jenkins 2.32.3 , 2.190.2 ssh-agent-plugin 1.15 , 1.17

tom.gl@free.fr (JIRA)

unread,

Nov 11, 2019, 4:08:02 PM11/11/19

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour updated an issue

Jenkins /

Change By:	Thomas de Grenier de Latour

Component/s:	core

tom.gl@free.fr (JIRA)

unread,

Nov 11, 2019, 4:11:02 PM11/11/19

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour commented on

Re: ssh-agent-plugin leaking some ssh-agent processes

Added "core" to Component/s, because I really don't know who's wrong here (the plugin code or Jenkins code).

tom.gl@free.fr (JIRA)

unread,

Nov 11, 2019, 4:11:02 PM11/11/19

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour edited a comment on

Re: ssh-agent-plugin leaking some ssh-agent processes

(you will then have come some `ssh-agent` processes to kill after running this test)

I'm still not sure where this should get fixed:
- either in core, by moving the {{Environment.tearDown}} calls up from {{BuildExecution.doRun}} to {{AbstractBuildExecution.run}}
- or in the ssh-agent plugin itself, if what it does (ie., adding an {{Environment}} to the build from its {{BuildWrapper.preCheckout}} implementation, rather than from {{BuildWrapper.setUp}}, so that its already set up during SCM checkout) is really bad/unsupported

tom.gl@free.fr (JIRA)

unread,

Feb 22, 2020, 4:56:02 PM2/22/20

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour commented on

Re: ssh-agent-plugin leaking some ssh-agent processes

Revisiting this old issue again, I realize that SimpleBuildWrapper exposes a variation of what's done in the SSHAgentBuildWrapper, allowing to setup an Environment before checkout: SimpleBuildWrapper#runPreCheckout

If such a SimpleWrapperWrapper provides a Disposer and is run as part of an AbstractBuild which fails early (in SCM checkout), then the disposer won't be called.

This flaw is again not documented (just like Environment#tearDown says nothing about not always being called on build failure), and makes me lean toward the "it's a core bug" interpretation of this issue.

I will submit a PR with a test case (using SimpleBuildWrapper), and a possible fix.

tom.gl@free.fr (JIRA)

unread,

Feb 22, 2020, 6:04:03 PM2/22/20

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour commented on

Re: ssh-agent-plugin leaking some ssh-agent processes

PR submitted: https://github.com/jenkinsci/jenkins/pull/4517

tom.gl@free.fr (JIRA)

unread,

Feb 22, 2020, 6:05:02 PM2/22/20

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour assigned an issue to Thomas de Grenier de Latour

Jenkins /

Thomas de Grenier de Latour

Change By:

Assignee:

Thomas de Grenier de Latour

tom.gl@free.fr (JIRA)

unread,

Feb 22, 2020, 6:05:03 PM2/22/20

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour started work on

Change By:	Thomas de Grenier de Latour

Status:	Open In Progress

tom.gl@free.fr (JIRA)

unread,

Feb 22, 2020, 6:05:03 PM2/22/20

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour updated an issue

Jenkins /

Change By:	Thomas de Grenier de Latour

URL:	https://github.com/jenkinsci/jenkins/pull/4517

tom.gl@free.fr (JIRA)

unread,

Feb 23, 2020, 10:08:02 AM2/23/20

to jenkinsc...@googlegroups.com

Thomas de Grenier de Latour updated an issue

Jenkins /

Change By:	Thomas de Grenier de Latour

When a job with the {{SSHAgentBuildWrapper}} enabled fails very early (for instance during SCM checkout), an {{ssh-agent}} process is left behind. The issue is that the {{SSHAgentEnvironment}} is instantiated very early (from {{preCheckout}}), but its {{tearDown}} method will only be called if execution reaches {{BuildExecution.doRun}} (which comes after the SCM checkout phase in {{AbstractBuildExecution.run}}).

Before {{ssh-agent-plugin 1.14}}, there was no {{ssh-agent}} process, so the issue with some {{SSHAgentEnvironment}} not being teared torn down was less visible (but probably there was already some other kind of less obvious resources leaks with {{AgentServer}} not being properly closed).

This kind of issue with some {{Environment}} not being properly teared torn down can happen as soon as they are not instantiated from {{BuildWrapper.setUp}}, but from earlier phases (like {{BuildWrapper.preCheckout}} or {{RunListener.setUpEnvironment}}). As such, maybe that's something that should be fixed in core (maybe in {{AbstractBuildExecution.run}}) rather than specifically in the {{ssh-agent-plugin}}, I don't know...

I've written and attached a "generic workaround" {{RunListener}}, which tries to detect this situation from {{onComplete}}, and call {{tearDown}} for all {{Environment}} if it has not been done already. It's not something I propose for inclusion, but rather some code to exhibit the issue. If an ssh-agent specific fix is desirable, then a similar approach might be an option (but targeting {{SSHAgentEnvironment}} only).