[JIRA] (JENKINS-43889) ssh-agent-plugin leaking some ssh-agent processes

0 views
Skip to first unread message

tom.gl@free.fr (JIRA)

unread,
Nov 8, 2019, 9:33:03 AM11/8/19
to jenkinsc...@googlegroups.com
Thomas de Grenier de Latour commented on Bug JENKINS-43889
 
Re: ssh-agent-plugin leaking some ssh-agent processes

I had kind of forgotten about this issue, because we've been using a RunListener to work around it, similar to the one I had attached already, but if anyone is interested, the report is still relevant (just checked with ssh-agent plugin code from master, and Jenkins 2.190.2).

Here is a failing test case one can try, to be added in SSHAgentBuildWrapperTest:

    @Issue("JENKINS-43889")
    @Test
    public void sshAgentStoppedOnEarlyBuildFailure() throws Exception {
        List<String> credentialIds = new ArrayList<String>();
        credentialIds.add(CREDENTIAL_ID);

        SSHUserPrivateKey key = new BasicSSHUserPrivateKey(CredentialsScope.GLOBAL, credentialIds.get(0), "cloudbees",
                new BasicSSHUserPrivateKey.DirectEntryPrivateKeySource(getPrivateKey()), "cloudbees", "test");
        SystemCredentialsProvider.getInstance().getCredentials().add(key);
        SystemCredentialsProvider.getInstance().save();

        FreeStyleProject job = r.createFreeStyleProject("I_will_die_during_SCM_checkout");
        job.setAssignedNode(r.createSlave());

        SSHAgentBuildWrapper sshAgent = new SSHAgentBuildWrapper(credentialIds, false);
        job.getBuildWrappersList().add(sshAgent);

        // make sure this job fails during SCM checkout
        job.setScm(new FailingSCM());

        Future<? extends FreeStyleBuild> build = job.scheduleBuild2(0);
        r.assertBuildStatus(Result.FAILURE, build);
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Started(), build.get());
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Stopped(), build.get());
    }

    static class FailingSCM extends SCM {
        @Override
        public ChangeLogParser createChangeLogParser() {
            return null;
        }
        // default implementation of checkout(...) method will fail, that's what we want
    }

(you will then have come `ssh-agent` processes to kill after running this test)

I'm still not sure where this should get fixed:

  • either in core, by moving the Environment.tearDown calls up from BuildExecution.doRun to AbstractBuildExecution.run
  • or in the ssh-agent plugin itself, if what it does (ie., adding an Environment to the build from its BuildWrapper.preCheckout implementation, rather than from BuildWrapper.setUp, so that its already set up during SCM checkout) is really bad/unsupported

 

 

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
Atlassian logo

tom.gl@free.fr (JIRA)

unread,
Nov 8, 2019, 9:58:02 AM11/8/19
to jenkinsc...@googlegroups.com

To be extra clear in my explanations, here is how the ssh-agent gets launched, starting from AbstractBuild.AbstractBuildExecution#run:

And here is how it gets stopped (when it does), again starting from AbstractBuild.AbstractBuildExecution#run (a few lines below):

tom.gl@free.fr (JIRA)

unread,
Nov 8, 2019, 9:59:03 AM11/8/19
to jenkinsc...@googlegroups.com
I had kind of forgotten about this issue, because we've been using a {{RunListener}} to work around it, similar to the one I had attached already, but if anyone is interested, the report is still relevant (just checked with ssh-agent plugin code from master, and Jenkins 2.190.2).

Here is a failing test case one can try, to be added in {{SSHAgentBuildWrapperTest}}:
{code:java}

    @Issue("JENKINS-43889")
    @Test
    public void sshAgentStoppedOnEarlyBuildFailure() throws Exception {
        List<String> credentialIds = new ArrayList<String>();
        credentialIds.add(CREDENTIAL_ID);

        SSHUserPrivateKey key = new BasicSSHUserPrivateKey(CredentialsScope.GLOBAL, credentialIds.get(0), "cloudbees",
                new BasicSSHUserPrivateKey.DirectEntryPrivateKeySource(getPrivateKey()), "cloudbees", "test");
        SystemCredentialsProvider.getInstance().getCredentials().add(key);
        SystemCredentialsProvider.getInstance().save();

        FreeStyleProject job = r.createFreeStyleProject("I_will_die_during_SCM_checkout");
        job.setAssignedNode(r.createSlave());

        SSHAgentBuildWrapper sshAgent = new SSHAgentBuildWrapper(credentialIds, false);
        job.getBuildWrappersList().add(sshAgent);

        // make sure this job fails during SCM checkout
        job.setScm(new FailingSCM());

        Future<? extends FreeStyleBuild> build = job.scheduleBuild2(0);
        r.assertBuildStatus(Result.FAILURE, build);
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Started(), build.get());
        r.assertLogContains(Messages.SSHAgentBuildWrapper_Stopped(), build.get());
    }

    static class FailingSCM extends SCM {
        @Override
        public ChangeLogParser createChangeLogParser() {
            return null;
        }
        // default implementation of checkout(...) method will fail, that's what we want
    }

{code}
(you will then have come `ssh-agent` processes to kill after running this test)

{code}

I'm still not sure where this should get fixed:
- either in core, by moving the {{Environment.tearDown}} calls up from {{BuildExecution.doRun}} to {{AbstractBuildExecution.run}}
- or in the ssh-agent plugin itself, if what it does (ie., adding an {{Environment}} to the build from its {{BuildWrapper.preCheckout}} implementation, rather than from {{BuildWrapper.setUp}}, so that its already set up during SCM checkout) is really bad/unsupported

 

 

tom.gl@free.fr (JIRA)

unread,
Nov 11, 2019, 4:08:02 PM11/11/19
to jenkinsc...@googlegroups.com
Change By:
Environment: Jenkins 2.32.3 , 2.190.2
ssh-agent-plugin 1.15
, 1.17

tom.gl@free.fr (JIRA)

unread,
Nov 11, 2019, 4:08:02 PM11/11/19
to jenkinsc...@googlegroups.com

tom.gl@free.fr (JIRA)

unread,
Nov 11, 2019, 4:11:02 PM11/11/19
to jenkinsc...@googlegroups.com
Thomas de Grenier de Latour commented on Bug JENKINS-43889
 
Re: ssh-agent-plugin leaking some ssh-agent processes

Added "core" to Component/s, because I really don't know who's wrong here (the plugin code or Jenkins code).

tom.gl@free.fr (JIRA)

unread,
Nov 11, 2019, 4:11:02 PM11/11/19
to jenkinsc...@googlegroups.com
(you will then have come some `ssh-agent` processes to kill after running this test)


I'm still not sure where this should get fixed:
- either in core, by moving the {{Environment.tearDown}} calls up from {{BuildExecution.doRun}} to {{AbstractBuildExecution.run}}
- or in the ssh-agent plugin itself, if what it does (ie., adding an {{Environment}} to the build from its {{BuildWrapper.preCheckout}} implementation, rather than from {{BuildWrapper.setUp}}, so that its already set up during SCM checkout) is really bad/unsupported

 

 

tom.gl@free.fr (JIRA)

unread,
Feb 22, 2020, 4:56:02 PM2/22/20
to jenkinsc...@googlegroups.com

Revisiting this old issue again, I realize that SimpleBuildWrapper exposes a variation of what's done in the SSHAgentBuildWrapper, allowing to setup an Environment before checkout: SimpleBuildWrapper#runPreCheckout

If such a SimpleWrapperWrapper provides a Disposer and is run as part of an AbstractBuild which fails early (in SCM checkout), then the disposer won't be called.

This flaw is again not documented (just like Environment#tearDown says nothing about not always being called on build failure), and makes me lean toward the "it's a core bug" interpretation of this issue.

I will submit a PR with a test case (using SimpleBuildWrapper), and a possible fix.

tom.gl@free.fr (JIRA)

unread,
Feb 22, 2020, 6:04:03 PM2/22/20
to jenkinsc...@googlegroups.com

tom.gl@free.fr (JIRA)

unread,
Feb 22, 2020, 6:05:02 PM2/22/20
to jenkinsc...@googlegroups.com

tom.gl@free.fr (JIRA)

unread,
Feb 22, 2020, 6:05:03 PM2/22/20
to jenkinsc...@googlegroups.com
Status: Open In Progress

tom.gl@free.fr (JIRA)

unread,
Feb 22, 2020, 6:05:03 PM2/22/20
to jenkinsc...@googlegroups.com

tom.gl@free.fr (JIRA)

unread,
Feb 23, 2020, 10:08:02 AM2/23/20
to jenkinsc...@googlegroups.com
Thomas de Grenier de Latour updated an issue
When a job with the {{SSHAgentBuildWrapper}} enabled fails very early (for instance during SCM checkout), an {{ssh-agent}} process is left behind. The issue is that the {{SSHAgentEnvironment}} is instantiated very early (from {{preCheckout}}), but its {{tearDown}} method will only be called if execution reaches {{BuildExecution.doRun}} (which comes after the SCM checkout phase in {{AbstractBuildExecution.run}}).

Before {{ssh-agent-plugin 1.14}}, there was no {{ssh-agent}} process, so the issue with some {{SSHAgentEnvironment}} not being
teared torn down was less visible (but probably there was already some other kind of less obvious resources leaks with {{AgentServer}} not being properly closed).

This kind of issue with some {{Environment}} not being properly
teared torn down can happen as soon as they are not instantiated from {{BuildWrapper.setUp}}, but from earlier phases (like {{BuildWrapper.preCheckout}} or {{RunListener.setUpEnvironment}}). As such, maybe that's something that should be fixed in core (maybe in {{AbstractBuildExecution.run}}) rather than specifically in the {{ssh-agent-plugin}}, I don't know...

I've written and attached a "generic workaround" {{RunListener}}, which tries to detect this situation from {{onComplete}}, and call {{tearDown}} for all {{Environment}} if it has not been done already.  It's not something I propose for inclusion, but rather some code to exhibit the issue.   If an ssh-agent specific fix is desirable, then a similar approach might be an option (but targeting {{SSHAgentEnvironment}} only).
Reply all
Reply to author
Forward
0 new messages