4.16 Community and Sudo Support - Timeout Issue


Russ Robinson

Sep 12, 2023, 4:19:05 PM
to rundeck-discuss
Because the Rundeck Community edition shipped a newer version of the sshj plugin, I had to re-add the same sudo tty entries to the plugin (reference: https://groups.google.com/g/rundeck-discuss/c/suFtVUp3JVY ). This at least allows sudo to work.

However, we are now also seeing the ssh connection time out after the sudo command runs. For example, if a simple bash script runs "date" and then "sleep 480", and it is run via "sudo su - /tmp/test.sh", the output in debug mode looks like:

Starting the script....
Start: Tue Sep 12 19:03:01 UTC 2023 Curr: Tue Sep 12 19:03:01 UTC 2023 Try: 1 of 10000
Sleeping 480
[net.schmizz.sshj.transport.TransportImpl] Dying because - Broken transport; encountered EOF
[net.schmizz.sshj.transport.TransportImpl] Disconnected - UNKNOWN
[net.schmizz.sshj.transport.KeyExchanger] Got notified of net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
[net.schmizz.sshj.connection.ConnectionImpl] Notified of net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
[net.schmizz.sshj.connection.channel.direct.SessionChannel] Channel #0 got notified of net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
[net.schmizz.sshj.connection.ConnectionImpl] Forgetting `session` channel (#0)
[net.schmizz.concurrent.Promise] Setting <<chan#0 / close>> to `SOME`
[net.schmizz.sshj.transport.TransportImpl] Setting active service to null-service
Expect operation fails (timeout: 30000000 ms) for matcher: regexp('~.*\$')
[net.schmizz.concurrent.Promise] Setting <<transport close>> to `SOME`
[sshj-ssh] closing session
[net.schmizz.sshj.transport.Reader] Stopping
[sshj-ssh] disconnected
SSH command execution error: Unknown: net.sf.expectit.ExpectIOException: Expect operation fails (timeout: 30000000 ms) for matcher: regexp('~.*\$')

If I am reading the debug mode output correctly, it appears the underlying ssh connection has terminated during the sleep command (based upon the "Got notified of net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF").

We do have keepalive enabled:

[sshj-scp] init SSHJDefaultConfig
[sshj-scp] init SSHClient
[sshj-scp] setting timeouts
[sshj-scp] getConnectTimeout timeout: 0
[sshj-scp] getTimeout timeout: 0
[sshj-scp] keepAliveInterval: 5
[sshj-scp] retry: false
[sshj-scp] retryCount: 3
[sshj-scp] adding loadKnownHosts
[sshj-scp] open connection

We keep having to revert to Rundeck Community 4.7 with sshj plugin version 0.1.2. However, we are being pushed to upgrade to the latest version. Any ideas on how to fix the sshj plugin so that sudo and the ssh connection keepalive work together again?

rac...@rundeck.com

Sep 13, 2023, 8:29:52 AM
to rundeck-discuss
Hi, Russ,

The issue has been opened by you here, right? Is it the same issue?

Regards.

Russ Robinson

Sep 13, 2023, 9:08:43 AM
to rundeck-discuss
Yes, that is the issue for the timeout that occurs once sudo is working again.

The fixes for actually getting "sudo su -" working again are listed in the thread: https://groups.google.com/g/rundeck-discuss/c/suFtVUp3JVY/m/mPEiT21TAgAJ .

rac...@rundeck.com

Sep 13, 2023, 11:08:14 AM
to rundeck-discuss
Hi Russ,

We are checking the latest changes and suspect a specific one, which was apparently fixed in the latest version. Could you try the latest plugin version (0.1.9)?

In a test environment, remove the 0.1.8 jar file (and the 0.1.8 subdirectory in the libext/cache directory), put the 0.1.9 jar file in the libext directory, and test again.

Regards.

Russ Robinson

Sep 13, 2023, 1:37:30 PM
to rundeck-discuss
I downloaded the 0.1.9 jar, and it has the same sudo (tty) issues. After cloning the latest repo and making the same changes to the SudoCommand.java file to get sudo to work again, I get the same timeout issues after a long sleep command in the script.

Russ Robinson

Sep 13, 2023, 2:27:30 PM
to rundeck-discuss
Correction: I meant adding the following "allocateDefaultPTY()" call to SSHJExec.java (not SudoCommand.java):

session = ssh.startSession();
session.allocateDefaultPTY();

Russ Robinson

Sep 13, 2023, 3:51:31 PM
to rundeck-discuss
When comparing the same job in 4.7 and the newer 4.16, I see the following debug entries after the sleep command with v0.1.2 of the sshj plugin:

Sleeping 3600
[net.schmizz.keepalive.KeepAliveRunner] Sending keep-alive since 5 seconds elapsed
[net.schmizz.keepalive.KeepAliveRunner] Received response from server to our keep-alive.
[net.schmizz.sshj.connection.ConnectionImpl] Making global request for `keep...@openssh.com`
[net.schmizz.keepalive.KeepAliveRunner] Sending keep-alive since 5 seconds elapsed
[net.schmizz.keepalive.KeepAliveRunner] Received response from server to our keep-alive.
[net.schmizz.sshj.connection.ConnectionImpl] Making global request for `keep...@openssh.com`
[net.schmizz.keepalive.KeepAliveRunner] Sending keep-alive since 5 seconds elapsed
[net.schmizz.keepalive.KeepAliveRunner] Received response from server to our keep-alive.
[net.schmizz.sshj.connection.ConnectionImpl] Making global request for `keep...@openssh.com`

However, with the newer 4.16 release and version 0.1.9 of the sshj plugin, these repeated entries do not appear after the sleep command.
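
A quick way to make this comparison repeatable is to count the markers in each job's saved debug output. The sketch below is only illustrative: the class name KeepAliveLogCheck and the log file argument are made up, and the marker strings are copied from the debug output quoted in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Counts keep-alive and transport-failure markers in saved Rundeck debug output. */
final class KeepAliveLogCheck {
    // Marker strings copied from the debug output quoted in this thread.
    static final String KEEPALIVE_SENT =
            "[net.schmizz.keepalive.KeepAliveRunner] Sending keep-alive";
    static final String TRANSPORT_EOF = "Broken transport; encountered EOF";

    /** Number of lines that contain the given marker. */
    static long count(List<String> lines, String marker) {
        return lines.stream().filter(line -> line.contains(marker)).count();
    }

    /** True when the log shows a transport EOF but no keep-alive was ever sent. */
    static boolean droppedWithoutKeepAlive(List<String> lines) {
        return count(lines, KEEPALIVE_SENT) == 0 && count(lines, TRANSPORT_EOF) > 0;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a saved debug log (hypothetical file name).
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("keep-alives sent: " + count(lines, KEEPALIVE_SENT));
        System.out.println("transport EOFs: " + count(lines, TRANSPORT_EOF));
        System.out.println("dropped without keep-alive: " + droppedWithoutKeepAlive(lines));
    }
}
```

Running it against the 4.7 and 4.16 logs of the same job should show whether the keep-alive lines are missing entirely in the newer version or only stop at some point.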

Russ Robinson

Sep 13, 2023, 4:51:40 PM
to rundeck-discuss
Comparing versions 0.1.2 and 0.1.9, it looks like the connect method in SSHJBase.java had the following lines changed from:

        SSHJAuthentication authentication = new SSHJAuthentication(sshjConnection, pluginLogger);
        final DefaultConfig config = SSHJDefaultConfig.init().getConfig();
        config.setLoggerFactory(new SSHJPluginLoggerFactory(pluginLogger));
        config.setKeepAliveProvider(KeepAliveProvider.KEEP_ALIVE);

        pluginLogger.log(3, "["+getPluginName()+"] init SSHClient" );
        pluginLogger.log(3, "["+getPluginName()+"] setting timeouts" );

to just:


        SSHJAuthentication authentication = new SSHJAuthentication(sshjConnection, pluginLogger);

        pluginLogger.log(3, "["+getPluginName()+"] setting timeouts" );

Wouldn't the removal of the following line be what prevents the KeepAliveRunner from starting?

        config.setKeepAliveProvider(KeepAliveProvider.KEEP_ALIVE);
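
For reference, here is a minimal standalone sketch of how keep-alive is usually enabled with plain sshj, outside of Rundeck. It is modelled on the sshj library's own keep-alive example, not on the plugin source, so it uses DefaultConfig directly where the plugin uses SSHJDefaultConfig. The class name KeepAliveSketch, the host/user/password arguments, and the "sleep 480" command are placeholders, and PromiscuousVerifier (which accepts any host key) is only suitable for a throwaway test host. It needs the sshj jar on the classpath.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import net.schmizz.keepalive.KeepAliveProvider;
import net.schmizz.sshj.DefaultConfig;
import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.connection.channel.direct.Session;
import net.schmizz.sshj.transport.verification.PromiscuousVerifier;

final class KeepAliveSketch {
    public static void main(String[] args) throws IOException {
        // args: host, user, password (placeholders for this sketch).
        DefaultConfig config = new DefaultConfig();
        // Select the KeepAliveRunner implementation seen in the 0.1.2 debug output.
        config.setKeepAliveProvider(KeepAliveProvider.KEEP_ALIVE);

        SSHClient ssh = new SSHClient(config);
        try {
            // Test-only assumption: accept any host key.
            ssh.addHostKeyVerifier(new PromiscuousVerifier());
            ssh.connect(args[0]);
            // Send a keep-alive every 5 seconds, matching "keepAliveInterval: 5".
            ssh.getConnection().getKeepAlive().setKeepAliveInterval(5);
            ssh.authPassword(args[1], args[2]);

            Session session = ssh.startSession();
            try {
                // Allocate a PTY, as in the SSHJExec.java change for sudo.
                session.allocateDefaultPTY();
                Session.Command cmd = session.exec("sleep 480");
                // Wait for the long-running command; keep-alives run meanwhile.
                cmd.join(10, TimeUnit.MINUTES);
            } finally {
                session.close();
            }
        } finally {
            ssh.disconnect();
        }
    }
}
```

If the connection survives the sleep with this sketch but not through the 0.1.9 plugin, that would point to the removed setKeepAliveProvider call rather than to the server or network.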

rac...@rundeck.com

Sep 13, 2023, 4:57:39 PM
to rundeck-discuss
Hello, Russ. I updated the GitHub ticket; let's continue our investigation there.

I created a fresh project/config to recap all the details and understand the root cause.

Thanks!