Jenkins 2.60.1 LTS RC testing started

Oliver Gondža

Jun 8, 2017, 4:55:25 AM
to jenkin...@googlegroups.com
Hello everyone,

The latest LTS RC has been made public and is ready to be tested. The release is
scheduled for 2017-06-21.

Report your findings in this thread, on the test plan wiki page, or via
https://www.surveymonkey.com/r/G6R7M3P.

Download the bits from
http://mirrors.jenkins-ci.org/war-stable-rc/2.60.1/jenkins.war
Check the community-maintained LTS test plan at
https://wiki.jenkins-ci.org/display/JENKINS/LTS+2.60.x+RC+Testing

Thanks
--
oliver

Mark Waite

Jun 9, 2017, 12:09:58 AM
to jenkin...@googlegroups.com
Oliver,

I updated my Jenkins 2.46.3 LTS based docker image to 2.60.1 RC and updated to the latest released plugins to match it.  The following types of machines are now failing to connect as agents, even though they worked well with 2.46.3:

- Ubuntu 14.04 x64 (3 different machines, all reporting "SSH key for this host does not match the key required in the connection configuration")
- Ubuntu 16.04 x64 (2 different machines, all reporting the same as Ubuntu 14.04)
- Debian 8 x64 and x86 (2 different machines, all reporting the same as Ubuntu 14.04)
- Debian 7 x86 (1 machine, reporting the same as Ubuntu 14.04)
- CentOS 7 x64 (1 machine, reporting the same as Ubuntu 14.04)

Machines that successfully connect include:
- CentOS 6 x64 (2 machines)
- multiple Windows machines using slave.jar to connect

I'll revert my docker image back to 2.46.3 and confirm that those machines are all able to connect.

Would you like a bug report for each failing configuration, a single bug report for them all, or something different?

Thanks,
Mark Waite


--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/5c542bd4-a576-7cfb-9350-f3fc19c18e3c%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Mark Waite

Jun 9, 2017, 12:41:49 AM
to jenkin...@googlegroups.com
I confirmed that 2.46.3 with the latest plugins for that release is able to connect to all those machines.

Oliver Gondža

Jun 9, 2017, 2:43:44 AM
to jenkin...@googlegroups.com
On 2017-06-09 06:41, Mark Waite wrote:
> I confirmed that the 2.46.3 with latest plugins for that release is able
> to connect to all those machines.

Can you point us to a way to reproduce this? It indeed seems related to
7dbcd02cf71eabf92e58522968230d564d9b99e5, the only backported change.


--
oliver

Mark Waite

Jun 9, 2017, 9:17:55 AM
to jenkin...@googlegroups.com
I'll try to spend time on it today to create a duplication environment outside of my home network.

I don't expect it to be difficult to duplicate, but I can provide that environment.  It will be a good test for me to assure that I've understood the details that contribute to the problem.

I won't likely be able to start on it until after work today.  Earliest results probably won't be available until tomorrow morning.

Mark Waite


Jesse Glick

Jun 9, 2017, 9:51:46 AM
to Jenkins Dev
You can bisect by Jenkins version number; or just check if 2.60 & 2.61
work but 2.62 does not, implying that JENKINS-44120 is at fault.

And of course we would want to improve

https://github.com/jenkinsci/acceptance-test-harness/blob/786be14d7096e2d1464e6dd41e9e003a3e20906f/src/test/java/plugins/SshSlavesPluginTest.java#L75-L81

to reproduce the issue, perhaps by creating variants of

https://github.com/jenkinsci/docker-fixtures/tree/fe4f33921a290c352ad757a896bd570856ae062c/src/main/resources/org/jenkinsci/test/acceptance/docker/fixtures/SshdContainer

that use different key formats (or whatever the problem turns out to be).

Mark Waite

Jun 9, 2017, 10:01:33 AM
to jenkin...@googlegroups.com
On Fri, Jun 9, 2017 at 7:51 AM Jesse Glick <jgl...@cloudbees.com> wrote:
> You can bisect by Jenkins version number; or just check if 2.60 & 2.61
> work but 2.62 does not, implying that JENKINS-44120 is at fault.


I can run different Jenkins versions in that environment much more easily than I can create an independent duplication environment.  I think what you're telling me is that if I can show that the weekly 2.60 fails, 2.61 fails, and 2.62 works, then that is useful data to isolate the problem?

If so, then I can get that information much sooner.

Can you confirm I've understood correctly?

Mark Waite
 

Jesse Glick

Jun 9, 2017, 10:05:24 AM
to Jenkins Dev
On Fri, Jun 9, 2017 at 10:01 AM, Mark Waite <mark.ea...@gmail.com> wrote:
> if I can show that the weekly 2.60 fails, 2.61
> fails, and 2.62 works

The exact opposite. The change to Trilead was introduced in 2.62 in
weeklies before being backported to 2.60.1.

Mark Waite

Jun 9, 2017, 10:10:07 AM
to jenkin...@googlegroups.com
OK, exact opposite is fine as well, since that means I need to test specific weeklies to show which weeklies show the problem in my environment, and which do not.

Mark Waite


Mark Waite

Jun 9, 2017, 10:21:54 AM
to jenkin...@googlegroups.com
On Fri, Jun 9, 2017 at 7:51 AM Jesse Glick <jgl...@cloudbees.com> wrote:
> You can bisect by Jenkins version number; or just check if 2.60 & 2.61
> work but 2.62 does not, implying that JENKINS-44120 is at fault.


I had 2.58 already defined in that docker instance, so I ran it.  It failed in what seemed to be the same way that 2.60.1-rc1 failed.

The Jenkins console log included the following output:

Jun 09, 2017 8:14:46 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Thread-50/447) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.NullPointerException
	at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:447)
	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:790)
	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
	at java.lang.Thread.run(Thread.java:745)

The message when the agent fails to start says:

[06/09/17 08:17:19] [SSH] Opening SSH connection to mark-pc1.markwaite.net:22.
[06/09/17 08:17:19] [SSH] WARNING: The SSH key for this host does not match the key required in the connection configuration. Connections will be denied until until the host key matches the configuration key.
Key exchange was not finished, connection is closed.
java.io.IOException: There was a problem while connecting to mark-pc1.markwaite.net:22
	at com.trilead.ssh2.Connection.connect(Connection.java:834)
	at com.trilead.ssh2.Connection.connect(Connection.java:703)
	at com.trilead.ssh2.Connection.connect(Connection.java:617)
	at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1265)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:790)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:785)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Key exchange was not finished, connection is closed.
	at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:95)
	at com.trilead.ssh2.transport.TransportManager.getConnectionInfo(TransportManager.java:237)
	at com.trilead.ssh2.Connection.connect(Connection.java:786)
	... 9 more
Caused by: java.io.IOException: The server hostkey was not accepted by the verifier callback
	at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:548)
	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:790)
	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
	... 1 more
[06/09/17 08:17:19] Launch failed - cleaning up connection
[06/09/17 08:17:19] [SSH] Connection closed.


The message claims that the host key is not correct, yet that same host key is correct when used with the Jenkins 2.46.3 LTS build.
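
When reading a chained Java trace like the one above, the innermost "Caused by:" entry is usually the most specific statement of what went wrong. The following is a minimal Python sketch of pulling that entry out of a log. The function name `root_cause` and the abbreviated `SAMPLE_TRACE` are illustrative only and are not part of Jenkins or Trilead.

```python
CAUSE_PREFIX = "Caused by: "

# Abbreviated copy of the agent launch failure quoted in this thread.
SAMPLE_TRACE = """\
java.io.IOException: There was a problem while connecting to mark-pc1.markwaite.net:22
    at com.trilead.ssh2.Connection.connect(Connection.java:834)
Caused by: java.io.IOException: Key exchange was not finished, connection is closed.
    at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:95)
    ... 9 more
Caused by: java.io.IOException: The server hostkey was not accepted by the verifier callback
    at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:548)
    ... 1 more
"""


def root_cause(trace):
    """Return the innermost exception line of a Java stack trace.

    The last "Caused by:" line wins; if there is none, the first
    non-empty line (the top-level exception) is returned instead.
    """
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    causes = [line[len(CAUSE_PREFIX):] for line in lines
              if line.startswith(CAUSE_PREFIX)]
    if causes:
        return causes[-1]
    return lines[0] if lines else None


print(root_cause(SAMPLE_TRACE))
```

For the trace in this message, the innermost entry is the verifier callback rejecting the server host key, which matches the warning printed by the SSH launcher.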

I'll check with 2.60, 2.61, and 2.62.

Mark Waite
 

Mark Waite

Jun 9, 2017, 10:34:48 AM
to jenkin...@googlegroups.com
On Fri, Jun 9, 2017 at 8:21 AM Mark Waite <mark.ea...@gmail.com> wrote:
> On Fri, Jun 9, 2017 at 7:51 AM Jesse Glick <jgl...@cloudbees.com> wrote:
>> You can bisect by Jenkins version number; or just check if 2.60 & 2.61
>> work but 2.62 does not, implying that JENKINS-44120 is at fault.
>
> I had 2.58 already defined in that docker instance, so I ran it.  It failed in what seemed to be the same way that 2.60.1-rc1 failed.


2.60 and 2.61 fail in the same way, though I did not see the "thread died" entry in the console log.

I assume the next step is for me to bisect to find the weekly version where it first stopped working.

Do I need to be bisecting specific plugin versions as well?  If so, which plugins are most useful to investigate?

Mark Waite

Mark Waite

Jun 9, 2017, 10:58:41 AM
to jenkin...@googlegroups.com
On Fri, Jun 9, 2017 at 8:34 AM Mark Waite <mark.ea...@gmail.com> wrote:
> On Fri, Jun 9, 2017 at 8:21 AM Mark Waite <mark.ea...@gmail.com> wrote:
>> On Fri, Jun 9, 2017 at 7:51 AM Jesse Glick <jgl...@cloudbees.com> wrote:
>>> You can bisect by Jenkins version number; or just check if 2.60 & 2.61
>>> work but 2.62 does not, implying that JENKINS-44120 is at fault.
>>
>> I had 2.58 already defined in that docker instance, so I ran it.  It failed in what seemed to be the same way that 2.60.1-rc1 failed.
>
> 2.60 and 2.61 fail in the same way, though I did not see the "thread died" entry in the console log.


2.62 shows the same failure.  I'll continue bisecting to locate the first version that shows the failure.

Mark Waite 

Jesse Glick

Jun 9, 2017, 11:08:19 AM
to Jenkins Dev
On Fri, Jun 9, 2017 at 10:21 AM, Mark Waite <mark.ea...@gmail.com> wrote:
> java.lang.NullPointerException
> at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:447)

So IIUC that is JENKINS-44120 which was supposedly fixed in 2.60.1 / 2.62+.
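
The version rule stated here (fixed in 2.60.1 on the LTS line and in 2.62 and later weeklies) can be written down as a small check. This is a rough Python sketch: the helper names are made up for illustration, it assumes that every LTS release at or after 2.60.1 also carries the fix, and it does not handle suffixed versions such as "2.60.1-rc1".

```python
def parse_version(text):
    """Split '2.60.1' into (2, 60, 1). Weeklies have two parts, LTS releases three."""
    return tuple(int(part) for part in text.split("."))


def has_jenkins_44120_fix(version):
    """Return True if this release should contain the JENKINS-44120 fix.

    Rule taken from the thread: weeklies from 2.62 on, and the 2.60.1 LTS.
    Assumption (not stated in the thread): later LTS releases also include it.
    """
    parts = parse_version(version)
    if len(parts) == 2:           # weekly release, e.g. 2.61
        return parts >= (2, 62)
    return parts >= (2, 60, 1)    # LTS release, e.g. 2.46.3


for candidate in ["2.46.3", "2.58", "2.60", "2.60.1", "2.61", "2.62", "2.64"]:
    print(candidate, has_jenkins_44120_fix(candidate))
```

Under this rule the weekly 2.60 and the LTS 2.60.1 get different answers, which is why the two release lines have to be compared separately.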

Mark Waite

Jun 9, 2017, 5:20:00 PM
to jenkin...@googlegroups.com
Since I don't see that failure in anything other than 2.58, I assume it really was fixed in 2.60.1 and in 2.62+.

Unfortunately, my ssh agents with manually provided host key verification will not connect, even with 2.64.  I bisected enough to confirm that 2.57 was the last version where the manually provided host key verification of slave agents worked for those machines.
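
The bisection over weekly version numbers can be sketched as a plain binary search. This Python example is illustrative only: `first_failing` and `simulated_check` are hypothetical names, the 2.46 starting point stands in for the 2.46.3 LTS baseline, and the simulated check simply encodes the result reported above (2.57 works, later weeklies fail). A real check would start each Jenkins version and try to connect an agent.

```python
def first_failing(versions, fails):
    """Binary-search for the first version where fails(version) is True.

    Assumes versions are sorted oldest to newest, the oldest passes,
    the newest fails, and the failure persists once it appears.
    """
    lo, hi = 0, len(versions) - 1   # invariant: versions[lo] passes, versions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Weekly releases 2.46 .. 2.64 (the range is an assumption for illustration).
weeklies = [f"2.{minor}" for minor in range(46, 65)]


def simulated_check(version):
    """Stand-in for a real agent connection test: fails from 2.58 onward."""
    return int(version.split(".")[1]) >= 58


print(first_failing(weeklies, simulated_check))
```

With 19 candidate versions this needs only four or five agent connection attempts rather than one per release.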

I'm configuring a repro case on AWS now to see if I can duplicate the problem on a fresh configuration.

Mark Waite
 

Jesse Glick

Jun 9, 2017, 5:23:41 PM
to Jenkins Dev
On Fri, Jun 9, 2017 at 5:19 PM, Mark Waite <mark.ea...@gmail.com> wrote:
> I bisected enough to confirm that 2.57
> was the last version where the manually provided host key verification of
> slave agents worked for those machines.

So definitely a problem in the Trilead update.

thomasmu...@yahoo.com

Jun 9, 2017, 8:05:15 PM
to Jenkins Developers
I think your problem with manual verification may be fixed by https://github.com/jenkinsci/ssh-slaves-plugin/pull/54/files

thomasmu...@yahoo.com

Jun 9, 2017, 8:05:15 PM
to Jenkins Developers

Mark Waite

Jun 9, 2017, 8:09:34 PM
to jenkin...@googlegroups.com
On Fri, Jun 9, 2017 at 6:05 PM thomasmulhall410 via Jenkins Developers <jenkin...@googlegroups.com> wrote:
> Hi your null pointer looks like it is fixed with https://github.com/jenkinsci/trilead-ssh2/commit/1d365242a839381a9b205c25f66f0055861c9d5c
>
> This is fixed in Jenkins 2.62 I think.


I've submitted JENKINS-44803 with the detailed steps to show the problem.

Since I see the same problem with Jenkins core 2.64, I don't think there is a fix which can be backported to resolve the problem.  However, I'll need to double check Jenkins 2.64 with the clean environment that I used to prepare the steps in JENKINS-44803.  It definitely shows the failure with Jenkins 2.64 in my docker environment.

When I've done that check, I'll add that to the bug description in JENKINS-44803.

Thanks,
Mark Waite
 

James Nord

Jun 12, 2017, 6:21:02 AM
to Jenkins Developers
We also seem to have found some other issues around SSH in our internal test harnesses. We are currently investigating to see if they have the same cause.

Jesse Glick

Jun 12, 2017, 10:07:36 AM
to Jenkins Dev
IIUC this issue is resolved with the SSH Slaves 1.18 update?

James Nord

Jun 13, 2017, 10:49:43 AM
to Jenkins Developers
Alas, it did not, but I think we found the underlying issue[1], which I hope can be backported, as it is trivial and likely to trip up a lot of users.

Currently we are re-running the tests to verify this was the cause of the test failure.

James Nord

Jun 14, 2017, 4:12:28 AM
to Jenkins Developers
Just a follow-up: the failing tests did pass when running with Java 8 on the agents.