Windows Slave / Linux Master hang

jer...@bodycad.com

Feb 27, 2017, 12:03:11 PM
to Jenkins Users
Hi,

I'm still having trouble with my Linux master and Windows slave: builds often hang.

There is nothing in jenkins.log that helps:
Feb 25, 2017 3:05:41 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Workspace clean-up
Feb 25, 2017 3:05:41 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Workspace clean-up. 29 ms
Feb 25, 2017 6:03:12 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in Bodycad cpp projects » CAD_CPP_ContinuousBuild. Triggering  #105
Feb 25, 2017 11:31:25 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Feb 25, 2017 11:31:30 AM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default


The output of dmesg -T on the server doesn't help much either, but I got the following error after forcing a Jenkins restart:
[Mon Feb 27 11:43:37 2017] CIFS VFS: Send error in Close = -512
Yes, the server's build data is located on a CIFS share. I had some CIFS errors before; I increased the VM memory and tweaked the vm_dirty* parameters to get rid of those.
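
For readers unsure which settings "vm_dirty*" refers to: on Linux these are presumably the vm.dirty_* writeback sysctls exposed under /proc/sys/vm (an assumption; the post does not list the parameters or the values used). A minimal sketch for printing their current values, for example before and after a change. The function name show_vm_dirty is made up for this example.

```shell
# Print every vm.dirty_* tunable and its current value.
# The directory is a parameter (hypothetical convenience) so the function can
# be pointed at a copy of the files; the default is the live /proc/sys/vm.
show_vm_dirty() {
    dir="${1:-/proc/sys/vm}"
    for f in "$dir"/dirty_*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -r "$f" ] || continue
        printf 'vm.%s = %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage on the Jenkins master: show_vm_dirty
```

Recording the values this way makes it easier to tell later whether a recurrence of the CIFS errors coincides with a changed setting.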

This often occurs right after a bat command on the slave:
bat returnStatus: false, script: "\"${bcad.msbuild_current}\" ${bcad.msbuild_solution_name} ${bcad.msbuild_default_arg} /t:Build"

According to the output log, the command seems to have completed successfully, but then nothing happens and I cannot kill the build. The web GUI is accessible, but nothing can be done to unjam the builds. I have to restart Jenkins, which tries to resume the build and fails, since the build is in the middle of a sequential sequence at that point.

This doesn't show up when I start the build manually. Is there any way to debug this? It's really interfering with our build system, and it locks the project and the slave.

Jerome

Dirk Heinrichs

Feb 28, 2017, 1:36:03 AM
to jenkins...@googlegroups.com
On 27.02.2017 at 18:03, jer...@bodycad.com wrote:

Yes, the server's build data is located on a CIFS share.

Sounds wrong to me. Did you try with a local directory?

Bye...

    Dirk
--
Dirk Heinrichs
Senior Systems Engineer, Delivery Pipeline
OpenText™ Discovery | Recommind
Email: dirk.he...@recommind.com
Website: www.recommind.de


jer...@bodycad.com

Mar 1, 2017, 9:45:44 AM
to Jenkins Users
I have printed the thread dump, in case it helps.
I also checked which files are open in the CIFS share folder:
lsof +D /var/lib/jenkins/jobs/
COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
java    52480 jenkins  546w   REG   0,37  2853233 2342225 /var/lib/jenkins/jobs/CAD_CPP/jobs/CAD_CPP_ContinuousBuild/builds/108/log
java    52480 jenkins  548w   REG   0,37       83 2342232 /var/lib/jenkins/jobs/CAD_CPP/jobs/CAD_CPP_ContinuousBuild/builds/108/3.log
java    52480 jenkins  549w   REG   0,37      354 2342353 /var/lib/jenkins/jobs/CAD_CPP/jobs/CAD_CPP_ContinuousBuild/builds/108/55.log
java    52480 jenkins  552r   REG   0,37  2476151 2342350 /var/lib/jenkins/jobs/CAD_CPP/jobs/CAD_CPP_ContinuousBuild/builds/108/54.log
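
The listing above can be narrowed to the files Jenkins holds open for writing, which are the ones most exposed to a stalled CIFS write. A small sketch, assuming the standard lsof column layout (FD in the fourth column, NAME last) and paths without spaces; open_for_write is a hypothetical helper name.

```shell
# Read lsof output on stdin and print the NAME column of every file whose
# FD entry is a number followed by "w" (write) or "u" (read/write).
# The header line is skipped automatically because "FD" does not match.
open_for_write() {
    awk '$4 ~ /^[0-9]+[wu]/ { print $NF }'
}

# Usage: lsof +D /var/lib/jenkins/jobs/ | open_for_write
```

Applied to the four lines above, this keeps the entries with FD 546w, 548w and 549w and drops 54.log, which is open read-only (552r).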

I will try to reproduce this without the network drive, but I don't have much space on this VM. It looks like the master and the slave are each waiting for the other; some timing issue seems to be at work when the bat command with returnStatus: false returns. I doubt the network is at fault here, since the hang always happens after the bat command has exited and completed. Execution doesn't continue through the Jenkinsfile flow, as if the return of that command were never seen. This never happens when I start the build manually, either. Triggering the build through SCM polling seems to change something in the context.
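
When moving the build directory off the share for such a test, it helps to confirm that the new location really is local. A hedged sketch: fs_class and path_fs_class are hypothetical helper names, the list of network filesystem types is illustrative rather than exhaustive, and df --output=fstype assumes a reasonably recent GNU coreutils.

```shell
# Classify a filesystem type name as "network" or "local".
# Only a few common network types are listed here (assumption: extend as needed).
fs_class() {
    case "$1" in
        cifs|smb*|nfs*) echo network ;;
        *) echo local ;;
    esac
}

# Report the class of the filesystem backing a given path.
# The last line of the df output is the type; the first line is the header.
path_fs_class() {
    fs_class "$(df --output=fstype -- "$1" | tail -n 1)"
}

# Usage: path_fs_class /var/lib/jenkins/jobs
```

If the jobs directory still reports as a network filesystem after the move, the test has not actually taken CIFS out of the picture.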


Thread dump [Jenkins].pdf

jer...@bodycad.com

Mar 1, 2017, 9:53:53 AM
to Jenkins Users
I wonder if this is related:
  1. https://issues.jenkins-ci.org/browse/JENKINS-28759
  2. https://issues.jenkins-ci.org/browse/JENKINS-33164
So it seems I'm not the only one who has issues with the bat command hanging. I wonder if it will ever get fixed.

jer...@bodycad.com

Apr 27, 2017, 9:44:16 AM
to Jenkins Users