Sometimes, after a certain number of builds, a build never finishes on the node: the node keeps sending the same 20-30 log lines forever. The problem seems to occur more often when I restart Jenkins while tasks are running. The difference is visible between the timestamper date (added when the master receives the line) and the log date (written during the PowerShell execution). !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail!
Some log files grow beyond 10 GB before I kill the process. (Yes, they really are stored in JENKINS_HOME.) !jekins_10GB_log.png|thumbnail!
h1. Investigation
h2. Steps
Wireshark shows that the node keeps sending the same logs forever, so the Jenkins master is not (directly) the culprit. After attaching a debugger to the slave, I found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java. The same file is read again and again with a lastLocation of 1930670. lastLocation represents the number of bytes already read, but I don't understand why it doesn't increase. The process has terminated, and the log file is no bigger than 3 MB (its size can be seen in the upper left corner of the screenshot). !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail!
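To make the expected behaviour of lastLocation concrete, here is a minimal sketch of offset-based log reading. It is not the durable-task plugin's code: the class OffsetLogReader and its methods are hypothetical names made up for illustration. It only shows that, in a healthy reader, the offset grows with every chunk delivered, and that a poll returns nothing once the file stops growing.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical offset-based log reader; NOT the durable-task plugin's implementation. */
class OffsetLogReader {
    private long lastLocation = 0; // number of bytes already delivered

    /** Returns the bytes appended since the previous call and advances the offset. */
    byte[] readNew(Path log) {
        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "r")) {
            long len = raf.length();
            if (len <= lastLocation) {
                return new byte[0]; // nothing new: the offset stays where it is
            }
            byte[] chunk = new byte[(int) (len - lastLocation)];
            raf.seek(lastLocation);
            raf.readFully(chunk);
            lastLocation = len; // the step that apparently never happens in this bug
            return chunk;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long lastLocation() {
        return lastLocation;
    }

    /** Writes two batches and polls three times; returns the chunk sizes, then the final offset. */
    static long[] demo() {
        try {
            Path log = Files.createTempFile("durable-log", ".txt");
            OffsetLogReader reader = new OffsetLogReader();
            Files.write(log, "first\n".getBytes(StandardCharsets.UTF_8));
            long a = reader.readNew(log).length;
            Files.write(log, "second\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
            long b = reader.readNew(log).length;
            long c = reader.readNew(log).length; // file unchanged since the last poll
            Files.delete(log);
            return new long[] {a, b, c, reader.lastLocation()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [6, 7, 0, 13]
    }
}
```

With a reader like this, a 3 MB file could never produce more than 3 MB of output, so a stuck lastLocation of 1930670 suggests that the step updating the offset is being skipped.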
*Update 1:* It seems that Jenkins reads the whole file and, if the read fails, returns 0. I suspect that Jenkins fails to close the file descriptor, so lastLocation is not updated even though the data has been sent. Jenkins then retries reading the file, fails again, and so on. This is only a hypothesis for now.
*Update 2:* The problem seems to come from the network: I've caught a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail!
*Update 3:* It seems that the Jenkins master is at fault after all. I attached my debugger to it: the error occurs when it tries to write the log to its JENKINS_HOME, while executing the line highlighted in green in the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail!
The error is caught in DurableTaskStep$Execution.check and appears to be a workspace error. Jenkins doesn't seem to find the workspace folder because it looks for the node's workspace in its own local file system: C:\\ci\\int12\\ocoint... So the master saves the log but interrupts the task and tells the slave that the task was interrupted. The slave assumes the logs were not saved and resends them to the master, which again fails to find the node workspace in its local file system, and so on. !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail!
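The suspected loop can be modelled in a few lines. This sketch only illustrates the hypothesis above and is not Jenkins code: Master, Agent and storedChunksAfter are hypothetical names, and the assumption is that the master appends the chunk to its log before a later check throws, while the agent advances its offset only when the call returns normally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the suspected resend loop; the names are NOT Jenkins APIs. */
class ResendLoopSketch {

    /** Master side: stores the chunk first, then runs a check that may fail. */
    static class Master {
        final List<String> storedLog = new ArrayList<>();
        final boolean workspaceVisible;

        Master(boolean workspaceVisible) {
            this.workspaceVisible = workspaceVisible;
        }

        void receive(String chunk) {
            storedLog.add(chunk); // the log is already saved at this point...
            if (!workspaceVisible) { // ...but the workspace check fails afterwards
                throw new IllegalStateException("workspace not found on master");
            }
        }
    }

    /** Agent side: advances its offset only when receive() returns normally. */
    static class Agent {
        final String[] lines;
        int lastLocation = 0; // index of the first line not yet acknowledged

        Agent(String... lines) {
            this.lines = lines;
        }

        void poll(Master master) {
            if (lastLocation >= lines.length) {
                return; // everything was delivered
            }
            String chunk = String.join("\n", Arrays.copyOfRange(lines, lastLocation, lines.length));
            try {
                master.receive(chunk);
                lastLocation = lines.length;
            } catch (IllegalStateException e) {
                // Treated as "not delivered": the offset is unchanged,
                // so the same chunk is sent again on the next poll.
            }
        }
    }

    /** Runs the given number of polling rounds and returns how many chunks the master stored. */
    static int storedChunksAfter(int polls, boolean workspaceVisible) {
        Master master = new Master(workspaceVisible);
        Agent agent = new Agent("line 1", "line 2", "line 3");
        for (int i = 0; i < polls; i++) {
            agent.poll(master);
        }
        return master.storedLog.size();
    }

    public static void main(String[] args) {
        System.out.println(storedChunksAfter(5, true));  // prints 1
        System.out.println(storedChunksAfter(5, false)); // prints 5
    }
}
```

In the failing case the stored log grows by one copy of the same chunk per poll, which would match both the repeated 20-30 lines and the 10 GB log files. Whether the real code paths behave this way still has to be confirmed.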