[JIRA] [core] (JENKINS-19686) Workspace directory randomly deleted

ajbarber71@gmail.com (JIRA)

Jun 2, 2016, 1:23:03 PM
to jenkinsc...@googlegroups.com
Andrew Barber commented on Bug JENKINS-19686
 
Re: Workspace directory randomly deleted

I think there is a bug in the workspace cleanup code. We've been battling with disappearing workspaces and I could not figure out what was happening until I stumbled upon this thread. In our case, I moved a job from one slave to a different slave, but the cleanup code seemed to think it was OK to delete the workspace based on the old slave. The message in the log was:
Deleting <dir> on <old slave name>
We haven't been using that old slave for this job for at least a few weeks. To make matters worse, it deleted the workspace WHILE the job was running on the new slave.
This appears to be the trouble code:

{code:java}
for (Node node : nodes) {
    FilePath ws = node.getWorkspaceFor(item);
    if (ws == null) {
        continue; // offline, fine
    }
    boolean check;
    try {
        check = shouldBeDeleted(item, ws, node);
{code}
The first node it comes across for which shouldBeDeleted returns true causes the workspace to be deleted, even if another node later in the list is the last builder of that job (meaning the job is still active). That case is supposed to be caught in shouldBeDeleted():

{code:java}
Node lb = p.getLastBuiltOn();
LOGGER.log(Level.FINER, "Directory {0} is last built on {1}", new Object[] {dir, lb});
if (lb != null && lb.equals(n)) {
    // this is the active workspace. keep it.
    LOGGER.log(Level.FINE, "Directory {0} is the last workspace for {1}", new Object[] {dir, p});
    return false;
}
{code}

But since the for loop takes action before all nodes have been checked, this check can be pointless.
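A minimal, self-contained sketch of the ordering problem (using hypothetical stand-in types, not the real Jenkins Node/FilePath API): the buggy loop deletes as soon as one node's stale check passes, while a safer variant would first confirm that no node mapping to this workspace is the job's last builder.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {
    // Hypothetical stand-in for a Jenkins node; not the real hudson.model.Node API.
    record Node(String name) {}

    // Simulates the buggy loop: any node that is not the last builder triggers
    // deletion immediately, before the remaining nodes are checked.
    static List<String> buggyCleanup(List<Node> nodes, Node lastBuiltOn) {
        List<String> deleted = new ArrayList<>();
        for (Node node : nodes) {
            if (!node.equals(lastBuiltOn)) {
                deleted.add(node.name()); // acts before later nodes are checked
            }
        }
        return deleted;
    }

    // Safer variant: delete only when NO node that maps to this workspace
    // is the job's last builder (relevant when slaves share a root).
    static List<String> safeCleanup(List<Node> nodes, Node lastBuiltOn) {
        boolean anyActive = nodes.stream().anyMatch(n -> n.equals(lastBuiltOn));
        if (anyActive) {
            return List.of(); // keep the workspace
        }
        return nodes.stream().map(Node::name).toList();
    }

    public static void main(String[] args) {
        Node oldSlave = new Node("old-slave");
        Node newSlave = new Node("new-slave");
        List<Node> nodes = List.of(oldSlave, newSlave);
        System.out.println(buggyCleanup(nodes, newSlave)); // [old-slave]
        System.out.println(safeCleanup(nodes, newSlave));  // []
    }
}
```

With a shared slave root, deleting "old-slave"'s copy of the workspace deletes the directory the active job on "new-slave" is using, which matches the behavior reported above.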

This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)





ajbarber71@gmail.com (JIRA)

Jun 2, 2016, 3:25:02 PM
to jenkinsc...@googlegroups.com

A follow-up on my case. I just realized that what is different for me is that I share a slave root across slaves. This is OK for me, because jobs are only tied to a single node. But by moving the job from one slave to another, it opened up the workspace for reaping (in the context of the old slave), which I don't want. You could argue user error, but there is no check in Jenkins to ensure slaves have different roots (which implies that sharing roots must be allowed). The cleanup thread could be enhanced to check with all slaves that share a root to make sure none of them own the active workspace.

m_winter@gmx.de (JIRA)

Dec 11, 2016, 5:25:02 PM
to jenkinsc...@googlegroups.com

How should Jenkins check whether slaves share the same slave root? If the slaves connect to the same machine that is possible (but why would you have different slaves at all then?), but what if several different machines have the same NFS directory mounted? It would be quite hard to reliably find out whether a slave root is shared.
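One heuristic that comes to mind (an illustration only, not anything Jenkins actually does): drop a uniquely named marker file in one root and check whether it is visible under the other root. Sketched below with two local java.nio paths standing in for the two slave roots; in practice NFS attribute caching could still delay visibility, which is part of why this is hard to do reliably.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class SharedRootProbe {
    // Heuristic: two roots are considered shared if a freshly created,
    // uniquely named marker file in rootA is visible under rootB.
    static boolean sameRoot(Path rootA, Path rootB) throws IOException {
        Path marker = rootA.resolve(".probe-" + UUID.randomUUID());
        Files.createFile(marker);
        try {
            return Files.exists(rootB.resolve(marker.getFileName()));
        } finally {
            Files.deleteIfExists(marker); // always clean up the marker
        }
    }

    public static void main(String[] args) throws IOException {
        Path shared = Files.createTempDirectory("root");
        Path other = Files.createTempDirectory("root");
        System.out.println(sameRoot(shared, shared)); // true: same directory
        System.out.println(sameRoot(shared, other));  // false: distinct roots
    }
}
```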


neuralsandwich@gmail.com (JIRA)

Feb 21, 2017, 11:24:02 AM
to jenkinsc...@googlegroups.com

Currently experiencing this as well. I have many servers connected and workspaces are just disappearing.

I am seeing a lot of ConcurrentModificationExceptions as well.

junk@jdark.com (JIRA)

Mar 21, 2018, 2:29:02 PM
to jenkinsc...@googlegroups.com

I, too, have been seeing this happen recently. It's happening on slave jobs (they are restricted to which slave they run on, and this configuration never changes), but also on jobs using the Publish over SSH plugin to copy files to/from other machines and execute commands via SSH publishers.

I believe it's related to the workspace clean-up process. I'm now seeing other bugs related to this (https://issues.jenkins-ci.org/browse/JENKINS-27329, for example). But we run under Tomcat, and I'm not sure how to effect a change in this troubling behavior. The workarounds mentioned there don't look like something I can use. If anyone has guidance, I'm all ears.


dheerajgundavaram@gmail.com (JIRA)

Jun 11, 2018, 8:08:02 AM
to jenkinsc...@googlegroups.com

I am also facing a similar issue. I've configured my builds and deployments on both the master and a slave.

Whichever jobs run on the slave face this issue. My source code repository is TFS. For example: when the build is triggered, the source code is downloaded and the build completes, but when the package is about to be checked in, the entire workspace is gone, and not just for that job or project: the cleanup removes all the workspaces in the slave directory.

Since I now have to run all the jobs on the master, I'm facing space issues too. Any immediate help would be much appreciated.

georgesimad@yahoo.com (JIRA)

Oct 1, 2019, 9:39:03 AM
to jenkinsc...@googlegroups.com
georges imad updated an issue
 
Jenkins / Bug JENKINS-19686
Workspace directory randomly deleted
Change By: georges imad
Attachment: Jenkins-FolderDeletion.png

georgesimad@yahoo.com (JIRA)

Oct 1, 2019, 9:40:03 AM
to jenkinsc...@googlegroups.com

I am facing the same issue. The event viewer details show the following:
