Bug in IO.delete()

91 views
Skip to first unread message

Bruce Mitchener

unread,
May 9, 2012, 5:42:45 AM5/9/12
to simple-b...@googlegroups.com
This issue caused data loss and loss of actual work today, so it is fairly annoying.

When IO.delete() is called with a directory, it recurses over all contents and deletes them as well.  Unfortunately, if you have a symlink in the directory tree that is getting deleted and that symlink points to something valuable, that valuable data will also be deleted.

This happened to me because I have a dist directory that I prepare and run an application from. In that directory is a copy of some web content and instead of editing and re-disting and re-starting the application, I just wanted to symlink to the source files and edit to save me time and effort.  That seems like a pretty reasonable thing to do.  So, my directory structure was:

  - dist
    - bin
    - static-content
    - other stuff

So, static-content was a symlink to something in my source tree and I was making a lot of changes. When IO.delete happened to nuke the dist directory to do a new build, it iterates through the files in dist, deleting each of them. This results in IO.delete being called on static-content. It then iterates over the contents of that deleting that as well.

It would be much better if IO.delete didn't traverse a symlink to another directory, but just deleted the symlink.

 - Bruce

Jason Zaugg

unread,
May 9, 2012, 8:01:12 AM5/9/12
to simple-b...@googlegroups.com
It's a bit tricky pre-Java7 to determine if a file is a symlink, here are a few techniques:


-jason 

Paul Phillips

unread,
May 9, 2012, 11:22:24 AM5/9/12
to simple-b...@googlegroups.com
On Wed, May 9, 2012 at 2:42 AM, Bruce Mitchener
<bruce.m...@gmail.com> wrote:
> When IO.delete() is called with a directory, it recurses over all contents
> and deletes them as well.  Unfortunately, if you have a symlink in the
> directory tree that is getting deleted and that symlink points to something
> valuable, that valuable data will also be deleted.

That's the most terrifying thing I've heard in a while. Will it
RECURSIVELY delete a symlink which points to a directory?

> It's a bit tricky pre-Java7 to determine if a file is a symlink, here are a few techniques:

You don't have to identify it as a symlink. You only have to call
"file.delete()", which will succeed on a symlink and fail on a
(non-empty) directory.

Mark Harrah

unread,
May 10, 2012, 9:48:33 PM5/10/12
to simple-b...@googlegroups.com
The version actually in commons-io always returns false on windows. It isn't clear why, since symbolic links exist on windows (in several forms, apparently). There is this bug, which seems to imply the getCanonicalFile hack doesn't work for windows symlinks:

https://issues.apache.org/jira/browse/IO-295

> http://blog.headius.com/2007/09/java-native-access-jruby-true-posix.html

I'm not especially inclined to resort to native hacks, but this almost seems like it would end up being more reliable. I don't think I would personally put symlinks in a directory that gets cleaned if IO.delete uses the getCanonicalFile hack. I think I'd put a comment in IO.delete explicitly warning against putting symlinks in and add the hack to try to prevent surprises as much as possible. (Of course, that has its drawbacks as well, like relying on that behavior.)

-Mark

> -jason
>

Bruce Mitchener

unread,
May 10, 2012, 9:51:02 PM5/10/12
to simple-b...@googlegroups.com
For the record, I had this happen on Mac OS X and a coworker on Linux, just in case someone thinks this only happens on Windows.

 - Bruce

Mark Harrah

unread,
May 10, 2012, 10:03:28 PM5/10/12
to simple-b...@googlegroups.com
Right. IO.delete has no symlink detection at all so it would happen on all systems. Jason pointed out there is no straightforward way to check if a file is a symlink in Java. He linked to two possible workarounds we could use and my comment was that the pure Java implementation from Apache Commons IO doesn't work on Windows. So, we'd want to verify that and possibly look into the native option.

(Also, please file a bug report to track this. Thanks!)

-Mark

> - Bruce
>

Paul Phillips

unread,
May 10, 2012, 11:13:01 PM5/10/12
to simple-b...@googlegroups.com


On Thu, May 10, 2012 at 6:48 PM, Mark Harrah <dmha...@gmail.com> wrote:
I'm not especially inclined to resort to native hacks, but this almost seems like it would end up being more reliable.  I don't think I would personally put symlinks in a directory that gets cleaned if IO.delete uses the getCanonicalFile hack.  I think I'd put a comment in IO.delete explicitly warning against putting symlinks in and add the hack to try to prevent surprises as much as possible.  (Of course, that has its drawbacks as well, like relying on that behavior.)

It is completely unhinged that you would even consider documenting and leaving this behavior. What am I missing that has you guys still talking about detecting symlinks? Isn't the task in question recursively deleting a directory without blowing away misc other things which happen to be reachable via symlink? You have to go out of your way to do that!

Do you not see that this code is taking extra measures to be this dangerous? All you have to do is "file.delete" and the symlink will be deleted.  The symlink, and not everything the symlink points to.  By testing ".isDirectory" and then recursing on listFiles, you are singlehandedly creating this incredibly dangerous behavior.  Why do you feel you need to recognize symlinks?

/** Deletes `file`, recursively if it is a directory. */
def delete(file: File)
{
translate("Error deleting file " + file + ": ")
{
if(file.isDirectory)
{
delete(listFiles(file))
file.delete
}
else if(file.exists)
file.delete
}
}


Mark Harrah

unread,
May 11, 2012, 7:55:28 AM5/11/12
to simple-b...@googlegroups.com
On Thu, 10 May 2012 20:13:01 -0700
Paul Phillips <pa...@improving.org> wrote:

> On Thu, May 10, 2012 at 6:48 PM, Mark Harrah <dmha...@gmail.com> wrote:
>
> > I'm not especially inclined to resort to native hacks, but this almost
> > seems like it would end up being more reliable. I don't think I would
> > personally put symlinks in a directory that gets cleaned if IO.delete uses
> > the getCanonicalFile hack. I think I'd put a comment in IO.delete
> > explicitly warning against putting symlinks in and add the hack to try to
> > prevent surprises as much as possible. (Of course, that has its drawbacks
> > as well, like relying on that behavior.)
> >
>
> It is completely unhinged that you would even consider documenting and
> leaving this behavior. What am I missing that has you guys still talking
> about detecting symlinks? Isn't the task in question recursively deleting a
> directory without blowing away misc other things which happen to be
> reachable via symlink? You have to go out of your way to do that!
>
> Do you not see that this code is taking extra measures to be this
> dangerous? All you have to do is "file.delete" and the symlink will be
> deleted. The symlink, and not everything the symlink points to. By
> testing ".isDirectory" and then recursing on listFiles, you are
> singlehandedly creating this incredibly dangerous behavior. Why do you
> feel you need to recognize symlinks?

It isn't clear to me what you are describing as dangerous because the compiler's io.Path, which has you as the author, does the same isDirectory check and recurses.

If you are talking about making the code look like this:

def delete(file: File)
{
val deleted = file.delete()
if(!deleted && file.isDirectory)
{
delete(listFiles(file))
file.delete
}
}

then that appears to work as expected. I don't understand why commons-io does the symlink check instead of this way or if it is reliable.

-Mark

Paul Phillips

unread,
May 11, 2012, 9:45:11 AM5/11/12
to simple-b...@googlegroups.com


On Fri, May 11, 2012 at 4:55 AM, Mark Harrah <dmha...@gmail.com> wrote:
It isn't clear to me what you are describing as dangerous because the compiler's io.Path, which has you as the author, does the same isDirectory check and recurses.

I'm thinking we should assess the danger more based on what the code does than on whether I've written similar code.  If it does do that then it's a bug and I'll fix it.   Still, let's realize those are compiler internals and it never runs it on any directory it didn't create itself, so there isn't much opportunity to inject symlinks.  IO.delete() is in the sbt public API.

Eugene Vigdorchik

unread,
May 11, 2012, 10:09:49 AM5/11/12
to simple-b...@googlegroups.com
File.isDirectory appears to follow symlinks and returns true if the
pointed file is directory, so the code above recurses. Modifying it
to:

if(!deleted && file.isDirectory && file.getAbsolutePath ==
file.getCanonicalPath)
...

conservatively checks for symlinks, but is of course slower.

Eugene.
>
> -Mark
>
>> /** Deletes `file`, recursively if it is a directory. */
>> def delete(file: File)
>> {
>> translate("Error deleting file " + file + ": ")
>> {
>> if(file.isDirectory)
>> {
>> delete(listFiles(file))
>> file.delete
>> }
>> else if(file.exists)
>> file.delete
>> }
>> }
>>
>
> --
> You received this message because you are subscribed to the Google Groups "simple-build-tool" group.
> To post to this group, send email to simple-b...@googlegroups.com.
> To unsubscribe from this group, send email to simple-build-t...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/simple-build-tool?hl=en.
>

Paul Phillips

unread,
May 11, 2012, 10:15:10 AM5/11/12
to simple-b...@googlegroups.com


On Fri, May 11, 2012 at 7:09 AM, Eugene Vigdorchik <eugene.v...@gmail.com> wrote:
File.isDirectory appears to follow symlinks and returns true if the
pointed file is directory, so the code above recurses.

The symlink is already gone, so it doesn't recurse.

val deleted = file.delete()

Eugene Vigdorchik

unread,
May 11, 2012, 10:18:07 AM5/11/12
to simple-b...@googlegroups.com
True. The symlink is treated as a normal file when deleting. Thank you
for enlightening me!

Eugene.
>
> val deleted = file.delete()

Mark Harrah

unread,
May 11, 2012, 11:38:00 AM5/11/12
to simple-b...@googlegroups.com
On Fri, 11 May 2012 06:45:11 -0700
Paul Phillips <pa...@improving.org> wrote:

> On Fri, May 11, 2012 at 4:55 AM, Mark Harrah <dmha...@gmail.com> wrote:
>
> > It isn't clear to me what you are describing as dangerous because the
> > compiler's io.Path, which has you as the author, does the same isDirectory
> > check and recurses.
> >
>
> I'm thinking we should assess the danger more based on what the code does
> than on whether I've written similar code.

That was not the intention. When you say a way of doing something is incredibly dangerous and I look at your code to see how you have done it and it does the same thing, I say "I'm not clear what it is you are saying is dangerous" because I must be missing something. That is all.

> If it does do that then it's a
> bug and I'll fix it.

What you say about File.delete() appears to be empirically true. However, this behavior isn't documented in the delete() API and I have seen no code that does recursive deletion this way. It appears you either have an opportunity to enlighten the whole Java world or there is a reason it isn't used.

1. Guava removed deleteRecursively[1], claiming symlink detection is buggy and shelling out is the only correct way. They use the canonical file hack but don't properly implement it, so that isn't too surprising.

2. commons-io uses the canonical file hack[2], but this approach is claimed to be buggy[3] and doesn't work on windows.

3. ant does basically the same canonical file hack for the delete task[4,5]. See also all of the code for deleting a symbolic link in [5] that involves renaming it to make it a broken link first and then deletes it.

4. mvn: original bug[6]. The current implementation [7] uses the canonical file hack.

5. gradle uses commons-io [8]

6. the top X hits (I stopped after ten or so) on google for "java delete recursively" either don't account for symlinks, use Java 7, or use a variation of the canonical path hack.

No one uses your technique. You might wish to hide in your bomb shelter. Your way appears to be the right way to do it (it handles deleting symlinks to plain files as well), but you can understand if I would like to know why no one does it that way.

> Still, let's realize those are compiler internals
> and it never runs it on any directory it didn't create itself, so there
> isn't much opportunity to inject symlinks. IO.delete() is in the sbt
> public API.

It is in the public API of the compiler[9], which gets used outside of the compiler[10].

-Mark

[1] http://docs.guava-libraries.googlecode.com/git-history/v10.0.1/javadoc/com/google/common/io/Files.html#deleteRecursively%28java.io.File%29
[2] http://svn.apache.org/viewvc/commons/proper/io/trunk/src/java/org/apache/commons/io/FileUtils.java?view=markup
[3] http://stackoverflow.com/a/5993886
[4] http://svn.apache.org/viewvc/ant/core/trunk/src/main/org/apache/tools/ant/taskdefs/Delete.java?view=markup
[5] http://svn.apache.org/viewvc/ant/core/trunk/src/main/org/apache/tools/ant/util/SymbolicLinkUtils.java?view=markup
[6] http://jira.codehaus.org/browse/MCLEAN-10
[7] http://svn.apache.org/viewvc/maven/plugins/tags/maven-clean-plugin-2.4.1/src/main/java/org/apache/maven/plugin/clean/Cleaner.java?revision=942400&view=markup
[8] https://github.com/gradle/gradle/blob/master/subprojects/core/src/main/groovy/org/gradle/util/GFileUtils.java#L90
[9] http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/compiler/index.html#scala.tools.nsc.io.AbstractFile
[10] http://stackoverflow.com/questions/5102396/what-is-the-purpose-of-the-scala-tools-nsc-package

Paul Phillips

unread,
May 11, 2012, 11:54:44 AM5/11/12
to simple-b...@googlegroups.com
On Fri, May 11, 2012 at 8:38 AM, Mark Harrah <dmha...@gmail.com> wrote:
> That was not the intention.  When you say a way of doing something is incredibly dangerous and I look at your code to see how you have done it and it does the same thing, I say "I'm not clear what it is you are saying is dangerous" because I must be missing something.  That is all.

I'm totally capable of writing bad code, especially code hastily
written years ago. Like I said, if my code does that it's a bug and I
will fix it. I use strong language here not because of the bug or
because I am incapable of creating similar bugs but because you seem
to be including "intentionally leave things such that one can
accidentally delete their entire home directory" as a leading
contender for the resolution of this bug.

> No one uses your technique.  You might wish to hide in your bomb shelter.

I will assume the second sentence relates to the first in some fashion
which eludes me...

> Your way appears to be the right way to do it (it handles deleting symlinks to plain files as well), but you can understand if I would like to know why no one does it that way.

I am totally used to being the only person in the world who knows the
right way to do something. It's like a daily occurrence with me. I
can understand how it's weird and frightening for you though. No, in
seriousness, I too am very interested in why it's not regularly done
that way, because - among other reasons - I apparently have to fix the
same bug.

> It is in the public API of the compiler[9]

I dispute that this is a "public API" the way this is normally meant.
I don't consider "scaladoc is generated" to mean "this is API."
Certainly I've never endorsed any of it as API. If someone else is
doing that on my behalf then I am sorry you were misled. But we need
not quibble over these semantics, let's just find the right way to
delete things.

Olek Swirski

unread,
May 11, 2012, 12:08:13 PM5/11/12
to simple-b...@googlegroups.com
To add to the list, I noticed the same when removing some war or context
of a webapp in Jetty. Then not only the webapp folder gets deleted but
also all the files in a folder pointed to by a symlink. Do I understand
correctly, that under Java7 this should happen no more?

Mark Harrah

unread,
May 11, 2012, 1:05:14 PM5/11/12
to simple-b...@googlegroups.com
On Fri, 11 May 2012 08:54:44 -0700
Paul Phillips <pa...@improving.org> wrote:

> On Fri, May 11, 2012 at 8:38 AM, Mark Harrah <dmha...@gmail.com> wrote:
> > That was not the intention.  When you say a way of doing something is incredibly dangerous and I look at your code to see how you have done it and it does the same thing, I say "I'm not clear what it is you are saying is dangerous" because I must be missing something.  That is all.
>
> I'm totally capable of writing bad code, especially code hastily
> written years ago. Like I said, if my code does that it's a bug and I
> will fix it. I use strong language here not because of the bug or
> because I am incapable of creating similar bugs but because you seem
> to be including "intentionally leave things such that one can
> accidentally delete their entire home directory" as a leading
> contender for the resolution of this bug.

The proposed fix before I understood your proposed fix was to pick one of Jason's proposed fixes. I suggested still adding a warning to the IO.delete API about handling symlinks being a best effort because at least the first (canonical path hack) looked unreliable. It wasn't suggested that only the warning would suffice.

> > Your way appears to be the right way to do it (it handles deleting symlinks to plain files as well), but you can understand if I would like to know why no one does it that way.
>
> I am totally used to being the only person in the world who knows the
> right way to do something. It's like a daily occurrence with me. I
> can understand how it's weird and frightening for you though. No, in
> seriousness, I too am very interested in why it's not regularly done
> that way, because - among other reasons - I apparently have to fix the
> same bug.

I guess the best way is to go comment on some of these bugs. I'll start with the commons-io bug since that appears to be open and active.

-Mark

Eugene Vigdorchik

unread,
May 11, 2012, 2:01:27 PM5/11/12
to simple-b...@googlegroups.com
> --
> You received this message because you are subscribed to the Google Groups "simple-build-tool" group.
> To post to this group, send email to simple-b...@googlegroups.com.
> To unsubscribe from this group, send email to simple-build-t...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/simple-build-tool?hl=en.
>

From what I understand, Java doesn't have any special treatment for
symlinks and just delegates to the underlying OS.
The POSIX standard demands that 'unlink' method that is triggered as a
result of File.delete() deletes the symlink itself and doesn't touch
the linked contents.
For documentation see 'man 2 unlink'

Eugene.

Mark Harrah

unread,
May 11, 2012, 9:45:31 PM5/11/12
to simple-b...@googlegroups.com
It makes sense that delete() is basically unlink(), but is this documented somewhere? Regardless, I think we'll go with this.

-Mark

> Eugene.
>

Mark Harrah

unread,
May 11, 2012, 9:47:12 PM5/11/12
to simple-b...@googlegroups.com
On Fri, 11 May 2012 18:08:13 +0200
Olek Swirski <oleks...@gmail.com> wrote:

> To add to the list, I noticed the same when removing some war or context
> of a webapp in Jetty. Then not only the webapp folder gets deleted but
> also all the files in a folder pointed to by a symlink. Do I understand
> correctly, that under Java7 this should happen no more?

I'm not sure if you are asking about Jetty only or an sbt plugin with Jetty, but Java 7 allows one to check if a file is a symlink. It does not automatically fix broken directory deletion code. Paul's suggested fix works on version before Java 7 and is probably the right way to do it even on Java 7.

-Mark

Olek Swirski

unread,
May 11, 2012, 10:15:55 PM5/11/12
to simple-b...@googlegroups.com
Hi Mark,

this has happened with standalone Jetty running on Java 6, and in
jetty-dir/etc/webdefault.xml I set up:
<init-param>
<param-name>aliases</param-name>
<param-value>true</param-value>
</init-param>
specifically to have some external resources pointed to only by the
symlink, to avoid them being deleted on webapp reload or removal - dirs
containing uploaded images to be precise. Sadly this didn't work as
expected and anything under symlink still got removed (I was careful so
didn't loose data). I imagine same thing happens when using jetty plugin
in sbt, but probably not many people use aliases with jetty sbt plugin,
because it is usually a short-lived instance mostly for testing.
Reply all
Reply to author
Forward
0 new messages