os.RemoveAll (path.go): open file handles while in deep tree


Steve Powell

May 15, 2014, 12:41:43 PM
to golan...@googlegroups.com
While looking for a bug in a file/directory copy function we wrote, we noticed the implementation of os.RemoveAll

func RemoveAll(path string) error {
	...
	// Directory.
	fd, err := Open(path)
	if err != nil {
		return err
	}

	// Remove contents & return first error.
	err = nil
	for {
		names, err1 := fd.Readdirnames(100)
		for _, name := range names {
			err1 := RemoveAll(path + string(PathSeparator) + name)
			if err == nil {
				err = err1
			}
		}
		if err1 == io.EOF {
			break
		}
		// If Readdirnames returned an error, use it.
		if err == nil {
			err = err1
		}
		if len(names) == 0 {
			break
		}
	}

	// Close directory, because windows won't remove opened directory.
	fd.Close()

	// Remove directory.
	err1 := Remove(path)
	if err == nil {
		err = err1
	}
	return err
}

(my highlighting). We can see that the directory handle fd is not closed until after we have recursed to the bottom of the directory tree and back.
We think this could keep many directory handles open for a long time and, if there is a panic, potentially leak file handles until the end of the process.

We suggest a better implementation would be something like:
	// Directory.
	// Remove contents & return first error.
	err = nil
	for {
		names, err1 := readNamesInDir(path, 100)
		if err1 != nil {
			if err == nil {
				return err1
			}
			return err
		}
		if len(names) == 0 {
			break
		}
		for _, name := range names {
			err1 := RemoveAll(path + string(PathSeparator) + name)
			if err == nil {
				err = err1
			}
		}
	}
... 
where 
func readNamesInDir(path string, limit int) ([]string, error) {
	fd, err := Open(path)
	if err != nil {
		return nil, err
	}
	defer fd.Close()

	names, err := fd.Readdirnames(limit)
	if err == io.EOF {
		return nil, nil
	}
	return names, err
}

This reads up to 100 names at a time and deals with them, without retaining an open directory handle across the recursive calls.

Any comments? Can we make this simpler and cleaner (the error handling is a bit naff)?

Brad Fitzpatrick

May 15, 2014, 2:45:09 PM
to Steve Powell, golang-nuts
If there's actually a panic (I don't see how), the garbage collector will close any unclosed files/dirs.

This only keeps one filehandle open per level of depth in the directory hierarchy. I think it's fine. It's possible we could close it more aggressively once we reach the end (a common case? fewer than 100 entries in a dir?) but it doesn't seem urgent.

The problem with slurping all the names is that a directory might have millions of files, and then we'd need O(n) memory. The current behavior uses filehandles proportional to the depth of the tree, not to the number of items in a directory. I think big directories are more common than deep directories.

Also filehandles are pretty cheap.





andrey mirtchovski

May 16, 2014, 2:28:35 AM
to Steve Powell, golang-nuts
there are other issues with a recursive walker function, at least on
OSX where the max path length is 1024 bytes but one is able to create
arbitrarily deep directory structures. some tools know how to handle
this (find), others do not (du).

i just pushed an iterative walker that i had done a couple of years
ago for a project. it doesn't use recursion and is capable of finding
even the deepest buried files on OSX.

go get github.com/mirtchovski/walk (a tester binary is in
github.com/mirtchovski/walk/walk)

here's a comparison between filepath.Walk and the iterative one. each
"A{+}" is the letter A repeated 255 times:

% x=`perl -e "{print 'A'x255;}"`
% mkdir $x && cd $x # repeat several times
...
% filepathwalk /Users/me/AA*
/Users/me/A{+}
/Users/me/A{+}/A{+}
/Users/me/A{+}/A{+}/A{+}
/Users/me/A{+}/A{+}/A{+}/A{+}: lstat /Users/me/A{+}/A{+}/A{+}/A{+}:
file name too long
% walk /Users/me/A{+}*
/Users/me/A{+}
/Users/me/A{+}/A{+}
/Users/me/A{+}/A{+}/A{+}
/Users/me/A{+}/A{+}/A{+}/A{+}
/Users/me/A{+}/A{+}/A{+}/A{+}/A{+}
/Users/me/A{+}/A{+}/A{+}/A{+}/A{+}/main.go
%

I don't think this code is worthy of replacing filepath.Walk (after
all nobody complained about this issue in 4 years!) but it's good to
have it for special cases.

andrey mirtchovski

May 16, 2014, 2:33:15 AM
to Steve Powell, golang-nuts
err, a word of warning: rm -rf also doesn't know how to descend all
the way to the bottom so you'll have to delete those directories with
find (Finder is hopeless too)
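A sketch of the find approach (on a throwaway tree, since the paths in the thread are examples): -depth makes find process each directory's contents before the directory itself, so it unwinds a deep tree from the bottom up.

```shell
# Demo: -depth deletes children before their parent directory,
# so find never needs rm's full-path handling of the deepest level.
tmp=$(mktemp -d)
mkdir -p "$tmp/deep/deeper/deepest"
find "$tmp/deep" -depth -delete
ls "$tmp"   # empty: the whole subtree is gone
rmdir "$tmp"
```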

Nick Craig-Wood

May 16, 2014, 3:28:13 AM
to andrey mirtchovski, Steve Powell, golang-nuts
On 16/05/14 07:28, andrey mirtchovski wrote:
> there are other issues with a recursive walker function, at least on
> OSX where the max path length is 1024 bytes but one is able to create
> arbitrarily deep directory structures. some tools know how to handle
> this (find), others do not (du).
>
> i just pushed an iterative walker that i had done a couple of years
> ago for a project. it doesn't use recursion and is capable of finding
> even the deepest buried files on OSX.
>
> go get github.com/mirtchovski/walk (a tester binary is in
> github.com/mirtchovski/walk/walk)

Interesting. I remember having this problem on Linux 2.0 (or maybe 2.2, I don't remember).

Calling Chdir in a threaded program (which Go programs ultimately are) is a recipe for disaster, though, since there is only one CWD for the whole Go program.

You could probably avoid this by using syscall.Openat() and, rather than Chdir into each directory, just keeping its file handle open while you descend the tree.

Actually having said all that, I'm not sure OS X supports openat :-(

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Jharrod LaFon

May 16, 2014, 9:05:45 AM
to golan...@googlegroups.com, andrey mirtchovski, Steve Powell
I ended up tackling the problem of deleting/copying/moving very large directories (that could also be deep) on large file systems.
We ended up writing a system to do this using a small compute cluster, with the core functionality being in a C library (http://hpc.github.io/libcircle/). 
The program calls a callback function once for each item in the tree, much like the ftw library call, except in parallel using OpenMPI.

I was excited to find that someone has since made a Go interface to the original library: https://github.com/losalamos/circle
I haven't tried it out, and it's definitely overkill for a single disk, but I thought someone taking the time to read this might find it interesting.

Steve Powell

Jun 20, 2014, 5:23:49 AM
to golan...@googlegroups.com, spo...@gopivotal.com
Some comments inline.


On Thursday, 15 May 2014 19:45:09 UTC+1, bradfitz wrote:
If there's actually a panic (I don't see how), the garbage collector will close any unclosed files/dirs.

The garbage collector performing 'closes' on file handles is rather naff, IMO. The Java community long ago deprecated the 'finalizer' concept (which this sounds like a special case of), as it is unreliable, unpredictable and unenforceable. It also executes in an unpredictable context.  [Incidentally, how would I tell Go *not* to do this -- I could pass a f-h to a C program, for example, and then lose the reference to the handle in the Go code. Ugh.]

Also filehandles are pretty cheap.
They are a limited system-wide resource. We need to use them sparingly.

I'll try to derive another solution to this problem that might be better and uses as few file handles as possible. I don't think it is hard.

Ian Lance Taylor

Jun 20, 2014, 9:16:26 AM
to Steve Powell, golang-nuts
On Fri, Jun 20, 2014 at 2:23 AM, Steve Powell <spo...@gopivotal.com> wrote:
>
> The garbage collector performing 'closes' on file handles is rather naff,
> IMO. The Java community long ago deprecated the 'finalizer' concept (which
> this sounds like a special case of), as it is unreliable, unpredictable and
> unenforceable. It also executes in an unpredictable context.

Unreferenced os.File objects are indeed closed using finalizers (see
runtime.SetFinalizer). They have the drawbacks you describe, except
for the context one: the context is always a goroutine that exists
only to run finalizers.

> [Incidentally,
> how would I tell Go *not* to do this -- I could pass a f-h to a C program,
> for example, and then lose the reference to the handle in the Go code. Ugh.]

You can safely pass a file descriptor to C, but if you pass any
pointer at all to C then you must be sure to keep a copy of the
pointer on the Go side. Otherwise the value may be collected by the
Go garbage collector at any time.

Ian