filesystem walk

649 views
Skip to first unread message

Keith Rarick

unread,
Sep 18, 2013, 9:55:25 PM9/18/13
to golang-nuts
I've never been terribly thrilled with the callback interface
presented by filepath.Walk. And I really like the iteration
pattern in sql.Rows and bufio.Scanner. So here's a
reworking of filepath.Walk:

http://godoc.org/github.com/kr/fs#Walker

Martin Angers

unread,
Sep 19, 2013, 8:34:28 AM9/19/13
to golan...@googlegroups.com
I love the API. Maybe make lexical order an option?

roger peppe

unread,
Sep 19, 2013, 10:13:21 AM9/19/13
to Keith Rarick, golang-nuts
Very nice. And a neat implementation too.

One thought: a few times I've wanted to truncate the walk at the
current directory
*after* I've seen the entry for the current directory;
usually when searching for directories with particular contents.
I'm wondering if it might be nice if SkipDir was renamed to Skip
(a no-op if the current file is not a directory) and SkipDir was
changed to mean "skip past all other files in the current directory".

Something I'm not sure about is that if you've got a directory
that you can't read, you see the name twice in a row.
I think I'd prefer it if it always tried to read the directory, even
if ends up being skipped, to preserve the nice invariant that
you see each name at most once.

Another thing that would be cool, and not too hard,
would be a breadth-first traversal with the same API.

One final thing: in these kinds of iterators, I've often seen
"Next" used instead of "Step". Worth considering, perhaps.
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Keith Rarick

unread,
Sep 19, 2013, 4:25:03 PM9/19/13
to roger peppe, golang-nuts
On Thu, Sep 19, 2013 at 7:13 AM, roger peppe <rogp...@gmail.com> wrote:
> One thought: a few times I've wanted to truncate the walk at the
> current directory
> *after* I've seen the entry for the current directory;
> usually when searching for directories with particular contents.
> I'm wondering if it might be nice if SkipDir was renamed to Skip
> (a no-op if the current file is not a directory) and SkipDir was
> changed to mean "skip past all other files in the current directory".

Ahhh interesting. I haven't run across this need myself,
but it's the sort of thing that seems annoyingly awkward
to do for the user and pretty straightforward to implement
inside the walker. Not sure about the names. I'd sort of
be inclined to keep the current method and add SkipRest.

> Something I'm not sure about is that if you've got a directory
> that you can't read, you see the name twice in a row.
> I think I'd prefer it if it always tried to read the directory, even
> if ends up being skipped, to preserve the nice invariant that
> you see each name at most once.

Yeah, filepath.Walk does the same thing. It struck me as
a little weird, but I tried to stick to its behavior as closely
as possible.

I don't feel strongly about it. Do others have an opinion?
Is this worth changing?

> Another thing that would be cool, and not too hard,
> would be a breadth-first traversal with the same API.

Sure, but maybe I'll leave BreadthFirstWalker as an
exercise for the reader. :)

> One final thing: in these kinds of iterators, I've often seen
> "Next" used instead of "Step". Worth considering, perhaps.

True, and I vacillated between those names for a while.
Figured Step was more perambulatorily thematic.

roger peppe

unread,
Sep 19, 2013, 4:47:54 PM9/19/13
to Keith Rarick, golang-nuts
On 19 September 2013 21:25, Keith Rarick <k...@xph.us> wrote:
> On Thu, Sep 19, 2013 at 7:13 AM, roger peppe <rogp...@gmail.com> wrote:
>> One thought: a few times I've wanted to truncate the walk at the
>> current directory
>> *after* I've seen the entry for the current directory;
>> usually when searching for directories with particular contents.
>> I'm wondering if it might be nice if SkipDir was renamed to Skip
>> (a no-op if the current file is not a directory) and SkipDir was
>> changed to mean "skip past all other files in the current directory".
>
> Ahhh interesting. I haven't run across this need myself,
> but it's the sort of thing that seems annoyingly awkward
> to do for the user and pretty straightforward to implement
> inside the walker. Not sure about the names. I'd sort of
> be inclined to keep the current method and add SkipRest.
>
>> Something I'm not sure about is that if you've got a directory
>> that you can't read, you see the name twice in a row.
>> I think I'd prefer it if it always tried to read the directory, even
>> if ends up being skipped, to preserve the nice invariant that
>> you see each name at most once.
>
> Yeah, filepath.Walk does the same thing. It struck me as
> a little weird, but I tried to stick to its behavior as closely
> as possible.
>
> I don't feel strongly about it. Do others have an opinion?
> Is this worth changing?

It does seem weird to me and I'd like to see a more obvious behaviour,
but that does have its down-sides (it uses more resources if the
client is going to skip a directory).

It would be nice if:

w := fs.Walk(dir)
var names []string
for w.Step() {
names = append(names, w.Path())
}

was guaranteed to produce a list of all stat-able names
in the directory. Quite often you don't really care about
names in inaccessible directories, and it feels like the current
behaviour could give some quite surprising results in
obscure situations.

Personally I've never relied on that particular filepath
behaviour, but others' mileage may vary.


One other thing that feels slightly awkward to me is
that Stat may return nil if the root directory does not
exist. This is underdocumented and leads to slightly
more awkward code to make things right.

At the cost of slightly more client code, I wonder about:

// Walk returns a new Walker rooted at root.
// It returns an error if the given root is not a directory.
func Walk(root string) (*Walker, error)

Then you can write code that always assumes a valid
os.FileInfo inside the loop, which is worth a fair amount.

>> One final thing: in these kinds of iterators, I've often seen
>> "Next" used instead of "Step". Worth considering, perhaps.
>
> True, and I vacillated between those names for a while.
> Figured Step was more perambulatorily thematic.

"Next" works well to imply the "next step" too, I think.
It would be nice not to invent too many names here,
and it's actually potentially usable as an interface,
though I can't currently think of a good use case.

It's a pity that Scanner chose Scan, not Next IMO.

Nigel Tao

unread,
Sep 19, 2013, 9:14:34 PM9/19/13
to Keith Rarick, golang-nuts
On Thu, Sep 19, 2013 at 11:55 AM, Keith Rarick <k...@xph.us> wrote:
> I've never been terribly thrilled with the callback interface
> presented by filepath.Walk. And I really like the iteration
> pattern in sql.Rows and bufio.Scanner.

Nice work.

I don't know if the inconsistency is unavoidable, but sql.Rows and
bufio.Scanner's users check for error after the for loop, not inside
the for loop like your fs.Walker:

rows, err := db.Query(etc)
if err != nil { etc }
for rows.Next() {
etc
}
if err := rows.Err(); err != nil { etc }

scanner := bufio.NewScanner(r)
for scanner.Scan() {
etc
}
if err := scanner.Err(); err != nil { etc }

walker := fs.Walk("/usr/lib")
for walker.Step() {
if err := walker.Err(); err != nil { etc }
}

Nigel Tao

unread,
Sep 19, 2013, 9:17:59 PM9/19/13
to roger peppe, Keith Rarick, golang-nuts
On Fri, Sep 20, 2013 at 6:47 AM, roger peppe <rogp...@gmail.com> wrote:
> It would be nice if:
>
> w := fs.Walk(dir)
> var names []string
> for w.Step() {
> names = append(names, w.Path())
> }
>
> was guaranteed to produce a list of all stat-able names
> in the directory.

I don't think you can make that guarantee if another process can
modify the file system concurrently.

Keith Rarick

unread,
Sep 19, 2013, 9:17:37 PM9/19/13
to Nigel Tao, golang-nuts
On Thu, Sep 19, 2013 at 6:14 PM, Nigel Tao <nige...@golang.org> wrote:
> Nice work.

Thanks!

> I don't know if the inconsistency is unavoidable, but sql.Rows and
> bufio.Scanner's users check for error after the for loop, not inside
> the for loop like your fs.Walker:

I think this inconsistency reflects a real difference:
sql.Rows and bufio.Scanner both stop when they
encounter an error; fs.Walker keeps going.

roger peppe

unread,
Sep 20, 2013, 2:54:43 AM9/20/13
to Nigel Tao, Keith Rarick, golang-nuts
Indeed not. But even with an unchanging file system, the
fact that an inaccessible directory will cause a duplicate
name in the above slice is a bit surprising, I think.

Keith Rarick

unread,
Sep 20, 2013, 2:51:54 PM9/20/13
to roger peppe, Nigel Tao, golang-nuts
On Thu, Sep 19, 2013 at 11:54 PM, roger peppe <rogp...@gmail.com> wrote:
> Indeed not. But even with an unchanging file system, the
> fact that an inaccessible directory will cause a duplicate
> name in the above slice is a bit surprising, I think.

I agree it's surprising, but I'm having a hard time thinking
of a way to avoid it nicely. The SkipDir option is there, in
part, to avoid calling ReadDir on large directories.

roger peppe

unread,
Sep 20, 2013, 3:12:08 PM9/20/13
to Keith Rarick, Nigel Tao, golang-nuts
The implementation would be a little more complex, but
it shouldn't be too hard to open the directory first,
and only read it after we know it isn't been skipped.

I think you'd only ever need to keep a single fd open at
a time, because we always walk to a directory's contents immediately
after processing the directory itself.

Nathan Youngman

unread,
Sep 22, 2013, 11:57:30 PM9/22/13
to golan...@googlegroups.com
Hi Keith,

I noticed that you called your package fs. Do you have plans to add other high level constructs to it.

While writing Go code I am often surprised at what I end up writing myself:

* Subdirectories() using Walker as you have here
* CopyFile() to copy files from disk to disk
* IsHidden() to check if a file is hidden using os+syscall on Windows (and strings.HasPrefix on Linux, etc.)
* FileExist() as a wrapper around Stat, which I used for determining if a Volume/folder was available

The standard library is one level lower than I'd like sometimes.

Nathan.

Keith Rarick

unread,
Sep 23, 2013, 6:11:59 PM9/23/13
to roger peppe, Nigel Tao, golang-nuts
On Fri, Sep 20, 2013 at 12:12 PM, roger peppe <rogp...@gmail.com> wrote:
> The implementation would be a little more complex, but
> it shouldn't be too hard to open the directory first,
> and only read it after we know it isn't been skipped.

But os.File.Readdir can still return an error, which we
wouldn't get until after the user has seen the directory
once already.

(Plus, keeping a file open between calls to Step would
lose the nice property that the user can just throw away
the walker if they want to break out of the traversal early.
We'd need to add a Close method or something.)

Keith Rarick

unread,
Sep 23, 2013, 6:13:24 PM9/23/13
to Nathan Youngman, golang-nuts
On Sun, Sep 22, 2013 at 8:57 PM, Nathan Youngman <junk...@nathany.com> wrote:
> I noticed that you called your package fs. Do you have plans to add other
> high level constructs to it.

No particular plans. It just seemed like a nice name.
I'd be happy to look at any patches you want to send!
Reply all
Reply to author
Forward
0 new messages