Vim file globbing....

Eric Tetz

Feb 14, 2008, 3:59:21 PM
to Vim list
Open files matching wildcard

I want to open all files matching a pattern in a given directory (optionally recursively). I tried :e <pattern> but it complains 'too many filenames'.

I found that :arg <pattern> works, but it's shockingly slow. Further investigation suggests that its running time grows non-linearly with the size of the tree.

Searching a small directory tree (already cached in memory) with 'dir /s /b build.config' takes ~300 ms on my machine, while ':arg **/build.config' takes ~12 seconds, a 40x difference. On a larger tree, dir takes 2 seconds, so I would expect Vim to take ~80 seconds; instead it takes over 20 minutes, a 600x difference.
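
To separate the cost of the wildcard expansion from the cost of building the argument list, the two can be timed independently. The following is only a sketch: it assumes a Vim built with the +reltime feature, and g:t0 is just a scratch variable name chosen for illustration.

```vim
" Time the wildcard expansion alone; the result of glob() is discarded.
let g:t0 = reltime()
call glob('**/build.config')
echo 'glob(): ' . reltimestr(reltime(g:t0)) . ' s'

" Time the full :args command, which also fills in the argument list.
let g:t0 = reltime()
args **/build.config
echo ':args:  ' . reltimestr(reltime(g:t0)) . ' s'
```

If the two numbers are similar, the time is going into the expansion itself rather than into populating the argument list.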

This is a monster machine (2x Xeon E5320), where practically everything I do seems to happen before I type it, so it's depressing to see my favorite file manipulation tool perform so poorly on a seemingly basic task. Am I using the wrong command or something?

Cheers,
Eric

Tony Mechelynck

Feb 14, 2008, 4:24:37 PM
to vim...@googlegroups.com, Vim list

":arg **/build.config" means "build.config in the current directory and any
directory under it to any depth". Is that what you want? (I'm not familiar
with the /s and /b switches to "dir", and I'm not on Windows anymore, thank
Linus Torvalds.) I seriously doubt _the whole tree_ would already be "cached
in memory".

":e <pattern>", or rather ":e <filename with wildcards>" doesn't work because
expanding the wildcards produces several filenames, and the ":edit" command
admits only one at most.

What exactly do you want to do? If you want to search all those build.config
files anywhere in that tree for a specific string, let's say for the sake of
argument "vim" as a word (i.e., not gvim or vimmers), you can use the
":vimgrep" command:

:map <F2> :cnext<CR>
:map <S-F2> :cprev<CR>
:map <F3> :cnfile<CR>
:map <S-F3> :cpfile<CR>
:map <F4> :cfirst<CR>
:map <S-F4> :clast<CR>

:vimgrep /\<vim\>/g ./**/build.config

(The mappings are only there to help you navigate the list of matches; they
are not really essential.)

The first argument to ":vimgrep" is a "vim pattern" (a regular expression)
optionally with g to match any number of times in a line and/or with j to
prevent jumping to the first match after searching the files. The argument(s)
after the first one is/are one or more space-separated filenames, possibly
with wildcards in them.
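
Put together, a search that collects every match without jumping away from the current buffer might look like the sketch below. The pattern and file name are simply the ones from the example above; :copen is the standard command for opening the quickfix window.

```vim
" g: record every match on a line; j: do not jump to the first match
:vimgrep /\<vim\>/gj ./**/build.config
" Show the collected matches in the quickfix window
:copen
```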

Of course, the ":vimgrep" command does take some time, because it searches the
full text of all the files you gave it; but if you want to search a large set
of files for a given text (or regular expression) it is remarkably fast -- and
predictable (you don't have to learn another "kind" of regular expressions
than those already built-in to Vim).


Best regards,
Tony.
--
Hail to the sun god
He sure is a fun god
Ra! Ra! Ra!

Benjamin Fritz

Feb 14, 2008, 4:35:21 PM
to vim...@googlegroups.com

There's a clever tip for this:

http://vim.wikia.com/wiki/Load_multiple_files_with_a_single_command

I'm not sure how fast it works, but it's another thing to try!

A.Politz

Feb 14, 2008, 5:03:20 PM
to vim...@googlegroups.com
Eric Tetz wrote:

>
>Searching a small directory (already cached in memory) dir /s /b
>build.config takes ~300ms on my machine, while :arg **/build.config takes
>~12 seconds, a 40x difference. Searching a larger directory which takes dir
>2 seconds (so I would expect Vim to take ~80 seconds), Vim takes over 20
>minutes, a 600x difference.
>
>
>
>

What dimensions are we talking about? How many files, and how deep is
the file tree? How many targets (build.config) does it contain?
Anyway, it sounds like something is wrong here, given the huge
difference between the shell command and Vim's command. Maybe using
:argadd has some unforeseen side effects.

-ap

--
:wq

Eric Tetz

Feb 14, 2008, 5:37:01 PM
to vim...@googlegroups.com
On Thu, Feb 14, 2008 at 1:24 PM, Tony Mechelynck
<antoine.m...@gmail.com> wrote:
> ":arg **/build.config" means "build.config in the current directory and any
> directory under it to any depth". Is that what you want?

Yes.

> (I'm not familiar with the /s and /b switches to "dir", and I'm not on
> Windows anymore, thank Linus Torvalds.)

dir /s is roughly equivalent to ls -R

> I seriously doubt _the whole tree_ would already be "cached
> in memory".

Of course it is. It's caching the FAT (tiny), not file contents.

There's a dramatic difference in speed before and after the FAT is
cached. The numbers I've shown are for fully cached directory trees.

> What exactly do you want to do?

I want to open files whose names match a wildcard pattern (e.g. *.vim,
UI_*.c, etc.).

Cheers,
Eric

Eric Tetz

Feb 14, 2008, 6:17:04 PM
to vim...@googlegroups.com
On Thu, Feb 14, 2008 at 2:03 PM, A.Politz <pol...@fh-trier.de> wrote:
> What dimensions are we talking about ? How many files, how deep is the
> filetree ? How many targets does it contain (build.config) ? Anyway, it
> sounds that something is wrong here, looking at the huge differences of the
> shell command versus vims command. Maybe using
> :argadd has some unforeseen sideeffects.

In the worst case, ~10 directories deep, < 200K files, with ~80
'build.config' files among them.

But that's not really relevant, is it? If I pipe 'dir /s /b
build.config' into a buffer it takes 2 seconds, if I concatenate those
lines and pass them directly to :args, the buffer list is populated
instantly. If I dump the entire directory tree into a buffer and grep
it, it's instant. In other words, there's no particular part of this
operation (getting list of file names, pattern matching on their
names, populating the buffer list, etc.) that should be taking any
time at all.

What stands out to me is that the time difference between processing a
small set of files and a large set of files is not linear, suggesting
a problem in an algorithm somewhere.
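
The workaround described above (let dir do the recursive search, then hand the names to :args) could be wrapped up roughly as follows. This is only a sketch under stated assumptions: it assumes Windows with cmd.exe as 'shell' and a Vim that provides fnameescape(), and ArgsFromDir is a made-up name, not something that ships with Vim.

```vim
" Hypothetical helper: fill the argument list from the output of dir.
function! ArgsFromDir(pattern)
  " dir /s /b prints one full path per line for each match in the tree
  let l:output = system('dir /s /b ' . shellescape(a:pattern))
  if v:shell_error
    echoerr 'dir failed or found nothing for: ' . a:pattern
    return
  endif
  " One list item per line; empty lines are dropped by split()
  let l:files = split(l:output, "\n")
  " Escape spaces and other special characters for use in an Ex command
  call map(l:files, 'fnameescape(v:val)')
  execute 'args ' . join(l:files, ' ')
endfunction
```

It would be called as :call ArgsFromDir('build.config') from the top of the tree to search. The search runs in Vim's current directory, and a very long list of matches may run into command-length limits.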

Cheers,
Eric

Tony Mechelynck

Feb 14, 2008, 6:22:47 PM
to vim...@googlegroups.com
Eric Tetz wrote:
> On Thu, Feb 14, 2008 at 1:24 PM, Tony Mechelynck
> <antoine.m...@gmail.com> wrote:
>> ":arg **/build.config" means "build.config in the current directory and any
>> directory under it to any depth". Is that what you want?
>
> Yes.
>
>> (I'm not familiar with the /s and /b switches to "dir", and I'm not on
>> Windows anymore, thank Linus Torvalds.)
>
> dir /s is roughly equivalent to ls -R
>
>> I seriously doubt _the whole tree_ would already be "cached
>> in memory".
>
> Of course it is. It's caching the FAT (tiny), not file contents.
>
> There's a dramatic difference in speed before and after the FAT is
> cached. The numbers I've shown are for fully cached directory trees.

In order to find files whose names match a wildcard pattern, with recursion,
more than the FAT is necessary: the whole directory tree must be
searched, starting at the current directory and going down to any depth.

Basically, the FAT is only a sequence of numbers, saying, for each data
cluster (each fixed-size distinctly-allocatable area) on the disk, one of the
following:

- This cluster is free
- This cluster is the last one of its chain (contains the end-of-file or
equivalent)
- This cluster is followed by cluster number so-and-so.

The directories (except the root directory on some FAT filesystems) can be
anywhere on the disk and can even be fragmented, just like files. In fact,
they can be regarded as a special kind of "files". They, not the FAT, contain
the name, the LFN (if any), the size, the attributes (including isdirectory,
readonly, system, hidden, archive, ...) and the starting location of the files
(and subdirectories). These directories must in this case be searched for (a)
subdirectories of any name (for recursion), and (b) files matching the given
name (for matching).

If you meant all directories anywhere on the disk had already been cached in
memory, well, I would say it is remotely possible, but I still seriously doubt
it. If you just meant that the FAT itself had been cached, I'd say it probably
had, but that isn't enough to find the requested files (with full path and
LFN, as required for Vim's "buffer names").

>
>> What exactly do you want to do?
>
> I want to open files whose names match a wildcard pattern (i.e. *.vim,
> UI_*.c, etc.).
>
> Cheers,
> Eric

Yeah, sure, but you want to open them for some purpose, don't you? Depending
on what you want to open them for, there might (or might not) be some other,
more advantageous command for that.


Best regards,
Tony.
--
Nature is by and large to be found out of doors, a location where, it
cannot be argued, there are never enough comfortable chairs.
-- Fran Lebowitz

Eric Tetz

Feb 14, 2008, 7:20:37 PM
to vim...@googlegroups.com
On Thu, Feb 14, 2008 at 3:22 PM, Tony Mechelynck
<antoine.m...@gmail.com> wrote:
> In order to find files whose names verify some name with wildcards
> and recursion, more than the FAT is necessary [snip OT FAT lesson]

And? My point was merely that I'm not doing a 'cold search' of the
tree. The entire tree has been scanned prior to running these tests.
Any relevant information that can be cached has been cached.
Subsequent searches take a small fraction of the time the 'cold search'
took.

The difference of 2 seconds vs 20 minutes has nothing whatsoever to do
with the hard disk.

> Yeah, sure, but you want to open them for some purpose, don't you?

I want to edit their text, which is why I'm opening them in a text
editor, right?

If I just wanted to search their contents, I would use grep.

If I just wanted to search *for* them, I would use dir (ls, find, etc.)

Cheers,
Eric

A.Politz

Feb 14, 2008, 9:28:48 PM
to vim...@googlegroups.com
Eric Tetz wrote:

>
>What stands out to me is that the time difference between processing a
>small set of files and a large set of files is not linear, suggesting
>a problem in an algorithm somewhere.
>
>
>
>

You could check whether Vim is swapping?

-ap


--
:wq

Eric Tetz

Feb 14, 2008, 11:11:40 PM
to vim...@googlegroups.com
On Thu, Feb 14, 2008 at 6:28 PM, A.Politz <pol...@fh-trier.de> wrote:
> You could check if vim is swapping ?

It's not using much RAM at all. Just lots of CPU time.

I guess if no easy answers are forthcoming from the resident experts,
I'll download the source tonight and take a look.

Cheers,
Eric

Charles E Campbell Jr

Feb 26, 2008, 9:59:56 AM
to vim...@googlegroups.com
Using netrw's Explore command (**/*.c), which is a recursive search
for filenames matching *.c (the directories hold 3266 files with 460
matches), on a machine that is not a "monster machine", the matches
were found in less than one second. Of course, netrw is using vimgrep
for this (done on a Linux box).

Just thought I'd throw in a datapoint on this; plus, perhaps autocmds
are involved. Try your file opening with :set ei=all ; you won't get
syntax highlighting, but it'll be a clue.
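
A minimal sketch of that test, which restores the old setting afterwards so the rest of the session is unaffected (g:saved_ei is just an illustrative variable name, and the file name is the one from earlier in the thread):

```vim
" Remember the current setting so it can be restored afterwards
let g:saved_ei = &eventignore
" Ignore all autocommand events (no filetype detection, no syntax setup)
set eventignore=all
args **/build.config
" Put the original value back
let &eventignore = g:saved_ei
```

If the command is just as slow with all events ignored, autocommands are probably not the cause.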

Regards,
Chip Campbell

Eric Tetz

Feb 26, 2008, 7:10:56 PM
to vim...@googlegroups.com
On Tue, Feb 26, 2008 at 6:59 AM, Charles E Campbell Jr
<drc...@campbellfamily.biz> wrote:
> In using netrw's Explore command (**/*.c), which is a recursive search
> for filenames matching *.c, (directories have 3266 files with 460
> matches), one a machine that is not a "monster machine", the matches
> were found in less than one second. Of course, netrw is using vimgrep
> for this (done on a Linux box).

Interesting. My guess is that there's something seriously wrong with
the DOS/Win32 version of the file-globbing routine. I took a look at it,
but I've never looked at the Vim source before and it made my head
hurt, so I went back to work. ^_^

> Just thought I'd throw in a datapoint on this; plus, perhaps autocmds
> are involved. Try your file opening with :set ei=all ; you won't get
> syntax highlighting, but it'll be a clue.

I tried that. No appreciable difference.

Thanks for the info, though.

Cheers,
Eric
