surprising glob() result on Windows


Mike

Apr 28, 2023, 9:32:55 PM
to vim...@googlegroups.com
Briefly, I have a case where glob("*.ext") returns more files than I
expect.

To give an example, in a directory of your choice create two files named
"test.any" and "zest.anyother". The important detail is that the second
filename's extension begins with the first filename's extension.

Then launch Vim in that directory and run the command
:echo glob("*.any")
Both files are returned, not just "test.any".

I see this on Windows running Vim 9.0.1240 with normal features built
with Visual C. On the other hand, Vim on my Linux box returns only
"test.any", as I would expect, so I don't think this is a feature. :)

Any comments?

-mike


Bram Moolenaar

Apr 29, 2023, 9:26:39 AM
to vim...@googlegroups.com, Mike
What file system is being used? Some older filesystems use a trick to
make long file names possible. The file then appears twice in the
directory, once with the short name and once with the long name. Vim
may find a match with the short name, include it in the list of
matches and then expand it to the long name.
See https://en.wikipedia.org/wiki/Long_filename

--
ARTHUR: Now stand aside worthy adversary.
BLACK KNIGHT: (Glancing at his shoulder) 'Tis but a scratch.
ARTHUR: A scratch? Your arm's off.
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Mike

Apr 29, 2023, 10:12:18 AM
to vim...@googlegroups.com
On 04/29/2023 9:26 AM, Bram Moolenaar wrote:
>
>> Briefly, I have a case where glob("*.ext") returns more files than I
>> expect.
>>
>> To give an example, in a directory of your choice create two files named
>> "test.any" and "zest.anyother". The important detail is that the second
>> filename's extension be prefixed by the first filename's extension.
>>
>> Then launch Vim in that directory and run the command
>> :echo glob("*.any")
>> Both files are returned, not just "test.any".
>>
>> I see this on Windows running vim 9.0.1240 with normal features built
>> with Visual C. On the other hand, Vim on my linux box returns only
>> "test.any", as I would expect, so I don't think this a feature. :)
>>
>> Any comments?
>
> What file system is being used?

Windows 10 with NTFS.

Mike

Apr 29, 2023, 10:51:37 AM
to vim...@googlegroups.com
I've since rebuilt Vim to include patches up to 1494 and still see the
same results on my Windows 10 system. I thought that patches 1400 and
1458 might help but they did not.


Mike

Apr 29, 2023, 11:28:21 AM
to vim...@googlegroups.com
More potatoes for the stew.

Create 5 files: test.a, test.ab, test.abc, test.abcd and test.abcde.
Then, using gvim -u NONE -U NONE --noplugin or gvim --clean:
glob("*.a") returns test.a
glob("*.ab") returns test.ab
glob("*.abc") returns test.abc, test.abcd and test.abcde
glob("*.abcd") returns test.abcd

So the problem occurs when the glob pattern has a 3-character extension.



Stan Brown

Apr 29, 2023, 3:37:30 PM
to vim...@googlegroups.com


Stan Brown
Tehachapi, CA, USA
https://BrownMath.com
Mike, I saw someone answered this, but maybe their answer didn't reach you?

Short version: Windows is doing what it's supposed to, and so is Vim.

The original MS-DOS and MS-Windows file system, in the 1980s, allowed up
to 8 characters, and then optionally a dot (period, full stop) plus up
to 3 characters. Even if the file was created with lower-case characters
in its name, Windows would change those characters to upper case. We can
call these "8.3 filenames" for short.

In the mid-1990s (with Windows 95, if I recall
correctly), Windows added so-called long filenames (LFNs), which could
be longer than 8.3 and could contain lower-case.

Rather than start with a completely new file system (which would then
make floppy disks and other interchangeable media unreadable on the
previous generation of computers), Microsoft gave any filename that
exceeded 8.3 _two_ entries in the directory: one for the actual
filename, and one for an 8.3 "short filename" (SFN). If the new file's
name fit within 8.3, then it would get only that one entry, an SFN, in
the directory. Thus _every_ file had an SFN, but not every file had an
LFN. The graphical interface (called File Explorer, Windows Explorer, or
Explorer) would show an LFN if one existed, otherwise the SFN.

Some time after that, I'm not sure when but certainly by the release of
Windows 10, it became possible to disable SFNs for any particular disk
partition. And sometime after that, "LFNs only" became the default. But
your disk is obviously set to create SFNs from longer filenames.

Your test.a, test.ab, and test.abc all fit in the 8.3 paradigm, and
therefore they have only SFNs. Your test.abcd exceeds 8.3, so when you
created it Windows set up an SFN for it. How is the SFN formed? Windows
keeps at most the first 6 characters before the dot and the first 3
after it (6.3, not 8.3), then appends ~1 to the part before the dot.
Therefore your test.abcd has two names, test.abcd and test~1.abc
(probably ~1, but it might be ~ and some other number). test.abcde is
probably test~2.abc.
When you glob *.abc, the SFN name test~1.abc is caught in that net. But
since Windows prefers to show an LFN when one exists, you see them as
test.abcd and test.abcde.
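
As a rough illustration of that mechanism, here is a small Python model.
It is only a sketch under stated assumptions: short_alias() and
glob_with_aliases() are hypothetical helper names, and the alias rule is
a simplification of what Windows really does (the real algorithm also
handles spaces, invalid characters and repeated collisions differently).
It is a model of the explanation above, not of Vim's source code.

```python
from fnmatch import fnmatchcase


def short_alias(long_name, taken):
    """Return a simplified 8.3 alias, or None if the name already fits 8.3.

    Assumption: only the truncate-and-number idea is modelled here; the
    real Windows rules are more involved.
    """
    base, dot, ext = long_name.rpartition(".")
    if not dot:                      # no extension at all
        base, ext = long_name, ""
    if len(base) <= 8 and len(ext) <= 3:
        return None                  # fits 8.3, no separate alias needed
    n = 1
    while True:
        alias = f"{base[:6]}~{n}".upper()
        if ext:
            alias += "." + ext[:3].upper()
        if alias not in taken:       # bump ~N until the alias is unused
            taken.add(alias)
            return alias
        n += 1


def glob_with_aliases(pattern, names):
    """Match the pattern against each long name and its alias (if any),
    but always report the long name."""
    taken = set()
    hits = []
    for name in names:
        alias = short_alias(name, taken)
        candidates = [name] if alias is None else [name, alias]
        # Compare case-insensitively, as Windows file names are.
        if any(fnmatchcase(c.lower(), pattern.lower()) for c in candidates):
            hits.append(name)
    return hits


files = ["test.a", "test.ab", "test.abc", "test.abcd", "test.abcde"]
for pat in ("*.a", "*.ab", "*.abc", "*.abcd"):
    print(pat, glob_with_aliases(pat, files))
```

In this model only the "*.abc" pattern picks up extra names, because it
is the only one that also matches the truncated 3-character extension of
the aliases.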

None of the SFN/LFN business exists on Linux, and since glob() is a
Linux thing in origin it doesn't seem unreasonable to me that it doesn't
handle this.

If you really need to have more than three characters after the dot in
filenames, then the simplest thing would be for you to create a wrapper
function that calls glob and then in its return filters out anything
that doesn't match the input expression.
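
The filtering step of such a wrapper could look roughly like the sketch
below. It is written in Python purely for illustration; strict_filter()
is a hypothetical name, and the list passed in stands in for whatever the
original glob call returned. Inside Vim the same idea would have to be
written as a Vim function around glob().

```python
import os
from fnmatch import fnmatch


def strict_filter(pattern, paths):
    """Keep only the paths whose visible (long) file name matches pattern.

    fnmatch() follows the platform's case rules, so the comparison is
    case-insensitive on Windows.
    """
    return [p for p in paths if fnmatch(os.path.basename(p), pattern)]


# Stand-in for the over-inclusive result reported in this thread.
returned = ["test.abc", "test.abcd", "test.abcde"]
print(strict_filter("*.abc", returned))
```

The wrapper only ever removes names, so anything the underlying glob
missed stays missing.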

Mike

Apr 29, 2023, 8:49:52 PM
to vim...@googlegroups.com
If you're referring to Bram's answer, it did. However, his link
primarily referenced FAT-based systems, not NTFS, and so the light-bulb
remained off.

>
> Short version: Windows is doing what it's supposed to, and so is Vim.
>
> The original MS-DOS and MS-Windows file system, in the 1980s, allowed up
> to 8 characters, and then optionally a dot (period, full stop) plus up
> to 3 characters. Even if the file was created with lower-case characters
> in its name, Windows would change those characters to upper case. We can
> call these "8.3 filenames" for short.
>
> Around the turn of the millennium (in Windows XP, if I recall
> correctly), Windows added so-called long filenames (LFNs), which could
> be longer than 8.3 and could contain lower-case.
>
> Rather than start with a completely new file system (which would then
> make floppy disks and other interchangeable media unreadable on the
> previous generation of computers, Microsoft gave any filename that
> exceeded 8.3 _two_ entries in the directory: one for the actual
> filename, and one for an 8.3 "short filename" (SFN). If the new file's
> name fit within 8.3, then it would get only that one entry, an SFN, in
> the directory. Thus _every_ file had an SFN, but not every file had an
> LFN. The graphical interface (called File Explorer, Windows Explorer, or
> Explorer) would show an LFN if one existed, otherwise the SFN.
>
> Some time after that, I'm not sure when but certainly by the release of
> Windows 10, it became possible to disable SFNs for any particular disk
> partition. And sometime after that, "LFNs only" became the default. But
> your disk is obviously set to create SFNs from longer filenames.

Thank you, now I understand.

Motivated by your answer, I've looked up the NTFS article on Wikipedia
(https://en.wikipedia.org/wiki/NTFS)
and it says that short filenames are implemented as "hard links". I,
unthinkingly, did not realize this.

>
> Your test.a, test.ab, and test.abc all fit in the 8.3 paradigm, and
> therefore they have only SFNs. Your test.abcd exceeds 8.3, so when you
> created it Windows set up an SFN for it. How is the SFN formed? Windows
> ignores any characters beyond the 6.3 limits (6.3, not 8.3), and for the
> 7th and 8th characters before the dot it adds ~1. Therefore your
> test.abcd has two names, test.abcd and test~1.abc (probably ~1, but it
> might be ~ and some other number). test.abcde is probably test~2.abc.
> When you glob *.abc, the SFN name test~1.abc is caught in that net. But
> since Windows prefers to show an LFN when one exists, you see them as
> test.abcd and test.abcde.
>
> None of the SFN/LFN business exists on Linux, and since glob() is a
> Linux thing in origin it doesn't seem unreasonable to me that it doesn't
> handle this.
>
> If you really need to have more than three characters after the dot in
> filenames, then the simplest thing would be for you to create a wrapper
> function that calls glob and then in its return filters out anything
> that doesn't match the input expression.
>
Actually, I discovered this not because of glob() but because ":packadd"
was sourcing two files, one named pack.vim and the other named
pack.vim9, and both defined global-scope functions with the same name.
When I looked at the vim source code it appeared that it relied on glob
and so I chose to post the issue using glob as it seemed more fundamental.

Again, thanks for taking the time to provide a detailed answer.

-mike




Enan Ajmain

Apr 30, 2023, 11:58:18 PM
to Stan Brown, vim...@googlegroups.com
On Sat, 29 Apr 2023 12:37:19 -0700
Stan Brown <the_sta...@fastmail.fm> wrote:
> Some time after that, I'm not sure when but certainly by the release of
> Windows 10, it became possible to disable SFNs for any particular disk
> partition. And sometime after that, "LFNs only" became the default. But
> your disk is obviously set to create SFNs from longer filenames.

I don't know if something changed in Windows 11, but it doesn't seem like
"LFN only" is the default anymore. I didn't change any setting (didn't
even know about them) and I get the same behavior Mike describes. It's
just that I never had multiple files where one's extension is a
prefix of the other's, so I haven't faced this issue.

--
Enan

Stan Brown

May 1, 2023, 8:54:44 AM
to Enan Ajmain, vim...@googlegroups.com
On my Windows 10 Pro system, my boot partition "C:" had SFNs and LFNs,
but the new partitions I created on the same physical drive had LFNs
only. I don't know if it would have been the same on Windows 10 Home, or
on Windows 11 Home and Pro. In any case, SFNs seem to be enabled on the
partition where the OP is running.

<https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/fsutil-8dot3name>
explains how to enable or disable SFNs on a partition. There are
warnings about not disabling SFNs on a partition that already has them.

Mike

May 2, 2023, 11:21:00 AM
to vim...@googlegroups.com
On 04/29/2023 3:37 PM, Stan Brown wrote:
>
>
Just out of curiosity, I tried the same with Python and Tcl. Neither
returns multiple files for the *.abc case so their behavior is
different. I don't know why it differs but I would argue that their
outputs do a better job of "meeting expectations".
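
The Python side of that check can be reproduced with a sketch like the
following. glob_names() is a hypothetical helper that builds the five
test files in a scratch directory. As far as I know, Python's glob lists
the directory and matches the pattern itself against the listed (long)
names, which would explain the difference, but treat that as an
assumption rather than a verified reading of either code base.

```python
import glob
import os
import tempfile


def glob_names(pattern, names):
    """Create empty files called `names` in a scratch directory and return
    the sorted base names that glob.glob() reports for `pattern`."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            open(os.path.join(tmp, name), "w").close()
        # glob.escape() guards against glob metacharacters in the temp path.
        hits = glob.glob(os.path.join(glob.escape(tmp), pattern))
        return sorted(os.path.basename(h) for h in hits)


names = ["test.a", "test.ab", "test.abc", "test.abcd", "test.abcde"]
print(glob_names("*.abc", names))
```

On Linux this reports only test.abc, and the same single result is what
I saw from Python on Windows.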

Perhaps this is an issue I should raise with the Vim developers for the
following reasons. First, the Windows outputs are inconsistent with the
Linux outputs for the same directory contents. Second, it appears that
Vim is using this glob capability when sourcing plugins (I first noticed
this using packadd) and so unwanted files could be sourced on Windows
systems.

>
> stuff snipped
>


Bram Moolenaar

May 2, 2023, 3:53:50 PM
to vim...@googlegroups.com, Mike

> Just out of curiosity, I tried the same with python and tcl. Neither
> returns multiple files for the *.abc case so their behavior is
> different. I don't know why it differs but I would argue that their
> outputs do a better job of "meeting expectations".

That depends on what your expectations are. If you list a directory and
see a file "some~1.abc" then a glob() with *.abc should find it, right?
Or not? The question is whether you expect to match the long name only.

Usually when we encounter something where it's not 100% clear what the
right behavior is, the best choice is to leave it alone. The people who
are happy with the current behavior won't make any remark right now,
thus we have no idea how many we would "hurt" by making a change.

> Perhaps this is an issue I should raise with the vim developers for the
> following reasons. First, the Windows outputs are inconsistent with the
> linux outputs for the same directory contents.

Still, there have been no complaints until now. It appears to be more a
theoretical problem than a practical one.

Adding an optional argument to glob() to avoid the short filenames is
not a good idea; there are already three optional arguments. We could
change this to use a second argument that is a dictionary with the
current options plus the new one. It would be clearer when reading back
the function call. Is it really worth making this change?

> Second, it appears that vim is using this glob capability when
> sourcing plugins (I first noticed this using packadd) and so unwanted
> files could be sourced on Windows systems.

Can you be more specific about "when sourcing plugins" ?
Are there really files using an extension starting with ".vim"?

--
Team-building exercises come in many forms but they all trace their roots back
to the prison system. In your typical team-building exercise the employees
are subjected to a variety of unpleasant situations until they become either a
cohesive team or a ring of car jackers.
(Scott Adams - The Dilbert principle)

Mike

May 2, 2023, 4:43:12 PM
to vim...@googlegroups.com
On 05/02/2023 3:53 PM, Bram Moolenaar wrote:
>
>> Just out of curiosity, I tried the same with python and tcl. Neither
>> returns multiple files for the *.abc case so their behavior is
>> different. I don't know why it differs but I would argue that their
>> outputs do a better job of "meeting expectations".
>
> That depends on what your expectations are. If you list a directory and
> see a file "some~1.abc" then a glob() with *.abc should find it, right?
> Or not? The question is whether you expect to match the long name only.

On my Windows 10 system with NTFS, Windows Explorer and dir will, by
default, show "some.abcd" and not "some~1.abc". Therefore I don't
expect glob("*.abc") to return "some.abcd".

However, the command "dir /X" will display the short-filename
equivalents but I never use that option and suspect that few do.
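
For checking a single file from a script instead of "dir /X", something
like this Python sketch should work. It calls the Win32 function
GetShortPathNameW through ctypes; short_path() is a hypothetical helper
name, and on anything other than Windows it simply returns None.

```python
import ctypes
import os


def short_path(path):
    """Return the 8.3 form of an existing path on Windows, else None."""
    if os.name != "nt":
        return None                       # short names are a Windows concept
    buf = ctypes.create_unicode_buffer(260)
    n = ctypes.windll.kernel32.GetShortPathNameW(path, buf, len(buf))
    # 0 means failure (for example, no such file); a value of len(buf) or
    # more means the buffer was too small for the result.
    return buf.value if 0 < n < len(buf) else None


print(short_path("some.abcd"))
```

If short names are disabled on the volume, the call is expected to hand
back the long name unchanged.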

>
> Usually when we encounter something where it's not 100% clear what the
> right behavior is, the best choice is to leave it alone. The people who
> are happy with the current behavior won't make any remark right now,
> thus we have no idea how many we would "hurt" by making a change.
>
>> Perhaps this is an issue I should raise with the vim developers for the
>> following reasons. First, the Windows outputs are inconsistent with the
>> linux outputs for the same directory contents.
>
> Still, there have been no complaints until now. It appears to be more a
> theoretical problem than a practical one.
>
> Adding an optional argument to glob() to avoid the short filenames is
> not a good idea, there already are three optional arguments. We could
> change this to use a second argument that is a dictionary with the
> current options plus the new one. It would be clearer when reading back
> the function call. Is it really worth making this change?
>
>> Second, it appears that vim is using this glob capability when
>> sourcing plugins (I first noticed this using packadd) and so unwanted
>> files could be sourced on Windows systems.
>
> Can you be more specific about "when sourcing plugins" ?
> Are there really files using an extension starting with ".vim"?
>

By "sourcing plugins" I mean what Vim automatically does when it starts
up and what it does when it executes the "packadd" command. If a
"plugin" directory includes a file with the extension .vim9 for example,
it will be sourced, apparently because the filesystem, if it supports
short filenames, has a duplicate name for "some.vim9" which could be
"some~1.vim".

I do admit that I don't know what actually happens under-the-covers
because Vim-glob does return "some.vim9" and not "some~1.vim". As I
mentioned above, the Python and Tcl glob commands do not return some.vim9
when the pattern is *.vim so their behavior is different.

