Not so logical behavior of glob regarding the letter case on Windows

Alexandru

Mar 20, 2019, 12:48:41 PM

Suppose I have a file named

My_file.txt

and want to use "glob" to find it in different situations (which differ by the amount of knowledge about the full file name):

glob my*.txt
>> My_file.txt

glob My*.txt
>> My_file.txt

Until now everything is as expected. Note that even if I change the letter case in the glob pattern, the returned value is consistent with the letter case in the file name.

But suppose I want to use glob to find out the actual letter case of a file name (I know the name but not its letter case):

glob My_file.txt
>> My_file.txt

glob MY_FILE.txt
>> MY_FILE.txt

The result is simply the input value. I guess that internally glob checks whether the pattern contains any wildcards. If not, it checks whether the file exists and, if so, returns the initial argument rather than the true name of the file.

It would be better to always return the true name of the file, including its letter case.

What do you think?

Regards
Alexandru

Rich

Mar 20, 2019, 1:49:09 PM

Alexandru <alexandr...@meshparts.de> wrote:

*CRITICALLY IMPORTANT* bit of detail missing:

What Operating System?

> Suppose I have a file named
>
> My_file.txt
>
> and want to use "glob" to find it in different situations (which
> differ by the amount of knowledge about the full file name):
>
> glob my*.txt
>>> My_file.txt
>
> glob My*.txt
>>> My_file.txt
>
> Until now everything is as expected.

Only if you are using an OS with an inferior case-folding filesystem.
Below is what is *expected* from a proper filesystem:

$ ls -l
total 0
-rw-r--r-- 1 _____ _____ 0 Mar 20 13:40 My_file.txt
$ rlwrap tclsh
% glob my*.txt
no files matched glob pattern "my*.txt"
% glob My*.txt
My_file.txt

> Note that even if I change the
> letter case in the glob pattern, the returned value is consistent
> with the letter case in the file name.

Which implies you are on Windows or Mac, with their case folding
filesystems, where each of: My_file.txt, MY_FILE.TXT, my_file.txt,
etc., all refer to the same filename on disk.

> But if I want to use the glob to find out how the letter case of a
> file is (I know the name but not the letter case):
>
> glob My_file.txt
>>> My_file.txt
>
> glob MY_FILE.txt
>>> MY_FILE.txt
>
> The result will be the input value. I guess, internally glob checks
> if the file has any wild cards. If not it checks if the file exists
> and if yes it returns the initial argument not the true name of the
> file.

No. You are being fooled by the case folding filesystem. With a
proper filesystem, things work as expected:

$ ls -l
total 0
-rw-r--r-- 1 ______ _____ 0 Mar 20 13:40 My_file.txt
$ rlwrap tclsh
% glob my*.txt
no files matched glob pattern "my*.txt"
% glob My*.txt
My_file.txt
% glob My_file.txt
My_file.txt
% glob MY_FILE.txt
no files matched glob pattern "MY_FILE.txt"
%

> It would be better to always return the true name of the file
> considering the letter case.
>
> What do you think?

It would be better to not use an OS with a case folding filesystem.
Then you get what you expect.

The fact is, the bug here is in your chosen OS, not in Tcl.

aldo.www...@gmail.com

Mar 20, 2019, 4:27:20 PM

Since on Windows you cannot have myfile.txt and MYFILE.TXT within the same directory, there's no harm.
Once the file is created (let's say "Myfile.txt"), you can work on it using differently cased names (MYfiLE.TXT, myFILE.TXT, and so on); it's always the same file!

Note: On the MacOS filesystem, too, you cannot have myfile.txt and MYFILE.txt within the same directory.
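
Whether a given directory behaves this way can be checked from Tcl itself. The following is a minimal sketch, assuming the directory is writable; the proc name fs-ignores-case and the probe file name are hypothetical.

```tcl
# Hypothetical helper: report whether the file system holding $dir
# ignores letter case. It creates a probe file with a mixed-case name
# and then asks for the same name in lower case.
proc fs-ignores-case {dir} {
    set probe [file join $dir CaseProbe_[pid].tmp]
    close [open $probe w]
    # Same tail, all lower case; only a case-insensitive file system
    # will report that this name exists.
    set other [file join $dir [string tolower [file tail $probe]]]
    set result [file exists $other]
    file delete $probe
    return $result
}

puts [fs-ignores-case [pwd]]
```

The result is 1 where differently cased names refer to the same file and 0 where they do not.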

Brad Lanam

Mar 20, 2019, 8:36:09 PM

You can also convert the filename to the on-disk Windows case using
file normalize:
set fn [file tail [file normalize [glob MY_FILE.TXT]]]

It seems like a reasonable request.
It's probably an enhancement, not a bug fix:
Add a -normalize option to the glob command.
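
Wrapped as a helper, the file normalize tip could look like the sketch below. The proc name on-disk-name is hypothetical, and the conversion to the on-disk case is the Windows behavior described above; on a case-sensitive file system the name has to match exactly for the file to be found at all.

```tcl
# Hypothetical helper: return the tail of an existing file's name as
# reported by [file normalize]. Assumption taken from the tip above:
# on Windows, [file normalize] yields the letter case stored on disk.
proc on-disk-name {name} {
    if {![file exists $name]} {
        error "no such file: $name"
    }
    return [file tail [file normalize $name]]
}
```

Unlike the glob-based one-liner, this raises its own error message when the file is missing.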

MacOS not only ignores case, but its filenames
are also stored as decomposed UTF-8 (NFD). All sorts of fun.
And MacOS's method for localizing directory names sucks.
I have all sorts of work-arounds for MacOS.

Alexandru

Mar 20, 2019, 9:28:18 PM

On Thursday, March 21, 2019 at 01:36:09 UTC+1, Brad Lanam wrote:
> You can also convert the filename to the on-disk windows case using
> file normalize.
> set fn [file tail [file normalize [glob MY_FILE.TXT]]]

Great tip! I was not aware that file normalize takes care of the letter case. I'll use that.

>
> It seems like a reasonable request.
> It's probably an enhancement, not a bug fix:
> Add a -normalize option to the glob command.

Also a good point. Agreed. Just: I never stated that this is a bug.

Alexandru

Mar 20, 2019, 9:43:18 PM

On Wednesday, March 20, 2019 at 18:49:09 UTC+1, Rich wrote:
> Alexandru <alexandr...@meshparts.de> wrote:
>
> *CRITICALLY IMPORTANT* bit of detail missing:
>
> What Operating System?

Yes, it's Windows, sorry for writing this only in the title.

>
> > Suppose I have a file named
> >
> > My_file.txt
> >
> > and want to use "glob" to find it in different situations (which
> > differ by the amount of knowledge about the full file name):
> >
> > glob my*.txt
> >>> My_file.txt
> >
> > glob My*.txt
> >>> My_file.txt
> >
> > Until now everything is as expected.
>
> Only if you are using an OS with an inferior case-folding filesystem.
> Below is what is *expected* from a proper filesystem:
>
> $ ls -l
> total 0
> -rw-r--r-- 1 _____ _____ 0 Mar 20 13:40 My_file.txt
> $ rlwrap tclsh
> % glob my*.txt
> no files matched glob pattern "my*.txt"
> % glob My*.txt
> My_file.txt

The "inferior" file system is actually making my life easier in this case. Like I said, I don't know the letter case of the file name and I was trying to find it out using the glob command. I won't do this anymore, since I now know from Brad that I can use file normalize.

BTW: On "superior" file systems I would complain that there is no "-nocase" option in the "glob" command. Perhaps glob can be enhanced with both -nocase and case folding in the next release.
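
Until glob has such an option, a case-insensitive match can be approximated in script by listing the directory and filtering the names with string match -nocase. This is a minimal sketch: the proc name glob-anycase and its optional dir parameter are hypothetical, string match does not understand glob's {a,b} alternation, and on Unix-like systems the "*" listing skips names that start with a dot.

```tcl
# Hypothetical helper: case-insensitive matching of names in one
# directory. Returns the names as stored on disk, or an empty list
# if nothing matches.
proc glob-anycase {pattern {dir .}} {
    set matches {}
    # List every (non-hidden) entry, then filter in script.
    foreach name [glob -nocomplain -tails -directory $dir *] {
        if {[string match -nocase $pattern $name]} {
            lappend matches $name
        }
    }
    return $matches
}
```

Because the names come from a wildcard listing, they carry the letter case of the directory entries regardless of how the pattern is written.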

I appreciate your opinion, but having a case-sensitive file system is not on my wish list for Windows. Reason: if I had two files with the same name but different letter case in the same directory, after one year I would not remember which file I was looking for. The letter case is very helpful when writing code, but not for everyday use in the office. Since not everybody is a programmer, I think Microsoft made a very wise decision not to make file names case-sensitive.

>
> The fact is, the bug here is in your chosen OS, not in Tcl.

Never said this is a bug. I also don't think this is a bug. That said, this proposal should be seen as an enhancement for the glob command, as Brad already said.

Alexandru

Mar 20, 2019, 9:46:39 PM

As a Windows user I'm aware of this. Yes, there is no harm, and I would like to propose this as an enhancement to the glob command: return the name with its correct letter case on any OS.

Robert Heller

Mar 21, 2019, 8:40:12 AM

At Wed, 20 Mar 2019 18:43:15 -0700 (PDT) Alexandru <alexandr...@meshparts.de> wrote:

> BTW: On "superior" file systems I would complain that there is no "-nocase"
> option in the "glob" command. Perhaps glob can be enhanced with both -nocase
> and case folding in the next release.

glob {[Mm][Yy]_[Ff][Ii][Ll][Ee].[Tt][Xx][Tt]}

And one can always write a simple wrapper proc to "expand" a filename to the
above form:

proc glob-nocase {filename} {
    # Build a pattern in which every letter matches either case,
    # e.g. "My" becomes "[Mm][Yy]".
    set resultname {}
    foreach c [split $filename {}] {
        if {[string is alpha $c]} {
            append resultname {[} [string toupper $c] [string tolower $c] {]}
        } else {
            append resultname $c
        }
    }
    return [glob $resultname]
}

(Ideally the above should be written to allow for glob's options, but I'll
leave that as an exercise for the reader.)
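
One possible take on that exercise, as a sketch rather than a definitive version: treat the last argument as the file name and hand everything before it to glob unchanged. The calling convention (options first, name last) is an assumption about how such a wrapper would be used.

```tcl
# Variant of glob-nocase that accepts glob's options.
# Assumption: the last argument is the file name, all earlier
# arguments are options for glob (e.g. -nocomplain, -directory dir).
proc glob-nocase {args} {
    set filename [lindex $args end]
    set options  [lrange $args 0 end-1]
    set pattern {}
    foreach c [split $filename {}] {
        if {[string is alpha $c]} {
            # Letter: accept upper or lower case.
            append pattern {[} [string toupper $c] [string tolower $c] {]}
        } else {
            append pattern $c
        }
    }
    # "--" ends glob's option list, so a generated pattern that
    # starts with "-" is not mistaken for an option.
    return [glob {*}$options -- $pattern]
}
```

A call such as glob-nocase -nocomplain -tails -directory $dir MY_FILE.TXT then searches $dir and returns an empty list instead of raising an error when nothing matches.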

> I appreciate your opinion, but having a case-sensitive file system is not on
> my wish list for Windows. Reason: if I had two files with the same name
> but different letter case in the same directory, after one year I would not
> remember which file I was looking for. The letter case is very helpful
> when writing code, but not for everyday use in the office. Since not
> everybody is a programmer, I think Microsoft made a very wise decision not
> to make file names case-sensitive.

It would be *dumb* to use case alone to distinguish files.

--
Robert Heller -- 978-544-6933
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

Alexandru

Mar 21, 2019, 8:44:26 AM

Hi Robert,

So I take it from your answer that you don't agree with the proposed enhancement of the glob command?

Robert Heller

Mar 21, 2019, 9:22:37 AM

At Thu, 21 Mar 2019 05:44:24 -0700 (PDT) Alexandru <alexandr...@meshparts.de> wrote:

> Hi Robert,
>
> So I take it from your answer that you don't agree with the proposed
> enhancement of the glob command?

No, I don't see it as a useful or necessary enhancement. There are other ways
to handle things using the existing code.

Alexandru

Mar 21, 2019, 9:31:12 AM

I was about to write this up as a new TIP at https://core.tcl.tk/tips/doc/trunk/index.md but just realized that I need Fossil in order to do that. Seriously? Am I missing something?