
Bug in glob.glob for files w/o extensions in Windows


Georgy Pruss

Nov 29, 2003, 10:47:38 PM
On Windows XP glob.glob doesn't work properly for files without extensions.
E.g. C:\Temp contains 4 files: 2 with extensions, 2 without.

C:\Temp>dir /b *
aaaaa.aaa
bbbbb.bbb
ccccc
ddddd

C:\Temp>dir /b *.
ccccc
ddddd

C:\Temp>python
Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob

>>> glob.glob( '*' )
['aaaaa.aaa', 'bbbbb.bbb', 'ccccc', 'ddddd']

>>> glob.glob( '*.' )
[]

It looks like a bug.

Georgy
--
Georgy Pruss
E-mail: 'ZDAwMTEyMHQwMzMwQGhvdG1haWwuY29t\n'.decode('base64')


Georgy Pruss

Nov 29, 2003, 11:30:09 PM
OK, you can call it not a bug, but different behavior.
I've found that the fnmatch module is the reason for that.
Here's other examples:

C:\temp>dir /b *.*
.eee
aaa.aaa
nnn

C:\temp>dir /b * # by definition it's a synonym for *.*
.eee
aaa.aaa
nnn

C:\temp>dir /b .*
.eee

C:\temp>dir /b *. # it looks strange too
.eee
nnn


C:\temp>python
>>> import glob

>>> glob.glob('*.*')
['aaa.aaa']

>>> glob.glob('*')
['aaa.aaa', 'nnn']

>>> glob.glob('.*')
['.eee']

>>> glob.glob('*.')
[]


It seems that in any case I'll have to extract 'nnn' by myself.
Something like:

if mask.endswith('.'): # no extension actually implies no dots in the name at all
    list = glob.glob(mask[:-1])
    list = filter(lambda x: '.' not in x, list) # or [x for x in list if '.' not in x]
else:
    list = glob.glob(mask)

G-:
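Georgy's workaround can be packaged as a reusable function. This is a minimal sketch; the name `glob_dos` is hypothetical and not part of the `glob` module. One deliberate change from the snippet: the dot test is applied to `os.path.basename()` of each result, so a mask such as `C:\My.Files\*.` is not defeated by the dot in the directory name.

```python
import glob
import os


def glob_dos(mask):
    """Hypothetical helper (not in the stdlib): like glob.glob(), but a mask
    ending in '.' selects names containing no dot, as cmd.exe's `dir *.` does."""
    if mask.endswith('.'):
        # Drop the trailing '.', glob normally, then keep only dot-free names.
        candidates = glob.glob(mask[:-1])
        # Check only the last path component, so a dot in a directory
        # name does not exclude every match.
        return [p for p in candidates if '.' not in os.path.basename(p)]
    return glob.glob(mask)
```

Called as `glob_dos('*.')` in Georgy's C:\Temp, it should list only ccccc and ddddd; any other mask is passed straight through to `glob.glob()`.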


Tim Peters

Nov 29, 2003, 11:16:19 PM
to pytho...@python.org
[Georgy Pruss]

> On Windows XP glob.glob doesn't work properly for files without
> extensions.

I'd say it's Microsoft's dir that doesn't work properly in this case.

> E.g. C:\Temp contains 4 files: 2 with extensions, 2
> without.
>
> C:\Temp>dir /b *
> aaaaa.aaa
> bbbbb.bbb
> ccccc
> ddddd
>
> C:\Temp>dir /b *.
> ccccc
> ddddd

Why on Earth should a pattern specifying a period match filenames that don't
contain a period? Would you expect

*x

to match

abc

? Nope. It's dir that special-cases the snot out of a period, not Python's
glob. For cross-platform sanity, glob has to work the same way across
platforms, and the Unixish shells have the obvious, explainable,
unsurprising semantics here:

$ ls *
aaaaa.aaa bbbbb.bbb ccccc ddddd

$ ls *.
ls: *.: No such file or directory

$ ls *.*
aaaaa.aaa bbbbb.bbb
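Python's `fnmatch` module, which `glob` uses to test each name, reproduces those three `ls` results on a plain list of names without touching the file system. The four names are the ones from Georgy's C:\Temp.

```python
import fnmatch

# The file names from the original post; no directory is read.
names = ['aaaaa.aaa', 'bbbbb.bbb', 'ccccc', 'ddddd']

for pattern in ['*', '*.', '*.*']:
    print(pattern, fnmatch.filter(names, pattern))
# * ['aaaaa.aaa', 'bbbbb.bbb', 'ccccc', 'ddddd']
# *. []
# *.* ['aaaaa.aaa', 'bbbbb.bbb']

# Tim's analogy: a literal character in the pattern must be present.
print(fnmatch.fnmatch('abc', '*x'))  # False
```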

> C:\Temp>python
> Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import glob
>
> >>> glob.glob( '*' )
> ['aaaaa.aaa', 'bbbbb.bbb', 'ccccc', 'ddddd']
>
> >>> glob.glob( '*.' )
> []

> It looks like a bug.

Good luck getting Microsoft to fix it <wink>.


Jules Dubois

Nov 30, 2003, 12:54:04 AM
On Sun, 30 Nov 2003 03:47:38 GMT, in article
<news:uLdyb.46389$I53.2...@twister.southeast.rr.com>, Georgy Pruss
wrote:

> On Windows XP glob.glob doesn't work properly for files without extensions.
> E.g. C:\Temp contains 4 files: 2 with extensions, 2 without.

> [...]


> C:\Temp>dir /b *.
> ccccc
> ddddd

This is standard Windows behavior. It's compatible with CP/M and therefore
MS-DOS, and Microsoft has preserved this behavior in all versions of
Windows.

Did you ever poke around in the directory system in a FAT partition
(without VFAT)? You'll find that every file name is exactly 11 characters
long and "." is not found in any part of any file name in any directory
entry.
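A simplified model shows why, in an 8.3 world, `*.` naturally selects names without an extension: the name and the pattern are laid out in the same fixed 8+3 fields, `*` fills the rest of its field with `?`, and an empty extension is just three blanks. This is an illustrative sketch with hypothetical function names, not Microsoft's actual matching code; it ignores VFAT long names, characters that are invalid in 8.3 names, and cmd.exe's special handling of a pattern with no dot at all.

```python
def to_fields(name):
    """Lay a name out like a non-VFAT FAT directory entry: an 8-character
    base and a 3-character extension, blank-padded, with no '.' stored."""
    base, _, ext = name.upper().partition('.')
    return base.ljust(8)[:8], ext.ljust(3)[:3]


def pattern_fields(pattern):
    """Lay a pattern out the same way; '*' fills the rest of its field
    with '?'. (Simplified model: only patterns containing a dot.)"""
    def expand(field, width):
        if '*' in field:
            field = field[:field.index('*')].ljust(width, '?')
        return field.ljust(width)[:width]

    base, _, ext = pattern.upper().partition('.')
    return expand(base, 8), expand(ext, 3)


def dos_match(name, pattern):
    """Compare position by position: '?' matches any character,
    including blank padding; anything else must be identical."""
    return all(p == '?' or p == c
               for name_field, pat_field in zip(to_fields(name),
                                                pattern_fields(pattern))
               for c, p in zip(name_field, pat_field))


names = ['aaaaa.aaa', 'bbbbb.bbb', 'ccccc', 'ddddd']
print([n for n in names if dos_match(n, '*.')])   # ['ccccc', 'ddddd']
print([n for n in names if dos_match(n, '*.*')])  # all four names
```

In this model `*.*` matches everything because `?` also matches the blank padding of an empty extension, which is consistent with how `dir *.*` behaves in the thread.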

It's bizarre but that's the way it works. If you try

dir /b *

does cmd.exe list only files without extensions?

>>>> glob.glob( '*.' )
> []
>

glob provides "Unix style pathname pattern expansion" as documented in the
_Python Library Reference_: If there's a period (".") in the pattern, it
must match a period in the filename.
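Under the hood, `fnmatch` (which `glob` relies on) translates the shell pattern into a regular expression in which `*` becomes `.*` and a literal `.` is escaped, so the period really does have to be present. The exact regex text differs between Python versions, so this sketch prints it rather than assuming it. The name `eeeee.` is made up; it can exist on Unix, though Windows strips trailing dots.

```python
import fnmatch
import re

regex = fnmatch.translate('*.')
print(regex)  # the exact text varies by Python version

matcher = re.compile(regex)
for name in ['ccccc', 'aaaaa.aaa', 'eeeee.']:
    # Only a name that actually ends in a period matches '*.'.
    print(name, bool(matcher.match(name)))
```

Only `eeeee.` matches; `ccccc` fails for exactly the reason Jules gives.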

> It looks like a bug.

No, it's proper behavior. It's Windows that's (still) screwy.

Georgy Pruss

Nov 30, 2003, 1:18:36 AM

"Jules Dubois" <bo...@invalid.tld> wrote in message news:nj2k03e19clm$.uctj11fclu96$.dlg@40tude.net...

| On Sun, 30 Nov 2003 03:47:38 GMT, in article
| <news:uLdyb.46389$I53.2...@twister.southeast.rr.com>, Georgy Pruss
| wrote:
|
| > On Windows XP glob.glob doesn't work properly for files without extensions.
| > E.g. C:\Temp contains 4 files: 2 with extensions, 2 without.
| > [...]
| > C:\Temp>dir /b *.
| > ccccc
| > ddddd
|
| This is standard Windows behavior. It's compatible with CP/M and therefore
| MS-DOS, and Microsoft has preserved this behavior in all versions of
| Windows.

That's what I meant, wanted and liked.

C'mon guys, I don't care if it's FAT, NTFS, Windows, Linux, VMS or whatever.
All I wanted was to get files w/o dots in their names (on my computer :)).
I did it and I can do it on any system if I need.


| Did you ever poke around in the directory system in a FAT partition
| (without VFAT)? You'll find that every file name is exactly 11 characters
| long and "." is not found in any part of any file name in any directory
| entry.
|
| It's bizarre but that's the way it works. If you try
|
| dir /b *
|
| does cmd.exe list only files without extensions?

By definition it's the same as *.* if my memory serves me right.


| >>>> glob.glob( '*.' )
| > []
| >
|
| glob provides "Unix style pathname pattern expansion" as documented in the
| _Python Library Reference_: If there's a period (".") in the pattern, it
| must match a period in the filename.
|
| > It looks like a bug.
|
| No, it's proper behavior. It's Windows that's (still) screwy.

I see.
Show the world a perfect OS and you'll be a billionaire.

G-:


Jules Dubois

Nov 30, 2003, 3:27:30 AM
On Sun, 30 Nov 2003 06:18:36 GMT, in article
<news:0Zfyb.48050$dl.21...@twister.southeast.rr.com>, Georgy Pruss wrote:

> "Jules Dubois" <bo...@invalid.tld> wrote in message news:nj2k03e19clm$.uctj11fclu96$.dlg@40tude.net...
>| On Sun, 30 Nov 2003 03:47:38 GMT, in article
>|

> C'mon guys, I don't care if it's FAT, NTFS, Windows, Linux, VMS or whatever.
> All I wanted was to get files w/o dots in their names (on my computer :)).

I was just pointing out the reason for the behavior.

>| dir /b *
>|
>| does cmd.exe list only files without extensions?
>
> By definition it's the same as *.* if my memory serves me right.

I'm sure ".*" was the same as "*.*". Win2k's cmd.exe won't run under Wine,
so I couldn't test "*".

>| No, it's proper behavior. It's Windows that's (still) screwy.
>

> Show the world a perfect OS and you'll be a billionaire.

We agree, then, that every operating system has its good points and its bad
points. (I guess we don't agree on whether "*." should or shouldn't match
files without periods in their name.)

Georgy Pruss

Nov 30, 2003, 3:59:55 AM

"Jules Dubois" <bo...@invalid.tld> wrote in message news:b6xinmmkc0wp.16hmc77xoj9t2$.dlg@40tude.net...

|
| We agree, then, that every operating system has its good points and its bad
| points. (I guess we don't agree on whether "*." should or shouldn't match
| files without periods in their name.)

Anyway, "*." is not a bad DOS convention to select files w/o extension, although
it comes from the old 8.3 name scheme. BTW, how can you select files w/o
extension in Unix's shells?

G-:

Francis Avila

Nov 30, 2003, 4:30:49 AM
Georgy Pruss wrote in message ...

>OK, you can call it not a bug, but different behavior.


That's true. But calling dir's behavior "different" here is quite a
euphemism!

>It seems that in any case I'll have to extract 'nnn' by myself.
>Something like:
>
> if mask.endswith('.'): # no extension actually implies no dots in the name at all
>     list = glob.glob(mask[:-1])
>     list = filter(lambda x: '.' not in x, list) # or [x for x in list if '.' not in x]
> else:
>     list = glob.glob(mask)
>


I don't understand where 'mask' is coming from. If you want files with no
dots, just filter out those files:

filelist = [file for file in glob.glob('*') if '.' not in file]

Or you can use sets: symmetric difference of all files against the files
with dots.
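The set-based variant can look like this. It is a sketch; `names_without_dots` is a hypothetical helper, and `glob.escape()` needs Python 3.4 or later. Because every name matched by `*.*` is also matched by `*`, the plain difference `all_names - dotted` and the symmetric difference `all_names ^ dotted` give the same result here.

```python
import glob
import os


def names_without_dots(directory):
    """Hypothetical helper: names in `directory` with no '.' in them,
    computed as (everything glob sees) minus (names matching '*.*')."""
    base = glob.escape(directory)  # treat the directory path literally
    all_names = {os.path.basename(p)
                 for p in glob.glob(os.path.join(base, '*'))}
    dotted = {os.path.basename(p)
              for p in glob.glob(os.path.join(base, '*.*'))}
    return sorted(all_names - dotted)
```

Like `glob('*')` itself, this skips names that start with a dot.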

If you're trying to recast glob in windows' image, you'll have to
special-case '*.*' too. And then what do you do if someone comes along who
*really* wants *only* names with dots in them!?

Trying to shoehorn windows-style semantics into glob is just braindead: the
windows semantics are wrong because dots are not special anymore. For one
thing, we can have more than one of them, and they can be anywhere in the
filename. Neither was true for DOS, from which windows inherited the *.*
nonsense.

Behold the awesome visage of the One True Glob (TM): (Not that I'm starting
a holy war or anything ;)
*.* -> Filename contains a dot, but does not start with one.
This is NOT the same as '*'!!
.* -> Filename has a dot as the first character.
*. -> Filename has a dot as the last character.
* -> Gimme everything.
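Here is a sketch that runs each row of the table through Python's `glob` in a scratch directory, assuming a POSIX file system (Windows silently drops a trailing dot, so `ddd.` cannot be created there). The sample names are made up for illustration. It shows that `glob` hides `.eee` from `*` and `*.*` because those patterns do not start with a dot, and that `ddd.` matches `*.*` as well as `*.`, since `*` may match an empty string.

```python
import glob
import os
import tempfile

# Illustrative names; 'ddd.' can only be created on a POSIX file system.
SAMPLE = ['aaa.aaa', 'nnn', '.eee', 'ddd.']


def run_table():
    """Return {pattern: sorted basenames that glob.glob() matches}."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        for name in SAMPLE:
            open(os.path.join(d, name), 'w').close()
        for pattern in ['*.*', '.*', '*.', '*']:
            paths = glob.glob(os.path.join(glob.escape(d), pattern))
            results[pattern] = sorted(os.path.basename(p) for p in paths)
    return results


for pattern, matched in run_table().items():
    print(pattern, matched)
# Expected on Linux:
# *.* ['aaa.aaa', 'ddd.']
# .* ['.eee']
# *. ['ddd.']
# * ['aaa.aaa', 'ddd.', 'nnn']
```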
--
Francis Avila

Serge Orlov

Nov 30, 2003, 5:08:57 AM

> Anyway, "*." is not a bad DOS convention to select files w/o extension, although
> it comes from the old 8.3 name scheme. BTW, how can you select files w/o
> extension in Unix's shells?
The same way as in Python:

filelist = [file for file in glob.glob('*') if '.' not in file]
Shell:
ls | grep -v '[.]'
Making up special conventions is not the Python way.

-- Serge.


Peter Otten

Nov 30, 2003, 5:19:21 AM
Georgy Pruss wrote:

> Anyway, "*." is not a bad DOS convention to select files w/o extension,
> although it comes from the old 8.3 name scheme. BTW, how can you select
> files w/o extension in Unix's shells?

ls -I '*.*'

The -I option tells the ls command what *not* to show.

Peter

Gerrit Holl

Nov 30, 2003, 9:23:54 AM
to pytho...@python.org
Francis Avila wrote:
> Behold the awesome visage of the One True Glob (TM): (Not that I'm starting
> a holy war or anything ;)
> *.* -> Filename has a dot in it, and that dot cannot be the first or last
> char.
> This is NOT the same as '*'!!
> .* -> Filename has a dot as the first character.
> *. -> Filename has a dot as the last character.
> * -> Gimme everything.

Note that Bash doesn't behave like this either: * does not give
everything, rather it gives everything not starting with a dot. In Bash,
* really means: [!.]*
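Python's `glob` follows the same convention as Bash here. This sketch (the helper name `star_two_ways` is hypothetical) compares `glob`'s `*` with the explicit pattern `[!.]*` applied by `fnmatch` to every directory entry; the two lists should agree.

```python
import fnmatch
import glob
import os


def star_two_ways(directory):
    """Hypothetical helper: return (glob's '*', fnmatch's '[!.]*' over
    all directory entries), both sorted, for comparison."""
    via_glob = sorted(os.path.basename(p) for p in
                      glob.glob(os.path.join(glob.escape(directory), '*')))
    # fnmatch has no hidden-file rule, so '[!.]*' spells it out.
    via_fnmatch = sorted(fnmatch.filter(os.listdir(directory), '[!.]*'))
    return via_glob, via_fnmatch
```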

yours,
Gerrit.

--
102. If a merchant entrust money to an agent (broker) for some
investment, and the broker suffer a loss in the place to which he goes, he
shall make good the capital to the merchant.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

Francis Avila

Nov 30, 2003, 2:50:43 PM
Gerrit Holl wrote in message ...

>Francis Avila wrote:
>Note that Bash doesn't behave like this either: * does not give
>everything, rather it gives everything not starting with a dot. In Bash,
>* really means: [!.]*


That behavior can be modified with the 'dotglob' shell option:

$ shopt -s dotglob
$ echo *
a .b b .c d ...
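A rough Python counterpart of `dotglob` for a single directory might look like the sketch below (the function name `dotglob` is hypothetical). `glob.glob()` gained an `include_hidden` parameter in Python 3.11 (and `root_dir` in 3.10); on older versions the fallback filters `os.listdir()` with `fnmatch`, which has no hidden-file rule at all.

```python
import fnmatch
import glob
import os
import sys


def dotglob(directory, pattern):
    """Hypothetical helper: entries of `directory` matching a single-component
    `pattern`, letting wildcards match a leading dot (like bash's dotglob)."""
    if sys.version_info >= (3, 11):
        # include_hidden was added to glob.glob() in Python 3.11.
        matches = glob.glob(pattern, root_dir=directory, include_hidden=True)
    else:
        # fnmatch applies no hidden-file rule, so filter the raw listing.
        matches = fnmatch.filter(os.listdir(directory), pattern)
    return sorted(matches)
```

Neither branch ever returns `.` or `..`, which matches Bash's `dotglob` behaviour.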

--
Francis Avila


Stein Boerge Sylvarnes

Nov 30, 2003, 3:12:01 PM
That's non-standard gnu ls behaviour, I think. (Tested on OpenBSD and SunOS)

>Peter

--
regards/mvh
Stein B. Sylvarnes
stein.s...@student.uib.no

Francis Avila

Nov 30, 2003, 7:31:10 PM

Stein Boerge Sylvarnes wrote in message ...

>In article <bqcg7i$olj$03$1...@news.t-online.com>, Peter Otten wrote:
>>Georgy Pruss wrote:
>>
>>> Anyway, "*." is not a bad DOS convention to select files w/o extension,
>>> although it comes from the old 8.3 name scheme. BTW, how can you select
>>> files w/o extension in Unix's shells?


In Windows, how do you create a file with a dot as the last character? In
Unix you can do this, because a dot is just another character in the
filename. It's because you can't do this in Windows that *. is unambiguous.

>>ls -I '*.*'
>>
>>The -I option tells the ls command what *not* to show.
>>
>That's non-standard gnu ls behaviour, I think. (Tested on OpenBSD and SunOS)


for N in $(ls -1 | grep -v '\.'); do echo "$N"; done

Not positively sure that the -1 option is POSIX, but it's at least in
OpenBSD and SunOS (in fact, it's the default when output is not to a
terminal).

Bash also has an extglob option:

$ shopt -s extglob dotglob
$ ls -1 *
a
.b
c.c
d.
$ echo !(*.*)
a

There's also @(), ?(), *(), +(). You can use multiple patterns within the
parens by joining with '|'.
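Python's `fnmatch` has no extended patterns, but the whole-name forms `!(...)` and `@(...)` can be emulated with ordinary patterns. This is a sketch with hypothetical helper names (`ext_at`, `ext_not`); it covers only a pattern that is the entire name, not extglob groups embedded in a longer pattern, and it leaves out `?()`, `*()` and `+()`. `fnmatchcase()` is used so matching stays case-sensitive, as in Bash.

```python
import fnmatch


def matches_any(name, patterns):
    """True if `name` matches at least one of the shell patterns."""
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def ext_at(names, *patterns):
    """Hypothetical stand-in for bash's @(p1|p2|...): names matching any pattern."""
    return [n for n in names if matches_any(n, patterns)]


def ext_not(names, *patterns):
    """Hypothetical stand-in for bash's !(p1|p2|...): names matching none."""
    return [n for n in names if not matches_any(n, patterns)]


names = ['a', '.b', 'c.c', 'd.']   # the listing above, with dotglob on
print(ext_not(names, '*.*'))       # ['a']
print(ext_at(names, '*.c', '*.'))  # ['c.c', 'd.']
```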
--
Francis Avila

Jules Dubois

Nov 30, 2003, 7:32:16 PM
On Sun, 30 Nov 2003 08:59:55 GMT, in article
<news:fkiyb.48948$dl.21...@twister.southeast.rr.com>, Georgy Pruss wrote:

> "Jules Dubois" <bo...@invalid.tld> wrote in message news:b6xinmmkc0wp.16hmc77xoj9t2$.dlg@40tude.net...

>| (I guess we don't agree on whether "*." should or shouldn't match
>| files without periods in their name.)
>
> Anyway, "*." is not a bad DOS convention to select files w/o extension, although
> it comes from the old 8.3 name scheme. BTW, how can you select files w/o
> extension in Unix's shells?

Touché.

Mel Wilson

Nov 30, 2003, 10:51:59 AM
In article <0Zfyb.48050$dl.21...@twister.southeast.rr.com>,

"Georgy Pruss" <see_sig...@hotmail.com> wrote:
>
>"Jules Dubois" <bo...@invalid.tld> wrote in message news:nj2k03e19clm$.uctj11fclu96$.dlg@40tude.net...
>| On Sun, 30 Nov 2003 03:47:38 GMT, in article
>| <news:uLdyb.46389$I53.2...@twister.southeast.rr.com>, Georgy Pruss
>| wrote:
>|
>| > On Windows XP glob.glob doesn't work properly for files without extensions.
>| > E.g. C:\Temp contains 4 files: 2 with extensions, 2 without.
>| > [...]
>| > C:\Temp>dir /b *.
>| > ccccc
>| > ddddd
>|
>| This is standard Windows behavior. It's compatible with CP/M and therefore
>| MS-DOS, and Microsoft has preserved this behavior in all versions of
>| Windows.
>
>That's what I meant, wanted and liked.
>
>C'mon guys, I don't care if it's FAT, NTFS, Windows, Linux, VMS or whatever.
>All I wanted was to get files w/o dots in their names (on my computer :)).
>I did it and I can do it on any system if I need.

Looks like you need os.path.glob(), which doesn't exist, yet.

Regards. Mel.

Tim Roberts

Nov 30, 2003, 11:04:53 PM
"Georgy Pruss" <see_sig...@hotmail.com> wrote:
>
>"Jules Dubois" <bo...@invalid.tld> wrote in message news:nj2k03e19clm$.uctj11fclu96$.dlg@40tude.net...
>|
>| It's bizarre but that's the way it works. If you try
>|
>| dir /b *
>|
>| does cmd.exe list only files without extensions?
>
>By definition it's the same as *.* if my memory serves me right.

Actually, truth being stranger than fiction, the NT-based systems and the
16-bit systems (95/98/ME) will give you different answers to this
question...
--
- Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Duncan Booth

Dec 1, 2003, 4:22:29 AM
"Tim Peters" <tim...@comcast.net> wrote in
news:mailman.1194.107016...@python.org:

> ? Nope. It's dir that special-cases the snot out of a period, not
> Python's glob.

On that basis Python also 'special-cases the snot out of a period': try
win32api.FindFiles('*.') to get the same behaviour as dir.

More accurate would be to say that the operating system has consistent but
weird behaviour, since it is below the application level that trailing dots
are stripped from filenames and ignored from file patterns. The only way to
avoid it on Windows is to retrieve all the filenames and then perform your
own pattern matching on them.
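That "list everything, then match" approach is easy to write directly. The sketch below (the helper name `match_entries` is hypothetical) asks the OS only for the raw directory listing via `os.scandir()` and applies the pattern with `fnmatch.fnmatch()`, which, like `glob`, folds case on Windows through `os.path.normcase()`. Unlike `glob`, it applies no hidden-file rule.

```python
import fnmatch
import os


def match_entries(directory, pattern):
    """Hypothetical helper: list every entry of `directory` and filter the
    names in Python; the pattern itself is never handed to the OS."""
    with os.scandir(directory) as entries:
        return sorted(e.name for e in entries
                      if fnmatch.fnmatch(e.name, pattern))
```

On Windows, `match_entries('.', '*.')` therefore returns names that really end in a dot (normally none), rather than the extensionless names `dir *.` reports.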

glob.glob is bypassing the operating system's filename matching to provide
behaviour which is consistent across different operating systems, but
inevitably this means that it doesn't match some platform's standard
behaviour.

--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Tim Peters

Dec 1, 2003, 9:41:00 AM
to pytho...@python.org
>> ? Nope. It's dir that special-cases the snot out of a period, not
>> Python's glob.

[Duncan Booth]


> On that basis Python also 'special-cases the snot out of a period':
> try win32api.FindFiles('*.') to get the same behaviour as dir.

It's the intent of win32api.FindFiles to wrap the Win32 API FindFirstFile()/
FindNextFile() functions, so of course it exposes Win32-specific behavior.
It's the purpose of glob.glob() to give platform-independent results to the
extent possible.

> More accurate would be to say that the operating system has
> consistent but weird behaviour, since it is below the application
> level that trailing dots are stripped from filenames and ignored from
> file patterns. The only way to avoid it on Windows is to retrieve all
> the filenames and then perform your own pattern matching on them.

Which glob.glob() does for you.

> glob.glob is bypassing the operating system's filename matching to
> provide behaviour which is consistent across different operating
> systems,

Yes.

> but inevitably this means that it doesn't match some platform's
> standard behaviour.

Of course -- "may not match native platform behavior" is necessarily implied
by "platform-independent".

