Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

File::Path::mkpath() incompatiblity in perl-5.10

209 views
Skip to first unread message

Gisle Aas

unread,
May 7, 2008, 3:40:55 PM5/7/08
to Perl 5 Porters, da...@landgren.net
I had some production code using File::Path::mkpath() that started to
misbehave after upgrading to perl-5.10. The bug is demonstrated with
the following script:

#!/usr/bin/perl
use File::Path qw(mkpath);
print "$File::Path::VERSION\n";
mkpath("foo", !shift, 0755);
rmdir("foo");

This script will create a directory called "493" in the current
directory if invoked with 1 as argument. This did not happen with
perl-5.8 and earlier.

--Gisle

David Landgren

unread,
May 7, 2008, 5:51:31 PM5/7/08
to Gisle Aas, Perl 5 Porters

That's my fault. I'll see if I can make it figure out how to do the
right thing. In the meantime,

mkpath(["foo"], !shift, 0755);

will get you through the night.

Sorry about that,
David

David Landgren

unread,
May 8, 2008, 5:02:44 AM5/8/08
to Gisle Aas, Perl 5 Porters

This took a while to replicate because the bug only manifests itself if
the working directory is empty. Without looking at the code I can't
imagine why that would be. Fix forthcoming.

Thanks,
David

David Landgren

unread,
May 8, 2008, 5:49:19 AM5/8/08
to Gisle Aas, Perl 5 Porters
Gisle Aas wrote:
> I had some production code using File::Path::mkpath() that started to
> misbehave after upgrading to perl-5.10. The bug is demonstrated with

This has been fixed in File-Path 2.06 which has been released to CPAN.

The patch also contains a couple of documentation typo fixes and tweaks
(I took the liberty of removing the email addresses of Charles Bailey
and Tim Bunce as these are just spam magnets nowadays).

rmtree() also produces better diagnostics and behaves gracefully if you
try to remove the directory you are in (as this otherwise triggers the
race prevention code and aborts).

Patch against blead.

Thanks,
David

File-Path.pm.patch
File-Path.t.patch

Gisle Aas

unread,
May 8, 2008, 5:58:00 AM5/8/08
to David Landgren, Perl 5 Porters

That's strange as I certainly don't need an empty working directory
for it to happen here.

I think that the only safe way to approach this is to only assume the
new style calling convention when the last argument passed is a hash
reference. Trying some pattern matching to guess if the second
argument is a filename or a boolean is doomed to fail.

--Gisle

Gisle Aas

unread,
May 8, 2008, 6:05:29 AM5/8/08
to David Landgren, Perl 5 Porters
On May 8, 2008, at 11:49, David Landgren wrote:
>
> @@ -57,7 +58,7 @@
> UNIVERSAL::isa($_[0],'ARRAY')
> or (@_ == 2 and (defined $_[1] ? $_[1] =~ /\A\d+\z/ : 1))
> or (@_ == 3
> - and (defined $_[1] ? $_[1] =~ /\A\d+\z/ : 1)
> + and (defined $_[1] ? $_[1] =~ /\A\d*\z/ : 1)
> and (defined $_[2] ? $_[2] =~ /\A\d+\z/ : 1)
> )
> ) ? 1 : 0;

I don't find it unreasonable that there exist code used to call the
mkpath() function like this:

mkpath($dir, "verbose", 0755)

Making this code now create directories called "verbose" and "493" is
just wrong.

--Gisle

David Landgren

unread,
May 10, 2008, 5:46:03 PM5/10/08
to Perl 5 Porters

Please do not apply this patch. There is an error in the test suite,
using close instead of closedir (I thought a change had been made to
warn or die if one attempted to C<close> a directory handle).

I've release 2.06_01 to CPAN, and if the smokes settle down I'll
re-release it as 2.07. There's also the issue that it breaks
Kwiki::Test; I've begun to investigate what's going on there.

Thanks,
David

Jonathan Rockway

unread,
May 11, 2008, 4:22:18 AM5/11/08
to perl5-...@perl.org

Very interesting! For some reason I've had a directory called "436"
showing up in my homedir from time to time. For some reason, I
suspected Gnus... but maybe it's something perlish. I upgraded to 5.10
and started using Gnus at about the same time, so it's possible that
Perl is the cause but my brain assumed Perl wouldn't do something like
this :)

Anyway, thanks for pointing this out. My understanding of the universe
is slightly more complete :)

Regards,
Jonathan Rockway

--
print just => another => perl => hacker => if $,=$"

David Landgren

unread,
May 14, 2008, 5:53:26 PM5/14/08
to Jonathan Rockway, perl5-...@perl.org

Hmm. Sorry about that. I'm working on a more robust approach that will
put an end to all this.

Thanks,
David

Gisle Aas

unread,
May 15, 2008, 7:03:02 AM5/15/08
to David Landgren, Jonathan Rockway, perl5-...@perl.org
On May 14, 2008, at 23:53, David Landgren wrote:

>
> Hmm. Sorry about that. I'm working on a more robust approach that
> will put an end to all this.

Could you at least indicate what this more robust approach would be?

I really want an mkpath() function in 5.10 that's 100% compatible
with the interface documented in 5.8. The only way I can see that
happening is if you require a hash ref as the last argument to
trigger the new interface, or that you simply move the new interface
to a new function name and leave the old interface alone. Using the
name make_path() for the "modern" interface seems like an improvement
in readability as well, and would be my preferred approach.

--Gisle

David Landgren

unread,
May 15, 2008, 10:31:27 AM5/15/08
to Gisle Aas, Jonathan Rockway, perl5-...@perl.org
Gisle Aas wrote:
> On May 14, 2008, at 23:53, David Landgren wrote:
>
>>
>> Hmm. Sorry about that. I'm working on a more robust approach that will
>> put an end to all this.
>
> Could you at least indicate what this more robust approach would be?

It's quite simple, basically, don't try to infer anything about the
second param, just its presence.

The old style is triggered by any of

- one parameter passed
- two parameters passed and second is numeric
- three parameters passed; second and third parameters are numeric

The mistake was to see if the second param was numeric (as per your
"verbose" argument). So the new way is merely:

- one parameter passed
- two parameters passed
- three parameters passed and third parameters is numeric

> I really want an mkpath() function in 5.10 that's 100% compatible with
> the interface documented in 5.8. The only way I can see that happening

So do I; I thought I had done so. Despite your misgivings, I still think
it is possible.

> is if you require a hash ref as the last argument to trigger the new
> interface,

If the routine is called with four or more params, it must be the new
style, because the old style only looked at three params at the most.

No promises were made either way regarding the handling of arguments
beyond those documented, if mkdir('foo', 1, 0755, 'bar') now creates
four directories instead of one, it's because of a latent bug in the
original client code, since the programmer did not read the
documentation correctly.

I don't think we need to bend over quite so far in order to maintain
backwards compatibility, only for code that respected the API.

> or that you simply move the new interface to a new function
> name and leave the old interface alone. Using the name make_path() for
> the "modern" interface seems like an improvement in readability as well,
> and would be my preferred approach.

I wish I had received this feedback a year ago. There might be code out
there now that relies on the new style mkpath(), so it can't be stripped
out now.

There are two other issues I am addressing, the fact that (stat $foo)[1]
(the inode) always returns 0 on Windows (and thus prevents UNC paths
from being removed), and asking to remove the current directory (or
ancestor), for which the code now croaks when it tries to chdir back to
the original directory at the end of the run, and it's been blown away.

The solution to the first can be solved if Win32::IdentifyFile is available.

The second is a hassle. Suppose one is in /path/to/some/dir.

Given

rmtree( '../../to', 'here/too', {verbose=>0} );

(I keep writing 'rmpath', maybe that needs to be aliased as well...)

we are first asked to remove /path/to. When the code finishes, the cwd
is /path. We then try to chdir back to /path/to/some/dir, but this is
one of those cases where we simply cannot restore things to the initial
state. Thus the cwd is irrevocably changed.

Normally this occurs in tear-down situations, when we're cleaning up
just before the end of the program, so if the cwd was left hanging in
limbo (as in the 1.x version) no-one ever noticed.

All this changed when the race prevention code was added, which requires
manipulating the cwd.

Right now, the code croaks in this situation. If we don't croak, then in
the above snippet, we would begin to remove /path/here/too, which would
definitely not be a good idea. (It was /path/to/some/dir/here/too that
was queued for removal).

So I'm thinking that the right things to do is to not croak and just
ignore the remaining directories in the list to be unlinked, and the
current directory remains changed. (I find the idea of attempting to
renormalise relative and absolute directories from one point to another
in the file system fraught with peril, even if there are modules that do
this sort of thing. I don't want to be responsible for someone unlinking
the root directory).

David

Jan Dubois

unread,
May 15, 2008, 11:47:57 AM5/15/08
to David Landgren, Gisle Aas, Jonathan Rockway, perl5-...@perl.org
On Thu, 15 May 2008, David Landgren wrote:
> There are two other issues I am addressing, the fact that (stat $foo)[1]
> (the inode) always returns 0 on Windows (and thus prevents UNC paths
> from being removed), and asking to remove the current directory (or
> ancestor), for which the code now croaks when it tries to chdir back to
> the original directory at the end of the run, and it's been blown away.
>
> The solution to the first can be solved if Win32::IdentifyFile is available.

Note that all the information returned by Win32::IdentifyFile is already
available inside our win32_stat() function (unless you set
$^WIN32_SLOPPY_STAT to a true value). So we could cheaply create a fake
inode from it.

The problem is that the file id can be up to 64 bits (of which only 48
are used on NTFS I think, but it is up to the file system driver to
implement this), and st_ino is only 16 bits in the standard "struct
stat" as defined by MSVCRT.dll. A trivial way to fake an inode would be
to just xor the 64 bit file id in 16 bit chunks to fold it, but of
course you cannot avoid collisions this way. One advantage of this is
that it could be integrated into 5.10 and 5.8 if we wanted to.

Another approach might be to redefine Stat_t to our own definition and
use 64 bits for the inode inside Perl. I'm not sure if this would be
a problem for XS code, or for integration into 5.10.

Just for my information, how does the inode field prevent UNC paths from
being removed and doesn't affect other paths?



> The second is a hassle. Suppose one is in /path/to/some/dir.
>
> Given
>
> rmtree( '../../to', 'here/too', {verbose=>0} );

[...]



> So I'm thinking that the right things to do is to not croak and just
> ignore the remaining directories in the list to be unlinked, and the
> current directory remains changed.

I prefer to keep the croak. Changing the working directory as an
unintended side effect should be a fatal error:

rmtree( '../../to' );
rmtree( 'here/too' );

How do you prevent the second rmtree() from removing the wrong tree?

Cheers,
-Jan


Gisle Aas

unread,
May 15, 2008, 12:37:02 PM5/15/08
to David Landgren, Jonathan Rockway, perl5-...@perl.org
On May 15, 2008, at 16:31, David Landgren wrote:

> Gisle Aas wrote:
>> On May 14, 2008, at 23:53, David Landgren wrote:
>>>
>>> Hmm. Sorry about that. I'm working on a more robust approach that
>>> will put an end to all this.
>> Could you at least indicate what this more robust approach would be?
>
> It's quite simple, basically, don't try to infer anything about the
> second param, just its presence.
>
> The old style is triggered by any of
>
> - one parameter passed
> - two parameters passed and second is numeric
> - three parameters passed; second and third parameters are numeric
>
> The mistake was to see if the second param was numeric (as per your
> "verbose" argument). So the new way is merely:
>
> - one parameter passed
> - two parameters passed
> - three parameters passed and third parameters is numeric

This scheme makes it really complicated to describe what the
interface to mkpath() is, and make the new style interface without
hash as final arg quite inconsistent. It only works if you pass 1 or
4 or more directory names to be created, of 3 where the 3rd name
isn't numeric. I find this really ugly.

>
>> I really want an mkpath() function in 5.10 that's 100% compatible
>> with the interface documented in 5.8. The only way I can see that
>> happening
>
> So do I; I thought I had done so. Despite your misgivings, I still
> think it is possible.

I agree that the new rules above make it compatible with 5.8, but the
cost is to make the exceptions in the new style interface quite
strange and one where you easily produce buggy programs. Consider:

my @dirs = generate_list_of_dirs_that_should_be_there();

mkpath(@dirs); # will this do the right thing or not?

>
>> is if you require a hash ref as the last argument to trigger the
>> new interface,
>
> If the routine is called with four or more params, it must be the
> new style, because the old style only looked at three params at the
> most.
>
> No promises were made either way regarding the handling of
> arguments beyond those documented, if mkdir('foo', 1, 0755, 'bar')
> now creates four directories instead of one, it's because of a
> latent bug in the original client code, since the programmer did
> not read the documentation correctly.
>
> I don't think we need to bend over quite so far in order to
> maintain backwards compatibility, only for code that respected the
> API.
>
> > or that you simply move the new interface to a new function
>> name and leave the old interface alone. Using the name make_path
>> () for the "modern" interface seems like an improvement in
>> readability as well, and would be my preferred approach.
>
> I wish I had received this feedback a year ago. There might be code
> out there now that relies on the new style mkpath(), so it can't be
> stripped out now.

I wished I've noticed this then too, but I do belive it's better to
fix this mistake now. If you silently (or with a warning) continue
support new style mkpath with {} at the end I don't belive there is
that much code that would break. I would also guess that most calls
to mkpath() only pass in a single directory name and that code should
be compatible regardless.

--Gisle

David Landgren

unread,
May 15, 2008, 1:50:52 PM5/15/08
to Jan Dubois, Gisle Aas, perl5-...@perl.org
Jan Dubois wrote:

>> So I'm thinking that the right things to do is to not croak and just
>> ignore the remaining directories in the list to be unlinked, and the
>> current directory remains changed.
>
> I prefer to keep the croak. Changing the working directory as an
> unintended side effect should be a fatal error:
>
> rmtree( '../../to' );
> rmtree( 'here/too' );
>
> How do you prevent the second rmtree() from removing the wrong tree?

Hmm. You don't. That's good enough for me.

Thanks,
David

Dr.Ruud

unread,
May 15, 2008, 2:28:25 PM5/15/08
to perl5-...@perl.org
Gisle Aas schreef:

> my @dirs = generate_list_of_dirs_that_should_be_there();
>
> mkpath(@dirs); # will this do the right thing or not?

Please make that "mkpath( \@dirs )".

--
Affijn, Ruud

"Gewoon is een tijger."

David Landgren

unread,
May 17, 2008, 12:13:46 PM5/17/08
to Dr.Ruud, perl5-...@perl.org
Dr.Ruud wrote:
> Gisle Aas schreef:
>
>> my @dirs = generate_list_of_dirs_that_should_be_there();
>>
>> mkpath(@dirs); # will this do the right thing or not?
>
> Please make that "mkpath( \@dirs )".

Well that's just it. The old interface used a hokey so 1980s
first-param-is-a-scalar-or-a-reference interface. What Gisle is saying
is that you have no idea what
generate_list_of_dirs_that_should_be_there() may return. It may
therefore trigger the old interpretation, or the new, depending on the
phase of the moon.

When the discussion came up for renovating File::Path came up, the two
problems of positional unnamed parameters and scalar-or-ref could be
solved with a straight list, followed by a hashref of args to tweak the
behaviour.

I also wanted to add new functionality, such as delete everything under
the directory except for the directory itself. And I wanted it to be
easy for people to refactor their code with a minimum of fuss to get rid
of the dinky scalar/ref stuff and so on, and make calls to mkpath() and
rmtree() look exactly the same as calls to 99.9999% of other modules.
What's with the [ref] shit?

I cannot say why I made a mental blockage on introducing a new function
into the namespace. In retrospect it seems silly. If someone is going to
fiddle with the arguments to turn C<,1 ,1> into C<{safe => 1, verbose =>
1}>, it certainly would not kill them to edit the function name at the
same time to change it from rmtree to unlink_tree (or whatever, I'll be
happy to listen to a bikeshed argument on that one).

Be that as it may, I made a serious error trying to shoehorn both styles
into the one. The next release will change this situation.

David

Demerphq

unread,
May 18, 2008, 5:42:37 AM5/18/08
to David Landgren, Dr.Ruud, perl5-...@perl.org
2008/5/17 David Landgren <da...@landgren.net>:

Lots of people reviewed your proposals, including me, and noone
suggested this, so if a "serious" error has been made responsibility
for it does not rest solely with you.

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

David Landgren

unread,
May 30, 2008, 5:59:32 PM5/30/08
to Jan Dubois, Gisle Aas, Jonathan Rockway, perl5-...@perl.org
Jan Dubois wrote:
> On Thu, 15 May 2008, David Landgren wrote:
>> There are two other issues I am addressing, the fact that (stat $foo)[1]
>> (the inode) always returns 0 on Windows (and thus prevents UNC paths
>> from being removed), and asking to remove the current directory (or
>> ancestor), for which the code now croaks when it tries to chdir back to
>> the original directory at the end of the run, and it's been blown away.
>>
>> The solution to the first can be solved if Win32::IdentifyFile is available.
>
> Note that all the information returned by Win32::IdentifyFile is already
> available inside our win32_stat() function (unless you set
> $^WIN32_SLOPPY_STAT to a true value). So we could cheaply create a fake
> inode from it.

C:\temp>perl -e "$^WIN32_SLOPPY_STAT=1"
Bareword found where operator expected at -e line 1, near
"$^WIN32_SLOPPY_STAT"
(Missing operator before IN32_SLOPPY_STAT?)
syntax error at -e line 1, near "$^WIN32_SLOPPY_STAT"
Execution of -e aborted due to compilation errors.

Oh hang on, I don't have AS Perl on this machine, only Strawberry.

> The problem is that the file id can be up to 64 bits (of which only 48
> are used on NTFS I think, but it is up to the file system driver to
> implement this), and st_ino is only 16 bits in the standard "struct
> stat" as defined by MSVCRT.dll. A trivial way to fake an inode would be
> to just xor the 64 bit file id in 16 bit chunks to fold it, but of
> course you cannot avoid collisions this way. One advantage of this is
> that it could be integrated into 5.10 and 5.8 if we wanted to.
>
> Another approach might be to redefine Stat_t to our own definition and
> use 64 bits for the inode inside Perl. I'm not sure if this would be
> a problem for XS code, or for integration into 5.10.
>
> Just for my information, how does the inode field prevent UNC paths from
> being removed and doesn't affect other paths?

Getting back to working on this again.

Actually, it's worse than I thought. The approach doesn't work on local
drives either. Consider:

cd \temp # or any non-root directory

C:\temp>perl -le "$,=' ';print +(stat '.')[0,1];print +(stat '..')[0,1]"
2 0
2 0

I would expect the inode value to change. I'm sort of surprised no-one's
commented on this before.

Later,
David

--
stubborn tiny lights vs. clustering darkness forever ok?

Jan Dubois

unread,
May 30, 2008, 6:39:13 PM5/30/08
to David Landgren, Gisle Aas, Jonathan Rockway, perl5-...@perl.org
On Fri, 30 May 2008, David Landgren wrote:
> Jan Dubois wrote:
> > On Thu, 15 May 2008, David Landgren wrote:
> >> There are two other issues I am addressing, the fact that (stat $foo)[1]
> >> (the inode) always returns 0 on Windows (and thus prevents UNC paths
> >> from being removed), and asking to remove the current directory (or
> >> ancestor), for which the code now croaks when it tries to chdir back to
> >> the original directory at the end of the run, and it's been blown away.
> >>
> >> The solution to the first can be solved if Win32::IdentifyFile is available.
> >
> > Note that all the information returned by Win32::IdentifyFile is already
> > available inside our win32_stat() function (unless you set
> > $^WIN32_SLOPPY_STAT to a true value). So we could cheaply create a fake
> > inode from it.
>
> C:\temp>perl -e "$^WIN32_SLOPPY_STAT=1"
> Bareword found where operator expected at -e line 1, near
> "$^WIN32_SLOPPY_STAT"
> (Missing operator before IN32_SLOPPY_STAT?)
> syntax error at -e line 1, near "$^WIN32_SLOPPY_STAT"
> Execution of -e aborted due to compilation errors.
>
> Oh hang on, I don't have AS Perl on this machine, only Strawberry.

This is a core Perl feature; nothing ActivePerl or StrawberryPerl specific.
You just have to write it as ${^WIN32_SLOPPY_STAT} because '^' is not a
word character. Note that the file index information is not available
at the Perl level, only at the C level inside win32_stat(). I was only
trying to say that we could fake up our own inode in there if we wanted to.



> > The problem is that the file id can be up to 64 bits (of which only 48
> > are used on NTFS I think, but it is up to the file system driver to
> > implement this), and st_ino is only 16 bits in the standard "struct
> > stat" as defined by MSVCRT.dll. A trivial way to fake an inode would be
> > to just xor the 64 bit file id in 16 bit chunks to fold it, but of
> > course you cannot avoid collisions this way. One advantage of this is
> > that it could be integrated into 5.10 and 5.8 if we wanted to.

I did some experiments with this, and surprisingly noticed that I get
the least number of collisions if I just use the lowest 16 bits of the
file index instead of xor-ing it together (on NTFS). Maybe there is
some redundancy in the file index, and the xor is therefore canceling
out relevant information.

But I don't think having an inode field with collisions is very useful,
so I don't think we should bother with this approach.

> > Another approach might be to redefine Stat_t to our own definition and
> > use 64 bits for the inode inside Perl. I'm not sure if this would be
> > a problem for XS code, or for integration into 5.10.

Haven't found time to think about this some more yet.

> > Just for my information, how does the inode field prevent UNC paths from
> > being removed and doesn't affect other paths?
>
> Getting back to working on this again.
>
> Actually, it's worse than I thought. The approach doesn't work on local
> drives either. Consider:
>
> cd \temp # or any non-root directory
>
> C:\temp>perl -le "$,=' ';print +(stat '.')[0,1];print +(stat '..')[0,1]"
> 2 0
> 2 0
>
> I would expect the inode value to change. I'm sort of surprised no-one's
> commented on this before.

The inode field is *always* 0 under MSWin32 AFAIK. Documented in perlport.pod:

| device and inode are not meaningful. (Win32)

That's why I was wondering how things are different for UNC paths.

Cheers,
-Jan


David Landgren

unread,
May 31, 2008, 2:55:49 AM5/31/08
to Jan Dubois, Gisle Aas, Jonathan Rockway, perl5-...@perl.org

I don't really care about mapping the number back to a filesystem
entity. It just needs to be a unique, opaque cookie.

Actually, thinking a whole more about this, I'm not even sure it's
possible to create the race condition that exists on Unix.

On Unix, with its reference count implementation, you can say rmdir
`pwd` and it will succeed. You just can't chdir .. afterwards.

On Windows, you cannot:

C:\temp\foo>cd| perl -ne "chomp;rmdir $_ or warn qq{$_: $!}"
C:\temp\foo: Permission denied at -e line 1, <> line 1.

Furthermore, if I have one shell windows set to c:\temp\foo\bar,
and another window set to c:\temp, in the latter window I cannot rename
c:\temp\foo to c:\temp\zyxxy, I get an Access Denied error.

So I'm beginning to think that Windows is impervious to the attack that
can be mounted on Unix. In which case the whole point becomes moot.

>>> Another approach might be to redefine Stat_t to our own definition and
>>> use 64 bits for the inode inside Perl. I'm not sure if this would be
>>> a problem for XS code, or for integration into 5.10.
>
> Haven't found time to think about this some more yet.
>
>>> Just for my information, how does the inode field prevent UNC paths from
>>> being removed and doesn't affect other paths?
>> Getting back to working on this again.
>>
>> Actually, it's worse than I thought. The approach doesn't work on local
>> drives either. Consider:
>>
>> cd \temp # or any non-root directory
>>
>> C:\temp>perl -le "$,=' ';print +(stat '.')[0,1];print +(stat '..')[0,1]"
>> 2 0
>> 2 0
>>
>> I would expect the inode value to change. I'm sort of surprised no-one's
>> commented on this before.
>
> The inode field is *always* 0 under MSWin32 AFAIK. Documented in perlport.pod:
>
> | device and inode are not meaningful. (Win32)

Damn!

> That's why I was wondering how things are different for UNC paths.

Actually they're not. I thought only UNC paths showed this behaviour,
but local paths do too. But if what I said above holds, then all this
checking can be ignored on MSWin32 platforms. I imagine Cygwin would be
immune as well.

Thanks,

0 new messages