How to determine if a directory is a subdirectory of another directory?

Arjen Markus

Nov 26, 2008, 12:27:56 AM

Hello,

I ran into a problem with copying files, and it turns out to be due to
the fact that I was copying all files in a particular directory into
another directory that was a subdirectory of the original one, like:

Directory contents:
.
..
work/
example.inp

Script:
foreach f [glob *] {
    file copy $f work
}

which led to an almost endless loop (it only stopped because the path
became too long).

So I wonder how I can best avoid this situation. For instance: if I
have the strings "example" and "../example/work", and I know that the
directory "example" exists (but not necessarily the other one), how can
I tell that the second is a subdirectory of the first?

Is normalizing both and then examining the result via [file split] the
most robust way? Something along these lines:

set dir1 [file normalize "example"]
set dir2 [file normalize "../example/work"]

foreach p1 [file split $dir1] p2 [file split $dir2] {
    if { $p1 != $p2 && $p1 != "" && $p2 != "" } {
        puts "Subdirectories!"
    }
}

(Well, the above does not quite work, but I hope my question is clear
enough. It could probably serve as a basis.)
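A working version of the [file split] idea might look like the sketch below. It compares the leading components of the two normalized paths, which avoids the "example" vs. "example2" trap of a plain prefix test. Assumptions: the proc name isSubdirOf is invented for this example, a directory counts as "inside" itself (copying a directory into itself is just as bad), components are compared case-sensitively (which may not suit Windows), and, because it relies on [file normalize] alone, a symlink in the last path component is not resolved.

```tcl
# isSubdirOf: hypothetical helper sketching the [file split] approach.
# Returns 1 if $sub names $dir itself or something below it, 0 otherwise.
# Neither path has to exist; [file normalize] also works on plain strings.
proc isSubdirOf {dir sub} {
    set dparts [file split [file normalize $dir]]
    set sparts [file split [file normalize $sub]]
    set n [llength $dparts]

    # $sub cannot lie inside $dir if it has fewer components.
    if {[llength $sparts] < $n} {
        return 0
    }
    # Every component of $dir must equal the corresponding leading
    # component of $sub (case-sensitive string comparison).
    foreach d $dparts s [lrange $sparts 0 [expr {$n - 1}]] {
        if {$d ne $s} {
            return 0
        }
    }
    return 1
}
```

Both arguments are normalized against the current directory, so relative names such as "../example/work" only mean something relative to where the script runs.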

Regards,

Arjen

Donald Arseneau

Nov 26, 2008, 2:43:36 AM

On Nov 25, 9:27 pm, Arjen Markus <arjen.mar...@wldelft.nl> wrote:
> I ran into a problem with copying files and it turns out to be due to
> the fact that I was copying all files in a particular directory into
> another directory that was a subdirectory of the original one,

> So I wonder how I can best avoid this situation. For instance: if I
> have the strings "example"
> and "../example/work" and I know that the directory "example" exists
> (but not necessarily the
> other one), how can I tell that the second is a subdirectory of the
> first?
>
> Is normalizing both and then examine the result via [file split] the
> most robust way?

I don't think you need to split the components.

Given the variables sourcedir and destdir:

if {[string match [file normalize $sourcedir]/* [file normalize $destdir]]} {
    return -code error "Cannot copy into a subdirectory of source dir"
}
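One caveat with the [string match] test: the normalized source path is used as a glob pattern, so a directory name containing *, ?, [, ] or \ is not matched literally (as a pattern, a[1] matches a1, not a[1]). The test is also false when destdir is sourcedir itself. A sketch that escapes the pattern first might look like this; the proc name isBelow is invented for the example, and the trailing-slash trim is only there so that a root such as "/" does not turn into the pattern "//*".

```tcl
# isBelow: hypothetical helper wrapping the [string match] test above.
# Returns 1 if $destdir lies strictly below $sourcedir.
proc isBelow {sourcedir destdir} {
    # Drop a trailing slash (only present for a root such as "/") so the
    # "/*" appended below does not produce "//*".
    set src [string trimright [file normalize $sourcedir] /]
    # Backslash-escape the glob metacharacters \ * ? [ ] so that the
    # source path is matched literally by [string match].
    set prefix [string map {\\ \\\\ * \\* ? \\? [ \\[ ] \\]} $src]
    return [string match $prefix/* [file normalize $destdir]]
}
```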

Arjen Markus

Nov 26, 2008, 3:06:14 AM

Right! That seems quite useful and certainly concise.

Thanks,

Arjen

Andreas Leitgeb

Nov 26, 2008, 3:38:44 AM

Arjen Markus <arjen....@wldelft.nl> wrote:
> Directory contents:

> work/
> example.inp
> Script:
> foreach f [glob *] {
> file copy $f work
> }
> which led to an almost endless loop (the path became too long, that is
> why it stopped)

Comparing the normalized source and target (each with a slash appended!)
for a common prefix should suffice, unless you're on a platform that
supports symlinks; there you'd first have to resolve all symlinks along
the path, which is a non-trivial task.
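A sketch of that comparison, assuming no symlinks are involved: appending the slash to both normalized paths keeps "example2" from being treated as inside "example", and makes the test true when both name the same directory. [string equal -length] compares only the first N characters, literally, so glob characters in a path do no harm. The proc name isInside is hypothetical, and the comparison is case-sensitive.

```tcl
# isInside: hypothetical helper. Returns 1 if $target is $source itself or
# lies below it. Symlinks in the last path component are NOT resolved.
proc isInside {source target} {
    # Normalize, strip any trailing slash (e.g. on "/"), then append exactly one.
    set src [string trimright [file normalize $source] /]/
    set dst [string trimright [file normalize $target] /]/
    # Literal prefix test on the first [string length $src] characters.
    return [string equal -length [string length $src] $src $dst]
}
```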

Linux's "cp" detects such situations:
cp: cannot copy a directory, `work', into itself, `work/work'
(but it still creates an empty directory `work/work').
Solaris's "cp" runs into the same trouble as Tcl.
So, if you're on Linux, you could [exec cp -r ...]

Perhaps a TIP to get this protection into Tcl's [file] command would
be appropriate (as I'd expect symlinks to be another possible cause of
trouble).

I'm not sure whether the problem is solvable in principle at all.
Perhaps by keeping an array of the created inodes and refusing to
re-copy any of them?
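The inode idea can be sketched as a recursive copy that remembers the (device, inode) pair of every directory it has visited or created, and refuses to descend into one it has seen before. This assumes a platform where [file stat] reports meaningful dev and ino values (typically Unix; on Windows they may not be unique), and the proc names dirKey, copyTree and copyTreeRec are made up for this example. Because [file stat] follows symlinks, a link pointing back up the tree should be caught too. Note that glob * skips hidden files on Unix.

```tcl
# dirKey: hypothetical helper returning "device:inode" for a path.
# [file stat] follows symlinks, so a link and its target share a key.
proc dirKey {path} {
    file stat $path st
    return "$st(dev):$st(ino)"
}

# copyTree: hypothetical entry point; copies the contents of $src into $dst.
proc copyTree {src dst} {
    array set seen {}
    copyTreeRec $src $dst seen
}

proc copyTreeRec {src dst seenVar} {
    upvar 1 $seenVar seen
    set key [dirKey $src]
    if {[info exists seen($key)]} {
        return   ;# already visited or created by us: do not copy it again
    }
    set seen($key) 1
    file mkdir $dst
    set dkey [dirKey $dst]
    if {$dkey eq $key} {
        return   ;# $dst is $src itself (possibly via a link): nothing to do
    }
    # Record the destination too, so it is skipped if it turns up as a source.
    set seen($dkey) 1
    foreach f [glob -nocomplain -directory $src *] {
        set target [file join $dst [file tail $f]]
        if {[file isdirectory $f]} {
            copyTreeRec $f $target seen
        } else {
            file copy -force $f $target
        }
    }
}
```

With the layout from the original post, copyTree example example/work copies example.inp into work and then skips work itself, because its inode was recorded when the destination was created.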

Arjen Markus

Nov 26, 2008, 4:02:40 AM

On 26 Nov, 09:38, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> wrote:

A quick check does indeed show that with symlinks the discussed
algorithm fails miserably (that is: [file normalize] does not
return the name of the file being linked to, so you end up with
two different names for the same thing).

The use case I have in mind should not be this involved, but it
would be nice to crush this problem before it really becomes
a problem.

Regards,

Arjen

Glenn Jackman

Nov 26, 2008, 10:07:35 AM

At 2008-11-26 12:27AM, "Arjen Markus" wrote:
> Hello,
>
> I ran into a problem with copying files and it turns out to be due to
> the fact that I was copying all files in a particular directory into
> another directory that was a subdirectory of the original one, like:
>
> Directory contents:
> .
> ..
> work/
> example.inp
>
> Script:
> foreach f [glob *] {
> file copy $f work
> }
>
> which led to an almost endless loop (the path became too long, that is
> why it stopped)

If you only want to find files: glob -type f -- *
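Applied to the loop from the question, that suggestion might be wrapped like this (the proc name copyFilesOnly is invented for the example, and -nocomplain is added so that an empty directory does not raise an error). Only plain files are copied, so work is never copied into itself; the catch is that subdirectories are skipped entirely.

```tcl
# copyFilesOnly: hypothetical wrapper around the loop from the question.
# -types f restricts glob to plain files, so directories (including the
# destination subdirectory itself) are never copied; -- ends the options.
proc copyFilesOnly {srcdir destdir} {
    foreach f [glob -nocomplain -types f -directory $srcdir -- *] {
        file copy -force $f $destdir
    }
}
```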

--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous

Arjen Markus

Nov 26, 2008, 10:21:27 AM

Ah, no, the problem arose with the application I put on the
Wiki: http://wiki.tcl.tk/22011. It is definitely copying the
contents of a directory (and perhaps subdirectories).

Regards,

Arjen

Gerald W. Lester

Nov 26, 2008, 10:26:24 AM

Arjen Markus wrote:
>...

> A quick check does indeed show that with symlinks the discussed
> algorithm fails miserably (that is: [file normalize] does not
> return the name of the file being linked to, so you end up with
> two different names for the same thing).
>
> The usecase I have in mind should not be this involved, but it
> would be nice to crush this problem before it really becomes
> a problem.

Appropriate use of [file type] and [file link] (or [catch] and [file link])
as well as normalize will get you the "true" absolute paths to the source
and destination.

That said, the subdirectories of either could contain links to
elsewhere in the tree.

--
+------------------------------------------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+

Andreas Leitgeb

Nov 27, 2008, 3:43:20 AM

Gerald W. Lester <Gerald...@cox.net> wrote:

> Arjen Markus wrote:
>> A quick check does indeed show that with symlinks the discussed
>> algorithm fails miserably ...

> Appropriate use of [file type] and [file link] (or [catch] and [file link])
> as well as normalize will get you the "true" absolute paths to the source
> and destination.

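In code, it would look something like this:

set fnl [file split $fname]; set ready 0
while {!$ready} {
    set ready 1
    for {set i 0} {$i < [llength $fnl]} {incr i} {
        set sfn [file join {*}[lrange $fnl 0 $i]]
        # [file readlink] fails (and is caught) if $sfn is not a symlink.
        # NOTE: no cycle detection here, see below.
        if {![catch {file readlink $sfn} target]} {
            # A relative link target is relative to the link's directory.
            set target [file join [file dirname $sfn] $target]
            set fnl [file split [file normalize \
                [file join $target {*}[lrange $fnl $i+1 end]]]]
            set ready 0; break
        }
    }
    puts "partially resolved: [file join {*}$fnl]"
}
set fname [file join {*}$fnl]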

Rather than splitting, one could also successively search for
occurrences of file separator characters.
At some point one should also check for possible cycles...
But even without loops, resolving all symlinks can, in the worst
case, take effort exponential in the directory depth.

Here's a little test case:

set N 5 ;# nesting depth; an arbitrary example value
set aux {8 9 7 8 6 7 5 6 4 5 3 4 2 3 1 2 0 1}; set dir ""
for {set i 0} {$i < $N} {incr i} {
    file mkdir 9; foreach {l t} $aux {exec ln -s $dir$t $l}
    cd "9"; set dir "../${dir}0/";
}
# By the way, Tcl's [file link] normalizes the target, so it is not
# usable for this misuse (nor for some other, less abusive, uses).

Andreas Kupries

Dec 2, 2008, 12:28:48 AM

"Gerald W. Lester" <Gerald...@cox.net> writes:

> Arjen Markus wrote:
>>...
>> A quick check does indeed show that with symlinks the discussed
>> algorithm fails miserably (that is: [file normalize] does not
>> return the name of the file being linked to, so you end up with
>> two different names for the same thing).

The problem here is that 'file normalize' doesn't touch the last
segment in the path, i.e. the 'file', only the 'directories', right?

>> The usecase I have in mind should not be this involved, but it
>> would be nice to crush this problem before it really becomes
>> a problem.

> Appropriate use of [file type] and [file link] (or [catch] and [file
> link]) as well as normalize will get you the "true" absolute paths to
> the source and destination.

A simpler solution is to add a dummy path segment to the path in
question, normalize, then strip the dummy off again. The dummy takes
the part of the untouched 'file', causing the real 'file' to be
resolved should it be a symlink.

proc fullnormalize {path} {
    # SNARFED from tcllib, fileutil.
    return [file dirname [file normalize [file join $path __dummy__]]]
}
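Combining fullnormalize with the slash-appended prefix comparison suggested earlier in the thread gives a check that also copes with a symlink as the last path component. This is a sketch, assuming fullnormalize exactly as above (repeated so the block is self-contained); isInsideResolved is a hypothetical name. Links further down inside the source tree, as Gerald pointed out, are still not handled.

```tcl
# fullnormalize, as above (from tcllib's fileutil): the dummy segment makes
# [file normalize] resolve a symlink in what was the last component.
proc fullnormalize {path} {
    return [file dirname [file normalize [file join $path __dummy__]]]
}

# isInsideResolved: hypothetical helper. Returns 1 if $target is $source
# itself or lies below it, after resolving symlinks in both paths.
proc isInsideResolved {source target} {
    set src [string trimright [fullnormalize $source] /]/
    set dst [string trimright [fullnormalize $target] /]/
    # Literal prefix test on the first [string length $src] characters.
    return [string equal -length [string length $src] $src $dst]
}
```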

--
So long,
Andreas Kupries <akup...@shaw.ca>
<http://www.purl.org/NET/akupries/>
Developer @ <http://www.activestate.com/>
-------------------------------------------------------------------------------