a regex to match a unix path?

tomasz konefal

Feb 28, 2002, 12:02:34 PM
hello list,

i'm trying to come up with a regular expression which matches a unix
file system path like the following:

/
/usr
/home/user/foo
...

here is what i've got so far:

$path =~ /^\/[\w/]*$/;

which i think will also capture some bad things like:

///
/foo///bar/
...

can anyone point me in the right direction to get this cleaned up?

thanks,
twkonefal

Josef Drexler

Feb 28, 2002, 12:31:18 PM
Sorry, tomasz konefal, could you repeat that? I wasn't paying attention:

> hello list,
>
> i'm trying to come up with a regular expression which matches a unix
> file system path like the following:

That's a pretty useless goal, if you ask me. Unix paths can have any
character but a NULL and the directory separator (usually "/").

E.g., "/usr/Hello,\nmy friend/\001\002 ** " is a valid path...

Basically your RE would have to be /[^\000/]/

I think you should attempt to achieve your *real* goal some other way.

> /
> /usr
> /home/user/foo
> ...
>
> here is what i've got so far:
>
> $path =~ /^\/[\w/]*$/;
>
> which i think will also capture some bad things like:
>
> ///
> /foo///bar/

Multiple slashes are no problem in a path, they are collapsed into a single
slash.
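
A small sketch of that collapsing, using File::Spec::Unix's canonpath as a stand-in for what happens during path lookup. This is only an illustration: canonpath is a purely textual cleanup and never touches the filesystem, and the sample paths are the ones from the question.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Textual cleanup only: repeated slashes are squeezed and a trailing
# slash is dropped; nothing is checked against the real filesystem.
for my $path ('///', '/foo///bar/', '/home/user/foo') {
    printf "%-16s -> %s\n", $path, File::Spec::Unix->canonpath($path);
}
```

So the "bad" examples name the same places as their tidy forms, which is why rejecting them in a regex buys little.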

--
Josef Drexler | http://publish.uwo.ca/~jdrexler/
---------------------------------+----------------------------------------
Please help Conserve Gravity | Email address is *valid*.
Carry a helium balloon. | Don't remove the "nospam" part.

tomasz konefal

Feb 28, 2002, 12:58:15 PM
Josef Drexler wrote:
> That's a pretty useless goal, if you ask me. Unix paths can have any
> character but a NULL and the directory separator (usually "/").
>
> E.g., "/usr/Hello,\nmy friend/\001\002 ** " is a valid path...
>
> Basically your RE would have to be /[^\000/]/
>
> I think you should attempt to achieve your *real* goal some other way.
>
> Multiple slashes are no problem in a path, they are collapsed into a single
> slash.

i am writing an application which removes users' home directories after
their accounts have expired. the base path is stored in a global hash
that's called via 'do FILE'. and the path to the user's home directory
is basically '$CFG{BASE_DIR} . $username'. i'm trying to use
File::Path, but it fails because of tainting checks. since i have to do
a recursive descent into their home folder to unlink files and remove
directories i have to match the full path to the user's home directory
so that i can untaint it. so i'm trying to hack together a nice little
regex to be a bit more secure than "#!/usr/bin/perl -U". :)

perhaps you can point me in a more productive direction? is there a
better way to remove these home directories?

thanks,
twkonefal

tomasz konefal

Feb 28, 2002, 1:36:47 PM
foreach $root (@{$roots}) {
    if ($root =~ /^(\/|(\/[^\000|^.|^..]+)+)$/) {
        $root = $1;
    }
    else {
        print "$root skipped!\n";
        next;
    }
    # delete by recursing into directories matched above
}

would that be good enough to prevent accidents?

thanks,
twkonefal

Josef Drexler

Feb 28, 2002, 1:39:03 PM
Sorry, tomasz konefal, could you repeat that? I wasn't paying attention:
> i am writing an application which removes users' home directories after
> their accounts have expired. the base path is stored in a global hash
> that's called via 'do FILE'. and the path to the user's home directory
> is basically '$CFG{BASE_DIR} . $username'. i'm trying to use
> File::Path, but it fails because of tainting checks.

'do "FILE"' does not taint the variables, so the only thing that you need
to untaint is $username. That means you should just check username for
whatever characters are valid in your usernames, and it should work.

If you allow only word characters (\w) for example, try

$username =~ /\W/ and die "Username $username invalid";
($username) = $username=~/^(\w+)$/; # untaint; you know it's valid.

If it turns out that you do need to untaint other variables, you should do
it right at the point where they become tainted, i.e. in your case in FILE.
It's easier and safer to check a variable and untaint it as soon as you
can. Don't forget to check that it's valid though!

--
Josef Drexler | http://publish.uwo.ca/~jdrexler/
---------------------------------+----------------------------------------
Please help Conserve Gravity | Email address is *valid*.

Boycott multistory buildings. | Don't remove the "nospam" part.

Josef Drexler

Feb 28, 2002, 1:45:32 PM
Sorry, tomasz konefal, could you repeat that? I wasn't paying attention:
> foreach $root (@{$roots}) {
>     if ($root =~ /^(\/|(\/[^\000|^.|^..]+)+)$/) {
>         $root = $1;
>     }
>     else {
>         print "$root skipped!\n";
>         next;
>     }
>     # delete by recursing into directories matched above
> }
>
> would that be good enough to prevent accidents?

Hard to say; it depends on where you get @$roots from. Are they from a
file you write yourself, or something a non-security-conscious person would
write?

I assume that what you want to prevent is
a) removing system directories
b) removing user directories for the wrong users

a) would be fairly easy if all users are under a common hierarchy, e.g.
/home: then you just check that the path starts with /home/.

b) should be easy if you just check that you're removing /home/$username,
where $username is in fact a valid username, and doesn't contain slashes or
start with one or two dots. That should be safe.

In other cases, well, it depends on how your system is set up and what
characters you allow in user names.
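
As a rough sketch of checks a) and b) together, assuming usernames are word characters only and that the base directory comes from trusted configuration (the helper name home_dir_for and the sample names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: returns "$base/$username" or undef if the name is bad.
# $base is assumed to be trusted (e.g. a literal from the 'do FILE' config).
sub home_dir_for {
    my ($base, $username) = @_;
    return unless defined $username;

    # Word characters only, so "/", ".", ".." and the empty string all fail.
    # Taking the value from the capture is what untaints it under -T.
    my ($clean) = $username =~ /\A(\w+)\z/ or return;

    $base =~ s{/+\z}{};    # tolerate a trailing slash on the base
    return "$base/$clean";
}

for my $name ('alice', '../etc', 'bob/..', '') {
    my $dir = home_dir_for('/home', $name);
    print defined $dir ? "would remove $dir\n" : "skipped '$name'\n";
}
```

Because the path is built from a fixed prefix plus a checked name, there is no need to pattern-match the full path afterwards. If your usernames allow other characters (dots, dashes), the character class would have to change accordingly.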

--
Josef Drexler | http://publish.uwo.ca/~jdrexler/
---------------------------------+----------------------------------------
Please help Conserve Gravity | Email address is *valid*.

Don't do push ups | Don't remove the "nospam" part.
