ereg - regexp for NOT matching certain filename extensions

Martin Lucas-Smith

26 Jan 2004, 13:12:59

Is there some way of using ereg to detect when certain filename extensions
are supplied and to return false if so, WITHOUT using the ! operator
before ereg () ?

I have an API that allows as an input a regular expression, enabling the
administrator to ensure a file upload matches a certain pattern. For
instance, supplying the string

'\.exe$|\.com$|\.bat$|\.zip$|\.doc$'

means that the file must end with any of these five extensions.

Is there a way that the regexp could be rewritten to say that the file
must NOT end with any of these, without changing the ereg to !ereg - I
can't do the latter because it's within the class.

Any ideas?


Martin

Andy Hassall

26 Jan 2004, 16:20:20

Not neatly; that'd require a negative lookahead assertion, which is only
supported in Perl-compatible regexes. Or just using ! ... ;-p

I suppose you could take the perverse approach of enumerating all other
three-letter extensions, except those. So, have a series of three character
classes containing all but the 1st, 2nd then 3rd character of each extension.
But you could only check one extension at a time; if you had an alternation,
it'd always match (if it doesn't match the complement of one extension's three
characters, then it must match one of the other patterns).

e.g. for matching extensions except .exe, letting 0,1,2 and 4+ letter
extensions through:

\.[^eE][^xX][^eE]$|\..{0,2}$|\..{4,}$

(yuk!)
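For what it's worth, a single extension can be excluded without lookahead if every way of differing from "exe" is spelled out case by case. The sketch below is one such construction and is not taken from the posts above. It goes through preg_match only because ereg() was removed in PHP 7; the pattern itself sticks to POSIX extended syntax. The helper name notExe is made up for illustration, and like the pattern above it handles one forbidden extension only.

```php
<?php
// Hypothetical helper: true when $name ends in a dot plus anything other than "exe".
// The pattern uses only POSIX ERE constructs (classes, alternation, ?, *, +, $).
function notExe($name)
{
    $pattern = '~\.('
        . '[^eE.][^.]*'               // first character is not "e"
        . '|[eE]([^xX.][^.]*)?'       // "e" alone, or "e" then a non-"x"
        . '|[eE][xX]([^eE.][^.]*)?'   // "ex" alone, or "ex" then a non-"e"
        . '|[eE][xX][eE][^.]+'        // "exe" followed by more characters
        . ')?$~';                     // outer ? lets an empty extension through
    return preg_match($pattern, $name) === 1;
}

foreach (array('foo.exe', 'foo.bar.exe', 'foo.exe.txt', 'foo.ex', 'foo.exec', 'foo') as $name) {
    printf("%-12s %s\n", $name, notExe($name) ? 'match' : 'no match');
}
```

Because every alternative excludes dots, only the part after the last dot is examined. A name with no dot at all still fails to match.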

--
Andy Hassall <an...@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>

John Dunlop

26 Jan 2004, 17:14:57
Andy Hassall wrote:

> On Mon, 26 Jan 2004 18:12:59 +0000, Martin Lucas-Smith <mv...@cam.ac.uk> wrote:
>
> >Is there some way of using ereg to detect when certain filename extensions
> >are supplied and to return false if so, WITHOUT using the ! operator
> >before ereg () ?
>

> Not neatly; that'd require a negative lookahead assertion, which is only
> supported in Perl-compatible regexes.

But wouldn't it require more than one assertion? You can't merely
apply one negative lookahead assertion to the characters following a
FULL STOP, because if the filename contains more than one FULL STOP,
and the characters after the last FULL STOP constitute a forbidden
extension, the pattern would match. For example, imagining "exe" is
the only forbidden extension, then

$string = 'foo.bar.exe';
if (preg_match('`\.(?!exe$)`i',$string))

would return true, since there is present a FULL STOP that isn't
immediately followed by the anchored character sequence "exe".

What you'd need to do if you want to check filename extensions would
be to apply two assertions: one positive lookahead assertion, making
sure the characters following the FULL STOP are at the end of the
string, ensuring that you're dealing with the filename extension and
not another part of the filename; and one negative lookahead
assertion, making sure those characters don't constitute a forbidden
extension. Now, for one- to four-letter extensions,

$string = 'foo.bar.exe';
if (preg_match('`\.(?=[a-z]{1,4}$)(?!exe)`i',$string))

, where the character class denotes possible characters in filename
extensions, will return false.
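The two patterns above can be put side by side in a small script. This is only a sketch for trying them out; the variable names and the extra test names are not from the post.

```php
<?php
// Patterns copied from the post; $single and $double are illustrative names.
$single = '`\.(?!exe$)`i';                 // one negative lookahead
$double = '`\.(?=[a-z]{1,4}$)(?!exe)`i';   // positive plus negative lookahead

foreach (array('foo.exe', 'foo.bar.exe', 'foo.bar.txt', 'foo.exec') as $name) {
    printf("%-12s single=%d double=%d\n",
        $name, preg_match($single, $name), preg_match($double, $name));
}
```

For 'foo.bar.exe' the single-assertion pattern matches at the first FULL STOP, while the two-assertion pattern does not match at all. Note that the unanchored (?!exe) also turns away 'foo.exec', since that extension merely begins with "exe".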

That's all hypothetical of course, because we're saved by the NOT
operator. Please castigate me for any errors.

--
Jock

Andy Hassall

26 Jan 2004, 18:04:51
On Mon, 26 Jan 2004 22:14:57 -0000, John Dunlop <john+...@johndunlop.info>
wrote:

Indeed :-) Perhaps even, removing the 1-4 char restriction:

/\.(?=[^.]+$)(?!bad$|worse$|evil$)/i

i.e. a '.' followed by a sequence of one or more non-dots up to the end of the
string, where that sequence is not any of 'bad', 'evil' or 'worse', each
followed by end of string.

So putting it all together:

<pre>
<?php
$goodExts = array('c', 'h', 'jpeg', 'png', 'torrent', 'xyz', 'z');
$badExts = array('exe', 'com', 'bat', 'doc', 'vbscript', 'x', 'zyx');

// Build /\.(?=[^.]+$)(?!exe$|com$|...)/i from the list of bad extensions.
// create_function() was removed in PHP 8, so an anonymous function is used.
$re = '/\.(?=[^.]+$)(?!' .
    join('|',
        array_map(function ($a) { return preg_quote($a, '/') . '$'; },
            $badExts)) .
    ')/i';

print("regex = $re\n\n");

$allExts = array_merge($goodExts, $badExts);
$fileNames = array('thingy', 'foo', 'weasel', 'earwig');

// Try 42 random file names, each with one to three extensions.
for ($i = 0; $i < 42; $i++) {
    $str = $fileNames[array_rand($fileNames)];

    $numExts = mt_rand(1, 3); // pick the count once, not on every loop test
    for ($j = 0; $j < $numExts; $j++)
        $str .= '.' . $allExts[array_rand($allExts)];

    $matched = preg_match($re, $str);

    printf("%-64s %s\n",
        $str,
        $matched ? 'match' : '<b>no match</b>');
}

?>
</pre>

It rejects files without an extension, though.
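One way around that, as a sketch rather than anything proposed in the thread, is to anchor the negative lookahead at the start of the string instead of at a dot. The pattern then matches any name that does not end in a forbidden extension, dot or no dot. The function name buildAllowPattern and the sample extension list are assumptions for illustration.

```php
<?php
// Hypothetical helper: builds a PCRE pattern that matches every file name
// NOT ending in one of $badExts, including names with no extension at all.
function buildAllowPattern(array $badExts)
{
    $alts = implode('|', array_map(function ($e) {
        return preg_quote($e, '/');   // escape any regex metacharacters
    }, $badExts));

    // ^(?!.*\.(?:exe|com|bat)$) : from the start, the name must not end
    // with a dot followed by a forbidden extension.
    return '/^(?!.*\.(?:' . $alts . ')$)/i';
}

$re = buildAllowPattern(array('exe', 'com', 'bat'));

foreach (array('README', 'foo.exe', 'foo.bar.EXE', 'foo.exe.txt') as $name) {
    printf("%-12s %s\n", $name, preg_match($re, $name) ? 'match' : 'no match');
}
```

This still needs Perl-compatible regexes, so it does not help with ereg itself.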

>That's all hypothetical of course, because we're saved by the NOT
>operator. Please castigate me for any errors.

A single ! character vs. the insanity above... hmm.