Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

wildcard string comparisons

0 views
Skip to first unread message

Nick Payne

unread,
Mar 30, 2002, 1:59:14 AM3/30/02
to
Does any library exist anywhere for performing wildcard string comparisons,
or do I have to roll my own? I want to exclude any files whose name matches
zero or more wildcard filters provided by the user. Basically, the reverse
of what the findfirst function will do - findfirst looks for files based on
a filespec with optional wildcard characters, I want to exclude files based
on wildcards.

Nick


those who know me have no need of my name

unread,
Mar 30, 2002, 3:01:02 AM3/30/02
to
<3ca5...@newshost.pcug.org.au> divulged:

>Does any library exist anywhere for performing wildcard string comparisons,
>or do I have to roll my own?

yes -- posix has regular expression support or you could write your own
thing. comp.unix.programmer is the appropriate place to discuss the
former.

--
bringing you boring signatures for 17 years

Svante

unread,
Mar 30, 2002, 5:45:46 AM3/30/02
to
"those who know me have no need of my name" <not-a-rea...@usa.net> wrote in message
news:a83rb...@enews1.newsguy.com...

> <3ca5...@newshost.pcug.org.au> divulged:
>
> >Does any library exist anywhere for performing wildcard string comparisons,
> >or do I have to roll my own?
>
> yes -- posix has regular expression support or you could write your own
> thing. comp.unix.programmer is the appropriate place to discuss the
> former.

If you can settle for a traditional "*" and "?" pattern-matcher, you
can use something like the following:

int matchstr(const char *sp, const char *pp) {
char pc; /* current pattern char */
do {
if ((pc = *pp++) == '*') {
do {
if (matchstr(sp, pp)) return 1;
} while (*sp++);
return 0;
}
if (!*sp) return (!pc); /* end of str, match if end of pat */
} while (pc == *sp++ || pc == '?');
return 0;
}

--
/Svante

http://axcrypt.sourceforge.net
Free AES Point'n'Click File Encryption for Windows 9x/ME/2K/XP

Mair Allen-Williams

unread,
Mar 30, 2002, 4:22:51 PM3/30/02
to
> int matchstr(const char *sp, const char *pp) {
> char pc; /* current pattern char */
> do {
> if ((pc = *pp++) == '*') {
> do {
> if (matchstr(sp, pp)) return 1;
> } while (*sp++);
> return 0;
> }
> if (!*sp) return (!pc); /* end of str, match if end of pat */
> } while (pc == *sp++ || pc == '?');
> return 0;
> }

blimey. and I spent the last week trying to make a sensible pattern-matching
thing, jumping through all sorts of hoops. And there it is in about 10
lines. Makes me wonder how many other things I'm doing in a long,
complicated way...


David Rubin

unread,
Mar 30, 2002, 1:50:16 PM3/30/02
to
Svante wrote:
>
> "those who know me have no need of my name" <not-a-rea...@usa.net> wrote in message
> news:a83rb...@enews1.newsguy.com...
> > <3ca5...@newshost.pcug.org.au> divulged:
> >
> > >Does any library exist anywhere for performing wildcard string comparisons,
> > >or do I have to roll my own?
> >
> > yes -- posix has regular expression support or you could write your own
> > thing. comp.unix.programmer is the appropriate place to discuss the
> > former.
>
> If you can settle for a traditional "*" and "?" pattern-matcher, you
> can use something like the following:
>
> int matchstr(const char *sp, const char *pp) {
> char pc; /* current pattern char */
> do {
> if ((pc = *pp++) == '*') {
> do {
> if (matchstr(sp, pp)) return 1;
> } while (*sp++);
> return 0;
> }
> if (!*sp) return (!pc); /* end of str, match if end of pat */
> } while (pc == *sp++ || pc == '?');
> return 0;
> }

Unfortunately, this does not work so well. Try this, with tests attached:

#include <stdio.h>
#include <string.h>

int
match(char *pat, char *src)
{
if(*pat == 0)
return (*src == 0);

/* special case */
if(strspn(pat, "*") == strlen(pat))
return 1;

if(*src == 0)
return 0;

switch(*pat){
case '*':
return match(pat+1, src) || match(pat+1, src+1) || match(pat, src+1);
case '?':
return match(pat+1, src+1);
default:
return (*pat == *src) && match(pat+1, src+1);
}
}

#define nelem(x) (sizeof x / sizeof *x)
int
main(void)
{
struct Test {
int exp;
char *pat;
char *src;
} tests[] = {
{1, "*sh*", "shshsh"},
{0, "*shp", "shshsh"},
{1, "*shp*", "shshpsh"},
{1, "*sh", "shshsh"},
{1, "f*.cpp", "foo.cpp"},
{1, "foo.cpp", "foo.cpp"},
{1, "foo*", "foo.cpp"},
{1, "fo*.cpp", "foo.cpp"},
{1, "*.cpp", "foo.cpp"},
{1, "*", "foo.cpp"},
{1, "f*", "foo.cpp"},
{1, "fo*", "foo.cpp"},
{1, "f??.???", "foo.cpp"},
{0, "f??.??", "foo.cpp"},
{1, "f??.*", "foo.cpp"},
{1, "f??.*p", "foo.cpp"},
{0, "", "foo.cpp"},
{0, "fo*.h", "foo.cpp"},
{1, "*foo*.*cpp*", "foo.cpp"},
{1, "foo**", "foo"},
};
int i;
struct Test *t;

for(i=0; i < nelem(tests); i++){
t = &tests[i];
printf("pat=%-15s src=%-15s %d:%d\n", t->pat, t->src, match(t->pat,
t->src), t->exp);
}
return 0;
}

david

--
If 91 were prime, it would be a counterexample to your conjecture.
-- Bruce Wheeler

Svante

unread,
Mar 31, 2002, 4:04:37 AM3/31/02
to
"David Rubin" <dlr...@hotmail.com> wrote in message news:3CA608E8...@hotmail.com...

> Svante wrote:
> >
> > "those who know me have no need of my name" <not-a-rea...@usa.net> wrote in message
> > news:a83rb...@enews1.newsguy.com...
> > > <3ca5...@newshost.pcug.org.au> divulged:
> > >
> > > >Does any library exist anywhere for performing wildcard string comparisons,
> > > >or do I have to roll my own?
> > >
> > > yes -- posix has regular expression support or you could write your own
> > > thing. comp.unix.programmer is the appropriate place to discuss the
> > > former.
> >
> > If you can settle for a traditional "*" and "?" pattern-matcher, you
> > can use something like the following:
> >
> > int matchstr(const char *sp, const char *pp) {
> > char pc; /* current pattern char */
> > do {
> > if ((pc = *pp++) == '*') {
> > do {
> > if (matchstr(sp, pp)) return 1;
> > } while (*sp++);
> > return 0;
> > }
> > if (!*sp) return (!pc); /* end of str, match if end of pat */
> > } while (pc == *sp++ || pc == '?');
> > return 0;
> > }
>
> Unfortunately, this does not work so well. Try this, with tests attached:

Just exactly how does the above fail? I tried your main() code below,
(snipped) and it passes all your tests. (Of course you need to note that
you have the "pattern" and "source" arguments in reverse order).

Please show me where it fails, I really want to know so I can
fix it!

Assuming my version works, it is significantly more efficient.

- Much less recursion.
- No library calls to strlen and strspn for every recursive call.

(snip)


> printf("pat=%-15s src=%-15s %d:%d\n", t->pat, t->src, match(t->pat,
> t->src), t->exp);

printf("pat=%-15s src=%-15s %d:%d\n", t->pat, t->src,

matchstr(t->src, t->pat),
t->exp);

(snip)

Peter Nilsson

unread,
Mar 31, 2002, 5:54:25 AM3/31/02
to
"David Rubin" <dlr...@hotmail.com> wrote in message
news:3CA608E8...@hotmail.com...
> Svante wrote:
> >
> > "those who know me have no need of my name" <not-a-rea...@usa.net>
wrote in message
> > news:a83rb...@enews1.newsguy.com...
> > > <3ca5...@newshost.pcug.org.au> divulged:
> > >
> > > >Does any library exist anywhere for performing wildcard string
comparisons,
> > > >or do I have to roll my own?
> > >
> > > yes -- posix has regular expression support or you could write your own
> > > thing. comp.unix.programmer is the appropriate place to discuss the
> > > former.
> >
> > If you can settle for a traditional "*" and "?" pattern-matcher, you
> > can use something like the following:
> >
> > int matchstr(const char *sp, const char *pp) {
> > char pc; /* current pattern char */
> > do {
> > if ((pc = *pp++) == '*') {
> > do {
> > if (matchstr(sp, pp)) return 1;
> > } while (*sp++);
> > return 0;
> > }
> > if (!*sp) return (!pc); /* end of str, match if end of pat */
> > } while (pc == *sp++ || pc == '?');
> > return 0;
> > }
>
> Unfortunately, this does not work so well. ...

In what sense? I added Svante's function to your code and it passed the tests.

I even dusted off my own version [which I've never used, probably because of the
way it was written. Code maintainers look away, even I'm not game to modify it!
:-)]

Here's your code with marked additions...


#include <stdio.h>
#include <string.h>


/* pn: Svante */
static


int matchstr(const char *sp, const char *pp) {
char pc; /* current pattern char */
do {
if ((pc = *pp++) == '*') {
do {
if (matchstr(sp, pp)) return 1;
} while (*sp++);
return 0;
}
if (!*sp) return (!pc); /* end of str, match if end of pat */
} while (pc == *sp++ || pc == '?');
return 0;
}


/* David */
static /* pn */
int
match(const char *pat, const char *src)


{
if(*pat == 0)
return (*src == 0);

/* special case */
if(strspn(pat, "*") == strlen(pat))
return 1;

if(*src == 0)
return 0;

switch(*pat){
case '*':
return match(pat+1, src) || match(pat+1, src+1) || match(pat, src+1);
case '?':
return match(pat+1, src+1);
default:
return (*pat == *src) && match(pat+1, src+1);
}
}


/* pn: What's his name's version */

static int like(const char *raw, const char *pat) {
char c, d, c0, d0;
const char *r, *p;
int star = 0;

for (;;) {
if (!(d = *pat++)) return (star || !*raw);
else if (d == '*') star = 1;
else if (d == '?') { if (!*raw++) return 0; }
else break;
}

d0 = d;

do if ((c0 = *raw++) == d0) {
r = raw;
p = pat;

do {
if ((d = *p++) == '*') {
if (like(r, p - 1)) return 1;
else break;
}
else if (!d) {
if (!*r) return 1;
else break;
}
} while ((c = *r++) == d || d == '?');
} while (star && c0);

return 0;
}


#define nelem(x) (sizeof x / sizeof *x)

int
main(void)
{
struct Function { /* pn: function array */
int rev;
int (*fun)(const char *, const char *);
char *name;
} functions[] = {
{ 1, matchstr, "Svante" },
{ 0, match , "David" },
{ 1, like , "Also Ran" }
};

struct Test {
int exp;
char *pat;
char *src;
} tests[] = {

{1, "", ""}, /* pn: boundary conditions */
{1, "*", ""},
{1, "?", "x"},
{0, "?", ""},
{0, "?", "xy"},
{0, "", "x"},
{1, "x", "x"},
{0, "x", ""},

{1, "*sh*", "shshsh"},
{0, "*shp", "shshsh"},
{1, "*shp*", "shshpsh"},
{1, "*sh", "shshsh"},
{1, "f*.cpp", "foo.cpp"},
{1, "foo.cpp", "foo.cpp"},
{1, "foo*", "foo.cpp"},
{1, "fo*.cpp", "foo.cpp"},
{1, "*.cpp", "foo.cpp"},
{1, "*", "foo.cpp"},
{1, "f*", "foo.cpp"},
{1, "fo*", "foo.cpp"},
{1, "f??.???", "foo.cpp"},
{0, "f??.??", "foo.cpp"},
{1, "f??.*", "foo.cpp"},
{1, "f??.*p", "foo.cpp"},
{0, "", "foo.cpp"},
{0, "fo*.h", "foo.cpp"},
{1, "*foo*.*cpp*", "foo.cpp"},
{1, "foo**", "foo"},
};

size_t i, j; /* pn: size_t */
struct Test *t;
struct Function *f; /* pn */

for(j=0; j < nelem(functions); j++) { /* pn: outer */
f = &functions[j]; /* function loop */
printf("\n-- %s --\n", f->name);

for(i=0; i < nelem(tests); i++){
t = &tests[i];

printf( /* pn: spliced */


"pat=%-15s src=%-15s %d:%d\n",
t->pat, t->src,

f->rev ? (f->fun)(t->src, t->pat) /* pn */
: (f->fun)(t->pat, t->src),
t->exp
);
}
}
return 0;
}

--
Peter


David Rubin

unread,
Mar 31, 2002, 10:11:20 AM3/31/02
to
Svante wrote:

[snip]


> > > int matchstr(const char *sp, const char *pp) {
> > > char pc; /* current pattern char */
> > > do {
> > > if ((pc = *pp++) == '*') {
> > > do {
> > > if (matchstr(sp, pp)) return 1;
> > > } while (*sp++);
> > > return 0;
> > > }
> > > if (!*sp) return (!pc); /* end of str, match if end of pat */
> > > } while (pc == *sp++ || pc == '?');
> > > return 0;
> > > }
> >
> > Unfortunately, this does not work so well. Try this, with tests attached:
>
> Just exactly how does the above fail? I tried your main() code below,
> (snipped) and it passes all your tests. (Of course you need to note that
> you have the "pattern" and "source" arguments in reverse order).

I have to apologize...My *stupid* evil twin brother divad must have posted that
previous post ;-) Your code is indeed much more efficient and works perfectly
fine.

Svante

unread,
Mar 31, 2002, 4:21:12 PM3/31/02
to
"David Rubin" <dlr...@hotmail.com> wrote in message news:3CA72718...@hotmail.com...

(snip)

> I have to apologize...My *stupid* evil twin brother divad must have posted that
> previous post ;-) Your code is indeed much more efficient and works perfectly
> fine.

No problem, it happens... You did get me a bit worried... ;-)

0 new messages