
v17i034: wildmat - a /bin/sh-style pattern matcher, Part01/01


Rich Salz

Mar 8, 1991, 11:40:16 PM
Submitted-by: Rich Salz <rs...@bbn.com>
Posting-number: Volume 17, Issue 34
Archive-name: wildmat/part01

This small routine is an efficient pattern-matcher for shell-style
wildcards. I wrote and posted it five years ago. Since then other people
have picked it up (notably Gilmore's TAR). Others have posted fixes,
which usually introduced bugs (Lars is the notable exception). It's
probably about time that this got archived somewhere, so I wrote a manpage
and as a special bonus I'm including a PostScript version. I'm not
interested in any other languages, nor particularly in seeing new features
other than performance gains.

There is no Makefile; install according to local custom. It runs on pretty
much any machine with a C compiler.

I hope you find this useful. I hope you don't pretend that you wrote it.
/rich $alz
<rs...@bbn.com>
March, 1991
-------------------
#! /bin/sh
# This is a shell archive. Remove anything before this line, then feed it
# into a shell via "sh file" or similar. To overwrite existing files,
# type "sh file -c".
# The tool that generated this appeared in the comp.sources.unix newsgroup;
# send mail to comp-sou...@uunet.uu.net if you want that tool.
# Contents: README wildmat.3 wildmat.c wildmat.ps
# Wrapped by rs...@litchi.bbn.com on Thu Mar 7 10:05:08 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
echo If this archive is complete, you will see the following message:
echo ' "shar: End of archive."'
if test -f 'README' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(756 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
X
XThis small routine is an efficient pattern-matcher for shell-style
Xwildcards. I wrote and posted it five years ago. Since then other people
Xhave picked it up (notably Gilmore's TAR). Others have posted fixes,
Xwhich usually introduced bugs (Lars is the notable exception). It's
Xprobably about time that this got archived somewhere, so I wrote a manpage
Xand as a special bonus I'm including a PostScript version. I'm not
Xinterested in any other languages, nor particularly in seeing new features
Xother than performance gains.
X
XThere is no Makefile; install according to local custom. It runs on pretty
Xmuch any machine with a C compiler.
X
XI hope you find this useful. I hope you don't pretend that you wrote it.
X /rich $alz
X <rs...@bbn.com>
X March, 1991
END_OF_FILE
if test 756 -ne `wc -c <'README'`; then
echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
if test -f 'wildmat.3' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'wildmat.3'\"
else
echo shar: Extracting \"'wildmat.3'\" \(1751 characters\)
sed "s/^X//" >'wildmat.3' <<'END_OF_FILE'
X.TH WILDMAT 3
X.SH NAME
Xwildmat \- perform shell-style wildcard matching
X.SH SYNOPSIS
X.nf
X.B "int"
X.B "wildmat(text, pattern)"
X.B " char *text;"
X.B " char *pattern;"
X.fi
X.SH DESCRIPTION
X.I Wildmat
Xcompares the
X.I text
Xagainst the
X.I pattern
Xand
Xreturns non-zero if the pattern matches the text.
XThe pattern is interpreted similarly to shell filename wildcards, and not
Xas a full regular expression such as those handled by the
X.IR grep (1)
Xfamily of programs or the
X.IR regex (3)
Xor
X.IR regexp (3)
Xset of routines.
X.PP
XThe pattern is interpreted according to the following rules:
X.TP
X.BI \e x
XTurns off the special meaning of
X.I x
Xand matches it directly; this is used mostly before a question mark or
Xasterisk, and is not valid inside square brackets.
X.TP
X.B ?
XMatches any single character.
X.TP
X.B *
XMatches any sequence of zero or more characters.
X.TP
X.BI [ x...y ]
XMatches any single character specified by the set
X.IR x...y ,
Xwhere any character other than minus sign or close bracket may appear
Xin the set.
XA minus sign may be used to indicate a range of characters.
XThat is,
X.I [0\-5abc]
Xis a shorthand for
X.IR [012345abc] .
XMore than one range may appear inside a character set;
X.I [0-9a-zA-Z._]
Xmatches almost all of the legal characters for a host name.
X.TP
X.BI [^ x...y ]
XThis matches any character
X.I not
Xin the set
X.IR x...y ,
Xwhich is interpreted as described above.
X.SH "BUGS"
XThere is no way to specify a minus sign in a character range.
X.SH HISTORY
XWritten by Rich $alz <rs...@bbn.com> in 1986, and posted to Usenet
Xseveral times since then, most notably in comp.sources.misc in
XMarch, 1991.
X.br
XLars Mathiesen <tho...@diku.dk> enhanced the multi-asterisk failure
Xmode in early 1991.
X.SH "SEE ALSO"
Xgrep(1), regex(3), regexp(3).
END_OF_FILE
if test 1751 -ne `wc -c <'wildmat.3'`; then
echo shar: \"'wildmat.3'\" unpacked with wrong size!
fi
# end of 'wildmat.3'
fi
if test -f 'wildmat.c' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'wildmat.c'\"
else
echo shar: Extracting \"'wildmat.c'\" \(3126 characters\)
sed "s/^X//" >'wildmat.c' <<'END_OF_FILE'
X/*
X** Do shell-style pattern matching for ?, \, [], and * characters.
X** Might not be robust in the face of malformed patterns; e.g., "foo[a-"
X** could cause a segmentation violation. It is 8bit clean.
X**
X** Written by Rich $alz, mirror!rs, Wed Nov 26 19:03:17 EST 1986.
X** Rich $alz is now <rs...@bbn.com>.
X** Special thanks to Lars Mathiesen <tho...@diku.dk> for the ABORT code.
X** This can greatly speed up failing wildcard patterns. For example:
X** pattern: -*-*-*-*-*-*-12-*-*-*-m-*-*-*
X** text 1: -adobe-courier-bold-o-normal--12-120-75-75-m-70-iso8859-1
X** text 2: -adobe-courier-bold-o-normal--12-120-75-75-X-70-iso8859-1
X** Text 1 matches with 51 calls, while text 2 fails with 54 calls. Without
X** the ABORT, it takes 22310 calls to fail. Ugh.
X*/
X
X#define TRUE 1
X#define FALSE 0
X#define ABORT -1
X
X#define NEGATE_CLASS '^'
X
X/* Forward declaration. */
Xstatic int DoMatch();
X
X/*
X** See if the text matches the pattern p, which has an implied leading asterisk.
X*/
Xstatic int
XStar(text, p)
X register char *text;
X register char *p;
X{
X register int ret;
X
X do
X ret = DoMatch(text++, p);
X while (ret == FALSE);
X return ret;
X}
X
X
X/*
X** Match text and p, return TRUE, FALSE, or ABORT.
X*/
Xstatic int
XDoMatch(text, p)
X register char *text;
X register char *p;
X{
X register int last;
X register int matched;
X register int reverse;
X
X for ( ; *p; text++, p++) {
X if (*text == '\0' && *p != '*')
X return ABORT;
X switch (*p) {
X case '\\':
X /* Literal match with following character. */
X p++;
X /* FALLTHROUGH */
X default:
X if (*text != *p)
X return FALSE;
X continue;
X case '?':
X /* Match anything. */
X continue;
X case '*':
X /* Trailing star matches everything. */
X return *++p ? Star(text, p) : TRUE;
X case '[':
X if (reverse = p[1] == NEGATE_CLASS)
X /* Inverted character class. */
X p++;
X for (last = 0400, matched = FALSE; *++p && *p != ']'; last = *p)
X /* This next line requires a good C compiler. */
X if (*p == '-' ? *text <= *++p && *text >= last : *text == *p)
X matched = TRUE;
X if (matched == reverse)
X return FALSE;
X continue;
X }
X }
X
X return *text == '\0';
X}
X
X
X/*
X** User-level routine. Returns TRUE or FALSE.
X*/
Xint
Xwildmat(text, p)
X char *text;
X char *p;
X{
X return DoMatch(text, p) == TRUE;
X}
X
X
X
X#ifdef TEST
X#include <stdio.h>
X
X/* Yes, we use gets not fgets. Sue me. */
Xextern char *gets();
X
X
Xmain()
X{
X char pattern[80];
X char text[80];
X
X printf("Wildmat tester. Enter pattern, then strings to test.\n");
X printf("A blank line prompts for a new pattern; a blank pattern\n");
X printf("exits the program.\n\n");
X
X for ( ; ; ) {
X printf("Enter pattern: ");
X (void)fflush(stdout);
X if (gets(pattern) == NULL || pattern[0] == '\0')
X break;
X for ( ; ; ) {
X printf("Enter text: ");
X (void)fflush(stdout);
X if (gets(text) == NULL)
X exit(0);
X if (text[0] == '\0')
X /* Blank line; go back and get a new pattern. */
X break;
X printf(" %s\n", wildmat(text, pattern) ? "YES" : "NO");
X }
X }
X
X exit(0);
X /* NOTREACHED */
X}
X#endif /* TEST */
END_OF_FILE
if test 3126 -ne `wc -c <'wildmat.c'`; then
echo shar: \"'wildmat.c'\" unpacked with wrong size!
fi
# end of 'wildmat.c'
fi
if test -f 'wildmat.ps' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'wildmat.ps'\"
else
echo shar: Extracting \"'wildmat.ps'\" \(2124 characters\)
sed "s/^X//" >'wildmat.ps' <<'END_OF_FILE'
X% ; -*- PostScript -*-
X% PostScript routine to check text against a shell-style wildcard.
X% text pattern wildmat ==> bool
X% Written by Josh Siegel <sie...@eng.sun.com> from logic taken from
X% Rich $alz (rs...@bbn.com)
X%
X
X/Star { % text pattern => bool
X {
X 2 copy wildmat {
X pop pop true exit
X } {
X exch dup length 1 sub 1 exch getinterval dup () eq {
X pop pop false exit
X } if
X } ifelse
X } loop
X} def
X
X/wildmat_dict
X10 dict dup begin
X (\\) 0 get {
X dup length 1 sub 1 exch getinterval
X 2 copy 0 get exch 0 get eq false eq {
X pop pop false exit
X } if
X } def
X (?) 0 get {
X 1 index () eq {
X pop pop false exit
X } if
X } def
X (*) 0 get {
X exch dup length 1 sub 1 exch getinterval
X dup () eq {
X pop pop true
X } {
X exch Star
X } ifelse
X exit
X } def
X (\[) 0 get {
X 3 dict begin
X dup 1 get (^) 0 get eq dup /reverse exch def
X {
X dup length 2 sub 2 exch getinterval
X } {
X dup length 1 sub 1 exch getinterval
X } ifelse
X /last 0 def
X /matched false def
X { % text pattern
X dup 0 get (-) 0 get eq {
X dup length 1 sub 1 exch getinterval
X 2 copy 0 get exch 0 get gt 2 index 0 get last gt and
X } {
X 2 copy 0 get exch 0 get eq
X } ifelse % text pattern bool
X {
X /matched true def
X } if
X
X dup 0 get /last exch def
X dup length 1 sub 1 exch getinterval
X dup 0 get (\]) 0 get eq {
X exit
X } if
X } loop
X matched reverse eq
X end
X {
X pop pop false exit
X } if
X } def
X /Default {
X 2 copy 0 get exch 0 get eq false eq {
X pop pop false exit
X } if
X } def
Xend def
X
X/wildmat { % text pattern => bool
X {
X dup () eq {
X pop () eq exit
X } if
X dup 0 get
X wildmat_dict exch 2 copy known {
X get
X } {
X pop /Default get
X } ifelse
X exec
X dup length 1 sub 1 exch getinterval exch
X dup length 1 sub 1 exch getinterval exch
X } loop
X} def
X
X/test {
X (b) ([a-z]) wildmat true eq ==
X (a) ([b-z]) wildmat false eq ==
X (a) ([^b-z]) wildmat true eq ==
X (b) ([^b-z]) wildmat false eq ==
X (test) (*es*) wildmat true eq ==
X (test) (te?t) wildmat true eq ==
X} def
END_OF_FILE
if test 2124 -ne `wc -c <'wildmat.ps'`; then
echo shar: \"'wildmat.ps'\" unpacked with wrong size!
fi
# end of 'wildmat.ps'
fi
echo shar: End of archive.
exit 0

exit 0 # Just in case...
--
Kent Landfield INTERNET: ke...@sparky.IMD.Sterling.COM
Sterling Software, IMD UUCP: uunet!sparky!kent
Phone: (402) 291-8300 FAX: (402) 291-4362
Please send comp.sources.misc-related mail to ke...@uunet.uu.net.
