I know that lisp eval is written more clear than this recursion below
because I am able to read it easily. and that code is almost self
explanatory. C is more quirky. When you really mean recursively call
another function, you are using return so you can have tail
recursion ???? .
Anyway, its your chance to show how clear C/C++/lisp/Scheme code you
can write that is clearer. Also, i dont exclude pseudocode but it
should be clear enough to be instantly translatable to a programming
language. The real goal is to learn how to write or DERIVE recursion,
how to enumerate cases, order them, and build recursion. You may even
put some introductory tuturial and dont have to reply here in ascii
but can upload a pdf at some link in rapidshare etc. Look how he put
the code after problem statement and then has the need to explain it.
Ideally, the discussion should be such that the student or reader
himself jumps to the solution. That is why I give these unix school of
pike/thomson/kernighan low grade in almost all their expositions
except that they wrote earliest books to make millions of dollars in
royalties and since now they are nobody due to linux, they are poorly
regurgitating old material.
Enjoy .............
============================
The Practice of Programming
In 1998, Rob Pike and I were writing The Practice of Programming
(Addison-Wesley). The
last chapter of the book, “Notation,” collected a number of examples
where good notation
led to better programs and better programming. This included the use
of simple data specifications
(printf, for instance), and the generation of code from tables.
Because of our Unix backgrounds and nearly 30 years of experience with
tools based on
regular expression notation, we naturally wanted to include a
discussion of regular
expressions, and it seemed mandatory to include an implementation as
well. Given our
emphasis on tools, it also seemed best to focus on the class of
regular expressions found in
grep—rather than, say, those from shell wildcards—since we could also
then talk about the
design of grep itself.
The problem was that any existing regular expression package was far
too big. The local
grep was over 500 lines long (about 10 book pages) and encrusted with
barnacles. Open
source regular expression packages tended to be huge—roughly the size
of the entire
book—because they were engineered for generality, flexibility, and
speed; none were
remotely suitable for pedagogy.
I suggested to Rob that we find the smallest regular expression
package that would illustrate
the basic ideas while still recognizing a useful and nontrivial class
of patterns. Ideally,
the code would fit on a single page.
Rob disappeared into his office. As I remember it now, he emerged in
no more than an
hour or two with the 30 lines of C code that subsequently appeared in
Chapter 9 of The
Practice of Programming. That code implements a regular expression
matcher that handles
the following constructs.
Character Meaning
c Matches any literal character c.
. (period) Matches any single character.
^ Matches the beginning of the input string.
$ Matches the end of the input string.
* Matches zero or more occurrences of the previous character.
This is quite a useful class; in my own experience of using regular
expressions on a day-today
basis, it easily accounts for 95 percent of all instances. In many
situations, solving the
right problem is a big step toward creating a beautiful program. Rob
deserves great credit
for choosing a very small yet important, well-defined, and extensible
set of features from
among a wide set of options.
Rob’s implementation itself is a superb example of beautiful code:
compact, elegant,
efficient, and useful. It’s one of the best examples of recursion that
I have ever seen, and it
shows the power of C pointers. Although at the time we were most
interested in conveying
the important role of good notation in making a program easier to use
(and perhaps
easier to write as well), the regular expression code has also been an
excellent way to
illustrate algorithms, data structures, testing, performance
enhancement, and other
important topics.
Implementation
In The Practice of Programming, the regular expression matcher is part
of a standalone program
that mimics grep, but the regular expression code is completely
separable from its
surroundings. The main program is not interesting here; like many Unix
tools, it reads
either its standard input or a sequence of files, and prints those
lines that contain a match
of the regular expression.
This is the matching code:
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
Character Meaning
c Matches any literal character c.
. (period) Matches any single character.
^ Matches the beginning of the input string.
$ Matches the end of the input string.
* Matches zero or more occurrences of the previous character.
4 C H A P T E R O N E
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
This is the matching code:
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
By the way did you note the spicy style of narration by which they
promote each other ? and can you abstract the technique of narration
for image building ? Another scum in electronic, Bob Pease of national
semiconductor writes in the same style. I know all their basic ideas
in that field, which are trivial. The only art is the art of writing
and propaganda.
The one man I have real respect for and is humble is McCarthy. He
really put some original ideas together but still did not give any
clue how he constructed them. However, in lisp recursion is lucidly
clear. You test if a thing is nil. Else you check if it is an atom and
act appropriately to call a handler and otherwise recursion.
Many people seem to have been upset with this article in Beautiful
Code. I emailed Kernighan about it and received a reply (I'd met the
guy), but no real answer to the basic problem, which is that C cannot
express programming ideas clearly.
The first problem is that by being so "simple" it fails to implement a
recognizable "regular expression" processor.
I thought a second problem was that it used a value parameter as a
work area, which isn't something I would do in my languages of choice
(C Sharp and VB in recent years), but I found myself doing this in C
when I returned to the language mostly to kick the shit out it.
A third problem is praising a programmer for fast work. There's enough
slipshod work in our industry, and enough programmer overwork as it
is.
I think Brian Kernighan would be the first to admit that Lisp as
opposed to C made a more far-reaching contribution to computer science.
Well, witness yet another way, boy and girls, to market a book. We'll
be seeing a lot of it from now on I'm sure. Meh. Enjoy.
If you mean "The Practice of Programming" (Kernighan and Pike), (a)
it's been in print for a great many years, and we've already seen a
lot of it; and (b) neither Brian Kernighan, nor Rob Pike, nor
Prentice Hall is infra dig enough to stoop to such tactics.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
> Many people seem to have been upset with this article in Beautiful
> Code. I emailed Kernighan about it and received a reply (I'd met the
> guy), but no real answer to the basic problem, which is that C cannot
> express programming ideas clearly.
I admire the fact that someone brought all this in the public
discussion because a lot of sleazy people (not referring in particular
to this troika) have been able to propagate crap and kept it immune to
academic critique just due to people afraid of criticizing it which
leads to a lot of newbie sheeple.
Do you think C syntax is brain damaged ? Can you explain your second
point in more detail and clearly as I could not understand any of it.
> The first problem is that by being so "simple" it fails to implement a
> recognizable "regular expression" processor.
>
> I thought a second problem was that it used a value parameter as a
> work area, which isn't something I would do in my languages of choice
> (C Sharp and VB in recent years), but I found myself doing this in C
> when I returned to the language mostly to kick the shit out it.
>
> A third problem is praising a programmer for fast work. There's enough
> slipshod work in our industry, and enough programmer overwork as it
> is.
>
> I think Brian Kernighan would be the first to admit that Lisp as
> opposed to C made a more far-reaching contribution to computer science.- Hide quoted text -
Actually, in C all they did was to borrow parens and braces from math
to delimit blocks - BUT lisp already had the idea in the prefix
notation. The other contribution is the for loop with all the loop
elements at the top in the interface. The declaration syntax of
pointers is poor and ugly. switch-case has fall thru. break and
continue is new unless PL1 or BCPL had it.
BUT YOU HAVE NOT SHOWN how to structure the recursion. Does it need
NFA ? I need a tutorial because C is to stay and I can avoid some of
its ugliness by using C++ as pretty C.
> - Show quoted text -
I admire the fact that you
You should post this to comp.lang.c.
Ed.
Very good way to market a book. Something criticized on c.l.l. must be
really good.
Ordered from Amazon 5 minutes ago.
A.L.
please dont derail my thread, BUT, you ordered the wrong book. The one
you SHOULD have ordered is this one:
Book Review
--------------------------------------------------------------------------------
Advanced C Struct Programming by John W.L. Ogilvie
Highly Recommended
ISBN: 0-471-51943-X Publisher: Wiley Pages: 405pp
Price: £22.95
Categories: advanced c data structures
Reviewed by Francis Glassborow in C Vu 3-2 (Jan 1991)
This is the kind of book that I might easily miss on a casual visit
to
my local bookshop. In all honesty my first impression when I opened
it
was not that good. It is aimed at people who wish to take programming
seriously yet it first seemed more like the kind of text that I am
used to finding on the hobbyists shelves. It is much better than
that.
When you have mastered the foothills of programming in C, have a good
runtime library reference on your shelf and, perhaps, have invested
in
a book on data structures and another on programming algorithms or
techniques what do you get to help you develop good medium size
programs (all right large ones if you must)?
If you had asked my advice a month ago, I would have hummed and hawed
and come up with a couple of titles and then suggested that what you
really wanted was a book on program/data design rather than one on C.
Advanced C Struct Programming tackles this need. The author's
declared
intent is to present a practical method for designing and
implementing
complex (complicated) data structures in C. In doing so he leads you
through experience (sometimes of false trails) in tackling a number
of
different programming problems.
The book does not include complete applications or libraries of
source
code. It is a book to be worked through. By the time you have
finished
it you should be a much better programmer. Let me warn you that it is
not a book to dip into in that odd spare moment. If that is all you
have time for then go and do something else.
On the other hand it is not like some of the books above that will
take you years to fully grasp (if ever). Buy this book, set aside a
regular time to work at it, stick to your routine and find yourself
becoming far more professional in your programming.
Yes, I like it and it is not machine dependant. For once I am glad
that the supporting discs are relatively expensive ($39.95 in IBM and
Apple Mac formats) as I think that you will only get the full benefit
by grafting at the keyboard yourself.
--------------------------------------------------------------------------------
Last Update - 13 May 2001.
To link to this review, please use the URL:
http://www.accu.org/bookreviews/public/reviews/a/a000142.htm
Copyright © The Association of C & C++ Users 1998-2000. All rights
reserved.
Mirrored from http://www.accu.org/
>On Aug 29, 8:53�am, A.L. <alewa...@aol.com> wrote:
>> On Sat, 29 Aug 2009 07:10:59 -0700, JustBoo <B...@boowho.com> wrote:
>> >bolega wrote:
>> >> This braggart admits that he had to put this code in TWO books and
>> >> visit it twice to be explained. I am puting the excerpt from pp2-4
>> >> of this book and the C code.
>> >[...]
>>
>> >Well, witness yet another way, boy and girls, to market a book. We'll
>> >be seeing a lot of it from now on I'm sure. Meh. Enjoy.
>>
>> Very good way to market a book. Something criticized on c.l.l. must be
>> really good.
>>
>> Ordered from Amazon 5 minutes ago.
>>
>> A.L.
>
>please dont derail my thread, BUT, you ordered the wrong book. The one
>you SHOULD have ordered is this one:
>
>Book Review
>---------------------------------------------------------------------------�-----
>Advanced C Struct Programming by John W.L. Ogilvie
>Highly Recommended
>ISBN: 0-471-51943-X Publisher: Wiley Pages: 405pp
>Price: �22.95
I have this book already.
Regarding this one that I ordered - Amazon has 30 days return
policy...
A.L.
By the walks-like,swims-like,quacks-like theorem, I do see why you
think so, but I think we should be slow to make such accusations,
since they are so often wrong. I have never actually been accused of
being a sock puppet as far as I am aware, but I have certainly been
accused of running them (which I have never done, but of course those
who accuse me don't believe that).
People can and often do use similar styles, especially when responding
to people with whom they're perhaps not the best of friends. The
other day, an irregular poster, a guy - who'd been away for a while
and is unlikely to have become familiar with my occasional habit of
marking nilgewater with "nonsense snipped - nothing left" - made
precisely the same response. It would be easy to accuse /him/ of
being a sock puppet... and yet his name has been well-known in
another group for many years and he knows far more about that group's
subject than I do, and I vaguely recall having a few extended
disagreements with him in the past. He also posts from a different
continent, by the way.
Sock puppets undoubtedly happen, but they don't really matter because
what counts is not who says what, but what they say (and, to a
certain extent, how they say it). We might pay a lot of attention to
a name we recognise (e.g. if Donald Knuth started posting, we'd sit
up and take notice) - but *only* after it became obvious that it
wasn't some spotty teenager mucking about. And we judge that by
content. "By their fruits you shall know them" and all that.
In the current case, bolega (unknown name, to me at least) is slagging
off Brian Kernighan and Rob Pike (both names known to me) and their
book (which I've read). He starts with the word "braggart", and the
rest is similar ranting. It's just perfectly normal bilgewater of the
sort you often find on the Internet, but I don't see why we need to
accuse it of being nilgewater as well.
Just out of curiosity bolega, how could you know which book A.L.
ordered?
Regards,
Francesco
>In <h7cc1a$e6g$1...@enyo.uwa.edu.au>, Chris McDonald wrote:
<snip>
>In the current case, bolega (unknown name, to me at least) is slagging
>off Brian Kernighan and Rob Pike (both names known to me) and their
>book (which I've read). He starts with the word "braggart", and the
>rest is similar ranting. It's just perfectly normal bilgewater of the
>sort you often find on the Internet, but I don't see why we need to
>accuse it of being nilgewater as well.
Just so. The two styles are quite distinct. Generally speaking
those who use the internet for ranting have quite individual and
distinct styles. Thus Nilges is clearly well educated and well
read, using many post modern and progressive tropes along with
pop psychology. On the other hand Bolega (I thought it might be
a sausage but apparently it is a brand of socks) is less well
read and less literate.
Incidentally, in my opinion the term "troll" isn't appropriate
for such persons. Trolls are like malicious little children who
stir up trouble for the pleasure of seeing people get all
excited. These ranters have a different agenda.
Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
No one asks if a tree falls in the forest
if there is no one there to see it fall.
(re: the Beautiful Code RE matcher)
> The first problem is that by being so "simple" it fails to implement a
> recognizable "regular expression" processor.
False. It implements a /useful subset/ of REs and provides a framework
for adding more features -- it's an educational tool, not a library
component.
--
"I have travelled far and wide upon this journey." - The Reasoning,
/A Musing Dream/
Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN
Read this excellent article: Software Fault Prevention by Language
Choice: Why C is Not My Favorite Language, Richard Fateman. Fateman is
on the faculty of the rather C-centric Univ of California at Berkeley.
http://www.eecs.berkeley.edu/~fateman/papers/software.pdf.
My current language C Sharp addresses some but not all of Fateman's
points.
>
> > The first problem is that by being so "simple" it fails to implement a
> > recognizable "regular expression" processor.
>
> > I thought a second problem was that it used a value parameter as a
> > work area, which isn't something I would do in my languages of choice
> > (C Sharp and VB in recent years), but I found myself doing this in C
> > when I returned to the language mostly to kick the shit out it.
>
> > A third problem is praising a programmer for fast work. There's enough
> > slipshod work in our industry, and enough programmer overwork as it
> > is.
>
> > I think Brian Kernighan would be the first to admit that Lisp as
> > opposed to C made a more far-reaching contribution to computer science.- Hide quoted text -
>
> Actually, in C all they did was to borrow parens and braces from math
> to delimit blocks - BUT lisp already had the idea in the prefix
> notation. The other contribution is the for loop with all the loop
> elements at the top in the interface. The declaration syntax of
> pointers is poor and ugly. switch-case has fall thru. break and
> continue is new unless PL1 or BCPL had it.
>
> BUT YOU HAVE NOT SHOWN how to structure the recursion. Does it need
> NFA ? I need a tutorial because C is to stay and I can avoid some of
> its ugliness by using C++ as pretty C.
Is C here to stay?
No way.
In 2038, lemme tellya straight,
C's inability to get a date
Right will cause the thousand years of darkness.
Ever wonder why spam sorts high, and makes you cry?
You're looking at the answer, my oh my.
I'd ask you to lay off personalities. You people have taken entirely
too long to figure out that I'm a class act, and Bolega may very well
be a class act. I wouldn't myself start off by calling Kernighan a
braggart, but I did have problems with O'Reilly's selecting some
fairly humdrum examples of "Beautiful Code" in the book containing
Kernighan's code. I was offended by Kernighan's praising programmer
for taking an hour.
>
> Incidentally, in my opinion the term "troll" isn't appropriate
> for such persons. Trolls are like malicious little children who
> stir up trouble for the pleasure of seeing people get all
> excited. These ranters have a different agenda.
Thank you for clearing that up at long last. I really appreciate it.
Trolls don't admit they are wrong to their worst enemies, but
Heathfield, who knows C if little else that I can discern, did prove
to me (along with my own experiments) that all parameters in C are
really and truly call be value, period, and as soon as I realized it,
I admitted it. I still have a problem with the fact that I had to beat
the answer out of him as well as mess with C.
>
> Richard Harter, c...@tiac.nethttp://home.tiac.net/~cri,http://www.varinoma.com
Don't make mock of my patronym, punk.
>
> By the walks-like,swims-like,quacks-like theorem, I do see why you
> think so, but I think we should be slow to make such accusations,
> since they are so often wrong. I have never actually been accused of
> being a sock puppet as far as I am aware, but I have certainly been
> accused of running them (which I have never done, but of course those
> who accuse me don't believe that).
Wise words, for wisdom crieth in the streets. The only time I have
used a sock puppet was once on wikipedia.
>
> People can and often do use similar styles, especially when responding
> to people with whom they're perhaps not the best of friends. The
> other day, an irregular poster, a guy - who'd been away for a while
> and is unlikely to have become familiar with my occasional habit of
> marking nilgewater with "nonsense snipped - nothing left" - made
Don't make mock of my patronym or I will make mock of you as I have in
the past. You fuck with me I fight back, you dig? My relatives are war
heroes and professionals and they don't deserve this shit.
> precisely the same response. It would be easy to accuse /him/ of
> being a sock puppet... and yet his name has been well-known in
> another group for many years and he knows far more about that group's
> subject than I do, and I vaguely recall having a few extended
> disagreements with him in the past. He also posts from a different
> continent, by the way.
>
> Sock puppets undoubtedly happen, but they don't really matter because
> what counts is not who says what, but what they say (and, to a
> certain extent, how they say it). We might pay a lot of attention to
> a name we recognise (e.g. if Donald Knuth started posting, we'd sit
> up and take notice) - but *only* after it became obvious that it
> wasn't some spotty teenager mucking about. And we judge that by
> content. "By their fruits you shall know them" and all that.
>
> In the current case, bolega (unknown name, to me at least) is slagging
> off Brian Kernighan and Rob Pike (both names known to me) and their
> book (which I've read). He starts with the word "braggart", and the
> rest is similar ranting. It's just perfectly normal bilgewater of the
Don't make mock of my patronym, since it's an insult not so much to me
as to my relatives. If you must make mock, my first name is Edward and
my handle is spinoza1111.
My uncle, Edward Joseph NILGES, was killed in Italy in March 1945
leading Japanese American Nisei against the Germans. Don't mock him.
My father, Richard G Nilges, fought unethical practises in medicine.
Don't mock him.
Mock my first name or posting handle. Lay off "Nilges".
If by NFA you mean an NDFA (non-deterministic finite automaton), no.
If you go down that road, you need to convert the regex first to an
NDFA and then to a deterministic DFA by treating groups of possible
symbols to single "symbols". A problem I have with the article is that
it doesn't confront this need.
Aho Sethi et al.'s COMPILERS (Dragon book) addresses how to do this.
But for individual regexes it's straightforward to code them directly.
> On Aug 30, 2:30 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
<snip>
>>
>> In the current case, bolega (unknown name, to me at least) is
>> slagging off Brian Kernighan and Rob Pike (both names known to me)
>> and their book (which I've read). He starts with the word
>> "braggart", and the rest is similar ranting. It's just perfectly
>> normal bilgewater of the
>
> Don't make mock of my patronym
From Chambers: "*bilge* noun 1a the broadest part of a ship's bottom;
b (usually bilges) the lowermost parts on the inside of a ship's
hull, below the floorboards. 2 (also bilge-water) the dirty water
that collects in a ship's bilge. 3 /rather dated colloq/ rubbish,
nonsense, drivel."
I can see why you might object to the use of the neologism
"nilgewater", but I see no sensible reason for objecting to the
perfectly normal word "bilgewater". One might reasonably object to a
particular application thereof in a given context, but the word
itself is surely unobjectionable.
<snip>
A few comments:
1) That they totally mis-use the word "troll" is, of course, legion. It
wouldn't be such a big deal if it weren't for how obsessively anal they
are about words here. I.e., the hypocrisy is self-evident.
1a) It has certainly become common usage over the past decade or so to
use the word "troll" as a synonym for "people we don't like" (where
"we" is interpreted to mean "the Establishment"). This makes it no
less wrong, of course, but people would be more inclined to let it
pass if it weren't for the "clc personality type" (described above).
2) I have to admit also that Dicky was right about C's parameter passing
being exclusively call by value. I will further say that this fact
always impressed me - that everything is "by value". That is, in the
context of the minimalist/portable-assembler kind of language that C is.
2a) Nonetheless, that fact is that, in all but the most obscuirely
CLC-sense, it is true that C implements call-by-reference via pointers.
To argue otherwise, is just being a jerk, and I think we all
recognize that. It is the CLC disease to be technically right, but
so obviously wrong, at the same time.
which displays in its entirety a lisp program remarkable (in my
opinion)
for its brevity and generality.
Somehow I think this thread is not about lisp or regular expressions,
but I can't figure out what it IS about, so apologies in advance if my
response is off topic. I'm just sort of going by the subject line.
RJF
>In <04b443ab-b17b-4ec9...@i8g2000pro.googlegroups.com>,
>spinoza1111 wrote:
>
>> On Aug 30, 2:30 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
><snip>
>>>
>>> In the current case, bolega (unknown name, to me at least) is
>>> slagging off Brian Kernighan and Rob Pike (both names known to me)
>>> and their book (which I've read). He starts with the word
>>> "braggart", and the rest is similar ranting. It's just perfectly
>>> normal bilgewater of the
>>
>> Don't make mock of my patronym
>
>From Chambers: "*bilge* noun 1a the broadest part of a ship's bottom;
>b (usually bilges) the lowermost parts on the inside of a ship's
>hull, below the floorboards. 2 (also bilge-water) the dirty water
>that collects in a ship's bilge. 3 /rather dated colloq/ rubbish,
>nonsense, drivel."
>
>I can see why you might object to the use of the neologism
>"nilgewater", but I see no sensible reason for objecting to the
>perfectly normal word "bilgewater". One might reasonably object to a
>particular application thereof in a given context, but the word
>itself is surely unobjectionable.
>
><snip>
Can we, at least, object to the making of a single word noun from a
two word "adjective noun" combination?
English already has more than 1.2 million perfectly good words. Let's
all try using them before inventing new ones.
George
Even though I'm quite sure the original post really was some close
relative of trolling, let's dissect the code itself, and see if it
really can be improved a great deal:
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
If an RE starts with '^', it has to match at the beginning of the
text, so we check only for a match of the remainder of the RE,
starting from the beginning of the text.
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
Otherwise, we step through each possible position in the text, and
check for a match at that position.
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
If we've reached the end of the RE, return 1, indicating we found a
match.
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
If we find a '*', use 'matchstar' to check for a match.
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
If the RE ends in '$', we have a match only if we've reached the end
of the text.
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
If we're at the end of the text, but there's more to the RE, we don't
have a match. Otherwise, if the current character of the RE is '.',
OR it's any other (non-meta-) character that matches the current
character in the text, we have a match iff the remainder of the RE
matches the remainder of the text.
return 0;
Otherwise (e.g. we were at the end of the text, but not the end of
the RE) we do not have a match.
So far, it's simple and straightforward -- we've simply enumerated
the possibilities, and handled each in a fairly straightforward
manner.
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
Here's where things get a little bit tricky. Recall that matchstar()
is called when matchhere() encountered an RE something like 'x*E'. It
calls matchstar(), passing the 'x' as the first parameter, "E" as the
second, and the text as the third (where 'x' is some single
character, and "E" is an arbitrary RE).
matchstar() then says that such an RE matches if 1) 'x' matches some
number of characters in text, followed by E matching the remainder of
the text.
return 0;
If we fall out of the loop, we either reached the end of the text
without finding a match for the remainder of the RE, or else because
we found a mismatch between 'x' and the current character in the RE
before finding a match between the remainder of the RE and the
remainder of the text.
As to the original implication that the code is particularly
difficult to read or understand, I'd have to say no, not really. In
fact, much like most well written recursive code, it's structured
very much like an inductive proof: establish an invariant for what's
currently being considered, then show that the invariant remains true
for the remainder.
In this case, it does that by (mostly in matchhere) checking whether
the current item in the RE matches the current character in the text,
and if it does recursing to establish whether the remainder of the RE
matches the remainder of the text.
It's also rather similar to rather a large amount of Lisp code. The
primary difference is that in Lisp, the primary data structure is
typically a list, whereas here it's a string (or at least C's rather
limited version of a string). Nonetheless the similarity in usage is
striking. In essence, the code works primarily in terms of a CAR and
a CDR. The code examines the CAR directly, and assuming that's
successful, examines the CDR via a recursive call.
For most practical purposes, this _is_ Lisp code that happens to use
a rather strange syntax.
--
Later,
Jerry.
[snip]
> As to the original implication that the code is particularly
> difficult to read or understand, I'd have to say no, not really. In
> fact, much like most well written recursive code, it's structured
> very much like an inductive proof: establish an invariant for what's
> currently being considered, then show that the invariant remains true
> for the remainder.
>
> In this case, it does that by (mostly in matchhere) checking whether
> the current item in the RE matches the current character in the text,
> and if it does recursing to establish whether the remainder of the RE
> matches the remainder of the text.
[snip]
I can only agree with the positive judgment.
After having fiddled quite a bit with the code, having it translated
to C++ in a straightforward manner (replacing "char*" with "const
string&"), having profiled this first version and found it to be 1000+
times slower than the original, having created a second version with
the help of a class that works on const strings and passes only
indexes around, having finally found this second version to be ~100
times slower than the original, having, after all, considered the
original to be quite clear to read and easy to extend, I must conclude
that, unless considering a completely different approach to the task,
the original C code is really, really good for its purposes.
Maybe someone could think that the C syntax along with its pointers is
awful, but I like it.
I suppose it's matter of tastes, after all.
Just my two cents,
have good time,
Francesco
Well, Dickon,
What think'st thou Norfolke.
Nor. A good direction warlike Soueraigne,
This found I on my Tent this Morning.
Iockey of Norfolke, be not so bold,
For Dickon thy maister is bought and sold.
I still object to "bilge", etc., because you are like a child making
fun of my patronym.
You are a dull Fellowe
With a streak of bright Yellowe
Who when he loses an argument
Says 'tis not what I meant,
And who makes a child's game
Out of a person's father's name
Because unlike myself
Thou hast no Pelf
In the wealthe that lieth in words
Your posties they are for the birds.
A Yank can yank your chaine
And cause you paine, knowing he
More than you about your paternitie,
The English language you have lost.
Jockey of Heathfield, be not so bold
We're tired of you, and you're gettin' old.
<snip>
> I still object to "bilge", etc., because you are like a child making
> fun of my patronym.
I thought your patronym was Nilges, not Bilges. Was that a mistake?
I am curious, so you are saying that one should not be impressed by c
and K+R and plan9 and all the rest? That they kind of suck and the
mccarthy really is much better and his lisp can do things much more
simply in fewer line of code?
reality: show me the apps!! apps I am using now: windows7[c?],
firefox3[c], pidgin char[gtk+c?], opera browser[c?], and vlc media
player[c] seem all to be made in c, so where are the lisp replacements
and I shall happily use them, esp if they are more felxible and fast
enuf.
forth for the win!
inyego montoya, you killed my father! you are about to die!
everyone knows this
just as in forth one should learn the base word list before making a
word to do something from lots fo base words that does the same as a
low level word
are you talking about using a perl compat regex package in lisp? or
doing regex natively in lisp somehow?
wow you brits crack me, no wonder been at war for 300 years
"He" has the need to explain it because its goal is to provide advice
and guidance, ie. be educational.
Now go fetch Lisp device drivers.
Surely you're not denying that "bilgewater" is being used because of the
similarity of Nilges and Bilge?
However, he's mistaken in thinking that you're making fun of his
patronym; it's the comparison between the qualities of his writing the
qualities of bilgewater that provides the humor, and the (well deserved)
insult. There's nothing particularly humorous, in itself, about the
similarity between Nilges and bilge.
Hey, Reading Rainbow, I'm a Yank, and it says so in the post. I am
carrying the battle to the enemy as did John Paul Jones when he
invaded Britain during the Revolutionary war.
<snip>
> There's nothing particularly humorous, in itself,
> about the similarity between Nilges and bilge.
No, it isn't even remotely amusing.
> On Sep 1, 4:58О©╫am, spinoza1111 <spinoza1...@yahoo.com> wrote:
>>
>> Read this excellent article: Software Fault Prevention by Language
>> Choice: Why C is Not My Favorite Language, Richard Fateman. Fateman is
>> on the faculty of the rather C-centric Univ of California at Berkeley.
>> http://www.eecs.berkeley.edu/~fateman/papers/software.pdf.
>
> forth for the win!
The world still waits for you to write any program longer than 8 lines,
in any language, be it Forth, Lisp or Scheme.
--
CE3OH...
The dude from Zeppelin?!
--
appendix
Oh never mind...eye roll...deep, deep sigh...crotch grab...
>
> --
> appendix- Hide quoted text -
How about 26000 lines in Visual Basic? 50000 in Cobol? 75000 in
assembler? 25000 in PL/I? 20000 in C sharp? 10000 in C? 1000 in TI-79
machine language, a compiler and run time in 1K for Mouse, a Forth
knockoff?
>
> --
> CE3OH...
So why do you make a joke of it? Is it perhaps because you're a feeb
and this is the only reply you can make to my verse?
> On Sep 2, 6:11 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>> In <h7lg4v$sh...@news.eternal-september.org>, James Kuyper wrote:
>>
>> <snip>
>>
>> > There's nothing particularly humorous, in itself,
>> > about the similarity between Nilges and bilge.
>>
>> No, it isn't even remotely amusing.
>
> So why do you make a joke of it?
My use of the word "bilgewater" (or, more properly, "bilge-water",
according to Chambers) was not intended to be a joke.
> Is it perhaps because you're a feeb
> and this is the only reply you can make to my verse?
I wouldn't call that "verse". It wouldn't even qualify as doggerel.
Of course not. You grind on without levity and at best create
confusion. We have to beat answers out of you since you insist on
being literally "right" at all times.
>
> > Is it perhaps because you're a feeb
> > and this is the only reply you can make to my verse?
>
> I wouldn't call that "verse". It wouldn't even qualify as doggerel.
Sorry, but of verse you are no judge
Any more than any packer of fudge. Stick with C:
You have no poetry.
> On Sep 2, 10:44 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>> In
>>
<fd5edcf0-86d1-4f52-b8f5-6fb182b1a...@f18g2000prf.googlegroups.com>,
>>
>> spinoza1111wrote:
>> > On Sep 2, 6:11 pm, Richard Heathfield <r...@see.sig.invalid>
>> > wrote:
>> >> In <h7lg4v$sh...@news.eternal-september.org>, James Kuyper
>> >> wrote:
>>
>> >> <snip>
>>
>> >> > There's nothing particularly humorous, in itself,
>> >> > about the similarity between Nilges and bilge.
>>
>> >> No, it isn't even remotely amusing.
>>
>> > So why do you make a joke of it?
>>
>> My use of the word "bilgewater" (or, more properly, "bilge-water",
>> according to Chambers) was not intended to be a joke.
>
> Of course not. You grind on without levity and at best create
> confusion.
So first you accuse me of trying to be funny, and then you accuse me
of trying *not* to be funny. It becomes evident, therefore, that you
are only really interested in criticising me, no matter what I do.
There is obviously no point, therefore, in paying any attention to
any of your criticisms.
> We have to beat answers out of you
On the contrary, I'm generally happy to provide answers, when I know
them. I don't know /all/ the answers, however.
> since you insist on
> being literally "right" at all times.
It is very clear that you have not succumbed to the same temptation.
<nonsense snipped>
> [...] It is my belief and
> observation on a lot of problems by these so called "demi gods" that
> they are actually all average and no more intelligent. Its all that
> they got some opportunities to study some things at opportune time and
> facilities and also spent a lot of time in a productive environment
> and team.
it's a *bad* thing that they spent time in a productive team?
> I know that lisp eval is written more clear than this recursion below
> because I am able to read it easily. and that code is almost self
> explanatory. C is more quirky. When you really mean recursively call
> another function, you are using return so you can have tail
> recursion ???? .
ah, what? In what sense is C's recursion any more difficult that
any other language's recursion?
<snip>
> This braggart admits that he had to put this code in TWO books and
> visit it twice to be explained. I am puting the excerpt from pp2-4 of
> this book and the C code. The C code will become indented and syntax
> highlighted once you paste in emacs etc. It is my belief and
> observation on a lot of problems by these so called "demi gods" that
> they are actually all average and no more intelligent. Its all that
> they got some opportunities to study some things at opportune time and
> facilities and also spent a lot of time in a productive environment
> and team.
>
> I know that lisp eval is written more clear than this recursion below
> because I am able to read it easily. and that code is almost self
> explanatory. C is more quirky. When you really mean recursively call
> another function, you are using return so you can have tail
> recursion ???? .
>
> Anyway, its your chance to show how clear C/C++/lisp/Scheme code you
> can write that is clearer. Also, i dont exclude pseudocode but it
> should be clear enough to be instantly translatable to a programming
> language. The real goal is to learn how to write or DERIVE recursion,
> how to enumerate cases, order them, and build recursion. You may even
> put some introductory tuturial and dont have to reply here in ascii
> but can upload a pdf at some link in rapidshare etc. Look how he put
> the code after problem statement and then has the need to explain it.
> Ideally, the discussion should be such that the student or reader
> himself jumps to the solution. That is why I give these unix school of
> pike/thomson/kernighan low grade in almost all their expositions
> except that they wrote earliest books to make millions of dollars in
> royalties and since now they are nobody due to linux, they are poorly
> regurgitating old material.
>
> Enjoy .............
>
> ============================
>
> The Practice of Programming
>
> In 1998, Rob Pike and I were writing The Practice of Programming
> (Addison-Wesley). The
> last chapter of the book, “Notation,” collected a number of examples
> where good notation
> led to better programs and better programming. This included the use
> of simple data specifications
> (printf, for instance), and the generation of code from tables.
> Because of our Unix backgrounds and nearly 30 years of experience with
> tools based on
> regular expression notation, we naturally wanted to include a
> discussion of regular
> expressions, and it seemed mandatory to include an implementation as
> well. Given our
> emphasis on tools, it also seemed best to focus on the class of
> regular expressions found in
> grep—rather than, say, those from shell wildcards—since we could also
> then talk about the
> design of grep itself.
> The problem was that any existing regular expression package was far
> too big. The local
> grep was over 500 lines long (about 10 book pages) and encrusted with
> barnacles. Open
> source regular expression packages tended to be huge—roughly the size
> of the entire
> book—because they were engineered for generality, flexibility, and
> speed; none were
> remotely suitable for pedagogy.
> I suggested to Rob that we find the smallest regular expression
> package that would illustrate
> the basic ideas while still recognizing a useful and nontrivial class
> of patterns. Ideally,
> the code would fit on a single page.
> Rob disappeared into his office. As I remember it now, he emerged in
> no more than an
> hour or two with the 30 lines of C code that subsequently appeared in
> Chapter 9 of The
> Practice of Programming. That code implements a regular expression
> matcher that handles
> the following constructs.
>
> Character Meaning
> c Matches any literal character c.
> . (period) Matches any single character.
> ^ Matches the beginning of the input string.
> $ Matches the end of the input string.
> * Matches zero or more occurrences of the previous character.
>
> This is quite a useful class; in my own experience of using regular
> expressions on a day-today
> basis, it easily accounts for 95 percent of all instances. In many
> situations, solving the
> right problem is a big step toward creating a beautiful program. Rob
> deserves great credit
> for choosing a very small yet important, well-defined, and extensible
> set of features from
> among a wide set of options.
> Rob’s implementation itself is a superb example of beautiful code:
> compact, elegant,
> efficient, and useful. It’s one of the best examples of recursion that
> I have ever seen, and it
> shows the power of C pointers. Although at the time we were most
> interested in conveying
> the important role of good notation in making a program easier to use
> (and perhaps
> easier to write as well), the regular expression code has also been an
> excellent way to
> illustrate algorithms, data structures, testing, performance
> enhancement, and other
> important topics.
>
> Implementation
> In The Practice of Programming, the regular expression matcher is part
> of a standalone program
> that mimics grep, but the regular expression code is completely
> separable from its
> surroundings. The main program is not interesting here; like many Unix
> tools, it reads
> either its standard input or a sequence of files, and prints those
> lines that contain a match
> of the regular expression.
> This is the matching code:
> /* match: search for regexp anywhere in text */
> int match(char *regexp, char *text)
> {
> if (regexp[0] == '^')
> return matchhere(regexp+1, text);
> do { /* must look even if string is empty */
> if (matchhere(regexp, text))
> return 1;
> } while (*text++ != '\0');
> return 0;
> }
> /* matchhere: search for regexp at beginning of text */
> int matchhere(char *regexp, char *text)
> {
> if (regexp[0] == '\0')
> return 1;
> if (regexp[1] == '*')
> return matchstar(regexp[0], regexp+2, text);
> Character Meaning
> c Matches any literal character c.
> . (period) Matches any single character.
> ^ Matches the beginning of the input string.
> $ Matches the end of the input string.
> * Matches zero or more occurrences of the previous character.
> 4 C H A P T E R O N E
> if (regexp[0] == '$' && regexp[1] == '\0')
> return *text == '\0';
> if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
> return matchhere(regexp+1, text+1);
> return 0;
> }
> /* matchstar: search for c*regexp at beginning of text */
> int matchstar(int c, char *regexp, char *text)
> {
> do { /* a * matches zero or more instances */
> if (matchhere(regexp, text))
> return 1;
> } while (*text != '\0' && (*text++ == c || c == '.'));
> return 0;
> }
Arc:
(def match (regex-str text-str)
(with (regex (coerce regex-str 'cons)
text (coerce text-str 'cons))
(if (is #\^ (car regex))
(match-here (cdr regex) text)
(or
(reclist (fn (xs) (match-here regex xs)) text)
(match-here regex '())))))
(def match-here (regex text)
(or
(empty regex)
(and (is #\* (cadr regex))
(match-star (car regex) (cddr regex) text))
(and (is #\$ (car regex)) (single regex) (empty text))
(and (or (and (acons text) (is #\. (car regex)))
(is (car regex) (car text)))
(match-here (cdr regex) (cdr text)))))
(def match-star (chr regex text)
(or
(match-here regex text)
(if (and (acons text) (or (is chr (pop text)) (is chr #\.)))
(match-star chr regex text)
nil)))
Jarc> (match "ob" "foobar")
t
Jarc> (match "ab" "foobar")
nil
Jarc> (match "fob" "foobar")
nil
Jarc> (match "fo*b" "foobar")
t
Jarc> (match "f.*b" "foobar")
t
Jarc> (match "^f.*b" "foobar")
t
Jarc> (match "f.*b$" "foobar")
nil
Jarc> (match "b.r$" "foobar")
t
Jarc> (match "^foobar$" "foobar")
t
Jarc> (match "" "foobar")
t