Why no reg exp in Forth?

Albert van der Horst

Jul 31, 2002, 5:48:02 PM

Why don't we have a nice package for regular expressions in Forth?

I finally seem to understand why.
The reason is implementation problems. I just don't grok all the
talk about "not useful" or "not Forth like" or "Gray is better".

Let us look at the traditional way to handle regular expressions.

Say we have
[abc]*b
For those not in the know: this matches
a sequence of characters matching 'a' or 'b' or 'c' followed by
'b'.

So the following strings are matched:
b ab bb cb cbab

Note that in matching "bb" we first match "bb" to "[abc]*". Then we discover
that the string is exhausted so we retry by only matching "b" to "[abc]*".
This is called backtracking. Because this solves a real problem for you,
implementing it presents a real problem too.
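
A minimal sketch of the backtracking just described, hand-coded for the single pattern [abc]*b rather than for regular expressions in general. The word names (abc?, span-abc, tail-b?, match-abc*b) are made up for illustration, and the code assumes WITHIN, TRUE and FALSE from the Core extension word set and /STRING from the String word set.

```forth
\ Hypothetical hand-coded matcher for [abc]*b (whole-string match).

: abc? ( c -- flag )                \ true if c is 'a', 'b' or 'c'
  [CHAR] a [CHAR] c 1+ WITHIN ;

: span-abc ( c-addr u -- n )        \ length of the longest leading run of [abc]
  0 >R
  BEGIN DUP 0> WHILE
    OVER C@ abc? 0= IF 2DROP R> EXIT THEN
    1 /STRING  R> 1+ >R
  REPEAT 2DROP R> ;

: tail-b? ( c-addr u -- flag )      \ true if the string is exactly "b"
  1 = IF C@ [CHAR] b = ELSE DROP FALSE THEN ;

: match-abc*b ( c-addr u -- flag )
  2DUP span-abc                     \ greedy: let [abc]* take all it can
  BEGIN
    >R 2DUP R@ /STRING tail-b?      \ does the rest match the final b?
    IF 2DROP R> DROP TRUE EXIT THEN
    R> DUP 0>                       \ anything left to give back?
  WHILE
    1-                              \ backtrack: [abc]* gives up one character
  REPEAT
  DROP 2DROP FALSE ;
```

For "bb" the greedy run first takes both characters, the test for the final b fails on the empty rest, and the retry with a run of one character succeeds.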

Backtracking is handled in FORTRAN (Kernighan and Pike) by first translating the
reg exp into a buffer with mixed content (tokens, counts and characters).

token: CHARACTER CLOSURE
count: 3
char: a
char: b
char: c
token: CHARACTER EXACT MATCH
char: b

Afterwards this buffer is interpreted, during the checking of the
target string.

In Forth of course we have but one interpreter so
naturally this translates in Forth into :

xt for CHARACTER CLOSURE
count: 3
char: a
char: b
char: c
xt for: CHARACTER EXACT MATCH
char: b
xt for EXIT

HOWEVER!
This is a design error of the worst kind, and everybody going there
gets into problems without ever recovering.
On the other hand nobody ever seems to try out a different way to do it.

The reason is the following:
1. we have to handle the inline data, which is inhomogeneous as well
2. for backtracking purposes we must remember (in this example) the
position of "xt for: CHARACTER EXACT MATCH" to execute this and the following
code again and again.
3. for backtracking purposes we must remember where we started matching,
probably on the return stack.

These three uses of the return stack (two of them non portable at that!) get
entangled to the point of no return.

Bottom line: no regular expressions in Forth.

----------------------------------------------------------------------------

Interestingly, a straightforward port of my c++-code for regular expressions
appeared quite feasible (thus far, i.e. short of debugging).

This will lead to code in the following style that many will find
offensive. However it only requires CORE set words.
All the IF branches put a backtrack-to-where? pointer on the return stack.
So replacing the multiple IF's with a jump table is not straightforward.

\ **************************************************************************
\ WARNING: THIS IS NOWHERE NEAR WORKING CODE
\ IT IS THE RESULT OF A CONVERSION FROM c++ TO SEE HOW IT LOOKS
\ THE c++-CODE IS NOT STANDARD REG EXP TO BEGIN WITH
\ **************************************************************************

2 CONSTANT CCHAR \ Specific character
4 CONSTANT CANY \ Any character
6 CONSTANT CSET \ A character set
8 CONSTANT CSETC \ A character set, characterized by its complement
01 CONSTANT CLOS \ OR this in to get the closure of one of the above
10 CONSTANT CEOF \ Marks the separation between fields
12 CONSTANT CSYNC \ Match as few characters as possible
0 CONSTANT CEND \ Marks the end of the regular expression

\ Layout of expression in ``patternE''.
\ NOTE: THIS IS A SEQUENTIAL LAYOUT!
\ It consists of the following elements (separated by ',' | is bitwise or )
\ CCHAR c , CLOS|CCHAR c , CANY , CLOS|CANY , CEOF #field ,
\ CSET count (count-1)*c , CSETC count (count-1)*c ,
\ CLOS|CSET count (count-1)*c , CLOS|CSETC count (count-1)*c ,
\ CSYNC , CEND
\ ``c'' means any character except null ('\000'). Null is not allowed,
\ because it must not be present in the expression to be matched, it is
\ needed as a terminator. ``count'' and ``#field'' are integers.
\ #field 0 matches the beginning of the record (say '^') and 1 ..
\ matches the field separators ('$' or implied in '@'). The last field
\ separator is the CEOF 0 of the next record.
\ CEOF and CEND are restricted
\ Note: the position in ``patternE'' determines whether it is a
\ ``charclass'', a count or a character. If sequentially parsed this
\ does not lead to ambiguity despite overlapping ranges
\ (All unsigned char's are acceptable in the expression as c.)

\ ***************************************************************************
\ Returns "the string at ``lp'' matches the whole of the regular
\ expression ``ep''"
\ ***************************************************************************

: matchE ( CHAR *lp VARIABLE *ep -- fl )
\ Each pass through this loop handles an element of the compiled
\ expression (where ``ep'' points), including its closure (if any).
\ Returns as soon as a mismatch is detected.
BEGIN
COUNT
DUP CEND = IF DROP 2DROP TRUE EXIT
ELSE DUP CEOF = IF DROP
OVER >R COUNT R> !SEPPOSITION
0 >R
ELSE DUP CCHAR = IF DROP
\ Special action needed for matchee exhausted
COUNT >R SWAP COUNT R> <> IF 2DROP FALSE EXIT THEN
SWAP
0 >R
ELSE DUP CCHAR CLOS OR = IF DROP
OVER >R
COUNT >R SWAP
BEGIN DUP C@ R@ = WHILE 1+ REPEAT RDROP
ELSE DUP CANY = IF DROP
SWAP COUNT 0= IF 2DROP FALSE EXIT THEN
SWAP
0 >R
ELSE DUP CSYNC = IF DROP
\ Special action needed for matchee exhausted
BEGIN OVER C@ WHILE
\ Adaption for 3 param matchE
2DUP RECURSE IF 2DROP TRUE EXIT THEN
SWAP 1+ SWAP
REPEAT
2DROP FALSE EXIT
ELSE DUP CSET = IF DROP
>R
R@ C@ ismemberofsetE 0= IF RDROP 2DROP FALSE EXIT THEN
COUNT + R> SWAP
0 >R
ELSE DUP CSET CLOS OR = IF DROP
OVER >R
>R
BEGIN DUP R@ C@ ismemberofsetE WHILE R> 1+ >R REPEAT
COUNT + R> SWAP
ELSE DUP CSETC = IF DROP
\ Special action needed for matchee exhausted
OVER C@ 0= IF 2DROP FALSE EXIT THEN
>R
R@ C@ ismemberofsetE IF RDROP 2DROP FALSE EXIT THEN
COUNT + R> SWAP
0 >R
ELSE DUP CSETC CLOS OR = IF DROP
OVER >R
>R
BEGIN DUP R@ C@ ismemberofsetE 0= WHILE R> 1+ >R REPEAT
COUNT + R> SWAP
THEN THEN THEN THEN THEN THEN THEN THEN THEN THEN

\ `lp' points to the first non-matching character
\ At this point we have matched as many patterns as possible
\ Backtrack if we have matched too much
R@ IF
BEGIN
\ Adaption for 3 param matchE
2DUP RECURSE IF RDROP 2DROP TRUE EXIT THEN
OVER R@ > WHILE
SWAP 1- SWAP
REPEAT
2DROP RDROP FALSE EXIT
THEN
RDROP
AGAIN
;

\ **************************************************************************
\ WARNING: THIS IS NOWHERE NEAR WORKING CODE
\ IT IS THE RESULT OF A CONVERSION FROM C TO SEE HOW IT LOOKS
\ THE C-CODE IS NOT STANDARD REG EXP TO BEGIN WITH
\ **************************************************************************
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
To suffer is the prerogative of the strong. The weak -- perish.
alb...@spenarnc.xs4all.nl http://home.hccnet.nl/a.w.m.van.der.horst

Marcel Hendrix

Aug 2, 2002, 12:49:24 AM

Albert van der Horst writes:

> Why don't we have a nice package for regular expressions in Forth?

Because you don't take the trouble to search for it?

The iForth distribution has the Hawk language implementation which sports
regexps and associative arrays. And there is FOSM. And our Australian friend
whose name escapes me (Monty Python: it must be a Bruce) had also a lot of
postings on it 1 or 2 years ago.

But a certain hot-headed ignorance of the past is of course necessary to
embark on any project :-)

-marcel

Gary Chanson

Aug 2, 2002, 1:24:16 AM

"Marcel Hendrix" <m...@iaehv.iae.nl> wrote in message
news:obo29.5998$Q4.6...@typhoon.bart.nl...

Quest32 has a regular expression engine, too (both a library and in the
editor).

--

-Gary Chanson (MVP for Windows SDK)
-Software Consultant (Embedded systems and Real Time Controls)
-gcha...@mvps.org

-Abolish public schools

Guido Draheim

Aug 2, 2002, 10:40:16 AM

Marcel Hendrix wrote:

I did find FOSM and dumped it for being too unusual in syntax.

Apart from iForth specials or Quest32 specials, is there a portable
regex package for forth that comes to your mind, and one that
derives from some established regex syntax? That would be fantastic!

Wil Baden

Aug 2, 2002, 12:23:20 PM

In article <3D4A99D0...@gmx.de>, Guido Draheim
<guidod...@gmx.de> wrote:

> is there a portable
> regex package for forth that comes to your mind, and one that
> derives from some established regex syntax? That would be fantastic!

ThisForth has standard regexpr. It is Tatu Ylonen's free 1991 version
in C with Forth linkage. Any Forth-in-C can easily have standard
regexpr.

I thought that it would be fantastic.

However my experience showed that generalized SKIP and SCAN is more
powerful. For example: Text to HTML; HTML to text.

My basic functions are:

SKIP[ <character test> ]SKIP
SCAN[ <character test> ]SCAN
BACK[ <character test> ]BACK
Is-Upper
Is-Lower
Is-Alpha
Is-Digit
Is-Alnum
Is-White
View-Next-Line
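
A rough sketch of what a generalized SKIP and SCAN can look like when the character test is passed as an execution token. This is not the SKIP[ ... ]SKIP syntax listed above; the names skip-with, scan-with, is-digit and is-white are hypothetical, and WITHIN (Core extension) and /STRING (String word set) are assumed to be available.

```forth
: is-digit ( c -- flag )  [CHAR] 0 [CHAR] 9 1+ WITHIN ;
: is-white ( c -- flag )  BL 1+ U< ;     \ blank or any control character

\ Drop leading characters for as long as the test xt returns true.
: skip-with ( c-addr u xt -- c-addr' u' )
  >R
  BEGIN DUP 0> WHILE
    OVER C@ R@ EXECUTE 0= IF R> DROP EXIT THEN
    1 /STRING
  REPEAT R> DROP ;

\ Drop leading characters until the test xt returns true.
: scan-with ( c-addr u xt -- c-addr' u' )
  >R
  BEGIN DUP 0> WHILE
    OVER C@ R@ EXECUTE IF R> DROP EXIT THEN
    1 /STRING
  REPEAT R> DROP ;
```

With these, S" 123 abc" ' is-digit skip-with leaves the string " abc", and a following ' is-white skip-with leaves "abc".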

--
Wil Baden

Elizabeth D. Rather

Aug 2, 2002, 3:22:07 PM

Albert van der Horst wrote:

> Why don't we have a nice package for regular expressions in Forth?
>
> I finally seem to understand why.
> The reason is implementation problems. I just don't grok all the
> talk about "not useful" or " not Forth like" or "Gray is better"

I'm happy to hear from some other posts in this thread that there are several
versions of regular expression parsers in Forth around.

As for us, we don't have a version in our products for the simple
reason that we've never encountered a need for it and our customers
have never asked for it (this is based on 30 yrs experience, primarily
in embedded systems but with some data base work along the way).

I'm certainly not saying that it "isn't useful", obviously there's a class
of problems for which it's quite useful. We just haven't run into it.
Our priorities must be to solve our customers' problems as effectively
as possible, and to provide tools relevant to solving those problems.

Perl, which has been much under discussion here, is a special-
purpose language for a certain class of problems. That's fine,
if you have that kind of problem you should probably use Perl.
I don't see the logic of complaining that Forth (which was also
designed to address a certain class of problems, specifically
embedded & real-time apps, although it's certainly been
found useful in other application domains) should be able
to reproduce the functionality of a language designed to
address a different, special class of problems.

Cheers,
Elizabeth

Chris Jakeman

Aug 2, 2002, 4:51:09 PM

On Fri, 02 Aug 2002 04:49:24 GMT, m...@iaehv.iae.nl (Marcel Hendrix)
wrote:

>
>Albert van der Horst writes:
>
>> Why don't we have a nice package for regular expressions in Forth?
>
>Because you don't take the trouble to search for it?
>
>The iForth distribution has the Hawk language implementation which sports
>regexps and associative arrays. And there is FOSM.

FoSM is neat because it is not a package layered on top of Forth as
you might expect but an extension of Forth as a pattern-matching
language. FoSM's patterns are Forth words that can be compiled,
executed and combined with ANS Forth words.

Bye for now ____/ / __ / / / / /
/ / / _/ / / / /
Chris Jakeman __/ / / __ / / /_/
/ / / / / / / \
[To reply, please __/ __/ ____/ ___/ __/ _\
unspam my address]
Forth Interest Group United Kingdom
Voice +44 (0)1733 753489 chapter at http://www.fig-uk.org

Jeff Fox

Aug 2, 2002, 5:37:10 PM

Albert van der Horst wrote:
> Why don't we have a nice package for regular expressions in Forth?
> I finally seem to understand why.
> The reason is implementation problems.

I don't think so.

> I just don't grok all the talk about "not useful" or
> "not Forth like" or "Gray is better"

We started out with a variant of the Gray parser for parsing HTML
files in our Forth browser at iTV. The second version was greatly
improved in performance, complexity, and size by removing the parser.
It was much better to treat HTML as Forth. It was "more Forth-like"
to throw out the stuff that was "not useful" for that problem
and use the stuff that was built into Forth for the purpose.
That seems to be the hardest thing for people to grok about Forth.

best wishes,
Jeff

Stephen J. Bevan

Aug 2, 2002, 6:23:28 PM

Jeff Fox <f...@ultratechnology.com> writes:
> We started out with a variant of the Gray parser for parsing HTML
> files in our Forth browser at iTV. The second version was greatly
> improved in performance, complexity, and size by removing the parser.
> It was much better to treat HTML as Forth.

Could you expand on what you mean by "treat HTML as Forth"? Do you
mean altering what the outer interpreter considers to be a word
terminator so that '>' and/or perhaps '<' are considered terminators
so that you can "parse" something like :-

<HTML><HEAD><TITLE>Example</TITLE></HEAD><BODY>Hello</BODY></HTML>

or do you use some other mechanism? If that is the case, or some
other use is made of the outer interpreter, I'm curious as to what
approach would be taken in ColorForth.

Jeff Fox

Aug 2, 2002, 7:28:05 PM

"Stephen J. Bevan" wrote:
> Could you expand on what you mean by "treat HTML as Forth"? Do you
> mean altering what the outer interpreter considers to be a word
> terminator so that '>' and/or perhaps '<' are considered terminators
> so that you can "parse" something like :-
>
> <HTML><HEAD><TITLE>Example</TITLE></HEAD><BODY>Hello</BODY></HTML>

Yes. A big problem in a browser is that most HTML pages have errors.
Different browsers handle them in different ways. Trying to apply
formal parsing methods to unravel the various errors was not as
nice as extending Forth to execute HTML to parse a file and format
a page for display.

> I'm curious as to what approach would be taken in ColorForth.

Chuck talked a bit about his vision of Forth Markup Language
in his chatroom interview this May. Interesting question.
http://www.ultratechnology.com/chatlog.htm

best wishes,
Jeff

Stephen J. Bevan

Aug 2, 2002, 11:24:48 PM

Jeff Fox <f...@ultratechnology.com> writes:
> > I'm curious as to what approach would be taken in ColorForth.
>
> Chuck talked a bit about his vision of Forth Markup Language
> in his chatroom interview this May. Interesting question.
> http://www.ultratechnology.com/chatlog.htm

There he mentions converting HTML->FML and then driving the display by
interpreting the FML. No mention of how the HTML->FML translation
would be done though. Assuming this is done using the same technique
as you used, then it would seem that something very much like the
outer interpreter would form part of HTML->FML converter. However,
Chuck's approach with ColorForth seems to be that the outer
interpreter is not needed at runtime -- most of its work having been
done by the editor. I'll be interested to hear what direction he
takes for HTML->FML.

a...@redhat.invalid

Aug 3, 2002, 9:36:08 AM

Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
> Why don't we have a nice package for regular expressions in Forth?

We do, as many have pointed out.

But that's not what I want to address here. As a result of your
posting I've been studying the pcre package and it strikes me that
although very nice and convenient, perhaps such a powerful regular
expression syntax is overkill for many Forth applications. pcre is
5000 lines of code. Maybe full regular expressions are an
over-generalized solution to many classes of parsing problems.

On the other hand, perhaps it's nice not to have to worry about such
things when I need a quick 'n dirty job done.

Andrew.

John Passaniti

Aug 3, 2002, 1:31:02 PM

"Elizabeth D. Rather" <era...@forth.com> wrote in message
news:3D4ADBDF...@forth.com...

> Perl, which has been much under discussion here, is
> a special-purpose language for a certain class of
> problems.

I wouldn't class Perl as a "special purpose language." Like any language,
it has strengths and weaknesses, but that alone doesn't make it a "special
purpose" language. Examples of what I would consider as special purpose
languages include things like SQL, Pilot, resource compilers for GUIs, JCL,
VHDL, tbl, pic, roff, and the endless "little languages" that we all come up
with to solve very specific problems.

I see Perl as a general purpose language that happens to have good support
for processing textual data. Because of that, it is commonly seen where
text processing is useful and interesting, such as CGI, system
administration, and increasingly bioinformatics. But there is more to the
language than processing text.

> I don't see the logic of complaining that Forth (which was
> also designed to address a certain class of problems,
> specifically embedded & real-time apps, although it's
> certainly been found useful in other application domains)
> should be able to reproduce the functionality of a language
> designed to address a different, special class of problems.

When Larry Wall first created Perl, he designed it as a language that he
thought was good for report generation and system administration tasks. He
took features from other languages (and a few new ideas), put it in a
blender, and came up with a language that for a few years was rarely used
for anything but the tasks he designed it for.

Several years later when the World Wide Web became a big deal, programmers
discovered that Perl is a fine language for doing CGI work. None of this
was planned or designed-- it just so happened that the text processing
features in Perl were a good match for CGI work. Now several years later,
people who work with bioinformatic databases (such as genetic researchers)
are finding that the language (and add on modules that are available) work
well for them.

Who knows who will use Perl 5, 10, or 20 years from now.

I frankly don't believe Forth was *designed* for the niches it now finds
itself in. Certainly my reading of Forth's early history doesn't suggest
that Mr. Moore (and you, or others) ever sat down and said, "I'm going to
*design* a language that will have limited applicability for these specific
application domains." I see no special constructs in Forth that are
specific to embedded work.

Instead, what happened was that programmers found Forth's attributes of
small size, speed, interactivity, and extensibility to be suitable for
embedded work. And like CGI programmers who latched on to Perl, embedded
programmers latched on to Forth.

You state that Forth, Inc. never had the need for regular expressions and
customers never asked you for it. The conclusion you drew from that was
that such a language feature wasn't important for your work. Fine-- but
another conclusion might be that Forth, Inc. has a reputation of working
primarily in certain application domains and so only draws customers that
have needs similar to other customers you have served. In other words,
"Forth is for embedded and real-time systems" became a self-fulfilling
self-selecting prophecy.

Maybe Forth, Inc. isn't interested in seeking out other application domains
where Forth can be useful. Or maybe there is a lack of vision here-- that
if Forth did have some language facilities programmers have found useful in
other application domains, then perhaps Forth wouldn't be marginalized to
embedded systems. Of course, we'll never know, because plenty of people
(including yourself) seem perfectly happy limiting Forth to specific
application domains.

Elizabeth D. Rather

Aug 3, 2002, 4:46:01 PM

John Passaniti wrote:

> "Elizabeth D. Rather" <era...@forth.com> wrote in message
> news:3D4ADBDF...@forth.com...
> > Perl, which has been much under discussion here, is
> > a special-purpose language for a certain class of
> > problems.
>
> I wouldn't class Perl as a "special purpose language." Like any language,
> it has strengths and weaknesses, but that alone doesn't make it a "special
> purpose" language. Examples of what I would consider as special purpose
> languages include things like SQL, Pilot, resource compilers for GUIs, JCL,
> VHDL, tbl, pic, roff, and the endless "little languages" that we all come up
> with to solve very specific problems.
>
> I see Perl as a general purpose language that happens to have good support
> for processing textual data. Because of that, it is commonly seen where
> text processing is useful and interesting, such as CGI, system
> administration, and increasingly bioinformatics. But there is more to the
> language than processing text.

> ...

>
> When Larry Wall first created Perl, he designed it as a language that he
> thought was good for report generation and system administration tasks. He
> took features from other languages (and a few new ideas), put it in a
> blender, and came up with a language that for a few years was rarely used
> for anything but the tasks he designed it for.
>
> Several years later when the World Wide Web became a big deal, programmers
> discovered that Perl is a fine language for doing CGI work. None of this
> was planned or designed-- it just so happened that the text processing
> features in Perl were a good match for CGI work. Now several years later,
> people who work with bioinformatic databases (such as genetic researchers)
> are finding that the language (and add on modules that are available) work
> well for them.

Ok, I think the only place we disagree is in the interpretation of "special
purpose language." What I meant by that is a language optimized for
certain classes of problems; your description of Perl above matches that
definition for me. I do not mean "limited to certain applications."

> ...

> I frankly don't believe Forth was *designed* for the niches it now finds
> itself in. Certainly my reading of Forth's early history doesn't suggest
> that Mr. Moore (and you, or others) ever sat down and said, "I'm going to
> *design* a language that will have limited applicability for these specific
> application domains." I see no special constructs in Forth that are
> specific to embedded work.

Well, I was there. Again, I think our definitions are differing, not the
facts. Forth wasn't designed to be "limited" in any way, but it
certainly was _optimized_ to solve certain problems found in the
embedded applications for which it was designed and developed.

> You state that Forth, Inc. never had the need for regular expressions and
> customers never asked you for it. The conclusion you drew from that was
> that such a language feature wasn't important for your work. Fine-- but
> another conclusion might be that Forth, Inc. has a reputation of working
> primarily in certain application domains and so only draws customers that
> have needs similar to other customers you have served. In other words,
> "Forth is for embedded and real-time systems" became a self-fulfilling
> self-selecting prophecy.
>
> Maybe Forth, Inc. isn't interested in seeking out other application domains
> where Forth can be useful. Or maybe there is a lack of vision here-- that
> if Forth did have some language facilities programmers have found useful in
> other application domains, then perhaps Forth wouldn't be marginalized to
> embedded systems. Of course, we'll never know, because plenty of people
> (including yourself) seem perfectly happy limiting Forth to specific
> application domains.

One of the basic principles in marketing for a small business with
limited resources is to identify a niche that is well-matched to your
strengths and focus on that niche. A marketing message that asserts
"My product [technology, ...] is good for everything" is actually a
weak and ineffective message, and attempting to field a product
that is attempting to be "all things to all people" is more likely to
produce a product that is only moderately good at anything, and
which has a poorly focused image to potential users.

FORTH, Inc. is very good at embedded & real-time applications.
We do not find that focus limiting, we find it presents a clear
picture of capabilities to potential customers. If someone who
is expert at another field (e.g. text processing) wants to tailor a
version of Forth to that and market it to folks in that arena that's
fine with me.

Cheers,
Elizabeth

Samuel A. Falvo II

Aug 3, 2002, 6:16:30 PM

ste...@dino.dnsalias.com (Stephen J. Bevan) wrote in message news:<m31y9gp...@dino.dnsalias.com>...

> There he mentions converting HTML->FML and then driving the display by
> interpreting the FML. No mention of how the HTML->FML translation
> would be done though. Assuming this is done using the same technique

The technique I would use to process HTML text is to treat it
character by character (because of the free-form formatting of an HTML
and XML source file, you have no other choice). It's important to
realize that you have two basic modes of operation inside an HTML
file: text and tag.

When "interpreting text," you basically pass each character you find
into a buffer belonging to the last opened tag's container. Thus,
some buffer management will be needed if you do this (even if it's
just ','). An alternative is to actually render the characters
directly to the screen, clipping permitting, of course. I like the
former approach better, as a tree containing the structure of the
document can be built, and processed afterwards.

When a < character is found, however, the text interpreter knows that
what follows is going to be a command of some sort (e.g., bold,
italics, definition of a table, etc). Thus, we parse the word from
the succeeding text, delimited by either space, a /, or >, and treat
that as a directly executable Forth word. Look it up in the
dictionary, and EXECUTE.

NOTE: It's important to note that when processing XML documents, words
that start with / will have the leading / in the name. Thus, <B> and </B>
will call B and /B, respectively. However, <P/> will **ONLY** call P
(because parsing stops at the /). In software that I'd write, it'd be
up to the implementation of P to properly handle the empty-container
condition.

To facilitate processing of arguments, enough input context is exposed
to the called words to be able to parse out the arguments.

Once the word corresponding to an HTML tag has been executed, control
is returned to the character interpreter where the cycle repeats.
Only when the end of the input file has been reached will the
character interpreter exit.

But, that's how I'd do it. That's not necessarily how Chuck or Jeff
would do it. :)

\ HYPOTHETICAL CODE FOR ILLUSTRATION PURPOSES ONLY

CREATE Context 2 CELLS ALLOT

: Context! ( caddr u -- )
Context 2DUP CELL+ ! NIP ! ;

: Context@
Context DUP @ SWAP CELL+ @ ;

: HashWord
\ If necessary, massage the name of the word here to include a standard
\ prefix or postfix, so that tag names are "unique" from normal Forth words.
\ We wouldn't want people to enter <BYE> in our XML and have it work, now
\ would we? :)
\
\ Alternatively, you could just search for the word in a special word-list
\ guaranteed to be separate from the normal Forth dictionary.
;

DEFER UnsupportedTag ( -- ) ( or define it here if you wish )

: XMLTag
ParseTag HashWord FIND IF EXECUTE ELSE UnsupportedTag THEN ;

DEFER Datum ( ch -- ) ( or define it here if you wish )

: Bump
Context@ 1- SWAP 1+ SWAP Context! ;

: XMLChar
Context @ C@ DUP [CHAR] < = IF DROP XMLTag ELSE Datum THEN Bump ;

: NotEOF?
Context CELL+ @ ;

: XML ( caddr u -- )
Context! BEGIN NotEOF? WHILE XMLChar REPEAT ;

--
Samuel A. Falvo II

Rick Hohensee

Aug 4, 2002, 12:15:43 AM

"Elizabeth D. Rather" <era...@forth.com> wrote in message news:<3D4ADBDF...@forth.com>...

"regular expression" is probably a common computing term
because Ken Thompson felt it was the right way to approach
search features in a text editor. The UNIX ed editor had
regexes when it was a PDP assembly program. Regexes are the
base of Chomsky's hierarchy of language-like constructs, and
as such can be viewed as fundamental for a lot of things.
Ken Thompson thought so. ed and its many offspring gave UNIX
and its offspring a well-deserved reputation for
text-processing prowess. I've been using ed today.
Regexes are also usable on binary data. My Linux distro
is dependent on ed's ability to do that, to convert anything
I import to my non-standard filenames. If you know how to use
the standard unix toolset you have a hard time justifying a
purpose-built DBMS.

Rick Hohensee

Bernd Paysan

Aug 3, 2002, 5:59:44 PM

a...@redhat.invalid wrote:

> pcre is
> 5000 lines of code. Maybe full regular expressions are an
> over-generalized solution to many classes of parsing problems.

gray.fs is only ~800 lines of code, and implements a recursive descent
parser. The BNF parser generator from Brad Rodriguez is a one-screener.
Since regexps are an under-powered solution to many classes of parsing
problems (due to the restriction of being finite state machines), I suppose
that people rather write an ad-hoc parser instead of using regexps.

Using a special ad-hoc parser is typically a better solution than using
something generated (from gray or as regular expression doesn't matter), if
you care about speed and size.

E.g. the HTML example from Jeff Fox. Yes, you probably can write down a
grammar for HTML in gray, but it's overkill. HTML has two mode switches: <
to switch from text to tag and > to switch from tag to text. That's it, or
most of it. Tags have a name, and optional parameters (attribute=value or
attribute="string", typically with the option that strings or values
without quotes are delimited by the next space or the tag end, so that
broken HTML code can be read by the browser, too).
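
The two mode switches can be sketched in a few lines. This is only an illustration of the state machine, not a browser: it ignores attributes, comments and entities, the names in-tag, text-char?, .text and text-length are invented for the example, and TRUE and FALSE are assumed from the Core extension word set.

```forth
VARIABLE in-tag          \ true while we are between < and >

\ Update the mode for character c; return true if c is visible text.
: text-char? ( c -- flag )
  DUP [CHAR] < = IF DROP TRUE  in-tag ! FALSE EXIT THEN
  DUP [CHAR] > = IF DROP FALSE in-tag ! FALSE EXIT THEN
  DROP in-tag @ 0= ;

\ Type the string with everything inside tags removed.
: .text ( c-addr u -- )
  FALSE in-tag !
  0 ?DO DUP C@ DUP text-char? IF EMIT ELSE DROP THEN CHAR+ LOOP DROP ;

\ Count the characters that .text would print.
: text-length ( c-addr u -- n )
  FALSE in-tag !
  0 ROT ROT
  0 ?DO DUP C@ text-char? IF SWAP 1+ SWAP THEN CHAR+ LOOP DROP ;
```

Because the only state is one flag, badly nested or unclosed tags do not confuse it, which is the point made above about reading broken HTML.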

Another example is how HTML is to be parsed. In principle, an HTML document
is a hierarchical tree of data structures, just like an XML object (with a
few exceptions). However, in practice, HTML is a state machine. You can say
"<b>bold <i>and italics</b> just italics</i> and normal again", and most
browsers will parse that ok, though it is invalid HTML code. You can write
HTML pages with a lot of beginning paragraphs, nested into each other, and
the browser will print it all right, with the paragraphs one after the
other.

So you can drop all your knowledge about how HTML should work when you write
code that renders HTML found on the net.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

a...@redhat.invalid

Aug 5, 2002, 6:21:58 AM

Bernd Paysan <bernd....@gmx.de> wrote:
> a...@redhat.invalid wrote:

>> pcre is
>> 5000 lines of code. Maybe full regular expressions are an
>> over-generalized solution to many classes of parsing problems.

> gray.fs is only ~800 lines of code, and implements a recursive descent
> parser. The BNF parser generator from Brad Rodriguez is a one-screener.
> Since regexps are an under-powered solution to many classes of parsing
> problems (due to the restriction of being finite state machines),

Right. That's been my experience; the simple fact that it's
impossible to describe a syntax with matched brackets using regular
expression syntax suggests to me that it's not a general purpose
string matching language, so something simpler but which allows
recursion is a better choice if you only want to have just one parsing
language.
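
To make the matched-brackets point concrete, here is a small recursive-descent check, written as a sketch under a few assumptions: the input consists of ( and ) only, the names seq and balanced? are made up, and /STRING (String word set) plus TRUE and FALSE (Core extension) are available. The recursion through RECURSE is exactly the part a regular expression cannot express.

```forth
\ Grammar:  seq ::= { "(" seq ")" }

\ Consume as many balanced groups as possible from the front of the string.
\ flag is false if a group was opened but not closed properly.
: seq ( c-addr u -- c-addr' u' flag )
  BEGIN
    DUP 0> IF OVER C@ [CHAR] ( = ELSE FALSE THEN
  WHILE
    1 /STRING RECURSE 0= IF FALSE EXIT THEN      \ the nested sequence
    DUP 0= IF FALSE EXIT THEN                    \ input ended before the close
    OVER C@ [CHAR] ) <> IF FALSE EXIT THEN       \ something else than a close
    1 /STRING
  REPEAT TRUE ;

\ True if the whole string is a balanced sequence of brackets.
: balanced? ( c-addr u -- flag )
  seq SWAP 0= AND NIP ;
```

The nesting depth is limited only by the return stack, whereas a finite state machine would need one state per depth.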

> Using a special ad-hoc parser is typically a better solution than using
> something generated (from gray or as regular expression doesn't matter), if
> you care about speed and size.

There's a debate going on in gcc land at the moment; people seem to be
moving away from automatically generated parsers like YACC to hand
coded recursive descent, at least for C++.

Andrew.

Marcel Hendrix

Aug 5, 2002, 7:22:01 AM

"Bart Lateur" <bart....@pandora.be> wrote in message

> The best known location for regular expressions, likely without the name
> tag, is in so-called "lexers", the tokenisers for compilers. FORTH is
> one of the few compilers that fundamentally works without one.
>
> For example, the structure for a variable name as a string is typically:
> start with a letter, followed by zero or more of (letters, digits, or
> underscore). That description is a regular expression.

AFAIR, in Algol you can have variable names with spaces in them
(I don't remember if they were significant or not).

-marcel

Guido Draheim

Aug 5, 2002, 7:32:20 AM

Bart Lateur wrote:

>
> Rick Hohensee wrote:
>
> >"regular expression" is probably a common computing term
> >because Ken Thompson felt is was the right way to approach
> >search features in a text editor.
>

> The best known location for regular expressions, likely without the name
> tag, is in so-called "lexers", the tokenisers for compilers. FORTH is
> one of the few compilers that fundamentally works without one.
>
> For example, the structure for a variable name as a string is typically:
> start with a letter, followed by zero or more of (letters, digits, or
> underscore). That description is a regular expression.
>

As Bernd says, people are moving away from generated lexers: the current
gcc CVS no longer contains a lex script for the language front ends.
Hand-written code is usually better (gcc/c-lex.c:c_lex).
Using regexes for lexers is useful during development, though,
and a regex machine like pcre lets you pcre_study a pcre_compile'd
regex to try to optimize the precompiled form for later matching.

Guido Draheim

Aug 5, 2002, 8:11:36 AM

Bart Lateur wrote:

>
> a...@redhat.invalid wrote:
>
> >Right. That's been my experience; the simple fact that it's
> >impossible to describe a syntax with matched brackets using regular
> >expression syntax suggests to me that it's not a general purpose
> >string matching language, so something simpler but which allows
> >recursion is a better choice if you only want to have just one parsing
> >language.
>

> Eh... what? Regular expressions can be used to describe tags, but *not*
> the whole language. Regular expressions can *never* be used to describe
> recursively defined, tree-strucured, matched bracket style of input.
> Regular expressions provide the lexer, thus to extract the tokens, to
> separate the tags from the content. But it can't be used to process the
> whole, entire input all at once. You shouldn't blame the regular
> expressions. You're just using the wrong tool. But that's no reason to
> say they're useless.
>

Actually, the Perl regex machine is very powerful and can be (ab)used to
describe a full language's syntax. It just does not cope well with nested
syntaxes (e.g. control loops), although a little /ge recursion helps there.
BUT a regex description is much less readable than a language description
with named parsing states, once you get into the regex | alternatives,
(?= lookaheads, (?! negative lookaheads, (?<! negative lookbehinds and
others.
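
For readers who have not met these operators, here is a small sketch using Python's `re` module, which accepts the same Perl-style lookaround syntax. The sample strings and variable names are invented for the illustration.

```python
import re

# (?=...) lookahead: digits only when "px" follows; "px" itself is not consumed.
px_numbers = re.findall(r"\d+(?=px)", "10px 20em 30px")

# (?!...) negative lookahead: start offsets of "foo" NOT followed by "bar".
foo_not_bar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?<!...) negative lookbehind: numbers not preceded by "$".
# The digit in the class stops the engine from matching the tail of "$15".
plain_numbers = re.findall(r"(?<![$\d])\d+", "$15 and 20")
```

The third pattern already hints at the readability problem: without the extra `\d` in the lookbehind, the engine would happily match the "5" of "$15".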

a...@redhat.invalid

Aug 5, 2002, 1:15:35 PM

Bart Lateur <bart....@pandora.be> wrote:
> a...@redhat.invalid wrote:

>>Right. That's been my experience; the simple fact that it's
>>impossible to describe a syntax with matched brackets using regular
>>expression syntax suggests to me that it's not a general purpose
>>string matching language, so something simpler but which allows
>>recursion is a better choice if you only want to have just one parsing
>>language.

> Eh... what? Regular expressions can be used to describe tags, but *not*
> the whole language.

Right.

> Regular expressions can *never* be used to describe recursively
> defined, tree-strucured, matched bracket style of input.

What I said.

> Regular expressions provide the lexer, thus to extract the tokens,
> to separate the tags from the content. But it can't be used to
> process the whole, entire input all at once. You shouldn't blame the
> regular expressions. You're just using the wrong tool.

So are you agreeing or disagreeing? I can no longer tell.

> But that's no reason to say they're useless.

Which is why, I suspect, no-one did.

Andrew.

Richard Owlett

Aug 5, 2002, 2:23:12 PM

[ Caution -- reply is in reply to previous post -- but as it's no
longer in local memory Netscape says ???!???!???!??? :]\ ]

What can "reg exp" do for me?

I'm an avocational programmer.
I think Forth is what Basic should have been ( apologies to Dartmouth
;/)

I really like Basic's string handling functions.
I object to Gate$ et al. claiming they $know$ what'$ be$t for me.
I see Forth's extensibility as being a solution to many problems.

I can comprehend INDIVIDUAL posts in this thread.
I get lost in the branches of trees, let alone trees of forest ;{

a...@redhat.invalid

Aug 5, 2002, 2:56:32 PM

Bart Lateur <bart....@pandora.be> wrote:

> a...@redhat.invalid wrote:

>>So are you agreeing or disagreeing? I can no longer tell.

> I'm saying that you appear to be starting out with wrong expectations
> about regular expressions. They can't describe grammars, but that
> doesn't mean that they're useless.

So why do you think we disagree? As far as I can tell we agree
completely.

Andrew.

Julian V. Noble

Aug 5, 2002, 3:14:50 PM

Bart Lateur wrote:

>
> Rick Hohensee wrote:
>
> >"regular expression" is probably a common computing term
> >because Ken Thompson felt is was the right way to approach
> >search features in a text editor.
>

> The best known location for regular expressions, likely without the name
> tag, is in so-called "lexers", the tokenisers for compilers. FORTH is
> one of the few compilers that fundamentally works without one.
>
> For example, the structure for a variable name as a string is typically:
> start with a letter, followed by zero or more of (letters, digits, or
> underscore). That description is a regular expression.
>

> --
> Bart.

Certainly I need to recognize regular expressions for my FORmula
TRANslator (now incarnated as ftran201.f) but it seemed a lot simpler
to define each independently as a FSM than to create a generic tool
(or use an extant one like gray). It also produces shorter code.
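
As a rough illustration of "define each independently as a FSM", here is the identifier rule quoted above (a letter, then letters, digits or underscores) hand-coded as a two-state machine. This is a Python sketch, not code from ftran201.f; `is_identifier` and the state names are assumptions made for the example.

```python
import string

LETTERS = set(string.ascii_letters)
TAIL_CHARS = LETTERS | set(string.digits) | {"_"}


def is_identifier(text):
    """Hand-coded FSM: a letter, then zero or more letters, digits or '_'."""
    state = "start"
    for ch in text:
        if state == "start":
            if ch not in LETTERS:
                return False          # first character must be a letter
            state = "tail"
        elif ch not in TAIL_CHARS:    # state == "tail"
            return False
    # The empty string never leaves "start", so it is rejected.
    return state == "tail"
```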

--
Julian V. Noble
Professor of Physics
j...@virginia.edu

"Science knows only one commandment: contribute to science."
-- Bertolt Brecht, "Galileo".

Samuel A. Falvo II

Aug 5, 2002, 7:56:54 PM

Bernd Paysan <bernd....@gmx.de> wrote in message news:<hojhia...@cohen.paysan.nom>...

> E.g. the HTML example from Jeff Fox. Yes, you probably can write down a
> grammar for HTML in gray, but it's overkill. HTML has two mode switches: <

Shucks, I thought the hypothetical code I posted earlier in this
thread was pretty elegant, all things considered. :) The bulk of it
probably would fit in 32 lines of source, not counting the
unmarshalling of data into the document tree.

Wil Baden

Aug 10, 2002, 7:43:09 AM

In article <aj1c6e$vg0$1...@pcls4.std.com>, Gary Chanson
<gcha...@no.spam.TheWorld.com> wrote:

> "Wil Baden" <neil...@earthlink.net> wrote in message
> news:090820021113071025%neil...@earthlink.net...
> >
> > Thus personally I don't think I need grep. I had it in my Forth ten
> > years ago and found no use for it. I'll put it back and see what I've
> > been missing.
>
> You probably don't need it in your Forth system (except as a loadable
> library), but it's *REAL* *NICE* in an editor (possibly also in a database
> system).

Thank you, Gary.

I agree with you.

--
Wil

Anton Ertl

Aug 10, 2002, 1:36:09 PM

Bernd Paysan <bernd....@gmx.de> writes:
>Using a special ad-hoc parser is typically a better solution than using
>something generated (from gray or as regular expression doesn't matter), if
>you care about speed and size.

Example: the programming language shootout part on Regular Expression
Matching <http://www.bagley.org/~doug/shootout/bench/regexmatch/>.

This is a "same way" test, so one is supposed to use a regexp matcher,
not a hand-coded program. I was not aware of the Forth regexp
matchers, so I decided to see how it would come out when using Gray,
which supports *regular* right part grammars after all (a grammar
where the right part of each rule can use the regexp operators
?,*,+,|, and parentheses).

However, Gray is a heavyweight tool that needs quite a bit of setup
work to get going, and as a result the program is several times as
large as a plain hand-coded solution would be. Also, somewhat to my
surprise, the solution was quite slow (I have to investigate that at
some time; at that time I just went for the next benchmark :-).

In any case, the problem posed is simple enough that hand-coding a
solution is pretty straightforward, and the power of regexps to deal
with e.g., non-determinism is not needed.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
EuroForth 2002: http://www.complang.tuwien.ac.at/anton/euroforth2002/

John Passaniti

Aug 11, 2002, 7:42:54 PM

"Bernd Paysan" <bernd....@gmx.de> wrote in message
news:hojhia...@cohen.paysan.nom...

> Using a special ad-hoc parser is typically a better
> solution than using something generated (from gray
> or as regular expression doesn't matter), if you
> care about speed and size.

Actually, that statement applies to any high-level tool, not just regular
expression engines. If one builds all the code themselves, ignores
generality, and focuses entirely on code that is highly specific to the
application, you will virtually always get something faster and smaller than
if you used a more generalized tool.

The problem is assuming that if something is faster and/or smaller, it is
necessarily better. It isn't. There is a class of programmers who spend
their time optimizing things that don't need to be optimized. The end
result can be very costly.

The classic example of this sort of thing is the programmer who spends their
time optimizing code that is run only a minority of the time, versus
optimizing code that is run many times per second. Where I last worked, one
programmer was obsessed with a section of code that had a long runtime and
consumed a lot of ROM. He wanted it faster and smaller, and dedicated a
couple days to this end. After the project was finished and we had a
review, we learned that his obsession ended up delaying the product shipment
by a week. By crunching more numbers, we figured out that each day the
product wasn't shipping, it cost us around $5,000. So the cost of making
his code faster and smaller ended up costing the company $25,000 (or put
another way, about 1/3rd of his yearly salary). Pretty bad, considering the
sales lifetime of the product.

And if that wasn't enough, it turns out the size of the code in question was
never an issue (we had plenty of ROM left), and the speed was irrelevant, as
it only ran a tiny fraction of the time. He wasted his time on faster and
smaller, and it cost the company money.

There are clearly counter-examples to this-- I know because I've worked on
projects where size and speed were very critical. But the larger point here
is that faster and smaller isn't necessarily better. Instead of having
blinders on, one has to step backwards and look at the big picture, to see
where effort should be placed and where it should be ignored.

Many in this newsgroup seem suspicious of generalized tools. Someone says
that regular expressions would be useful, and reflexively people knee-jerk
back "but regular expressions aren't good for everything." That was never
the claim. Somebody else says that they might like more sophisticated
strings for their work, and the response is that Forth's string handling is
just fine-- why should anyone need anything more? There is a certain
idiotic arrogance in an attitude that one knows best for others-- without
even getting into the specifics of their needs.

Arrogance also comes in the form of this following theme, that is often
repeated in this newsgroup in various forms:

> "If you want it done right, you have to do it yourself"

More accurately, if one wants something done their own way, they need to do
it themselves.

An important part of sharing code with others is the acceptance of a certain
humility-- that one doesn't always have the best solutions to a problem.
When I first started using Perl and Python, I came to them with a certain
amount of arrogance, and purposely went out of my way to do everything
myself instead of using the available modules. Later, I decided to admit I
wasn't the best programmer on the planet and looked at some of the modules
that were available. Sometimes, doing so stroked my ego-- I did have better
solutions. But other times, I read code from others that was better than
what I came up with.

This observation is independent of language: There are always people who
are probably better at what you do than you are. Accept that and move on.

I wonder where this Not Invented Here arrogance comes from, and why it seems
so prevalent in the Forth community (or at least this newsgroup).

Bernd Paysan

Aug 12, 2002, 5:30:52 AM

John Passaniti wrote:
> Actually, that statement applies to any high-level tool, not just regular
> expression engines. If one builds all the code themselves, ignores
> generality, and focuses entirely on code that is highly specific to the
> application, you will virtually always get something faster and smaller
> than if you used a more generalized tool.

The question is whether Forth is a tool or a programming language. E.g.
there's a regexp program in the shootout, which is basically a one-liner
for sed. Would I use Forth to solve this problem (extract phone numbers out
of a file)? Probably not (though there are regexp libraries), I would use
sed or awk. The regexp compiler for these specialized tools probably is
faster, since someone made the effort to convert the FSM to a deterministic
FSM. On the other hand, perhaps it would be possible to write a faster
regexp compiler in Forth than in any of those languages, since a native
code Forth could execute the deterministic FSM as native code.
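
To make "deterministic FSM" concrete, here is a sketch of the `[abc]*b` pattern from the start of this thread as a hand-derived DFA, written in Python for illustration. The table was worked out by hand for this one pattern (no regexp compiler is involved), and the names `TRANSITIONS`, `ACCEPTING` and `dfa_match` are assumptions of the example. The point is that matching becomes a single pass with no backtracking.

```python
# State 0: nothing read yet, or the last character was 'a' or 'c'.
# State 1: the last character was 'b' (the only accepting state).
TRANSITIONS = {
    (0, "a"): 0, (0, "b"): 1, (0, "c"): 0,
    (1, "a"): 0, (1, "b"): 1, (1, "c"): 0,
}
ACCEPTING = {1}


def dfa_match(text):
    """True if the whole of text matches [abc]*b; one table lookup per char."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:      # character outside {a, b, c}: dead state
            return False
    return state in ACCEPTING
```

A native code compiler could turn each state into a label and each table row into a jump, which is the kind of code a native code Forth would execute directly.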

Usually, for the developments I do tuning costs at development time are
necessary, since there are manufacturing costs later. Tuning matters when
you solve a larger problem (runs often, has manufacturing and/or user
costs), tuning doesn't matter if you solve a run-once problem.

So if your coworker saved a bunch of ROM space for $25k, and you could use a
smaller ROM (saving let's say 10¢ per unit), you just have to sell 250k
units (for me, that doesn't sound a lot). If you can't use a smaller ROM,
the money may be lost. Or not, because later, you figure out that you want
more software in the same ROM, and you can use the space to add features
that sell more units.

--
Bernd Paysan

"If you want it done right, you have to do it yourself"

http://www.jwdt.com/~paysan/

a...@redhat.invalid

Aug 12, 2002, 5:50:34 AM

John Passaniti <nn...@japanisshinto.com> wrote:

> Many in this newsgroup seem suspicious of generalized tools.

Suspicion is good; out of hand rejection is bad.

> Someone says that regular expressions would be useful, and
> reflexively people knee-jerk back "but regular expressions aren't
> good for everything."

I don't recall that. What I do recall is someone saying that maybe
something simpler but more flexible would be a better choice if you
only want to have just one parsing language. That's very different.

> That was never the claim.

That's right; it wasn't.

Andrew.

Guido Draheim

Aug 12, 2002, 6:49:58 AM

a...@redhat.invalid wrote:

>
> John Passaniti <nn...@japanisshinto.com> wrote:
>
> > Someone says that regular expressions would be useful, and
> > reflexively people knee-jerk back "but regular expressions aren't
> > good for everything."
>
> I don't recall that. What I do recall is someone saying that maybe
> something simpler but more flexible would be a better choice if you
> only want to have just one parsing language. That's very different.

Add the characterization "powerful". A powerful regex syntax does in
fact carry some complexity. Of course, there might be regex forms that
are simpler than perl/posix-style regex, but not by much, I
might reflexively say.

Yes, using a less powerful tool for problems less in need, that is
always an option, but then again I know that I can not express some
of the matches with glob-style matching or strstr operations. The
perlish lookahead/lookbehind parts have even allowed these regex
forms to cover patterns that would have otherwise needed a stateful
parser.

By that, it has grown to a level where a programmer just has to learn
the pieces once to gain the ability to solve a wide range of
problems. Of course, the programmer could learn a dozen other, more
specialized forms, but that would be a specialized programmer, if I
may say so. The average one who just needs some pattern-based
parsing would simply be overwhelmed. In a way, one tool for a great
range is simply... efficient. Efficient in terms of programmer
work and maintenance work.

Guido Draheim

Aug 12, 2002, 6:56:49 AM

Bart Lateur wrote:
>
> That must be one proper way to process regular expressions in FORTH: do
> what FTRAN does for mathematical expressions, and compile the regular
> expression into FORH code, which then later you can simply execute.
>

All regex libs that I know do in fact want you to "compile" a regex
first and only then "exec" with it. It's always two-pass. One is
allowed to match multiple inputs with one compiled form. See POSIX
`man regcomp` and the description of pcre_compile/pcre_exec. The problem
with perl has been that it could not detect when $var interpolations were
needed, so it compiled the regex on every occasion in case something
had changed in between. Good if they solved this.
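
The two-pass pattern looks much the same in any library. Below is a minimal sketch with Python's `re` module, used here only as a stand-in for regcomp/regexec or pcre_compile/pcre_exec; the pattern is the `[abc]*b` example from the start of the thread and the input list is invented.

```python
import re

# Pass 1: compile the expression once.
pattern = re.compile(r"[abc]*b")

# Pass 2: reuse the compiled form against as many inputs as needed.
inputs = ["b", "cbab", "abc", "xyz"]
results = [pattern.fullmatch(s) is not None for s in inputs]
```

Compiling inside the loop would give the same results; it would just repeat the translation work for every input, which is the cost described above for perl.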

Astrobe

Aug 12, 2002, 7:48:35 AM

"John Passaniti" <nn...@JapanIsShinto.com> wrote in message news:<ul5in6g...@corp.supernews.com>...
> "Astrobe" <ast...@netcourrier.com> wrote in message
> news:b2ebfb5f.02080...@posting.google.com...
> > My thruth regarding Forth is that it is RAD before
> > RAD, 4GL before 4GL
>
> Only if you have private, weak, and/or vague definitions of RAD and 4GL.
> Those acronyms have expected meanings, and Forth is not (by itself at least)
> either a RAD or 4GL language. As most people use those terms, a RAD/4GL
> language is typically one that operates at a very high level.

Forth is not low level.

> The goal with
> such languages is to allow the programmer to focus on the problem, not the
> solution.

Both are important.

> To that end, those languages typically support implicit memory
> management, sophisticated strings, high-level data structures, implicit
> algorithms and control structures, and other high-level concepts (such as
> visual GUI development).

GUI development makes sense if the application itself has a
graphical interface. I think 4GL is still valid with a text interface.

> Forth does none of that (by itself at least).

It is a matter of package.

> If you believe Forth is a RAD or 4GL language, then you better define what
> *exactly* you mean by RAD and 4GL language. It apparently isn't a
> definition shared by others.

The point is that I wrote, Forth is 4GL *BEFORE* 4GL. I forgot to
mention that Forth is also objects before objects. What I mean is that
many modern concepts were present in the early days of Forth. They did
not have their modern form of course, but the idea is that Forth offers
the basis that fits modern development models.

>
> > and most of all, it enforces you to have a good
> > programming methodology.
>
> Forth does absolutely no such thing.
>
> Forth doesn't enforce any programming methodology, style, paradigm, mindset,
> or even vague approach. You can write terrible code in Forth if you want.

Truism.
The fact is that writing terrible code in Forth leads to a nightmare
(and finally to rewriting the whole thing) faster than in other
languages.
You cannot do anything valuable in Forth if you don't know what 'to
factor' means.

>
> > It other words it is not a language for "quick
> > and dirty".
>
> That contradicts your claim that Forth is a RAD / 4GL language. And it also
> doesn't make sense.

Never said that Forth *is* RAD or 4GL.

>
> Forth is a terrific language for "quick and dirty" applications. One of the
> applications I've used Forth in is as a debugging monitor for hardware.
> Hardware engineers can learn enough Forth to tickle an address latch or
> program a DMA controller. Then without bothering the software engineers,
> they can write their own quick and dirty code to test things out. The code
> they write is typically terrible, not well-factored, and makes all sorts of
> assumptions. But it's perfect for getting quick results.

I agree with that. The "quick and dirty" I was thinking of is the kind
everyone has seen, mostly in C: those awful, unmaintainable and
inefficient pieces of code that are there because "it works, so who
cares".

Regards,
Astrobe

Bernd Paysan

Aug 12, 2002, 7:43:12 AM

Guido Draheim wrote:
> All regex libs that I know, do in fact want you to "compile" a regex
> first and "exec" with it just then. It's always two-pass. One is
> allowed to match multiple inputs with one compiled form. See posix
> `man regcomp` and description of pcre_compile/pcre_exec. The problem of
> perl has been that it could not detect $var-interpolations needed,
> so it compiled the regex on every occasion for perhaps something
> may have changed in between. Good if they solved this.

I haven't seen that. They want to use the same VM, but that doesn't mean
that they stop compiling the regexp each time it's needed.

Perhaps changing the evaluation order (first regexp, then var expansion)
would solve that problem.

BTW: FoSM always compiles regexps, while the regexp package from Ruvin Pinka
(which looks like Perl regexps) compiles the regexp on each occurrence.

Syntax suggestion for convenient string matching:

~" foo(.*)bar"

compiles a regexp match. Variables could be named \1 to \9, and stored in an
array (just like the result substrings).

=" foobar"

could compile an exact string match.
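
To show what the suggested words would have to deliver, here is a rough Python analogue. It is only a sketch of the intended behaviour, not of any Forth implementation: `match_vars` and `exact` are hypothetical names, and the returned list stands in for the array holding \1 to \9.

```python
import re

# Hypothetical counterpart of  ~" foo(.*)bar" : compiled once, at definition time.
FOO_BAR = re.compile(r"foo(.*)bar")


def match_vars(text):
    """Return [whole match, group 1, group 2, ...], or None if nothing matches."""
    m = FOO_BAR.search(text)
    if m is None:
        return None
    return [m.group(0), *m.groups()]


def exact(text):
    """Hypothetical counterpart of  =" foobar" : a plain string comparison."""
    return text == "foobar"
```

With this layout index 1 of the result corresponds to \1, so `match_vars("xxfoo123barxx")[1]` is the substring captured between "foo" and "bar".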

Albert van der Horst

Aug 13, 2002, 9:07:00 AM

In article <3D5792D6...@gmx.de>, Guido Draheim <gui...@gmx.de> wrote:
>a...@redhat.invalid wrote:
>>
>> John Passaniti <nn...@japanisshinto.com> wrote:
>>
>> > Someone says that regular expressions would be useful, and
>> > reflexively people knee-jerk back "but regular expressions aren't
>> > good for everything."
>>
>> I don't recall that. What I do recall is someone saying that maybe
>> something simpler but more flexible would be a better choice if you
>> only want to have just one parsing language. That's very different.
>
>Add the characterization "powerful". A powerful regex syntax does in
>fact have some complexity around. Of course, there might be regex forms
>being simpler than perl/posix-style regex, but it can not be much, I
>might reflexivly say.

Personally I would say that the regexes of sed are about right.
The POSIX stuff is overkill. Adding the regexes of sed to Forth
is probably useful for a class of problems. The POSIX ones I doubt
anyone would want to do in Forth, except as a programming
exercise.
Powerful? I have done some really mean things with sed-scripts.
More power than that I have never needed.

On a different subject. I started this thread with the claim
that reg exp are impossible to do in Forth, for some theoretical
reasons, except with very ugly programming.
mhx and the Russian reg exp package proved me wrong.
Nobody pointed that out, so I have to do it myself.

Greetings Albert.
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
To suffer is the prerogative of the strong. The weak -- perish.
alb...@spenarnc.xs4all.nl http://home.hccnet.nl/a.w.m.van.der.horst

Jerry Avins

Aug 13, 2002, 10:45:53 PM

Albert van der Horst wrote:
>

...

>
> On a different subject. I started this thread with the claim
> that reg exp are impossible to do in Forth, for some theoretical
> reasons, except with very ugly programming.
> mhx and the russian reg exp package proved me wrong.
> Nobody pointed that out, so I have to do it myself.
>

Now, Albert. We're all grown up here (at least it says so in the tourist
guide). After putting you to shame so handily, there was no need at all
to pillory you. Seriously, thanks for starting an interesting thread.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Guido Draheim

Aug 14, 2002, 6:33:47 AM

Jerry Avins wrote:
>
> Albert van der Horst wrote:
> >
> ...
> >
> > On a different subject. I started this thread with the claim
> > that reg exp are impossible to do in Forth, for some theoretical
> > reasons, except with very ugly programming.
> > mhx and the russian reg exp package proved me wrong.
> > Nobody pointed that out, so I have to do it myself.
> >
> Now, Albert. We're all grown up here (at least it says so in the tourist
> guide). After putting you to shame so handily, there was no need at all
> to pillory you. Seriously, thanks for starting an interesting thread.

Indeed!

(I just have not got around to actually using the regex packages, but I have
some use cases on my desk that could greatly benefit from them. Thanks!)