Tiny regular expressions code

Hans Bezemer

Oct 13, 2014, 12:13:06 PM
Short intermission from all the clashes here, sorry for the inconvenience,
but I used Forth for some CODING. You may fry me for the code quality
later, but this is a tiny regular expression parser, based on an article by
Kernighan and Pike
(http://www.drdobbs.com/architecture-and-design/regular-expressions/184410904).

It features ^, $, * and ., and I added ? as well (zero or one). This is some
stuff that was left lying around as an intermediate stage for a similar 4tH
library (which is cleaner and more capable), but I thought: maybe it is of
use/interest to someone, so why throw it away?

Knock yourselves out ;-)

Hans Bezemer

---8<---
\ Regular Expressions by Brian W. Kernighan and Rob Pike
\ Believed to be in the public domain

defer (matchhere)

: (match*) ( a n ra rn c --f)
begin
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r@ = r@ [char] . = or else dup xor then r> swap
while \ character equals text?
>r 2>r 1 /string 2r> r> \ if so, match again
repeat
drop 2drop 2drop false \ clean up, return false
;

: (match?) ( a n ra rn c --f)
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r@ = r> [char] . = or else r> dup xor then
if 2>r 1 /string 2r> (matchhere) else 2drop 2drop false then
; \ character equals text?

:noname ( a n ra rn -- f)
dup if \ regular expression a null string?
over char+ c@ dup [char] * = \ if not, does it equal a '*'
if \ if so, call (match*)
drop over c@ >r 2 /string r> (match*) exit
else \ otherwise, does it equal a '?'
[char] ? =
if \ if so, call (match?)
over c@ >r 2 /string r> (match?) exit
else \ otherwise does it equal a '$'
over c@ [char] $ = over 1 = and
if \ and is it the last character?
2drop nip 0= exit \ if so, check length of text
else \ finally, check if it char matches
2over 0<> >r c@ >r over c@ dup
[char] . = swap r> = or r> and
if 1 /string 2>r 1 /string 2r> recurse exit then false
then \ if so recurse, otherwise quit
then
then
else
true \ zero length regular expression
then >r 2drop 2drop r> \ clean up and exit
; is (matchhere) \ assign to DEFER (we got 'em)

: match ( a n ra rn --f)
dup if over c@ [char] ^ = if 1 /string (matchhere) exit then then
begin \ if caret, chop it
2over 2over (matchhere) if 2drop 2drop true exit then
>r over r> swap \ match characters
while \ until no more text
2>r 1 /string 2r> \ chop text
repeat 2drop 2drop false \ clean up
;

s" 0,9" s" ^0,?9$" match . .s cr
s" 0:9" s" ^0,?9$" match . .s cr
s" 09" s" ^0,?9$" match . .s cr
s" 009" s" ^0,?9$" match . .s cr
s" 0,,9" s" ^0,?9$" match . .s cr cr
---8<---

Hans Bezemer

Oct 13, 2014, 12:16:14 PM
Hans Bezemer wrote:

Sorry, backporting a bugfix:

---8<---
: (match?) ( a n ra rn c --f)
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r@ = r> [char] . = or else r> drop dup xor then
if 2>r 1 /string 2r> (matchhere) else 2drop 2drop false then
; \ character equals text?
---8<---

Hans Bezemer

Pablo Hugo Reda

Oct 13, 2014, 12:32:24 PM
very useful

Good one Hans!!

Howerd

Oct 13, 2014, 2:19:00 PM
Hi Hans,

I like this code, but I may have found a bug (although this could be a lack of understanding on my part):

s" x09" s" 0,?9" match . cr -1
s" x009" s" 0,?9" match . cr -1
s" x0009" s" 0,?9" match . cr -1

All of the above show a match, but there is no ',' in any of the input strings.
Does s" 0,?9" mean find a 0 then a , then any or no character, then a 9 in the input string?
I applied your bug fix BTW...

Best regards,
Howerd

Hans Bezemer

Oct 13, 2014, 2:34:11 PM
Howerd wrote:

> Hi Hans,
>
> I like this code, but I may have found a bug (although this could be a
> lack of understanding on my part ) :
>
> s" x09" s" 0,?9" match . cr -1
> s" x009" s" 0,?9" match . cr -1
> s" x0009" s" 0,?9" match . cr -1
>
> All of the above show a match, but there is no ',' in any of the input
> strings. Does s" 0,?9" mean find a 0 then a , then any or no character,
> then a 9 in the input string? I applied your bug fix BTW...

Yes, without a leading ^ it means: go for the first "0". Then the ? applies
to the PREVIOUS character (the ,). Finally: go for a 9. So it reads:
- Find a 0
- Followed by an optional ,
- Followed by a 9.
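
For illustration only, that reading can be hand-coded for this one pattern. This is a rough sketch, not the posted matcher: it only tries the start of the text (as if the pattern began with ^), and LEAD? and OPT-DEMO are names made up here.

```forth
\ Illustration only: LEAD? and OPT-DEMO are hypothetical names.
: lead? ( a n c -- a n f )        \ true if the text is non-empty and starts with c
  >r dup if over c@ r> = else r> drop false then ;

: opt-demo ( a n -- f )           \ hand-coded "0,?9", tried at the start of the text only
  [char] 0 lead? 0= if 2drop false exit then 1 /string
  [char] , lead? if 1 /string then   \ the ? makes the preceding comma optional
  [char] 9 lead? nip nip ;

s" 09"  opt-demo . cr             \ comma absent
s" 0,9" opt-demo . cr             \ comma present
s" 0:9" opt-demo . cr             \ a colon is not an optional comma
```

The first two lines print -1 (true) and the last prints 0: the comma may be there or not, but nothing else may stand between the 0 and the 9.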

Hans Bezemer

Howerd

Oct 13, 2014, 2:53:05 PM
Hi Hans,

Thanks for explaining that :-)

Best regards,
Howerd

Hans Bezemer

Oct 13, 2014, 4:13:03 PM
Hans Bezemer wrote:

This is an even smaller one, supporting just * and .:

---8<---
: (match-or-dot)
over c@ [char] . = >r 2swap dup 0<> r> and
>r over c@ >r 2swap over c@ r> = r> or
;

: match-reg
dup 0> if
over char+ c@ [char] * <>
if
2over 1 /string 2over 1 /string recurse >r (match-or-dot)
r> and >r 2drop 2drop r> exit
then

begin
(match-or-dot)
while
2over 2over 2 /string recurse if 2drop 2drop true exit then
2>r 1 /string 2r>
repeat 2 /string recurse exit

else
2drop nip 0=
then
;

s" aa" s" a" match-reg . cr
s" aa" s" aa" match-reg . cr
s" aaa" s" aa" match-reg . cr
s" aa" s" a*" match-reg . cr
s" aa" s" .*" match-reg . cr
s" ab" s" .*" match-reg . cr
s" aab" s" c*a*b" match-reg . cr depth .
---8<---

Hans Bezemer

Marcel Hendrix

Oct 13, 2014, 4:59:05 PM
Hans Bezemer <the.bee...@gmail.com> writes Re: Tiny regular expressions code

[..]
> this is a tiny regular expression parser, based on an article by
> Kernighan and Pike
> (http://www.drdobbs.com/architecture-and-design/regular-expressions/184410904).

> It features ^, $, * and . but I added ? as well (zero or one). This is some
> stuff that was left lying around as an intermediate stage for a similar 4tH
> library (which is cleaner and more capable), but I thought, may be it is of
> use/interest to someone, why throw it away.

> Knock yourselves out ;-)

I had to use the below to decipher it :-)

-marcel

-- ------------------------------------------------------------------------
anew -regexpr

DEFER (matchhere) ( $1 $2 -- bool )

: (match*) LOCAL c DLOCALS| $2 $1 | ( $1 $2 char -- bool )
BEGIN $1 $2 (matchhere) IF TRUE EXIT ENDIF
$1 IF c@ c = \ text char equals c,
c '.' = OR \ or c is the '.' metacharacter
ELSE drop FALSE
ENDIF
WHILE $1 1 /string TO $1
REPEAT FALSE ;

: (match?) LOCAL c DLOCALS| $2 $1 | ( $1 $2 char -- bool )
$1 $2 (matchhere) IF TRUE EXIT ENDIF
$1 IF c@ c = \ text char equals c,
c '.' = OR \ or c is the '.' metacharacter
ELSE drop FALSE
ENDIF
IF $1 1 /string $2 (matchhere)
ELSE FALSE
ENDIF ;

:NONAME LOCALS| n2 a2 n1 a1 | ( $1 $2 -- bool )
n2 0= IF TRUE EXIT ENDIF
a2 char+ c@ '*' = \ if not, does it equal a '*'
IF a2 c@ >R \ if so, call (match*)
a2 n2 2 /string TO n2 TO a2
a1 n1 a2 n2 R> (match*) EXIT
ENDIF \ otherwise, does it equal a '?'
a2 char+ c@ '?' =
IF a2 c@ >R \ if so, call (match?)
a2 n2 2 /string TO n2 TO a2
a1 n1 a2 n2 R> (match?) EXIT
ENDIF \ otherwise, does it equal a '$'
a2 c@ '$' = n2 1 = AND
IF n1 0= EXIT \ and is it the last character?
ENDIF
a2 c@ '.' = \ finally, check if char matches
a2 c@ a1 c@ = OR
n1 0<> AND
IF a1 n1 1 /string
a2 n2 1 /string
RECURSE EXIT
ENDIF
FALSE ; IS (matchhere)

: match DLOCALS| $2 $1 | ( $1 $2 -- bool )
$2 nip IF $2 drop c@ '^'
= IF $1 $2 1 /string (matchhere) EXIT ENDIF
ENDIF
BEGIN $1 $2 (matchhere) \ if caret, chop it
IF TRUE exit
ENDIF
$1 NIP \ match characters until no more text
WHILE $1 1 /string TO $1 \ chop text
REPEAT FALSE ;

cr s" 0,9" s" ^0,?9$" match . .( -1)
cr s" 0:9" s" ^0,?9$" match . .( 0)
cr s" 09" s" ^0,?9$" match . .( -1)
cr s" 009" s" ^0,?9$" match . .( 0)
cr s" 0,,9" s" ^0,?9$" match . .( 0)
cr s" 0,,9" s" ^0,*9$" match . .( -1)
cr s" 0,,9" s" ^0*9$" match . .( 0)
cr

Hans Bezemer

Oct 13, 2014, 5:09:31 PM
Marcel Hendrix wrote:

> I had to use the below to decipher it :-)
Oh Marcel, you know of my hatred of locals ;-) ;-)
But agreed: with 2 strings on the stack it gets pretty crowded pretty soon -
especially if you have to evaluate some complex statement.

Still, the 4tH version is somewhat cleaner (but not ANS):

:noname ( a n ra rn -- f)
dup if \ regular expression a null string?
over char+ c@ (special?) if exit then
over c@ ($) = over 1 = and \ otherwise does it equal a '$'
if \ and is it the last character?
2drop nip 0= exit \ if so, check length of text
else \ finally, check if it char matches
2over 0<> >r c@ >r over c@ r> swap (eq?) r> and
if chop 2>r chop 2r> recurse exit then false
then \ if so recurse, otherwise quit
else
true \ zero length regular expression
then >r 2drop 2drop r> \ clean up and exit
; is (matchhere) \ assign to DEFER (we got 'em)

Hans Bezemer

Albert van der Horst

Oct 14, 2014, 5:22:38 AM
In article <543bfa10$0$2928$e4fe...@news2.news.xs4all.nl>,
Most useful are expressions if they can be used to specify
transformations, and those are most useful if they can be scripted.

If you want to go that extra mile with r.e. , you may want to look
at forthtools.html on the site below.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Hans Bezemer

Oct 14, 2014, 11:41:58 AM
Albert van der Horst wrote:

> Most useful are expressions if they can be used to specify
> transformations, and those are most useful if they can be scripted.
> If you want to go that extra mile with r.e. , you may want to look
> at forthtools.html on the site below.
I had a look at it and I'd love to port it to 4tH - although it's a major
project. More so, because it's not ANS (several words I don't recognize - and I
know lina) and has no stack comments (despite well documented code). I
didn't look closely enough to see if it does fancy dictionary stuff. That's a
bit of a problem since 4tH has no dictionary.

Hans Bezemer

Albert van der Horst

Oct 14, 2014, 12:27:55 PM
In article <543d4422$0$2833$e4fe...@news2.news.xs4all.nl>,
I can't believe that people miss stack comments if things are specified like
this (knowing that stack items are in capitals):
\ For X return the SQUARE of x.
: SQ DUP * ;

lina is of course somewhat obscure and it would be nice if this was made
available in 4TH.
As far as I know 4TH that shouldn't be too hard. I'm willing to join
forces on this. Regular expressions in themselves are a major project,
but porting to 4TH shouldn't be, with your knowledge of 4TH and my
knowledge of the package.

Most non ANS words are in the block file. Only denotations are ciforth
specific, but more and more Forths have recognizers.
I think having a string recognizer for " is a mandatory prelude for r.e.
I would have gone mad if I had to consider each space whether it
belongs to a string or not and it would make the turnkey applications
unattractive for non-Forthers.

>
>Hans Bezemer

Albert van der Horst

Oct 15, 2014, 5:48:03 AM
In article <543d4f07$0$6914$e4fe...@dreader36.news.xs4all.nl>,
This triggered me to update the (2003!) regular expression package
that is based on lina version 4.

Supposedly after "WANT -legacy-" it should compile on lina version 5.
It didn't, caused by a defect in the legacy system.
The word IN is missing in the legacy set. I will add
: IN PP ;
to version 5.2 legacy and then it works.

Then I have ported it to version 5 32-bits.
After
REQUIRE -> WANT
$S -> $/
^ -> .S
(PARSE) -> PARSE
it works without the elective, with just the WANT system loaded.
Then it also works with 5.1 64 bits.

This is the version anyone porting it should use because:
1. You can use it with the latest lina
2. much less of lina is loaded (easier!)
3. better ISO-compatibility
4. I will throw verbose testing (that was there, but not published)
into the package, which should help immensely.

Contact me, because I can't update my website at the moment.

Hans Bezemer

Oct 15, 2014, 11:36:34 AM
Albert van der Horst wrote:

> This is the version anyone porting it should use because:
> 1. You can use it with the latest lina
> 2. much less of lina is loaded (easier!)
> 3. better ISO-compatibility
> 4. I will throw verbose testing (that was there, but not published)
> into the package, which should help immensely.
>
> Contact me, because I can't update my website at the moment.
I did, using the email address used here and at your site. Haven't got an
answer yet..

I know I have to work at my patience ;-)

Hans Bezemer

Albert van der Horst

Oct 16, 2014, 5:20:47 AM
In article <543e9480$0$2938$e4fe...@news2.news.xs4all.nl>,
I check my e-mail once a day, not always.

Anyway, I might as well explain here how to work with the archive.
In case others may want to try, or want input whether they want to
try.

1. null-test, on a Linux system, assuming rcs (cvs works ceteris paribus).
   Extract the archive:
       tar xfz regexp.tgz ;cd regexp
   Overwrite all files in regexp/ with version LAST_FOR_LINA4:
       co -rLAST_FOR_LINA4 RCS/*
   Test using lina in the current directory (. must be in the path,
   or adapt the makefile):
       make -B test
       make -B thinkingtest
2. Install lina 32 bits version 5 and extract the latest source from the
   archive.
   The same tests must succeed.
3. Change the source to run verbose instead of normal.
   The file test is much larger now and must be equal to test-verbose.
   The resulting file test is what you compare your code with.
4. With the other Forth (4th) try to compile regexp.frt
   You will have to add a prelude with facilities you copy from
   forth.lab. You may have to reimplement some missing words
   from the lina documentation: $@ $! $+! $/ $^
5. Now the testing can begin. The results must be the same as
   the test-verbose from step 3.

This completes the first phase. You can be pretty confident,
as the tests are really comprehensive.

Now the second phase is to have a turnkey that can be redistributed
and used as a filter
"
#!refilter -s
1 ARG[] GET-FILE
".." ".." GLOBAL
2 ARG[] PUT-FILE
"
That is very implementation dependent, and not even always possible,
but I think 4TH can do it.

>
>Hans Bezemer

jo...@planet.nl

Oct 17, 2014, 8:03:07 AM
On 13 October 2014 18:13:06 UTC+2 Hans wrote:
Hi Hans,

> Short intermission from all the clashes here, sorry for the inconvenience,
>
> but I used Forth for some CODING. You may fry me for the code quality
Thank you very much, nice code.

Testing your code I wonder:
Is the following match not possible?
cr s" 123abcd_efg" s" abc*ef" match . .( 1)

Jos

Hans Bezemer

Oct 17, 2014, 11:46:57 AM
jo...@planet.nl wrote:
> Testing your code I wonder:
> Is the following match not possible?
> cr s" 123abcd_efg" s" abc*ef" match . .( 1)

Nope. It says: look for "ab", followed by zero or more "c's" and finally
followed by "ef". It's NOT a wildcard, pal!
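
To make that concrete: in a regular expression the * only repeats the character right before it. A small sketch of what c* can take from the text (SKIP-RUN is a made-up helper, and it is a greedy simplification without the backtracking a real matcher does):

```forth
\ Illustration only: SKIP-RUN is a hypothetical name.
: skip-run ( a n c -- a' n' )     \ drop zero or more leading copies of c
  >r
  begin dup if over c@ r@ = else false then
  while 1 /string
  repeat r> drop ;

s" cd_efg" char c skip-run type cr   \ prints d_efg
s" d_efg"  char c skip-run type cr   \ prints d_efg (zero copies is fine too)
```

After "ab" and the run of c's, what is left of the text starts with "d_", not with "ef", so the pattern cannot match there.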

You probably wanna say:

s" abc.*ef"

Which means: look for "abc", followed by zero or more arbitrary characters and
finally followed by "ef". Maybe this is more appropriate for you: ;-)

---8<---
: chop 1- swap char+ swap ; ( a n -- a+1 n-1)
: s@ 0> if c@ else drop -1 then ; ( a n -- c)
: 4drop 2drop 2drop ; ( a b c d --)
: 4dup 2over 2over ; ( a b c d -- a b c d a b c d)
: 2s@ 4dup s@ -rot s@ ; ( a1 n1 a2 n2 -- a1 n1 a2 n2 c1 c2)
: w@ 2over s@ ; ( a1 n1 a2 n2 -- a1 n1 a2 n2 c1)
: (2chop) chop 2swap ; ( a1 n1 a2 n2 -- a2+1 n2-1 a1 n1)
: 2chop (2chop) (2chop) ; ( a1 n1 a2 n2 -- a1+1 n1-1 a2+1 n2-1)
\ returns true on match, false if not
: wild-match ( a1 n1 a2 n2 -- f)
begin ( a1 n1 a2 n2)
dup 0<> >r w@ '*' <> r> and ( a1 n1 a2 n2 f)
while ( a1 n1 a2 n2)
2s@ <> if w@ '?' <> if 4drop false exit then then 2chop
repeat 4dup 2>r 2>r ( a1+y n1-y a2+x n2-x)

begin ( a1 n1 a2 n2)
dup ( a1 n1 a2 n2 f)
while ( a1 n1 a2 n2)
w@ '*' = if ( a1 n1 a2 n2 f)
2>r chop dup 2r> rot 0= ( a1+1 n1-1 a2 n2)
if 2r> 2r> 4drop 4drop true exit then
4dup chop 2r> 2r> 4drop 2>r 2>r ( a1+1 n1-1 a2 n2)
else ( a1 n1 a2 n2)
2s@ = >r w@ '?' = r> or ( a1 n1 a2 n2 f)
if 2chop else 4drop 2r> 2r> 2dup chop 2>r 2over 2>r then
then ( a1 n1 a2 n2)
repeat 2r> 2r> 4drop 2drop ( a1 n1)
( f)
begin 2dup s@ '*' = while chop repeat nip 0=
;
---8<---

s" *abc*ef*" s" 123abcd_efg" wild-match . -1 ok

Hans Bezemer

Hans Bezemer

Oct 17, 2014, 1:18:01 PM
Hans Bezemer wrote:

If you think the previous version (which kept the state in stack values) was
some heavy duty code, here is a recursive version which is cleaner:

---8<---
: chop 1- swap char+ swap ; ( a n -- a+1 n-1)
: 4drop 2drop 2drop ; ( a b c d --)

: match-wild
begin
dup \ exit when match string is exhausted
while
over c@ [char] ? = \ match any character
if \ except empty string
>r over r> swap 0= if 4drop false exit then
2>r chop 2r> chop \ okay, next one please
else
over c@ [char] * = \ match zero or more characters
if \ because we eat one character
2over 2over chop recurse if 4drop true exit then
>r over r> swap if 2over chop 2over recurse else false then
>r 4drop r> exit \ from match string, recursion stops
else \ nothing worked with this wildcard
2over if \ not an empty string?
c@ >r over c@ r> = \ if so, compare both characters
if 2>r chop 2r> chop else 4drop false exit then
else \ otherwise, it's no use going on
drop 4drop false exit
then
then
then
repeat \ get next character
>r drop nip r> or 0= \ only a match if both are at the end
;
---8<---

s" abracadabra" s" ?b*r?" match-wild . cr
s" pArka" s" *a" match-wild . cr
s" parka" s" *a" match-wild . cr
s" park" s" *a*" match-wild . cr
s" a" s" a" match-wild . cr
s" aardvark" s" a*" match-wild . cr
0 dup s" *" match-wild . cr

s" argh" s" a" match-wild . cr
s" ba" s" a" match-wild . cr
s" badger" s" a*" match-wild . cr
s" park" s" *a" match-wild . cr
s" perk" s" *a*" match-wild . cr
s" abracadabr" s" ?b*r?" match-wild . cr
s" abracadabzr" s" ?b*r?" match-wild . cr .s

Hans Bezemer

Hans Bezemer

Oct 18, 2014, 10:00:48 AM
Hans Bezemer wrote:

This code does exactly the same as the one in the previous posting. I just wondered if my
skills had grown enough in eight years that I could come up with something
a bit less murky. I remember the discussion in c.l.f at the time and nobody
was able to produce something *WITHOUT LOCALS* that tickled me enough to
adopt it.

So I tried to produce something more transparent. It's a bit paranoid about
the length of the wildcard string (and subsequent fetches) but being
paranoid doesn't mean they're not out to get you ;-)

Surprisingly, (on 4tH) it's not compiling to fewer instructions than the
previous version.

---8<---
: chop 1- swap char+ swap ; ( a n -- a+1 n-1)
: (w@) dup 0> if over c@ else dup then ;
: 4drop 2drop 2drop ; ( n1 n2 n3 n4 --)
\ current char matches or equals '?'
: (match-or-?) ( a1 n1 wa1 wn1 -- a2 n2 wa2 wn2 f)
2>r over c@ 2r> rot >r (w@) dup [char] ? = swap r> = or
dup >r if 2>r chop 2r> chop then r> \ increment pointers on match
;
\ wildcard routine
: match-wild ( a n wa wn -- f)
begin
>r over r> swap \ we still have a string?
while
(w@) [char] * <> \ wildcard character unequal to '*'?
while \ if so, try to match character
(match-or-?) 0= if 4drop false exit then
repeat \ if not, exit and return false
[UNDEFINED] 4TH# [IF] then [THEN] 2dup 2dup 2>r 2>r
\ set up temporary pointers
begin
>r over r> swap \ we still have a string?
while
(w@) [char] * = \ wildcard character equals '*'?
if \ if so, drop temporary values
2r> 2r> 4drop chop dup 0> \ chop wildcard string, exit if done
if 2dup 2>r 2over chop 2>r else 4drop true exit then
else \ if not, save new temporary values
(match-or-?) 0= \ try to match character
if 4drop 2r> 2r@ 2over chop 2>r then
then \ if no match, restore values
repeat 2nip 2r> 2r> 4drop \ drop string and temporary values
\ get rid of superfluous '*'
begin dup 0> while over c@ [char] * = while chop repeat
[UNDEFINED] 4TH# [IF] then [THEN] nip 0> 0=
; \ wildcard string should be empty now
---8<---

s" 123abcd_efg" s" *abc*ef*" match-wild . -1 ok

Hans Bezemer

s" abracadabra" s" ?b*r?" match-wild . cr -1
ok
s" pArka" s" *a" match-wild . cr -1
ok
s" parka" s" *a" match-wild . cr -1
ok
s" park" s" *a*" match-wild . cr -1
ok
s" a" s" a" match-wild . cr -1
ok
s" aardvark" s" a*" match-wild . cr -1
ok
0 dup s" *" match-wild . cr cr -1

ok
ok
s" argh" s" a" match-wild . cr 0
ok
s" ba" s" a" match-wild . cr 0
ok
s" badger" s" a*" match-wild . cr 0
ok
s" park" s" *a" match-wild . cr 0
ok
s" perk" s" *a*" match-wild . cr 0
ok
s" abracadabr" s" ?b*r?" match-wild . cr 0
ok
s" abracadabzr" s" ?b*r?" match-wild . cr depth . 0
0 ok


jo...@planet.nl

Oct 18, 2014, 10:50:39 AM
On 18 October 2014 16:00:48 UTC+2 The Beez wrote:
> Hans Bezemer wrote:
>> May be this is more appropriate for you:
> This code is exactly the same as the previous posting.
[...]

Thanks Hans,

match-wild is exactly what I was looking for and works fine.
I did not like Dr. Dobb's tool at all.

For chop I defined:

: chop 1 /string ; ( a n -- a+1 n-1)
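
As a quick check that this is the same thing as the CHOP posted earlier, here is a small sketch with both spellings side by side (the -a and -b suffixes are made up here to keep them apart). Neither one guards against an empty string.

```forth
\ Illustration only: two spellings of CHOP under hypothetical names.
: chop-a ( a n -- a+1 n-1 ) 1- swap char+ swap ;   \ as posted by Hans
: chop-b ( a n -- a+1 n-1 ) 1 /string ;            \ the /STRING version

s" parka" chop-a type cr    \ prints arka
s" parka" chop-b type cr    \ prints arka
```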

Jos

Hans Bezemer

Oct 18, 2014, 11:25:51 AM
If people are interested, here is the 4tH code (containing plenty of
4tH-isms) which is enhanced by:

- escaping characters
- predefined sets (prefixed by %, see code)
- + and ?

So it is capable of doing this:

---8<---
s" 0,9" s" ^0,?9$" match-reg . cr
s" 0:9" s" ^0,?9$" match-reg . cr
s" 09" s" ^0,?9$" match-reg . cr
s" 009" s" ^0,?9$" match-reg . cr
s" 0,,9" s" ^0,?9$" match-reg . cr cr

s" abcdefg" s" abcdefg$" match-reg . cr
s" ababababab" s" ab*a$" match-reg . cr
s" aaaaaaaaba" s" a*ba$" match-reg . cr
s" aaaaaabac" s" ab*a$" match-reg . cr
s" abbd" s" ab*d$" match-reg . cr
s" abbde" s" ab*d$" match-reg . cr cr

s" -1234.56" s" -?%9+\.?%9*$" match-reg . cr
s" -1234" s" -?%9+\.?%9*$" match-reg . cr
s" 1234.56" s" -?%9+\.?%9*$" match-reg . cr
s" 1234" s" -?%9+\.?%9*$" match-reg . cr
s" 1234.ab" s" -?%9+\.?%9*$" match-reg . cr
s" 1234,23" s" ^-?%9+\.?%9*$" match-reg . cr
s" PRExyz23" s" ^PRE.*23$" match-reg . cr
s" -.23" s" ^-?%9+\.?%9*$" match-reg . cr depth .
---8<---

Maybe someone is interested in making an ANS version out of this. I'm good ;-)

Hans Bezemer

P.S. :TOKEN xxx yyy ; equals :NONAME yyy ; constant xxx
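
For anyone who wants to try that in a standard Forth, the right-hand side of that equivalence looks like this. A small sketch with a made-up word SQUARED (not part of the library): the constant holds an execution token, so it has to be run with EXECUTE or handed to a word that does so.

```forth
\ Illustration only: SQUARED is a hypothetical example word.
:noname ( n -- n*n ) dup * ; constant squared   \ SQUARED now holds an execution token
7 squared execute . cr                           \ prints 49
```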

---8<---
: break? ?do over i c@ = if 0= leave then loop nip ;
---8<---
: IS-ASCII ( char -- flag ) 128 < ;
: IS-PRINT ( char -- flag ) DUP IS-ASCII SWAP BL 1- - 0> AND ;
: IS-WHITE ( char -- flag ) [CHAR] ! - 0< ;
: IS-DIGIT ( char -- flag ) [CHAR] 0 - MAX-N AND 10 < ;
: IS-LOWER ( char -- flag ) [CHAR] a - MAX-N AND 26 < ;
: IS-UPPER ( char -- flag ) [CHAR] A - MAX-N AND 26 < ;
: IS-ALPHA ( char -- flag ) BL OR IS-LOWER ;
: IS-ALNUM ( char -- flag ) DUP IS-ALPHA SWAP IS-DIGIT OR ;
: IS-XML ( char -- flag ) 0 S| <>&"'| BOUNDS DO OVER I C@ = OR LOOP
NIP ;
: IS-HTML ( char -- flag ) DUP IS-XML SWAP IS-PRINT 0= OR ;
---8<---
-1 constant NULL ( NULL pointer)

defer key=

:token string-key >r 2dup r@ @c count compare 0= r> swap ;
:token num-key over over @c = ;

: row ( x a1 n1 xt -- x a2 f)
is key= >r ( x a)
begin ( x a)
dup @c NULL <> dup ( x a f f)
while ( x a f)
drop key= dup 0= ( x a f -f)
while ( x a f)
drop r@ cells + ( x a)
repeat ( x a)
[UNDEFINED] 4TH# [IF] then [THEN]
r> drop ( x a f)
;
---8<---
\ Regular Expressions by Brian W. Kernighan and Rob Pike
\ Believed to be in the public domain

\ 4th version by J.L. Bezemer, 2014

[UNDEFINED] match-reg [IF]
[UNDEFINED] 2over [IF] include lib/anscore.4th [THEN]
[UNDEFINED] row [IF] include lib/row.4th [THEN]
[UNDEFINED] is-ascii [IF] include lib/istype.4th [THEN]
[UNDEFINED] break? [IF] include lib/breakq.4th [THEN]

defer (matchhere)

128 +constant +cmd

char ^ +cmd constant (^) \ all special commands
char ? +cmd constant (?)
char * +cmd constant (*)
char + +cmd constant (+)
char $ +cmd constant ($)
char . +cmd constant (.)
char 9 +cmd constant (9)
char a +cmd constant (@)
char A +cmd constant (a)
char # +cmd constant (#)
char & +cmd constant (&)
char _ +cmd constant (_)

create (eq?) \ is it equal?
(.) , ' is-ascii , \ equivalent to .
(9) , ' is-digit , \ equivalent to [0-9]
(@) , ' is-lower , \ equivalent to [a-z]
(a) , ' is-upper , \ equivalent to [A-Z]
(#) , ' is-alpha , \ equivalent to [a-zA-Z]
(&) , ' is-alnum , \ equivalent to [a-zA-Z0-9]
(_) , ' is-white , \ whitespace
NULL , \ if a set execute, otherwise compare
does> 2 num-key row if nip cell+ @c execute else drop = then ;
\ some helper words
: (crunch) 1- over over over char+ -rot cmove ;
: (cmd!) over dup c@ +cmd swap c! ; ( a n -- a n)
: (contains?) 2>r c@ false 2r> bounds break? ;
\ prepare regular expression
: (prepare) ( a1 n1 -- a1 n2)
over swap \ save string address
begin
dup \ any characters left?
while \ if so, does it contain
over s" ^$?*+." (contains?) \ a metacharacter?
if (cmd!) \ if so, set command bit
else dup 1 > \ string length at least two?
if over c@ [char] \ = \ if it contains an escape
if (crunch) \ ignore the next character
else \ otherwise, if it is marked as
over c@ [char] % = \ a set, set the command bit
if over char+ s" 9aA#&_" (contains?) if (crunch) (cmd!) then then
then
then
then chop \ next character
repeat drop over - \ calculate new length
;
\ match zero or more times
: (match*) ( a n ra rn c --f)
begin
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r@ (eq?) else dup xor then r> swap
while \ character equals text?
>r 2>r chop 2r> r> \ if so, match again
repeat drop 2drop 2drop false \ clean up, return false
;
\ match zero or one time
: (match?) ( a n ra rn c --f)
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r> (eq?) else r> drop dup xor then
if 2>r chop 2r> (matchhere) else 2drop 2drop false then
;
\ match one or more times
: (match+) ( a n ra rn c --f)
>r 2over if c@ r@ (eq?) if 2>r chop 2r> r> (match*) exit then else drop
then
2drop 2drop r> dup xor \ check one character then
; \ perform (match*)

create (special?) \ all special characters
(*) , ' (match*) ,
(?) , ' (match?) ,
(+) , ' (match+) ,
NULL ,
does> 2 num-key row \ if special character
if \ execute it
cell+ @c >r drop over c@ >r chop chop r> r> execute true
else \ otherwise drop values
drop drop false \ and return false
then
;

:noname ( a n ra rn -- f)
dup if \ regular expression a null string?
over char+ c@ (special?) if exit then
over c@ ($) = over 1 = and \ otherwise does it equal a '$'
if \ and is it the last character?
2drop nip 0= exit \ if so, check length of text
else \ finally, check if any text left
2over \ and if character matches
if c@ >r over c@ r> swap (eq?) if chop 2>r chop 2r> recurse exit then
else drop then false \ if so recurse, otherwise quit
then \ and return false
else
true \ zero length regular expression
then >r 2drop 2drop r> \ clean up and exit
; is (matchhere) \ assign to DEFER (we got 'em)

: match-reg ( a n ra rn --f)
(prepare) dup if over c@ (^) = if chop (matchhere) exit then then
begin \ if caret, chop it
2over 2over (matchhere) if 2drop 2drop true exit then
>r over r> swap \ match characters
while \ until no more text
2>r chop 2r> \ chop text
repeat 2drop 2drop false \ clean up
;

[DEFINED] 4TH# [IF]
hide (matchhere)
hide +cmd
hide (^)
hide (?)
hide (*)
hide (+)
hide ($)
hide (.)
hide (9)
hide (@)
hide (a)
hide (#)
hide (&)
hide (_)
hide (eq?)
hide (match*)
hide (match?)
hide (match+)
hide (prepare)
hide (special?)
hide (crunch)
hide (cmd!)
hide (contains?)
[THEN]
[THEN]
---8<---

Anton Ertl

Oct 18, 2014, 11:47:43 AM
Related to an adjacent discussion: I think that this serves nicely as
a demonstration that lack of locals does not automatically lead to
well-factored code (especially if you use 'the magical "seven, plus or
minus two" words' *maximum* as the definition of "well-factored").

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2014: http://www.euroforth.org/ef14/

Hans Bezemer

Oct 18, 2014, 12:51:44 PM
Anton Ertl wrote:

> Hans Bezemer <the.bee...@gmail.com> writes:
> Related to an adjacent discussion: I think that this serves nicely as
> a demonstration that lack of locals does not automatically lead to
> well-factored code (especially if you use 'the magical "seven, plus or
> minus two" words' *maximum* as the definition of "well-factored").
Well, that might also be due to my style of coding. When creating library
code I DON'T factor unless I have to. It saves me the call overhead (which
is heavier in 4tH than other Forths) and (for library code) the savings in
size don't add up to much. I don't know how users are thrashing the code,
so I want it as tight as possible.

In applications programs I do factor A LOT. E.g. in the 4tH preprocessor
you'll find LOTS of one-liners and other words rarely exceed three lines.

BTW, I don't find the "localists" doing so great either where "one-liners"
are concerned. Speaking of no one in general, I hasten to say.

Still, IMHO this particular code is not overly difficult to follow. And that
is where my primary concern is here.

Hans Bezemer

hughag...@yahoo.com

Oct 18, 2014, 10:19:31 PM
On Saturday, October 18, 2014 8:47:43 AM UTC-7, Anton Ertl wrote:
> Related to an adjacent discussion: I think that this serves nicely as
> a demonstration that lack of locals does not automatically lead to
> well-factored code (especially if you use 'the magical "seven, plus or
> minus two" words' *maximum* as the definition of "well-factored").

I agree with Anton on this --- I don't write such lengthy Forth with multiple levels of control-structure indentation. The fact that Hans is writing code at all, however, puts him way ahead of the majority of C.L.F. experts who never write any code, so I applaud his effort.

I haven't actually examined Hans' code closely, as regular-expressions aren't something that I'm interested in. The last time that the subject of pattern-matching text came up, was this thread:
https://groups.google.com/forum/#!searchin/comp.lang.forth/google$20codejam/comp.lang.forth/S6vlFRldw44/ikfRkqH7968J
It might be interesting if Hans were to try out his regular-expression code on that problem.

Hans Bezemer

Oct 19, 2014, 8:51:16 AM
hughag...@yahoo.com wrote:

> On Saturday, October 18, 2014 8:47:43 AM UTC-7, Anton Ertl wrote:
> I agree with Anton on this --- I don't write such lengthy Forth with
> multiple levels of control-structure indentation.
Basically, knowing the rules also implies you know why they're there and
break 'em if they don't make sense. In this case, library code shouldn't
clutter my name space too much and run as fast as possible.

Furthermore, it should still remain as maintainable as possible (or
required). The latter is taken care of by keeping the stack diagram pretty
stable: every line starts off with "a n wa wn". That's good enough for me.

> The fact that Hans is
> writing code at all, however, puts him way ahead of the majority of C.L.F.
> experts who never write any code, so I applaud his effort.
I couldn't agree with you more! That's why I started off this thread
with "Hey, here's some code, instead of simply bashing each other's heads
in during some academic discussion or personal vendetta". It's ridiculous
when you compare it with other newsgroups on programming. Like a bunch of
old, fat guys discussing how to get the most performance out of an
engine or whether a car should be modified or remain original, but who NEVER
DRIVE A MILE.

And finally, I find it offensive when somebody's code is held up as "an example
of bad programming" when there has been a SEE.F lying around in his own code for
ages which IMHO is much more horrifying than this snippet. I take that very
personally, and it shows what a bunch of arrogant snobs have taken over the
discussions here.

I have the habit of rewriting a bad piece of my own code (like WILDCARD.4TH)
when I come across it - and I have almost 400 libraries to maintain.

If you want to see how well others attacked this particular piece of code,
check
https://groups.google.com/forum/#!searchin/comp.lang.forth/wildcard/comp.lang.forth/hKwBzF_RNWY/BW57ZepUOR4J
You'll see that the initials AE are lacking (as usual), since the person in
particular doesn't get his hands dirty by writing actual code himself and
exposing it to the scrutiny of his peers. Let alone write something useful
(in the REAL world) and let others build on it and improve it. That's what
FOSS (gforth is GNU) is all about in the first place - and c.l.f. should be
a forum, a breeding ground for that. But it's not.

From time to time I visit the Fignition Google group, where you still find
a hint of the FIG-Forth era of coding - and it's fun, it's what Forth was
and should be.

I use 4tH personally and for work, simply because it gives me an edge. I
crunch out functionality faster than others, because if something is
lacking, I can add it - up to the compiler itself. I did a 64K program in
less than a week. It merged data from half a dozen different sources,
identified the format, checked it, cleaned it up, produced error lists in
binary Excel, punched out a clean .CSV file, ready to be read in and left a
run report which was acceptable for an accountant. When it ran, it did in
about 30 secs what it had taken a grown man

When I left, they wanted to port the thing to C (or something). I wrote down
the exact specifications (including the code that catered for the zillion
possible date formats it could decipher - from native Excel via "3 letter
code" to ISO) and they found out it would take a good C-programmer 3 months
to replicate.

Did I cut corners here and there? Absolutely. And I can tell you right where
they are and why I cut them. That's the real world. But I'm not there to
write academic papers, I'm there to make a difference and get the job done
within time and budget. And BTW, I'm not a programmer, I'm a service
management consultant who's not afraid to get his hands dirty when the
world needs to be saved. And yes, I do have a few dozen publications to my
name, including Forth-related ones if you really want to know.

So in order to give you all something to criticize and waste a perfect day
again, here's the code I wrote this week. I wrote it in an hour or so to
convert a bunch of SQL statements into a neat .CSV file so I could match
it against the data dictionary, instead of letting some poor bastard match
the whole 3000 fields by hand. You can see where the "awful regex" code
came in handy.

I hope you all feel very good about yourselves, because "I'm ok and you're
ok" isn't as much fun as "I'm ok and you're an @$$hole".

Hans Bezemer

---8<---
include lib/kpre.4th
include lib/leading.4th
include lib/scanskip.4th
include lib/compare.4th

variable #lines
2048 constant /buffer
/buffer buffer: buffer
64 string tablename

: Usage abort" Usage: extract infile outfile " ;
: Read-file refill ;
: Preprocess 0 #lines ! buffer /buffer source! ." Tablename, fieldname" cr ;
: trimming -leading -trailing ;
: table? 2dup [char] . split 2drop chop tablename place ;

: field?
over s" SELECT" >r r@ rot r@ compare 0= if r> /string else r> drop then
trimming bl split trimming 2swap
trimming bl split trimming 2nip
s" AS" compare
if
." Suspicious field [" type ." ] at line " #lines ? cr 2drop
else
tablename count type ." , " type cr
then
;

: fields?
begin
dup 0>
while
[char] , split trimming dup 0> if field? chop else 2drop then
repeat
;

: Process
1 #lines +!
0 parse trimming
dup if
2dup s" ^%9%9*\..*$" match-reg if table? else
2dup s| ^.*".*"%_*as%_*".*".*| match-reg if fields? then
then
then 2drop
;

[needs lib/convert.4th]
---8<---

Hans Bezemer

Oct 19, 2014, 9:23:38 AM
Hans Bezemer wrote:

BTW, fix this one in "prim.fs":

: d2*+ ( ud n -- ud+n c )
over MINI
and >r >r 2dup d+ swap r> + swap r> ;

It should be:

: d2*+ ( ud n -- ud+n c )
over MINI
and >r >r 2dup d+ r> 0 d+ r> ;

It's been there for ages as well.
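
A minimal sketch of why the carry matters, assuming a system that provides D+ and D. from the optional Double-Number word set. The helper names ud+n-lossy and ud+n are hypothetical and only isolate the two ways of adding n shown above; they are not taken from prim.fs.

```forth
\ Adding a single-cell value n to a double-cell value ud (low cell, high cell).
\ Helper names are hypothetical; D+ and D. are Double-Number words.

: ud+n-lossy ( ud n -- ud' )   \ like "swap r> + swap": adds n to the low cell only
   >r swap r> + swap ;         \ a carry out of the low cell is silently dropped

: ud+n ( ud n -- ud' )         \ like "r> 0 d+": widens n to a double first
   0 d+ ;                      \ so a carry out of the low cell reaches the high cell

-1 0 1 ud+n-lossy d.   \ low cell wraps to 0, high cell stays 0: prints 0
-1 0 1 ud+n       d.   \ low cell wraps to 0, high cell becomes 1: nonzero result
```

The two versions only differ when the addition overflows the low cell, which is why the lossy form can go unnoticed for a long time.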

Hans Bezemer

Andrew Haley

Oct 19, 2014, 10:31:41 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>
> Related to an adjacent discussion: I think that this serves nicely as
> a demonstration that lack of locals does not automatically lead to
> well-factored code

I must have missed that claim.

> (especially if you use 'the magical "seven, plus or minus two"
> words' *maximum* as the definition of "well-factored").

Which no-one does, AFAIK. I'm surprised you didn't know the reference
from Thinking Forth, but here's the Wikipedia entry:

"The Magical Number Seven, Plus or Minus Two: Some Limits on Our
Capacity for Processing Information" is one of the most highly
cited papers in psychology. It was published in 1956 by the
cognitive psychologist George A. Miller of Princeton University's
Department of Psychology in Psychological Review. It is often
interpreted to argue that the number of objects an average human
can hold in working memory is 7 +- 2. This is frequently referred to
as Miller's Law.

Andrew.

Marcel Hendrix

Oct 19, 2014, 11:10:23 AM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes Re: Tiny regular expressions code

>> Hans Bezemer <the.bee...@gmail.com> writes:

> Related to an adjacent discussion: I think that this serves nicely as
> a demonstration that lack of locals does not automatically lead to
> well-factored code (especially if you use 'the magical "seven, plus or
> minus two" words' *maximum* as the definition of "well-factored").

Removing the locals *and* removing the stack noise does not seem to
lead to a much easier algorithm (although the max. number of
lines / word is now 8). There is quite obvious similarity between
the words in the local-less variant, but I don't see how to exploit it.

A state machine and goto's may be a better idea in this case.

-marcel

-- -------------------------
ANEW -regexpr

0 [IF]

DEFER (matchhere) ( $1 $2 -- bool )

: (match*) LOCALS| c n2 a2 n1 a1 | ( $1 $2 char -- bool )
BEGIN
a1 n1 a2 n2 (matchhere) IF TRUE EXIT ENDIF
n1 0= IF FALSE EXIT ENDIF
c '.' = a1 c@ c = OR 0= IF FALSE EXIT ENDIF \ '.' in the pattern matches any char
a1 n1 1 /string TO n1 TO a1
AGAIN ;

: (match?) LOCALS| c n2 a2 n1 a1 | ( $1 $2 char -- bool )
a1 n1 a2 n2 (matchhere) IF TRUE EXIT ENDIF
n1 0= IF FALSE EXIT ENDIF
c '.' = a1 c@ c = OR 0= IF FALSE EXIT ENDIF \ '.' in the pattern matches any char
a1 n1 1 /string a2 n2 (matchhere) ;

:NONAME 0 LOCALS| c n2 a2 n1 a1 | ( $1 $2 -- bool )
n2 0= IF TRUE EXIT ENDIF
a2 c@ TO c a2 char+ c@ '*' = \ if not, does it equal a '*'
IF a2 n2 2 /string TO n2 TO a2
a1 n1 a2 n2 c (match*) EXIT
ENDIF \ otherwise, does it equal a '?'
a2 char+ c@ '?' =
IF a2 n2 2 /string TO n2 TO a2
a1 n1 a2 n2 c (match?) EXIT
ENDIF \ otherwise, does it equal a '$'
c '$' = n2 1 = AND IF n1 0= EXIT ENDIF \ and is it the last character?
n1 0= IF FALSE EXIT ENDIF
c '.' = c a1 c@ = OR 0= IF FALSE EXIT ENDIF
a1 n1 1 /string a2 n2 1 /string (matchhere) ; IS (matchhere)

: match LOCALS| n2 a2 n1 a1 | ( $1 $2 -- bool )
n2 IF a2 c@ '^' = IF a1 n1 a2 n2 1 /string (matchhere) EXIT ENDIF ENDIF
BEGIN \ no leading '^': try every start position
a1 n1 a2 n2 (matchhere) IF TRUE EXIT ENDIF
n1 0= IF FALSE EXIT ENDIF
a1 n1 1 /string TO n1 TO a1
AGAIN ;

[ELSE]

CREATE $1$2 #256 5 * CELLS ALLOT $1$2 VALUE ptr
: +level ( char a1 n1 a2 n2 -- ) 5 CELLS +TO ptr ptr 2 CELL[] d! ptr d! ptr 4 CELL[] ! ;
: -level ( -- ) -5 CELLS +TO ptr ;
: reset ( $1 $2 -- ) $1$2 TO ptr -level 2>r 2>r 0 2r> 2r> +level ;
: a1 ( -- addr ) ptr @ ; : n1 ( -- n ) ptr CELL+ @ ; : $1 ( -- a1 n1 ) ptr d@ ;
: a2 ( -- addr ) ptr 2 CELL[] @ ; : n2 ( -- n ) ptr 3 CELL[] @ ; : $2 ( -- a2 n2 ) ptr 2 CELL[] d@ ;
: c ( -- char ) ptr 4 CELL[] @ ;
: shorten$1 ( u -- ) $1 ROT /string ptr d! ;
: shorten$2 ( u -- ) $2 ROT /string ptr 2 CELL[] d! ;

DEFER (matchhere) ( -- bool )

: (match*) ( char -- bool )
$1 $2 +level
BEGIN
(matchhere) IF -level TRUE EXIT ENDIF
n1 0= IF -level FALSE EXIT ENDIF
c '.' = a1 c@ c = OR 0= IF -level FALSE EXIT ENDIF \ '.' in the pattern matches any char
1 shorten$1
AGAIN ;

: (match?) ( char -- bool )
$1 $2 +level
(matchhere) IF -level TRUE EXIT ENDIF
n1 0= IF -level FALSE EXIT ENDIF
c '.' = a1 c@ c = OR 0= IF -level FALSE EXIT ENDIF \ '.' in the pattern matches any char
1 shorten$1 (matchhere) -level ;

:NONAME ( -- bool )
a2 c@ $1 $2 +level
n2 0= IF -level TRUE EXIT ENDIF
a2 char+ c@ '*' = IF 2 shorten$2 c (match*) -level EXIT ENDIF
a2 char+ c@ '?' = IF 2 shorten$2 c (match?) -level EXIT ENDIF
c '$' = n2 1 = AND IF -level n1 0= EXIT ENDIF
n1 0= IF -level FALSE EXIT ENDIF
c '.' = c a1 c@ = OR 0= IF -level FALSE EXIT ENDIF
1 shorten$1 1 shorten$2 (matchhere) -level ; IS (matchhere)

: match ( $1 $2 -- bool )
reset
n2 IF a2 c@ '^' = IF 1 shorten$2 (matchhere) EXIT ENDIF ENDIF
BEGIN \ no leading '^': try every start position
(matchhere) IF -level TRUE EXIT ENDIF
n1 0= IF -level FALSE EXIT ENDIF
1 shorten$1
AGAIN ;

[THEN]

Albert van der Horst

Oct 19, 2014, 1:10:42 PM
In article <5443b39a$0$2947$e4fe...@news2.news.xs4all.nl>,
If I look at your regular expressions, I've the same feeling as I have
looking at Marcel Hendrix's Manx package. "Oh God, he is a really good
programmer." Being a good programmer is, however, not what it is all about.

I couldn't, and wouldn't, try to maintain a package in that style.
I would restructure it first. And with manx I even failed to do that.
(I've rewritten it, and now I can expand it in any direction I want.)

If I look at my regexpression package I see:

1. an emphasis on documentation. Good programmers can keep lots of things
in their head. I can't. This is how *my* r.e. starts:

\---------------------------------------------------
\ Regular expressions in Forth.
\ This package handles only simple regular expressions and replacements.
\ See the words RE-MATCH and RE-REPLACE for usage.
\ The following aspects are handled:
\ 1. Compiling ^ (begin only) $ (end only) and special characters + ? * [ ] < >
\ 2. Grouping using ( ) , only for replacement.
\ 3. Ranges and inversion of char set (between [ ] ).
\ 4. Above characters must be escaped if used as is by \ , making \ a special char.
\ 5. Some sets are escaped by \ (\w) , some non-printables are denoted by an
\ escape sequence.
\ 6. It is an error to escape characters that do not denote blank
\ space, are not special, and do not denote a set. However, ^ - $
\ etc. may be escaped where they are not special.

\ Implementation notes:
\ * Usually regular expressions are compiled into a buffer consisting of
\ tokens followed by strings or characters in some format.
\ We follow the same here, except that tokens are execution tokens.
\ * No attempt is done at reentrant code.
\ * \d \s \w etc. can be handled by just adding sets

\ Data structures :
\ a char set is a bit set, with a bit up for the matching character.
\ a string is a regular string variable (so with a cell count).


\---------------------------------------------------

2. An emphasis on structure and data structures.
I'm a bit mathematically inclined. If the structures aren't right
I get nowhere.

3. Simple basic definitions.
You can find the occasional IF within a loop, or chained IF's.
If it gets more complicated than that, my debugging skills
fall short.

4. Comprehensive tests.
If it works, that's nothing. You must be able to change it.
If it still works, that's nothing. You must be able to be
sure that it still works.
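
A sketch of what such tests could look like for a small matcher. It assumes that a harness providing T{ -> }T (for example ttester.fs) is already loaded, that S" works interpretively with at least two string buffers, and that a hypothetical word MATCH ( text-a text-n re-a re-n -- flag ) is the word under test. None of this is taken from the package described above; the expected flags simply follow the ^ $ . * ? semantics discussed in this thread.

```forth
\ Assumptions: T{ -> }T comes from a test harness such as ttester.fs;
\ MATCH ( text-a text-n re-a re-n -- flag ) is a hypothetical word under test.

T{ s" abc"  s" ^abc$" match -> true  }T   \ anchored literal
T{ s" abc"  s" ^a.c$" match -> true  }T   \ . matches any one character
T{ s" ac"   s" ^ab*c" match -> true  }T   \ * allows zero occurrences
T{ s" abbc" s" ^ab?c" match -> false }T   \ ? allows at most one occurrence
T{ s" xabc" s" ^abc"  match -> false }T   \ ^ pins the match to the start
```

Rerunning a file like this after every change is what makes it possible to be sure that it still works.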

By the way. Anton Ertl is a fine programmer. I've looked at his
solution for the Ants euler problem. And he has improved on my
"yet another prime counting program" with a keen eye for
efficiency.

Groetjes Albert

Hans Bezemer

Oct 19, 2014, 1:57:35 PM
Albert van der Horst wrote:

> If I look at your regular expressions, i've the same feeling as I have
> looking at Marcel Hendrix Manx package. "Oh God, he is a really good
> programmer." Being a good programmer is, however, not what it is all
> about.
Agreed. But I don't think we disagree much here. Keep reading, I think I
agree on virtually every point you address. I do, however, take my work and
skills quite seriously.

> I couldn't, and wouldn't, try to maintain a package in that style.
> I would restructure it first. And with manx I even failed to do that.
> (I've rewritten it, and now I can expand it in any direction I want.)
First, every programmer I know has his own style. Having seen a lot of
programs - as you have, I presume - I can only say there are a lot of
styles. Having done maintenance on programs I did not write, I see good
style and bad style. Good style doesn't mean it's my style. I always try to
keep the style and conventions of the original programmer (if I can), I
can't rework every program to meet my style. But maintenance IS (agreed)
the main thing. Having done maintenance on my uBasic interpreter, my
preprocessor and other non-trivial (what I consider non-trivial) programs,
I don't feel I'm lacking too much in that department.

> 1. an emphasis on documentation. Good programmers can keep lots of things
> in their head. I can't. This is how *my* r.e. starts:
I'm not the kind of programmer who tries to cram his documentation into his
programs. I do, however, have one of the highest scores where comments are
concerned (pull the code and read from column 40 forward) and I have about
600 pages documenting my compiler on almost every subject possible. That's
over 2(!) gForth manuals. In other words, I still don't really feel this
applies to me.

> By the way. Anton Ertl is a fine programmer. I've looked at his
> solution for the Ants euler problem. And he has improved on my
> "yet another prime counting program" with a keen eye for
> efficiency.
Quoting from a person I respect: "Oh God, he is a really good
programmer." Being a good programmer is, however, not what it is all
about.

Hans Bezemer

Bernd Paysan

Oct 19, 2014, 3:14:48 PM
It probably has stayed there because the recommendation is when you don't
implement the multiplication and the division in assembler, please at least
implement d2*+, the support primitive. And I'm sure this d2*+ has been
tested in the context it is supposed to be used, and worked there
(multiplication). It's just that if you use it for general purpose stuff,
it won't work.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Hans Bezemer

Oct 19, 2014, 6:15:23 PM
Bernd Paysan wrote:
> It pobably has stayed there because the recommendation is when you don't
> implement the multiplication and the division in assembler, please at
> least
> implement d2*+, the support primitive. And I'm sure this d2*+ has been
> tested in the context it is supposed to be used, and worked there
> (multiplication). It's just that if you use it for general purpose stuff,
> it won't work.
Well, I didn't. It's used in UM/MOD and UM* definitions afterwards, which
render the wrong results because of it. But (as I always say at work) I
told you, so it's your problem now.

Hans Bezemer

Anton Ertl

Oct 20, 2014, 5:57:18 AM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>
>> Related to an adjacent discussion: I think that this serves nicely as
>> a demonstration that lack of locals does not automatically lead to
>> well-factored code
>
>I must have missed that claim.

I am not surprised.

BTW, my comment was not a criticism of Hans Bezemer's code, but on the
claim outlined above. I have not looked closely enough at the problem
or the code to make any well-founded critique of the code. That's why
I put special emphasis on the following formalistic criterion for
"well-factored":

>> (especially if you use 'the magical "seven, plus or minus two"
>> words' *maximum* as the definition of "well-factored").
>
>Which no-one does, AFAIK.

You did:

|>>But, IMO, any word which is much longer than than the magical "seven,
|>>plus or minus two" words may well be in need of some attention.
|>
|> You would unfactor a three-word definition?
|
|No: the "seven, plus or minus two" is a *maximium*.

<GNKdnUPWhZo1RN3J...@supernews.com>

>I'm surprised you didn't know the reference
>from Thinking Forth

What makes you think I didn't know it? I just don't find the
connection drawn to Miller's result at all convincing. The only
interesting part in Thinking Forth about this is:

|An informal examination of one of Moore's applications shows that he
|averages seven references, including both words and numbers, per
|definition.

If he considered 7 a maximum, he apparently also considered it a
minimum (or he would not arrive at an average of 7). A histogram
would be more interesting, and I am sure that he did not limit himself
to 7 or 9 references maximum. I wonder whether :, <name> and ; are
included in the count of references.

m.a.m....@tue.nl

Oct 20, 2014, 6:13:17 AM
On Monday, October 20, 2014 11:57:18 AM UTC+2, Anton Ertl wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>
> >Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
[..]
>> An informal examination of one of Moore's applications shows that he
>> averages seven references, including both words and numbers, per
>> definition.

I'll have a look in 'Footsteps in an empty valley' and/or the Weasel
manual (full kernel), tonight. This should be easy to (dis)prove.

-marcel

Hans Bezemer

Oct 20, 2014, 11:25:50 AM
Anton Ertl wrote:
> BTW, my comment was not a criticism of Hans Bezemer's code, but on the
> claim outlined above. I have not looked closely enough at the problem
> or the code to make any well-founded critique of the code. That's why
> I put special emphasis on the following formalistic criterion for
> "well-factored"):
After having vented my anger (and consequently gotten it out of my system) and
having read the entire thread I think I owe you a mild apology, because I
believe you sincerely didn't mean to put my code down as "a particular
example of bad programming" (I know you didn't exactly state that, but that
was how I read it).

However, being an oldtimer down here and having ported and read lots of
programs with C-size definitions, I was caught off guard that this particular
not-too-interesting piece of code (I wrote it in an hour or so) had to be
singled out.

But my two cents to the discussion, I don't think that ANY programming
paradigm is able to enforce "writing good code".
- I think that forcing Python users to indent properly may have given more
rise to hard to trace bugs than anything else;
- Some loop constructs are better handled by GOTO than forcing people into
structured programming;
- And the best of all, OO has given rise to "lasagna code" instead of
getting in the land of reliable code (with other claims of OO broken down
by reality as well);
- And where Java is concerned, well, we all know too well what's wrong with
Java, don't we.

So, when Forth poses no particular limits on the size of a definition
(especially now we miss the measure of a "screen") I don't know why anyone
should be surprised why long definitions come to be. If we do
our "factoring" thing, it will emerge the most WITH applications or BETWEEN
libraries. Needless to say that where libraries are concerned, these words
will be very low level - since the functionality of libraries can differ
greatly.

On top of that, if the API doesn't really require exposing a lot of
symbols, factoring will greatly depend on patterns. Lots of algorithms,
however, involve lots of loops and ifs and have very few patterns that have
to be (or reasonably can be) factored out. In other words, yes, you can
take out pieces, but what's the use when they're referred to only ONCE?

Hans Bezemer





Anton Ertl

Oct 20, 2014, 12:32:41 PM
Hans Bezemer <the.bee...@gmail.com> writes:
>On top of that, if the API doesn't really enforce to expose a lot of
>symbols, factoring will greatly depend on patterns. Lots of algorithms
>however, involve lots of loops and ifs and have very few patterns that have
>to be (or can reasonably can be) factored out. In other words, yes, you can
>take out pieces, but what's the use when they're referred to only ONCE?

As Andrew Haley writes, for understanding and for testing; also, for
changing. However, to truly become helpful for that, the factors need
to implement identifiable concepts, and Andrew Haley and I differ in
what we consider "identifiable concepts".

As for "referred only ONCE": what may be referred only once now may be
referred more often if you factor well. And it's hard to predict what
will be reused and what will not.

Marcel Hendrix

Oct 20, 2014, 2:19:21 PM
m.a.m....@tue.nl writes Re: The magical number seven (was: Tiny regular expressions code)
[..]
>> >Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
[..]
>> An informal examination of one of Moore's applications shows that he
>> averages seven references, including both words and numbers, per
>> definition.

> I'll have a look in 'Footsteps in an empty valley' and/or the Weasel
> manual (full kernel), tonight. This should be easy to (dis)prove.

'Footsteps in an empty valley'

-DIGIT \ 27 words
NUMBER \ 53 words
WORD \ 48 words
QUIT \ 14 words
BAUD \ 17 words
SPACES \ 13 words
M/MOD \ 10 words
FILL \ 15 words
SCAN \ 12 words
THRU \ 12 words
] \ 46 words
@ \ 11 words ( optimized)

( here I stopped )

I can't find any trace of words that are obviously factored out of larger
definitions. All words perform a definite function.
Not all words have a stack picture. No words have empty stack pictures. Only
the data stack is commented, input stream and rack are not.

I counted ':', "name" and ';' as words.

The listing I used for counting was the metacompiler for the NOVIX.
I didn't count the very long optimizer words (50 .. 72 words,
unfactored :-), and also not artefacts of metacompiling like : @ @ ; .

-marcel

Rod Pemberton

Oct 20, 2014, 4:34:45 PM
On Mon, 20 Oct 2014 11:28:00 -0400, Hans Bezemer
<the.bee...@gmail.com> wrote:

> But my two cents to the discussion, I don't think that ANY programming
> paradigm is able to enforce "writing good code".

People are able to master the ability to solve complicated math and
physics problems. Are you saying people are unable to represent the
solution process for solving such problems as a verifiably correct
programming paradigm?

> - I think that forcing Python users to indent properly may have given
> more rise to hard to trace bugs than anything else;

I know very little about Python. So, I can accept that as true, for now.

Contrarily, I understand that indenting in languages, like C, which don't
require it improves readability immensely. From my experience in other
languages, I believe improved readability reduces certain coding errors.

> - Some loop constructs are better handled by GOTO than forcing people
> into structured programming;

I've used structured programming since I first learned how to program a
little over three decades ago. GOTO is completely unnecessary. So, under
what circumstances do you believe your claim to be true?

> - And the best of all, OO has given rise to "lasagna code" instead of
> getting in the land of reliable code

I agree.

> (with other claims of OO broken down by reality as well);

Interesting.

> - And what Java is concerned, well, we all know too well what's wrong
> with Java, don't we.

Other than Java not having pointers, I don't know what you're talking
about. Of course, I'm familiar with C but not Java. Could you explain?


Rod Pemberton

Elizabeth D. Rather

Oct 20, 2014, 6:19:07 PM
On 10/20/14 10:36 AM, Rod Pemberton wrote:
> On Mon, 20 Oct 2014 11:28:00 -0400, Hans Bezemer
> <the.bee...@gmail.com> wrote:
>
>> But my two cents to the discussion, I don't think that ANY programming
>> paradigm is able to enforce "writing good code".
>
> People are able to master the ability to solve complicated math and
> physic problems. Are you saying people are unable to represent the
> solution process for solving such problems as a verifiably correct
> programming paradigm?

That isn't what he said. Of course people can write good programs. What
he said is that a programming paradigm cannot *force* them to. People
can (and do) write bad programs in any language.

>> - I think that forcing Python users to indent properly may have given
>> more rise to hard to trace bugs than anything else;
>
> I know very little about Python. So, I can accept that as true, for now.
>
> Contrarily, I understand that indenting in languages, like C, which don't
> require it improves readability immensely. From my experience in other
> languages, I believe improved readability reduces certain coding errors.

Of course. Indenting that conveys meaning always improves readability.
Again, the key word here is "forcing".

>> - Some loop constructs are better handled by GOTO than forcing people
>> into structured programming;
>
> I've used structured programming since I first learned how to program a
> little over three decades ago. GOTO is completely unecessary. So, under
> what circumstances do you believe your claim to be true?

Arbitrary GOTOs are unnecessary, I agree. I think what he means is that
sometimes it is necessary to be able to branch out of a loop.
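
As a minimal sketch of such a branch out of a loop in standard Forth: UNLOOP EXIT leaves both the DO loop and the definition as soon as the search succeeds. The word name index-of is hypothetical, and the last line assumes that S" can be used interpretively.

```forth
: index-of ( c-addr u char -- n )   \ n is the index of char, or -1 if absent
   swap 0 ?do                       \ stack inside the loop: c-addr char
      over i chars + c@ over = if
         2drop i unloop exit        \ found: drop loop parameters, return the index
      then
   loop
   2drop -1 ;                       \ fell through the loop: not found

s" forth" char r index-of .   \ prints 2
```

The early exit keeps the "not found" case on the single fall-through path, without a flag variable carried through the loop.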

>> - And the best of all, OO has given rise to "lasagna code" instead of
>> getting in the land of reliable code
>
> I agree.

There are applications that naturally lend themselves to OO programming,
and others that don't. The notion that OO is a panacea for all
programming problems is a fallacy.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

hughag...@yahoo.com

Oct 20, 2014, 7:34:53 PM
On Monday, October 20, 2014 8:25:50 AM UTC-7, The Beez wrote:
> Anton Ertl wrote:
> > BTW, my comment was not a criticism of Hans Bezemer's code, but on the
> > claim outlined above. I have not looked closely enough at the problem
> > or the code to make any well-founded critique of the code. That's why
> > I put special emphasis on the following formalistic criterion for
> > "well-factored"):
> After having vented my anger (an consequently get it out of my system) and
> having read the entire thread I think I owe you a mild apology, because I
> believe you sincerely didn't mean to put my code down as "a particular
> example of bad programming" (I know you didn't exactly state that, but that
> was how I read it).

Getting back to the subject of regular expressions, have you tried your regexp on the "Alien Alphabet" problem in that thread that I mentioned?

For the most part, I don't like regular expressions, and would prefer to use a technique like I used in that problem. OTOH, a lot of people like regular expressions, and they are a big part of all the major scripting languages --- if you are doing a lot of text pattern-matching, they may simplify the work and be worthwhile. I used AWK a long time ago, and it worked pretty well.

Rod Pemberton

Oct 21, 2014, 1:45:30 AM
On Mon, 20 Oct 2014 18:19:05 -0400, Elizabeth D. Rather
<era...@forth.com> wrote:
> On 10/20/14 10:36 AM, Rod Pemberton wrote:
>> On Mon, 20 Oct 2014 11:28:00 -0400, Hans Bezemer
>> <the.bee...@gmail.com> wrote:

>>> But my two cents to the discussion, I don't think that ANY programming
>>> paradigm is able to enforce "writing good code".
>>
>> People are able to master the ability to solve complicated math and
>> physic problems. Are you saying people are unable to represent the
>> solution process for solving such problems as a verifiably correct
>> programming paradigm?
>
> That isn't what he said. Of course people can write good programs. What
> he said is that a programming paradigm cannot *force* them to.
> People can (and do) write bad programs in any language.
>

A programming paradigm can force type checking which prevents certain
types of bad programming, i.e., you're "forced" to not do that. So, what
prevents a paradigm from being implemented which would prohibit all bad
implementations? I think the only real issue is that many, many possible
combinations of programming elements would need to be tested or proved
to ensure that no "bad" programs could result, intentional or otherwise.
That may be a daunting task or even not doable with humanity's current
intellectual and computational resources. But, if we could reach that
point, then we can be reasonably assured that programmers are "forced"
to only write good programs. The paradigm simply won't allow for bad
ones. Of course, all of this would go against the nature of those who
prefer Forth to other languages.


Rod Pemberton

Elizabeth D. Rather

Oct 21, 2014, 2:31:43 AM
The fundamental flaw with most of the "bad" programs I've encountered is
bad design. This may be either failure to think through and document all
the relevant design issues, or think through appropriate implementation
methodologies, or just plain "slash and burn" programming techniques.

The important thing is for the programmer(s) to (a) understand the
problem, (b) have a coherent approach to addressing it, and (c) have a
dedication to producing the cleanest and most transparent possible
solution to the requirements. I don't know of any compiler that can
"enforce" this.

Lars Brinkhoff

Oct 21, 2014, 2:56:56 AM
m...@iae.nl (Marcel Hendrix) writes:
> -DIGIT \ 27 words
> NUMBER \ 53 words
> WORD \ 48 words
> QUIT \ 14 words
> BAUD \ 17 words
> SPACES \ 13 words
> M/MOD \ 10 words
> FILL \ 15 words
> SCAN \ 12 words
> THRU \ 12 words
> ] \ 46 words
> @ \ 11 words ( optimized)
> ( here I stopped )

There are many shorter words. I get an average of 8.7.

Histogram:

1 ******************
2 *****************************
3 *********************
4 ********************
5 ************
6 ****************
7 ***********
8 **********
9 *****
10 *******
11 ***
12 ***
13 **
14 *
15 *****
16 **
17 **
18 *
19 **
20 **
21 **
22 *
23 *
24 **
25 *
27 *
29 *
30 *
32 *
34 *
38 *
43 **
47 *
60 **
67 *

Hans Bezemer

Oct 21, 2014, 4:27:43 AM
Rod Pemberton wrote:

Since Elizabeth took the trouble of addressing most issues (thank you for
that, I couldn't have explained it as well as you did!), there are only a few
points left for me.
> I've used structured programming since I first learned how to program a
> little over three decades ago. GOTO is completely unecessary. So, under
> what circumstances do you believe your claim to be true?
I agree, it doesn't happen all the time, but at crucial moments I'm glad
GOTO is there. In Forth the VERY occasional "R> DROP" will do the same
thing and due to the nice CATCH some GOTO's are eliminated as well. I for
one, have to agree structured programming is a real boon, given the ordeal
I had to go through porting TEONW.
http://thebeezspeaks.blogspot.nl/2011/02/how-to-reheat-30-year-old-spaghetti.html
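
A minimal sketch of CATCH standing in for a jump out of nested code, assuming the optional Exception word set is present and that S" can be used interpretively. The word names and the throw code 1 are hypothetical, chosen only for this illustration.

```forth
: check-digit ( char -- )           \ throws 1 on the first non-digit
   dup [char] 0 < swap [char] 9 > or if 1 throw then ;

: all-digits ( c-addr u -- )        \ walks the string; may throw from inside the loop
   chars over + swap ?do i c@ check-digit 1 chars +loop ;

: digits? ( c-addr u -- flag )
   ['] all-digits catch
   if 2drop false else true then ;  \ after a throw only the stack depth is restored,
                                    \ so the two leftover cells are dropped

s" 2014" digits? .   \ prints -1
s" 20x4" digits? .   \ prints 0
```

THROW unwinds the loop and the intermediate definition in one step, which is the job a GOTO to an error label would otherwise do.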

>> (with other claims of OO broken down by reality as well);
> Interesting.
I have papers in my possession saying that OO neither gives rise to much
reuse nor is a real productivity boost, neither when developing nor when
maintaining.

>> - And what Java is concerned, well, we all know too well what's wrong
>> with Java, don't we.
>
> Other than Java not having pointers, I don't know what you're talking
> about. Of course, I'm familiar with C but not Java. Could you explain?
Sure. First OO isn't optional. Second, I love table-oriented-programming (I
think my 4tH is best equipped for that) and Java makes that impossible by
the sheer horror of using function pointers.

I even avoid USING Java programs as much as I can, because even from a user
standpoint they're a pain in the neck. Different Java versions, frequent
crashing, pages of errors that make it impossible to find out what went
wrong and if you can possibly fix it. BTW, there are many pages of Java
complaints all over the web - more than Forth related ;-)

Hans Bezemer

Albert van der Horst

Oct 21, 2014, 5:06:18 AM
In article <NaSdnSZVRbNQYNjJ...@supernews.com>,
Elizabeth D. Rather <era...@forth.com> wrote:
<SNIP>
>
>The fundamental flaw with most of the "bad" programs I've encountered is
>bad design. This may be either failure to think through and document all
>the relevant design issues, or think through appropriate implementation
>methodologies, or just plain "slash and burn" programming techniques.

You are the second person I know of who uses the term "slash and
burn" w.r.t. programming.
The other person was me, as in "slash and burn maintenance".
This means that in maintaining a program with a feature that doesn't
work, you remove the feature, and all its auxiliary words. If any of
those auxiliary words support a feature that isn't used, remove them
anyway, and that feature as well. Then do some refactoring and/or
redesigning and maybe reinstall the original feature.
In adding a feature that doesn't fit the design, redesign and remove
before adding. More often than not, after a redesign I discovered that
adding the feature was a mere couple of lines.

Where did you get the term? For me slash and burn suggests that you're
not afraid of removing largish parts of a program.
Too many programmers consider the old program code as sacred,
and dare not touch it, unless it is soooo obviously wrong.

>
>The important thing is for the programmer(s) to (a) understand the
>problem, (b) have a coherent approach to addressing it, and (c) have a
>dedication to producing the cleanest and most transparent possible
>solution to the requirements. I don't know of any compiler that can
>"enforce" this.

Right.

>
>Cheers,
>Elizabeth
>
>--
>==================================================
>Elizabeth D. Rather (US & Canada) 800-55-FORTH
>FORTH Inc. +1 310.999.6784
>5959 West Century Blvd. Suite 700
>Los Angeles, CA 90045
>http://www.forth.com
>
>"Forth-based products and Services for real-time
>applications since 1973."
>==================================================

Bernd Paysan

Oct 21, 2014, 5:34:18 AM
Hans Bezemer wrote:
>> Other than Java not having pointers, I don't know what you're talking
>> about. Of course, I'm familiar with C but not Java. Could you explain?
> Sure. First OO isn't optional. Second, I love table-oriented-programming
> (I think my 4tH is best equipped for that) and Java makes that impossible
> by the sheer horror of using function pointers.

The classes are actually tables of function pointers, so as long as you use
names to index into these tables, use a class in Java to implement that.

Of course, when you use numbers to index, it's awful, because you can't do
that.
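
A minimal Java sketch of what indexing by name looks like, using only the core language. `NamedTable`, `Device`, `Console` and `Printer` are hypothetical names made up for illustration:

```java
// Hypothetical example: each class acts as a table of functions,
// and a method name selects the slot.
final class NamedTable {
    // The "column headings" of the table: two named slots.
    interface Device {
        String open();
        String close();
    }

    // One table: both slots filled in for a (made-up) console device.
    static final class Console implements Device {
        public String open()  { return "console opened"; }
        public String close() { return "console closed"; }
    }

    // A second table with the same slot names but different entries.
    static final class Printer implements Device {
        public String open()  { return "printer opened"; }
        public String close() { return "printer closed"; }
    }

    // The call site indexes by name (open, close); which table is used
    // is decided at run time by the object's class.
    static String cycle(Device d) {
        return d.open() + ", " + d.close();
    }

    public static void main(String[] args) {
        System.out.println(cycle(new Console()));
        System.out.println(cycle(new Printer()));
    }
}
```

Both calls go through the same names, `open` and `close`. There is no way to ask for "slot number 1" instead, which is the awkward case mentioned above.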

Bernd Paysan

Oct 21, 2014, 5:44:37 AM
The histogram is much more useful here: it shows a long-tail distribution.
Apparently, the pressure is towards very small words, and ~80% of the words
are "well factored". 20% aren't, and it looks like they take up half of the
source code space.

Just Ian

Oct 21, 2014, 10:06:55 AM
Marcel Hendrix said:

> 'Footsteps in an empty valley'

> -DIGIT \ 27 words
> NUMBER \ 53 words
> WORD \ 48 words

(etc)

> I can't find any trace of words that are obviously factored out of larger
> definitions. All words perform a definite function.
> Not all words have a stack picture. No words have empty stack pictures. Only
> the data stack is commented, input stream and rack are not.

By coincidence, I was looking at 'Footsteps..' recently. As well as the comment from someone else that if you include the shorter words, the average is seven-ish, I don't think anyone would claim cmFORTH to be anything other than a quick hack to see if the hardware works.

Who can forget the splitting of control structures between words, for example? I think it was MAX and MIN that jumped into the middle of the other, but it could have been any similar comparison words.

Amongst the problems with the Novix was that it was released with cmFORTH as the leading software.

> and also not artefacts of metacompiling like : @ @ ; .

I would say that's not an artefact, that's because on the hardware, the definition of '@' is '@'! Without that and the other similar ones, you couldn't use any of the primitive words interactively.

Ian

Marcel Hendrix

Oct 21, 2014, 2:11:00 PM
Just Ian <supe...@gmail.com> wrote Re: The magical number seven (was: Tiny regular expressions code)

> Marcel Hendrix said:

>> 'Footsteps in an empty valley'

>> -DIGIT \ 27 words
>> NUMBER \ 53 words
>> WORD \ 48 words

[..]

> By coincidence, I was looking at 'Footsteps..' recently. As well as the comment
> from someone else that if you include the shorter words, the average is seven-ish,
> I don't think anyone would claim cmFORTH to be anything other
> than a quick hack to see if the hardware works.

It should be obvious from my post that I am convinced that [TOPIC] is not
illustrated by 'Footsteps ... .' I listed the reasons in the OP. For all
other purposes Lars has provided the histogram and exact (hopefully) count (applause!).

Seven words is not a lot, and IMO it is obvious Charles Moore is not writing such
short definitions out of principle. I'm pretty sure that e.g. -DIGIT, NUMBER
and WORD have comparable word counts on small systems by other authors.

> Who can forget the splitting of control structures between words, for example?
> I think it was MAX and MIN that jumped into the middle of the other, but it
> could have been any similar comparison words.
[..]

Why the party trick when he just needed 'a quick hack to see if the hardware
works?'

>> and also not artefacts of metacompiling like : @ @ ; .

> I would say that's not an artefact, that's because on the hardware, the
> definition of '@' is '@'! Without that and the other similar ones, you couldn't
> use any of the primitive words interactively.

Maybe 'artefact' is a wrong choice of word.
I consider ": @ @ ;" atypical when the intention is to get a feeling
for how normal well-factored Forth code should look. Anybody would (need to)
write it like that.

-marcel

Elizabeth D. Rather

Oct 21, 2014, 2:47:34 PM
On 10/20/14 11:06 PM, Albert van der Horst wrote:
> In article <NaSdnSZVRbNQYNjJ...@supernews.com>,
> Elizabeth D. Rather <era...@forth.com> wrote:
> <SNIP>
>>
>> The fundamental flaw with most of the "bad" programs I've encountered is
>> bad design. This may be either failure to think through and document all
>> the relevant design issues, or think through appropriate implementation
>> methodologies, or just plain "slash and burn" programming techniques.
>
> You are the second person I know of who uses the term "slash and
> burn" w.r.t. programming.
> The other person was me, as in "slash and burn maintenance".
> This means that in maintaining a program with a feature that doesn't
> work, you remove the feature, and all its auxiliary words. If any of
> those auxiliary words support a feature that isn't used, remove them
> anyway, and that feature as well. Then do some refactoring and/or
> redesigning and maybe reinstall the original feature.
> In adding a feature that doesn't fit the design, redesign and remove
> before adding. More often than not, after a redesign I discovered that
> adding the feature was a mere couple of lines.
>
> Where did you get the term? For me slash and burn suggests that you're
> not afraid of removing largish parts of a program.
> Too many programmers consider the old program code as sacred,
> and dare not touch it, unless it is soooo obviously wrong.

To me, "slash and burn" just means throwing code at a problem without
careful thought and design. In maintenance, the scenario you describe is
kind of the opposite, if you mean that the new version is a
carefully-conceived re-design replacing an ill-conceived hack.

Rod Pemberton

Oct 22, 2014, 2:19:24 AM
On Tue, 21 Oct 2014 14:47:33 -0400, Elizabeth D. Rather
<era...@forth.com> wrote:

> To me, "slash and burn" just means throwing code at a problem without
> careful thought and design.

Some problems may be too difficult to comprehend or easily quantify.
In that case, "shotgunning" can be very effective in narrowing
the scope of the problem to something manageable, solvable,
or identifiable. It's especially useful in debugging.


Rod Pemberton

Andrew Haley

Oct 26, 2014, 5:24:55 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>>
>>> (especially if you use 'the magical "seven, plus or minus two"
>>> words' *maximum* as the definition of "well-factored").
>>
>>Which no-one does, AFAIK.
>
> You did:
>
> |>>But, IMO, any word which is much longer than the magical "seven,
> |>>plus or minus two" words may well be in need of some attention.

And where, pray, do I use that as a definition of "well-factored"?

> If he considered 7 a maximum, he apparently also considered it a
> minimum (or he would not arrive at an average of 7). A histogram
> would be more interesting, and I am sure that he did not limit himself
> to 7 or 9 references maximum.

I accept that point.

Andrew.

Andrew Haley

Oct 26, 2014, 5:27:40 AM
Just Ian <supe...@gmail.com> wrote:
>
> Amongst the problems with the Novix was that it was released with
> cmFORTH as the leading software.

That's true. Novix polyFORTH was fantastic to use and had very good
metacompiling capabilities, but was rather expensive. cmFORTH was a
PITA. Shame.

Andrew.

Andrew Haley

Oct 26, 2014, 5:33:59 AM
Hans Bezemer <the.bee...@gmail.com> wrote:
>
> But my two cents to the discussion, I don't think that ANY programming
> paradigm is able to enforce "writing good code".
> - I think that forcing Python users to indent properly may have given more
> rise to hard to trace bugs than anything else;
> - Some loop constructs are better handled by GOTO than forcing people into
> structured programming;
> - And the best of all, OO has given rise to "lasagna code" instead of
> getting in the land of reliable code (with other claims of OO broken down
> by reality as well);

Indeed; like Forth, good OO requires good taste.

> - And as far as Java is concerned, well, we all know too well what's wrong with
> Java, don't we.

Do we? Why Java, in particular? It's just a (very successful) OO
language, and it is extremely conservatively designed, or at least it
was. Granted, it has over the years accreted a lot of features beyond
its original design -- but so has the language we know and love. :-)

> On top of that, if the API doesn't really force you to expose a lot of
> symbols, factoring will greatly depend on patterns. Lots of
> algorithms, however, involve lots of loops and ifs and have very few
> patterns that have to be (or reasonably can be) factored out. In
> other words, yes, you can take out pieces, but what's the use when
> they're referred to only ONCE?

Why not? Forth is designed to reduce the cost of factoring almost to
zero.

Andrew.

Andrew Haley

Oct 26, 2014, 5:41:56 AM
Hans Bezemer <the.bee...@gmail.com> wrote:

> Sure. First OO isn't optional. Second, I love table-oriented-
> programming (I think my 4tH is best equipped for that) and Java
> makes that impossible by the sheer horror of using function
> pointers.

What's the problem with function pointers? You can certainly create
arrays of references to functional objects, and they can do the same
thing.

> I even avoid USING Java programs as much as I can, because even from a user
> standpoint they're a pain in the neck. Different Java versions, frequent
> crashing, pages of errors that make it impossible to find out what went
> wrong and if you can possibly fix it. BTW, there are many pages of Java
> complaints all over the web - more than Forth related ;-)

That's rather inevitable. Java is the second most-used (or perhaps
most-used, depending on whose survey you believe) programming
language.

Andrew.

Hans Bezemer

Oct 26, 2014, 7:05:59 AM
Andrew Haley wrote:

> What's the problem with function pointers? You can certainly create
> arrays of references to functional objects, and they can do the same
> thing.
Working around anything to get what you want is always a bad design
paradigm.

> That's rather inevitable. Java is the second most-used (or perhaps
> most-used, depending on whose survey you believe) programming
> language.
I boiled it down to three so I'm doing fine in that department.

Hans Bezemer

Hans Bezemer

Oct 26, 2014, 7:10:25 AM
Andrew Haley wrote:

> Indeed; like Forth, good OO requires good taste.
Oh no. The major error here is that OO is a design technique - not a
programming paradigm. You use it only when it's appropriate. Applying it to
every problem is asking for trouble. And trouble is indeed what you get. But as always,
programmers are superstitious people, believing in holy men and holy books.
And where belief comes in, facts disappear.

Hans Bezemer

Andrew Haley

Oct 26, 2014, 2:58:50 PM
Hans Bezemer <the.bee...@gmail.com> wrote:
> Andrew Haley wrote:
>
>> What's the problem with function pointers? You can certainly create
>> arrays of references to functional objects, and they can do the same
>> thing.
>
> Working around anything to get what you want is always a bad design
> paradigm.

Huh? What is being "worked around"? Functional objects are Java's
equivalent. If you really want an array of functions, that's how you
do it.
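
As a rough sketch of that idea, assuming Java 8 or later (lambdas and `java.util.function`). The class `OpTable` and its index constants are hypothetical, invented for this example:

```java
import java.util.function.IntBinaryOperator;

// Hypothetical example: a dispatch table indexed by number,
// built from functional objects (Java 8 lambdas).
final class OpTable {
    // Index constants are illustrative names, not part of any library.
    static final int ADD = 0, SUB = 1, MUL = 2;

    // The "table of functions": each slot holds an object with one method.
    static final IntBinaryOperator[] OPS = {
        (a, b) -> a + b,   // slot ADD
        (a, b) -> a - b,   // slot SUB
        (a, b) -> a * b    // slot MUL
    };

    // Look up a slot by number and call it, much like executing an
    // entry fetched from a table of execution tokens in Forth or
    // calling through a function pointer in C.
    static int apply(int op, int a, int b) {
        return OPS[op].applyAsInt(a, b);
    }

    public static void main(String[] args) {
        System.out.println(apply(ADD, 6, 7));
        System.out.println(apply(SUB, 6, 7));
        System.out.println(apply(MUL, 6, 7));
    }
}
```

Here a slot is selected by number rather than by name, which is probably the closest Java gets to a plain table of function pointers.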

Andrew.

the_gavino_himself

Nov 8, 2014, 10:24:17 PM
On Monday, October 13, 2014 9:13:06 AM UTC-7, The Beez wrote:
> Short intermission from all the clashes here, sorry for the inconvenience,
> but I used Forth for some CODING. You may fry me for the code quality
> later, but this is a tiny regular expression parser, based on an article by
> Kernighan and Pike
> (http://www.drdobbs.com/architecture-and-design/regular-expressions/184410904).
>
> It features ^, $, * and . but I added ? as well (zero or one). This is some
> stuff that was left lying around as an intermediate stage for a similar 4tH
> library (which is cleaner and more capable), but I thought, may be it is of
> use/interest to someone, why throw it away.
>
> Knock yourselves out ;-)
>
> Hans Bezemer
>
> ---8<---
> \ Regular Expressions by Brian W. Kernighan and Rob Pike
> \ Believed to be in the public domain
>
> defer (matchhere)
>
> : (match*) ( a n ra rn c --f)
> begin
> >r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
> 2over if c@ dup [char] . = swap r@ = or else dup xor then r> swap
> while \ character equals text?
> >r 2>r 1 /string 2r> r> \ if so, match again
> repeat
> drop 2drop 2drop false \ clean up, return false
> ;
>
> : (match?) ( a n ra rn c --f)
> >r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
> 2over if c@ dup [char] . = swap r> = or else r> dup xor then
> if 2>r 1 /string 2r> (matchhere) else 2drop 2drop false then
> ; \ character equals text?
>
> :noname ( a n ra rn -- f)
> dup if \ regular expression a null string?
> over char+ c@ dup [char] * = \ if not, does it equal a '*'
> if \ if so, call (match*)
> drop over c@ >r 2 /string r> (match*) exit
> else \ otherwise, does it equal a '?'
> [char] ? =
> if \ if so, call (match?)
> over c@ >r 2 /string r> (match?) exit
> else \ otherwise does it equal a '$'
> over c@ [char] $ = over 1 = and
> if \ and is it the last character?
> 2drop nip 0= exit \ if so, check length of text
> else \ finally, check if it char matches
> 2over 0<> >r c@ >r over c@ dup
> [char] . = swap r> = or r> and
> if 1 /string 2>r 1 /string 2r> recurse exit then false
> then \ if so recurse, otherwise quit
> then
> then
> else
> true \ zero length regular expression
> then >r 2drop 2drop r> \ clean up and exit
> ; is (matchhere) \ assign to DEFER (we got 'em)
>
> : match ( a n ra rn --f)
> dup if over c@ [char] ^ = if 1 /string (matchhere) exit then then
> begin \ if caret, chop it
> 2over 2over (matchhere) if 2drop 2drop true exit then
> >r over r> swap \ match characters
> while \ until no more text
> 2>r 1 /string 2r> \ chop text
> repeat 2drop 2drop false \ clean up
> ;
>
> s" 0,9" s" ^0,?9$" match . .s cr
> s" 0:9" s" ^0,?9$" match . .s cr
> s" 09" s" ^0,?9$" match . .s cr
> s" 009" s" ^0,?9$" match . .s cr
> s" 0,,9" s" ^0,?9$" match . .s cr cr
> ---8<---

bravo

the_gavino_himself

Nov 8, 2014, 11:42:47 PM
beez what about web programming in forth?