If people are interested, here is the 4tH code (containing plenty of
4tH-isms) which is enhanced by:
- escaping characters
- predefined sets (prefixed by %, see code)
- + and ?
So it is capable of doing this:
---8<---
s" 0,9" s" ^0,?9$" match-reg . cr
s" 0:9" s" ^0,?9$" match-reg . cr
s" 09" s" ^0,?9$" match-reg . cr
s" 009" s" ^0,?9$" match-reg . cr
s" 0,,9" s" ^0,?9$" match-reg . cr cr
s" abcdefg" s" abcdefg$" match-reg . cr
s" ababababab" s" ab*a$" match-reg . cr
s" aaaaaaaaba" s" a*ba$" match-reg . cr
s" aaaaaabac" s" ab*a$" match-reg . cr
s" abbd" s" ab*d$" match-reg . cr
s" abbde" s" ab*d$" match-reg . cr cr
s" -1234.56" s" -?%9+\.?%9*$" match-reg . cr
s" -1234" s" -?%9+\.?%9*$" match-reg . cr
s" 1234.56" s" -?%9+\.?%9*$" match-reg . cr
s" 1234" s" -?%9+\.?%9*$" match-reg . cr
s" 1234.ab" s" -?%9+\.?%9*$" match-reg . cr
s" 1234,23" s" ^-?%9+\.?%9*$" match-reg . cr
s" PRExyz23" s" ^PRE.*23$" match-reg . cr
s" -.23" s" ^-?%9+\.?%9*$" match-reg . cr depth .
---8<---
Maybe someone is interested to make an ANS version out of this. I'm good ;-)
Hans Bezemer
P.S. :TOKEN xxx yyy ; equals :NONAME yyy ; constant xxx
---8<---
: break? ?do over i c@ = if 0= leave then loop nip ;
---8<---
: IS-ASCII ( char -- flag ) 128 < ;
: IS-PRINT ( char -- flag ) DUP IS-ASCII SWAP BL 1- - 0> AND ;
: IS-WHITE ( char -- flag ) [CHAR] ! - 0< ;
: IS-DIGIT ( char -- flag ) [CHAR] 0 - MAX-N AND 10 < ;
: IS-LOWER ( char -- flag ) [CHAR] a - MAX-N AND 26 < ;
: IS-UPPER ( char -- flag ) [CHAR] A - MAX-N AND 26 < ;
: IS-ALPHA ( char -- flag ) BL OR IS-LOWER ;
: IS-ALNUM ( char -- flag ) DUP IS-ALPHA SWAP IS-DIGIT OR ;
: IS-XML ( char -- flag ) 0 S| <>&"'| BOUNDS DO OVER I C@ = OR LOOP
NIP ;
: IS-HTML ( char -- flag ) DUP IS-XML SWAP IS-PRINT 0= OR ;
---8<---
-1 constant NULL ( NULL pointer)
defer key=
:token string-key >r 2dup r@ @c count compare 0= r> swap ;
:token num-key over over @c = ;
: row ( x a1 n1 xt -- x a2 f)
is key= >r ( x a)
begin ( x a)
dup @c NULL <> dup ( x a f f)
while ( x a f)
drop key= dup 0= ( x a f -f)
while ( x a f)
drop r@ cells + ( x a)
repeat ( x a)
[UNDEFINED] 4TH# [IF] then [THEN]
r> drop ( x a f)
;
---8<---
\ Regular Expressions by Brian W. Kernighan and Rob Pike
\ Believed to be in the public domain
\ 4th version by J.L. Bezemer, 2014
[UNDEFINED] match-req [IF]
[UNDEFINED] 2over [IF] include lib/anscore.4th [THEN]
[UNDEFINED] row [IF] include lib/row.4th [THEN]
[UNDEFINED] is-ascii [IF] include lib/istype.4th [THEN]
[UNDEFINED] break? [IF] include lib/breakq.4th [THEN]
defer (matchhere)
128 +constant +cmd
char ^ +cmd constant (^) \ all special commands
char ? +cmd constant (?)
char * +cmd constant (*)
char + +cmd constant (+)
char $ +cmd constant ($)
char . +cmd constant (.)
char 9 +cmd constant (9)
char a +cmd constant (@)
char A +cmd constant (a)
char # +cmd constant (#)
char & +cmd constant (&)
char _ +cmd constant (_)
create (eq?) \ is it equal?
(.) , ' is-ascii , \ equivalent to .
(9) , ' is-digit , \ equivalent to [0-9]
(@) , ' is-lower , \ equivalent to [a-z]
(a) , ' is-upper , \ equivalent to [A-Z]
(#) , ' is-alpha , \ equivalent to [a-zA-Z]
(&) , ' is-alnum , \ equivalent to [a-zA-Z0-9]
(_) , ' is-white , \ whitespace
NULL , \ if a set execute, otherwise compare
does> 2 num-key row if nip cell+ @c execute else drop = then ;
\ some helper words
: (crunch) 1- over over over char+ -rot cmove ;
: (cmd!) over dup c@ +cmd swap c! ; ( a n -- a n)
: (contains?) 2>r c@ false 2r> bounds break? ;
\ prepare regular expression
: (prepare) ( a1 n1 -- a1 n2)
over swap \ save string address
begin
dup \ any characters left?
while \ if so, does it contain
over s" ^$?*+." (contains?) \ a metacharacter?
if (cmd!) \ if so, set command bit
else dup 1 > \ string length at least two?
if over c@ [char] \ = \ if it contains an escape
if (crunch) \ ignore the next character
else \ otherwise, if it is marked as
over c@ [char] % = \ a set, set the command bit
if over char+ s" 9aA#&_" (contains?) if (crunch) (cmd!) then then
then
then
then chop \ next character
repeat drop over - \ calculate new length
;
\ match zero or more times
: (match*) ( a n ra rn c --f)
begin
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r@ (eq?) else dup xor then r> swap
while \ character equals text?
>r 2>r chop 2r> r> \ if so, match again
repeat drop 2drop 2drop false \ clean up, return false
;
\ match zero or one time
: (match?) ( a n ra rn c --f)
>r 2over 2over (matchhere) if r> drop 2drop 2drop true exit then
2over if c@ r> (eq?) else r> drop dup xor then
if 2>r chop 2r> (matchhere) else 2drop 2drop false then
;
\ match one or more times
: (match+) ( a n ra rn c --f)
>r 2over if c@ r@ (eq?) if 2>r chop 2r> r> (match*) exit then else drop
then
2drop 2drop r> dup xor \ check one character then
; \ perform (match*)
create (special?) \ all special characters
(*) , ' (match*) ,
(?) , ' (match?) ,
(+) , ' (match+) ,
NULL ,
does> 2 num-key row \ if special character
if \ execute it
cell+ @c >r drop over c@ >r chop chop r> r> execute true
else \ otherwise drop values
drop drop false \ and return false
then
;
:noname ( a n ra rn -- f)
dup if \ regular expression a null string?
over char+ c@ (special?) if exit then
over c@ ($) = over 1 = and \ otherwise does it equal a '$'
if \ and is it the last character?
2drop nip 0= exit \ is so, check length of text
else \ finally, check if any text left
2over \ and if character matches
if c@ >r over c@ r> swap (eq?) if chop 2>r chop 2r> recurse exit then
else drop then false \ if so recurse, otherwise quit
then \ and return false
else
true \ zero length regular expression
then >r 2drop 2drop r> \ clean up and exit
; is (matchhere) \ assign to DEFER (we got 'em)
: match-reg ( a n ra rn --f)
(prepare) dup if over c@ (^) = if chop (matchhere) exit then then
begin \ if caret, chop it
2over 2over (matchhere) if 2drop 2drop true exit then
>r over r> swap \ match characters
while \ until no more text
2>r chop 2r> \ chop text
repeat 2drop 2drop false \ clean up
;
[DEFINED] 4TH# [IF]
hide (matchhere)
hide +cmd
hide (^)
hide (?)
hide (*)
hide (+)
hide ($)
hide (.)
hide (9)
hide (@)
hide (a)
hide (#)
hide (&)
hide (_)
hide (eq?)
hide (match*)
hide (match?)
hide (match+)
hide (prepare)
hide (special?)
hide (crunch)
hide (cmd!)
hide (contains?)
[THEN]
[THEN]
---8<---