On Monday, May 30, 2022 at 12:43:22 PM UTC-5, luser droog wrote:
> I just spent the evening and morning writing a parser for BNF syntax.
> Pretty meager stuff, I know. But the debugging effort went rather
> smoothly thanks to the introduction of error message support.
> Even the bare-bones error messages told me which production was
> failing.
>
Ok. It's written now. Now I'm not sure if it's "good" or not. It all works.
But down towards the bottom of the file, there's just so very, very much
hidden business going on ... I'm just not so sure anymore. Maybe there's
such a thing as "too much abstraction".
Some commentary.
First it loads the parser combinator library. Then it uses the combinators
to build a parser for EBNF notation. The EBNF-parse function receives
a string-input or file-input structure and produces a dictionary of parsers.
Each of these parsers has been built as an indirect function, ie. wrapped
in { ... exec }. This was done so that they could be composed for the
right hand side expressions and then filled in later, to link each one to
its symbol. Either this or some kind of dependency analysis or topographical
sort was necessary to deal with all the potential mutual recursion and just
general interdependency in an ENBF grammar. As a bonus, they can be
extracted and wrapped individually with handlers, which is done at the bottom.
So, the top half of the file describes a grammar which yields a parser for
EBNF notation. A little further down the EBNF parser is applied to a
grammar for a USA postal address, which yields parsers for each of the
symbols in the grammar, notably postal-address itself, the start symbol.
And then at the very bottom, the postal-address parser is applied to
a string.
It's working, but somehow I lack confidence in it. It's just so weird.
(
pc12.ps)run
/to-string{ dup type /stringtype ne {
dup 0 exch {length add} forall string exch
0 exch {3 copy putinterval length add} forall pop } if }
def
/choice-symbol (|) char def
/defining-symbol (=) str def
/terminating-symbol (;) char def
/space ( \n) anyof def
/space? //space many def
/digit (0)(9) range def
/alpha (a)(z) range (A)(Z) range alt def
/name (-_) anyof //alpha alt some def
/identifier //name
{to-string cvn} using
def
/terminal (") char (") noneof many xthen (") char thenx
(') char (') noneof many xthen (') char thenx alt
{to-string {str} curry} using
def
/non-terminal //identifier
{{load executeonly} curry} using
def
/Expression {-777 exec} def
/Symbol //identifier def
/Factor //terminal
//non-terminal alt
([) char //space? thenx //Expression xthen //space? thenx (]) char thenx
{{maybe} compose} using alt
({) char //space? thenx //Expression xthen //space? thenx (}) char thenx
{{many} compose} using alt
def
/Term //Factor //space? thenx some
{dup xcheck not { {compose {then} compose} reduce } if} using
def
//Expression 0 //Term
//choice-symbol //space? thenx
//Term xthen many then
{dup xcheck not { {compose {alt} compose} reduce } if} using
put
/BNF-definition //space? //Symbol xthen //space? thenx
//defining-symbol thenx //space? thenx
//Expression then //space? thenx
//terminating-symbol thenx //space? thenx
def
/EBNF //BNF-definition some def
/EBNF-parse {
0 0 3 2 roll string-input EBNF
dup first /OK eq { second first } if %%fixme what if it's not OK?
1 dict begin
aload length 2 idiv {def} repeat
currentdict end
dup ===
1 dict begin
dup
{pop {-777 exec} ll def} forall
{exch load exch 0 exch exec put} forall
currentdict end
} def
/EOL (\n) char def
/upper (A)(Z) range def
/_ ( ) char maybe def
(
postal-address = name-part street-address zip-part ;
name-part = personal-part _ last-name _ opt-suffix-part EOL
| personal-part _ name-part ;
personal-part = initial "." | first-name ;
street-address = house-num _ street-name opt-apt-num EOL ;
zip-part = town-name "," _ state-code _ zip-code EOL ;
opt-suffix-part = "Sr." | "Jr." | roman-numeral | "" ;
opt-apt-num = [ apt-num ] ;
apt-num = digit { digit } ;
town-name = name ;
state-code = upper upper ;
zip-code = digit digit digit digit digit ;
initial = "Mr" | "Mrs" | "Ms" | "M" ;
roman-numeral = "I" [ "V" ] { "I" } ;
first-name = name ;
last-name = name ;
house-num = digit { digit } ;
street-name = name ;
) EBNF-parse
ps
dup length =
dup {pop =} forall
dup /postal-address get ==
begin
<<
/street-name {to-string one one}
/first-name {to-string}
/last-name {to-string}
>>
{{using}curry exch load exch 0 exch update}forall
0 0
(Mr. luser droog I\n2357 Streetname\nAnytown, ST 00000\n)
string-input postal-address
ps
quit
$ gsq
pc12ebnf.ps
<< /apt-num {/digit load executeonly /digit load executeonly many then} /house-num {/digit load executeonly /digit load executeonly many then} /street-address {/house-num load executeonly /_ load executeonly then /street-name load executeonly then /opt-apt-num load executeonly then /EOL load executeonly then} /state-code {/upper load executeonly /upper load executeonly then} /personal-part {/initial load executeonly (.) str then /first-name load executeonly alt} /postal-address {/name-part load executeonly /street-address load executeonly then /zip-part load executeonly then} /street-name {/name load executeonly} /opt-suffix-part {(Sr.) str (Jr.) str alt /roman-numeral load executeonly alt () str alt} /zip-code {/digit load executeonly /digit load executeonly then /digit load executeonly then /digit load executeonly then /digit load executeonly then} /opt-apt-num {/apt-num load executeonly maybe} /initial {(Mr) str (Mrs) str alt (Ms) str alt (M) str alt} /zip-part {/town-name load executeonly (,) str then /_ load executeonly then /state-code load executeonly then /_ load executeonly then /zip-code load executeonly then /EOL load executeonly then} /roman-numeral {(I) str (V) str maybe then (I) str many then} /first-name {/name load executeonly} /last-name {/name load executeonly} /name-part {/personal-part load executeonly /_ load executeonly then /last-name load executeonly then /_ load executeonly then /opt-suffix-part load executeonly then /EOL load executeonly then /personal-part load executeonly /_ load executeonly then /name-part load executeonly then alt} /town-name {/name load executeonly} >>
stack:
-dict-
17
apt-num
house-num
street-address
state-code
personal-part
postal-address
street-name
opt-suffix-part
zip-code
opt-apt-num
initial
zip-part
roman-numeral
first-name
last-name
name-part
town-name
{{{-array- exec +is-ok {second forcexs x-xs -array- exec +is-ok {second x-xs 3 1 roll {append} exec success} {x-xs 3 2 roll ( after ) exch cons exch cons cons} ifelse} if} exec +is-ok {second forcexs x-xs -array- exec +is-ok {second x-xs 3 1 roll {append} exec success} {x-xs 3 2 roll ( after ) exch cons exch cons cons} ifelse} if} exec}
stack:
[/OK [[(M) (r) (.) ( ) (luser) ( ) (droog) ( ) (I) (\n) (2) (3) (5) (7) ( ) [(Streetname)] (\n) (A) (n) (y) (t) (o) (w) (n) (,) ( ) (S) (T) ( ) (0) (0) (0) (0) (0) (\n)] {3 0 () string-input}]]