Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

New parser rewrite appears to be working nicely

21 views
Skip to first unread message

luser droog

unread,
May 30, 2022, 1:43:22 PM5/30/22
to
I just spent the evening and morning writing a parser for BNF syntax.
Pretty meager stuff, I know. But the debugging effort went rather
smoothly thanks to the introduction of error message support.
Even the bare-bones error messages told me which production was
failing.

(pc12.ps)run
/to-string{ dup type /stringtype ne {
dup length string exch 0 exch {3 copy putinterval length add} forall pop } if }
def

/space ( \n) anyof def
/space? //space many def
/alpha (a)(z) range (A)(Z) range alt def
/identifier (-) char //alpha alt some
{to-string} using
def

/terminal (") char (") noneof many
xthen (") char thenx def
/non-terminal (<) char //identifier
xthen (>) char thenx def
/Symbol //non-terminal def
/Expression //terminal //non-terminal alt //space? thenx some def
/BNF //space? //Symbol xthen //space? thenx
(::=) str then //space? thenx
//Expression then //space? thenx
def

/BNF-parse {
0 0 3 2 roll string-input BNF
dup first /OK eq { second first } if
} def

(
<postal-address> ::= <name-part> <street-address> <zip-part>
) BNF-parse
pq


[Terminal transcript. 'Norah' is my laptop's name.]

]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Error: /undefined in alpha
Operand stack:
identifier
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push
Dictionary stack:
--dict:737/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- --dict:37/58(L)-- --dict:2/2(L)-- --dict:62/120(L)--
Current allocation mode is local
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Error: /undefined in BNF
Operand stack:
BNF --nostringval-- --nostringval--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval--
Dictionary stack:
--dict:737/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- --dict:37/58(L)-- --dict:2/2(L)-- --dict:69/120(L)--
Current allocation mode is local
Current file position is 611
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Fail
[[{(<) eq} (not satisfied)] [[(\n) [0 0]] {0 0 ( <postal-address> ::= <name-part> <street-address> <zip-part>\n) string-input}]]
GS> ]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Fail
[[( after ) (\n)] [[{(<) eq} (not satisfied)] [[( ) [0 0]] {0 1 ( <postal-address> ::= <name-part> <street-address> <zip-part>\n) string-input}]]]
GS> ]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Fail
[[( after ) [(<) (p)]] [[{(>) eq} (not satisfied)] [[(o) [0 4]] {0 5 (stal-address> ::= <name-part> <street-address> <zip-part>\n) string-input}]]]
GS> ]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Fail
[[( after ) [(<) (p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (>) (:) (:) (=)]] [[{(-) eq} (not satisfied)] [[(<) [0 23]] {0 24 (name-part> <street-address> <zip-part>\n) string-input}]]]
GS> ]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Fail
[[( after ) [(<) (p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (>) (:) (:) (=)]] [[{dup (A) ge exch (Z) le and} (not satisfied)] [[(<) [0 23]] {0 24 (name-part> <street-address> <zip-part>\n) string-input}]]]
GS> ]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
p
(o)
GS> ]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
stack:
[(p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (:) (:) (=) (n) (a) (m) (e) (-) (p) (a) (r) (t)]
]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
stack:
[(p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (:) (:) (=) (n) (a) (m) (e) (-) (p) (a) (r) (t) (s) (t) (r) (e) (e) (t) (-) (a) (d) (d) (r) (e) (s) (s) (z) (i) (p) (-) (p) (a) (r) (t)]
]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
Error: /undefined in to-string
Operand stack:
--nostringval-- (<) --nostringval-- --nostringval--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:737/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- --dict:37/58(L)-- --dict:2/2(L)-- --dict:70/120(L)--
Current allocation mode is local
Current file position is 779
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$ gsq pc12bnf.ps
stack:
[(postal-address) (:) (:) (=) (name-part) (street-address) (zip-part)]
]0;~/pcomb/ps
Norah@laptop ~/pcomb/ps
$

luser droog

unread,
Jun 3, 2022, 10:11:56 PM6/3/22
to
On Monday, May 30, 2022 at 12:43:22 PM UTC-5, luser droog wrote:
> I just spent the evening and morning writing a parser for BNF syntax.
> Pretty meager stuff, I know. But the debugging effort went rather
> smoothly thanks to the introduction of error message support.
> Even the bare-bones error messages told me which production was
> failing.
>

Ok. It's written now. Now I'm not sure if it's "good" or not. It all works.
But down towards the bottom of the file, there's just so very, very much
hidden business going on ... I'm just not so sure anymore. Maybe there's
such a thing as "too much abstraction".

Some commentary.

First it loads the parser combinator library. Then it uses the combinators
to build a parser for EBNF notation. The EBNF-parse function receives
a string-input or file-input structure and produces a dictionary of parsers.

Each of these parsers has been built as an indirect function, ie. wrapped
in { ... exec }. This was done so that they could be composed for the
right hand side expressions and then filled in later, to link each one to
its symbol. Either this or some kind of dependency analysis or topographical
sort was necessary to deal with all the potential mutual recursion and just
general interdependency in an ENBF grammar. As a bonus, they can be
extracted and wrapped individually with handlers, which is done at the bottom.

So, the top half of the file describes a grammar which yields a parser for
EBNF notation. A little further down the EBNF parser is applied to a
grammar for a USA postal address, which yields parsers for each of the
symbols in the grammar, notably postal-address itself, the start symbol.
And then at the very bottom, the postal-address parser is applied to
a string.

It's working, but somehow I lack confidence in it. It's just so weird.


(pc12.ps)run
/to-string{ dup type /stringtype ne {
dup 0 exch {length add} forall string exch
0 exch {3 copy putinterval length add} forall pop } if }
def

/choice-symbol (|) char def
/defining-symbol (=) str def
/terminating-symbol (;) char def

/space ( \n) anyof def
/space? //space many def
/digit (0)(9) range def
/alpha (a)(z) range (A)(Z) range alt def
/name (-_) anyof //alpha alt some def
/identifier //name
{to-string cvn} using
def

/terminal (") char (") noneof many xthen (") char thenx
(') char (') noneof many xthen (') char thenx alt
{to-string {str} curry} using
def
/non-terminal //identifier
{{load executeonly} curry} using
def
/Expression {-777 exec} def
/Symbol //identifier def
/Factor //terminal
//non-terminal alt
([) char //space? thenx //Expression xthen //space? thenx (]) char thenx
{{maybe} compose} using alt
({) char //space? thenx //Expression xthen //space? thenx (}) char thenx
{{many} compose} using alt
def
/Term //Factor //space? thenx some
{dup xcheck not { {compose {then} compose} reduce } if} using
def
//Expression 0 //Term
//choice-symbol //space? thenx
//Term xthen many then
{dup xcheck not { {compose {alt} compose} reduce } if} using
put
/BNF-definition //space? //Symbol xthen //space? thenx
//defining-symbol thenx //space? thenx
//Expression then //space? thenx
//terminating-symbol thenx //space? thenx
def
/EBNF //BNF-definition some def

/EBNF-parse {
0 0 3 2 roll string-input EBNF
dup first /OK eq { second first } if %%fixme what if it's not OK?

1 dict begin
aload length 2 idiv {def} repeat
currentdict end
dup ===

1 dict begin
dup
{pop {-777 exec} ll def} forall
{exch load exch 0 exch exec put} forall
currentdict end
} def

/EOL (\n) char def
/upper (A)(Z) range def
/_ ( ) char maybe def

(
postal-address = name-part street-address zip-part ;
name-part = personal-part _ last-name _ opt-suffix-part EOL
| personal-part _ name-part ;
personal-part = initial "." | first-name ;
street-address = house-num _ street-name opt-apt-num EOL ;
zip-part = town-name "," _ state-code _ zip-code EOL ;
opt-suffix-part = "Sr." | "Jr." | roman-numeral | "" ;
opt-apt-num = [ apt-num ] ;
apt-num = digit { digit } ;
town-name = name ;
state-code = upper upper ;
zip-code = digit digit digit digit digit ;
initial = "Mr" | "Mrs" | "Ms" | "M" ;
roman-numeral = "I" [ "V" ] { "I" } ;
first-name = name ;
last-name = name ;
house-num = digit { digit } ;
street-name = name ;
) EBNF-parse

ps
dup length =
dup {pop =} forall
dup /postal-address get ==

begin
<<
/street-name {to-string one one}
/first-name {to-string}
/last-name {to-string}
>>
{{using}curry exch load exch 0 exch update}forall

0 0
(Mr. luser droog I\n2357 Streetname\nAnytown, ST 00000\n)
string-input postal-address
ps

quit


$ gsq pc12ebnf.ps
<< /apt-num {/digit load executeonly /digit load executeonly many then} /house-num {/digit load executeonly /digit load executeonly many then} /street-address {/house-num load executeonly /_ load executeonly then /street-name load executeonly then /opt-apt-num load executeonly then /EOL load executeonly then} /state-code {/upper load executeonly /upper load executeonly then} /personal-part {/initial load executeonly (.) str then /first-name load executeonly alt} /postal-address {/name-part load executeonly /street-address load executeonly then /zip-part load executeonly then} /street-name {/name load executeonly} /opt-suffix-part {(Sr.) str (Jr.) str alt /roman-numeral load executeonly alt () str alt} /zip-code {/digit load executeonly /digit load executeonly then /digit load executeonly then /digit load executeonly then /digit load executeonly then} /opt-apt-num {/apt-num load executeonly maybe} /initial {(Mr) str (Mrs) str alt (Ms) str alt (M) str alt} /zip-part {/town-name load executeonly (,) str then /_ load executeonly then /state-code load executeonly then /_ load executeonly then /zip-code load executeonly then /EOL load executeonly then} /roman-numeral {(I) str (V) str maybe then (I) str many then} /first-name {/name load executeonly} /last-name {/name load executeonly} /name-part {/personal-part load executeonly /_ load executeonly then /last-name load executeonly then /_ load executeonly then /opt-suffix-part load executeonly then /EOL load executeonly then /personal-part load executeonly /_ load executeonly then /name-part load executeonly then alt} /town-name {/name load executeonly} >>
stack:
-dict-
17
apt-num
house-num
street-address
state-code
personal-part
postal-address
street-name
opt-suffix-part
zip-code
opt-apt-num
initial
zip-part
roman-numeral
first-name
last-name
name-part
town-name
{{{-array- exec +is-ok {second forcexs x-xs -array- exec +is-ok {second x-xs 3 1 roll {append} exec success} {x-xs 3 2 roll ( after ) exch cons exch cons cons} ifelse} if} exec +is-ok {second forcexs x-xs -array- exec +is-ok {second x-xs 3 1 roll {append} exec success} {x-xs 3 2 roll ( after ) exch cons exch cons cons} ifelse} if} exec}
stack:
[/OK [[(M) (r) (.) ( ) (luser) ( ) (droog) ( ) (I) (\n) (2) (3) (5) (7) ( ) [(Streetname)] (\n) (A) (n) (y) (t) (o) (w) (n) (,) ( ) (S) (T) ( ) (0) (0) (0) (0) (0) (\n)] {3 0 () string-input}]]
0 new messages