
A minimal LL(1) parser generator ?


Andy

Dec 22, 2019, 11:17:44 AM
ANTLR even supports LL(*), but it is too complicated. I am looking for a
maximally simple and elegant generator that generates function calls like
code written by hand.

Temporary goal: it must be very simple, because it will not parse a
program or document, only strings matching my definitions of (augmented)
regular expressions. I could write the parser by hand, but a generator
minimizes the chance of mistakes and systematizes the work.

Aharon Robbins

Dec 22, 2019, 7:04:42 PM
In article <19-1...@comp.compilers>, Andy <borucki...@gmail.com> wrote:
>ANTLR even supports LL(*), but it is too complicated. I am looking for a
>maximally simple and elegant generator that generates function calls like
>code written by hand. ...


https://github.com/arnoldrobbins/stacc may be of interest.

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com

Anton Ertl

Dec 26, 2019, 12:12:41 PM
Andy <borucki...@gmail.com> writes:
>ANTLR even supports LL(*), but it is too complicated. I am looking for a
>maximally simple and elegant generator that generates function calls like
>code written by hand.

Sounds like you want a generator that generates a recursive-descent
parser. A while ago I compared a number of parser generators
[ertl99ef], and among those that generate recursive-descent parsers
the shortest one by far was Gray
<http://www.complang.tuwien.ac.at/forth/Gray5.zip>. Whether you
consider it simple and elegant is for you to decide.

@InProceedings{ertl99ef,
author = {M. Anton Ertl},
title = {Is {Forth} Code Compact? {A} Case Study},
booktitle = {EuroForth'99 Conference Proceedings},
year = {1999},
address = {St. Petersburg, Russia},
url = {http://www.complang.tuwien.ac.at/papers/ertl99ef.ps.gz},
abstract = {Forth advocates often claim that Forth code is
smaller, faster, and requires less development time
than equivalent programs in other languages. This
paper investigates this claim by comparing a number
of parser generators written in various languages
with respect to source code size. The smallest
parser generator (14 lines) in this comparison is
written in Forth, and the other Forth program is
smaller than the others in its class by a factor of
8 or more; however, the Forth programs do not have
all the features of their counterparts. I took a
closer look at Gray (in Forth) and Coco/R (in
Modula-2) and found that several Forth features
missing from Modula-2 give Gray more than a factor
of three advantage over Coco/R (even if the other
size differences were solely due to differences in
functionality): run-time code generation; access to
the parser and a simple, flexible syntax; and
Forth's dictionary.}
}

Concerning the other question that came up, about dealing with left
recursion, and building the abstract syntax tree (AST) appropriately,
that part works nicely in Gray:

In BNF you would write

expr->expr '-' number
expr->number

(and the parse tree would have the same structure as the AST).

Gray uses a variant of regular right part grammars, and you would
write that as a pure parsing rule as follows:

(( number (( '-' number )) ** )) <- expr

For building the AST, let's assume that "number" leaves a tree node on
the stack, and we have a word

new-bin-node ( node1 node2 -- node3 )

(i.e., it pops node1 and node2 from the stack, and pushes node3).
Then we can extend the above as follows:

(( number (( '-' number {{ new-bin-node }} )) ** )) <- expr ( -- node )

(The "( -- node )" comment documents the stack effect of parsing "expr").

Passing the nodes on Forth's stack rather than using pseudo-variables
like yacc's $1 etc. works nicely when going beyond BNF. Now I wonder
how Algol-based EBNF parser generators do it, but am too lazy to
actually research it.
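
For comparison, a hand-written recursive-descent parser in C would typically
build the same left-leaning tree with a loop; in the sketch below, peek(),
expect(), parse_number() and new_bin_node() are assumed helpers standing in
for the scanner interface and the words above, not anything Gray provides:

/* Sketch only: the helpers below are assumed, not part of Gray.             */
typedef struct Node Node;                        /* AST node, details omitted */
extern int   peek(void);                         /* look at the next token    */
extern void  expect(int tok);                    /* consume an expected token */
extern Node *parse_number(void);                 /* parse one number          */
extern Node *new_bin_node(Node *l, Node *r);     /* ( node1 node2 -- node3 )  */

Node *parse_expr(void)                           /* expr -> number ('-' number)* */
{
    Node *left = parse_number();
    while (peek() == '-') {                      /* the ** loop in the rule      */
        expect('-');
        left = new_bin_node(left, parse_number()); /* fold to the left           */
    }
    return left;
}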

Getting a right-recursive tree is actually slightly harder: Now we
actually have to perform recursion (non-recursive variants exist, but
are not simpler). Let's have a similar example, but with the
right-associative operator '^'.

nonterminal expr ( -- node ) \ declare before use
(( number (( '^' expr {{ new-bin-node }} )) ?? )) expr rule
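
In hand-written C this is the familiar recursive pattern; again just a sketch,
reusing the same assumed helpers as in the sketch above:

Node *parse_expr(void)                           /* expr -> number ('^' expr)?   */
{
    Node *left = parse_number();
    if (peek() == '^') {                         /* the ?? option in the rule    */
        expect('^');
        left = new_bin_node(left, parse_expr()); /* recursion gives right assoc. */
    }
    return left;
}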

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

carlgl...@gmail.com

Dec 30, 2019, 11:29:40 AM
Lightweight and interesting, this Gray parser-generator.

Example:

Gray's meta notation uses a postfix operator "&&" for "separator list".

"&&" seems particularly elegant and interesting:

If Gray: "a b &&" ---> EBNF: "a { b a }"

then I assume:

"a b && c &&" ---> "a { b a } { c a }"

and so on ...

Further "a b && b && ..." ===> "a b &&"

??? Is my understanding correct?

Carl
----

gazt...@gmail.com

Dec 31, 2019, 11:28:31 AM
On Sunday, December 22, 2019 at 8:17:44 AM UTC-8, Andy wrote:
> ANTLR even supports LL(*), but it is too complicated. I am looking for a
> maximally simple and elegant generator that generates function calls like
> code written by hand. ...

I wrote a lesson on building one in three steps:

https://www.codeproject.com/Articles/5162249/How-to-Make-an-LL-1-Parser-Lesson-1

Anton Ertl

Dec 31, 2019, 11:42:51 AM
carlgl...@gmail.com writes:
>Gray's meta notation uses a postix operator "&&" for "separator list".
>
>"&&" seems particularly elegant and interesting:
>
>If Gray: "a b &&" ---> EBNF: "a { b a }"
>
>then I assume:
>
>"a b && c &&" ---> "a { b a } { c a }"

The left operand of the second && would be "a b &&", so the whole
thing would be equivalent to

a {b a} {c {a {b a}}}

a (( b || c )) &&

would parse the same strings, but with different parse trees; so if
you add actions, the results would probably be different.

>and so on ...
>
>Further "a b && b && ..." ===> "a b &&"

They describe the same strings, yes. However, the former is an
ambiguous grammar that allows "aba" to be parsed in two ways: the b can
be either the first b or the second one. In Gray, this leads to a conflict
(it can only deal with LL(1) grammars), and I guess one of these
parsing options would never be exercised, which means that it works as if
you had written "a b &&" in the first place.

carlgl...@gmail.com

Jan 1, 2020, 12:33:46 PM
On Tuesday, December 31, 2019 at 8:42:51 AM UTC-8, Anton Ertl wrote:
> carlgl...@gmail.com writes:
> >Gray's meta notation uses a postix operator "&&" for "separator list".
...
> >If Gray: "a b &&" ---> EBNF: "a { b a }"
...
> >then ...
...
> >"a b && c &&"
>
> The left operand of the second && would be "a b &&", so the whole
> thing would be equivalent to
>
> a {b a} {c {a {b a}}}
...

Should that be ?
a {b a} { c a {b a}}

Also, would
a b && c d &&
parse as:
a { b a } c { d c }
or as:
a { b a } c { d a { b a } c }

???

Carl

honey crisis

Jan 2, 2020, 1:21:45 PM
On Sunday, December 22, 2019 at 8:17:44 AM UTC-8, Andy wrote:
> I am looking for a maximally simple and elegant generator that generates function calls like code written by hand.

I designed Parsley to do exactly that.

I don't know how minimal it is at this point, but I've been using the LL1.cs file in it in several of my parser projects to generate parse tables. Maybe it will help.

https://www.codeproject.com/Articles/5255160/Parse-Anything-with-Parsley-A-Different-Approach

Anton Ertl

Jan 2, 2020, 1:21:53 PM
carlgl...@gmail.com writes:
>On Tuesday, December 31, 2019 at 8:42:51 AM UTC-8, Anton Ertl wrote:
>> carlgl...@gmail.com writes:
>> >Gray's meta notation uses a postix operator "&&" for "separator list".
>...
>> >If Gray: "a b &&" ---> EBNF: "a { b a }"
...
>> >"a b && c &&"
...
>Should that be ?
>a {b a} { c a {b a}}

Yes.

>Also, would
> a b && c d &&

I assume you mean a concatenation of "a b &&" and "c d &&". In Gray
syntax that would be

(( a b && c d && ))

>parse as:
> a { b a } c { d c }

Yes.

>or as:
> a { b a } c { d a { b a } c }

If you want that, you would write it as

(( a b && c )) d &&

&& is just a binary postfix operator.
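
Incidentally, in hand-written code the x { y x } pattern behind && is just a
loop; a small C sketch, where parse_x, parse_y and next_is_y are assumed
placeholders for the real nonterminals and the lookahead test:

extern int  next_is_y(void);          /* assumed: lookahead test for y's first token */
extern void parse_x(void);            /* assumed */
extern void parse_y(void);            /* assumed */

void parse_x_y_seplist(void)          /* "x y &&"  ==  x { y x }                     */
{
    parse_x();
    while (next_is_y()) {             /* one more "y x" pair per iteration           */
        parse_y();
        parse_x();
    }
}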

George Neuner

Jan 2, 2020, 1:22:21 PM
A little late to the discussion, but ...


On Sat, 21 Dec 2019 19:07:55 -0800 (PST), Andy
<borucki...@gmail.com> wrote:

>ANTLR even supports LL(*), but it is too complicated. I am looking for a
>maximally simple and elegant generator that generates function calls like
>code written by hand.

There are a number of decent tools that will generate recursive
descent code, but you won't find any that will produce code as tight
as custom hand-written code.


You neglected to mention what programming language you want to target.
However:

PCCTS is the ancestor of ANTLR. It is LL(k) for configurable k. It
produces C or C++ code for the parser. You need to supply a lexer
separately.

Coco/R is LL(k). There are versions available for several languages,
including C, C++, Java, Delphi, VB, etc. Like ANTLR, Coco/R also can
generate a lexer.



In my experience, PCCTS produces tighter code than ANTLR. As long as
you don't need the flexibility of LL(*) - or need a lexer generated,
or to target a language other than C or C++ - then PCCTS may be the
better choice.

If you need one stop shopping for RD coded parsers and lexers, then
you do need something like ANTLR or Coco/R which can do both.


>Temporary goal: it must be very simple, because it will not parse a
>program or document, only strings matching my definitions of (augmented)
>regular expressions. I could write the parser by hand, but a generator
>minimizes the chance of mistakes and systematizes the work.

If you are working with individual strings rather than a multi-string
"document", then perhaps you really only want a lexer generator.

Perhaps something like RE2C which takes regex patterns and generates
code (in C) to recognize them. The recognizers are finite automata, but the code
is generated inline in the function that needs it.
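
To give a flavour of it, a minimal re2c input looks roughly like the sketch
below (the file name, the number rule, and the return values are just
illustrative; re2c expands the special comment into plain C):

/* process with: re2c lexer.re -o lexer.c   (names are illustrative)         */
static int lex(const char *YYCURSOR)   /* expects a NUL-terminated string    */
{
    const char *YYMARKER;              /* used by re2c if it must backtrack  */
    /*!re2c
        re2c:yyfill:enable  = 0;
        re2c:define:YYCTYPE = char;

        number = [1-9][0-9]*;

        number { return 1; }
        *      { return 0; }
    */
}

int main(void)
{
    return lex("1234") ? 0 : 1;        /* exit status 0 if it matched        */
}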

Or boost::spirit, perhaps in combination with boost::regex (or
std::regex in C++11 or later). Spirit has a fairly steep learning
curve also, but it has the advantage that it requires only the C++
compiler and does not need a separate tool.


Hope this helps,
George

rockbr...@gmail.com

Jan 4, 2020, 9:19:10 PM
On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
> ANTLR even supports LL(*), but it is too complicated. I am looking for a
> maximally simple and elegant generator that generates function calls like
> code written by hand.

A large set of parsers are lined up in the parser generator comparison on
Wikipedia here
https://en.wikipedia.org/wiki/Comparison_of_parser_generators

The question of who in the list does bona fide code synthesis (as opposed to
cookie-cutter code generation) is not directly addressed, as far as I can see.
But the items can be reviewed individually.

Perhaps you will end up writing a parser generator that does this. The obvious
"direct map" method is:

Draft Version:
    state            = goto label
    state transition = goto
    general format:
        Label:
            action
            goto(s) - either single goto or branched gotos

Synthesized Version:
    control flow structures are synthesized from this.

The transformation of jump tables to control flow structures is essentially
the same process as the transformation of finite state automata to regular
expressions. An intelligent transformation process will leave some goto's in
place in order to prevent the introduction of redundancy.
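
A tiny invented example in C, for the regular language a b* c (next() is an
assumed input function, and none of this is from any particular generator),
shows the two versions:

extern int next(void);                  /* assumed: returns the next input character */

/* Draft version: state = goto label, state transition = goto                        */
int accept_draft(void)
{
s0: if (next() != 'a') goto fail;
    goto s1;
s1: switch (next()) {
    case 'b': goto s1;                  /* self-loop on b                             */
    case 'c': goto s2;
    default:  goto fail;
    }
s2: return 1;                           /* accept                                     */
fail: return 0;
}

/* Synthesized version: the same automaton expressed with control-flow structures    */
int accept_structured(void)
{
    if (next() != 'a') return 0;
    int c = next();
    while (c == 'b')                    /* the s1 self-loop becomes a while           */
        c = next();
    return c == 'c';
}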

If the generated parser is linked to the parser specification, the
transformation has to be able to correctly inject "#line" directives.
Alternatively, a better method is NOT to inject "#line" directives, but
comments that interweave displays of the original code that the transformation
took place from.

honey crisis

Jan 5, 2020, 3:29:01 PM
> Synthesized Version:
> control flow structures are synthesized from this.


Do you think this is something that's sought after more generally?

I've considered it for my parser generators, but the problem I run into is a
kind of "state explosion": since I can't freely branch, I can't
converge paths, so each disjunction can only create more disjunctions.
Basically, it ends up copying code to fit it inside whiles and ifs.

carlgl...@gmail.com

Jan 5, 2020, 3:29:50 PM
On Thursday, January 2, 2020 at 10:21:53 AM UTC-8, Anton Ertl wrote:
...
> (( a b && c d && ))

1. For concatenation of these, I would write, in Gray:
((a b &&))((c d & &&)) for EBNF: a { b a } c { d c }
because wouldn't
((a b && c d & &&))
be the same as?
a { b a } c { d a { b a } c }

2. Let us compare separator-list notation.

In Waite and Goos, "Compiler Construction", there is an infix operator, "||", for separator list, not to be confused with Gray's meta-symbol for "or".
And Eiffel has a special notation for separator list.

In comparison, Anton's Gray has a postfix operator, "&&", for separator-list.

/Comparison of separator-list notation in
Waite/Goos EBNF, Eiffel EBNF (BNF-E), and Gray EBNF/

Wirth EBNF Gray EBNF:
-------------------------------------------------------------------
Waite/Goos: x || y ==========> x { y x } x y &&

In Eiffel: { x y ...}* ===> [ x { y x } ] (( x y && )) ??
{ x y ...}+ ===> x { y x } x y &&
-------------------------------------------------------------------

What is better about Gray is that it only uses braces in action semantics,
in doubled form {{ ... }}, and no one is likely to be confused by the use of
braces.

Also, its postfix separator-list operator, "&&", is easier to remember than
Eiffel's more verbose special variation of standard EBNF braces.

And Gray's "&&" separator-list operator is just another postfix op, consistent with its use of "??", "**", and "++".

Gray better separates the two styles of notation: its "regular expression" style versus standard bracketed EBNF style.

The two styles ideally should not be mixed, or at least confusion should be avoided by not introducing yet another similar notation, in the form of a special variation of standard EBNF braces for separator lists.

One minor criticism of Gray:

I do wish Gray had a production rule terminator, say semicolon ";" or stop "."

The end of one rule would be clearly distinguishable from the beginning of the
next without special processing of end-of-line, which makes scanning and
parsing and scanning techniques.

Carl
----

carlgl...@gmail.com

Jan 6, 2020, 9:37:47 AM
correction:
((a b &&))((c d & &&)) for EBNF: a { b a } c { d c }
should be:
((a b &&))((c d &&)) for EBNF: a { b a } c { d c }

and
((a b && c d & &&))
should be:
((a b && c d &&))

Don't know how those extraneous single "&" got there!

Carl
---

carlgl...@gmail.com

Jan 6, 2020, 9:37:56 AM
That table didn't align properly, so let's present the same results this way:
-------------------------------------------------------------------------
Waite/Goos: x || y ===> Wirth EBNF: x { y x } ===> Gray: x y &&
-------------------------------------------------------------------------
Eiffel: { x y ...}* => Wirth EBNF: [ x { y x } ] => Gray: (( x y && )) ??
-------------------------------------------------------------------------
Eiffel: { x y ...}+ => Wirth EBNF: x { y x } => Gray: x y &&
-------------------------------------------------------------------------
Carl
----

Christopher F Clark

Jan 6, 2020, 9:39:06 AM
rockbr...@gmail.com wrote:
> The question of who in the list does bona fide code synthesis (as opposed to
> cookie-cutter code generation) is not directly addressed, as far as I can see.
> But the items can be reviewed individually.

> Perhaps you will end up writing a parser generator that does this. The obvious
> "direct map" method is
> Draft Version:
> state = goto label
> state transition = goto
> general format:
> Label:
> action
> goto(s) - either single goto or branched gotos

> Synthesized Version:
> control flow structures are synthesized from this.

While I must admit that I am not sure I understand your question (your
conception of how automata work is too different from mine for me to
readily understand your points), if you are asking what I think you
are asking, we solve your question in a slightly different way.

So, let me rephrase your question a little bit. When you come to the
end of a production, you have a reduce action and that causes the
popping of a non-terminal from the stack. Depending upon what you
pop, the state machine transitions to another state. This is normally
coded as the "goto table" in standard LR constructions. But the goto
table is in fact essentially a jump table, just as the shift table is
a jump table on incoming tokens.

If that is a correct interpretation of your question, then we do
something different for that, and what we do is similar to a code
generation process, and in fact, we have a version that generates the
code as if-then-else and/or case statements as appropriate, to
completely encode the tables in the target programming language. Not
recursive ascent, but more a direct implementation of the FSA as code.

So, first our LR construction is non-standard. We not only keep
tokens on our parser stack, but when we reduce a production, we put
the non-terminal itself on the stack. Thus, in a state, you might see
either a token or a non-terminal as the item you are processing. So,
the shift-table and the goto-table are merged into one table.

Now, to make that work, we don't encode what to do in the state
itself. There are no goto states. No reduce states. No accept
states. Etc. A state is simply a jump table that implements "see this
on the stack and take this transition". The transition itself encodes
what the action should be. I think this difference is similar to the
difference between Mealy and Moore machines. One encodes the output in the
state, the other on the transition (and you can transform one to the
other). I just happen to like my encoding to be all on transitions,
with states just as tables.

And we have an "assembly" language of actions that a transition can
invoke. Thus, normally, upon seeing a token, the actions are "shift;
goto state n;" two assembly language steps. At the beginning of a
non-terminal (and only then) do we want to push something on the stack
(this is similar to what Mule does and I think similar to your bracket
notation, although I have not entirely figured that out), so if the
token is the first token in a rule, it does three assembly language
steps, "push; shift, goto state n". Similarly if there is an
output/action associated with the transition, that will be added to
the assembly language sequence "shift; do act m; goto state n;". I
am not going to detail all the possible actions. There are more than
one would expect because it allows us to implement a variety of
things. (I am currently considering making the machine deal with
captures and backreferences. That is simply some more actions in the
assembly language.)

In any case, hopefully, you can imagine how we simply invoke those
actions in a switch/case statement as the steps (macro calls, function
calls, or by expanding such code inline) for that case. You can then
decide whether you want two nested switch/case statements (one on
state, one per state on input) to encode them or to merge them or
generate them as code. (At Intel, I used a similar technique to
compress the states as many states have only a few cases to consider.
So some states want to be a jump table, but many (most) states in an
FSA simply check for one or two items and fail if they are not found. You
can often turn that into linear code.) There are numerous choices
that we implement as various table generation options. The internal
model of jump tables with transitions that have specific actions does
not limit that.
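
(Purely to illustrate the shape of that encoding, an invented C sketch: a
state is nothing but a jump table, and each transition carries its own small
list of "assembly language" steps. None of the names below come from an
actual implementation.)

enum op { OP_PUSH, OP_SHIFT, OP_DO, OP_GOTO, OP_ACCEPT };

struct step  { enum op op; int arg; };       /* one assembly-language step           */
struct trans {
    int                symbol;               /* a token OR a non-terminal            */
    const struct step *steps;                /* e.g. "shift; do act m; goto state n" */
    int                nsteps;
};
struct state {                               /* a state is just its jump table       */
    const struct trans *edges;
    int                 nedges;
};

/* an example transition: on symbol 42, do "shift; do act 3; goto state 7"           */
static const struct step  steps_42[] = { {OP_SHIFT, 0}, {OP_DO, 3}, {OP_GOTO, 7} };
static const struct trans on_42      = { 42, steps_42, 3 };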

Best regards,
Chris

******************************************************************************
Chris Clark email: christoph...@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Ben Hanson

Jan 7, 2020, 2:48:49 PM
On Monday, 6 January 2020 14:39:06 UTC, Christopher F Clark wrote:
> (I am currently considering making the machine deal with
> captures and backreferences. That is simply some more actions in the
> assembly language.)

I can confirm that this works well. Adding the ability to search using a grammar rather than simply match also opens up a whole other world of possibilities.

Regards,

Ben

Anton Ertl

Jan 22, 2020, 4:10:15 PM
carlgl...@gmail.com writes:
[Applied the corrections from later postings]
>On Thursday, January 2, 2020 at 10:21:53 AM UTC-8, Anton Ertl wrote:
>...
>> (( a b && c d && ))
>
>1. For concatenation of these, I would write, in Gray:
> ((a b &&))((c d &&)) for EBNF: a { b a } c { d c }
>because wouldn't
> ((a b && c d &&))
>be the same as?
> a { b a } c { d a { b a } c }

In Gray, you cannot concatenate x and y by writing them adjacent to
one another, so

a b && c

are just two unrelated grammar expressions: "a b &&" and "c". If you continue that with

a b && c d

you now have three unrelated grammar expressions. Now apply the &&
operator:

a b && c d &&

and you have two unrelated grammar expressions: "a b &&" and "c d &&".
In order to combine them into a concatenation, you normally surround
them with (( ... )):

(( a b && c d && ))

(or alternatively, you use the binary postfix operator "concat"). Likewise,

(( a b && )) (( c d && ))

would be two unrelated grammar expressions.

Also, note that each word (including grammar operators) has to be
separated from adjacent words with white space.

>2. Let us compare separator-list notation.
>
>In Waite and Goos, "Compiler Construction", there is an infix operator, "||", for separator list, not to be confused with Gray's meta-symbol for "or".
>And Eiffel has a special notation for separator list.
>
>In comparison, Anton's Gray has a postfix operator, "&&", for separator-list.
>
>/Comparison of separator-list notation in
>Waite/Goos EBNF, Eiffel EBNF (BNF-E), and Gray EBNF/
>
> Wirth EBNF Gray EBNF:
>-------------------------------------------------------------------
>Waite/Goos: x || y ==========> x { y x } x y &&
>
>In Eiffel: { x y ...}* ===> [ x { y x } ] (( x y && )) ??
> { x y ...}+ ===> x { y x } x y &&
>-------------------------------------------------------------------

(( x y && )) ??

is equivalent to

x y && ??

>I do wish Gray had a production rule terminator, say semicolon ";" or stop "."
>
>The end of 1 rule would be clearly distinguishable from the beginning of the next without special processing of end-of-line which makes scanning and
>parsing space sensitive. We would avoid the need for LL(2) parsing or other
>parsing and scanning techniques.

Gray just uses Forth's parser (which just scans for white space, i.e.,
as far as the parser is concerned, it's a regular, not a context-free
language). The rest works by putting grammar expressions on the
stack, and having grammar operators that take grammar expressions from
the stack and put the resulting grammar expression on the stack;
that's why the operators are postfix. The

(( ... || ... || ... ))

syntax is syntactic sugar that is more readable than using the
binary postfix operators "concat" and "alt". If && had been part of
the original concept instead of an afterthought that demonstrates the
extensibility, maybe I would have had syntactic sugar for that, too.
Even as a Forth programmer, I admit that binary postfix operators are
not very readable when expressions with three or more binary operators
are involved (which is rare in most programming, but pretty common in
grammars).
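
The postfix style is easy to picture with an explicit operand stack of
grammar expressions; an invented C sketch (none of this is Gray's actual
representation, and the constructors are assumed):

typedef struct Expr Expr;                        /* an opaque grammar expression      */
extern Expr *terminal(const char *name);         /* assumed constructors              */
extern Expr *concat(Expr *x, Expr *y);
extern Expr *alt(Expr *x, Expr *y);
extern Expr *seplist(Expr *x, Expr *y);          /* the && operator: x { y x }        */

static Expr *stack[64];
static int   sp;
static void  push(Expr *e) { stack[sp++] = e; }
static Expr *pop(void)     { return stack[--sp]; }

/* "a b &&": push a, push b, then && pops both operands and pushes the result         */
static Expr *build_a_b_seplist(void)
{
    push(terminal("a"));
    push(terminal("b"));
    { Expr *y = pop(), *x = pop(); push(seplist(x, y)); }
    return pop();
}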

Back to the question of production rule terminators: As a result of
the above, a rule typically reads

(( ... )) <- nonterminal

so it's pretty clear to the system and the programmer where a rule
starts and where it ends. However, there is little error checking
involved, and if you write

(( ... (( ... )) <- nonterminal

(i.e., if you forget to write all the necessary closing "))"), you
will get stuff left over on the stack and the nonterminal will not do
what you want. While the compiler does not flag that as an error, this
error is easy to notice and find in testing on typical Forth systems
(and it's something you could also get with other Forth code).

Anton Ertl

Jan 23, 2020, 10:28:51 AM
rockbr...@gmail.com writes:
>On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
>> ANTLR even supports LL(*), but it is too complicated. I am looking for a
>> maximally simple and elegant generator that generates function calls like
>> code written by hand.
>
>A large set of parsers are lined up in the parser generator comparison on
>Wikipedia here
>https://en.wikipedia.org/wiki/Comparison_of_parser_generators
>
>The question of who in the list does bona fide code synthesis (as opposed to
>cookie-cutter code generation) is not directly addressed, as far as I can see.
>But the items can be reviewed individually.

It's unclear to me what you mean by those two terms. "Like written
by hand" is somewhat clearer in that I don't write code manually that
automata-based generators generate. The code generated by generators
of recursive-descent parsers should look more like hand-written code,
especially if you optimize for that. In general, though, there are a
number of reasons why the code will deviate from that ideal:

* The interface to the scanner is narrow and leads to stylized code
where a hand-written parser might make use of knowledge of the
scanner (i.e., use a wider interface).

* Some things are simpler for a human, and others are simpler for a
code generator, so if one does not optimize for looking like code
written by a human, the result will look different.

As an example, consider the following rule in Gray:

nonterminal expr
...
(( (( term
|| "-" term {{ 0 swap - }} ))
(( "+" term {{ + }}
|| "-" term {{ - }} )) **
)) expr rule ( -- n )

(This is for a simple calculator).

Gray generates code for the rule that Gforth decompiles as:

noname :
<term+$D18> testsym
IF <term+$A8> @ execute
ELSE <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$158>
THEN
BEGIN <term+$D58> testsym
WHILE <")"+$A0> testsym
IF <")"+$A0> testsym ?readnext <term+$A8> @ execute <term+$428>
ELSE <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$638>
THEN
REPEAT ;

You see control structures similar to what a human would write. The
|| results in an IF...ELSE...THEN, the ** in a BEGIN...WHILE...REPEAT.
That part is fairly straightforward and does not need a detour through
state machines (at least as long as you stay with LL(1) grammars).

But you also see that the word has no name and lots of the words it
calls have no name, either (all the <...+$...> things are addresses of
unnamed words); that's because it's easier for the generator to deal
with addresses than to generate new names. A human would use names
instead.

You also see occurrences of testsym and ?readnext that are the
interface to the scanner.
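
For comparison, a hand-written C version of the same calculator rule might
read as follows; peek(), match() and term() are assumed stand-ins for the
scanner interface and the term parser, not anything generated by Gray:

extern int  peek(void);            /* assumed: next token, e.g. '+', '-', ...          */
extern void match(int tok);        /* assumed: consume an expected token               */
extern int  term(void);            /* assumed: parses a term and returns its value     */

int expr(void)                     /* the ( -- n ) stack effect becomes a return value */
{
    int n;
    if (peek() == '-') {           /* "-" term {{ 0 swap - }}                          */
        match('-');
        n = -term();
    } else {
        n = term();
    }
    while (peek() == '+' || peek() == '-') {       /* the ** loop                      */
        if (peek() == '+') { match('+'); n += term(); }
        else               { match('-'); n -= term(); }
    }
    return n;
}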

carlgl...@gmail.com

Jan 23, 2020, 10:29:11 AM
Anton, I now see that, in Gray, the precedence of "&&" is higher than every other operator.

So Gray EBNF:
a b && c d && e f && <- x

is equivalent to Wirth EBNF:

x = a { b a } c { d c } e { f e } .

Likewise:

a b && ? <- x

is
x = [ a { b a } ] .

Carl

Anton Ertl

Jan 25, 2020, 2:05:57 PM
carlgl...@gmail.com writes:
>Anton, I now see that, in Gray, the precedence of "&&" is higher than every other operator.

Among the postfix operators, precedence is determined by the way they
are written down. But yes, if you see (( a b || c d )) as having two
infix operators, || and the empty operator, then you can write a
precedence hierarchy:

postfix
empty
||

>So Gray EBNF:
>a b && c d && e f && <- x

Correctly written as

(( a b && c d && e f && )) <- x

>is equivalent to Wirth EBNF:
>
>x = a { b a } c { d c } e { f e } .

Yes.

>Likewise:
>
>a b && ? <- x

a b && ?? <- x

>is
>x = [ a { b a } ] .

Yes.

Fred J. Scipione

Jan 25, 2020, 3:43:08 PM
In article <20-0...@comp.compilers>, an...@mips.complang.tuwien.ac.at
says...
>
> rockbr...@gmail.com writes:
> >On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
> >> ANTLR even supports LL(*), but it is too complicated. I am looking for a
> >> maximally simple and elegant generator that generates function calls like
> >> code written by hand.
> >
> >A large set of parsers are lined up in the parser generator comparison on
> >Wikipedia here
> >https://en.wikipedia.org/wiki/Comparison_of_parser_generators
> >
> >The question of who in the list does bona fide code synthesis (as opposed to
> >cookie-cutter code generation) is not directly addressed, as far as I can see.
> >But the items can be reviewed individually.
>
> It's unclear to me what you mean with those two terms. "Like written
> by hand" is somewhat clearer in that I don't write code manually that
....<snipped>...

To the original poster: For "like written by hand" you might look at
RDP (recursive descent parser generator) -
<http://www.cs.rhul.ac.uk/research/languages/projects/rdp.html>

RDP provides sources for the generator and support libraries for lexing
and parsing. Sources are in ANSI C and produce parsers in ANSI C
(including the RDP parser, of course).

The authors have not been looking for collaborators, but I have found
the project easy to modify for my personal 'improvements'. They include
some extensions to the RDP grammar and producing 'switch' statements and
custom 'for', 'while', and 'do ... while' loops where possible in place
of the authors' generic 'if ... else' chains and
'while(1)... if() break;' constructs. I also made the generated calls
to library functions use macros, so that it would be easier to use
custom replacements where additional functionality was needed (e.g.
filters on symbol table searching). The results are much closer to
"like written by hand".

One drawback of the RDP-generated code that you might want to improve on
is the lack of a way to give the generated symbol sets meaningful names in place
of the current auto-generated generic names (RDP001[], RDP002[], etc.).