Are compiler developers light-years ahead of other software development?

Roger L Costello

unread,

Jan 16, 2022, 12:26:38 PM1/16/22

to

Hello Compiler Experts!

I am learning the algorithms and theory underlying parsing. Wow! I am
discovering that parsing has such a rich theoretical foundation: grammars, LL,
LR, first() function, following() function, parsing tables, etc.

In all the software that I have written, I am lucky if I can use one or two
algorithms that have a theoretical foundation. Mostly, the software is one-off
code.

The mathematician Alfred North Whitehead has this wonderful phrase [1]:

... the slow accumulation of clear and relevant ideas.

The parsing community has done that - the community has slowly accumulated
clear and relevant ideas.

I can't think of any other branch of computer science where there is such a
rich foundation upon which to develop software.

Are compiler developers light-years ahead of other software development?

/Roger

[1] An Introduction to Mathematics by Alfred North Whitehead, p. 19.

Philipp Klaus Krause

unread,

Jan 16, 2022, 5:27:36 PM1/16/22

to

On 16.01.22 15:36, Roger L Costello wrote:
> I can't think of any other branch of computer science where there is such a
> rich foundation upon which to develop software.

There are others. But a compiler is something that most softare
developers use, and thus are aware of.

gah4

unread,

Jan 17, 2022, 10:15:44 PM1/17/22

to

On Sunday, January 16, 2022 at 9:26:38 AM UTC-8, Roger L Costello wrote:
> Hello Compiler Experts!

> I am learning the algorithms and theory underlying parsing. Wow! I am
> discovering that parsing has such a rich theoretical foundation: grammars, LL,
> LR, first() function, following() function, parsing tables, etc.

(snip)

> I can't think of any other branch of computer science where there is such a
> rich foundation upon which to develop software.

Thinking about D.E.Knuth's "The Art of Computer Programming", there is a whole
book (volume 3), "Sorting and Searching". A large fraction of important
algorithms rely on the foundation of search algorithms. (And if you sort, you can
use binary search to find thinks.)

Now, some of the book is based on the now lost art of sorting data on
magnetic tape, where one has only sequential access to most of the data.

Others will tell you that Quicksort is all you need to know, and forget
the rest of the book.

In any case, I vote for sorting and searching algorithms as the rich theoretical
foundation of much of CS. Hash tables are in important part
of many compilers. Dynamic programming algorithms are used in
many optimization problems, and fundamental to many algorithms.

The dynamic programming algorithm used by the Unix diff command,
to find an optimal set of differences between two files, originated
from biologists comparing protein sequences.

Anton Ertl

unread,

Jan 19, 2022, 6:42:30 PM1/19/22

to

Roger L Costello <cost...@mitre.org> writes:
>I am learning the algorithms and theory underlying parsing. Wow! I am
>discovering that parsing has such a rich theoretical foundation: grammars, LL,
>LR, first() function, following() function, parsing tables, etc.
>
>In all the software that I have written, I am lucky if I can use one or two
>algorithms that have a theoretical foundation. Mostly, the software is one-off
>code.

...

>Are compiler developers light-years ahead of other software development?

There are multiple facets to this question:

1) Compilers are decades ahead of much other software development: If
you look at early CS curricula (I have looked at one from IIRC 1965),
you see compilers and a few other topics, but many of the topics that
now fill the curricula did not even exist then. E.g., BNF was
invented before Algol 60 was published in 1960, while the relational
model for data bases is from 1970, the concept of NP-completeness is
from 1971, and RISC architectures (in the form of the IBM 801) were
invented in the mid-1970s. And if you look at currently hot topics
like deep learning, cloud computing and IoT, their practical relevance
is much younger (although I guess you can point to 1960s work for each
of them that points in that direction).

2) As I have written some time ago:

|Compilers are ideal software projects: You have relatively stable and
|well-defined, partly even formal specifications for the input and the
|output, and lots of interesting subproblems in between.
|
|In particular, the scanning part of compilers has been formalized
|using regular languages and is expressed in regular expressions, which
|have become not only popular for scanning programming languages, but
|for many other tasks as well. The parsing part has been formalized
|with context-free grammars in (extended) BNF and that has led to the
|development of parser generators that are quite popular in language
|implementations.

There is also similar formalization in instruction selection (although
my impression is that generators are not as widely used there as in
the front end).

However, in between things are more like more common programming, even
though people have tried to propagate the formalization success in
these areas, too.

And I think that points to why we don't see that much of that kind of
formalization in most other areas: context-free grammars are something
that most programming language designers want to use, so developing
methods for parsing such languages and putting them in parser
generators has a significant payoff, while a formal semantics becomes
so inflexible that most programming language designers prefer to stick
to more or less informal semantics (in some cases with some parts
formalized), and then of course you implement the semantics with a
less formal approach, too. And for most other software, it's similar.

3) The love of formal stuff can lead you astray. E.g., the first
Fortran compiler did not report syntax errors.

Of course, mistakes also happen in less formalized software; my point
is that if you focus your attention on the nice formal stuff, you can
easily overlook that your software has other aspects, too, that you
need to cover.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[I am reasonably sure the original Fortran compiler reported syntax
errors. A 1957 paper at bitsavers says it did:
http://bitsavers.org/pdf/ibm/704/FORTRAN_paper_1957.pdf
-John]

Christopher F Clark

unread,

Jan 20, 2022, 3:15:37 PM1/20/22

to

Sadly, I would like to offer an alternate perspective on how far
compilers are than other areas of software development.

Yes, LL and LR parser generators have been around for decades. My
compiler class in the late 1970s had me implement both of them.
However, shortly after that parser technology was considered basically
a settled problem, while in actuality many parser generators are hard
to use and people don't understand it when they output state machines
and conflict messages freak compiler writers out, etc.

As a result, despite all the technology we could be bringing to bare
on compilers, as far as I can tell, more than half the compilers in
actual use are done via hand-written recursive descent with little
hacks here and there to get them to work. The prevalence of "ad hack"
compilers should be embarrassing to anyone who wants to talk about
software "engineering".

There is good theory we could put to use building better compilers and
more reliable languages. However, most of it is languishing and going
to waste.

At least that's what this compiler developer sees when he looks at the
landscape. But maybe I am biased in that perspective....

******************************************************************************
Chris Clark email: christoph...@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[There's a definite "too busy chopping down trees to sharpen our axes" vibe
here. -John]

Anton Ertl

unread,

Jan 21, 2022, 5:30:42 PM1/21/22

to

Christopher F Clark <christoph...@compiler-resources.com> writes:
>Yes, LL and LR parser generators have been around for decades. My
>compiler class in the late 1970s had me implement both of them.
>However, shortly after that parser technology was considered basically
>a settled problem, while in actuality many parser generators are hard
>to use and people don't understand it when they output state machines
>and conflict messages freak compiler writers out, etc.

This points to one reason why some people don't use these generators.
You need to know quite a bit about the implementation technique behind
it to understand what (for LR-based parser generators) a shift-reduce
or reduce-reduce conflict means and how to fix that problem.

>As a result, despite all the technology we could be bringing to bare
>on compilers, as far as I can tell, more than half the compilers in
>actual use are done via hand-written recursive descent with little
>hacks here and there to get them to work.

The "little hacks" point to another reason: How to deal with cases
where LL or LR technology, or (probably more often) context-free
grammars don't quite fit, e.g., when you want to do a compatible
extension of an existing language, like C++ (which was intended to be
compatible with C). My impression is that Terrence Parr attacked this
problem head-on by formalizing these hacks (to some extent) with
syntactic and semantic predicates in PCCTS/ANTLR.

There is also the problem of interfacing between the parser and the
semantic analysis and back end of the compiler. The yacc interface is
not that great, but for some reason improvements such as ox have not
taken over the world (maybe because ox is based on attribute grammars
which is quite a leap for imperative programmers).

Finally, a reason is that some people don't want to depend on software
written by others. E.g., Wirth apparently considered it simpler to
write a recursive-descent parser by hand than to write a parser
generator and use that for implementing a parser. And he even wrote a
compiler text book that teaches this approach.

gah4

unread,

Jan 21, 2022, 10:25:15 PM1/21/22

to

On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:

(snip)

> This points to one reason why some people don't use these generators.
> You need to know quite a bit about the implementation technique behind
> it to understand what (for LR-based parser generators) a shift-reduce
> or reduce-reduce conflict means and how to fix that problem.

Interesting reason, as I know exactly how I learned about that.
(And not from compiler class, where I might have.)

In compiling a program, not actually changing it, the instructions warn
that it will get a shift-reduce conflict warning, and to ignore it.

There is a whole separate thread on ambiguous languages, which is
where these come from. In a language with an if-then-optional else
construct, and which can be nested, there is an ambiguity in which
if an else belongs to. In most such languages it goes to the nearest
(deepest nesting) if, and this is the default from the warning.

reduce-reduce results from an actual mistake in the language,
and does need to be fixed.

I believe the program was Kermit, but I am not so sure now the
details. I believe that it has an implementation language that
translates to C, using a lex/yacc parser. The Makefile does
it all for you, but you have to know to ignore the message.

Kaz Kylheku

unread,

Jan 21, 2022, 10:26:51 PM1/21/22

to

On 2022-01-16, Roger L Costello <cost...@mitre.org> wrote:
> Are compiler developers light-years ahead of other software development?

I'd say it's pretty impressive how, say, the Google Assistant can speak
completely naturally (and in multiple languages). I mean, details like
intonation across entire sentences and such; nothing "robot-like".

Fantastic advances have been done in computer graphics in the last 30
years.

Compilers don't all use fancy algorithms, or not all of them. Fancy
algorithms are specialized, optimized (in particular ways) solutions to
problems that have other solutions that are pedestrian. Sometimes the
other solutions are actually faster on your input cases.

None of the buzzwords you mentioned related to parsing are needed in
building a compiler: grammars, LL, LR, first() function, following()
function, parsing tables, etc.

All that goes away if you write recursive descent parser. The grammar is
then in the documentation and code comments only, and somewhat expressed
in its code structure. There is no LR, first or follow, no tables.

The GNU C++ compiler, undeniably a production compiler and a
major/known/widely-used one, has a big recursive-descent parser
maintained by hand: over a megabyte of code.

In other words, a major compiler for probably the programming language
with the most complicated syntax ever, eschews pretty much all that we
have learned and accumulated about parsing between around 1968 and now.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[In principle a r-d parser recognizes an LL(1) grammar, in practice people
often cheat a little. -John]

Anton Ertl

unread,

Jan 22, 2022, 10:51:46 AM1/22/22

to

gah4 <ga...@u.washington.edu> writes:
>On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:
>> You need to know quite a bit about the implementation technique behind
>> it to understand what (for LR-based parser generators) a shift-reduce
>> or reduce-reduce conflict means and how to fix that problem.
>
>Interesting reason, as I know exactly how I learned about that.
>(And not from compiler class, where I might have.)
>
>In compiling a program, not actually changing it, the instructions warn
>that it will get a shift-reduce conflict warning, and to ignore it.

That means that the program's authors have (hopefully) determined that
the default resolution (shift) for the shift/reduce conflicts in that
program was what they intended. It does not tell you that shift
resulution is the intended behaviour for all cases (otherwise there
would be no point in reporting the conflict).

>There is a whole separate thread on ambiguous languages, which is
>where these come from. In a language with an if-then-optional else
>construct, and which can be nested, there is an ambiguity in which
>if an else belongs to. In most such languages it goes to the nearest
>(deepest nesting) if, and this is the default from the warning.

That's not the only occurence of shift/reduce conflicts, but it's a
relatively prominent one, because many compiler writers leave it to
the parser generator to resolve it. However, when developing a
grammar, I have seen cases (even for relatively small languages) with
many conflicts (both shift/reduce and reduce/reduce). I always change
the grammar to eliminate all the conflicts. Often the syntax is
changed by these changes, which is outside the scope of pure compiler
writers, but inside the scope of language designers.

Concerning the Algol 60 case, the syntax is unambiguous (from
<https://www.masswerk.at/algol60/report.htm>):

|<if clause> ::= if <Boolean expression> then
|<unconditional statement> ::= <basic statement> |
| <compound statement> | <block>
|<if statement> ::= <if clause> <unconditional statement>
|<conditional statement> ::= <if statement> |
| <if statement> else <statement> |
| <if clause> <for statement> |
| <label>: <conditional statement>

Statements such as

if B1 then if B2 then S1 else S1

and even

if B1 then if B2 then S1

are syntax errors. You cannot put an if-statement or a conditional
statement in a then-branch without wrapping it in a begin...end.

Languages like Pascal and C (and its descendants) changed this to
allow either, making the grammar ambiguous, and using prose to resolve
the ambiguity. E.g., for Pascal:

|IfStatement = "if" BooleanExpression "then" Statement [ "else" Statement ].
|
|Note: The syntactic ambiguity arising from the construct
|
|if e1 then if e2 then sl else s2
|
|is resolved by interpreting the construct as equivalent to
|
|if e1 then begin if e2 then sl else s2 end

I consider either grammar to be flawed; Algol fixed the flaw in Algol
68, Wirth fixed it in Modula, but the C-syntax descendants have
unfortunately never fixed it.

<https://en.wikipedia.org/wiki/Dangling_else#Avoiding_the_conflict_in_LR_parsers>
describes how to express the intent in the grammar rather than in
prose, but looking at the example grammars makes it clear why compiler
writers prefer to rely on the parser-generator resolution of the
shift/reduce conflict rather than expressing the intent in the
grammar.

Yacc has precedence and associativity declarations to provide
additional information to the parser generator on how to resolve
ambiguity in the BNF (and for suppressing the warnings about conflicts
that are resolved by these declarations). Something like this might
be a good solution for dealing with the dangling else if you cannot
avoid it by fixing the syntax.

>reduce-reduce results from an actual mistake in the language,
>and does need to be fixed.

There can be cases where the default behaviour is the intended
behaviour, too, but IIRC the default depends on the order of rules, so
could be destroyed by a seemingly innocuous change to the parser
generator input. And there are (probably many) cases where either
reduction is intended in some cases, so the generator default is wrong
for some cases for all rule orders.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[The original sin of dangling else may have been COBOL. A 1964 IBM
manual for 7080 COBOL says ELSE matches the closest IF. PL/I inherited
it shortly aftrwards. -John]

Anton Ertl

unread,

Jan 22, 2022, 10:59:04 AM1/22/22

to

Kaz Kylheku <480-99...@kylheku.com> writes:
>None of the buzzwords you mentioned related to parsing are needed in
>building a compiler: grammars, LL, LR, first() function, following()
>function, parsing tables, etc.
>
>All that goes away if you write recursive descent parser. The grammar is
>then in the documentation and code comments only, and somewhat expressed
>in its code structure. There is no LR, first or follow, no tables.

As our moderator notes, recursive descent is an implementation
technique for LL(1) grammars. Moreover, you need first-sets to
implement them. If your grammar is

A -> B|C

your recursive-descent parser becomes:

if next_symbol in first(B) then parse_B else parse_C fi

If you want to learn about conflicts in your grammar early on (rather
than finding it out later by having some input be parsed other than
intended), you also need follow-sets: if your grammar is

A -> B|epsilon

there is a conflict if "intersection(first(b),follow(A))" (where "U"
signifies the union) is not empty: if one of the symbols in the
intersection comes up at the start of an A, do you parse B or not?

Of course, if you are freshly designing a programming language, you
may be fine with declaring the actual behaviour of the
redursive-descent parser to be the intended behaviour. But if you
have to implement a given grammar, things become more complex.

You can also design the syntax to make recursive-descent parsing easy,
like Wirth did. E.g., almost all statements start with a keyword, and
there is AFAIK only one statement that may be started by several
symbols (none of them conflicting with one of the other statement
keywords), so you can easily implement statements like this:

if next_symbol=IF_SYM then ifstatement()
else if next_symbol=WHILE_SYM then whilestatement()
else if ...
...
else assignmentstatement()

All statements are followed by ";" or END, making it easy to know that
there are no conflicts with following statements.

These properties also make it easier for humans to understand
grammars, so they are a good idea in syntax design. If you go further
in that direction, you get to Lisp (which does not need a BNF-based
parser) or Forth (where the parser just looks for the next white
space). Of course, if you look at textbooks for these languages, you
find that you have patterns at a different ("static semantics") level,
but at least it's always clear in these language from the syntax which
if an else-branch belongs to.

>The GNU C++ compiler, undeniably a production compiler and a
>major/known/widely-used one, has a big recursive-descent parser
>maintained by hand: over a megabyte of code.
>
>In other words, a major compiler for probably the programming language
>with the most complicated syntax ever, eschews pretty much all that we
>have learned and accumulated about parsing between around 1968 and now.

C++ is the language that inspired Terrence Parr to add synactic and
semantic predicates to PCCTS/ANTLR (not sure what has been added since
I heard him talk about it in the early 1990s). So basically parser
generators like yacc were not sufficient for the problems posed by the
C++ syntax; when G++ was started in 1987, no free parser generator had
such features, so the decision to go with a hand-written parser is
understandable, and of course since then there were two good reasons
to stay there: 1) Transition costs. 2) Backwards compatibility.

Looking further back, AFAIK Cfront also used a hand-written parser;
this (together with the requirement of C compatibility) resulted in a
syntax design that was not constrained by the parser generators of the
time, but of course also a syntax that was beyond their capabilities.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[Nope, cfront used a yacc parser with a whole lot of %left and %right
declarations to make shift/reduce warnings go away. Legend says that
much odd C++ syntax is from poorly understood consequences of those
declarations. See it at
http://www.softwarepreservation.org/projects/c_plus_plus/index.html#cfront
-John]

Roger L Costello

unread,

Jan 22, 2022, 11:01:15 AM1/22/22

to

Kaz Kylheku wrote this about the C++ compiler:

> In other words, a major compiler for probably
> the programming language with the most
> complicated syntax ever, eschews pretty much
> all that we have learned and accumulated about
> parsing between around 1968 and now.

Yikes!

They ignored the rich theory and vast set of algorithms, in favor of their own
proprietary code? Why would the C++ compiler developers do such a thing?

/Roger
[My guess is that they were too busy chopping down trees to sharpen their axes. -John]

Kaz Kylheku

unread,

Jan 22, 2022, 6:50:37 PM1/22/22

to

On 2022-01-22, Roger L Costello <cost...@mitre.org> wrote:
> Kaz Kylheku wrote this about the C++ compiler:
>
>> In other words, a major compiler for probably
>> the programming language with the most
>> complicated syntax ever, eschews pretty much
>> all that we have learned and accumulated about
>> parsing between around 1968 and now.
>
> Yikes!
>
> They ignored the rich theory and vast set of algorithms, in favor of
> their own proprietary code? Why would the C++ compiler developers do
> such a thing?

Wild guess:

Suppose that there is a foo_statement which contains a bar_designator,
and something is wrong in there. They can fire up gdb, and put a simple
breakpoint on bar_designator, feed in the test case and get a call stack
in which parse_bar_designator is called by parse_foo_statement.
They can examine all the locals, and arguments up the stack.

Typically, nothing like this is easily possible with the
theoretically-based parser generation tools.

And in C++, you're likely going to be debugging parsing quite a bit.

The main parser generation tools used by GNU projects are Flex and
Bison. These tools are moving targets; especially Bison. The common
practice is to ship the generated parser.

(In a fundamental compiler project, you have to for other reasons, like
the users not having thta tool installed, because maybe they need your
compiler to build it: you want as few dependencies as possible.)

Now GCC is hacked on quite a bit and has lots of contributors. It would
be annoying to tell people "Oh, just use the generated, shipped
parsers if you're not touching the grammar; if you need to regenerate,
please use the exact version Bison X.Y.Z.".

If a parser is hand-written, that whole sort of problem goes away.

> /Roger
> [My guess is that they were too busy chopping down trees to sharpen their axes. -John]

Kaz Kylheku

unread,

Jan 22, 2022, 6:51:10 PM1/22/22

to

On 2022-01-22, Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Kaz Kylheku <480-99...@kylheku.com> writes:
>>None of the buzzwords you mentioned related to parsing are needed in
>>building a compiler: grammars, LL, LR, first() function, following()
>>function, parsing tables, etc.
>>
>>All that goes away if you write recursive descent parser. The grammar is
>>then in the documentation and code comments only, and somewhat expressed
>>in its code structure. There is no LR, first or follow, no tables.
>
> As our moderator notes, recursive descent is an implementation
> technique for LL(1) grammars. Moreover, you need first-sets to
> implement them. If your grammar is
>
> A -> B|C
>
> your recursive-descent parser becomes:

I touched on this point with "All that goes away if you write recursive

descent parser. The grammar is then in the documentation and code
comments only, and somewhat expressed in its code structure."

So it's not that we don't have a grammar or don't know that it's LL;
perhaps we do. But the tooling doesn't.

> if next_symbol in first(B) then parse_B else parse_C fi

In a given parser, there might be no conscious connection to any such
theoretical entities, even though the theory readily identifies those
elements in the resulting work.

(Kind of like a musician who plays by ear is using certain musical
devices that he might not be able to name, but music scholars can.)

>
> If you want to learn about conflicts in your grammar early on (rather
> than finding it out later by having some input be parsed other than
> intended), you also need follow-sets: if your grammar is
>
> A -> B|epsilon
>
> there is a conflict if "intersection(first(b),follow(A))" (where "U"
> signifies the union) is not empty: if one of the symbols in the
> intersection comes up at the start of an A, do you parse B or not?

You can easily parse both ways and pick one. There are readily available
backtracking techniques, even in blub languages.

Years ago I wrote a C expression parser which did this quite well using
exceptions based on setjmp and longjmp.

(Because it parsed expression shapes outside of any declaration context,
so it woudln't know, say, whether (A)(B) is the expression (B) being
cast to type (A), or function (A) being called with argument (B). The
goal of the parser was to determine whether the expression could have
side effects, with a tri-valued result: no, possibly, yes.)

"Parse expression grammars" formalize this more.

That sort of approach has to be carefully wielded, because you can
get poor performance due to explosion of the search space, with several
levels of speculative parsing.

> Of course, if you are freshly designing a programming language, you
> may be fine with declaring the actual behaviour of the
> redursive-descent parser to be the intended behaviour. But if you
> have to implement a given grammar, things become more complex.

If you have to implement a given grammar, with any tool or approach,
you need to have, or develop as you go along, a comprehensive regression
test suite, hitting as many of the the corner cases as possible.

Ian Lance Taylor

unread,

Jan 22, 2022, 6:51:47 PM1/22/22

to

On Sat, Jan 22, 2022 at 3:26 PM Roger L Costello <cost...@mitre.org> wrote:

>
> They ignored the rich theory and vast set of algorithms, in favor of their
> own
> proprietary code? Why would the C++ compiler developers do such a thing?
>
> /Roger
> [My guess is that they were too busy chopping down trees to sharpen their
> axes. -John]
>

The change in GCC away from using bison to a recursive descent parser was a
considered decision on the part of the GCC C++ maintainers. I believe that
the project started at
https://gcc.gnu.org/legacy-ml/gcc/2000-10/msg00573.html and it was
committed to the tree in December, 2002.

In my experience bison/yacc parsers are really good for knowing the exact
language that you are parsing. But it is difficult to make them generate
good error messages and it is difficult to debug them. A language like C++
can only be parsed in conjunction with a symbol table, because the parsing
depends on the semantic meaning of tokens; that too is difficult to do
using yacc. So while I agree that yacc parsers are theoretically superior,
I believe that experience shows that they have some problems in practice.

In fact, I would be interested in knowing whether there is any generally
used C++ compiler that uses a yacc parser. For example, clang also uses a
recursive descent parser.

Ian

Kaz Kylheku

unread,

Jan 23, 2022, 3:07:34 PM1/23/22

to

On 2022-01-22, Ian Lance Taylor <ianlanc...@gmail.com> wrote:
> In my experience bison/yacc parsers are really good for knowing the exact
> language that you are parsing.

In my experience, you can easily know what language you're processing
with Yacc is if you avoid its /ad hoc/ features for suppressing conflict
messages. Namely:

- %left, %right and %nonassoc declarations for tokens.

- %prec in rules.

Otherwise, you're likely going to be relying on your regression test
cases to inform you what language you're parsing.

The exception is that %left, %right are readily understood if they are
only used for the operator tokens in the productions for a binary
operator expression grammar (that likely being their motivating use case).