
Are there "compiler generators"?


Roger L Costello

May 28, 2022, 10:16:21 PM
Hi Folks,

There are lexer generators. Flex is a lexer generator.

There are parser generators. Bison is a parser generator.

Are there compiler generators?

Page 52 of the book "Crafting a Compiler with C" says this in the chapter
titled "Scanning--Theory and Practice":

Programming a scanner generator is an example of nonprocedural programming.
That is, unlike ordinary programming, which we call procedural, we do not tell
a scanner generator "how" to scan but simply "what" we want scanned. This is a
higher-level approach and in many ways a more natural one. Much recent
research in computer science is directed toward nonprocedural programming
styles. (Database query languages and Prolog, a "logic" programming language,
are nonprocedural.) Nonprocedural programming is most successful in limited
domains, such as scanning, where the range of implementation decisions that
must be automatically made is limited. Nonetheless, a long-standing (and as
yet unrealized) goal of computer scientists is to generate an entire compiler
from a specification of the properties of the source language and target
computer.
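
To make the "what, not how" point concrete, a minimal Flex rules
section (an illustrative sketch; the token names INTEGER and
IDENTIFIER are made up and would be defined elsewhere) just lists
patterns, and Flex derives the scanning automaton:

%%
[0-9]+                  { return INTEGER; }
[a-zA-Z_][a-zA-Z0-9_]*  { return IDENTIFIER; }
[ \t\n]+                { /* skip whitespace */ }
%%

Nothing in it says how to scan; it only says what to scan.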

That was written in 1991. Is it still true in 2022--there are no compiler
generators?

/Roger
[There are certainly programs that will generate a combined lexer and parser
but there's a lot more to a compiler. -John]

Robin Vowels

May 29, 2022, 6:13:42 PM
"There's "A Compiler Generator" by McKeeman, Horning, and Wortman.

Jan Ziak

May 29, 2022, 6:14:16 PM
On Sunday, May 29, 2022 at 4:16:21 AM UTC+2, Roger L Costello wrote:
> Hi Folks,
>
> There are lexer generators. Flex is a lexer generator.
>
> There are parser generators. Bison is a parser generator.
>
> Are there compiler generators? ...

If you like reading about something which might be practical in a
hundred years: https://people.idsia.ch/~juergen/goedelmachine.html
(1st paper in the series: https://arxiv.org/abs/cs/0309048)

Some practical examples available today, with some similarity to the above method, are PyPy and GraalVM.

There is also http://www.general-game-playing.de for example.

-atom

Anton Ertl

May 29, 2022, 6:14:57 PM
Roger L Costello <cost...@mitre.org> writes:
>Are there compiler generators?

Apart from scanner generators and parser generators, there are also:

* generators of attribute grammar evaluators (e.g., Ox). They use an
attribute grammar as input.

* generators of instruction selectors (e.g., burg and friends). Burg
uses a tree grammar as input.

* generators of virtual machine interpreters and other code dealing
with virtual machine instructions (e.g., vmgen). Vmgen uses a
description of the virtual machine instructions as input (a sketch
follows below).

* There is also some work on generating type checkers.

All of these could be classified as non-procedural, although they tend
to include some procedural code.
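
For flavour, a vmgen-style instruction description pairs a stack
effect with the C code for the instruction body. A minimal sketch
(details may differ from vmgen's actual input syntax):

add ( i1 i2 -- i )
i = i1 + i2;

The generator turns such descriptions into the interpreter code (and
related tables) for the instruction.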

You need to add substantial amounts of glue code to integrate the
pieces generated by these generators into a compiler, especially for
languages that do not quite fit in the generators' molds. Plus, there
are pieces such as symbol tables and register allocation that are not
covered by these generators.
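
For scale: the smallest piece of that glue, for a flex/bison pair, is
the familiar driver below (yyparse and yyin are the symbols bison and
flex actually generate); the substantial glue, symbol tables, semantic
checks and so on, only starts after this:

/* Minimal driver gluing a flex lexer to a bison parser. */
#include <stdio.h>
int yyparse(void);    /* generated by bison */
extern FILE *yyin;    /* provided by flex */

int main(int argc, char **argv)
{
    if (argc > 1)
        yyin = fopen(argv[1], "r");
    return yyparse(); /* the grammar actions do the real work */
}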

I have also seen a paper by a French group (don't remember the names)
in the early 1990s where a generator could generate a complete Pascal
compiler from a specification. My impression, however, was that the
generator could only generate compilers for languages that are
relatively close to Pascal, and I saw no good way to make it much more
flexible.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Thomas Koenig

May 29, 2022, 6:16:05 PM
Roger L Costello <cost...@mitre.org> wrote:
> Is it still true in 2022--there are no compiler
> generators?

Consider that what a compiler does can be roughly subdivided
into the following tasks:

a) Lexing, resulting in a stream of tokens
b) Parsing, resulting in an abstract syntax tree
c) Conversion into intermediate form, resulting in
SSA (usually, these days)
d) Optimization of the SSA, resulting in modified SSA
e) Conversion into machine language, resulting in some
internal representation of the machine language
f) Optimization of the machine language
g) Output to later stages (in assembler or in intermediate
form)

Roughly speaking, a)-c) are the front end, d) is the middle end, and
e)-f) are the back end.

As our moderator writes,

> [There are certainly programs that will generate a combined lexer and parser
> but there's a lot more to a compiler. -John]

a) and b) are well covered. For c), you would need a machine-
implementable description of the language semantics, and there is
no such thing in widespread use. d) is a main component of today's
compilers and is hand-crafted code. e)-g) can be covered by the
machine descriptions that gcc and LLVM have (it is debatable whether
these can be called generators).
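
For illustration, a schematic gcc-style machine-description pattern
(not taken from any real port; real .md files carry considerably more
detail):

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %0,%1,%2")

The compiler matches the RTL pattern and emits the assembler template.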

So, assuming you can use gcc or LLVM, the main gap is c) - you
would need a formal semantic description, and it would have to
be written in a way that it can be translated automatically.
That might be doable if you design your own language with that goal.

Defining a format for language semantics which is useful for a
reasonable subset of the multitude of languages in current use and
machine-translatable (an input to a "semantic front end generator",
if you will) will be _extremely_ hard. Maybe starting with the
formal definition of Modula-2 and then thinking in two directions,
how a semantic front end generator could be built from that
description and how that format could be extended to cover
other languages, could be the start of an approach.

Martin Ward

May 29, 2022, 6:16:50 PM
On 28/05/2022 23:27, Roger L Costello wrote:
> There are lexer generators. Flex is a lexer generator.
>
> There are parser generators. Bison is a parser generator.
>
> Are there compiler generators?

Suppose a program F(x, y) takes two inputs and produces
an output. Suppose input x is known at compile time.
Then we can optimise the program F(x, .) by precomputing
all the computations that depend on x to give a program
G(.) which can be applied to y to produce the output F(x, y):

G(y) = F(x, y)

This optimisation process is called "partial evaluation"
or "specialisation".

An interpreter is a program F(p, d) which takes
a program p and some input data d and produces
the result of executing p on d.

In theory a partial evaluator (a "specialiser") can be applied
to an interpreter and the source code for a program to give
an executable which is a specialised version of the interpreter
which only runs the given source code, does not require
the source code to be applied and runs faster than
the original combination of interpreter and source code.

To take this to the next level: specialising the specialiser
for the interpreter generates a compiler for the interpreted
language. So this is a type of "compiler generator".

Finally, specialising the specialiser for itself gives a tool
that can convert any interpreter into an equivalent compiler.
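
In the usual notation, writing mix for the specialiser, interp for
the interpreter and p for a source program, the three Futamura
projections are:

target   = mix(interp, p)      (1st: a compiled version of p)
compiler = mix(mix, interp)    (2nd: a compiler for the language)
cogen    = mix(mix, mix)       (3rd: an interpreter-to-compiler tool)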

https://en.wikipedia.org/wiki/Partial_evaluation

Futamura, Y. (1999). "Partial Evaluation of Computation Process—An
Approach to a Compiler-Compiler". Higher-Order and Symbolic Computation.
12 (4): 381–391. CiteSeerX 10.1.1.10.2747. doi:10.1023/A:1010095604496.
S2CID 12673078.

In practice, there is a lot more to writing a compiler than just
partially evaluating an interpreter.

(The difference between theory and practice is that in theory there
is no difference between theory and practice, but in practice there is.)

--
Martin

Dr Martin Ward | Email: mar...@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

Fernando

May 29, 2022, 6:17:20 PM
Hi Roger.

At least in theory, the second Futamura projection (there are three in
all) would generate a compiler. That's a way to do partial program
evaluation (https://en.wikipedia.org/wiki/Partial_evaluation). There have been
research papers about it, demonstrating that one can indeed generate compilers
out of partial evaluation, e.g.:

* An experiment in partial evaluation: the generation of a compiler generator.
ND Jones et al. 1985
* A compiler generator produced by a self-applicable specializer can have a
surprisingly natural and understandable structure. SA Romanenko, 1988
* Practical second Futamura projection: partial evaluation for
high-performance language interpreters. F Latifi, 2019
* Etc

Regards,

Fernando

gah4

May 30, 2022, 2:47:29 PM
On Saturday, May 28, 2022 at 7:16:21 PM UTC-7, Roger L Costello wrote:

> There are lexer generators. Flex is a lexer generator.

> There are parser generators. Bison is a parser generator.

> Are there compiler generators?

My old favorite, and probably still favorite, compiler book is:

"Retargetable C Compiler, A: Design and Implementation"

(It is more likely to be your favorite if you are interested in a C compiler.)

LCC uses, more or less, a code generator generator. You supply the
instruction combinations to do operations that are needed, and it
uses dynamic programming to select the optimal code.

It is usual to generate assembly code for an assembler.

Mostly that leaves the middle end, especially optimization,
without a generator of its own.

Anton Ertl

May 30, 2022, 2:48:14 PM
Martin Ward <mar...@gkc.org.uk> writes:
>In practice, there is a lot more to writing a compiler than just
>partially evaluating an interpreter.

In the 1990s I heard several talks about compiler generation in that
way, and it was an old topic by then, with still no practical results,
so my impression was that it never was going to be practical.

But recently I heard great things about optimizing Truffle
interpreters which eventually results in a compiler for the
interpreted language. You probably still have to hold it right, but
it's certainly much better than what I expected.

I asked one of the people involved if there had been a breakthrough,
but he could not name one, and said that there were just a lot of
little problems to be solved.

Hans-Peter Diettrich

May 30, 2022, 2:49:45 PM
On 5/29/22 11:14 AM, Thomas Koenig wrote:
> Roughly speaking a)-c) is the front end, d) is the middle end and
> e)-f) is the back end.

According to Coco/R, a "compiler" can be anything that deals with formal
input. A generator of binary programs is only one kind of compiler;
another is a pretty printer.

I mention this because IMO there is much more demand for processors of
non-programming languages (HTML, XML...) than for programming language
compilers. I'd appreciate a word from our esteemed mod about the term
"compiler" in this group.

DoDi
[We chose the name comp.compilers in the 1980s and it's not going to change,
but I've always considered anything related to analysis or translation of computer
languages to be on topic. -John]

Kaz Kylheku

May 30, 2022, 4:27:55 PM
On 2022-05-28, Roger L Costello <cost...@mitre.org> wrote:
> Hi Folks,
>
> There are lexer generators. Flex is a lexer generator.
>
> There are parser generators. Bison is a parser generator.
>
> Are there compiler generators?

Writing a compiler isn't something that, as a whole, requires code
generation.

The reason is that the bulk of it can be handled by libraries.

Say you want to write a working compiler with minimal effort.

You have some input syntax and so for that there is scanner and parser
generation with which you write your front end.

What helps you avoid writing the middle, and the back-end from scratch?
More code generation of some kind?

Quite simply, some compiler construction components which have APIs
that you use. Ideally speaking, here is something like how it goes: in
your parser's actions, you use the APIs to generate the right
intermediate representation of the code, and then hand it off to some
function that handles the remaining processing. The resulting compiler
then fits into an existing toolchain framework as a first-class
citizen.
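
A minimal sketch of that in Yacc/Bison notation, where the ir_*
functions are hypothetical stand-ins for a real IR-construction API:

expr : expr '+' term   { $$ = ir_add($1, $3); }
     | term            { $$ = $1; }
     ;

The grammar stays declarative; the actions are thin calls into the
component that owns the intermediate representation.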

There isn't any part of that process which screams "this needs code
generation". Really, doing the pattern matching to handle your source
syntax and translate it into the intermediate representation is the main
part of the system that benefits from code generation, and
that's about it.

What could help you would be a specialized parser construction language
which has some constructs that can be used in the actions to ease the
translations to the intermediate form. That would then be closer to a
"compiler generator", perhaps: something that not only writes the
parser, where you write procedural code for every phrase structure rule,
but something which has a rewrite notation that lets you write the
source->intermediate translation scheme declaratively, using higher
level concepts from that particular intermediate rep rather than, say,
low-level C calls into its API.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Hans-Peter Diettrich

May 31, 2022, 11:20:08 AM
On 5/30/22 2:53 PM, Hans-Peter Diettrich wrote:

> analysis or translation of computer languages to be on topic. -John]

What are those "computer" languages? I'd prefer "formal" languages
(Chomsky...) instead. E.g., Meta§ was also used for DNA analysis.

DoDi
[I was going to say artificial languages but I don't think we have anything
useful to say about Esperanto. In practice it hasn't been very hard to keep
discussions more or less on topic. -John]

gah4

May 31, 2022, 10:21:22 PM
On Tuesday, May 31, 2022 at 8:20:08 AM UTC-7, Hans-Peter Diettrich wrote:
> On 5/30/22 2:53 PM, Hans-Peter Diettrich wrote:

> > analysis or translation of computer languages to be on topic. -John]
> What are those "computer" languages? I'd prefer "formal" languages
> (Chomsky...) instead. E.g. Meta§ also was used for DNA analysis.

> [I was going to say artificial languages but I don't think we have anything
> useful to say about Esperanto. In practice it hasn't been very hard to keep
> discussions more or less on topic. -John]

Definitely interpreters and macro processors have been discussed,
though some might not call them compilers. Also text processors
like TeX.

I am wondering, though, about (human) language translators.
It seems that many use non-deterministic AI systems, and so are
fundamentally different from most of what is discussed here.

If you parse Esperanto with a Flex/Bison parser then it should
be fine here. If you parse Fortran with a deep neural net, then
maybe not.
[That's essentially what I've been thinking. In the 1950s and 1960s
there was a lot of work trying to do human language translation using
formal methods and it worked very poorly, sort of adequate for
translating technical manuals, not for anything else. The breakthrough
was when someone at Google realized that the Canadian parliament's
Hansard, the transcript of debates, had high quality parallel
French/English translations going back a century and they fed it
into their machine learning systems. -John]

Christopher F Clark

Jun 1, 2022, 1:17:44 PM
Although there are many wonderful examples in the replies already, I
want to mention the Eli "system":
http://eli-project.sourceforge.net/

It was (is) an attempt to do exactly that. It did so by cobbling
together different tools that handled different parts of a compiler.
One in particular is worth mentioning (Oil), which handled static type
systems: which types were compatible and when coercions were
applicable. I never found the overall system to be that usable,
because it didn't really unify the tools to work well together, so the
notations tended to stay divergent.

In fact, that is partially why, in our (Compiler Resources) version of
Yacc++, we integrated the lexer and used Yacc-like notation for it. We
actually did it because using one notation simply makes sense, but
seeing a tool which used multiple non-integrated notations simply
reinforced that opinion.

-----

But, the main point is that compilers are a big area with lots of
little parts, some of them have tools and others don't. More
importantly, much of the development in compilers since 1991 has been
in hand-written parts. I don't think there is a single production
quality compiler for C++ using anything but a hand-written parser.
That to me is a shame. What we have instead of tools are more
"frameworks", e.g. gcc and the gnu-compiler-collection, CLANG, even the
MIPS suite (Chow's work). LLVM (and possibly MLIR) are bright spots
suggesting that the trend might be reversing. It's a pendulum,
swinging one way, then the other.

------

And finally, to Dr Diettrich's and our moderator's points: I think
that compilers, hand-written or tool-generated, are all appropriate,
including things like TeX and HTML etc. That would also include IDEs.
However, AI systems that aren't self-explanatory, and for which we
don't know exactly what they recognize, are probably not. That's a
different technology. Fortunately, it sounds like there are AI
researchers attempting to make AI systems that do "explain" what their
reasoning is. I might include them, especially if they help us advance
our own understanding. The following article talks about it:

https://www.quantamagazine.org/machine-scientists-distill-the-laws-of-physics-from-raw-data-20220510/

------

Sorry that I glommed three replies into one. It's just that I am
still in a rush to get some software finished for my "day" job
(despite having once retired, but then un-retired to do interesting
stuff and some not so interesting), and taking time to write these
replies definitely impacts that.

Best regards,
Chris

--
******************************************************************************
Chris Clark email: christoph...@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Roger L Costello

Jun 1, 2022, 1:24:49 PM
Page 161 of the book "Introducing Compiling Techniques" by J.P. Bennett says:
---------------------
Code generator generators

Just as there are parser generators, so there are code generator
generators. In general they all use the same idea, to match patterns
of instructions generated by the front end to patterns of instructions
available to the target machine.

IBURG

Iburg generates a fast tree parser. It takes a grammar describing the
patterns to be matched, with actions that generate the target code.
The grammar is augmented by the 'cost' of the code generated, and is
in general ambiguous. The Iburg generated parser finds the lowest cost
parse of a given sentence. Typically the cost will be the execution
time of the code, but for a compiler concerned with compact code
generation, it could be code size.

The idea of using a grammar to describe target code, and finding the
lowest cost parse, in order to generate the best code predates Iburg.
An early system due to Susan Graham used YACC productions to achieve
the same end.

---------------------
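
For flavour, a burg-family grammar fragment looks roughly like this
(a schematic sketch; the exact syntax varies between burg, iburg and
friends). Each rule pairs a tree pattern with a rule number and, in
parentheses, a cost:

stmt: ASGNI(addr, reg) = 1 (1);
reg:  ADDI(reg, con)   = 2 (1);
reg:  con              = 3 (1);
addr: reg              = 4 (0);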

Wow!

So a compiler can be generated declaratively by using a set of
declarative generator tools, e.g., Flex for lexical analysis, Bison
for syntax/semantic analysis, and Iburg for code generation.

Has anyone used this combination of tools to create a whole compiler?

/Roger
[I expect that someone has used lex, yacc, and iburg in the same
compiler sometime in the past 30 years. But that doesn't mean that
they combine into a compiler generator any more than a saw, a hammer,
and a paintbrush combine into a house generator. They're tools, each
does what it does. -John]

Anton Ertl

Jun 2, 2022, 6:02:01 PM
Roger L Costello <cost...@mitre.org> writes:
>So a compiler can be generated declaratively by using a set of
>declarative generator tools, e.g., Flex for lexical analysis, Bison
>for syntax/semantic analysis, and Iburg for code generation.
>
>Has anyone used this combination of tools to create a whole compiler?

The students taking my compiler course build such compilers: one
compiler per student, implementing a different small programming
language every year. They also use Ox (an attribute grammar evaluator
generator) for getting more structure into the in-between part.
Instead of iburg, they can also use burg, which has the same
interface.

gah4

Jun 2, 2022, 6:02:45 PM
On Wednesday, June 1, 2022 at 10:24:49 AM UTC-7, Roger L Costello wrote:

(snip)

> So a compiler can be generated declaratively by using a set of
> declarative generator tools, e.g., Flex for lexical analysis, Bison
> for syntax/semantic analysis, and Iburg for code generation.

(snip, our moderator mentioned)
> [I expect that someone has used lex, yacc, and iburg in the same
> compiler sometime in the past 30 years. But that doesn't mean that
> they combine into a compiler generator any more than a saw, a hammer,
> and a paintbrush combine into a house generator. They're tools, each
> does what it does. -John]

There are now 3D printed houses, such as using computer controlled
concrete pouring devices. One could accept a computerized design
for a house, and then generate it in concrete. (I think you still need
to do the interior finishing the old way.)

One could write a system around flex, bison, and iburg that would
accept as input a description of the language and of the output
machine instructions, and that would call flex, bison, and iburg as
needed. The description could be at a slightly higher level,
or be flex/bison input with a minimal wrapper.
(Maybe not so much optimization, as that tends to be more
specialized. It could be useful as a first compiler for
a new machine.)

Many imperative languages are similar enough in structure
that it might almost work. It might be especially interesting
for those wanting to make small changes to existing
languages.

Well, I would choose a higher-level language than the usual C for the
output of the lexer and parser: a language with just the operations
needed for a target compiler.

One could, then, use the code generator not only for the output
from the generated compiler, but for the compiler itself.
(Assume it is for a new machine, with no compilers yet.)

minf...@arcor.de

Jun 7, 2022, 10:53:40 AM
Roger L Costello wrote on Wednesday, June 1, 2022 at 19:24:49 UTC+2:
> So a compiler can be generated declaratively by using a set of
> declarative generator tools, e.g., Flex for lexical analysis, Bison
> for syntax/semantic analysis, and Iburg for code generation.
>
> Has anyone used this combination of tools to create a whole compiler?

This is a hot AI research field. 'Deep compilers' touch topics like
least-cost parsing and the optimization of other compilation steps.
But AFAIU there is no such holistic thing as automatic, complete
compiler construction a la

    Compiler = f(grammar, software-infrastructure, target-hardware)

Some state of the art overview:
https://github.com/zwang4/awesome-machine-learning-in-compilers

OTOH, code generators based on graphical input are already old hat.

Thomas Koenig

Jun 7, 2022, 2:41:56 PM
minf...@arcor.de <minf...@arcor.de> wrote:

> But AFAIU there is no such holistic thing like automatic complete
> compiler construction a la
> Compiler = f (grammar, software-infrastructure, target-hardware)

This function misses "semantics" as an argument, probably the
biggest hurdle.

mac

Jun 9, 2022, 12:50:01 PM
> Are there compiler generators?

Our esteemed moderator no doubt remembers PQCC
(https://en.wikipedia.org/wiki/PQCC), the Production-Quality
Compiler-Compiler project, back in the 80s.

Our idea of a production-quality compiler has evolved since then, both
in language semantics and in code quality.