
Compiler implementation language preference ?


Michael Justice

May 22, 2018, 1:39:07 PM
Is there any preference for writing a compiler in, say, C instead of,
say, Java, Fortran, Basic, etc.? I ask because I see many of the
projects using either C or C++ instead of other programming languages.

Sincerely,

nullCompiler
[Mostly people use what they're used to, or in languages that are easy
to bootstrap on the machines they want to use. IBM's Fortran H
compiler was famously written in itself, but I wouldn't write a new
compiler in Fortran because it doesn't have great data structuring or
dynamic storage management. (Yes, I know that Fortran 2008 is a lot
different from Fortran 66.) -John]

Bruce Mardle

May 23, 2018, 11:04:20 AM
On Tuesday, 22 May 2018 18:39:07 UTC+1, Michael Justice wrote:
> Is there any preference for writing a compiler in, say, C instead of,
> say, Java, Fortran, Basic, etc.? I ask because I see many of the
> projects using either C or C++ instead of other programming languages.

> [Mostly people use what they're used to, or in languages that are easy
> to bootstrap on the machines they want to use. IBM's Fortran H
> compiler was famously written in itself, but I wouldn't write a new
> compiler in Fortran because it doesn't have great data structuring or
> dynamic storage management. (Yes, I know that Fortran 2008 is a lot
> different from Fortran 66.) -John]

Per John's remark, the last translator I wrote (a Z280
cross-assembler) was in C (and bison), principally because that's what
I usually write in! In the early '80s I wrote two translator-like
programs in ZX Spectrum Basic (the Speccy was the only computer I had
access to), about 1,000 lines each. Later, I translated one into
Mallard Basic and the other into Turbo Pascal, both on an Amstrad PCW.
The translation from Spectrum Basic to Mallard Basic was a lot harder
than the translation to Pascal, which may explain my dim view of Basic!

I bet some of my old (later) Spectrum C programs would still compile... though
I'd probably have to turn off lots of warnings!
I've learnt a few new programming languages in the past 17 years but, in my
dotage, I've mostly forgotten them again :-/

William Clodius

May 27, 2018, 10:20:16 PM
Michael Justice <nullco...@gmail.com> wrote:

> <snip>
> [Mostly people use what they're used to, or in languages that are easy
> to bootstrap on the machines they want to use. IBM's Fortran H
> compiler was famously written in itself, but I wouldn't write a new
> compiler in Fortran because it doesn't have great data structuring or
> dynamic storage management. (Yes, I know that Fortran 2008 is a lot
> different from Fortran 66.) -John]

With pointers and allocatable arrays, I don't see Fortran lacking in
dynamic storage management. With derived types and inheritance, I
don't see it lacking in data structuring compared to, say, ISO C.

Allocatable-length character strings have largely eliminated the
weaknesses of its character type, though in practice it is probably
best to deal with Unicode encodings by mapping them to a standard
encoding using eight-bit integers. The ISO_FORTRAN_ENV module allows
it to deal portably with integers of multiple sizes, so that such
things as UTF-8 files can be handled. Fortran-compatible
pre-processors exist, though they are not as well known as the C
pre-processor. What it does lack, compared to ISO C, are type casts
and unsigned integers. What it lacks compared to C++ is templates.
More importantly, it has a user community focused on numerics that is
relatively unfamiliar with the kinds of algorithms used in compilers
and relatively uninterested in such applications.
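
(Not Fortran-specific, but as a rough illustration of that
byte-oriented approach -- treating UTF-8 text as a stream of eight-bit
integers and decoding code points from it -- here is a minimal C++
sketch; the function name and the error handling are invented for
illustration.)

// Treat UTF-8 as a stream of 8-bit integers and decode each multi-byte
// sequence into a 32-bit code point.  (Illustrative only; real code
// should also reject overlong forms, surrogates, and bad continuation
// bytes.)
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = bytes[i];
        uint32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else { ++i; continue; }      // skip an invalid lead byte
        if (i + len > bytes.size()) break;
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}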

Walter Banks

Jun 8, 2018, 12:17:46 PM
Most of our compilers (including C compilers) are written in Pascal.

There are two reasons. The strong type checking in the Pascal compiler I
use is an important part of development productivity.

The second reason is that Pascal has features that make it well matched
to implementing a compiler. Pascal's built-in support for strings, sets,
and booleans tends to be very useful and natural to use. We regularly
use expert systems as part of our code creation process, and our
experience has been that they are easier to implement in Pascal than in
C.
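
(As a rough C++ analogue of those Pascal set types, the way one might
use them in a hand-written scanner -- a sketch only; Pascal expresses
the membership test directly as something like
ch in ['A'..'Z', 'a'..'z', '_'].)

// C++ analogue of Pascal-style sets of char: membership tests against
// precomputed character classes.
#include <bitset>
#include <string>

static std::bitset<256> make_set(const std::string& chars) {
    std::bitset<256> s;
    for (unsigned char c : chars) s.set(c);
    return s;
}

int main() {
    const std::bitset<256> ident_start =
        make_set("abcdefghijklmnopqrstuvwxyz"
                 "ABCDEFGHIJKLMNOPQRSTUVWXYZ_");
    const std::bitset<256> digits = make_set("0123456789");
    const std::bitset<256> ident_rest = ident_start | digits;

    unsigned char c = 'x';
    bool starts_identifier = ident_start.test(c);  // Pascal: c in IdentStart
    bool continues_identifier = ident_rest.test(c);
    return (starts_identifier && continues_identifier) ? 0 : 1;
}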

The final point, which is an area of personal preference, is the
scoping support for local functions.

w..

rockbr...@gmail.com

Nov 9, 2018, 9:50:09 PM
On Tuesday, May 22, 2018 at 12:39:07 PM UTC-5, Michael Justice wrote:
> Is there any preference for writing a compiler in, say, C instead of,
> say, Java, Fortran, Basic, etc.? I ask because I see many of the
> projects using either C or C++ instead of other programming languages.
>
> nullCompiler
> [Mostly people use what they're used to, or in languages that are easy
> to bootstrap on the machines they want to use.

A test of whether the language, itself, is worth using -- assuming it is a
general purpose language -- is whether you'd be willing to write the compiler,
itself, in it! I put up a branched (and heavily recoded) version of cparse on
my machine, which is in C and has 3 layers of self-bootstrapping. GCC has
several layers of self-bootstrapping, depending on what you implement from it
(and distressingly, it has -- as of version 6 -- acquired *dependencies* on
libraries further upstream! That's a major no-no!)

GNU bc has a (largely eliminable) layer of bootstrapping to compile its
predefined libraries into itself.

Knuth's TeX engine is built on top of the (context-sensitive) parser in
Web and/or cweb. The "tangle" and "weave" programs are the core that has
to be bootstrapped. Tangle is Web->Pascal (ctangle is cweb->C); weave is
Web->TeX and cweave is cweb->TeX (and all this is a setup for TeX.web,
which has to be compiled via Web).

Go is also self-built.

A notable gap is that Yacc is not self-compiled; thereby falling short of the
"is it worth using" test!

Code synthesis tools (indent, yacc to some degree, web) are difficult to
do with traditional parsers, since synthesis -- which is an application
of the field of "pragmatics", not "syntax"(!) -- means you have phrase
structure rules, but no start symbol! Instead, you process maximal
parsable chunks; and that is generally what requires a context-sensitive
parser. That's because the source language has macros (in the case of
Web, at least, that's the reason). Translators all fall into this class
too, particularly if the language has macros. Those have to be handled
correctly, ideally without breaking open the black box in the translator
output.
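
(A minimal sketch of that "maximal parsable chunks" idea, in C++ with
all names invented for illustration; the real tools are of course far
more involved.)

// No start symbol: repeatedly consume the longest prefix that any
// phrase rule recognizes, and pass unmatched tokens through untouched.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
// A phrase recognizer reports how many tokens it can consume starting
// at position pos (0 means "no match here").
using Phrase = std::function<std::size_t(const Tokens&, std::size_t)>;

void process_chunks(const Tokens& in,
                    const std::vector<Phrase>& phrases,
                    const std::function<void(std::size_t, std::size_t)>& emit_phrase,
                    const std::function<void(std::size_t)>& emit_token) {
    for (std::size_t pos = 0; pos < in.size();) {
        std::size_t best = 0;
        for (const Phrase& p : phrases)
            best = std::max(best, p(in, pos));  // longest match wins
        if (best > 0) {
            emit_phrase(pos, best);   // translate the recognized chunk
            pos += best;
        } else {
            emit_token(pos);          // copy the token through untouched
            ++pos;
        }
    }
}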

The self-compile trick could be extended to theorem provers, since proof
algebras themselves are ... algebraic formalisms. I put up a small part of
Lambek-Scott's higher-order categorical logic/type-theory formalism on top of
Prover9-Mace4 (with difficulty), for instance. A bigger challenge might be to
try to bootstrap-compile Martin-Löf's type theory on top of Automath, since
it is a (self-admitted) descendant of Automath.

Fortran prakrits (to coin a phrase) could be bootstrapped on top of the old
Sanskrit Fortran (to coin another phrase) by a good compiler writer like ...

> ...but I wouldn't write a new
> compiler in Fortran because it doesn't have great data structuring or
> dynamic storage management. (Yes, I know that Fortran 2008 is a lot
> different from Fortran 66.) -John]

... John. (It's still an idea: ad-hoc extend the language you write it
in, and use scripts to reduce it to Sanskrit in mid-process.)

Kaz Kylheku

Nov 10, 2018, 11:09:20 PM
On 2018-11-09, rockbr...@gmail.com <rockbr...@gmail.com> wrote:
> A notable gap is that Yacc is not self-compiled; thereby falling short of the
> "is it worth using" test!

GNU Bison's grammar is written in Yacc. Amusingly, it goes to town
with Bison extensions, so you need Bison to rebuild it:

http://git.savannah.gnu.org/cgit/bison.git/tree/src/parse-gram.y

So that the user doesn't have to, they keep the generated C in the repo.

And that's about how far anyone can reasonably go in using a Yacc to build
Yacc.

--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1

Kaz Kylheku

Nov 10, 2018, 11:10:50 PM
On 2018-11-09, rockbr...@gmail.com <rockbr...@gmail.com> wrote:
> On Tuesday, May 22, 2018 at 12:39:07 PM UTC-5, Michael Justice wrote:
>> Is there any preference for writing a compiler in, say, C instead of,
>> say, Java, Fortran, Basic, etc.? I ask because I see many of the
>> projects using either C or C++ instead of other programming languages.
>>
>> nullCompiler
>> [Mostly people use what they're used to, or in languages that are easy
>> to bootstrap on the machines they want to use.
>
> A test of whether the language, itself, is worth using -- assuming it is a
> general purpose language -- is whether you'd be willing to write the compiler,
> itself, in it! I put up a branched (and heavily recoded) version of cparse on my
> machine, which is in C and has 3 layers of self-bootstrapping. GCC has several
> layers of self-bootstrapping, depending on what you implement from it (and
> distressingly, it has -- as of version 6 -- acquired *dependencies* on
> libraries further upstream! That's a major no-no!)

A nice bootstrapping method is to build an interpreter for the language
also in some widely available language (like C). The compiler can be
executed by the interpreter to compile itself, plus any other run-time
support code also written in that language.

If the compiler produces that widely-used systems programming language,
then it can just be redistributed in compiled form and an interpreter
need not be included.

That's a tough way to evolve the language, though. An interpreter gives
you a version of the language that is immune to bootstrapping
chicken-egg problems and provides a reference model for what compiled
code should be doing.

You can always revert to the interpreter when things go horribly wrong.

When you make modifications to the compiler and they are so wrong that
they break the compiler, you don't have to revert them. Just blow away
all the compiled materials, fix your work in the compiler, and
bootstrap from scratch through the stable interpreter. You never need
a last-known-good copy of the compiler in your workspace.
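
(A concrete, and entirely hypothetical, sketch of that discipline --
the tool names and command syntax below are invented: stage 1 is the
compiler source compiling itself under the interpreter, later stages
are compiled compilers compiling the same source, and agreement between
successive stages is the usual fixed-point check.)

// Hypothetical bootstrap driver.  "interp" is the interpreter written
// in C; "compiler.src" is the compiler written in its own language.
//   stage1: compiler source, run by the interpreter, compiles itself
//   stage2: stage1 compiles the same source
//   stage3: stage2 compiles the same source
// stage2 == stage3 is the fixed-point sanity check; if a compiler
// change goes badly wrong, delete the stages and rebootstrap from the
// (stable) interpreter.
#include <cstdlib>

int main() {
    if (std::system("./interp compiler.src -- compiler.src -o stage1")) return 1;
    if (std::system("./stage1 compiler.src -o stage2")) return 1;
    if (std::system("./stage2 compiler.src -o stage3")) return 1;
    return std::system("cmp stage2 stage3") ? 1 : 0;  // fixed point reached?
}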

Richard

Nov 11, 2018, 4:40:26 AM
On 09.11.18 23:29, rockbr...@gmail.com wrote:

> A test of whether the language, itself, is worth using -- assuming it is a
> general purpose language -- is whether you'd be willing to write the compiler,
> itself, in it!

This does not prove anything about applicability of the language for
anything other than writing a similar compiler.

Richard
[Good point. Compilers use a variety of data structures and recursive
algorithms, so if you can write a compiler in a language, it's likely an
adequate systems language. On the other hand, IBM Fortran H was written
in itself, which only made sense because the alternative was assembler.
-John]

Walter Banks

Nov 11, 2018, 4:42:15 AM
On 2018-11-09 5:29 p.m., rockbr...@gmail.com wrote:

>
> A test of whether the language, itself, is worth using -- assuming it
> is a general purpose language -- is whether you'd be willing to write
> the compiler, itself, in it! I put up a branched (and heavily recoded)
> version of cparse on my machine, which is in C and has 3 layers of
> self-bootstrapping. GCC has several layers of self-bootstrapping,
> depending on what you implement from it (and distressingly, it has --
> as of version 6 -- acquired *dependencies* on libraries further
> upstream! That's a major no-no!)
>
> GnuBC has a (largely eliminable) layer of bootstrapping to compile
> its predefined libraries into itself.
>
> Knuth's TeX engine is built on top of the (context-sensitive) parser
> in Web and/or cweb. The "tangle" and "weave" programs are the core
> that has to be bootstrapped. Tangle is Web->Pascal (ctangle cweb->C);
> weave is Web->TeX, cweave is cweb->TeX; (and all this is a setup for
> TeX.web, which has to be compiled via Web).
>
> Go is also self-built.

I would argue against that suitability test with the following simple
logic: my choice of implementation language is primarily the language
most suitable for implementing the compiler.

Part of what you are suggesting is the compiler bootstrap process, which
is a very different process. Even for that I would argue against using
only the same language; it is one of several possible choices, including
cross-compiling on another platform.

w..

Nick

Nov 17, 2018, 11:46:39 AM
One man's opinion: Have a look at D. https://dlang.org/ My approach
is to get the right answer first, then factor out the GC, then maybe
refactor more to port to C++. Simple Java can be cut-and-paste ported
to D.

Aaron Gray

Dec 19, 2018, 3:12:34 PM
On Tuesday, 22 May 2018 18:39:07 UTC+1, Michael Justice wrote:
> Is there any preference for writing a compiler in, say, C instead of,
> say, Java, Fortran, Basic, etc.? ...

> [Mostly people use what they're used to, or in languages that are easy
> to bootstrap on the machines they want to use. IBM's Fortran H
> compiler was famously written in itself, but I wouldn't write a new
> compiler in Fortran because it doesn't have great data structuring or
> dynamic storage management. (Yes, I know that Fortran 2008 is a lot
> different from Fortran 66.) -John]

Pity there are no real compiler-compilers anymore, hint-hint, I am working on one to rule them all ;)

Aaron Gray
---
Independent Open Source Software Engineer, Computer Language Researcher, Information Theorist, and amateur computer scientist.
[Please don't say you've invented another UNCOL. -John]

steve kargl

Dec 19, 2018, 8:15:52 PM
Aaron Gray wrote:
> On Tuesday, 22 May 2018 18:39:07 UTC+1, Michael Justice wrote:
>> Is there any preference for writing a compiler in, say, C instead of,
>> say, Java, Fortran, Basic, etc.? ...
>
>> [Mostly people use what they're used to, or in languages that are easy
>> to bootstrap on the machines they want to use. IBM's Fortran H
>> compiler was famously written in itself, but I wouldn't write a new
>> compiler in Fortran because it doesn't have great data structuring or
>> dynamic storage management. (Yes, I know that Fortran 2008 is a lot
>> different from Fortran 66.) -John]

The latest Fortran standard is informally referred to as F2018.
It became the official standard a week or so ago.
https://wg5-fortran.org/f2018.html
[You're right, but I still wouldn't want to write a compiler in it. That's
not what it's for. -John]

Martin Ward

Dec 21, 2018, 11:54:22 AM
On 19/12/18 19:54, Aaron Gray wrote:
> Pity there are no real compiler-compilers anymore, hint-hint, I am
> working on one to rule them all ;)
>
> [Please don't say you've invented another UNCOL. -John]

I may be wrong, but I read Aaron's message to mean that he is working
on a domain-specific language for writing compilers, in the style of
Language Oriented Programming, rather than a universal intermediate
program representation language, which is what UNCOL attempted to be.

The original paper on Language Oriented Programming:

http://www.gkc.org.uk/martin/papers/middle-out-t.pdf

--
Martin

Dr Martin Ward | Email: mar...@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

Aaron Gray

Dec 21, 2018, 12:01:52 PM
On Wednesday, 19 December 2018 20:12:34 UTC, Aaron Gray wrote:
> On Tuesday, 22 May 2018 18:39:07 UTC+1, Michael Justice wrote:
> > [Mostly people use what they're used to, or in languages that are easy
> > to bootstrap on the machines they want to use. ...
>
> Pity there are no real compiler-compilers anymore, hint-hint, I am
> working on one to rule them all ;)
>
> Aaron Gray
> ---
> [Please don't say you've invented another UNCOL. -John]

John,

No, I am not the man from UNCOL!

I am back working on my source-to-source compiler-compiler, in the vein
of YACC but a real compiler-compiler, not just a parser generator.

I hope to have all the main parser algorithms implemented, plus some
little-known ones and some new ones. I have my Lexical Analyser
Generator LG implemented and am working on the Parser Generator PG and
an AST generator AG; there are a few more tools and components to this.
I am using algorithms that are much simpler, clearer, and cleaner than
the existing Flex, Bison, and Byacc. I have literally implemented the
algorithms from the Dragon Book and even simplified them a bit, plus an
algorithm for equivalence classes my friend invented, and am now working
on the more complex "meta machine" algorithms. Hopefully I will be able
to parse all major languages.

I am working in C++, using nothing more complex than templates. It is
library-based, with tools that use the library.

For example, I am using the Dragon Book's regular expression
direct-to-DFA technique; here's an example of the code:

signed int DFA::GenerateRG2DFA(LexicalContext* context) {
    States states;
    // Start state corresponds to firstpos() of the regular expression.
    State startState = states.newState(context->firstpos());

    this->accept[startState] = -1;
    std::deque<State> UnfinishedStates;

    UnfinishedStates.push_back(startState);

    // Dragon Book "RE directly to DFA" construction: each DFA state is
    // a set of positions, and successors are computed via followpos().
    while (!UnfinishedStates.empty()) {
        signed int accept = -1;
        State state = UnfinishedStates.front();
        UnfinishedStates.pop_front();
        State nextState;

        for (unsigned int input = 0; input < getNumberOfInputs(); ++input) {

            bitset followpos(context->getNumberOfPositions());

            // Union the followpos() sets of all positions in this state
            // that can move on the current input symbol.
            for (bitset::iterator position = state.positions.begin(),
                     end = state.positions.end();
                 position != end; ++position) {
                if (position.isElement()) {
                    if (context->move(position, input))
                        followpos |= context->followpos(position);

                    signed int action = context->getAction(position);
                    if (action != -1 &&
                        (accept == -1 || (accept != -1 && action < accept)))
                        accept = action;
                }
            }

            if (!followpos.isEmpty()) {
                // Reuse an existing state with the same position set,
                // otherwise create it and queue it for processing.
                if (!(nextState = states.findState(followpos)))
                    UnfinishedStates.push_back(nextState = states.newState(followpos));
            }
            else
                nextState = State::NullState;

            (*table)[state.index - 1][input] =
                (isTerminalState(context, followpos) ? -1 : 1) * nextState;
        } // end for inputs
        this->accept[state.index] = accept;
    } // end while (!UnfinishedStates.empty())

    return startState;
}

Happy Christmas,

Aaron
[Oh, that's entirely reasonable. A lot of the cruft in lex and yacc
and their descendants dates from the era when everything had to fit
into 64K on a PDP-11. I've never seen any reason to use LALR rather
than LR(1) if you have room for the tables. -John]

Kaz Kylheku

Dec 21, 2018, 8:44:24 PM
On 2018-12-20, Martin Ward <mar...@gkc.org.uk> wrote:
> On 19/12/18 19:54, Aaron Gray wrote:
>> Pity there are no real compiler-compilers anymore, hint-hint, I am
>> working on one to rule them all ;)
>>
>> [Please don't say you've invented another UNCOL. -John]
>
> I may be wrong but read Aaron's message to mean that he is
> working on a domain specific language for writing compilers,
> in the style of Language Oriented Programming,
> rather than a universal intermediate program representation language,
> which is what UNCOL attempted to be.
>
> The original paper on Language Oriented Programming:
>
> http://www.gkc.org.uk/martin/papers/middle-out-t.pdf

From that:

>> "In the case of the FermaT tool, the lowest level translator and
>> support library consists of 2–3,000 lines of LISP code. This
>> translates from low-level META WSL to LISP, all the rest of the system
>> is written in META WSL. To port the system to a new version of LISP, or
>> even to a new base language such as C, only requires rewriting the
>> lowest level translator: and this is a comparatively small task–in fact,
>> the first version of the translator was written in less than three man
>> days. The FermaT system is currently being ported from a Unix
>> environment to a PC environment, using C rather than LISP as the
>> implementation language."

That language hopping aspect reminds me of MAL (Make A Lisp): a project
which bootstraps the same Lisp dialect (its own) in numerous different
languages (currently 74):

https://github.com/kanaka/mal