Good practical language and OS agnostic text?

comp...@is-not-my.name

unread,

Apr 17, 2012, 5:28:46 PM4/17/12

to

Guys, I'm having a bear of a time finding a good practical language
and OS agnostic text on writing a compiler. I'm weak in math and not
interested in the theoretical details. I want to understand the hows
and whys of compiler writing. Everything I've found is either
gobbledygook equations or "let's use C/C++/Java on UNIX" or things
that are so trivial and focused they don't explain general cases and
can't be extended to anything useful.

I think of all the compilers were written in the DOS days and there
were normal guys writing them, not Nobel math prizewinners. Shirley
there must be at least one book that explains how compilers work
without depending on intimate knowledge of a specific target or
implementation language? Whatever happened to pseudocode? It doesn't
have to be modern, just something that covers all the practical
aspects of compiler writing. Not exhaustive coverage of every possible
angle mind you, but at least so I can understand the basics and get
started without having to go back to uni and get an advanced maths
degree. I'm a good practical programmer and have experience with
writing my own libraries for all sorts of data structures and
functions and am not a cut and paste sort of fellow at all. I hope to
find a text or two for a guy like me who isn't a professor! Thanks for
your suggestions.

[Sorry to burst your bubble, but I knew people writing compilers for
DOS, and they understood parsing theory just fine. Although I agree
that some compiler texts are more readable than others, the math isn't
there to be obscure, it's there because understanding how state
machines and LL and LR work makes writing fast and reliable scanners
and parsers vastly easier. As far as the language they use for
examples, you have to use something. If you can find a copy of
Holub's "Compiler Design in C", and the errata list which is essential
due to the incredible number of errors in the published edition, you
might be able to work your way through that. -John]

Philip Herron

unread,

Apr 18, 2012, 9:25:37 AM4/18/12

to

On 17 April 2012 22:28, <comp...@is-not-my.name> wrote:
> .... I'm a good practical programmer and have experience with

> writing my own libraries for all sorts of data structures and
> functions and am not a cut and paste sort of fellow at all. I hope to
> find a text or two for a guy like me who isn't a professor! Thanks for
> your suggestions.

I understand your squabbles with wanting a very straight forward text
to work from. I had similar annoyances although I have a degree in
Math and Computer science maybe that made it a little easier but I
personally think understanding the ideas behind grammar and state
machines etc all that theory is actually very straight forward it just
can look obscure but it is kind of essential to know otherwise you can
end up making a lot of mistakes without really understanding why. But
that's not to say just taking your time to understand from first
principles you wont get there it just would take more time.

I personally found the Lex and Yacc o'reilly book extremely insightful
because if you work though the bison manual the examples they give can
really make you see how things work to a very basic level and its very
easy to see how it can be extended very quickly. Another is the dragon
book i still think although there is a lot of obscurity in it, but Its
a classic book for a reason its actually really really good. There are
a few other online pdf manuals i don't have the links for but a very
quick search though this mailing list will turn them up. Some of them
a very much to the point. The biggest point of all is compiler and os
work isn't very rewarding for a very long time when you start working
at it. As in you can go for ages and ages and not feel like your
getting anywhere then all of a sudden it can click. Well that's how i
feel sometimes.

I hope this gives you some hope because its not the easiest subject to
tackle. There is no substitution to just working at something just
starting your own small basic compiler project to understand
expressions will give you very much of what you want to know for the
basics in my opinion and just work at that in your own time.

--Phil
[Thanks for the suggestion of lex & yacc. I revised it a couple
of years ago, the new edition is called flex & bison. -John]

BGB

unread,

Apr 18, 2012, 11:39:54 AM4/18/12

to

On 4/17/2012 2:28 PM, comp...@is-not-my.name wrote:
> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing. Everything I've found is either
> gobbledygook equations or "let's use C/C++/Java on UNIX" or things
> that are so trivial and focused they don't explain general cases and
> can't be extended to anything useful.

<snip>

this is a hard thing to ask.

There is a lot of possible variation between languages, all of which may
have notable impact on the "best" compiler design.

I have not as of yet seen much of anything which tries to address all
this, and most focus more narrowly on "a C or Java like language
compiling directly to native code and using a C-style calling
convention". if designing a dynamic language, or having a "VM layer", or
a language with notably different requirements, then it may make sense
to vary the compiler design as well.

OS and CPU architecture can impact a fair amount as well, so it could be
similarly hard to address all of this without making at least some
assumptions on these fronts.

for example, do they use x86 or ARM as the example? ...

Otherwise, I guess, the book would probably be more of a "survey of the
field and lists of assorted design trade-offs" (or like a topic-focused
encyclopedia or similar), which although likely still useful, is
probably not likely to seem as "refined" or sell nearly as well.

Since many people seem to more like being told "this is how it is and
this is how you do it" than being handed something which more resembles
an encyclopedia.

like with "authority": many people play "follow the leader", others play
"oppose the leader", but both are likely pointless.

a person actively "doing their own thing" will not likely play that
game, and may not really care what the leader thinks, but will in turn
seem to at times be "following the leader" and "opposing the leader",
not because they actually are, but because not everything "the leader"
does is necessarily either right or wrong.

> [Sorry to burst your bubble, but I knew people writing compilers for
> DOS, and they understood parsing theory just fine. Although I agree
> that some compiler texts are more readable than others, the math isn't
> there to be obscure, it's there because understanding how state
> machines and LL and LR work makes writing fast and reliable scanners
> and parsers vastly easier. As far as the language they use for
> examples, you have to use something. If you can find a copy of
> Holub's "Compiler Design in C", and the errata list which is essential
> due to the incredible number of errors in the published edition, you
> might be able to work your way through that. -John]

I don't entirely agree (not being as much of a fan of math either).
the problem isn't so much about "understanding the topic", so much as
people often trying to throw mathematical notation at pretty much
everything.

often it could be explained easily enough using either natural-language,
some kind of pseudo-code, or some other specialized (presumably
non-esoteric) notation.

it is kind of defeats the point if a person needs a math degree to even
figure out what exactly this glob of notation is supposed to be (or even
what sort of math this is even supposed to be, as it becomes more the
"find a match the greek letters and other symbols" game).

it is much like people throwing set notation at everything.
yes, the concept of sets can be used.

but people throw it at nearly everything, often in cases where:
there is another notation and/or formalism which can express the topic
better (such as predicate logic, or even good old algebraic notation);
it has little to do with the topic (such as statistics or newtonian
mechanics, and more obscures the topic than it helps);
...

the end result is the same, the actual topic is buried under a pile of
mathematical gobbledygook.

Alain Ketterlin

unread,

Apr 18, 2012, 12:24:00 PM4/18/12

to

comp...@is-not-my.name writes:

> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows

> and whys of compiler writing. [...]

First, don't expect to understand much of compilation without at least
some background in discrete maths (some basic language theory, but
also graph theory if you go down later stages), and of course
algorithmics and programming.

Second, don't think compilation is all about language theory. For
instance, control-flow analysis is heavy on graph traversals, code
generation may use subtle algorithmics (e.g., dynamic programming), etc.
And optimization techniques may use whatever will provide a suitable
model (some loop optimizations make heavy use of linear algebra).

Of course, if you're interested in compilers you'll become interested
into all the theories/topics they use. And I think it's a very nice way
to learn a lot about computer science.

OK, now my suggestion: "Modern Compiler Implementation", by Andrew
Appel, which imho has the perfect balance between theory and
implementation. You are allowed to choose the implementation language:
the book exists in C, Java, and ML versions.

-- Alain.

Derek M. Jones

unread,

Apr 18, 2012, 1:16:31 PM4/18/12

to

On 17/04/2012 22:28, comp...@is-not-my.name wrote:
> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing.

I always recommend:
A Retargetable C Compiler: Design and Implementation
by David R. Hanson and Christopher W. Fraser

If you are weak on math you might have a problem getting your head
around recursion. If you cannot understand recursion your compiler
writing days are finished.

> I think of all the compilers were written in the DOS days and there
> were normal guys writing them, not Nobel math prizewinners. Shirley

In my experience compiler writers are not normal guys, but then I
am a vested interest.

> [Sorry to burst your bubble, but I knew people writing compilers for
> DOS, and they understood parsing theory just fine. Although I agree
> that some compiler texts are more readable than others, the math isn't
> there to be obscure, it's there because understanding how state
> machines and LL and LR work makes writing fast and reliable scanners

> and parsers vastly easier. As far as the language they use for ...]

I would question whether it is necessary to have any deep knowledge
of parsing theory. Some knowledge of state machines is useful for any
kind of software development.

I would add my voice to the previous posters who suggested our
moderator's book, flex & bison.

comp...@is-not-my.name

unread,

Apr 18, 2012, 3:30:09 PM4/18/12

to

> [Sorry to burst your bubble, but I knew people writing compilers for
> DOS, and they understood parsing theory just fine. Although I agree
> that some compiler texts are more readable than others, the math isn't
> there to be obscure, it's there because understanding how state
> machines and LL and LR work makes writing fast and reliable scanners
> and parsers vastly easier.

I know you have quite a lot of knowledge on the topic so I will take your
word for it. But are you really telling me you learned all the theory before
you ever started a compiler project? I need a way to get started and I can't
seem to understand the math I have seen so far and I have nobody to talk it
over with so I really need something more practical. I realize I won't be an
expert on compiler theory or parsing theory or any other kind of theory but
I cannot believe the principles can't be generalized enough with some basic
non-mathematical explanations to be useful in practice or that if someone
did that the results would have no practical value. Eventually theory has to
be implemented and the implementation doesn't contain all the theory, so
there has to be a way. As we all know, it's not uncommon to implement
algorithms that the programmer doesn't (fully) understand in code, he
depends on scientists, mathematicians etc. to generalize it enough so it can
be implemented usefully. That's what I'm looking for at this point. I'll
take somebody else's word on the theory, I would like to see the data
structures and understand the algorithms from a practical view and accept
the logic behind it is sound.

> As far as the language they use for examples, you have to use something.

Pseudocode would be more readable for me than what I am finding in modern
texts. If you're saying they all use code then I'll revise my request and
ask for a text that uses assembly language rather than C or an HLL.

> If you can find a copy of Holub's "Compiler Design in C", and the errata
> list which is essential due to the incredible number of errors in the
> published edition, you might be able to work your way through that. -John]

Since there are an incredible number of errors is this really a book a guy
like me should be using, even with the errata list? I have no experience
with C and I find it unreadable. I get the big picture with C but not the
important details. Thank you.
[I didn't learn all the theory before I ever started a compiler project, but
the compiler I wrote before I knew any theory was pretty putrid. Don't ask
for details, it was in about 1970 and I don't remember many of them. Holub's
book does require that you understand C, but once you've applied all the errata,
it's not bad and quite practical. -John]

glen herrmannsfeldt

unread,

Apr 18, 2012, 4:29:18 PM4/18/12

to

comp...@is-not-my.name wrote:

> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing. Everything I've found is either
> gobbledygook equations or "let's use C/C++/Java on UNIX" or things
> that are so trivial and focused they don't explain general cases and
> can't be extended to anything useful.

My favorite understandable compiler book is the LCC book:

http://www.amazon.com/Retargetable-Compiler-Design-Implementation/dp/0805316701

You might be right that all books are one (or more) of those three,
but then you should choose from among those.

I don't know of many trivial compiler books, but, yes there are a
number that are more theoretical than I like.

C is a nice, simple language in which to describe compiler design, and
also not so bad a language in which to write compilers. Given that,
it isn't a bad start toward writing one for another language, and/or
written in another language, unless you don't know C.

Both C and Java are fairly simple, and reasonably similar, with much
of the complication moved to the library. That means a complete
compiler can be described relatively simply, covering all the
important ideas. Once you learn those, you will be ready to go on to
other languages (both for compiling and writing the compiler in).

You don't say what language you are interested in. There are some
complications to writing Fortran compilers not covered in most books.
Both Fortran and PL/I don't have reserved words, which requires
special handling by the compiler. Fixed form Fortran ignores blanks,
which requires a special lexical analysis technique, but most of
compiler theory is in parsing, which is reasonably language
independent.

It seems to me natural that a pseudo-code description will be more
theoretical, and harder to understand, than one using a well-known
high-level language.

-- glen

glen herrmannsfeldt

unread,

Apr 18, 2012, 6:43:10 PM4/18/12

to

Derek M. Jones <de...@knosof.co.uk> wrote:
> On 17/04/2012 22:28, comp...@is-not-my.name wrote:
>> Guys, I'm having a bear of a time finding a good practical language
>> and OS agnostic text on writing a compiler. I'm weak in math and not
>> interested in the theoretical details. I want to understand the hows
>> and whys of compiler writing.

> I always recommend:
> A Retargetable C Compiler: Design and Implementation
> by David R. Hanson and Christopher W. Fraser

So, that makes two (out of about five) of us. (My post comes later.)

Much of the book is about code generation, which, it seems to me,
is not described in as much detail in many other compiler books.

Parsing theory is where much of the theory, and hard to understand
mathematical descriptions, appear, but in the end (back end, in the
case of compilers) it is about code generations.

As far as languages to write compilers in, it is now usual (though
maybe not 50 years ago) to describe parts of the compiler in a
special purpose language. As previously noted, there are flex and
bison to write the front end, though you usually need to know some
C to use them.

For compilers that generate code for more than one target,
(at least gcc and lcc), the back end is usually described through
a language easier for humans to understand. To me, the lcc code
generator is much easier to understand than that of gcc.
You should be able to write a description for a new target
without knowing C, or much of parsing theory. You do need a
good understanding of the instruction set for the target, though.

-- glen

Roberto Waltman

unread,

Apr 18, 2012, 10:00:16 PM4/18/12

to

Not a direct answer to your question - Stanford University is offering
an online compiler course starting April 23. You may want to take it,
(it's free.)

Details here: https://www.coursera.org/course/compilers

Now, regarding a compiler textbook with a good balance between theory
and implementation details, I always recommend Pyster's "Compiler
Design and Construction"
It is dated, (as proven by the choice of source and target languages:
"Rascal" [Rudimentary Pascal] and IBM 370 assembler) but still an
excellent guide for your first attempts at compiler writing.
I believe it is a much easier first read than both LCC and the Dragon
book.
Adapting it to languages you know will deepen your understanding of
how it works.
Another good choice (language wise) could be the Oberon compilers.

--
Roberto Waltman

Bakul Shah

unread,

Apr 19, 2012, 12:15:09 AM4/19/12

to

Check out Nils M Holm's "Practical Compiler Construction", available
at lulu.com. It is a 365 page "tour" through a *complete* compiler for
a subset of C language. The compiler can compile itself and you can
download the code from author's site (www.t3x.org). It doesn't use
lex or yacc (just a hand-rolled scanner and a recursive descent
parser). The compiler is about 4300 lines of code. It describes all
the key concepts but given the simple design doesn't go into a lot of
details (beyond describing the code). The book describes a i386 code
generator. The code generator interface seems well enough abstracted.
When challenged, Nils put together a x86-64 backend in a day!

I have just skimmed the book so far but compared to the Fraser and
Hanson C compiler book this is a much simpler book, doesn't go in as
much depth but seems to be an easy read. So a good beginner book but
you'd do well to consult other books later.

BGB

unread,

Apr 19, 2012, 3:05:34 AM4/19/12

to

On 4/18/2012 3:43 PM, glen herrmannsfeldt wrote:
> Derek M. Jones<de...@knosof.co.uk> wrote:
>> On 17/04/2012 22:28, comp...@is-not-my.name wrote:
>>> Guys, I'm having a bear of a time finding a good practical language
>>> and OS agnostic text on writing a compiler. I'm weak in math and not
>>> interested in the theoretical details. I want to understand the hows
>>> and whys of compiler writing.
>
>> I always recommend:
>> A Retargetable C Compiler: Design and Implementation
>> by David R. Hanson and Christopher W. Fraser
>
> So, that makes two (out of about five) of us. (My post comes later.)

sadly, in my case, I can't really currently get (or afford) books which
aren't freely available online.

did find a site though (which I guess is a personal site for the author?):
http://www.well.com/~cwf/pro/vita.htm

which has maybe a few interesting looking papers.

saw one about employing LZ77 in an ISA, which is a fairly nifty seeming
feature, but is sadly not really directly applicable in my case.

> Much of the book is about code generation, which, it seems to me,
> is not described in as much detail in many other compiler books.
>
> Parsing theory is where much of the theory, and hard to understand
> mathematical descriptions, appear, but in the end (back end, in the
> case of compilers) it is about code generations.

then there is the irony being that it is considerably more effort IME to
write the back-end than it is to write a parser, or at least this has
been my experience in these matters (logic for things like register
allocation, allocating space in the stack-frame, logic for dealing with
types and spitting out various ASM fragments, ... is all a bit painful).

granted, I tend not to use any fancy parsing algorithms, as pretty much
all of my parsers have been hand-written recursive descent parsers.

although, I guess it is always possible that maybe I have just been
approaching the back-end in ways more complex or painful than necessary,
but it is not clear how it could all be considerably easier.

granted, maybe it is also that modern CPU architectures (such as x86 and
x86-64) and type-towers (integer types, FPU, and SIMD) are more complex
than many older architectures.

if so, maybe the parser preoccupation is at least partly a hold-over
from the times where the parser was more complex relative to the
code-generator?

I guess the other major option is that it is because the parser is often
the part of the problem that most people run into first (and thus where
more people are more likely to give up and walk away?).

> As far as languages to write compilers in, it is now usual (though
> maybe not 50 years ago) to describe parts of the compiler in a
> special purpose language. As previously noted, there are flex and
> bison to write the front end, though you usually need to know some
> C to use them.

I have not personally used them.

to me, depending on an external tool to spit out code for something that
is maybe a few thousand lines seems a bit overkill IMHO.

granted, maybe it is that I haven't been writing "maximally efficient"
parsers, but more just sort of "straightforward but naive" parsers:
read in a few tokens;
match against possible constructions;
...

I had not usually actually introduced a separate lexer and parser stage
either, but instead typically just read tokens directly from a string
buffer during parsing (or a "reader stream object").

I have sometimes used a small hash-table to partly optimize re-reading
the same tokens (as my parsers tend to often end up re-reading the same
tokens multiple times, and hashing the string-pointer allows skipping
much of the logic). this was done in my C parser as it ends up dealing
with considerably more code during parsing (and thus tends to bog down
more readily with reading tokens).

a lot of this then ends up with a lot of if/else logic and "strcmp()"
and similar (although in a few places I have used "indexed tokens",
where an index number is used for keywords in place of string comparisons).

maybe it is all a bit lazy, but it works.

> For compilers that generate code for more than one target,
> (at least gcc and lcc), the back end is usually described through
> a language easier for humans to understand. To me, the lcc code
> generator is much easier to understand than that of gcc.
> You should be able to write a description for a new target
> without knowing C, or much of parsing theory. You do need a
> good understanding of the instruction set for the target, though.

in my case, most of my stuff tends to be plain C.

some of the code in my prior code-generator did use a special
preprocessor which allowed for more powerful macros.

my assembler uses a special notation for the ASM listings, where
basically a notation similar to that used in the Intel docs is used to
describe most of the opcodes.

later changes to support AVX and XOP, and ARM and Thumb, were a little
less clean (nasty notational cruft).

more notation had to be devised as these docs had (for some reason)
mostly using either bit-based notations or "bit-box" diagrams.

actually, when I started out trying to write the assembler, I started
out writing explicit logic for the various instructions, but quickly
became frustrated, and then wrote a tool to convert the Intel-docs
notation into C (changing all this mostly into a task of transcribing
the docs into a text file).

but, pretty much everything else is plain C.

I am not terribly sure why "needing to knowing C" would be something to
be avoided for a compiler backend, when presumably someone who "doesn't
know C" (and can't be asked to learn about it) should presumably be
somewhere far away from a compiler's back-end?

a better goal I think is using a specialized format more as an aide to
reduce the amount of code which needs to be written to support a new target.

Hans-Peter Diettrich

unread,

Apr 19, 2012, 8:20:11 AM4/19/12

to

glen herrmannsfeldt schrieb:

> C is a nice, simple language in which to describe compiler design, and
> also not so bad a language in which to write compilers. Given that,
> it isn't a bad start toward writing one for another language, and/or
> written in another language, unless you don't know C.

<rant from the hate-C fraction>
C is an ugly language, featured (and widely used) to write cryptic code.
In practice it's useless without another language, used in the
preprocessor (and more languages used in "make" and the autobloat
tools). Even an inventor of that language acknowleged later, that he
better should have followed Wirth's advice, to e.g. make the language
LL(1). The strength of C is writing low level (OS) code, but it lacks
many features desireable in *managing* (such) big applications. This
issue has been addressed in C++, C# and Java, later, but who would
advice an newbie to implement an compiler for C++, which IMO also is
everything but a "nice" or easily understandable language?

But of course you are right in so far, that a C/C++ compiler is one of
the best examples, where one can study much required theory, which is
not required for writing compilers for better designed languages.
</rant>

Just my $0.02 ;-)

DoDi

Hans-Peter Diettrich

unread,

Apr 19, 2012, 7:53:30 AM4/19/12

to

Alain Ketterlin schrieb:

> comp...@is-not-my.name writes:
>
>> Guys, I'm having a bear of a time finding a good practical language
>> and OS agnostic text on writing a compiler. I'm weak in math and not
>> interested in the theoretical details. I want to understand the hows
>> and whys of compiler writing. [...]
>
> First, don't expect to understand much of compilation without at least
> some background in discrete maths (some basic language theory, but
> also graph theory if you go down later stages), and of course
> algorithmics and programming.

I dare to disagree. For writing an parser it's sufficient to understand
a formal language definition (BNF...), from which a LL(1) parser can be
even hand-coded. When tools are used to create a compiler skeleton
(lex/yacc, Coco/R, Antlr...), the related math is encapsulated in these
tools, no need that their users bother with it.

IMO the OP will be comfortable with Wirth's books, languages and
compilers, which are understandable even without a big theoretical
background. Even if Wirth is concerned with *teaching* compiler
principles, his languages and compilers are not the toys as many people
believe. E.g. Oberon implements a complete OS, with the compiler being
an integrated part of the entire system. From there it's only a small
step to understanding and implementing e.g. JIT compilers, which require
an different approach from stand-alone compilers.

> Second, don't think compilation is all about language theory. For
> instance, control-flow analysis is heavy on graph traversals, code
> generation may use subtle algorithmics (e.g., dynamic programming), etc.
> And optimization techniques may use whatever will provide a suitable
> model (some loop optimizations make heavy use of linear algebra).
>
> Of course, if you're interested in compilers you'll become interested
> into all the theories/topics they use. And I think it's a very nice way
> to learn a lot about computer science.

Again I suggest the OP to dig into the various (optional) parts of an
compiler later, when he discoverd a *practical* need/motivation for code
flow analysis, register allocation etc. Many people (like me ;-) are
much more open to the theory, when they have practical examples for
their application *before*.

Life is too short for writing an full-blown heavily-optimizing
production compiler from scratch, including its whole RTL. A beginner
IMO is better off with a small language and compiler, where he can study
the related problems, and can find out the areas of his personal interest.

DoDi

comp...@is-not-my.name

unread,

Apr 19, 2012, 7:31:33 AM4/19/12

to

g...@nospam.ugcs.caltech.edu wrote

> Much of the book is about code generation, which, it seems to me,
> is not described in as much detail in many other compiler books.

I wonder how useful that will be for me given my non-UNIX target but I'll
try to look at that book since it got two thumbs up here.

> Parsing theory is where much of the theory, and hard to understand
> mathematical descriptions, appear, but in the end (back end, in the
> case of compilers) it is about code generations.

I looked over Let's Build a Compiler and the code generation part wasn't
difficult for me. I worked out the first few examples cross-compiling to
z/OS. But after that it started to look like it wasn't a serious article as
far as producing something useful goes. Anyone care to comment on it?

> As far as languages to write compilers in, it is now usual (though
> maybe not 50 years ago) to describe parts of the compiler in a
> special purpose language. As previously noted, there are flex and
> bison to write the front end, though you usually need to know some
> C to use them.

Not only don't I know C and am not interested in knowing it, but the tools
for the front end aren't available to me and I don't plan on using anything
I don't write. I've gotten this far without ever cutting and pasting and I
don't intend to start now.

> You should be able to write a description for a new target
> without knowing C, or much of parsing theory. You do need a
> good understanding of the instruction set for the target, though.

And that I have in spades. Thanks for your post.

comp...@is-not-my.name

unread,

Apr 19, 2012, 7:31:34 AM4/19/12

to

use...@nospam.rwaltman.com wrote

> Not a direct answer to your question - Stanford University is offering
> an online compiler course starting April 23. You may want to take it,
> (it's free.)

Thanks I saw that and was very excited, until I learned it was the usual
environment, that doesn't help me.

> Now, regarding a compiler textbook with a good balance between theory
> and implementation details, I always recommend Pyster's "Compiler
> Design and Construction"
> It is dated, (as proven by the choice of source and target languages:
> "Rascal" [Rudimentary Pascal] and IBM 370 assembler) but still an
> excellent guide for your first attempts at compiler writing.

Now you're talking! This could be the one! I will try to find a copy, thank
you!

> I believe it is a much easier first read than both LCC and the Dragon
> book.
> Adapting it to languages you know will deepen your understanding of
> how it works.
> Another good choice (language wise) could be the Oberon compilers.

I don't generally like the Wirth languages because they often have built in
limitations that make them unsuitable for real work. However they do seem
amenable to changing them so that they are useful. People took Pascal and
Modula-2 in new directions and many variations are supposed to be pretty
good. I'll look at Oberon again now that you mention it. Thanks for your
post!

comp...@is-not-my.name

unread,

Apr 19, 2012, 7:31:34 AM4/19/12

to

Alain Ketterlin <al...@nospam.dpt-info.u-strasbg.fr> wrote

> comp...@is-not-my.name writes:
>
> > Guys, I'm having a bear of a time finding a good practical language
> > and OS agnostic text on writing a compiler. I'm weak in math and not
> > interested in the theoretical details. I want to understand the hows
> > and whys of compiler writing. [...]
>
> First, don't expect to understand much of compilation without at least
> some background in discrete maths (some basic language theory, but
> also graph theory if you go down later stages), and of course
> algorithmics and programming.

Discrete maths!? I don't even know what that is.

> Second, don't think compilation is all about language theory. For
> instance, control-flow analysis is heavy on graph traversals, code
> generation may use subtle algorithmics (e.g., dynamic programming), etc.
> And optimization techniques may use whatever will provide a suitable
> model (some loop optimizations make heavy use of linear algebra).

Yeah but like anything else now that you smart mathematicians have worked
all that out, can't we use what you know without fully understanding it? Is
it unreasonable to explain what to do in an algorithmic way without needing
for the reader to understand the theoretical background? Otherwise it smacks
of having to reinvent the research wheel to do a project like this.

> OK, now my suggestion: "Modern Compiler Implementation", by Andrew
> Appel, which imho has the perfect balance between theory and
> implementation. You are allowed to choose the implementation language:
> the book exists in C, Java, and ML versions.

I don't know or use any of those languages so I suppose it won't help me
very much but I'll see if I can look over the book anyway. Thank you.

Torben Ægidius Mogensen

unread,

Apr 19, 2012, 8:58:56 AM4/19/12

to

comp...@is-not-my.name writes:

> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing. Everything I've found is either
> gobbledygook equations or "let's use C/C++/Java on UNIX" or things
> that are so trivial and focused they don't explain general cases and
> can't be extended to anything useful.

You could have a look at my book "Introduction to Compiler Design"
(http://www.springer.com/computer/swe/book/978-0-85729-828-7), or the
free-to-download earlier version called "Basics of Compiler Design"
(http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/index.html).

The books are language agnostic, describing techniques independently of
any specific implementation langauge, and mostly independently of the
source and target languages for the compilers. There is some math, but
not as much as in some other compiler textbooks I have seen.

I wrote the book mainly as a reaction to the books available at the
time, which were either (IMO) too advanced for students just beginning
the second year of CS or basically walk-throughs of specific compilers
for and in specific languages and, hence, not generally useful.

Torben

comp...@is-not-my.name

unread,

Apr 19, 2012, 12:32:13 PM4/19/12

to

de...@nospam.knosof.co.uk wrote

> I always recommend:
> A Retargetable C Compiler: Design and Implementation
> by David R. Hanson and Christopher W. Fraser

Thanks.

> If you are weak on math you might have a problem getting your head
> around recursion.

I'm sure I don't understand recursion like a mathematician understands it. I
think I understand the applied use of it.

> If you cannot understand recursion your compiler writing days are
> finished.

Before they even started, how sad!

> > I think of all the compilers were written in the DOS days and there
> > were normal guys writing them, not Nobel math prizewinners. Shirley
>
> In my experience compiler writers are not normal guys, but then I
> am a vested interest.

You're probably right since the more I look at it the more it seems
the 90/10 rule applies. 90% of compilers were written by the same
guys. Small group after all. I guess you (plural) have a vested
interested in keeping the group small ;-)
[Back when I was a grad student, I taught the compilers course required
for all CS majors. Some of them even wrote compilers that worked. -John]

comp...@is-not-my.name

unread,

Apr 19, 2012, 12:32:12 PM4/19/12

to

From: Philip Herron <redb...@nospam.gcc.gnu.org>

> I understand your squabbles with wanting a very straight forward text

> to work from. I had similar annoyances although i have a degree in
> Math and Computer science maybe that made it a little easier but i

> personally think understanding the ideas behind grammar and state
> machines etc all that theory is actually very straight forward it just
> can look obscure but it is kind of essential to know otherwise you can
> end up making a lot of mistakes without really understanding why. But
> that's not to say just taking your time to understand from first
> principles you wont get there it just would take more time.

I have an undergraduate degree in computer science as well but we didn't get
into much theoretical stuff. I went into development from there and didn't
get an advanced degree. Obviously I lack the higher academic knowledge that
many of you have. Your comments seem to confirm what I thought about this
specific issue. It's just I haven't found the right presentation and not
having anyone to discuss things with is also difficult. I don't think the
theory is silly or not relevant that's not my point at all. Given I lack the
background to understand it I'm trying for a more practical angle, that's
all.

> I personally found the Lex and Yacc o'reilly book extremely insightful
> because if you work though the bison manual the examples they give can
> really make you see how things work to a very basic level and its very
> easy to see how it can be extended very quickly.

I've looked over the archives here and John obviously knows his stuff! I'm
sure the book is very good. That sort of approach doesn't help me though
because I don't have Lex or Yacc or Bison in my development environment and
I really want to understand the parts well enough to write my own pieces and
not use something other people have written. That method has always been a
good idea in the past because I was forced to understand how the code
works. If you use other people's code you can miss things.

> Another is the dragon book i still think although there is a lot of
> obscurity in it, but Its a classic book for a reason its actually really
> really good.

I received some scans a few years ago and the book is daunting. If everyone
agrees it's good I suppose I should buy a real copy I always find reading
actual books is better than switching back and forth between screens. The
prospect of needing to understand a 1,000+ page book to do a small project
such as think I have in mind seems a bit much.

> There are a few other online pdf manuals i don't have the links for but a
> very quick search though this mailing list will turn them up. Some of them
> a very much to the point.

That's the biggest problem is how many there are and since I can't tell
which are good and which are bad I'm asking for help selecting the right one
for me. So far I haven't come across it.

> The biggest point of all is compiler and os work isn't very rewarding for
> a very long time when you start working at it. As in you can go for ages
> and ages and not feel like your getting anywhere then all of a sudden it
> can click. Well that's how i feel sometimes.

Thanks that's an important piece of information to keep handy on a project
like this. I've worked on many large products that may take years before
they go to production so I'm aware of this general idea. Writing big
software projects is not for the immediate gratification crowd!

> I hope this gives you some hope because its not the easiest subject to
> tackle. There is no substitution to just working at something just
> starting your own small basic compiler project to understand
> expressions will give you very much of what you want to know for the
> basics in my opinion and just work at that in your own time.

Thanks that's exactly what I'm after. I don't have any plans to rock
the compilation world or write books on the subject. I'd be very excited to
be able to understand the basics enough to produce something I can use and
to use that as a path to additional learning on the topic.

comp...@is-not-my.name

unread,

Apr 19, 2012, 1:32:19 PM4/19/12

to

BGB <cr8...@nospam.hotmail.com> wrote:

> this is a hard thing to ask.
>
> There is a lot of possible variation between languages, all of which may
> have notable impact on the "best" compiler design.

I'm not dreaming about "best" just getting a small project going would be
enough to make my day or month etc.

> I have not as of yet seen much of anything which tries to address all
> this, and most focus more narrowly on "a C or Java like language
> compiling directly to native code and using a C-style calling
> convention".

I guess that would be ok, do you have any suggestions on a book like that?

> OS and CPU architecture can impact a fair amount as well, so it could be
> similarly hard to address all of this without making at least some
> assumptions on these fronts.

Fair enough. I didn't think so but everybody seems to be saying that so I
guess that is the way these books are written. I would have thought there is
a lot that has nothing to do with code generation and that code generation
itself aside from optimization obviously would be a pluggable component.

> for example, do they use x86 or ARM as the example? ...

My target would be z/OS and I may want to do it so it can generate code for
MVS and XA and ESA targets as well. If I can get to the code generation part
I'll have no trouble with this actual detail. It's all the stuff before that
and maybe after it, I have no idea how to do.

> like with "authority": many people play "follow the leader", others play
> "oppose the leader", but both are likely pointless.

I have no agenda and I know my limits. I'm not opposed to a book that picks
one way and pretends it's the only way, we've all been around to know that
can't be true. I'm willing to go along for the ride, but I am trying to find
a writer you guys says is trustworthy.

> a person actively "doing their own thing" will not likely play that
> game, and may not really care what the leader thinks, but will in turn
> seem to at times be "following the leader" and "opposing the leader",
> not because they actually are, but because not everything "the leader"
> does is necessarily either right or wrong.

I'm happy to follow the leader on this sort of project because I've never
done it before. I don't pretend to know how to do it and I'd be silly to
pretend that. But I do need something that explains the practical issues and
I don't need to understand the theory to be able to prove anything. If I
can understand /what/ to do I'll eventually understand /why/ it works well
enough for what I need. At some level everything is a black box. Some people
just drill down further than others until they get to that point.

> I don't entirely agree (not being as much of a fan of math either).
> the problem isn't so much about "understanding the topic", so much as
> people often trying to throw mathematical notation at pretty much
> everything.

This issue comes up a lot and it's partly people have worked pretty hard to
amass deep knowledge of a topic and they don't like to see people coming in
and trivializing or disrespecting that and saying you're all silly and I can
do it without all that. Nothing of the kind, I have a lot of respect for
people with the theoretical understanding who've actually written a real,
disciplined compiler according to proper rigorous methods.

My approach is not to dismiss that or say it has no value, quite the
opposite. I don't know what you guys know and I can't learn it quickly
enough so I need some distillation of the theory and a presentation in
a practical way a good programmer who is a very bad mathematician can
understand and actually use.

> often it could be explained easily enough using either natural-language,
> some kind of pseudo-code, or some other specialized (presumably
> non-esoteric) notation.

That is what I thought but everybody else seems to say that isn't true.

> it is kind of defeats the point if a person needs a math degree to even
> figure out what exactly this glob of notation is supposed to be (or even
> what sort of math this is even supposed to be, as it becomes more the
> "find a match the greek letters and other symbols" game).

Yeah that's how I feel.

BartC

unread,

Apr 19, 2012, 1:43:00 PM4/19/12

to

<comp...@is-not-my.name> wrote in message news:12-0...@comp.compilers...

>> [Sorry to burst your bubble, but I knew people writing compilers for
>> DOS, and they understood parsing theory just fine. Although I agree

>I need a way to get started and I can't
> seem to understand the math I have seen so far and I have nobody to talk
> it over with so I really need something more practical.

You don't need any maths to get started. Perhaps that was to do with
parsing; just google for "recursive descent parsers", you might find some
examples that don't use a curly-brace language.

--
Bartc

compil...@h-rd.org

unread,

Apr 19, 2012, 2:07:28 PM4/19/12

to

Hi,

my favorites list:

Best read, easy to understand and follow:
Compiler Construction - N. Wirth [PDF (597 KB)]
http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

somwhat old, but good to read: Gries "Compiler Construction for
digital computers"

And probably the most refreshing one: the Lisp 1.5 manual , it has is
an interpreter and compiler in the appendix. (
http://www.softwarepreservation.org/projects/LISP/book/LISP%201.5%20Programmers%20Manual.pdf/view
).

[Appendix B of the Lisp 1.5 manual (which I happen to have in
convenient 1969 paper form) does have a pseudocode interpreter, but
Appendix D about the compiler just describes how to use it, no
listings. And he wouldn't like the Lisp compiler anyway, since then
he'd have to learn LAP. Gries is a good thought, quite concrete and the
target machine is a thinly disguised S/360. -John]

comp...@is-not-my.name

unread,

Apr 19, 2012, 3:05:16 PM4/19/12

to

g...@nospam.ugcs.caltech.edu wrote

> You might be right that all books are one (or more) of those three,
> but then you should choose from among those.

If that would have been an option I would have done it by now.

> C is a nice, simple language in which to describe compiler design, and
> also not so bad a language in which to write compilers. Given that,
> it isn't a bad start toward writing one for another language, and/or
> written in another language, unless you don't know C.

I don't know C and it's not a good choice on z/OS. I am familiar with
assembler and I used to know PL/I fairly well and could come up to speed
with it and use it if it would be a good implementation choice. I don't
think it would though, because it requires the IBM runtime which is licensed.

> Both C and Java are fairly simple, and reasonably similar, with much
> of the complication moved to the library. That means a complete
> compiler can be described relatively simply, covering all the
> important ideas. Once you learn those, you will be ready to go on to
> other languages (both for compiling and writing the compiler in).

I really don't like C or Java and I didn't come out and say it because I
don't mean to start a holy war. I have looked at them. C just doesn't have
much value on z/OS. Java is too limiting in other ways. I am familiar with
the older languages and never was interested in much of anything that came
later although I have some experience with modern scripting languages.

> You don't say what language you are interested in.

I am not exactly sure. I was considering a subset PL/I or PL/M variant or
maybe even a new language. Even a super Pascal or Modula-something would be
interesting to me.

> There are some complications to writing Fortran compilers not covered in
> most books. Both Fortran and PL/I don't have reserved words, which
> requires special handling by the compiler. Fixed form Fortran ignores
> blanks, which requires a special lexical analysis technique, but most of
> compiler theory is in parsing, which is reasonably language independent.

I think Fortran would be hard because I couldn't write the libraries needed
due to my lack of mathematical background. F77 would be an interesting
project, the latest Fortran is way more complicated than I would attempt.

PL/I is interesting because I have access to old and new PL/I compilers and
good doc for z/OS. I figured I could develop the grammar from the manuals
but I know it would be a huge project, more than I have the time and ability
to do and a better PL/I than I or many people could write already exists,
so there's not much point in it.

> It seems to me natural that a pseudo-code description will be more
> theoretical, and harder to understand, than one using a well-known
> high-level language.

That's a bit of a tautology. If the well-known language is something /you/
know then yes! Otherwise..

Aharon Robbins

unread,

Apr 19, 2012, 11:58:08 PM4/19/12

to

It may be showing its age, but I think that "Crafting A Compiler in C"
by Fischer and LeBlanc is worth looking at. The original "Crafting
A Compiler" was written in the late 80s when Ada was going to conquer the
world and was also pushing compiler technology. Circa 1991 I translated
the Ada code and pseudo-code into C for the C version of the book. Once
I did that I finally understood how LR parsers work. :-)

(How did I get to do the work? I did my MS work under Rich LeBlanc at
Georgia Tech and had stayed in touch. :-)
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

glen herrmannsfeldt

unread,

Apr 20, 2012, 3:02:58 AM4/20/12

to

comp...@is-not-my.name wrote:

(snip)

>> It is dated, (as proven by the choice of source and target languages:
>> "Rascal" [Rudimentary Pascal] and IBM 370 assembler) but still an
>> excellent guide for your first attempts at compiler writing.

(snip)

> I don't generally like the Wirth languages because they often have built in
> limitations that make them unsuitable for real work. However they do seem
> amenable to changing them so that they are useful. People took Pascal and
> Modula-2 in new directions and many variations are supposed to be pretty
> good. I'll look at Oberon again now that you mention it. Thanks for your
> post!

One Wirth language that you might find interesting is PL/360.

PL/360 looks like a high-level language but works like assembly
language. As an example (which I am remembering from 40 years ago)

R1=R1+R1+R1;

compiles to

LR R1,R1
AR R1,R1
AR R1,R1

and so multiplies R1 by four. Note that it is low level in that the
registers are represented by variables such as R1. The PL/360
compiler, and its generated code, should run just fine on z/OS. It is
available in PL/360 source, so you can modify it and play with it all
you want.

I might have some idea what you are asking about, though. Much of my
early programming work was with OS/360. First Fortran, but after not
so long PL/I. PL/I was much more fun to program in, but not so many
systems had PL/I compilers available. Also, not so much later I
started S/360 assembler programming.

I worked on many other systems over the years, PDP-10, VAX, 80286
(running MSDOS and later OS/2), Sun, HP, and more. Still, S/360 was
always my favorite.

But I don't understand your refusal to use the tools that are
available. FLEX and BISON are freely available, you can't complain
that they cost too much. You can run them on a freely available OS
(Linux, FreeBSD, Solaris, etc.) on machines that you can find for very
low prices, or often enough given away.

The nice thing about the tools is that you can get something running
fairly fast, and without needing to get too deep into the math. You
can go as deep or shallow into the innards of FLEX and BISON as you
want. One project that should be about right for one person, and
without a lot of math, is rewriting FLEX and BISON to generate code in
another language, such as PL/I.

As I mentioned before, with LCC and the LCC book, it is pretty easy to
write a new code generator without rewriting the rest of the compiler.
Writing a PL/I front end for LCC would be somewhat more work, though.

There were some working on a PL/I front-end for gcc, though I haven't
heard much about that for some time now.

-- glen

[PL/360 was a great little language, but the source code to the
compiler was apparently lost. -John]

glen herrmannsfeldt

unread,

Apr 20, 2012, 3:33:24 AM4/20/12

to

compil...@h-rd.org wrote:

(snip)

> somwhat old, but good to read: Gries "Compiler Construction for
> digital computers"

I believe it is, more or less, older than C so shouldn't have C
dependencies. Also, the camera ready copy was printed on a 1403
printer, which should look familiar to OS/360 users.

(snip)

> [Gries is a good thought, quite concrete and the

> target machine is a thinly disguised S/360. -John]

Many years ago (about 33), just after my undergrad compiler course,I
had the idea of writing a PL/I compiler for VAX using the Gries book.
As usual, it would have been written in the language it compiled, so
it could compile itself. But I got distracted with other things over
the years.

-- glen

BartC

unread,

Apr 20, 2012, 4:45:18 AM4/20/12

to

<comp...@is-not-my.name> wrote in message news:12-0...@comp.compilers...

> I have an undergraduate degree in computer science as well but we didn't
> get into much theoretical stuff.

Same here. I started writing my own compilers out of necessity (to make it
easier to program the microprocessor circuits I was building; it was just
not practical to run someone else's compiler).

But I also created my very simple languages which could do just what I
needed.

I'd heard of the 'proper' parsing methods in my compiler course, but ignored
all that and just did whatever was necessary to tokenise source code, parse
the few statements I had in the language, and write whatever code was
required to run the results. And it worked! I think those first attempts
might even have been single pass compilers, so ultra-simple if that was the
case.

> I really want to understand the parts well enough to write my own pieces
> and not use something other people have written. That method has always been a
> good idea in the past because I was forced to understand how the code
> works. If you use other people's code you can miss things.

(Again, same here. Thirty years on and I still don't think I've ever used
someone else's actual source code (or language). (However I do use some
libraries, and someone else's OS.))

>> I hope this gives you some hope because its not the easiest subject to
>> tackle. There is no substitution to just working at something just
>> starting your own small basic compiler project to understand
>> expressions will give you very much of what you want to know for the
>> basics in my opinion and just work at that in your own time.
>
> Thanks that's exactly what I'm after. I don't have any plans to rock
> the compilation world or write books on the subject. I'd be very excited
> to be able to understand the basics enough to produce something I can use and
> to use that as a path to additional learning on the topic.

Start with a very simple language. Perhaps even a version of Basic (as a
useful language consisting of Let, If, Goto, Print (and perhaps Input) can
be created without any structured statements; only expressions need
recursive methods to deal with). You might consider also interpreting the
language rather than translating to a target language. And use an easy,
dynamic language to implement it all in. Stay away from C, C++, or anything
else with curly braces.

(Sorry, I can't recommend any books because I haven't read any...)

--
Bartc

comp...@is-not-my.name

unread,

Apr 20, 2012, 12:06:22 PM4/20/12

to

tor...@nospam.diku.dk (Torben Fgidius Mogensen) wrote:

> You could have a look at my book "Introduction to Compiler Design"

Thanks for the links.

> The books are language agnostic, describing techniques independently of
> any specific implementation langauge, and mostly independently of the
> source and target languages for the compilers. There is some math, but
> not as much as in some other compiler textbooks I have seen.

I will look them over. It is nice to know the author is available!

> I wrote the book mainly as a reaction to the books available at the
> time, which were either (IMO) too advanced for students just beginning
> the second year of CS or basically walk-throughs of specific compilers
> for and in specific languages and, hence, not generally useful.

That's an issue for me with not much theoretical background and wanting to
at least learn enough so I can go off on my own and do something
useful. Thank you.

comp...@is-not-my.name

unread,

Apr 20, 2012, 12:06:21 PM4/20/12

to

Bakul Shah <use...@nospam.bitblocks.com> wrote:

> Check out Nils M Holm's "Practical Compiler Construction", available
> at lulu.com. It is a 365 page "tour" through a *complete* compiler for
> a subset of C language. The compiler can compile itself and you can
> download the code from author's site (www.t3x.org). It doesn't use
> lex or yacc (just a hand-rolled scanner and a recursive descent
> parser). The compiler is about 4300 lines of code. It describes all
> the key concepts but given the simple design doesn't go into a lot of
> details (beyond describing the code). The book describes a i386 code
> generator. The code generator interface seems well enough abstracted.
> When challenged, Nils put together a x86-64 backend in a day!

Thanks for the description on this. I have downloaded a bunch of free books
over the years and I do have this one. Part of my problem is sorting through
them and understanding if they're something I could use. Your comments are
very helpful especially since you mentioned he explains all the code and
doesn't use external tools.

comp...@is-not-my.name

unread,

Apr 20, 2012, 12:06:22 PM4/20/12

to

compil...@nospam.h-rd.org wrote:

> Best read, easy to understand and follow:
> Compiler Construction - N. Wirth [PDF (597 KB)]
> http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

You guys! Easy to understand and follow, and Wirth in the same sentence?! I
guess it's all relative. I find Wirth terse bordering on cryptic. I thought
it was part of his charm. Still I haven't look at anything he's written for
at least 20 years or so. I should go over it again.

> somwhat old, but good to read: Gries "Compiler Construction for
> digital computers"

That might be a good one especially if it's the same Gries who wrote
PL/C. I have the PL/I and PL/C book by him and Conway but I can't put
my hands on it right now. Thank you for mentioning it.

> And probably the most refreshing one: the Lisp 1.5 manual , it has is
> an interpreter and compiler in the appendix. (
> http://www.softwarepreservation.org/projects/LISP/book/LISP%201.5%20Programmers%20Manual.pdf/view
> ).
>
> [Appendix B of the Lisp 1.5 manual (which I happen to have in
> convenient 1969 paper form) does have a pseudocode interpreter, but
> Appendix D about the compiler just describes how to use it, no
> listings. And he wouldn't like the Lisp compiler anyway, since then
> he'd have to learn LAP. Gries is a good thought, quite concrete and the
> target machine is a thinly disguised S/360. -John]

Thanks John. It looks like that may indeed be a good choice.

Jonathan Thornburg

unread,

Apr 20, 2012, 12:19:53 PM4/20/12

to

comp...@is-not-my.name wrote:
> Is it unreasonable to explain what to do in an algorithmic way without needing
> for the reader to understand the theoretical background?

You might consider

Patricia Anklam, David Cutler, Roger Heinen, Jr., and M. Donald MacLaren,
"Engineering a Compiler: VAX-11 Code Generation and Optimization"
Digital Press, 1982, ISBN 0-932376-19-3

It's a high-level description of the backend (code generation) of the
DEC Vax/VMS PL/1 and C compilers. The book gives a clear presentation
of the somewhat ad-hoc, but successful, design and implementation of
these compilers backend, using very little mathematics.

Once you understand the ideas, translating them to your choice of
target & implementation environment is a much smaller problem...

Having said that, I'll echo others in this thread: learning enough math
to read the Dragon book would be *very* valuable. In fact, even if you
don't grok the math, the Dragon book is still a great read!

ciao,

--
-- "Jonathan Thornburg [remove -animal to reply]" <jth...@astro.indiana-zebra.edu>
Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA
[That's not a bad book, but it's 100% about the code generator, since they bought
the PL/I front end from someone else. -John]

comp...@is-not-my.name

unread,

Apr 20, 2012, 11:07:56 PM4/20/12

to

Hans-Peter Diettrich <DrDiet...@nospam.aol.com> wrote

> IMO the OP will be comfortable with Wirth's books, languages and
> compilers, which are understandable even without a big theoretical
> background. Even if Wirth is concerned with *teaching* compiler
> principles, his languages and compilers are not the toys as many people
> believe. E.g. Oberon implements a complete OS, with the compiler being
> an integrated part of the entire system. From there it's only a small
> step to understanding and implementing e.g. JIT compilers, which require
> an different approach from stand-alone compilers.

I've always found Wirth's terse descriptions tough. I know he was
revolutionary so I will have to go over these again and try harder.
Thanks for your comments.

> Again I suggest the OP to dig into the various (optional) parts of an
> compiler later, when he discoverd a *practical* need/motivation for code
> flow analysis, register allocation etc. Many people (like me ;-) are
> much more open to the theory, when they have practical examples for
> their application *before*.

Very perceptive observations. Thanks for posting them.

> Life is too short for writing an full-blown heavily-optimizing
> production compiler from scratch, including its whole RTL. A beginner
> IMO is better off with a small language and compiler, where he can study
> the related problems, and can find out the areas of his personal interest.

Thank you for your comments. I found them very helpful.

Joe Schmo

unread,

Apr 21, 2012, 4:53:46 AM4/21/12

to

<comp...@is-not-my.name> wrote in message news:12-0...@comp.compilers...

> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing. Everything I've found is either
> gobbledygook equations or "let's use C/C++/Java on UNIX" or things
> that are so trivial and focused they don't explain general cases and
> can't be extended to anything useful.

The first few chapters of "Programming Language Pragmatics" by Michael Scott
for a good and fast overview. (The rest of the book is quite good also if
you are designing your own language).
"Writing Compilers & Interpreter - An Applied Approach" by Ronald Mak.
Creates a Pascal compiler in C which emits x86 assembly language.
The Fischer and LeBlanc authored book is good for the implementation
details. I.e., their particular take on an implementation anyway. It sticks
in my mind that this is a very good book (but I somehow lost the mini review
I made for myself about it) that I will obtain again in the future.

I think the whole "write a grammar and feed it through a tool to produce a
lexer and parser" thing is something to avoid, at least at first (I'm
avoiding it like the plague, FWIW). Surely that paradigm is part of the
reason that programming languages are so complex.
[COBOL managed to have arcane syntax and hundreds of reserved words before
there were any compiler tools at all. That's not it. -John]

Nor Jaidi Tuah

unread,

Apr 21, 2012, 2:30:25 AM4/21/12

to

>I need a way to get started and I can't
> seem to understand the math I have seen so far and I have nobody to
> talk it over with so I really need something more practical.

Perhaps our esteemed moderator can write "Compilers for Dummies".
Seriously! Even people with computer science background are finding
compilers difficult. CS has been dropping theoretical components in
favor of some buzzword-filled modern replacements. I'm sure many cs
students, who can cook up cool javascript animations in no time, find
compiler books impossible to digest.

Nice day
Nor Jaidi Tuah
[Unless you can identify at least 10,000 people who are likely to buy
the book, the publisher can't afford to be interested. And I have to
say that if you have CS degree and are unable to figure out what a
LALR parser does, there's something wrong with your CS degree. -John]

Uli Kusterer

unread,

Apr 21, 2012, 5:22:01 AM4/21/12

to

On 17.04.2012, at 23:28, comp...@is-not-my.name wrote:
> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing.

Hi,

I usually do C-style programming languages, and I'm by no means a
professional compiler writer, but I enjoy parsers, so I've been doing
stuff like this for ages and sponging up any bit of knowledge that
sounded useful.

IMO if you know assembler or BASIC and general algorithms (i.e. you
could implement a binary tree and walk its nodes), and you can somehow
figure out what bytes your code gets compiled to (at worst by writing
a dummy assembler program and looking at what bytes show up between a
bunch of nop instructions whose bytes you know), you should be able to
at least get a basic working knowledge of how a compiler works. Just
write the naove approach of how you would design any program.

The things that I benefited most from learning about compilers were:

- Recursive descent parsers: It's the obvious way to write a parser.
You write a function ReadProgram(), which calls ReadLine() in a loop,
which looks at the first word and then calls ReadIfStatement() if it's
"if" etc. If you find you go wrong, you add a way to either get back
to the last known good state (backtracking) or you just give an error
message. On the way you keep track of the variables in each function
and can later calculate their stack offsets once you know how many you
need etc.

- Tokenizing: Essentially grab all the words in your source text and
build an array with an entry for each so you can more quickly walk
forward and backward without having to scan individual characters. You
can also attach additional information to a token, i.e. an ID number
so you can quickly distinguish user-defined identifiers from known
identifiers like "if" or "repeat". Keep around a token's start offset
so you can report errors.

- Syntax tree: Build a tree structure that represents the parsed program. You
can then do things like evaluate constant nodes ahead of time (e.g. turn 5+4
into 9, but leave 5 + n as an addition node with a 5 and a reference to
variable 'n') by replacing groups of nodes with simpler nodes recursively.
Keep around the line number/first token offset in each node so you can report
errors.

- Code generation: A good starting point I've found is to generate a single
function. Write a little application that takes a file containing the raw
bytes of your function, load the pointer (mark it as executable if your
platform requires that), and then just call it. You write the prolog, the
epilog, and the raw instructions in between, and maybe do some labels. You may
have to remember the offset of a particular field and fill in the address
later on (e.g. for a return statement the offset to the epilog, so you don't
duplicate your clean-up code). Then advance to two functions that call each
other in the same block etc.

- Complete code generation: you generate several functions and put them in a
proper executable file, the way your OS expects. You may have to generate link
table entries ("trampolines") to call dynamically-linked system functions etc.
If you want to start simpler, write your own primitive linker just for the fun
of seeing how two functions that don't reside at fixed (relative) addresses
could call each other.

I can't really say anything about optimization on the compiled code level. I
suppose you'd build a data structure with additional information about each
instruction and then loop over it, looking for patterns that you know can be
simplified. Essentially the same as any other search-and-replace.

Is that un-computer-sciencey enough? This blog post may help with the basics
of my code generation approach:
http://orangejuiceliberationfront.com/generating-machine-code-at-runtime/ (but
it's C, and Intel, and badly wrapped by the new theme of my blog).

Cheers,
-- Uli Kusterer
http://stacksmith.com
[This is a really good summary. Thanks! -John]

Uli Kusterer

unread,

Apr 21, 2012, 5:30:35 AM4/21/12

to

On 19.04.2012, at 21:05, comp...@is-not-my.name wrote:
>> You might be right that all books are one (or more) of those three,
>> but then you should choose from among those.
>
> If that would have been an option I would have done it by now.

But it *is* an option. If a book is written in pseudocode (which you
requested) you would go and write your own program by translating that
pseudocode into whatever your implementation language eventually ends up
being. So if a book is written in C, just treat it like a pseudocode that is
assembler-like, with shorthand for function prologs and epilogs. C is a
"portable assembler". It's a fairly natural fit if you plan on using
assembler.

> C just doesn't have much value on z/OS. Java is too limiting in other ways.

I'm a bit confused now. You said you didn't want to copy and paste. Why does
the code suddenly need to "have value"? Isn't it perfectly fine as an
illustration of the concepts discussed in the book? If you want to learn how
cooking works, why would you worry that the example of cooking soup uses Maggi
soups, while all you have is store brand?

BartC

unread,

Apr 21, 2012, 7:01:55 AM4/21/12

to

"Hans-Peter Diettrich" <DrDiet...@aol.com> wrote in message

> Life is too short for writing an full-blown heavily-optimizing
> production compiler from scratch, including its whole RTL.

Especially when there might only be difference of 2 or 3 times between
performance of the best and worst code.

My own compiler for x86-32 generates pretty awful code, and on a small
handful of mostly numeric benchmarks, it averages out about 2.5 x as
slow as gcc on it's highest optimisation setting. But, gcc often
recognises these benchmarks as doing nothing useful, so removes whole
sections of code!

The true factor is probably between 1 and 2, and for critical code, I just
use inline assembly code, so it's not real problem. I'm rewriting that
compiler at the moment, and will probably achieve somewhat better
performance, but don't worry about it too much.

--
Bartc

Jonathan Thornburg

unread,

Apr 21, 2012, 11:04:14 AM4/21/12

to

BartC <b...@freeuk.com> wrote:
> Start with a very simple language. Perhaps even a version of Basic (as a
> useful language consisting of Let, If, Goto, Print (and perhaps Input) can
> be created without any structured statements; only expressions need
> recursive methods to deal with). You might consider also interpreting the
> language rather than translating to a target language. And use an easy,
> dynamic language to implement it all in. Stay away from C, C++, or anything
> else with curly braces.
>
> (Sorry, I can't recommend any books because I haven't read any...)

Another book of interest is

P. J. Brown
"Writing Interactive Compilers and Interpreters"
(Wiley, 1981)

It's a completely non-mathematical (& pretty "basic", i.e., assuming
very little prior knowledge) tour through what's needed to write a
Basic interpreter.

--
-- "Jonathan Thornburg [remove -animal to reply]" <jth...@astro.indiana-zebra.edu>
Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA

[It's not a bad book. It's quite old, but you can usually find a copy in the
usual online used bookstores. -John]

BGB

unread,

Apr 21, 2012, 9:58:29 PM4/21/12

to

On 4/21/2012 2:22 AM, Uli Kusterer wrote:
> On 17.04.2012, at 23:28, comp...@is-not-my.name wrote:
>> Guys, I'm having a bear of a time finding a good practical language
>> and OS agnostic text on writing a compiler. I'm weak in math and not
>> interested in the theoretical details. I want to understand the hows
>> and whys of compiler writing.
>
> Hi,
>
> I usually do C-style programming languages, and I'm by no means a
> professional compiler writer, but I enjoy parsers, so I've been doing
> stuff like this for ages and sponging up any bit of knowledge that
> sounded useful.

agreed.

I am mostly implemented my own custom scripting language (for my own
uses), and wrote a C compiler before, although the C compiler "decayed"
mostly into being a code-processing tool (mostly as maintaining a full C
compiler turned out to be "not terribly useful" in my use-case).

as-is, the scripting language is probably at a vaguely similar
complexity level with C (in some ways it is simpler, and in others
likely more complex), as it gradually moves in the direction of being
"essentially" statically-typed, has pointers, class/instance OO, ...

(decided to leave out a bunch of stuff related to some of the hassles of
static type-checking, and dealing with the matter of having a scoping
model where the scope is both mutable and involves indirect visibility,
meaning that variable assignments can alter what bindings may be
potentially visible from a given piece of code, making determining all
this statically a relatively non-trivial problem).

I have recently been looking some at moving more logic for doing some of
this from the VM back-end into the front-end, as there are likely some
cases that the front-end can deal with more easily but for which it
would be a hassle to add to the back-end (at the cost of adding a big
pile more VM bytecode ops).

now, how is any of this simpler than a C compiler?
at least at present my main target is an interpreter, and weak
performance is generally passable.

> IMO if you know assembler or BASIC and general algorithms (i.e. you
> could implement a binary tree and walk its nodes), and you can somehow
> figure out what bytes your code gets compiled to (at worst by writing
> a dummy assembler program and looking at what bytes show up between a
> bunch of nop instructions whose bytes you know), you should be able to
> at least get a basic working knowledge of how a compiler works. Just
> write the naove approach of how you would design any program.

yep.

> The things that I benefited most from learning about compilers were:
>
> - Recursive descent parsers: It's the obvious way to write a parser.
> You write a function ReadProgram(), which calls ReadLine() in a loop,
> which looks at the first word and then calls ReadIfStatement() if it's
> "if" etc. If you find you go wrong, you add a way to either get back
> to the last known good state (backtracking) or you just give an error
> message. On the way you keep track of the variables in each function
> and can later calculate their stack offsets once you know how many you
> need etc.

although I use recursive descent, the above sounds different from what I
usually do.

the general structure of my parsers is more like:
ReadBlock: Reads statements until EOF or '}' is seen;
calls ReadBlockStatement in a loop.

ReadBlockStatement:
Peeks token;
Sees if it is a known block-statement keyword (if/for/while/...):
Invokes appropriate parsing logic for each keyword.
Try to determine if it is a declaration:
See if parsing as a declaration works.
call ReadStatement
call EatSemicolor

ReadStatement:
checks for and handles vaious statement types
calls ReadExpression

ReadExpression:
(actually, this is a tower of functions, one for each precedence
level, working from lowest to highest precedence)
typically (not all precedence levels are exactly the same):
read expression at next lower level (n1)
while(next token is a recognized operator)
read operator token
read expression at next lower level (n2)
replace n1 with 'n1 <op> n2'
(eventually) call ReadLiteral

ReadLiteral:
peek token;
dispatch to appropriate handler if a keyword;
checks for token types, parsing out whatever:
numbers become numeric literals;
identifiers become loads;
string tokens become string literals;
...

typically, the result of all of this is to produce a syntax tree.

C is a little uglier here, since it requires keeping track of typedefs
and checking for typedef when trying to parse declarations.

> - Tokenizing: Essentially grab all the words in your source text and
> build an array with an entry for each so you can more quickly walk
> forward and backward without having to scan individual characters. You
> can also attach additional information to a token, i.e. an ID number
> so you can quickly distinguish user-defined identifiers from known
> identifiers like "if" or "repeat". Keep around a token's start offset
> so you can report errors.

partly agreed, except that it isn't really strictly necessary to build
an array up-front.

most of my parsers don't bother using an array, but instead just call
the tokenizer function directly to peek and parse tokens based on the
current "stream position" (generally a raw char pointer in my language
parsers).

the version used in my C compiler currently uses a small hash table
(sort of, the "hash function" is simply to keep only the low-order bits
of the address) to avoid re-reading the same token multiple times.

practically though, it may be both higher performance and generally
cleaner to just build a token array up-front.

I had considered using a table in the C compiler, but the "hash" option
was less effort (the parser was already written in this case, so the
hash could be added via a quick hack, rather than by making significant
changes to the rest of the parser to use an token array instead of
direct function calls).

a potential advantage of not using an array though is not needing to
allocate memory for it.

> - Syntax tree: Build a tree structure that represents the parsed program. You
> can then do things like evaluate constant nodes ahead of time (e.g. turn 5+4
> into 9, but leave 5 + n as an addition node with a 5 and a reference to
> variable 'n') by replacing groups of nodes with simpler nodes recursively.
> Keep around the line number/first token offset in each node so you can report
> errors.

agreed.

in my case, I have generally used either "lists" built from "cons cells"
for this purpose, or in other cases a variant of XML DOM nodes.

I had actually thought this was fairly standard practice, until looking
into it and observing that, no, apparently most people store their
parse-trees in raw structures. either way I guess.

typically, my parsers do almost no direct processing on the ASTs while
building them, apart from building a semantic representation of the code
as-seen.

typically, things like simplifying expressions, propagating constants,
... is done in a stage I have typically called "reduction", which
happens between parsing and prior to producing (bytecode) output.

> - Code generation: A good starting point I've found is to generate a single
> function. Write a little application that takes a file containing the raw
> bytes of your function, load the pointer (mark it as executable if your
> platform requires that), and then just call it. You write the prolog, the
> epilog, and the raw instructions in between, and maybe do some labels. You may
> have to remember the offset of a particular field and fill in the address
> later on (e.g. for a return statement the offset to the epilog, so you don't
> duplicate your clean-up code). Then advance to two functions that call each
> other in the same block etc.
>
> - Complete code generation: you generate several functions and put them in a
> proper executable file, the way your OS expects. You may have to generate link
> table entries ("trampolines") to call dynamically-linked system functions etc.
> If you want to start simpler, write your own primitive linker just for the fun
> of seeing how two functions that don't reside at fixed (relative) addresses
> could call each other.

at this point, things vary go differently, and several more stages are
involved in my case:

- typically, after the AST has been "reduced", it will be transformed
into bytecode (basically, walk the AST, figure out expression types when
relevant, and spit out any relevant bytecode ops).

previously, only a very small amount of type-analysis was done here, but
I am looking to expand this (partly to simplify the work being done in
the interpreter back-end), such that the output produced here will be
"mostly statically typed".

I am looking at going to producing bytecode output more similar to
Java-ByteCode, basically where many opcodes are qualified for specific
types, rather than leaving it solely to the back-end to figure out what
the types are. this is a little lame, given I found it elegant how, for
example, MSIL/CIL had just used a few simple operations, and the
back-end would do its thing, but some things are currently more awkward
here, and at-present this leaves some awkward holes in the type-system.

it all still leaves a bit of uncertainty though (as to whether
adding/using piles of type-annotated opcodes is really "the right
direction to be going").

- typically (assuming the interpreter), the bytecode will be processed
and then transformed into threaded code, producing a threaded-code
version of the program. there are some parts which use generated native
code thunks, but for the most part, pretty much all of the machinery
takes place in native C functions. as-is it proves problematic to do any
non-trivial scope and type-analysis at this stage, short of making the
threaded-code logic contain most of the "hard parts" of a full native
code-generator (as well as likely making it considerably slower).

some past experiments (special purpose tests), have implied that
potentially threaded code can pull off "reasonably solid" interpreter
performance (potentially within 3-5x of native).

however, at present it is considerably slower (at least with tests like
the recursive Fibonacci sequence and bubble-sort, with most "practical"
code the slowdown seems to be smaller, but typically because most of the
actual logic takes place in native code in these cases).

- if a JIT / native code-generator is being used, the process works like
this:
complex operations are generally decomposed into simpler ones, and much
of the type information from the prior stage is discarded (the
code-generator uses its own type-analysis, unlike the "dumb"
interpreter, 1);
it will basically start figuring out how types propagate through the
stack and through variables;
it will map out operation to physical storage and/or registers;
it will then spit out the generated assembler code.

1: the interpreter uses some amount of explicit type information
already, as well as a number of "complex compound operations". these
are, while fairly helpful regarding interpreter performance, fairly
pointless for generating native code, so most such operations are broken
down into "simpler primitive operations" (for example, a single opcode
for "perform 2 loads, compare the values, and perform a conditional
jump", is less useful than the individual operations for doing so).

the JIT for my scripting language is not fully written or operational,
but partial results thus far have looked promising, but this is a low
priority. mostly it has big holes is the ISA and type-system (currently,
only the dynamically-typed path is really implemented).

- any ASM produced by the JIT is fed into the assembler, converting it
into a COFF module (COFF is used here regardless of whether the target
OS is Windows or Linux).

(note: the assembler is currently used by a number of different
components, so the lack of a currently functional JIT doesn't mean it is
goes unused).

- COFF modules are fed into an in-program linker, which proceeds to link
them against the current program image. there is some amount of trickery
here, involving potentially both invoking code-generation handlers, as
well as mechanisms to determine whether or not to "proxy" function calls
(IOW: whether to link a function call directly against the target
address, or use an indirect jump through another pointer).

reasons a proxy may be used: the call is out-of-range (on x86-64, this
usually means the call has gone outside the +-2GB window), or, either
the code has been "hot patched" or the linker has reason to suspect that
such "hot patching" is likely to occur (currently, this involves
heuristics).

> I can't really say anything about optimization on the compiled code level. I
> suppose you'd build a data structure with additional information about each
> instruction and then loop over it, looking for patterns that you know can be
> simplified. Essentially the same as any other search-and-replace.

this is one possibility...

the strategy used by the codegen backend for my C compiler involved
"large numbers of special cases", but this ultimately proved a bit
problematic to debug (this type of "optimization" turns out to be a
debugging mess in most cases where one doesn't have fairly immediate
feedback as to when/where/how something has broken).

funny though, is even just using "straightforward" strategies, like
caching variables in registers, ... and the code still manages to be a
bit more optimized-seeming than what usually seems to end up being spit
out by MSVC.

I have not often been entirely impressed by what I have seen being spit
out by MSVC, and my own code-generators have often been able to
outperform it (and be competitive with GCC in my tests), however...:
making my code generators "actually work" often proves a bit more difficult.

hence why I am still mostly using an interpreter-based strategy:
it is much easier IME to maintain and debug the interpreter, whereas IME
my native code generators have more often tended to blow up in my face
than actually work reliably, and it is much more difficult than it would
seem to dig through a big pile of assembler output looking for whatever
little bug is causing the code not to work.

only "I am using an interpreter" sounds a lot less impressive than "I am
using a JIT compiler", although I remember at least one VM I know of
which had claimed to be using a JIT, but the thing was using stupid
threaded code (in pretty much the same way I am using it in my
interpreter), so I was not terribly impressed...

I guess there would always be a "cheap but lame" way to write a JIT
(and, actually, how my first working JIT worked):
just use fixed-form globs of ASM for each VM opcode, and simply append
them together in-series (emitting prologues/epilogues and labels and
so-on as-needed).

ironically, at the time, I was getting nearly the same performance as
unoptimized GCC output doing this, so I guess it probably wouldn't be
entirely unworkable (rather than trying to write more complex
code-generators with type-analysis and register allocation and so on...).

( it sometimes almost makes me wonder if I am incompetent or something
in these regards. )

> Is that un-computer-sciencey enough? This blog post may help with the basics
> of my code generation approach:
> http://orangejuiceliberationfront.com/generating-machine-code-at-runtime/ (but
> it's C, and Intel, and badly wrapped by the new theme of my blog).
>

nifty...

Hans-Peter Diettrich

unread,

Apr 22, 2012, 6:41:55 AM4/22/12

to

BartC schrieb:

> "Hans-Peter Diettrich" <DrDiet...@aol.com> wrote in message
>> Life is too short for writing an full-blown heavily-optimizing
>> production compiler from scratch, including its whole RTL.
>
> Especially when there might only be difference of 2 or 3 times between
> performance of the best and worst code.
>
> My own compiler for x86-32 generates pretty awful code, and on a small
> handful of mostly numeric benchmarks, it averages out about 2.5 x as
> slow as gcc on it's highest optimisation setting. But, gcc often
> recognises these benchmarks as doing nothing useful, so removes whole
> sections of code!

I never trust benchmarks, in detail when supplied by compiler vendors :-]

When we had several workstations for evaluation, the AIX workstation
came with a benchmark program executing in about 3 seconds on that
machine. On other machines the time ranged from 8 minutes to more than
an hour.

When I added a printf after the loop in the benchmark code, it also took
about 8 minutes! The benchmark turned out to test the compiler *default*
optimization settings, where the HP station was so incredibly slow
because it had turned off *any* optimization by default.

From that experience I also learned what optimization means in
practice. A dumb compiler, or with all optimizations disabled, may fetch
every expression operand from memory, and write back every intermediate
result (SSA?), which is nice when debugging some code. Thus *register
allocation* turned out to be a basic optimization of high value - I
wonder how somebody could ever design a CPU (with more than 8 bit
registers) with as few registers as available on most Intel processors.
Dead code elimination and moving loop-invariant computations out of
loops are the next optimizations to consider. And CSE...

Another compiler, for an M68000, turned out to be a very quick&dirty
port of an M6800 compiler. According to "an int is a pointer, a pointer
is an int" it kept all operands in the M68K *address* registers, so that
every logical or arithmetic operation had to be done in a subroutine,
which moved the operands from A0 and A1 into *data* registers, evaluated
and stored back the result into A0. The caller had to move the operands
into A0 and A1 before, of course. In my test programs every single C
statement resulted in about 3 pages of assembler code! [1]

But despite that horrible code generation, this compiler came with a
very powerful graphics and window software, with amazing runtime
behaviour. This observation again relativates the need for optimization,
at least in GUI applications.

[1] This compiler, and my rudimentary knowledge of C and compilers and
runtime libraries, made me write my first C decompiler in the late 80's.
Of course in Basic, which was the language I had used on all my
homecomputers before. Hereby I learned more and more about bad and good
practices in compiler writing, encountering any number of workarounds
for adopting old 8-bit compilers to 16/32 bit code and memory layouts.

DoDi

comp...@is-not-my.name

unread,

Apr 22, 2012, 6:10:21 AM4/22/12

to

> Aharon (Arnold) Robbins wrote:
>
> It may be showing its age, but I think that "Crafting A Compiler in C"
> by Fischer and LeBlanc is worth looking at. The original "Crafting
> A Compiler" was written in the late 80s when Ada was going to conquer the
> world and was also pushing compiler technology. Circa 1991 I translated
> the Ada code and pseudo-code into C for the C version of the book. Once
> I did that I finally understood how LR parsers work. :-)

The Ada version sounds intriguing. Thanks for mentioning it. I've done
similar things with non-compiler projects (translate something from one
language to another very different one). It helps greatly with understanding
as you said.

Uli Kusterer

unread,

Apr 22, 2012, 6:53:45 AM4/22/12

to

On 22.04.2012, at 03:58, BGB wrote:
> as-is, the scripting language is probably at a vaguely similar
> complexity level with C (in some ways it is simpler, and in others
> likely more complex), as it gradually moves in the direction of being
> "essentially" statically-typed, has pointers, class/instance OO, ...

My programming language, Hammer, goes in the opposite direction. It's
essentially dynamically typed scripting language (at least as far as the user
is concerned). Every variable is a variant, but I keep values around in their
native type. Also, the syntax is very English-like and the intention is that a
user *can not* make it crash. Of course, in practice that may not hold true
due to bugs, but it is different than C, where there are a lot of "user
errors" that are well in their rights to cause a crash.

> (decided to leave out a bunch of stuff related to some of the hassles of
> static type-checking, and dealing with the matter of having a scoping
> model where the scope is both mutable and involves indirect visibility,
> meaning that variable assignments can alter what bindings may be
> potentially visible from a given piece of code, making determining all
> this statically a relatively non-trivial problem).

Yeah, that sounds more like it ;-)

> now, how is any of this simpler than a C compiler?
> at least at present my main target is an interpreter, and weak
> performance is generally passable.

The nice part about writing your own interpreter is that you don't have to
mess as much with predefined platform specifics. You can define instructions
the way you need them to make them easy to parse. Also, it's easy to write a
debugger into your interpreter loop, and check out the contents of your stack
and lots of other things.

> although I use recursive descent, the above sounds different from what I
> usually do.

> (...) typically, the result of all of this is to produce a syntax tree.

That may be more due to my lacking English skills than actual differences.
I've simplified a lot, and left out expression parsing in the description, to
get the big picture across. My parser and syntax tree are written in C++ and
nodes have a class hierarchy starting at a basic Node and progressing to
things like a function call (with an array of sub-nodes for parameters, and a
"name" string field) etc.

My expression parser is kinda fun, because it essentially walks the subtree
of the expression currently parsed, looking at the precedence of each operator
(a simple numeric value), and then either inserts itself as the next param of
the deepest, rightmost node, or inserts itself in the middle of the tree
somewhere, taking what was previously in its slot as its left argument. So
precedence actually gets worked out during building of the tree fairly easily,
with a loop in one function call, not several functions for different
precedence levels.

That reminds me, I should check whether I've implemented right-associative
operators yet. They're fairly easy though, just a slight flip on the criteria
when something gets appended or inserted.

> C is a little uglier here, since it requires keeping track of typedefs
> and checking for typedef when trying to parse declarations.

My scripting language doesn't really have declarations, but it has something
similar in that any identifier can be used as a one-word string constant in
many places, and there aren't really any reserved identifiers. So originally I
used to look if there was a variable of that name already in use, and then
generated a variable reference, otherwise treated it as an unquoted string
constant.

Later, I realized that the former is sufficient, if I pre-fill the variable
with its own name as a string. That way, my "do" command (think eval(), it
executes a string as code, and has access to the current function's local
variables etc.) can contain a command that uses what I think is a constant is
turned into a variable. The same thing could happen in loop conditions, where
the first time round nothing has been put into a variable yet, so it's treated
as a string, then inside the loop it gets turned into a variable.

Fun little details of programming languages :-)

> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.
>
> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).
>
> the version used in my C compiler currently uses a small hash table
> (sort of, the "hash function" is simply to keep only the low-order bits
> of the address) to avoid re-reading the same token multiple times.
>
> practically though, it may be both higher performance and generally
> cleaner to just build a token array up-front.

Yeah, my main reason was that my language, Hammer, due to its verbose,
english-like syntax, can require a lot of look-ahead. So it just made the code
much simpler, and much, much more maintainable. Also, it's very convenient to
have a struct with everything we know about the current token while debugging,
including, if you want, its line number to more easily find it in the source
code than using a pointer or global offset.

> I had considered using a table in the C compiler, but the "hash" option
> was less effort (the parser was already written in this case, so the
> hash could be added via a quick hack, rather than by making significant
> changes to the rest of the parser to use an token array instead of
> direct function calls).

I actually have myToken = GoNextToken(), myToken = GoPrevToken() convenience
calls around my token array.

> a potential advantage of not using an array though is not needing to
> allocate memory for it.

Definitely, though the overhead is small if you reference your code instead
of storing copies of strings, and stick to small numeric values for the rest
of the ivars, too. Depending on average token length, it comes out to about
the same size as your source script, or slightly larger allowing for lots of
operators etc.

>> - Syntax tree: Build a tree structure that represents the parsed program.
You
>> can then do things like evaluate constant nodes ahead of time (e.g. turn
5+4
>> into 9, but leave 5 + n as an addition node with a 5 and a reference to
>> variable 'n') by replacing groups of nodes with simpler nodes recursively.
>> Keep around the line number/first token offset in each node so you can
report
>> errors.
>
> agreed.
>
> in my case, I have generally used either "lists" built from "cons cells"
> for this purpose, or in other cases a variant of XML DOM nodes.

So I guess you're building a more dynamic structure, not unlike a
dictionary/hash map? My low-level programming instincts kinda yell at me that
this is inefficient, but honestly, the parse tree is probably the last place
where one needs to worry about that. At least until one actually finds a
situation where one runs out of memory or finds it to be a performance
bottleneck. Premature optimization, root of all evil, yadda, yadda.

> I had actually thought this was fairly standard practice, until looking
> into it and observing that, no, apparently most people store their
> parse-trees in raw structures. either way I guess.

Yeah, I use C++ classes, but then it could be argued that C++ doesn't really
have *real* classes, so I guess it's just a glorified struct as well. I have
some virtual methods, which make walking the tree as easy as kicking off a
recursive function call, but apart from that, they're very struct-y.

> typically, my parsers do almost no direct processing on the ASTs while
> building them, apart from building a semantic representation of the code
> as-seen.

I certainly don't simplify nodes *while* building the tree. I actually
intentionally start out with the stupidest representation that is still
correct (i.e. I pick where to insert expression nodes so evaluating the tree
gives the right result, but that's it). That way, I can see parser errors
easily and early on. I can debug-dump my tree with a recursive function call.

And *then* I call Simplify(). Which does a recursive depth-first traversal
and does simple optimizations. So each type of node only needs to know how to
simplify itself. E.g. operators know how to check their operands whether they
are constant or not (*after* calling Simplify() on them), and can then replace
themselves with their result if both are.

> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.

Yes, that sounds about what I do. I just used a dumber word :-)

> at this point, things vary go differently, and several more stages are
> involved in my case:
>
> - typically, after the AST has been "reduced", it will be transformed
> into bytecode (basically, walk the AST, figure out expression types when
> relevant, and spit out any relevant bytecode ops).

Yup. I recursively call GenerateCode(), passing a CodeBlock object into which
the raw instructions get written.

> previously, only a very small amount of type-analysis was done here, but
> I am looking to expand this (partly to simplify the work being done in
> the interpreter back-end), such that the output produced here will be
> "mostly statically typed".

I don't have that currently, but what I did in an earlier approach (where my
code actually compiled to C++), I had a "nail down types" phase. What it
essentially did was ask up the hierarchy to find out which types each
expression could have with its current parameters (I have about 5 types, so it
was just a bit field), and then pick the narrowest type that fit. Often it
turned out that somewhere at the end there was a +1 or so, which meant I ended
up with the widest numeric type or so. Or there was concatenation involved and
I knew the end result would be a string. But of course, often it was just a
variant as well.

> I am looking at going to producing bytecode output more similar to
> Java-ByteCode, basically where many opcodes are qualified for specific
> types, rather than leaving it solely to the back-end to figure out what
> the types are. this is a little lame, given I found it elegant how, for
> example, MSIL/CIL had just used a few simple operations, and the
> back-end would do its thing, but some things are currently more awkward
> here, and at-present this leaves some awkward holes in the type-system.

Yeah. In an early version, I just chickened out and always picked the widest
type. I.e. the + operator just always did floating point maths. These days, I
check the operand types at runtime and fall back on integer maths for stuff
like addition, subtraction and multiplication when possible.

> it all still leaves a bit of uncertainty though (as to whether
> adding/using piles of type-annotated opcodes is really "the right
> direction to be going").

It might give you better performance, particularly once you go JIT (if you
do). And it saves you a bunch of branching instructions during runtime, which
are a major slowdown these days on modern CPUs.

> - typically (assuming the interpreter), the bytecode will be processed
> and then transformed into threaded code, producing a threaded-code
> version of the program. there are some parts which use generated native
> code thunks, but for the most part, pretty much all of the machinery
> takes place in native C functions. as-is it proves problematic to do any
> non-trivial scope and type-analysis at this stage, short of making the
> threaded-code logic contain most of the "hard parts" of a full native
> code-generator (as well as likely making it considerably slower).

"threaded-code"? *looks it up on Wikipedia* Ah, so that's what it's called
when you just spit out a bunch of CALL instructions... Good to know :-) I
build a bytecode, with an index into an instruction array and two param
fields. I thought about generating CALL instructions instead, but it's easier
to debug this way, and subroutines in my language all have a variable number
of parameters, so I thought the advantage of avoiding one function pointer
dereference probably won't gain me much. And I'm still in the "get this dang
thing working" phase, after all.

> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).
>
> however, at present it is considerably slower (at least with tests like
> the recursive Fibonacci sequence and bubble-sort, with most "practical"
> code the slowdown seems to be smaller, but typically because most of the
> actual logic takes place in native code in these cases).

Yeah. My experiences are similar. Hammer is what the developer of another
HyperTalk-descended language called a "Very High Level Language" and is more
of the CISC than the RISC mindset, so the number of actual bytecode
instructions is comparatively small compared to the amount of actual
instructions the functions behind each bytecode are made up of.

> - if a JIT / native code-generator is being used, the process works like
> this:
> complex operations are generally decomposed into simpler ones, and much
> of the type information from the prior stage is discarded (the
> code-generator uses its own type-analysis, unlike the "dumb"
> interpreter, 1);

That might actually make sense, considering all the temporaries, registers
and whatever that native code "adds" to the original source code. Will have to
keep in mind if I'm ever crazy enough to do that JIT :-)

> - COFF modules are fed into an in-program linker, which proceeds to link
> them against the current program image. there is some amount of trickery
> here, involving potentially both invoking code-generation handlers, as
> well as mechanisms to determine whether or not to "proxy" function calls
> (IOW: whether to link a function call directly against the target
> address, or use an indirect jump through another pointer).
>
> reasons a proxy may be used: the call is out-of-range (on x86-64, this
> usually means the call has gone outside the +-2GB window), or, either
> the code has been "hot patched" or the linker has reason to suspect that
> such "hot patching" is likely to occur (currently, this involves
> heuristics).

Yeah, some of my early native-code tests also had a custom loader (seems to
be identical to your "in-program linker"). I essentially stored the offset to
the trampoline "external symbol table" in the file, and then fixed up the CALL
instructions to point to the symbols I looked up at runtime.

> the strategy used by the codegen backend for my C compiler involved
> "large numbers of special cases", but this ultimately proved a bit
> problematic to debug (this type of "optimization" turns out to be a
> debugging mess in most cases where one doesn't have fairly immediate
> feedback as to when/where/how something has broken).

Yeah. At the least one would have to be very disciplined during code
generation and keep information on all the jump-in and jump-out points, so one
doesn't combine two instructions from different branches. But then I suppose
one would have to keep track of those anyway, so all offsets can be fixed up
if an instruction is removed.

> funny though, is even just using "straightforward" strategies, like
> caching variables in registers, ... and the code still manages to be a
> bit more optimized-seeming than what usually seems to end up being spit
> out by MSVC.

Oooo... burrrrn ;-)

> I have not often been entirely impressed by what I have seen being spit
> out by MSVC, and my own code-generators have often been able to
> outperform it (and be competitive with GCC in my tests), however...:
> making my code generators "actually work" often proves a bit more
difficult.

TBH, it's easy to generate faster code if it doesn't have to be correct. :-p
Isn't that how SSE and OpenGL and all that stuff came about? They found some
trade-offs they could make for a particular problem domain that would make it
faster. Effectively, you're making a trade-off for correctness ;-)

> hence why I am still mostly using an interpreter-based strategy:
> it is much easier IME to maintain and debug the interpreter, whereas IME
> my native code generators have more often tended to blow up in my face
> than actually work reliably, and it is much more difficult than it would
> seem to dig through a big pile of assembler output looking for whatever
> little bug is causing the code not to work.

Fully agreed. Hence the suggestion to generate bytecode first. But of course,
if you plan to make an actual native code compiler, pointers in that direction
don't necessarily hurt.

> only "I am using an interpreter" sounds a lot less impressive than "I am
> using a JIT compiler", although I remember at least one VM I know of
> which had claimed to be using a JIT, but the thing was using stupid
> threaded code (in pretty much the same way I am using it in my
> interpreter), so I was not terribly impressed...

I would like to see performance comparison between the two, say on ARM and
Intel. Does a function pointer de-ref and a loop really make that much
difference in practice? OK, Hammer has lots of slowdowns, and its main loop
isn't that simple, so there's a few exit flags to check each go through the
loop, and a call to update a source level debugger, but hey...

> I guess there would always be a "cheap but lame" way to write a JIT
> (and, actually, how my first working JIT worked):
> just use fixed-form globs of ASM for each VM opcode, and simply append
> them together in-series (emitting prologues/epilogues and labels and
> so-on as-needed).

Actually, I could imagine that, with a good optimizer, that is probably the
common way of doing things. And it's also easy to maintain.

Cheers,
-- Uli Kusterer
http://stacksmith.org

comp...@is-not-my.name

unread,

Apr 22, 2012, 7:10:27 AM4/22/12

to

glen herrmannsfeldt <g...@nospam.ugcs.caltech.edu> wrote:

> One Wirth language that you might find interesting is PL/360.
>
> PL/360 looks like a high-level language but works like assembly
> language. As an example (which I am remembering from 40 years ago)

I have the doc and compiler code, everything was released into the
public domain. It's so close to assembler I didn't consider it might
be a good way to learn to write a compiler but maybe it is. A 10
second look shows it's pretty heavily abstracted. I will have to spend
more time on it but it may be more a tribute to the traditional Wirth
terseness than something to learn from, at least without the professor
around to ask questions of.

> But I don't understand your refusal to use the tools that are
> available.

As I said they're not available on my target platform.

> FLEX and BISON are freely available, you can't complain that they cost too
> much.

True but irrelevant!

> You can run them on a freely available OS (Linux, FreeBSD, Solaris, etc.)
> on machines that you can find for very low prices, or often enough given
> away.

Ok but those aren't my targets. I'm not interested in using those for this
project, as I said. And I would really like to understand what I am doing
and the way I have always done that is to write my own code. Why is that
upsetting (hard to understand, etc.) to you? I haven't mentioned the cost of
anything, I'm not sure where you are coming from here.

> The nice thing about the tools is that you can get something running
> fairly fast, and without needing to get too deep into the math. You
> can go as deep or shallow into the innards of FLEX and BISON as you
> want. One project that should be about right for one person, and
> without a lot of math, is rewriting FLEX and BISON to generate code in
> another language, such as PL/I.

It would be nice to "get something running fairly fast" but if I do that
depending on other pieces I don't understand it doesn't really help me. I
want to learn as much as I can doing this.

> [PL/360 was a great little language, but the source code to the
> compiler was apparently lost. -John]

I believe Jay Maynard is hosting several PL/360 packages. AFAIK they are
complete.
[If so, they'd be a good place to start. It's basically an assembler with
Algol syntax, so it has a real parser. -John]

Uli Kusterer

unread,

Apr 22, 2012, 7:12:25 AM4/22/12

to

On 22.04.2012, at 12:53, Uli Kusterer wrote:
> Fully agreed. Hence the suggestion to generate bytecode first. But of
course, if you plan to make an actual native code compiler, pointers in that
direction don't necessarily hurt.

Huh. Just realized I never actually *wrote* that suggestion, it seems. So I
guess I should re-qualify that as:

Yes, I agree, it is actually a good idea to start with a simple bytecode
interpreter first and to compile against that, before you try to generate your
own assembler. It may look like more work, but it lets you get one module
working before you have to figure out the other one.

Thanks for pointing that out, BGB.

While we're on things I wrote that are missing something I wanted to say:

> I build a bytecode, with an index into an instruction array and two param
fields. I thought about generating CALL instructions instead, but it's easier
to debug this way, and subroutines in my language all have a variable number
of parameters, so I thought the advantage of avoiding one function pointer
dereference probably won't gain me much. And I'm still in the "get this dang
thing working" phase, after all.

Another reason I'm doing the bytecode is that I'm hoping I can eventually
just save that to disk. I would love to get this thing running on iOS
eventually, and having files contain compiled x86 code in them wouldn't go
over too well on that ARM CPU :-) Also, if I save byte code (right now I save
source code) would be a slight added level of obfuscation for people who want
to ship their programs commercially (compared to aforementioned source code),
and the ability to make sure only instructions I consider "safe" can actually
be used.

Cheers,
-- Uli Kusterer
http://stacksmith.org

BartC

unread,

Apr 22, 2012, 7:51:44 AM4/22/12

to

"BGB" <cr8...@hotmail.com> wrote in message

> On 4/21/2012 2:22 AM, Uli Kusterer wrote:

>> - Recursive descent parsers: It's the obvious way to write a parser.

> although I use recursive descent, the above sounds different from what I
> usually do.

> ReadStatement:
> checks for and handles vaious statement types
> calls ReadExpression
>
> ReadExpression:
> (actually, this is a tower of functions, one for each precedence
> level, working from lowest to highest precedence)

Sounds like it's influenced by the C grammar, which defines expressions
using something like 13 or 17 layers of precedence.

Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
don't use any precedence in the grammar at all; I have precedence as an
attribute of an operator, and an expression can be parsed with a single
function.

Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
the binary operators linking successive terms, while readterm() does all
the real work (since my syntax doesn't distinguish between expressions and
statements, that's quite a big workload).

>> - Tokenizing: Essentially grab all the words in your source text and
>> build an array with an entry for each so you can more quickly walk
>> forward and backward without having to scan individual characters.

> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.

> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).

I've tried reading all the tokens in a separate pass, but didn't really like
it. And it takes a lot more storage as well, especially with macro
expansions.

Instead I read them as I go along, but with provision for a one-symbol
look-ahead.

>> - Syntax tree: Build a tree structure that represents the parsed program.
>> You

> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.

I use the following passes (which seem to be fairly typical):

Syntax analysis (lexing and parsing)
Name resolution (a recent introduction for me)
Type analysis (static type checks and coercions, constant folding)
Code generation (to intermediate code or to byte-code)
Final pass (from intermediate code to the target code)

Usually invoked one after the other for the entire module, where a
compile-time expressions is needed, then the first three passes have to be
done immediately (and the result had better be a constant value..)

I use the same structure now when generating byte-code (originally such a
compiler was just single-pass). Because such code is usually dynamically
typed, the type analysis pass only needs a nominal amount of work, but still
takes care of a few things (l-values for example).

> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).

Assuming you're talking about dynamic typing, I found it difficult to get
within 3-5x, unless some sort of type hinting is used, or perhaps you're
comparing with not too highly optimised native code. Or it's code that is
memory-intensive, then memory access will dominate.

--
Bartc

Tomasz Kowaltowski

unread,

Apr 22, 2012, 8:55:40 AM4/22/12

to

> [... And I have to

> say that if you have CS degree and are unable to figure out what a
> LALR parser does, there's something wrong with your CS degree. -John]

I agree with our moderator and am somewhat surprised by this
discussion. IMHO compiler construction requires knowledge of many
different techniques and is usually an advanced course in CS
undergraduate programs. Trying to do it without knowing the basics
and lots of experience may be an amusing pastime but cannot be
considered a serious endeavor. I don't mean you necessarily need a
formal CS degree but you do have be able to read (and understand!)
more advanced material.

-- tk

PS: But literature review was very interesting!

Uli Kusterer

unread,

Apr 22, 2012, 10:18:32 AM4/22/12

to

On 21.04.2012, at 10:53, Joe Schmo wrote:
> I think the whole "write a grammar and feed it through a tool to produce a
> lexer and parser" thing is something to avoid, at least at first (I'm
> avoiding it like the plague, FWIW).

While I generally think that compiler-compilers and parser and lexer
generators are great domain-specific languages that help streamline the
process (at least if your language fits in their constraints, which for
example my pet language doesn't do), I also had huge problems using them at
first.

Without knowing the problems and the general approach that is taken when
writing a parser, some of the things these tools do seem awfully roundabout
and over-complicated. It helped me a lot to just write my own lexers and
parsers, because then it suddenly became apparent what the problems *are* that
these tools protect me from.

Dmitry A. Kazakov

unread,

Apr 22, 2012, 12:18:30 PM4/22/12

to

On Sun, 22 Apr 2012 12:51:44 +0100, BartC wrote:

> "BGB" <cr8...@hotmail.com> wrote in message
>> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>
>> although I use recursive descent, the above sounds different from what I
>> usually do.
>
>> ReadStatement:
>> checks for and handles vaious statement types
>> calls ReadExpression
>>
>> ReadExpression:
>> (actually, this is a tower of functions, one for each precedence
>> level, working from lowest to highest precedence)
>
> Sounds like it's influenced by the C grammar, which defines expressions
> using something like 13 or 17 layers of precedence.

Rather by an attempt to describe precedence using grammar means.

> Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
> don't use any precedence in the grammar at all; I have precedence as an
> attribute of an operator, and an expression can be parsed with a single
> function.

Yes, there is a very simple technique using two stacks, one for operands
another for operations.

> Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
> the binary operators linking successive terms, while readterm() does all
> the real work (since my syntax doesn't distinguish between expressions and
> statements, that's quite a big workload).

Actually operations have two priorities; left and right. When left < right
you have left to right association. When right > left, it becomes right to
left.

Some operators may have these priorities sufficiently different. For
example the assignment operator. If your unlucky languages allows it, then
A+B = C+D better be A+(B=(C+D)). That would require the following order of
partial priorities:

LP("=") << LP( "+") < RP("+") << RP("=")

> Instead I read them as I go along, but with provision for a one-symbol
> look-ahead.

Same here. Except that I have one token look-ahead.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

BGB

unread,

Apr 22, 2012, 3:29:27 PM4/22/12

to

On 4/22/2012 3:53 AM, Uli Kusterer wrote:
> On 22.04.2012, at 03:58, BGB wrote:
>> as-is, the scripting language is probably at a vaguely similar
>> complexity level with C (in some ways it is simpler, and in others
>> likely more complex), as it gradually moves in the direction of being
>> "essentially" statically-typed, has pointers, class/instance OO, ...
>
> My programming language, Hammer, goes in the opposite direction. It's
> essentially dynamically typed scripting language (at least as far as the user
> is concerned). Every variable is a variant, but I keep values around in their
> native type. Also, the syntax is very English-like and the intention is that a
> user *can not* make it crash. Of course, in practice that may not hold true
> due to bugs, but it is different than C, where there are a lot of "user
> errors" that are well in their rights to cause a crash.

this is not to say that the language in my case is not dynamically-typed
WRT the code the user types, but the objective is to essentially limit
what about of dynamic type-checking is used within the VM, as this can
hurt performance.

the language in my case is actually much closer to being an ECMAScript
variant, with some type annotations, and some basic level of
type-inference (it may attempt to "predict" the type of an untagged
variable at a given point in time, based on whatever was last assigned
into it).

as-is, in my VM, much of the code still ends up using dynamic
type-checking, and the goal is to shave this down somewhat (as well as
making what "is known" generally more effective).

>> (decided to leave out a bunch of stuff related to some of the hassles of
>> static type-checking, and dealing with the matter of having a scoping
>> model where the scope is both mutable and involves indirect visibility,
>> meaning that variable assignments can alter what bindings may be
>> potentially visible from a given piece of code, making determining all
>> this statically a relatively non-trivial problem).
>
> Yeah, that sounds more like it ;-)

yeah.

basically, I use a scoping model itself vaguely derived from the Self
language, rather than the (more traditional) "pure lexical scoping" model.

internally, packages are objects, and imports are variable declarations
(declaring a "delegate variable" which links to the imported package).

a problem though is that it is problematic to try to statically
determine what all is potentially visible is fairly difficult. at the
moment, this largely means that anything being pulled in from a package
import uses dynamic checking.

a "cheap trick" here in the language semantics is that the visibility
order for any binding not directly visible as part of the lexical scope
is not determined. this allows the VM to use what it can figure out in
preference to "falling back" to a generic dynamically-typed lookup.

besides this, caching via "lookup hashes" is also used (so, the pure
dynamic case isn't necessarily "that" bad).

>> now, how is any of this simpler than a C compiler?
>> at least at present my main target is an interpreter, and weak
>> performance is generally passable.
>
> The nice part about writing your own interpreter is that you don't have to
> mess as much with predefined platform specifics. You can define instructions
> the way you need them to make them easy to parse. Also, it's easy to write a
> debugger into your interpreter loop, and check out the contents of your stack
> and lots of other things.

pretty much.

one can also go and compile C code to an interpreter as well,
essentially allowing for a "portable C analogue". Quake3 actually did
something like this as well.

>> although I use recursive descent, the above sounds different from what I
>> usually do.
>> (...) typically, the result of all of this is to produce a syntax tree.
>
> That may be more due to my lacking English skills than actual differences.
> I've simplified a lot, and left out expression parsing in the description, to
> get the big picture across. My parser and syntax tree are written in C++ and
> nodes have a class hierarchy starting at a basic Node and progressing to
> things like a function call (with an array of sub-nodes for parameters, and a
> "name" string field) etc.

yep.

as noted before, I have used either lists or XML nodes here (either
using a Scheme-like AST structure, or a different AST structure loosely
derived from XML-RPC, 1).

in my case, my whole VM framework is pretty much plain C, so the
alternative would be using structs.

I am not sure what would be best overall, as each has tradeoffs.

1: one of my interpreters had an origin story of roughly hacking
together a mock-up parser for a JavaScript style language on top of a
hacked-up version of XML-RPC (which I had previously implemented support
for). overall, this was probably one of the worst-performing
interpreters I had ever written.

I have little idea how slow it was, but do know this much:
at the time, all values were boxed heap objects;
at the time, checking the type of an object was O(n), essentially a
linear search over every object in the heap;
initially, it directly interpreted the XML nodes, but later switched to
using a 16-bit word-code (and using things like jumps for implementing
loops, rather than special instructions and blocks);
...

the one which directly followed (a different form of the same language)
was one of my fastest, but its use of precise GC and similar made it
very painful to work with.

then the third (partly recombining parts of the last two) was "sort of
in the middle". this one is the direct ancestor of the current
interpreter, which has since-then been mostly undergoing incremental
changes.

> My expression parser is kinda fun, because it essentially walks the subtree
> of the expression currently parsed, looking at the precedence of each operator
> (a simple numeric value), and then either inserts itself as the next param of
> the deepest, rightmost node, or inserts itself in the middle of the tree
> somewhere, taking what was previously in its slot as its left argument. So
> precedence actually gets worked out during building of the tree fairly easily,
> with a loop in one function call, not several functions for different
> precedence levels.

interesting.

I had just used copy/paste/edit a bunch of times to make the various
precedence levels, so either way works I guess.

IIRC, at one point I had used naive left-associative arithmetic for all
operators (with the first horridly-slow VM), before noting that I could
copy/paste the function to get different levels.

> That reminds me, I should check whether I've implemented right-associative
> operators yet. They're fairly easy though, just a slight flip on the criteria
> when something gets appended or inserted.

yes, or one can use recursion for the right-node at the same precedence
level, for the "stack of functions" strategy.

>> C is a little uglier here, since it requires keeping track of typedefs
>> and checking for typedef when trying to parse declarations.
>
> My scripting language doesn't really have declarations, but it has something
> similar in that any identifier can be used as a one-word string constant in
> many places, and there aren't really any reserved identifiers. So originally I
> used to look if there was a variable of that name already in use, and then
> generated a variable reference, otherwise treated it as an unquoted string
> constant.
>
> Later, I realized that the former is sufficient, if I pre-fill the variable
> with its own name as a string. That way, my "do" command (think eval(), it
> executes a string as code, and has access to the current function's local
> variables etc.) can contain a command that uses what I think is a constant is
> turned into a variable. The same thing could happen in loop conditions, where
> the first time round nothing has been put into a variable yet, so it's treated
> as a string, then inside the loop it gets turned into a variable.
>
> Fun little details of programming languages :-)

I am not entirely sure I understand how this would work.

my language actually has two basic types of declarations:
via the "var" and "function" keywords;
via a different strategy (for using "C-like" declarations).

so, for example:
"var x;" or "var i:int;" or ...
"function foo(x, y) { ... }";
...

the alternative strategy relies on another detail:
<identifier> <identifier> ...

is not actually valid anywhere else in the syntax at statement level.
so, basically, if a construction like:
<identifier> <identifier>;
<modifier*> <identifier> <identifier>;
...

is seen, the parser can fairly safely assume it is a variable declaration.

actually, the rules are a little more complicated:
<modifier*> <type_expression> <identifier> ...

where type-expression is an special expression which is also a
syntactically valid type, consisting of, potentially:
several prefix modifier tokens (*, &, ...);
a known type-name keyword, or an identifier;
several allowed suffixes ("[]", "[...]", "<...>", "...", ...).

so, for example, "*int[] arr;", would declare an array of
integer-pointers (well, as would "var arr:*int[];").

there is a little annoyance though, from a design POV, of having two
clearly redundant declaration syntax forms is, well, redundant.

>> partly agreed, except that it isn't really strictly necessary to build
>> an array up-front.
>>
>> most of my parsers don't bother using an array, but instead just call
>> the tokenizer function directly to peek and parse tokens based on the
>> current "stream position" (generally a raw char pointer in my language
>> parsers).
>>
>> the version used in my C compiler currently uses a small hash table
>> (sort of, the "hash function" is simply to keep only the low-order bits
>> of the address) to avoid re-reading the same token multiple times.
>>
>> practically though, it may be both higher performance and generally
>> cleaner to just build a token array up-front.
>
> Yeah, my main reason was that my language, Hammer, due to its verbose,
> english-like syntax, can require a lot of look-ahead. So it just made the code
> much simpler, and much, much more maintainable. Also, it's very convenient to
> have a struct with everything we know about the current token while debugging,
> including, if you want, its line number to more easily find it in the source
> code than using a pointer or global offset.

yeah...

if I want the line number, the typical strategy is to walk from the
start of the buffer to the current position and count lines.

this didn't work for C though, due to preprocessing, so a different
strategy was used:
nearly every line of PP output is annotated with the current line-number
(though it could be potentially skipped for the case of consecutive line
numbers).

it is lame, but hell, it works...

>> I had considered using a table in the C compiler, but the "hash" option
>> was less effort (the parser was already written in this case, so the
>> hash could be added via a quick hack, rather than by making significant
>> changes to the rest of the parser to use an token array instead of
>> direct function calls).
>
> I actually have myToken = GoNextToken(), myToken = GoPrevToken() convenience
> calls around my token array.

yeah, that works.

>> a potential advantage of not using an array though is not needing to
>> allocate memory for it.
>
> Definitely, though the overhead is small if you reference your code instead
> of storing copies of strings, and stick to small numeric values for the rest
> of the ivars, too. Depending on average token length, it comes out to about
> the same size as your source script, or slightly larger allowing for lots of
> operators etc.

yes, but at a potential cost, depending if the language has or how it
implements things like unicode escapes.

<snip>

>>
>> agreed.
>>
>> in my case, I have generally used either "lists" built from "cons cells"
>> for this purpose, or in other cases a variant of XML DOM nodes.
>
> So I guess you're building a more dynamic structure, not unlike a
> dictionary/hash map? My low-level programming instincts kinda yell at me that
> this is inefficient, but honestly, the parse tree is probably the last place
> where one needs to worry about that. At least until one actually finds a
> situation where one runs out of memory or finds it to be a performance
> bottleneck. Premature optimization, root of all evil, yadda, yadda.

yeah.

it is more worrisome for the C parser because:
it parses chunks of code that are often measured in the MB range (like
what happens if "windows.h" or similar gets in the mix, esp the MSVC
version which results in about 8MB of output);
because the C parser uses XML nodes.

but, oddly enough, this is not apparently the main source of slowness.
the tokenizer tends to bog down considerably more than any code having
much to do with XML, and the preprocessor eats up more time than
building the AST.

granted, the XML implementation is itself somewhat micro-optimized,
IIRC, with direct support for numeric-valued attributes, omission of
unnecessary features for this use-case (such as namespaces), ....

so, in all, it doesn't seem "too terrible".

however, granted, the parser may take easily 500ms or 2s or so to parse
a source module if large headers are involved, so it isn't free.

given that GCC is similarly slow, I am probably not too far off.
MSVC tears through code much more quickly though, making me suspect MSVC
has some sort of "scary-fast parser" going on.

>> I had actually thought this was fairly standard practice, until looking
>> into it and observing that, no, apparently most people store their
>> parse-trees in raw structures. either way I guess.
>
> Yeah, I use C++ classes, but then it could be argued that C++ doesn't really
> have *real* classes, so I guess it's just a glorified struct as well. I have
> some virtual methods, which make walking the tree as easy as kicking off a
> recursive function call, but apart from that, they're very struct-y.

yeah.

I don't think the present form of ASTs is all that bad though as, after
all, I can quickly dump them to or read them from files, and (at least
in concept) it allows people to more easily write independent code
compatible with the parser or compiler back-end.

not that it probably matters all that much though, more just a quirk of
my personal history it seems.

>> typically, my parsers do almost no direct processing on the ASTs while
>> building them, apart from building a semantic representation of the code
>> as-seen.
>
> I certainly don't simplify nodes *while* building the tree. I actually
> intentionally start out with the stupidest representation that is still
> correct (i.e. I pick where to insert expression nodes so evaluating the tree
> gives the right result, but that's it). That way, I can see parser errors
> easily and early on. I can debug-dump my tree with a recursive function call.

agreed.

> And *then* I call Simplify(). Which does a recursive depth-first traversal
> and does simple optimizations. So each type of node only needs to know how to
> simplify itself. E.g. operators know how to check their operands whether they
> are constant or not (*after* calling Simplify() on them), and can then replace
> themselves with their result if both are.

yeah, this sounds similar to how my "reduce" process works.
there is a little context though, used mostly for things like constant
propagation and similar.

>> typically, things like simplifying expressions, propagating constants,
>> ... is done in a stage I have typically called "reduction", which
>> happens between parsing and prior to producing (bytecode) output.
>
> Yes, that sounds about what I do. I just used a dumber word :-)

fair enough.

>> at this point, things vary go differently, and several more stages are
>> involved in my case:
>>
>> - typically, after the AST has been "reduced", it will be transformed
>> into bytecode (basically, walk the AST, figure out expression types when
>> relevant, and spit out any relevant bytecode ops).
>
> Yup. I recursively call GenerateCode(), passing a CodeBlock object into which
> the raw instructions get written.

sounds functionally equivalent, though terms like "Compile" and "Emit"
(ex: "CompileStatement" or "EmitOpcode") are used instead, and the
target structure is called a "Context".

>> previously, only a very small amount of type-analysis was done here, but
>> I am looking to expand this (partly to simplify the work being done in
>> the interpreter back-end), such that the output produced here will be
>> "mostly statically typed".
>
> I don't have that currently, but what I did in an earlier approach (where my
> code actually compiled to C++), I had a "nail down types" phase. What it
> essentially did was ask up the hierarchy to find out which types each
> expression could have with its current parameters (I have about 5 types, so it
> was just a bit field), and then pick the narrowest type that fit. Often it
> turned out that somewhere at the end there was a +1 or so, which meant I ended
> up with the widest numeric type or so. Or there was concatenation involved and
> I knew the end result would be a string. But of course, often it was just a
> variant as well.

the current "real" type-system is considerably more complex.

as far as the VM itself is concerned, a simpler type-model is used:
ILFDV (Int, Long, Float, Double, Variant).

most other types map to Variant, which is (partly) subdivided:
FN: "Fixnum", actually means "any integer types represented as a variant
and within the int32 range" (originally, it did mean, specifically,
"fixnum" in prior VMs, but was relaxed to "int32 range", which may mean
either an actual fixnum or a "BVT int32");
FL: "Flonum", which represents flonum and potentially "BVT float32";
BVT: "Boxed Value Type" (should be mostly self-explanatory), these
generally represent most value types which don't fit into a fixnum.

currently (in the interpreter), the ILFD types are themselves BVT's,
mostly for sake of simplicity.

I have also recently been working on "optimizing" the handling of BVTs
somewhat, mostly so that they will be faster and less prone to leaking
memory (and making the ILFD types use dedicated free-lists rather than
general-purpose memory-allocation functions).

granted, some of this new interpreter logic consists almost more of
macros than it does of actual code (mostly because of large numbers of
very repetitive operation-handler functions).

>> I am looking at going to producing bytecode output more similar to
>> Java-ByteCode, basically where many opcodes are qualified for specific
>> types, rather than leaving it solely to the back-end to figure out what
>> the types are. this is a little lame, given I found it elegant how, for
>> example, MSIL/CIL had just used a few simple operations, and the
>> back-end would do its thing, but some things are currently more awkward
>> here, and at-present this leaves some awkward holes in the type-system.
>
> Yeah. In an early version, I just chickened out and always picked the widest
> type. I.e. the + operator just always did floating point maths. These days, I
> check the operand types at runtime and fall back on integer maths for stuff
> like addition, subtraction and multiplication when possible.
>

mostly, in my case, it was just lots of cases where the interpreter just
ends up using "variant" for everything, rather than, say, producing
threaded code specialized for 32-bit integer values or whatever else.

this means a big pile of new opcodes though, most with names like:
"ADD_XI" for "add fixed integer" and "MUL_XL_C" for "multiply fixed long
by a constant" and similar.

I don't really like this, as the interpreter "should" be smart enough to
figure all this stuff out on its own, but at present, it is not.

my native code-generator back-ends could figure this stuff out, but
granted, they had considerably more complex logic for things like
working with the type-system.

when I tried before to write an interpreter which did all of this, I
soon realized that it was working out to nearly the same functional
complexity as a native code-generator, but with worse performance,
largely defeating the point.

>> it all still leaves a bit of uncertainty though (as to whether
>> adding/using piles of type-annotated opcodes is really "the right
>> direction to be going").
>
> It might give you better performance, particularly once you go JIT (if you
> do). And it saves you a bunch of branching instructions during runtime, which
> are a major slowdown these days on modern CPUs.

well, or at least, it allows the possibility of a "better yet simpler"
JIT, since the types will already be mostly known, meaning that the JIT
wont need to be able to figure out all of this stuff to, errm, actually
work...

theoretically, with threaded-code, the run-time checks could be partly
skipped already simply by inferring the type and then choosing the
handler with the correct type, and then keeping track of type-flow and
similar when building the threaded code.

however, I have not actually generally done this, due to the relative
amount of added complexity involved.

so, although less elegant, annotating many opcodes with types could be a
lot more generally workable.

for most other cases (those not arithmetic ops), the annotations are
more likely to be in the form of "signatures" (already commonly used,
these basically encode type-information for a variable/argument list/...
into the form of a string).

>> - typically (assuming the interpreter), the bytecode will be processed
>> and then transformed into threaded code, producing a threaded-code
>> version of the program. there are some parts which use generated native
>> code thunks, but for the most part, pretty much all of the machinery
>> takes place in native C functions. as-is it proves problematic to do any
>> non-trivial scope and type-analysis at this stage, short of making the
>> threaded-code logic contain most of the "hard parts" of a full native
>> code-generator (as well as likely making it considerably slower).
>
> "threaded-code"? *looks it up on Wikipedia* Ah, so that's what it's called
> when you just spit out a bunch of CALL instructions... Good to know :-) I
> build a bytecode, with an index into an instruction array and two param
> fields. I thought about generating CALL instructions instead, but it's easier
> to debug this way, and subroutines in my language all have a variable number
> of parameters, so I thought the advantage of avoiding one function pointer
> dereference probably won't gain me much. And I'm still in the "get this dang
> thing working" phase, after all.

well, there is bytecode, just the interpreter converts it to threaded
code prior to actual execution (and just uses the threaded-code for
every time the function is called).

the actual "working logic" is, in this case, actually contained in a
mass of C functions.

I had looked it up, and concluded at the time that "treaded code" is
still technically an interpreter.

>> some past experiments (special purpose tests), have implied that
>> potentially threaded code can pull off "reasonably solid" interpreter
>> performance (potentially within 3-5x of native).
>>
>> however, at present it is considerably slower (at least with tests like
>> the recursive Fibonacci sequence and bubble-sort, with most "practical"
>> code the slowdown seems to be smaller, but typically because most of the
>> actual logic takes place in native code in these cases).
>
> Yeah. My experiences are similar. Hammer is what the developer of another
> HyperTalk-descended language called a "Very High Level Language" and is more
> of the CISC than the RISC mindset, so the number of actual bytecode
> instructions is comparatively small compared to the amount of actual
> instructions the functions behind each bytecode are made up of.

yeah.

my VM currently has a "huge" number of opcodes (several hundred),
although due to there being a number of (sometimes large) holes in the
opcode space, they are currently numbered up to 860.

previously (before adding a bunch of type-annotated arithmetic opcodes)
the limit was 540. I tried to put them all in one such "hole", but they
didn't fit, and I didn't want to break them up and spread them out.

so, I "relocated" the whole range up to a space starting at 768 (next
even multiple of 256), leading to the current range.

probably a large number of opcodes are due to "compound operations",
which perform several smaller operations in sequence. most of these
were, in turn, because a single larger piece of C code to do something
was faster than making more passes through the dispatch loop to invoke
several smaller operations.

or: a single "lpostinc" opcode is faster than "lload; dup; push_1; add;
lstore".

so, it is a tradeoff...

better performance at the cost of 100s of opcodes (at present, 459).

>> - if a JIT / native code-generator is being used, the process works like
>> this:
>> complex operations are generally decomposed into simpler ones, and much
>> of the type information from the prior stage is discarded (the
>> code-generator uses its own type-analysis, unlike the "dumb"
>> interpreter, 1);
>
> That might actually make sense, considering all the temporaries, registers
> and whatever that native code "adds" to the original source code. Will have to
> keep in mind if I'm ever crazy enough to do that JIT :-)
>

yes.

things like temporaries, register allocation, ... all serve to largely
nullify most of the gains from these complex opcodes, as the
code-generator will figure most of this out again on its own.

although it seems pointless to build compound ops merely to break them
down again in the JIT, otherwise different bytecode code would need to
be generated depending on whether the interpreter or the JIT was the
target, and decomposing the opcodes in the JIT, although slightly more
bulky, is not particularly complicated (vs pretty much everything else
going on in the thing...).

it is much like the "complexity" of a C style syntax for a programming
language:
if a person is writing a compiler or VM for a language that is much more
than a toy, the added complexity of parsing a C-like syntax is probably
the least of the worries.

>> - COFF modules are fed into an in-program linker, which proceeds to link
>> them against the current program image. there is some amount of trickery
>> here, involving potentially both invoking code-generation handlers, as
>> well as mechanisms to determine whether or not to "proxy" function calls
>> (IOW: whether to link a function call directly against the target
>> address, or use an indirect jump through another pointer).
>>
>> reasons a proxy may be used: the call is out-of-range (on x86-64, this
>> usually means the call has gone outside the +-2GB window), or, either
>> the code has been "hot patched" or the linker has reason to suspect that
>> such "hot patching" is likely to occur (currently, this involves
>> heuristics).
>
> Yeah, some of my early native-code tests also had a custom loader (seems to
> be identical to your "in-program linker"). I essentially stored the offset to
> the trampoline "external symbol table" in the file, and then fixed up the CALL
> instructions to point to the symbols I looked up at runtime.

pretty much.

the linker is basically just a large table for symbols and relocations,
and a special-purpose memory manager (for allocating memory for those
pesky code/data/bss sections).

originally, it would link things in aggressively (get handed object,
link it in directly), but I later switched to a "lazy" 2-stage strategy
(where an object is not fully linked until one of its exported symbols
is referenced) mostly to fix some nasty problems which existed
previously (linking being sensitive to object ordering, ...).

the modules used here are basically raw COFF objects though.

as-needed, the linker will also defer to OS-specific functions
("dlsym()" and "GetProcAddress()"), in order to link against DLL exports.

>> the strategy used by the codegen backend for my C compiler involved
>> "large numbers of special cases", but this ultimately proved a bit
>> problematic to debug (this type of "optimization" turns out to be a
>> debugging mess in most cases where one doesn't have fairly immediate
>> feedback as to when/where/how something has broken).
>
> Yeah. At the least one would have to be very disciplined during code
> generation and keep information on all the jump-in and jump-out points, so one
> doesn't combine two instructions from different branches. But then I suppose
> one would have to keep track of those anyway, so all offsets can be fixed up
> if an instruction is removed.

well, potentially, except I tend to produce ASM code as a giant string
to be fed into the assembler.

the assembler is, in fact, plenty fast IMO, and I speculate that by the
time one is trying to feed 10s of MB per second into the assembler, and
having this be a bottleneck, something has likely gone horribly wrong...

at first I had tried using direct code-generation, and later
function-call driven code generation, but soon found textual ASM to be
most convenient to work with, so I generally used this.

elsewhere in the VM, there are a number of cases where code generation
follows a pattern like:
Begin()
...
Print("ASM stuff...")
...
End()
...
fptr=GetAddr("foo")
...
fptr();

>> funny though, is even just using "straightforward" strategies, like
>> caching variables in registers, ... and the code still manages to be a
>> bit more optimized-seeming than what usually seems to end up being spit
>> out by MSVC.
>
> Oooo... burrrrn ;-)

I think the issue is that MSVC seems to spit out some very often rather
naive-looking code.

the bigger irony though, is that as naive looking as it often is, it
still often manages to perform reasonably well.

like, it often seems to have a hard time with things like, keeping its
variables stored in registers, as it will very often spill the contents
of a register in cases where it is not necessary, often only to load the
variable again into a different register, and often not even in obvious
cases (such as spilling the register during a jump such that all of the
jump points have the value in the same place).

I am not really sure what is going on there sometimes...

that or, maybe it is just the profiler tending to get a lot more hits on
heavily used pieces of code where the compiler decided to do something
wacky.

this is combined sometimes though with the tendency of the compiler to,
every once in a great while, actually crash during a build. usually then
bringing up a "special" send error report box (to the effect of
"Microsoft is especially interesting in collecting these error
reports"...). maybe so, a crash in the compiler is probably hardly a
good sign...

OTOH, GCC tends to produce rather optimized-looking spaghetti-code.
code at least looks efficient and preforms well, but, how does it
work?... sometimes it is not so obvious.

>> I have not often been entirely impressed by what I have seen being spit
>> out by MSVC, and my own code-generators have often been able to
>> outperform it (and be competitive with GCC in my tests), however...:
>> making my code generators "actually work" often proves a bit more
> difficult.
>
> TBH, it's easy to generate faster code if it doesn't have to be correct. :-p
> Isn't that how SSE and OpenGL and all that stuff came about? They found some
> trade-offs they could make for a particular problem domain that would make it
> faster. Effectively, you're making a trade-off for correctness ;-)

probably...

when I was writing the code generator, I used an optimization process like:
look at output;
identify any obvious inefficiencies;
figure out the logic which was involved in producing the code;
write a special case to detect the specific situation and emit an
alternate faster code sequence instead;
...

it then performed comparably with the other compilers, but lots of code
tended to break in non-obvious ways.

I later largely stopped using this code generator as:
I couldn't debug it sufficiently to make the output actually work reliably;
most of the code it contains is actually pretty horrid (whole thing is
basically bit-twiddly and special cases);
never mind its use of a multi-pass "magic bits" system, where the
compiler basically used multiple passes which operated (mostly) in
lock-step, and communicated primarily by using a synchronized byte-array
which was used typically to pass receive bit-flags from prior passes or
pass bit-flags to the next pass;
...

"magic bits" would encode things like which registers to try using (and
which have already been tried), warnings about things like potentials
for things like impending register spills, or running out of available
registers, ... (in effect, optimization by using multiple passes and a
system analogous to "road signs"). whenever the logic ran into a problem
case, it would flag a warning about it ("do not use this register here",
"insert stack padding here", "try to perform this operation without
using any additional registers", ...).

the goal would then be for the system to, between 2 and N passes
(usually with a limit of 16 or so), "converge" on an "optimal"/"stable"
output, followed by a "final" pass to actually emit the final-state code.

typically, then, the overall process looks like:
set up initial state;
while(!OutputIsStable())
Send Begin
Send all opcodes for function
Send End
Flatten final output to an ASM string.

but, sadly, this was also my high-point here:
most of my subsequent native code generators have not actually gotten
fully written, before usually falling to either "grr... this is getting
a bit hairy" or "I have more important things to be working on".

meanwhile, other parts of the VM are often moving targets, so as to add
to the challenge, ...

>> hence why I am still mostly using an interpreter-based strategy:
>> it is much easier IME to maintain and debug the interpreter, whereas IME
>> my native code generators have more often tended to blow up in my face
>> than actually work reliably, and it is much more difficult than it would
>> seem to dig through a big pile of assembler output looking for whatever
>> little bug is causing the code not to work.
>
> Fully agreed. Hence the suggestion to generate bytecode first. But of course,
> if you plan to make an actual native code compiler, pointers in that direction
> don't necessarily hurt.

actually, typically bytecode has been the intermediate stage.

however, usually most of the analysis and similar would have been left
in the back-end, leading to:
simple/naive front-end, very complex back-end.

hence, the realization that I may need to shift a little more of the
work to the front-end.

>> only "I am using an interpreter" sounds a lot less impressive than "I am
>> using a JIT compiler", although I remember at least one VM I know of
>> which had claimed to be using a JIT, but the thing was using stupid
>> threaded code (in pretty much the same way I am using it in my
>> interpreter), so I was not terribly impressed...
>
> I would like to see performance comparison between the two, say on ARM and
> Intel. Does a function pointer de-ref and a loop really make that much
> difference in practice? OK, Hammer has lots of slowdowns, and its main loop
> isn't that simple, so there's a few exit flags to check each go through the
> loop, and a call to update a source level debugger, but hey...

well, if it is like the current partially-written JIT implementation,
which just uses full dynamic checking and invoking link-time
code-generation for everything, it would probably actually work out
being slower.

the major advantage of a JIT is in being able to do things like:
produce small/compact instruction sequences;
effectively utilize the system stack and registers for storage;
...

otherwise, there is not a huge advantage over threaded code.

>> I guess there would always be a "cheap but lame" way to write a JIT
>> (and, actually, how my first working JIT worked):
>> just use fixed-form globs of ASM for each VM opcode, and simply append
>> them together in-series (emitting prologues/epilogues and labels and
>> so-on as-needed).
>
> Actually, I could imagine that, with a good optimizer, that is probably the
> common way of doing things. And it's also easy to maintain.
>

yeah.

may just go back to that strategy eventually...

BGB

unread,

Apr 22, 2012, 4:45:44 PM4/22/12

to

On 4/22/2012 4:51 AM, BartC wrote:
> "BGB"<cr8...@hotmail.com> wrote in message
>> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>
>>> - Recursive descent parsers: It's the obvious way to write a parser.
>
>> although I use recursive descent, the above sounds different from what I
>> usually do.
>
>> ReadStatement:
>> checks for and handles vaious statement types
>> calls ReadExpression
>>
>> ReadExpression:
>> (actually, this is a tower of functions, one for each precedence
>> level, working from lowest to highest precedence)
>
> Sounds like it's influenced by the C grammar, which defines expressions
> using something like 13 or 17 layers of precedence.
>
> Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
> don't use any precedence in the grammar at all; I have precedence as an
> attribute of an operator, and an expression can be parsed with a single
> function.
>
> Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
> the binary operators linking successive terms, while readterm() does all
> the real work (since my syntax doesn't distinguish between expressions and
> statements, that's quite a big workload).

I have functions for each precedence level, and there is a lot of them.

luckily, I tend to have the stack of expression precedence level
functions in their own source-file, so it is not all that bad.

>>> - Tokenizing: Essentially grab all the words in your source text and
>>> build an array with an entry for each so you can more quickly walk
>>> forward and backward without having to scan individual characters.
>
>> partly agreed, except that it isn't really strictly necessary to build
>> an array up-front.
>
>> most of my parsers don't bother using an array, but instead just call
>> the tokenizer function directly to peek and parse tokens based on the
>> current "stream position" (generally a raw char pointer in my language
>> parsers).
>
> I've tried reading all the tokens in a separate pass, but didn't really like
> it. And it takes a lot more storage as well, especially with macro
> expansions.
>
> Instead I read them as I go along, but with provision for a one-symbol
> look-ahead.

my parsers typically look at 1 or 2 tokens at a time, as usually this
seems sufficient for nearly any "ordinary" construction in a C style syntax.

a partial exception here is declarations, which often have to be tried
to be parsed, and will generally fail (revert to a prior point) if the
syntax turns out to not be an expression.

an annoyance (part of how this makes parsing C slower) is that pretty
much every normal identifier as the first token on a statement ends up
being checked if it refers to a typedef, although I guess an
optimization could be to check if the next token is in a "known bad"
list (ex: "=", "->", ...) and then skip out on looking up a typedef in
this case (a known-bad token meaning "this is obviously not a
declaration"). probably not that this would help when parsing header
contents though (as pretty much everything there is a declaration).

<snip>

yes, ok.

>> some past experiments (special purpose tests), have implied that
>> potentially threaded code can pull off "reasonably solid" interpreter
>> performance (potentially within 3-5x of native).
>
> Assuming you're talking about dynamic typing, I found it difficult to get
> within 3-5x, unless some sort of type hinting is used, or perhaps you're
> comparing with not too highly optimised native code. Or it's code that is
> memory-intensive, then memory access will dominate.

no, this was assuming that the output was resolved to static types
(although the input HLL is largely dynamically-typed, the goal is to
have largely statically-typed logic going on in the lower levels of the
VM, due either to the use of explicit annotation, and/or, type-inference).

so, these tests assumed that the VM would be able to effectively resolve
the types to their primitive forms (basically, they were intended to try
some things out in advance to help guide the design of a then-planned
new interpreter core, which never got fully implemented).

all this was approx 6 months ago.

the tests involved various numeric calculations (generally using integer
and floating point types), and compared them against their analogues as
plain C code.

there would have been an approx 3-5x slowdown between the interpreter
and native code compiled with the same optimization settings (trying
different settings, although changing the total running times, seemed to
have little impact on the relative execution times).

these weren't really general-purpose tests though, but rather were
involving hand-crafted "instruction sequences" (as lists of function
pointers).

the "stack" in the case of the VM was represented as a sliding pointer,
and was using fixed 64-bit elements (as was the argument list).

so, a double addition would have looked something like:
*((double *)(ctx->stack+1))=
*((double *)(ctx->stack+0))+
*((double *)(ctx->stack+1));
ctx->stack++;

I ended up not actually using an interpreter structured this way
(instead continuing to use and develop my older interpreter), but
performance was sufficiently good to show that threaded-code was a
viable option and "not drastically slower" than native code.

the result was, me eventually, migrating the older interpreter to the
use of threaded code (for a modest speed-up vs the "big switch()"
strategy I had been using previously).

as-is, the current interpreter performance is closer to around 100x
slower than native, for tests like Fib and bubble-sort (and generally
much smaller if a lot of C-side logic is involved).

for the newer version of the interpreter, with faster/improved "Boxed
Value Type" operations, here is a version of the opcode-handler logic:

BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
{ BVT_DefPopUV_Ty(f64, u, v); BVT_AddUV(u, v);
BVT_FreeF64(ctx, v); return(cur->next); }

and if all macros are manually expanded out...

BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
{
f64 *u, *v;
v=(f64 *)(ctx->stack[--ctx->stackpos]);
u=(f64 *)(ctx->stack[ctx->stackpos-1])
*u=(*u)+(*v);
*(f64 **)v=ctx->fbvt_double;
ctx->fbvt_double=v;
return(cur->next);
}

where "f64" is a typedef for "double".

as can be seen, this isn't quite as efficient, but this sort of logic
should be a little faster than the current (dynamically-typed) logic.

note that "SVMState *ctx" is the interpreter context, and "BSVM_ThreadOp
*cur" is the current opcode (it contains a function-pointer to the
handler, as well as any operands used by the opcode).

note that, in this interpreter, the stack is an array of pointers.

I don't yet have an estimate for "overall" performance of the new logic,
considering that a good deal more logic is still be noted before I will
be able to actually test and benchmark a lot of this stuff... (but, at
least, with the changes made thus far the VM hasn't started violently
blowing up, which is at least itself a good sign...).

BGB

unread,

Apr 22, 2012, 5:55:03 PM4/22/12

to

well, I think it depends a lot on the material...

for example, many books are fairly straightforward:
they describe the process, general stuff going on, ...

so, then, all is good.

in another case, I went and started trying to read a book (I forget
the name): introduces general topic, starts mentioning stuff
"Hindley-Milner Type Inference" and "Type Polymorphism as applied to
the Lambda Calculus" and so on, with large volumes of rather
opaque-looking mathematical notation.

I think I didn't really get too far in this one (before brain-melting
set in), before going off and looking at other stuff.

I couldn't really see how any of this was terribly relevant in a world
where "type" generally means "int" vs "float" and maybe dealing with
things like pointer and array operations, and where "polymorphism" is
mostly "one of those words that apparently has something to do with how
the class hierarchy works or similar".

it is enough to say "int + int -> int", "int + float -> float", ...

I have personally a difficult enough time trying to fully understand how
exactly SSA-form works, much less trying to implement a code-generator
based on it, hence my continued general use of stack-machines as the
conceptual model (doesn't mean "logical" stack operations map directly
to "physical" locations or operations though). at least I generally
understand stack machines.

but, then again, my track-record for writing "good" native code
generators (of those few "sufficiently complete to work") is sadly not
very good (my first real attempts in this area starting around 2007 or so).
[LALR really isn't that hard to understand, a state machine with a
stack. I'm not saying every compiler should use it, but I am saying
that it's no more complicated than other things a CS major should have
mastered. -John]

comp...@is-not-my.name

unread,

Apr 22, 2012, 6:11:50 PM4/22/12

to

Uli Kusterer <ulimakes...@nospam.googlemail.com> wrote:

> IMO if you know assembler or BASIC and general algorithms (i.e. you
> could implement a binary tree and walk its nodes), and you can somehow
> figure out what bytes your code gets compiled to (at worst by writing
> a dummy assembler program and looking at what bytes show up between a
> bunch of nop instructions whose bytes you know), you should be able to
> at least get a basic working knowledge of how a compiler works. Just

> write the naive approach of how you would design any program.

I have written a few interpreters and I thought about winging it but I
realize there is a science to compiling and there are right and wrong ways
to do things. I would like to do things the right way but maybe with my weak
background and broken undergrad CS degree that is expecting too much.

snip

> Is that un-computer-sciencey enough? This blog post may help with the basics
> of my code generation approach:
> http://orangejuiceliberationfront.com/generating-machine-code-at-runtime/ (but
> it's C, and Intel, and badly wrapped by the new theme of my blog).

Thanks, I found your comments very useful. Seems like a good summary.

glen herrmannsfeldt

unread,

Apr 22, 2012, 7:56:46 PM4/22/12

to

comp...@is-not-my.name wrote:

(snip, I wrote)

>> One Wirth language that you might find interesting is PL/360.

>> PL/360 looks like a high-level language but works like assembly
>> language. As an example (which I am remembering from 40 years ago)

> I have the doc and compiler code, everything was released into the
> public domain. It's so close to assembler I didn't consider it might
> be a good way to learn to write a compiler but maybe it is. A 10
> second look shows it's pretty heavily abstracted. I will have to spend
> more time on it but it may be more a tribute to the traditional Wirth
> terseness than something to learn from, at least without the professor
> around to ask questions of.

Well, there are a few different things you need to know to write a
compiler. One is how to write parsers. Assemblers usually don't need
much parsing, but for something like PL/360 you need most of the same
parser you would for a high-level language.

The code generator will be much simpler, though.

Parsing theory is fairly well understood, though that doesn't
mean it is easy to learn. The theory for code generators is still
somewhat of an art, especially for the newer processors like Itanium.
(I have an actual Itanium RX2600 running VMS, but likely won't
ever try to write a compiler for it.)

>> But I don't understand your refusal to use the tools that are
>> available.

> As I said they're not available on my target platform.

There is an IBM supplied C compiler for z/OS. I don't know how
much it costs, though.

Personally, I wouldn't mind having a z/OS machine at home but
can't afford one. (I might afford the hardware, but not the OS.)

I even don't mind writing JCL. By 9th grade, I was reading IBM
manuals like books, could write my own JCL, and it never bothered
me to do it.

I presume compiling FLEX and BISON on the IBM C compiler
would be pretty easy. But even so, you could do everything
as a cross compiler, running on another machine generating
z/OS assembler code, then copying it and assembling on z/OS.

>> FLEX and BISON are freely available, you can't complain that
>> they cost too much.

> True but irrelevant!

>> You can run them on a freely available OS (Linux, FreeBSD,
>> Solaris, etc.) on machines that you can find for very low
>> prices, or often enough given away.

I can't afford z/OS, but I can other systems. I don't mind working
on those other systems, even generating code for machines
I don't have.

> Ok but those aren't my targets. I'm not interested in using
> those for this project, as I said. And I would really like
> to understand what I am doing and the way I have always done
> that is to write my own code. Why is that upsetting (hard to
> understand, etc.) to you? I haven't mentioned the cost of
> anything, I'm not sure where you are coming from here.

Sometimes what you write seems like whining. Yes, I would like
to have a z/OS machine, running the IBM Enterprise PL/I compiler,
VS Fortran, Fortran H Extended, and a few other compilers and
assemblers. I don't have those, so I work with what I have.

>> The nice thing about the tools is that you can get something running
>> fairly fast, and without needing to get too deep into the math. You
>> can go as deep or shallow into the innards of FLEX and BISON as you
>> want. One project that should be about right for one person, and
>> without a lot of math, is rewriting FLEX and BISON to generate code in
>> another language, such as PL/I.

> It would be nice to "get something running fairly fast" but if I
> do that depending on other pieces I don't understand it doesn't
> really help me. I want to learn as much as I can doing this.

It is difficult to learn everything at the same time.

I suppose you could write everything in assembler, or even
punch the binary codes directly into punched cards. Using
high-level languages makes it a little easier.

People use tools because it gets them where they want to
be faster.

Reminds me of the instructions for submitting bug reports to GNU.
They expect to use the debugger to find problems, and so don't
need the low level details that one might otherwise supply.

I pretty much never debugged from hex dumps, but usually knew my
programs well enough that if I knew close to where the problem was,
I could figure it out. Sometimes I might consider using a
debugger as cheating.

You could learn all about how to write parsers, convince yourself
that BISON generated parsers according to theory that you understood,
and then allow yourself to use it.

As I wrote, rewriting BISON in PL/I, and to generate parsers
in PL/I might be a nice project to keep someone busy, and also
learn about how it works at the same time. (Especially if you
don't like C.)

You could also write your own parser generator, in your favorite
language, generating code in your favorite language, and using
your favorite parsing method. That would be more work, though,
but you would learn more that way.

>> [PL/360 was a great little language, but the source code to the
>> compiler was apparently lost. -John]

> I believe Jay Maynard is hosting several PL/360 packages.
> AFAIK they are complete.

> [If so, they'd be a good place to start. It's basically an assembler
> with Algol syntax, so it has a real parser. -John]

I thought PL/360 source was available, and ALGOL W was lost, but I
could have forgotten. There is now a scanned source listing of some
version, likely not the last, of ALGOL W. (Written in PL/360.)

Not so long after I started learning Fortran, about the time I
was learning PL/I, I had an ALGOL W manual. I remember being
impressed that it allowed 256 character identifiers, compared to
six for Fortran. It might be that I never tried running it, though.

-- glen

Bartc

unread,

Apr 23, 2012, 5:59:20 AM4/23/12

to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

> On Sun, 22 Apr 2012 12:51:44 +0100, BartC wrote:

>> Or rather two: readfactor(priority), and readterm(). Readfactor() deals
>> with
>> the binary operators linking successive terms, while readterm() does all
>> the real work (since my syntax doesn't distinguish between expressions
>> and
>> statements, that's quite a big workload).
>
> Actually operations have two priorities; left and right. When left < right
> you have left to right association. When right > left, it becomes right to
> left.
>
> Some operators may have these priorities sufficiently different. For
> example the assignment operator. If your unlucky languages allows it, then
> A+B = C+D better be A+(B=(C+D)). That would require the following order of
> partial priorities:

My scheme doesn't work tidily when some operators have right-to-left
association.

Fortunately I only have two such operators, assignment and power (:= and
**), and some extra logic takes care of that. At least, the results are what
you would expect.

However, when I do something like A+B+:=C+D, then I might need a bit more
logic to deal with that, to avoid parentheses..

--
Bartc
[We theory weenies would use an operator precedence parser. It'd be
about 12 lines of code and a tiny lookup table, and can easily handle
arbitrary precedence and associativity. See the Wikipedia article for
details and pseudocode. The original Ritchie C compiler used recursive
descent for most of the language, and operator precedence for the
expressions. It was two passes and fit in 24K bytes of RAM. -John]

BartC

unread,

Apr 23, 2012, 1:41:26 PM4/23/12

to

<comp...@is-not-my.name> wrote

> Uli Kusterer <ulimakes...@nospam.googlemail.com> wrote:
>
>> IMO if you know assembler or BASIC and general algorithms (i.e. you
>> could implement a binary tree and walk its nodes), and you can somehow

> I have written a few interpreters and I thought about winging it but I
> realize there is a science to compiling and there are right and wrong ways
> to do things. I would like to do things the right way but maybe with my
> weak background and broken undergrad CS degree that is expecting too much.

I'm intrigued as to why you think writing compilers is a science but writing
interpreters isn't? Interpreters can include a big chunk of what's in a
compiler, and these days I think can be just as challenging.

And I don't know about right ways and wrong ways to write programs, but for
compilers there are probably formal and informal ways of implementing one.

(Naturally, I've always done things informally; it wasn't my job to write
compilers, they were just useful tools I created. But despite probably being
considered toys, they were used to write actual commercial applications and
to earn a living with!)

BTW I don't think CS degrees existed when compilers started being created.

--
Bartc

comp...@is-not-my.name

unread,

Apr 23, 2012, 3:12:48 PM4/23/12

to

"Joe Schmo" <askme...@nospam.myisp.com> wrote:

> The first few chapters of "Programming Language Pragmatics" by Michael Scott
> for a good and fast overview. (The rest of the book is quite good also if
> you are designing your own language).

Thanks.

> "Writing Compilers & Interpreter - An Applied Approach" by Ronald Mak.
> Creates a Pascal compiler in C which emits x86 assembly language.
> The Fischer and LeBlanc authored book is good for the implementation
> details. I.e., their particular take on an implementation anyway. It sticks
> in my mind that this is a very good book (but I somehow lost the mini review
> I made for myself about it) that I will obtain again in the future.

Thanks, I'll look for both of these. For now I will start with the IBM-based
books since they will be the easiest for me to understand. Thanks to
everyone for the suggestions.

Arargh...@arargh.com

unread,

Apr 24, 2012, 8:13:31 PM4/24/12

to

On Fri, 20 Apr 2012 07:02:58 +0000 (UTC)

<snip>

>
>[PL/360 was a great little language, but the source code to the
>compiler was apparently lost. -John]

Not entirely. I found a copy online: http://pastebin.com/2jZpufS6 but
at a quick glance, it does not match the listing I am looking at, one
that I printed out way back in the early 70's.

And, the listing is bigger than I am willing to key, about 50 pages of
line printer output. I will look at converting the online copy back
to match my listing, sometime.

I used to have a copy on tape, that I got in the early 70s, but I
doubt that it is still readable. Besides, all of my 1/2 inch tape
drives have died. :-(

Except for the one that runs on 48V DC, for which I don't have a
power supply. And I haven't seen it in years. :-)

arargh

--
ArarghMail204 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

bas...@starynkevitch.net

unread,

May 3, 2012, 1:16:40 AM5/3/12

to

On Tuesday, April 17, 2012 11:28:46 PM UTC+2, (unknown) wrote:
> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details.

In addition to all the good advice given by others, you might be interested by Christian Queinnec's Lisp in Small Pieces
http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html
which describes many flavors of Lisp interpreters and compilers.

You might also want to look into the internal representations of existing compilers, even complex ones. For that, you could even play with GCC MELT (see http://gcc-melt.org/ for more), a high level domain specific language to work on GCC internals.

Regards.
--
Basile Starynkevitch http://starynkevitch.net/Basile/
[Lisp compilers do some really interesting things, and since the language handles
data structures so well, the code can be surprisingly easy to follow once you
spend an hour or two learning the basics of the language. -John]

Johann 'Myrkraverk' Oskarsson

unread,

Jun 6, 2012, 12:52:58 PM6/6/12

to

bas...@starynkevitch.net writes:

> On Tuesday, April 17, 2012 11:28:46 PM UTC+2, (unknown) wrote:
>> Guys, I'm having a bear of a time finding a good practical language
>> and OS agnostic text on writing a compiler. I'm weak in math and not
>> interested in the theoretical details.
>
> In addition to all the good advice given by others, you might be
> interested by Christian Queinnec's Lisp in Small Pieces
> http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html
> which describes many flavors of Lisp interpreters and compilers.

I have not read the English version, only the 2nd edition in French,
Principes d'Implantation de Scheme et Lisp.

Each chapter has its own version of a complete interpreter implementing
the features discussed. So it's quite a down to the earth discussion.

Even though it discusses lisp with an implementation in lisp it's well
worth reading, or browsing through, for any language or compiler
project.

But you need to be able to read lisp. Which may or may not be worse
than math.

It goes over lisp features, one at a time. The difference between lisp
1 and 2; that is the difference between lexical and dynamic scoping and
the andvances & difficulties each imposes on the interpreter.

Recursion, single and mutual. And how each lisp version influences the
implementation.

It goes into continuations and control flow. A discussion on how
exceptions are handled in the by interpreter.

A chapter on side effects.

For the mathimatically inclined, a chapter in implementing Lisp in
Lambda Calculus. To me, this was amusing, but not apparently relevant
to practical applications.

The chapters I haven't read yet are (or recently enough to comment) are
Fast Interpretation, Compilation, Reflection, Macros (not to be confused
with the C preprocessor), Compiling to C and Essence of an Object
System.

--
Johann Oskarsson http://www.2ndquadrant.com/ |[]
PostgreSQL Development, 24x7 Support, Training and Services --+--
|
Blog: http://my.opera.com/myrkraverk/blog/