Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

standard intermediate representation for C/C++ ?

300 views
Skip to first unread message

david l. martin

unread,
Apr 30, 1992, 6:59:36 PM4/30/92
to
Is there a standard or widely accepted definition of an intermediate
representation for ANSI C or for C++? (Or even one that's well
documented, even thought it may not be in wide use.)

By intermediate representation, I mean the sort of information which a
compiler front-end passes to the back-end. Such a definition could
obviously be in use by compiler builders, or could be used to pass
information about programs between tools in an integrated environment.
The best example I know of this is the DIANA standard for Ada.

Any information about this greatly appreciated.

Thanks.

- Dave Martin
[There's certainly no widely used standard. The RTL used in GCC is fairly
well documented in the files that come with the distribution. -John]
--
Send compilers articles to comp...@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.

david l. martin

unread,
May 7, 1992, 6:00:24 PM5/7/92
to
Recently I posted the following:

>Is there a standard or widely accepted definition of an intermediate
>representation for ANSI C or for C++? (Or even one that's well
>documented, even thought it may not be in wide use.)
>
>By intermediate representation, I mean the sort of information which a
>compiler front-end passes to the back-end. Such a definition could
>obviously be in use by compiler builders, or could be used to pass
>information about programs between tools in an integrated environment.
>The best example I know of this is the DIANA standard for Ada.

A couple of responses have pointed me to the RTL (Register Transfer
Language) used in GNU's gcc. However, unless I completely misunderstand
RTL, it's not really an example of what I was asking for.

According to RTL documentation, RTL describes the instructions to be
output by the compiler; that is, the instruction set of a particular
machine's architecture. What I'm seeking is a description of the syntax
and semantics of the C program being compiled (which then has to be
translated into an appropriate machine language using the RTL).

Does RTL also somehow describe the syntax and semantics of C programs? If
not, how are the syntax and semantics of C programs represented in gcc, or
other compilers? Is it all terribly ad-hoc, or does it conform to any
well-documented representation format?

Any enlightening comments would be greatly appreciated.

Thanks.

- Dave Martin
[RTL is indeed what GCC uses between the parser and the code generator, but
you correctly surmise that it's much lower level than source code. If what
you really want is something to use in source tools, RTL isn't it. -John]

Scott Meyers

unread,
May 8, 1992, 10:57:20 AM5/8/92
to
mar...@CS.UCLA.EDU (david l. martin) writes:
| >Is there a standard or widely accepted definition of an intermediate
| >representation for ANSI C or for C++? (Or even one that's well
| >documented, even thought it may not be in wide use.)

It is highly likely that this topic -- that of a high-level intermediate
representation for C++ programs -- will be the primary focus of the
Advanced Topics Workshop that will follow the upcoming USENIX C++
conference. When details are available, I will post them.

Scott

Dave Howell

unread,
May 11, 1992, 12:49:48 PM5/11/92
to
>Is there a standard or widely accepted definition of an intermediate
>representation for ANSI C or for C++?

It's a shot in the dark, but what about OSF's ANDF format? It is much
higher level than GNU RTL, but does it keep enough of the source code
struture to be useful? Possibly. I've been out of this loop for some time
but I looked into it a year and a half ago. It should be fairly standard
by now.

Dave Howell
[I believe that ANDF is quite low level, considering that one of its goals
is to make it difficult to reverse-engineer compiled programs. And does
anyone actually use it? -John]

David Lamb

unread,
May 13, 1992, 7:30:09 AM5/13/92
to
From the lack of responses to the request for a 'high-level' intermediate
representation for C/C++: Do y'all believe a "DIANA-like" representation
would be useful if someone were to develop it? My general impression was
that DIANA had succeeded only insofar as people thought they were forced
to use it by the DoD [kinda depressing for us DIANA designers]

Preston Briggs

unread,
May 13, 1992, 12:36:14 PM5/13/92
to
Dave Howell <ro...@grok19.columbiasc.NCR.COM> writes:
>It's a shot in the dark, but what about OSF's ANDF format [as an intermediate
>langauge for code tools]?

Perhaps it should. Can anybody tell me where to find documents or papers
defining ANDF or talking about its design or talking about using it or ...
So far, it's just a big rumor to me. I'd really like to read some facts.

Thanks,
Preston Briggs
[I haven't seen any public descriptions of ANDF either, and would also like
to see one. -John]

Stephen Scalpone

unread,
May 13, 1992, 9:10:08 PM5/13/92
to
> Is there a standard or widely accepted definition of an intermediate
> representation for ANSI C or for C++?

DIANA was supposed to work as the accepted intermediate representation for
Ada. However, in practice, for a number of different reasons, all of the
Ada compilers use a proprietary IR, even those based in part on DIANA.

Instead of a common IR, the Ada world (?) is moving towards a standard
method of accessing different IRs using a procedural interface, dubbed
ASIS. The ASIS spec defines an interface that is constant between Ada
compilers. But implementation of the interface is very different from
compiler to compiler.

The ASIS spec is intended to provide a single interface for toolsmiths and
CASE tools (etc.), not for compiler writers.

Steve Scalpone
Verdix Western Operations

Stephen J Bevan

unread,
May 15, 1992, 3:17:55 AM5/15/92
to
dal...@qucis.queensu.ca (David Lamb) writes:
From the lack of responses to the request for a 'high-level' intermediate
representation for C/C++: Do y'all believe a "DIANA-like" representation
would be useful if someone were to develop it? My general impression was
that DIANA had succeeded only insofar as people thought they were forced
to use it by the DoD [kinda depressing for us DIANA designers]

I think the problem may be that many people don't see the point of making
a compiler generate this special AST if all they are going to do is then
generate code.

However, IMHO this ignores some of the good uses that a standard AST can
be put to. Some examples, that spring to mind are :-

+ pretty printers.
Tools like indent for C are hopless as they consistently fail on
source that doesn't match its limited input format.

+ documentation extractors
The common solution is to awk/sed/perl/grep sections out of the
source, but as with the pretty printing, they are highly reliant on
particular formats

+ tags generators
These can be made independent of the input format.
e.g. etags fails on some C I have because the declarations don't
match the special format it uses to recognise functions.

+ code analysers
e.g. somebody has written a program that trawls over the standard AST
produced by the SRC Modula III compiler and finds which procedures
are never called.

All the above can be written without a standard AST, but they are a
_lot_ easier to write if there is one, and don't require the user to
delve into the guts of a compiler or invent ad-hoc solutions with
crufty little languages like awk, perl ... etc.

Stephen J. Bevan be...@cs.man.ac.uk
[Can you really do a DIANA-like thing for C? I'd think that the syntactic
messiness introduced by the preprocessor would make it awfully hard to
come up with anything that was simultaneously easier to handle than the
original source and didn't lose something significant in translation. -John]

Nick Haines

unread,
May 15, 1992, 12:57:17 PM5/15/92
to
I posted details of where people could obtain more information about
RSRE's TDF (the OSF ANDF) sometime last year (when I was working at
Harlequin). Harlequin had some documentation for it and RSRE were
producing more at the time.

From what I remember, TDF is like Ten15, in that it's a set of algebraic
lattices, using constructors to make bits of TDF out of smaller bits of
TDF (so you end up with a tree-shaped structure). A full textual
description of the tree was typically vast, because it includes verbose
constructors for list-like comprehensions, but of course there is a more
compact notation, and a binary encoding.

Nick Haines ni...@cs.cmu.edu
[Found it, it's in my UNCOL bibliography, message 91-12-026. Also see the
next message.]

Stavros Macrakis

unread,
May 15, 1992, 1:38:43 PM5/15/92
to
John says:

Can you really do a DIANA-like thing for C? I'd think that the syntactic
messiness introduced by the preprocessor would make it awfully hard to
come up with anything that was simultaneously easier to handle than the
original source and didn't lose something significant in
translation.

C macros are lexical macros rather than syntax macros, so in general a
Diana-like representation isn't feasible. However, if you're willing to
restrict yourself to syntactically well-formed macro bodies (you forbid ##
and, e.g., "#define BEGIN {"), you should be able to manage. This is done
by the ANDF producer (front end), in order to support installer (back end)
macro expansion, which is necessary for platform-dependent macro
expansion.

I can't see how to get around expanding syntactically ill-formed
macros, though.

-s
[I was more worried about things like typedef inside #if which can have
pervasive and rather messy effects, e.g. a particular name may be a scalar
or an array depending on preprocessor values. And can you tell us more about
experience with ANDF? -John]

Mick Jordan

unread,
May 15, 1992, 6:00:53 PM5/15/92
to
In article <92-0...@comp.compilers>, Stephen J Bevan <be...@computer-science.manchester.ac.uk> writes:
> dal...@qucis.queensu.ca (David Lamb) writes:
> [Would something like DIANA be useful for C or C++?]

>
> I think the problem may be that many people don't see the point of making
> a compiler generate this special AST if all they are going to do is then
> generate code.

ASTs are resource hogs, particularly of memory, and in the early 80s most
machines simply couldnt handle a DIANA-based compiler. Plus DIANA was
specified using an object-oriented language (IDL), which had to be mapped
into a non-object-oriented language language like C or ADA. A lot was lost
along the way, including readibility and performance.

My experience suggests that each tool in a programming environment will
have a particular set of requirements for the AST, including the compiler.
The skill is designing an AST that can satisfy them all, without allowing
the AST design to grow like topsy. To achieve this you need a good core
design coupled with extensibility.

> However, IMHO this ignores some of the good uses that a standard AST can
> be put to. Some examples, that spring to mind are :-
>
> + pretty printers.
> Tools like indent for C are hopless as they consistently fail on
> source that doesn't match its limited input format.

The relationship between the abstract syntax and concrete syntax is an
area that most AST specifications do not address. DIANA didn't, it only had
the notion of an "lx_comments" attribute on each node, with no definition
of how a comment was mapped to a node. Typically a given AST supports only
a canonical concrete representation, with the original lexical whitespace
removed. I used to believe that this was acceptable, but experience has
persuaded me that an AST needs to preserve all the original input and be
able to regenerate it exactly. This does not preclude a pretty-printer, of
course.

> + code analysers
> e.g. somebody has written a program that trawls over the standard AST
> produced by the SRC Modula III compiler and finds which procedures
> are never called.

For the record, the SRC Modula-3 compiler does not make available a public
AST interface. The program in question is based on another compiler
front-end, which does have a public AST interface, and is derived from the
now defunct Olivetti implementation of Modula-3. The AST specification was
strongly influenced by the DIANA work. However, it is directly specified
in a language with object types (Modula-3 itself), in which the compiler
and associated tools are also written. The end result is reasonably
efficient in both space and time, but still more resource hungry than some
people would find acceptable. You can find more about it in:

"An Extensible Programming Environment for Modula-3", Mick Jordan,
Proceedings of the Fourth ACM SIGSOFT Symposium on Software Development
Environments, Software Engineering Notes, 15, 6, Dec 1990.

A copy of the system can be had from gatekeeper.dec.com from the file
/pub/DEC/Modula-3/release/m3tk-2.06.tar.Z. You will need the standard SRC
Modula-3 distribution installed before you can build it.

Mick Jordan

Pedro Diniz

unread,
May 19, 1992, 3:26:49 PM5/19/92
to
caliban!i...@uunet.UU.NET (Iain Bason) writes:
>I have a (paper) copy of "TDF Specification" published by the Defense
>Research Agency/Electronics Division (that's in England).

I have a short advertising brochure on TDF which has the following
contact:

Dr. Nic E. Peeling
Defense Research Agency
Electronics Division
RSRE Malvern
St. Andrews Road
Great Malvern
Worcestershire WR14 3PS
United Kingdom
Phone: +44 684 895314
Fax: +44 684 894303
Internet: peeling%hermes...@relay.mod.uk

Good luck,
--
Pedro Nuno Cruz Diniz
--
Pedro Cruz Diniz Email:p...@inesc.inesc.pt Ph:+351-1-3100328 Fax:+351-1-525843
INESC - Instituto de Engenharia de Sistemas e Computadores
Rua Alves Redol, No. 9 /Apartado 10105

Stephen J Bevan

unread,
May 16, 1992, 4:45:23 AM5/16/92
to
May 92 07:17:55 GMT

[Can you really do a DIANA-like thing for C? I'd think that the syntactic
messiness introduced by the preprocessor would make it awfully hard to
come up with anything that was simultaneously easier to handle than the
original source and didn't lose something significant in translation. -John]

Considering the mess you can generate with macros and #ifdefs, I think a
good C-DIANA would be very tough (if not impossible) to design. However,
I think this says more about the design (sic) of C(++) than it does about
DIANA like standards.

Reverse engineering a standard AST from a concrete syntax is always going
to be tough. However, the simple solution is to design the language
around the abstract syntax instead of the concrete syntax. This has many
benefits, one of which is a standard AST for "free".

This is a simple idea that is almost 30 years old, however, it seems that
the many budding language designers are either ignorant of it or choose to
ignore it for some reason. Either way, it is very depressing as it just
leads to the proliferation of (syntactically) badly designed languages.

Stephen J. Bevan be...@cs.man.ac.uk

Sam Kendall

unread,
May 27, 1992, 2:07:20 PM5/27/92
to
Stephen J Bevan <be...@computer-science.manchester.ac.uk> writes:

Considering the mess you can generate with macros and #ifdefs, I think a
good C-DIANA would be very tough (if not impossible) to design. However,
I think this says more about the design (sic) of C(++) than it does about
DIANA like standards.

In his defense, Stroustrup (the designer of C++) has tried to invent
features that replace existing usages of macros and are syntactically
better-behaved. The most recent such feature is templates (generics).

The closest thing to a DIANA-like thing for C comes from ProCASE; I think
their product is called SmartSystem. They store a syntax tree, and they
also remember the mapping between macroexpansions and that syntax tree.
As far as I know this data structure is entirely proprietary and is not
directly accessible even to customers; they use it to drive their user
interface.
----
Sam Kendall ken...@CenterLine.COM
CenterLine Software Inc. (formerly Saber Software) uunet!saber!kendall
+1 617 498-3238 FAX: +1 617 868-6655

0 new messages