
Simple constant folding in bison parser


Jeremy Fitzhardinge

Aug 5, 1992, 1:59:15 AM

I'm writing an experimental compiler for a C-like language, mainly to
experiment with optimisation. I'm not very interested in writing my own
scanner and parser, so I'm using flex and bison.

One thing that occurred to me was that the grammar can be made to recognize
simple constant expressions, and generate a parse tree node for the value
of the expression rather than the nodes for the operator and constant
arguments.

The result looked something like this (much abbreviated):

%left '+'
%left '*'
%right UNARY

expr : factor
| expr '+' expr { $$ = mknode(Add, $1, $3); }
| expr '*' expr { $$ = mknode(Mult, $1, $3); }
/* etc */

factor : VAR { $$ = mkvar($1); }
| const { $$ = mknum($1); }
| '-' expr %prec UNARY { $$ = mkun(Neg, $2); }

const : INT { $$ = $1; }
| const '+' const { $$ = $1+$3; }
| const '*' const { $$ = $1*$3; }

This has a number of problems. Firstly, it immediately introduces
reduce/reduce conflicts, because something like "2+3" can reduce to either
(+ 2 3) or (5). The other problem is that an expression like 3+x will not
parse, because the parser gets to a state like

const '+' . const

so the VAR token is a parse error. What I would like it to do for such an
input is to keep shifting until it gets the third token and can thus tell
whether it should reduce to "const '+' const" or "expr '+' expr". I have
a suspicion that this is something an LR parser can't do...

Is there some way this can be done, using this or another approach, with
yacc (or even some other freely available parser generator, though that
option is less attractive since I don't really want to rewrite what I have)?

Thanks. Mail me replies, I'll summarize later.
--
jer...@softway.sw.oz.au ph:+61 2 698 2322-x122 fax:+61 2 699 9174
[I never tried to do it in the parser, but rather in the routines that build
the parse tree, mknode() in this case. -John]
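
For concreteness, a minimal sketch of folding at tree-build time, along the
lines John suggests. The node layout, field names, and helpers here are
illustrative assumptions, not Jeremy's actual code:

#include <stdlib.h>

enum { Num, Var, Add, Mult, Neg };      /* node kinds (assumed) */

typedef struct node {
    int op;                             /* one of the kinds above */
    long val;                           /* meaningful only when op == Num */
    struct node *kid[2];                /* operands, NULL for leaves */
} node;

static node *mknum(long v)
{
    node *n = calloc(1, sizeof *n);
    n->op = Num;
    n->val = v;
    return n;
}

node *mknode(int op, node *l, node *r)
{
    node *n;

    /* Fold eagerly: if both operands are already constant leaves,
       return a single Num leaf instead of an operator node. */
    if (l != NULL && r != NULL && l->op == Num && r->op == Num) {
        switch (op) {
        case Add:  return mknum(l->val + r->val);
        case Mult: return mknum(l->val * r->val);
        }
    }
    n = calloc(1, sizeof *n);
    n->op = op;
    n->kid[0] = l;
    n->kid[1] = r;
    return n;
}

With this, the grammar keeps its plain "expr '+' expr" rules and never sees a
reduce/reduce conflict; "2+3" simply comes back from mknode() as a Num leaf.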
--
Send compilers articles to comp...@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.

Dale R. Worley

Aug 10, 1992, 3:48:49 PM

jer...@sw.oz.au (Jeremy Fitzhardinge) writes:
> One thing that occurred to me was that the grammar can be made to recognize
> simple constant expressions, and generate a parse tree node for the value
> of the expression rather than the nodes for the operator and constant
> arguments.

This is handled by duplicating all the nonterminals into a ".c"
(constant expression) and ".nc" (non-constant expression) version:

expression : expr.c
| expr.nc

expr.c : factor.c
| expr.c '+' factor.c

expr.nc : factor.nc
| expr.nc '+' factor.nc
| expr.nc '+' factor.c
| expr.c '+' factor.nc

factor.c : CONST
| '(' expr.c ')'

factor.nc : VAR
| '(' expr.nc ')'

Etc.

Really, what you're doing is constructing a synthesized (passed-upwards)
attribute (constant vs. non-constant), but most LALR parser generators
don't support that. When you're parsing C, you want a lot of these
attributes, and you want the attributes to control some aspects of the
parsing. The result is that some productions are cloned 6 or 8 times.
Ugh.

Dale Worley Dept. of Math., MIT d...@math.mit.edu
--
[This certainly works, but I'd think it'd be a lot easier to handle
synthesized attributes in the value cells, where they're easy to
construct and pass up the tree. -John]
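
One way to read John's suggestion, as a sketch rather than anyone's actual
grammar: let the parser's value cells carry the synthesized attribute
directly, so no nonterminals need to be cloned. This builds on the node type
sketched after the first message; all names here are illustrative:

/* The parser's semantic value for an expression.  With yacc/bison this
   struct would go in %union; node, mknum(), mknode() and Add are as in
   the earlier sketch. */
typedef struct exprval {
    int          is_const;   /* synthesized attribute: constant so far? */
    long         val;        /* its value, when is_const is set */
    struct node *tree;       /* the tree built so far, otherwise */
} exprval;

/* Turn a (possibly still constant) value into a real tree node. */
static struct node *force_tree(exprval e)
{
    return e.is_const ? mknum(e.val) : e.tree;
}

/* Action body for a rule like  expr : expr '+' expr  */
exprval combine_add(exprval a, exprval b)
{
    exprval r;

    if (a.is_const && b.is_const) {
        r.is_const = 1;
        r.val = a.val + b.val;          /* fold, no node built yet */
        r.tree = NULL;
    } else {
        r.is_const = 0;
        r.val = 0;
        r.tree = mknode(Add, force_tree(a), force_tree(b));
    }
    return r;
}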

Willem Jan Withagen

Aug 17, 1992, 4:24:58 AM

d...@kronecker.mit.edu (Dale R. Worley) writes:
>
>This is handled by duplicating all the nonterminals into a ".c"
> (constant expression) and ".nc" (non-constant expression) version:
>
> Really, what you're doing is constructing a synthesized (passed-upwards)
> attribute (constant vs. non-constant), but most LALR parser generators
> don't support that. When you're parsing C, you want a lot of these
> attributes, and you want the attributes to control some aspects of the
> parsing. The result is that some productions are cloned 6 or 8 times.
> Ugh.

Well, just a little nitpick here. I do not consider this part of a
parser. Determining whether an expression is constant or not is part of
constant folding, which is usually not done while parsing. But the
approach taken could be made to work.

> [This certainly works, but I'd think it'd be a lot easier to handle
> synthesized attributes in the value cells, where they're easy to
> construct and pass up the tree. -John]

The following is how I determine whether an expression is constant or
not using one of the tools in the Cocktail toolset.

Please note that the grammar is totally separate from this, and that each
rule of the form xxxxx = { ... }. gives the calculations for a specific
node in the tree built from the input file.
The order of attribute evaluation is, however, determined by the
requirements of the rules: obviously one can only calculate the Object in
Qualid after calculating the environment Env, and so on.

DECLARE
  /* What is the expression type. */
  Expr = -> [ Typ: tType OUT ] .

  /* Is the expression (so far) constant, and what is its value. */
  Expr = -> [ IsConst: Boolean OUT ]
            [ Val: tValue ] .

RULE
  UniExpr   = { Typ     := UniResultType( Expr:Typ, Operator );
                IsConst := UniIsConstant( Expr:IsConst, Operator );
                Val     := UniVal( Expr:Val, Operator );
              }.
  BinExpr   = { Typ     := BinResultType( LExpr:Typ, Operator, RExpr:Typ );
                IsConst := BinIsConstant( LExpr:IsConst, Operator, RExpr:IsConst );
                Val     := BinVal( LExpr:Val, Operator, RExpr:Val );
              }.
  IntConst  = { Typ     := nTypeINTEGER;
                Val     := CVal;
                IsConst := True;
              }.
  RealConst = { Typ     := nTypeREAL;
                Val     := CVal;
                IsConst := True;
              }.
  /* Process the reference to an identifier. */
  Qualid    = { Object  := IdentifyNameEnv( Ident, Env );
                Typ     := GetObjectType( Object );
                IsConst := IsSimpleConstant( Object );
                Val     := { if (IsConst) {
                               Val = GetObjectValue( Object );
                             } else {
                               Val = NoValue;
                             }
                           };
              }.

Willem Jan Withagen.
--
Digital Information Systems Group, Room EH 10.35
Eindhoven University of Technology
P.O. 513 Tel: +31-40-473401,Fax: +31-40-448375
5600 MB Eindhoven The Netherlands
Internet: w...@eb.ele.tue.nl
X400: C=nl;ADMD=400net;PRMD=surf;O=tue;OU=ele;OU=eb;S=WJW;
[The GMD Cocktail toolset has been most recently discussed in message
92-03-025. It's available for FTP. -John]

Hunk

Aug 17, 1992, 1:28:49 PM

w...@eb.ele.tue.nl (Willem Jan Withagen) writes:
>Well, just a little nitpick here. I do not consider this part of a
>parser. Determining whether an expression is constant or not is part of
>constant folding, which is usually not done while parsing. But the
>approach taken could be made to work.

An expression evaluation routine I wrote for an expression parser in a
recently released assembler (C-expression syntax is handled) is set up in
such a way that all constant subexpressions will be reduced to leaf nodes
during parsing. The question about whether a given expression is constant
or not doesn't even arise because it's already been handled by the parser.
Basically, if the expression tree returned by the parser is a leaf node
then it's a constant, otherwise it's not.

Dale R. Worley

Aug 17, 1992, 3:25:22 PM

d...@kronecker.mit.edu (Dale R. Worley) writes:
> Really, what you're doing is constructing a synthesized (passed-upwards)
> attribute (constant vs. non-constant), but most LALR parser generators
> don't support that. When you're parsing C, you want a lot of these
> attributes, and you want the attributes to control some aspects of the
> parsing. The result is that some productions are cloned 6 or 8 times.
> Ugh.
>
> [This certainly works, but I'd think it'd be a lot easier to handle
> synthesized attributes in the value cells, where they're easy to
> construct and pass up the tree. -John]

It would be for constant folding, but in C, some of the attributes affect
the parsing process, and so can't be added on in a layer of processing
that the parser is blind to. If I remember correctly, an important action
is to classify declarators as to whether the identifier being declared is
(at present) a typedef name or not. Typedef-redeclaring declarators can
only appear in declarations that contain "type specifiers" (i.e., int,
float, struct, etc.). If this restriction is not expressed in the grammar
productions, the grammar is still unambiguous (I think), but it is not
LALR(1).

What one would really like to write in the parser is a synthesized
attribute (which is just passed upward in most declarator-building
productions), and then constrain the appropriate productions to use only
declarators with the proper values of the attribute.
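
For contrast, the workaround most yacc-based C front ends actually use is to
feed the symbol table back into the lexer rather than thread an attribute
through the grammar. The sketch below is illustrative only; the scanner hook,
the symbol-table query, and the token codes are all assumptions:

/* The lexer consults the symbol table, so an identifier that is
   currently declared as a typedef name comes back as a different
   token and the grammar can stay LALR(1). */
extern char *yytext;                        /* current lexeme (lex/flex) */
extern int   raw_token(void);               /* assumed: underlying scanner */
extern int   currently_a_typedef(const char *name);  /* assumed lookup */

enum { IDENTIFIER = 258, TYPEDEF_NAME };    /* normally from y.tab.h */

int yylex(void)
{
    int tok = raw_token();

    if (tok == IDENTIFIER && currently_a_typedef(yytext))
        return TYPEDEF_NAME;
    return tok;
}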

Dale Worley Dept. of Math., MIT d...@math.mit.edu
--

Tim Pierce

Aug 19, 1992, 11:53:46 AM

The discussion about constant folding has intrigued me a little bit, and
I've been a little stumped figuring out how I would implement this in the
general case. Writing code into mknode() to fold the addition or
multiplication of two nodes with constant value makes sense to me, for
example. But what about an expression such as "4 + y + 4"? As I see it,
a shift-reduce parser following the rule

expr -> expr + expr
| CONST
| ID

would process this expression as follows:

4 + y + 4
CONST + y + 4
expr + y + 4
expr + ID + 4
expr + expr + 4
expr + 4
expr + CONST
expr + expr
expr

At no time does the parser have any inkling that there are two constant
terms in this expression that can be folded together. In order to encode
this recognition, I imagine you'd have to fix the parser to rearrange
terms in some order that would permit constant folding. I've checked the
dragon book (second edition) but they seemed pretty vague on
implementation ideas. Any thoughts?
--
Tim Pierce, twpi...@amherst.edu, (BITnet: TWPIERCE@AMHERST)
[The only approach I've seen is pretty much brute force -- when you have a
subtree with multiple instances of an associative operator, flatten the
tree into a list, sort the list to put all the constants together,
collapse the constants, and remake a tree. It is my dim recollection that
the Ritchie PDP-11 compiler actually did this, because you can get
significant savings in some kinds of addressing expressions. -John]
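
A sketch of that flatten-and-collapse idea for one associative, commutative
operator (Add), reusing the node type sketched after the first message.
Instead of sorting, the constants are simply summed into an accumulator as
the flattened term list is walked, so "4 + y + 4" comes out as y + 8. All
names and the fixed term limit are illustrative:

#define MAXTERMS 64

/* Collect the non-Add subtrees of an Add-tree into terms[]; returns the
   new count, or -1 if there are more than MAXTERMS of them. */
static int flatten(node *e, node **terms, int n)
{
    if (n < 0)
        return -1;
    if (e->op == Add) {
        n = flatten(e->kid[0], terms, n);
        return flatten(e->kid[1], terms, n);
    }
    if (n >= MAXTERMS)
        return -1;
    terms[n] = e;
    return n + 1;
}

/* Rebuild a sum with all constant terms collapsed into one Num leaf. */
node *fold_sum(node *e)
{
    node *terms[MAXTERMS], *rest = NULL;
    long  k = 0;                        /* accumulated constant part */
    int   i, n = flatten(e, terms, 0);

    if (n < 0)
        return e;                       /* too many terms: leave it alone */

    for (i = 0; i < n; i++) {
        if (terms[i]->op == Num)
            k += terms[i]->val;         /* collapse the constants */
        else
            rest = rest ? mknode(Add, rest, terms[i]) : terms[i];
    }
    if (rest == NULL)
        return mknum(k);                /* the whole sum was constant */
    return k != 0 ? mknode(Add, rest, mknum(k)) : rest;
}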

Scott Amspoker

Aug 20, 1992, 11:41:14 AM

ma...@csd4.csd.uwm.edu (Hunk) writes:
>An expression evaluation routine I wrote for an expression parser in a
>recently released assembler (C-expression syntax is handled) is set up in
>such a way that all constant subexpressions will be reduced to leaf nodes
>during parsing. The question about whether a given expression is constant
>or not doesn't even arise because it's already been handled by the parser.
>Basically, if the expression tree returned by the parser is a leaf node
>then it's a constant, otherwise it's not.

It all depends on what you call a constant. The address of a global
object is a constant and may be used in an initializer list:

char c[10];
char *cp = &c[2];

The address of c[2] is a constant but cannot be folded by the compiler.
Because of this, we have had to use a flag in a parse tree node to
indicate the 'constant' status of the sub-expression.

--
Scott Amspoker, Basis International, Albuquerque, NM
sc...@bbx.basis.com
[I've worked with compilers that considered a constant to be a numeric
offset plus an optional static symbol address. This can be very convenient
for generating addressing code. -John]
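
A sketch of the representation John describes: a "constant" carries an
optional static symbol plus a numeric offset, so &c[2] folds to (c, 2) and
plain 7 to (NULL, 7). The type and field names are made up:

struct symbol;                         /* opaque symbol-table entry */

typedef struct addr_const {
    struct symbol *base;               /* NULL for pure numeric constants */
    long           offset;
} addr_const;

/* Addition still folds as long as at most one operand names a symbol;
   symbol + symbol is not a constant, so report failure. */
int add_const(addr_const a, addr_const b, addr_const *out)
{
    if (a.base != NULL && b.base != NULL)
        return 0;
    out->base = (a.base != NULL) ? a.base : b.base;
    out->offset = a.offset + b.offset;
    return 1;
}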

Buehlmann Thomas

Aug 21, 1992, 7:51:46 AM

twpi...@amhux1.amherst.EDU (Tim Pierce) writes:

>The discussion about constant folding has intrigued me a little bit, and
>I've been a little stumped figuring out how I would implement this in the
>general case. Writing code into mknode() to fold the addition or
>multiplication of two nodes with constant value makes sense to me, for
>example. But what about an expression such as "4 + y + 4"? As I see it,
>a shift-reduce parser following the rule

>expr -> expr + expr
> | CONST
> | ID

Watch out: this optimization may be inherently unsafe, since you change the
semantic meaning between the optimized and the non-optimized version in
such a way that different results may be produced. Just two short examples:

1) x/12 * 12
You may reduce this to x, but what happens if x is not divisible by 12?
In that case you obviously have a clash between the mathematical meaning of
the expression and the 'procedural' meaning the programmer wanted to
achieve (the small test program below makes this concrete).

2) Assume ints are 16 bits:
12 + x + 5
Depending on x and the implementation you might get different results;
the compiler might even complain while folding the constants. However, the
writer might have known that x is always strictly less than -1!
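
A concrete instance of the first pitfall, as a throwaway test program
(nothing here is from the original posts):

#include <stdio.h>

int main(void)
{
    int x = 13;

    /* As written: integer division happens first, so 13/12*12 == 12. */
    printf("as written : %d\n", x / 12 * 12);

    /* "Simplified" to plain x, the result would be 13 instead. */
    printf("simplified : %d\n", x);
    return 0;
}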

Willem Jan Withagen

Aug 21, 1992, 4:44:59 AM

you write:
=> example. But what about an expression such as "4 + y + 4"? As I see it,
=> a shift-reduce parser following the rule
=>
=> expr -> expr + expr
=> | CONST
=> | ID

And John says:
=> [The only approach I've seen is pretty much brute force -- when you have a
=> subtree with multiple instances of an associative operator, flatten the
=> tree into a list, sort the list to put all the constants together,
=> collapse the constants, and remake a tree. It is my dim recollection that
=> the Ritchie PDP-11 compiler actually did this, because you can get
=> significant savings in some kinds of addressing expressions. -John]

Now I don't want to open another tarpit, but this is certainly going
to have effects on the semantics of the program. When it is only
applied to address calculations that come from arrays, ... then the user
would not always have to find out. But what about:
y := maxint-1;
...... (nothing for y)
x := maxint-y+10;
This might or might not give the correct results.

And it gets even trickier when the user has used ()'s to make sure that
things were done in the proper order.
- Do we then annotate the intermediate code to say that it should not be
reorganised?
- Or do we only reorganise when we can show that there is no danger?
And how do we show this?
- What about floating point?

Willem Jan

PS: In the compiler I'm writing I've decided to include an IM->IM
CSE eliminator (as an attribute evaluator) which can be applied
to the IM tree at any point.
These questions arise from the fact that I think that aggressive
actions could do more harm than they are worth.

--
Digital Information Systems Group, Room EH 10.35
Eindhoven University of Technology
P.O. 513 Tel: +31-40-473401,Fax: +31-40-448375
5600 MB Eindhoven The Netherlands
Internet: w...@eb.ele.tue.nl
X400: C=nl;ADMD=400net;PRMD=surf;O=tue;OU=ele;OU=eb;S=WJW;

[This was for integer expressions in C, which traditionally haven't paid
much attention to overflows. You're right, though; particularly for floating
point expressions, users will not appreciate arbitrary rearrangement. -John]

Jonathan Eifrig

Aug 25, 1992, 7:37:29 PM

Our Moderator asks:
>[Are there specification languages that would allow you do define your
>language precisely enough to tell you mechanically whether a proposed
>optimization was valid? -John]

In general, of course not: any such tool could, among other such
miraculous feats, decide the halting problem. I think that the best one
could hope for is a mechanical proof checker/proof development tool, sort
of like Nuprl for code transformations.

Curiously, I was wondering something related: How many people are
using formal semantic methods to either (1) generate a compiler or (2)
prove a compiler correct in "real life?" My gut feeling was "not many,"
but then I started thinking about some of the transformational and
continuations-based compilation approaches that have been published in the
last few years, and started wondering.

In principle, it is possible to write an operational semantics for
a language in, say, natural deduction style, and mechanically translate
this into a CPS-style interpreter for the language, using the proof rules as
primitive operations of the interpreter, if the semantics for the language
has certain reasonable properties (basically, the proof rules have to be
"deterministic": no two proof rules should have unifying conclusions.)
Partially evaluating this interpreter then yields _some_ sort of compiler,
albeit an inefficient one. I don't know of any example of such a
compiler, however. More likely, the proof rules are used to verify that
the initial translation of the source language into some standard
intermediate language is faithful, and then the intermediate language
compiled. (See kelsey89a.)

As long as we're on the subject of semantics tools, I have to give
a plug to the folks from INRIA and their Centaur system. It's a package
of integrated tools, allowing one to (1) define an abstract syntax for a
language and its concrete representation, automatically constructing a
parser and lexer for the language, (2) define various "pretty-printers"
for the abstract syntax, yielding printers and syntax-directed editors of
various sorts, (3) give a natural-deduction style operational semantics of
the language, similar to the Harper-Honsell-Plotkin LF system,
automatically constructing an interpreter for the language, and (4) create
an integrated environment for playing around with your new language, via
X. It's a great tool for playing around with little languages with novel
features. It's not free, but not real expensive, either; only a few
hundred bucks. Contact "centaur...@mirsa.inria.fr" for more info.

Ref's:
------

@inproceedings{harper87,
author = "Robert Harper and Furio Honsell and Gordon Plotkin",
title = "A Framework for Defining Logics",
booktitle = lics2,
year = 1987,
pages = "194--204",
abstract = "The Logical Framework (LF) is a sytem for defining a wide
class of logics. It is based on a general treatment of syntax, rules, and
proofs in terms of a typed lambda-calculus with dependent types. Syntax
is treated in a style similar to, but more general than, Martin-Lof's
system of arities. The treatment of rules and proofs focuses on the
notion of a {\em judgement}. Logics are encoded in the LF via a new
principle, the {\em judgements as types} principle, whereby each judgement
is identified with the type of its proofs. This allows for a smooth
treatment of discharge and variable occurrence conditions and leads to a
uniform treatment of rules and proofs whereby rules are viewed as proofs
of higher-order judgements and proof checking is reduced to type checking.
An important benefit of our treatment of formal systems is that
logic-independent tools such as proof editors and proof-checkers can be
constructed."
}

@inproceedings{kelsey89a,
author = "Richard Kelsey and Paul Hudak",
title = " Realistic Compilation by Program Transformation",
booktitle = popl16,
year = 1989,
pages = "281--292",
abstract = "Using concepts from denotational semantics, we have produced a
very simple compiler that can be used to compile standard programming
languages and produces object code as efficient as that of production
compilers. The compiler is based entirely on source-to-source
transformations performed on programs that have been translated into an
intermediate language resembling the lambda calculus. The output of the
compiler, while still in the intermediate language, can be trivially
translated into machine code for the target machine. The compilation by
transformation strategy is simple: the goal is to remove any dependencies
on the intermediate language semantics that the target machine cannot
implement directly. Front-ends have been written for Pascal, BASIC, and
Scheme and the compiler produces code for the MC68020 microprocessor."
}
--
Jack Eifrig (eif...@cs.jhu.edu) The Johns Hopkins University, C.S. Dept.

Dave Berry

Aug 26, 1992, 11:15:40 AM

We've done some work on producing tools from operational semantics. In
our case we've looked at debuggers (which are harder than compilers). In
my thesis I looked at the definition of evaluation step in terms of
operational semantics, and described a system that generated animating
interpreters from an operational semantics. Fabio da Silva is just
completing his thesis on another, more rigorous, approach. He includes an
algorithm for generating a deterministic semantics from a certain class
of non-deterministic semantics.

Both our systems require some form of determinacy restriction on the
semantics. I distinguish between a truly operational semantics, and one
written in the natural semantics style but that doesn't have an immediate
operational interpretation.

Unfortunately, neither of us has published much. I just haven't had the
time since completing my thesis. We hope to produce a joint paper
sometime. As yet this is still theoretical work, but something may come
of it someday.

Dave.

Jonathan Bowen

Aug 26, 1992, 9:32:48 AM

eif...@beanworld.cs.jhu.edu (Jonathan Eifrig) writes:
> Curiously, I was wondering something related: How many people are
>using formal semantic methods to either (1) generate a compiler or (2)
>prove a compiler correct in "real life?" My gut feeling was "not many,"
>but then I started thinking about some of the transformational and
>continuations-based compilation approaches that have been published in
>the last few years, and started wondering.

At Oxford, we are studying the verification of compiling specifications,
and their rapid prototyping using logic programming. The compiler is
specified as a set of theorems (to be proved correct with respect to the
refinement ordering of the language) which turn out to be in the form of
Horn clauses in general. A novel aspect of the approach (due to Prof
C.A.R. Hoare) is that the target machine is defined in terms of an
interpreter written in the high-level language to be compiled. This allows
the proofs to be undertaken largely algebraically, using laws about the
programming language. We have used variants of Occam and the transputer as
our paradigm, and are currently investigating compilation to a "normal
form" that could allow "code generation" down to either a traditional
instruction set, or a netlist of hardware components for implementation on
a Field Programmable Gate Array, or a combination of both. We have also
used OBJ3 as a term rewriting system, which also allows a rapid prototype
implementation of a compiler by rewriting the source program to a normal
form that is close to object code, and we are considering code optimisation,
which is often lacking in formal approaches. We have even considered how a
decompiler may be produced almost directly from the compiling
specification since the logic program specifies a relation that is
essentially reversible (although termination problems, etc., must be
considered). Constraint Logic Programming (CLP) may well help to solve
some of the problems of using logic programming by allowing a more
declarative approach to the implementation of the constraint predicates
involved in the compiling theorems.

Much of the work (and the inspiration behind its inception) is due to the
European collaborative ESPRIT Basic Research ProCoS project ("Provably
Correct Systems"). This project is investigating verification techniques
for the entire development process from requirements through to
specification, program, object code and ultimately the hardware itself.
Another group on the project, at Kiel University in Germany, is
investigating the verification of a more realistic compiler, including the
bootstrapping of a verified compiler, using an operational semantics
approach, but this is a longer-term problem.

Below are some references for those interested in following up the various
aspects of this work.

An overview of the verification and rapid prototyping approach:

Bowen JP, He Jifeng, Pandya PK.
An approach to verifiable compiling specification and prototyping.
In: Deransart P, Ma\l{}uszy\'nski J (eds)
Programming Language Implementation and Logic Programming.
International Workshop PLILP 90.
Springer-Verlag, 1990, pp 45--59
(Lecture notes in computer science no. 456)

A comparative study of using theorem provers (including OBJ3):

Augusto Sampaio.
A comparative study of theorem provers:
proving correctness of compiling specifications.
Programming Research Group Technical Report TR-20-90, 1990, 50 p.

A presentation of the underlying verification approach:

Hoare CAR.
Refinement algebra proves correctness of compiling specifications.
In: Morgan CC, Woodcock JCP (eds)
3rd Refinement Workshop.
Springer-Verlag, 1991, pp 33--48
(Workshops in computing)
(Also issued as a Programming Research Group Technical Report PRG-TR-6-90)

An overview of the work on the ProCoS project:

Hoare CAR, He Jifeng, Bowen JP, Pandya PK.
An algebraic approach to verifiable
compiling specification and prototyping
of the ProCoS level 0 programming language.
In: Directorate-General of the
Commission of the European Communities (eds)
ESPRIT '90 Conference Proceedings, Brussels.
Kluwer Academic Publishers B.V., 1990, pp 804--818

A more detailed presentation of the ProCoS work:

He Jifeng, Olderog ER (eds).
Interfaces between Languages for Concurrent Systems. Vol 2,
ESPRIT BRA 3104 Provably Correct Systems
ProCoS Draft Final Deliverable,
October 1991 (To be issued as a Technical Report, DTH, Denmark)

Another approach to compiler verification (using operational semantics):

von Karger, B. (ed).
Compiler development. Vol 3,
ESPRIT BRA 3104, Provably Correct Systems
ProCoS Draft Final Deliverable,
October 1991 (To be issued as a Technical Report, DTH, Denmark)

An overview of compilation directly into hardware:

Page I, Luk W.
Compiling Occam into field-programmable gate arrays.
in Moore W, Luk W (eds),
FPGAs,
Oxford Workshop on Field Programmable Logic and Applications.
Abingdon EE\&CS Books, 15 Harcourt Way, Abingdon OX14 1NV, UK,
1991

A detailed account of the logic programming approach to decompilation:

Jonathan Bowen.
From Programs to Object Code and back again using Logic Programming.
ESPRIT II REDO Project Document 2487-TN-PRG-1044, 1991.

An example of a real-time programming language and a compiler:

He Jifeng and Jonathan Bowen.
Time Interval Semantics and Implementation of a Real-Time
Programming Language.
In Proc. Fourth Euromicro Workshop on Real-Time Systems,
IEEE Computer Society Press, pp~110--115, June 1992.

Information on the rapid prototyping approach using logic programming:

Jonathan Bowen.
From Programs to Object Code using Logic and Logic Programming.
In: Robert Giegerich and Susan L.\ Graham (eds),
Code Generation -- Concepts, Tools, Techniques,
Proceedings of the International Workshop on Code
Generation, Dagstuhl, Germany, 20--24 May 1991.
Springer-Verlag, Workshops in Computing, August 1992.

An overview of two approaches to decompilation:

Bowen JP, Breuer PT.
Decompilation.
In: van Zuylen H (ed)
The REDO Handbook:
A Compendium of Reverse Engineering for Software Maintenance,
chapter 9,
Wiley, 1992 (To appear)

A more detailed account of decompilation using functional/logic approaches:

Peter Breuer and Jonathan Bowen.
Decompilation: the Enumeration of Types and Grammars.
Programming Research Group Technical Report TR-11-92, 1992.
(A shortened version is to be presented at the Workshop on Static
Analysis 92, Bordeaux, France, 23--25 September 1992.)

Hardware compilation:

He Jifeng, Ian Page and Jonathan Bowen.
{\em Normal Form Approach to FPGA Implementation of occam},
ProCoS project document [OU HJF 9/2], January 1992.
(Draft, available on request)

An example of compiler optimisation:

He Jifeng and Jonathan Bowen.
Specification, Verification and Prototyping of an Optimized Compiler.
1992. (Draft, available on request)


Programming Research Group Technical Reports and REDO project documents
are available from our librarian on <lib...@comlab.ox.ac.uk>.
--
Jonathan Bowen, <Jonatha...@comlab.ox.ac.uk>
Oxford University Computing Laboratory.

Paul Eggert

Aug 28, 1992, 4:59:16 PM

he...@zoo.toronto.edu (Henry Spencer) writes:
> In fact, ANSI C handed down a much stricter line on this:....
> The only restriction is that if overflows are
> visible, optimizations can't add or remove overflows.

Actually, in ANSI C, the behavior on overflow is undefined, so a
conforming implementation's optimizer can remove overflows.

Spencer's right that the Ritchie compiler's treatment of integer overflow
was broken, but unfortunately the C Standard lets a compiler behave in
this way. (C's not alone in this regard, of course; e.g. the Fortran
standard has the same problem.) That's too bad, since the problems that
it leads to can be quite mysterious. For example, in the following code:

    i = 0;
    if (0 < j)
        i = j;
    assert (0 <= i);

integer overflow can cause the assertion to fail!

There's a trick to this, of course. Here's a complete C program
containing the above code. Assume a 32-bit int.

#include <assert.h>
int big = 2147483647;
main() {
    int i, j;
    j = big + 1; /* This overflows. */

    i = 0;
    if (0 < j)
        i = j;
    assert (0 <= i);
}

Suppose the compiler optimizes `main's body into something like this:

    j = big + 1; /* This overflows. */

    i = 0;
    if (0 <= big)
        i = j;
    assert (0 <= i);

The C Standard allows this optimization, because it works when `big + 1'
does not overflow, and the code can do anything it pleases when `big + 1'
_does_ overflow. However, I suspect most programmers would say this
optimization is incorrect, because it gives `i' a negative value when
`big' is INT_MAX.

This is not a contrived example. I derived the above program from a bug I
encountered when building DEC SRC Modula-3 with GCC 2.2.2 -O on a Sparc.
Happily, though, the GCC maintainers are programmers, not language
lawyers; they've agreed that this behavior is a bug, and it'll be fixed in
the next release.

Henry Spencer

Aug 26, 1992, 4:34:53 PM

w...@eb.ele.tue.nl (Willem Jan Withagen) writes:
>Now I don't want to open another tarpit again, but this certainly is going
>to have effects on the semantics of the used program...

>[This was for integer expressions in C, which traditionally haven't paid
>much attention to overflows...

In fact, ANSI C handed down a much stricter line on this: expressions
execute as written, with the only escape clause being the "as if" rule
(which says that the compiler can do anything it pleases if the user can't
legally tell the difference). You can still do the things that old C
compilers do, but only if you are generating code that ignores integer
overflow. Alternatively, you may still be able to do some optimization if
you generate code that catches the cases where overflow would matter --
ANSI C doesn't constrain what *happens* when overflow occurs, so the
program doesn't have to dump core in precisely the same way as the
unoptimized one would. The only restriction is that if overflows are
visible, optimizations can't add or remove overflows. This is for
integers, of course; for floating point, optimizations are a lot more
tightly constrained under "as if".

A subtle point worth mentioning here... A *compiler* that ignores the
possibility of overflow might not generate *code* that ignores overflows.
On many machines, overflow is figured into how some of the conditional
branches work. If the generated code is to truly ignore overflows, the
compiler has to be aware of the issue and be careful about the code it
generates. The original Ritchie C compiler, for example, generated code
that could malfunction in peculiar ways in the presence of overflow.
Consider the following with 16-bit ints:

    int foo;

    foo = 32767;
    if ((foo += 200) < 0)
        printf("%d is negative\n", foo);
    else
        printf("%d is non-negative (!)\n", foo);

Guess which message you got out of code from the Ritchie compiler...
--
Henry Spencer @ U of Toronto Zoology, he...@zoo.toronto.edu utzoo!henry

Jonathan Eifrig

Aug 28, 1992, 6:11:48 PM

Jonatha...@prg.oxford.ac.uk (Jonathan Bowen) writes:
>eif...@beanworld.cs.jhu.edu (Jonathan Eifrig) writes:
>> Curiously, I was wondering something related: How many people are
>>using formal semantic methods to either (1) generate a compiler or (2)
>>prove a compiler correct in "real life?"

>At Oxford, we are studying the verification of compiling specifications,
>and their rapid prototyping using logic programming.

I think perhaps you're misinterpreting my posting, and I just want
to clarify my question: I'm curious as to whether or not any of the
semantics-based techniques for compiler generation have made it
"off-campus" into the real world. My rule of thumb is whether or not some
scheme has generated a compiler that is intrinsically interesting to
someone other than those who made it.

For example, the ORBIT compiler for Scheme was actually used at
Yale as its installed Scheme compiler, I believe, as part of its T system;
this is the earliest example of a continuation-passing style compiler that
I know of that made it out into the real world. (Earlier examples
welcomed!) Hundreds, if not thousands, of commercial products have been
made with lex/yacc parsers, moving LALR parsing into the real world.

But has anyone used, say, Lee's MESS system to make something like
the "Gorgunza C" compiler? And if not, why not?

ObRefs:

@string{scc86 = "Proceedings of the 1986 ACM SIGPLAN Symposium on Compiler
Construction"}

@inproceedings{kranz86,
author = "David Kranz and Richard Kelsey and Jonathan Rees and Paul Hudak
and James Philbin and Norman Adams",
title = "ORBIT: An Optimizing Compiler for Scheme",
booktitle = scc86,
year = 1986,
pages = "219--253"
}

@book{lee89,
author = "Peter Lee",
title = "Realistic Compiler Generation",
publisher = "MIT Press",
year = 1989,
series = "Foundations of Computing"
}
[Yes, ORBIT is the compiler in T. My impression from reading Lee's book
is that MESS is indeed pretty messy, not well enough packaged for others
to use. And packaging is important: yacc became the most widely used
parser generator not because it broke any new ground in LR parsing ("ya"
stands for "yet another", after all) but because it was a lot easier to
use than any of the others available in the early 1970s. -John]

fa...@dcs.edinburgh.ac.uk

Sep 18, 1992, 4:25:02 AM

Dave Berry <d...@dcs.ed.ac.uk> writes:

> We've done some work on producing tools from operational semantics. In
> our case we've looked at debuggers (which are harder than compilers). In
> my thesis I looked at the definition of evaluation step in terms of
> operational semantics, and described a system that generated animating
> interpreters from an operational semantics. Fabio da Silva is just
> completing his thesis on another, more rigorous, approach. He includes an
> algorithm for generating a deterministic semantics from a certain class
> of non-deterministic semantics.


This is to announce the availability of a report in which I summarise the
main results of my PhD Thesis. My thesis should also become available in
a few weeks. Both report and thesis can be obtained from Lorraine Edgar
(l...@dcs.ed.ac.uk). If anyone else is interested in the issues of
correctness proofs of compilers and debuggers I will be very glad to
exchange ideas and experiences.

@TechReport{,
author = "Fabio Q. B. da Silva",
title = "Correctness Proofs of Compilers and Debuggers: an
Overview of an Approach Based on Structural
Operational Semantics",
institution = "LFCS, Department of Computer Science, University of
Edinburgh",
year = "1992",
OPTtype = "",
number = "ECS-LFCS-92-233",
address = "Edinburgh, EH9 3JZ, Scotland",
month = "September",
OPTnote = ""
}


@PhdThesis{,
author = "Fabio Q. B. da Silva",
title = "Correctness Proofs of Compilers and Debuggers: an
Approach Based on Structural Operational Semantics",
OPTschool = "LFCS, Department of Computer Science, University of
Edinburgh",
year = "1992",
OPTaddress = "Edinburgh, EH9 3JZ, Scotland",
OPTmonth = "",
note = "Thesis submitted for the degree of Ph.D., LFCS, Department
of Computer Science, University of Edinburgh,
Edinburgh, EH9 3JZ, Scotland"
}


Fabio
--
Fabio Q. B. da Silva

University of Edinburgh JANET: fa...@uk.ac.ed.dcs
LFCS - Computer Science Dept. UUCP: ..!mcsun!ukc!lfcs!fabio
JCMB - The King's Buildings ARPA: fabio%dcs.ed...@nsfnet-relay.ac.uk
Mayfield Road - Edinburgh
EH9 3JZ - Scotland
