How can a simple programmer detect a bug in a compiler ? is there some well known verification techniques ?
I am afraid that many bugs are hidden inside compiler activated with sophisticated optimisation options, and few people can detect them: when a user program does not work, we usually think that the problem comes from the program itself, not from the compiler nor from the processor.
On the other hand, when a compilation flag is on and the program crashes, we usually think that the bug comes from the compiler not from the program...
> How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
The most general method:
Reduce the code, where you observed an possible compiler bug, until you have a small program to reproduce the error. Then give that program to the compiler writers.
The most sophisticated method:
Use an decompiler to re-create the source code from the executable, and search for differences. A decompiler may find erroneous machine code itself, when the code is not reducible to high-level statements and expressions.
There exist many other methods in between, like inspecting the disassembled source code for obvious errors. This is what a decompiler will do automatically.
A simple method is using another compiler, or other optimization options, to find out different behaviour of the executables. But this is a weak method, sensitive to uninitialized variables, wild pokes and other coding errors.
On Mon, 27 Apr 2009 14:42:19 +0200, Sid Touati wrote: > How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ? ...
Since you didn't say what language or compiler you are using, I can only answer in very general terms.
In some circumstances, optimizers are known to break code. Sometimes by optimizing away side effects. Sometimes causing heisen-bugs by changing the order of evaluations and sometimes by tripping over "clever" code.
That said, when I come across a problem, I always assume the bug is in my code until I can provide a test case that clearly demonstrates that the compiler itself is at fault. Then I have someone else review my code to make sure I'm not being an idiot.
Sid Touati wrote: > How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
> I am afraid that many bugs are hidden inside compiler activated with > sophisticated optimisation options, and few people can detect them: when > a user program does not work, we usually think that the problem comes > from the program itself, not from the compiler nor from the processor.
> On the other hand, when a compilation flag is on and the program > crashes, we usually think that the bug comes from the compiler not from > the program...
Identifying compiler bugs is usually a lot of work. Start with a program that fails somewhere in its thousands of lines of code, and spend the next few hours removing pieces until you have a minimal program that reproduces the problem (or at least a problem -- there could be more than one). In the early 90's, I spent a few years as a contractor porting C++ code from one buggy compiler to another, so I saw a lot of these. In one case, a C compiler generated bad code when the program used a pointer to a function that took a float argument. A later version of the compiler would have worked, but the customer was too cheap to upgrade. I think I spent about 12 straight hours on that one.
You'll need to be able to read compiler-generated assembly code. Ironically, you won't need to be an expert in the compiled language; I used the Annotated Reference Manual a lot, but by the time I got a C++ file down to the 20 lines or so needed to demonstrate the problem, it was pretty obvious that there was a bug.
It's all worth it when you can send a description of the problem and some sample code to the vendor, and it gets fixed.
(This was all a *lot* more fun years ago when I worked on Burroughs Large Systems and they shipped the source code and I could fix some of these in the field and report the problem to Mission Viejo along with my suggested patch.)
Sid Touati <SidTou...@inria.fr> wrote: > How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
There are some useful ideas in Flash Sheridan "Practical testing of a C99 compiler using output comparison" Software--Practice and Experience volume 37 (2007), pages1475--1488
ciao,
-- -- "Jonathan Thornburg [remove -animal to reply]" <jth...@astro.indiana-zebra.edu> Dept of Astronomy, Indiana University, Bloomington, Indiana, USA "C++ is to programming as sex is to reproduction. Better ways might technically exist but they're not nearly as much fun." -- Nikolai Irgens [Too bad SPE now costs $3278/yr. -John]
SidTou...@inria.fr (Sid Touati) wrote: > How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
If there is, I don't know it. Compiler writers generally test their compilers quite hard - there are various test suites of source code available, and all serious compiler writing organisations will develop test sets of their own and expand them as they fix bugs - but there is no simple and all-embracing method.
> I am afraid that many bugs are hidden inside compiler activated with > sophisticated optimisation options, and few people can detect them: > when a user program does not work, we usually think that the problem > comes from the program itself, not from the compiler nor from the > processor.
> On the other hand, when a compilation flag is on and the program > crashes, we usually think that the bug comes from the compiler not > from the program...
I take it you mean that when you change some compilation option and the program crashes, you suspect the compiler. At that point, the compiler is a target for suspicion, but not the only one. There can be a code bugs that fail to show up under the conditions where the program was developed, but will do so under different conditions.
A classic C example is declaring an array in function scope, and accessing off the end. If the memory you access isn't used for anything else, the program may well work. A different compiler or different optimisation may well arrange the variables of the function differently, so that now the off-the-end access tramples some other variable in a way that shows. The bug was much older than the change of compiler, and it's not the compiler's fault at all.
I've had a fair amount of experience of digging out compiler bugs, because my employment consists largely of porting a large and complicated piece of software that can be tested quite thoroughly. Every time we use a new or updated compiler, we expect to find a few code bugs that never showed up before, and a few compiler bugs too. Modern compilers seem much more reliable than those of 20 years ago, but they are not perfect, and eventual perfection seems most unlikely. We try quite hard to report the compiler bugs we find, in a usable way, and this may have a lot to do with the apparent increase in reliability that we see.
The basic process is easy to describe, but much harder to carry out. I'm going to assume C or C++ below, because they're what I'm used to doing:
First, find out what's actually going wrong. This doesn't mean "Test Foo-bar-2345 crashes when file mad_algorithm_17.cpp is compiled at -O3 or higher", although that may well be a step towards finding out what's going on. It means finding out which variables or expressions are going bad. This involves a load of painstaking debugging, which has to be done very, very carefully. It is likely to involve the discovery of workarounds along the way, such as reducing optimisation level or re-arranging code, but this is only likely, not certain. Everything that follows depends on doing this right. Expect it to take several days, not half an hour, although it gets quicker with practice.
Second, you need to be able to reproduce the problem in a completely standalone piece of code that is not part of your project. The preferred way to do it is a simple standalone program that uses only system headers (or equivalent for other languages) and demonstrates the problem. Submit this with /everything/ needed to recreate the compilation. This includes all the compiler and linker options (I prefer to write a shell script or batch file that does the job, because that's more explicit than an IDE project file usually is), the precise version of the compiler that you're using, the exact operating system that you are running on, and so on.
If you can't make a standalone program, there is another way to do it, but it requires learning the assembly language for the platform. If you can read the assembler generated by the compile and figure out what's wrong in the generated code, submitting a heavily annotated assembly listing explaining the problem, along with the source file that you compile to create it, and all the supporting details, usually seems to be acceptable. MS, Intel, HP and Sun have taken them from me and fixed the bugs, anyway.
This is not easy work. If you can do it, without being driven mad in the process, it is very worthwhile. Compiler manufacturers become fond of you, because you report bugs that they can actually find and fix, as opposed to most of what they get. But it is a job where half- measures are pointless. You have to make absolutely everything totally and unambiguously explicit. You have to nail things down so thoroughly so that a team of US lawyers being paid millions couldn't pick holes in it. Someone who programs by grabbing examples and pasting them together, and gets bored if he can't solve a bug in half an hour should not take up this game.
One compiler manufacturer whose product I'd given up using because it had too many bugs asked for a set of source and basic tests, and gave it to one of their engineers to nail down the compiler bugs. From the reports I got, he simply found out each source file that was causing a problem, and handed them over to the compiler team with the explanation "something goes wrong in here". That is no use to anyone, and the compiler team just ignored him.
> That said, when I come across a problem, I always assume the bug is in my > code until I can provide a test case that clearly demonstrates that the > compiler itself is at fault.
I can count on one hand the number of genuine compiler bugs I've been able to isolate and reproduce well enough to report as official bugs. (Even the so-called optimization bugs turned out to be my violation of some contract with the compiler, such as aliasing when I oughtn't have.)
That said, I couldn't count in all the bits of a gigabyte how many times I've heard programmers (including myself) say: "The *($@&&^ compiler is buggy!"
Sid Touati wrote: > How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
Most assembly code that I know about is not proven correct by any sort of formal method, or verification technique. There was a research project that implemented a provably correct assembler, but the code I looked at was quite strange, and the assembler itself was for an architecture I don't use IIRC.
> I am afraid that many bugs are hidden inside compiler activated with > sophisticated optimisation options, and few people can detect them: > when a user program does not work, we usually think that the problem > comes from the program itself, not from the compiler nor from the > processor.
Yes, indeed that's a problem. In some ways rigorous testing may help, and in other ways it may hinder future changes. You definitely need a test suite anyway for your compiler to verify the correctness of the code produced, but as you might expect with languages like assembly, some things may work unexpectedly due to state not changing that you didn't expect. If and when you have multiple CPUs to support the complexity increases.
Consider for example that a register has a value due to a specific instruction that came before another code pattern that depends on that value, but didn't explicitly set that value. This kind of bug happens too often. It may even pass in many cases a torture test.
> On the other hand, when a compilation flag is on and the program > crashes, we usually think that the bug comes from the compiler not > from the program...
I agree, and understand. That's why you need great test coverage. My compilers have been rather simple for the most part, but the tests end up almost being as large as the compiler itself, when I do it right.
I think in some ways having a more structured design with hypertext, and multiple layers would help a great deal to express and structure solutions for the complex problems we face, as NLS has proven. The NLS system was implemented in the 1960's and they had good ideas. My belief however about NLS is something that most people don't share in the computing industry. I have been going forward in my spare time on a project. My project is focusing on enabling hypertext, layers, and code constraints applied in a visual way. In fact the constraints have to be visual to work in any sort of sane and expressable manner.
NLS had MOL (machine-oriented language) as its base. NLS was a very powerful and interesting system to me, and without it we might not have the computer mouse. The system allowed managing complexity in ways I don't think we can do in general now, without more indirection.
On 2009-04-27, Sid Touati <SidTou...@inria.fr> wrote:
> How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
> I am afraid that many bugs are hidden inside compiler activated with > sophisticated optimisation options, and few people can detect them: when > a user program does not work, we usually think that the problem comes > from the program itself, not from the compiler nor from the processor.
Usually it is that way, program bugs are more numerous than compiler bugs, if only because the compiler is vetted by a multitude of users.
It can be simply that turning on the optimization changes memory layout that can causes hidden program bugs that mutilate memory suddenly to become critical.
In general the best idea is to first do usual debugging stuff, like using fencing heap allocators, turn on options that your compilers might have (like randomized initialization of variables, range checks, stack integrity cheching etc), as well as running in tools that can detect certain classes of errors (like valgrind)
If that doesn't help, you can quickly inspect the assembler if you have a feeling where the bug might be, and then (as the others already advised) start narrowing down.
> On the other hand, when a compilation flag is on and the program > crashes, we usually think that the bug comes from the compiler not > from the program...
It might be simply that the pointer bugs that surface without optimization have been fixed during normal development. And the ones WITH optimization only close to the release point, giving the impression that the optimizer is very bugged.
Sid Touati <SidTou...@inria.fr> writes: > How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
There have been research projects about verifying the correctness of compilers, but they have (to my knowledge) all been for fairly simple compilers. You will, basically, have to build the compiler with verification in mind to have any chance of doing this.
Another approach is to make a verifying compiler: One that checks the correctness of each compiled program. This way, you don't prove that the compiler is correct, but you catch "bad" programs.
These techniques are, however, sot for "simple programmers" (whatever that means). Also, these techniques need formal specifications of both the source language and the target machine. And it is often the case that such formal specifications themselves have bugs, so you just move the problem to a higher level of abstraction, unless you define the language by it formal semantics.
> I am afraid that many bugs are hidden inside compiler activated with > sophisticated optimisation options, and few people can detect them: when > a user program does not work, we usually think that the problem comes > from the program itself, not from the compiler nor from the processor.
And that is also the most likely case.
> On the other hand, when a compilation flag is on and the program > crashes, we usually think that the bug comes from the compiler not from > the program...
Even if your program changes behaviour when you turn on optimisation flags, this may not imply faults in the compiler, just that the language has a weak specification that leaves, say, the order of evaluating function arguments up to the compiler. The non-optimising compiler may just evaluate in order, while an optimising compiler may rearrange the order for better scheduling or such. If some of the arguments have side effects, this can change program behaviour.
If you use floating point computations and the language definition is not specific about evaluation order and precision, optimisation may change the precision of your calculations which may cause subsequent program behaviour to diverge (two values that test equal without optimisation may test not equal afterwards, or vice-versa).
That said, compilers are very large and complex pieces of software and incredibly difficult to test, since you can't normally validate the correctness of the output -- the best you can do is to test the output, so your tests are more indirect than if you test, say, a sorting functin where you can fairly easily verify that the output is a sorted permutation of the input.
So, it is not unlikely that there are errors in your compiler. The unlikely part is that you will be the first to notice the errors. Not only are compilers tested rather thoroughly, but if a compiler is in wide usage, most bugs are bound to show up fairly quickly. It is normally only when you program in a very unusual style or push the limits of the language that you find new bugs.
If you are convinced that the problem is not in your program, you should (as other posters have said) reduce your program to the smallest possible that still shows the problem and only then report the problem to the compiler vendor. If the vendor agrees that it is, indeed, a bug in the compiler, they are likely to ask to add your program to their test suite, so subsequent versions of the compiler test for this problem.
>How can a simple programmer detect a bug in a compiler ? is there some >well known verification techniques ?
There's a brillant paper by Eric Eide and John Regehr from the University of Utah where they use a clever variant of fuzz testing to find bugs in the way compilers implement C's volatile storage qualifier. They use a code generator to create random programs, and they instrument their execution to find out their memory access patterns. They detect compiler bugs by compiling the programs with different optimisation levels and seeing when the access patterns differ.
>> How can a simple programmer detect a bug in a compiler ? is there some >> well known verification techniques ?
> If there is, I don't know it. Compiler writers generally test their > compilers quite hard - there are various test suites of source code > available, and all serious compiler writing organisations will develop > test sets of their own and expand them as they fix bugs - but there is > no simple and all-embracing method.
The part of the test suites (eg, Perennial) that compiler writers uses is often only the part that checks the syntax and semantics. While code generation stress testers might be included they are used less frequently.
j...@cix.compulink.co.uk wrote: > SidTou...@inria.fr (Sid Touati) wrote: >> How can a simple programmer detect a bug in a compiler ? is there some >> well known verification techniques ? > If there is, I don't know it.
Best is to find the smallest program that demonstrates the bug, and that you can verify satisfies the appropriate language standard.
> Compiler writers generally test their compilers quite hard - there > are various test suites of source code available, and all serious > compiler writing organisations will develop test sets of their own > and expand them as they fix bugs - but there is no simple and > all-embracing method.
I can only think of a few that I have seen over the years, that I could reduce down and demonstrate with a small program. One was a C compiler that miscompiled x++ in the case where x was (double). While legal, presumably it didn't come up in the testing that was done. (If I remember it right, it compiled as ++x.) I have somewhat of a tendency to try things that I know are supposed to work, but are less obvious than others.
On Tue, 28 Apr 2009 12:57:33 -0700, Quinn Tyler Jackson wrote: > Jeremy J Starcher said:
>> That said, when I come across a problem, I always assume the bug is in >> my code until I can provide a test case that clearly demonstrates that >> the compiler itself is at fault.
> I can count on one hand the number of genuine compiler bugs I've been > able to isolate and reproduce well enough to report as official bugs. > (Even the so-called optimization bugs turned out to be my violation of > some contract with the compiler, such as aliasing when I oughtn't have.)
For large "well-known" compilers, that is true for me as well.
Back when I was using ZBASIC (A cross-platform compiled basic) I had the development team on speed-dial and they ended up using our test suite before sending us anything.
> That said, I couldn't count in all the bits of a gigabyte how many times > I've heard programmers (including myself) say: "The *($@&&^ compiler is > buggy!"
Around here I usually hear "The *($@&&^ library is junk. I have to write my own!"
Quinn Tyler Jackson wrote: > I can count on one hand the number of genuine compiler bugs I've been able > to isolate and reproduce well enough to report as official bugs. (Even the > so-called optimization bugs turned out to be my violation of some contract > with the compiler, such as aliasing when I oughtn't have.)
Good test software is incredibly hard to write often taking more resources than the compiler function it was meant to test.
It is difficult in many cases to create a simple example of a compiler bug due to the kind of optimization that is found in many compilers. It is common that code generation for syntactically identical sequences to generate substantially different code due to some overall application consideration. Two simple examples are spilled registers and even simple instruction addressing modes. (This alone should be enough to do testing at the optimization level of code for the final product)
Compiler bugs are a function of the degree of optimization that is in the compiler. As soon as alternative code generation sequences are added testing becomes much more complex and the probability of a customer encountering a compiler bug is higher.
We use both random testing and syntax directed testing with our compilers because the payoff to effort is high. Both of these techniques are a single tool in a very large toolbox.
Eric Eide and John Regehr are to be commended for their work on Volatiles. The success of their work was careful planning on the design of the random tests and some very good analysis work. Their paper illustrates just how difficult a simple language feature is to both implement and validate.
Application specific testing tends to have a high payoff. Using reference designs and customer application code as part of the regression tests while tracking compiler generated metrics will tend to uncover induced bugs as the compiler evolves and will focus effort to the compiler area's that customers actually will be using.
>> How can a simple programmer detect a bug in a compiler ? is there some >> well known verification techniques ? ... > So, it is not unlikely that there are errors in your compiler. The > unlikely part is that you will be the first to notice the errors.
I would like to amplify this point. At some level, compilers aren't that different from other programs written by other programmers. So, think of the kind of bugs you experience.
1) most bugs occur in new code
sometimes the new code stresses old code, but there is an exponential decay in the numbers of bugs founds per block of code over time and you can compute things like half-lives to guess at the number of unfound bugs. (Most compiler code is also "old", so the number of remaining bugs is diminishing.)
2) most bugs actually cause spectacular failures (or are off- by-ones); very few bugs give you subtly wrong output
How often do you write code and msake a small error (say multiplying by 7 when you mean 8, or xor'ing when you mean to and) that affects everything, but only by a little and is subtle? It is more likely, you write errors where you sometimes go 1 place to far or short in a loop or you forget to initialize something, or you depend on something else about some other code which just isn't true. Compiler writers make the same kinds of mistakes.
Thus, if you are to find a bug in the compiler, the most likely effect is that the compiler will crash. If it doesn't crash, the code for the common cases (+, -, *, araays, and so forth) is probably correct. Even the code for most uncommon cases have probably been explored by someone other than you, unless you are using a new feature of a compiler.
2) many of the subtle bugs come from using grey-areas, where the semantics themselves are subtle and one gets them wrong.
For example, I've just been fixing some code of mine where I was filling up 64 bit structures (on a 32 bit machine) using bit shifts--the bugs weren't in the compiler, the bugs were in my using the wrong assumptions about the promotion rules. People who mention aliasing problems are talking about a similar effect. Now, these bugs would have been in this compiler I'm writing, except that they showed up in my unit testing, and I ensured that I went over all of the relevant places and fixed them. Is it possible that I left a hole, yes, but unlikely, and much more unlikely by the time that even my initial customers see this compiler.
And, this brings up another point, most of the code I've worked on in most compilers is pretty vanilla. Sure compiler writer's are geeks and write some code that is too-clever, but a lot of compiler code just uses arrays and pointer based data structures (e.g. trees, dags) and integer and boolean arithmetic. Thus, most the errors tend to be vanilla also. We have pretty good techniques for catching and preventing such errors. Most of the compiler code I write checks and asserts each pointer access (or array bound). This tends to increase the ratio of spectacular to hidden failures, but helps me catch more bugs earlier.
And, yes, this list ended without a specific point.
Hope this helps, -Chris
*************************************************************************** ** Chris Clark Internet: christopher.f.cl...@compiler-resources.com Compiler Resources, Inc. or: comp...@world.std.com 23 Bailey Rd Web Site: http://world.std.com/~compres Berlin, MA 01503 voice: (508) 435-5016 USA fax: (978) 838-0263 (24 hours) --------------------------------------------------------------------------- --
>Sid Touati <SidTou...@inria.fr> writes: >> On the other hand, when a compilation flag is on and the program >> crashes, we usually think that the bug comes from the compiler not from >> the program...
>Even if your program changes behaviour when you turn on optimisation >flags, this may not imply faults in the compiler, just that the >language has a weak specification that leaves, say, the order of >evaluating function arguments up to the compiler. The non-optimising >compiler may just evaluate in order, while an optimising compiler may >rearrange the order for better scheduling or such. If some of the >arguments have side effects, this can change program behaviour.
An optimizer that breaks a program is a bad idea. There are apologists (of program-breaking optimizers) that claim that the program was already broken without the optimizer, because it does not conform to some language standard. But actually the program does conform with the language as it is implemented by the compiler without optimization and it behaves as intended by the programmer, so it is correct.
That the program does not conform to some other language specification should be irrelevant to the optimizer, because the optimizer should preserve the behaviour of the non-optimizing compiler. Ok, in some cases this costs a little (e.g., to preserve the order of potential side effects in the example above), and in some cases it's not practically possible and would buy very little (e.g., preserving the exact values of return addresses so an out-of-bounds array read may get the same value), but in general that's how optimizer writers should approach the issue.
Otherwise we will soon be back to the bad old days when everybody just considered turning on the optimizer too risky because it tends to break programs; and the rhetoric of the apologists won't help, because at least for languages like C most (all?) substantial programs don't conform to the language standard, and there is no cost-effective way to turn such programs into standard-conforming programs. So if the optimizer feels free to break such programs, programmers just will not use the optimizer.
If an optimizer writer also wants to excel on benchmarks by weakening the semantic guarantees to the lowest common denominator (i.e., the language standard plus the stuff used in the benchmarks), they can add special flags for that, say, -Ospec.
On Apr 27, 8:42 am, Sid Touati <SidTou...@inria.fr> wrote:
> How can a simple programmer detect a bug in a compiler ? is there some > well known verification techniques ?
> I am afraid that many bugs are hidden inside compiler activated with > sophisticated optimisation options, and few people can detect them: when > a user program does not work, we usually think that the problem comes > from the program itself, not from the compiler nor from the processor.
> On the other hand, when a compilation flag is on and the program > crashes, we usually think that the bug comes from the compiler not from > the program...
This is an interesting question. Invariably compiler bugs are discovered at the end of a long _program_ debugging process, when all the possibilities for bad code have been eliminated.
In my experience, assuming code improvement passes are inherently buggy is a bit dangerous. One of the 3 or 4 true compiler bugs I've seen in too many years of code hacking occurred only with all code improvement options _turned off_.
One moral is that when developing code, unless your debugging and production build options are exactly alike, you should maintain and test with both continuously. Don't test with the debugging build, then think that you'll get away with making a final production build and shipping. That way lay dragons.
> An optimizer that breaks a program is a bad idea. There are > apologists (of program-breaking optimizers) that claim that the > program was already broken without the optimizer, because it does not > conform to some language standard. But actually the program does > conform with the language as it is implemented by the compiler without > optimization and it behaves as intended by the programmer, so it is > correct.
Can you please provide us with the list of optimizations that will not affect the behavior of any program that uses language features that are defined by the language standard to have undefined behavior?
Best, Christopher [I read an interesting article about the IBM Fortran X compiler, an improved version of the classic Fortran H compiler. The guy who did it made sure that the new optimized results were always bit identical to what the old compiler produced. But I don't know whether he extended that to programs that weren't valid Fortran, e.g., with aliased arrays. -John]
"Christopher Glaeser" <c...@nullstone.com> writes: >> An optimizer that breaks a program is a bad idea. There are >> apologists (of program-breaking optimizers) that claim that the >> program was already broken without the optimizer, because it does not >> conform to some language standard. But actually the program does >> conform with the language as it is implemented by the compiler without >> optimization and it behaves as intended by the programmer, so it is >> correct.
>Can you please provide us with the list of optimizations that will not >affect the behavior of any program that uses language features that are >defined by the language standard to have undefined behavior?
*The* list, for all languages? Please contact me for my consultancy rates:-). Actually, you could make identifying such program-breaking behaviour an additional service of your business.
Anyway, to give some examples: many forms of constant folding, copy propagation and common subexpression elimination; but typically a particular optimization can be performed in a behaviour-preserving way or in a program-breaking way. E.g., evaluating a constant expression may result in an exception (e.g., consider "1/0"); proper constant folding has to preserve that visible behaviour (giving a warning about the expression may also be a good idea) instead of compiling just anything because the language standard does not cover this case.
> [I read an interesting article about the IBM Fortran X compiler, an improved > version of the classic Fortran H compiler. The guy who did it made sure that > the new optimized results were always bit identical to what the old compiler > produced. But I don't know whether he extended that to programs that weren't > valid Fortran, e.g., with aliased arrays. -John]
I don't know about Fortran X, but the successor to the original H was H extended, sometimes called HX. As well as I know it, H was free but HX was a "program product" (licensed and charged for).
One extension was the REAL*16 and COMPLEX*32, extended precision floating point data types and corresponding library functions.
One example from the "IBM Fortran IV H Extended Compiler Programmers Guide": SC28-6852-1_OS_FORTRAN_H_Pgmr_Jun72.pdf
DO 11 I=1,10 DO 12 J=1,10 9 IF(B(I).LT.0) GO TO 11 12 C(J)=SQRT(B(I)) 11 CONTINUE
that the compiler (presumably both H and HX) moves the SQRT out of the J loop, but not the IF statement. That is legal Fortran that fails, and that IBM knows fails. (As they suggest, the programmer should move the IF out of the loop, too.)
-- glen [Yup, HX, same program. I gather HX has the same broken optimizations that H did. In its defense, Fortran H did have to invent a lot of the analysis techniques it used, and run on a 128K (for the young folks, yes, that's a K) machine. -John]
glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote: > (John wrote) > [Yup, HX, same program. I gather HX has the same broken optimizations > that H did. In its defense, Fortran H did have to invent a lot of the > analysis techniques it used, and run on a 128K (for the young folks, yes, > that's a K) machine. -John]
If it generates bit compatible output, it would have to have the same broken optimizations.
I believe, though, that would be 256K, though people I knew gave it 300K.
IBM uses a binary system similar to roman numerals, where a lower value to the right is added, and to the left is subtracted.
F is 64K, G is 128K, H is 256K, such that a 192K machine would be indicated by GF and a 7MB machine by JM. The numbering doesn't work quite as well for programs, but PL/I (F) was designed to run in 44K on a 64K machine, leaving 20K for the OS. (That might result in the symbol table residing on disk. At the end of every run, PL/I (F) would indicate the smallest region that would keep the symbol table in main memory.)
As virtual memory became more popular, one common thing to do was to remove the overlay structure from Fortran H. Load it into the linker without any OVERLAY control cards, and then write it out again.
-- glen [Darn those off by one bugs. You're right, it needed a 256K machine rather than 128K. And like pretty much every other OS program it ran a lot faster if you had enough storage to undo the origami and get rid of the overlays. -John]
> Anyway, to give some examples: many forms of constant folding, copy > propagation and common subexpression elimination; but typically a > particular optimization can be performed in a behaviour-preserving > way or in a program-breaking way.
Apparently I don't fully understand your postion. Your previously stated contraint on the optimizer, including and especially for programs with undefined behavior, was ...
> the optimizer should preserve the behaviour of the non-optimizing > compiler.
In a language like C, the behavior of programs that reference uninitialized variables can be affected by the slightest change to register assignments. Given that optimizations like copy propagation and common subexpression elimination can potentially change register assignments, and thus change the behavior of these programs, why would you include these optimizations in your list of permissible optimizations?
>> the optimizer should preserve the behaviour of the non-optimizing >> compiler.
>In a language like C, the behavior of programs that reference >uninitialized variables can be affected by the slightest change to >register assignments.
Not if the unoptimizing compiler initializes all variables to some known value.
>Given that optimizations like copy propagation and common >subexpression elimination can potentially change register >assignments, and thus change the behavior of these programs, why >would you include these optimizations in your list of permissible >optimizations?
As you can see, in this case this is easy to solve. It has the nice side effect of making even the unoptimized behaviour more stable, which programmers also like, because it makes, e.g., debugging easier.
While I hate disagreeing with you Anton, what you suggest here isn't necessarily practical, at least not in all cases.
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes: > "Christopher Glaeser" <c...@nullstone.com> writes: > [Anton Ertl:] >>> the optimizer should preserve the behaviour of the non-optimizing >>> compiler.
>>In a language like C, the behavior of programs that reference >>uninitialized variables can be affected by the slightest change to >>register assignments.
> Not if the unoptimizing compiler initializes all variables to some > known value.
The problem with this solution is that there is at least 1 well-known algorithmic trick that doesn't work in this case. It is a way to initialize a large unbounded array efficiently, when you only use a portion of the array in your algorithhm. However, it depends upon being able to not actually initialize the array for the unused sections. You find a way to map the array such that you can track the "highest" element used and when accessing the array, you compare the element you are about to use with the highest element initialized so far, and if it is beyond the bound, you initialize enough elements to cover the element you are accessing and update your tracker.
I don't know of any optimizers that will insert that kind of trick in ones code. I don't know of optimizers that will replace bubble-sort with quick-sort either.
So, if you have an algorithm that needs a trick like the above to meet its performance promises, then the unoptimizing compiler breaks your algorithm (and we don't have a well-known optimization technique that fixes that breakage).
Now, your point about stability is well taken. Most programmers would prefer that they can be blindly cavalier about memory usage and uninitialized values would simply always be zero. It makes a lot of off-by-one errors benign. However, that freedom comes at a price, one that is often unacceptable. Yes, you can zero stack frames on every procedure entry, but if someone starts putting large auotmatic arrays on the stack, the performance can drop steeply.
I don't know of (nor believe there even is) a general solution to the problem. That's why we have both C and SML and Lisp and ..., which are all very different solutions to the problem, with different tradeoffs.
Yes, one can design a language that is safe, but every attempt to do so, has restricted the language such that certain short-cuts that may be generally unsafe, but can be safe in certain instances, are prohibited. The question then is whether those short-cuts are required in ones application.
-Chris
*************************************************************************** *** Chris Clark Internet: christopher.f.cl...@compiler-resources.com Compiler Resources, Inc. or: comp...@world.std.com 23 Bailey Rd Web Site: http://world.std.com/~compres Berlin, MA 01503 voice: (508) 435-5016 USA fax: (978) 838-0263 (24 hours)