
Language standards vs. implementation, was Re: A right alternative to IEEE-754's format


Quadibloc

Apr 10, 2018, 11:02:46 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

It is legitimate to say that if a language specification states,
explicitly, that when you do such-and-so, what happens is not
specified (undefined) or may vary from one machine and compiler to
another (implementation-dependent), then, as this is behaviour that
cannot be relied upon, it may be dispensed with if that helps the
compiler generate faster code.

But it is also the case that many C programmers have, for a long time,
been using compilers which did not have advanced optimization
technology, so that certain now-undefined behaviors could be relied upon.
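
(To make that concrete, here is one classic idiom of that sort - my own
illustrative sketch, not taken from any particular codebase: an
after-the-fact test for signed overflow, which older compilers compiled
literally but which a modern optimizer may delete outright, because
signed overflow is undefined.)

    /* Sketch only: older compilers compiled this test literally; a
       modern optimizer may fold it to "false", since signed integer
       overflow is undefined behaviour in C. */
    int will_overflow(int x)
    {
        return x + 1 < x;
    }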

Not to mention that the language specification has been updated over the years.

Having to use a special compiler option to make old programs work...
is better than nothing, but I can understand why it is felt to be not good enough.

On the other hand, changing the language spec so that all the old
tricks still work - at the cost of reducing the available level of
optimization for programs that don't use any tricks - well, I can see why that's rejected too.

As I've pointed out before, it seems like both sides could only be made
happy if each side had its own language to define.

But that has its own problem; one side, at least, will lose the
benefits of its language being C, the universal language!

Well, here's an idea. It's unlikely to be adopted, because it will be
viewed as a victory for your side.

Define C as offering, explicitly in its definition, all those behaviors
that programmers of old relied upon. Define Subset C as a well-behaved
language without that stuff, one that is as easy to optimize as Pascal or FORTRAN.

When you turn up the optimization switch on a compiler, that is not to
change which language it compiles. If you have a program in Subset C, you
have to use a different compiler switch to say so - and then the
various optimization levels will likely run faster.

There. Everyone happy.

Nick Maclaren

Apr 10, 2018, 11:04:04 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

In article <5b7f2483-dd96-451d...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>
>> >- Direct compiling to machine code and not using intermediate assembler
>> >to get away from the two copy problem with code generation ISA restrictions.
>
>> Well, er, yes, in theory. But suitable intermediate non-text languages
>> (assembler is, I agree, outdated) are a vast simplification of compilers
>> that are designed for multiple source languages and multiple target
>> machines. gcc is one such.
>
>Also, that's hardly a tactic that postdates Aho, Hopcroft, and Ullman.

Most definitely. Both approaches were old hat LONG before.

>Fortran G
>may have compiled to a P-code like form, being written by an external company
>that made compilers to order for whatever architecture - but Fortran H went
>directly to 360 machine code.

And therein hangs a tale. Fortran G was a fairly good compiler, and
actually generated BETTER code than Fortran H did, in many important
respects. More relevantly, attempting to fix those in Fortran H, X
and Q, and even VS Fortran, was impossible because of its design.


Regards,
Nick Maclaren.

Walter Banks

Apr 10, 2018, 11:04:46 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]



On 2018-04-03 6:53 AM, already...@yahoo.com.dmarc.email wrote:
> On Tuesday, April 3, 2018 at 12:45:21 PM UTC+3, David Brown wrote:
>> On 02/04/18 23:28, already...@yahoo.com.dmarc.email wrote:
>>
>> And Greenhills, and Keil, and ImageCraft, and Tasking, and
>> Bytecraft, and HiTech, and SDCC, and many others.
>>
>> Certainly it is a much harder environment for competition than it
>> used to be - toolchain vendors can't compete on price alone.
>
> Keil is a part of behemoth (ARM) since long ago. I didn't encounter
> Green Hills compiler in wild for a long time. Not sure they are still
> actively developed. Tasking - had seen their RTOS used, never
> compiler. For others, I don't remember ever seeing them used. May be,
> it's just me.
>
There are some good reasons for many of the non-GCC compilers in
embedded systems. GCC doesn't handle very well some of the ISAs that
are used in many embedded systems applications.

In my experience there are a lot of ISAs designed for machine-generated
code that don't map very well onto GCC (though some do). One processor
that I have been working on, for example, is essentially impossible to
write an assembler for.

GCC tools are for the most part using old compiler technology. Some of
it is decades old.

There is a lot of work going on in areas that are just ineffective with
the GCC tools and easier to deal with using other code generation
tools.

In my case that means massively parallel processors, AI ISAs, ISAs for
machine-generated code, and various event-driven processors used in
automotive, general aviation and instrumentation.

Walter Banks
Byte Craft Limited

Nick Maclaren

Apr 10, 2018, 11:05:36 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

In article <d315af03-75ee-4b7f...@googlegroups.com>,
MitchAlsup <Mitch...@netscape.net> wrote:
>On Monday, April 9, 2018 at 6:30:35 AM UTC-5, Walter Banks wrote:
>>
>> GCC tools are for the most part using old compiler technology. Some of
>> is decades old.
>
>But has there been any real advances since Aho and Ullman came out?

Yes. But that doesn't mean that it has superseded all of the older
approaches. Unless there is a problem with the technologies, there
is no reason to condemn their use.


Regards,
Nick Maclaren.

Walter Banks

Apr 10, 2018, 11:05:51 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

On 2018-04-09 7:48 AM, David Brown wrote:
> On 09/04/18 13:30, Walter Banks wrote:

>>
>> GCC tools are for the most part using old compiler technology.
>> Some of is decades old.
>
> You are fond of saying that, but I don't remember hearing any
> details or examples.
>

- Strategy passes to determine how an application should be compiled
this time.

- Direct compiling to machine code and not using intermediate assembler
to get away from the two copy problem with code generation ISA restrictions.

- Whole application building. Why is linking still being done when its
purpose was to get around computer limitations?

w..

Walter Banks

Apr 10, 2018, 11:06:14 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

On 2018-04-09 10:29 AM, MitchAlsup wrote:
> On Monday, April 9, 2018 at 6:30:35 AM UTC-5, Walter Banks wrote:
>> In my experience there are a lot of ISA's designed for machine
>> generated code that don't map very well in the GCC (some do as
>> well) A processor that I have been working on for example it is
>> essentially impossible to write an assembler for.
>
> To write an assembler, or to write IN assembler.
>
> I worked on a machine that it was even hard to read assembler.
> Multiple instructions were grouped together and executed as if
> atomically. Then all of the resource requirements were hoisted to a
> header of the group so the HW scheduler could make the execute/wait
> decision more easily.

I have seen both as issues in ISAs. The eTPU co-processor has many code
generation restrictions that need to be part of the tool set. What I did
in that one was actually generate the intermediate code format in the
compiler, to take advantage of the compiler's generated-code error
checking and instruction-building operations for the three running
threads in the ISA. This instruction set is now close to 20 years old.
I only know of two applications using assembler on the eTPU out of
several thousand written in C.

The processor I am thinking of, which is close to release, essentially
has no clean way to describe ISA operations in terms that are meaningful
for describing application code operations. In combination with that
are code generation timing and code organization requirements. The ISA
is organized around things that computers do well, and hand coding not
so much.

>>
>> GCC tools are for the most part using old compiler technology. Some
>> of is decades old.
>
> But has there been any real advances since Aho and Ullman came out?

The short answer is yes. Most of the advances are around getting away
from limits on compiler host processors, having a bigger picture of the
application code, and handling more details of the ISA in the compiler.
The basic parsing and language processing is the same. As an example,
doing data and control flow analysis earlier rather than later in the
compiling process makes a big difference in generated code.
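
(A small illustration, mine rather than Walter's, of the kind of thing
earlier whole-program flow information enables: if the compiler has
seen every call site before it generates code, it can specialize a
routine for the only values that ever reach it.)

    /* Sketch only: with whole-program analysis the compiler can see
       that scale() is only ever called with factor == 2, so it can
       fold the multiply or inline it away entirely; a compiler that
       emits code for scale() before seeing its callers cannot assume
       that. */
    static int scale(int x, int factor)
    {
        return x * factor;
    }

    int twice(int x)
    {
        return scale(x, 2);
    }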

w..

Nick Maclaren

Apr 10, 2018, 11:06:27 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

In article <pag2qg$lq4$1...@gioia.aioe.org>,
Walter Banks <wal...@bytecraft.com> wrote:
>On 2018-04-09 7:48 AM, David Brown wrote:
>
>>> GCC tools are for the most part using old compiler technology.
>>> Some of is decades old.
>>
>> You are fond of saying that, but I don't remember hearing any
>> details or examples.
>
>- Strategy passes to determine how an applications should be compiled
>this time.

Yes. Yuck. It's a nightmare for debugging, and makes it damn-near
impossible to tune code that is going to be run by someone else.

>- Direct compiling to machine code and not using intermediate assembler
>to get away from the two copy problem with code generation ISA restrictions.

Well, er, yes, in theory. But suitable intermediate non-text languages
(assembler is, I agree, outdated) are a vast simplification of compilers
that are designed for multiple source languages and multiple target
machines. gcc is one such.

>- Whole application building. Why is linking still being done when its
>purpose was to get around computer limitations?

No, it wasn't. That was ONE purpose. A far more important one was to
allow and support separate compilation, as needed when an application
uses a library built by someone else. And how many large and serious
programs DON'T do that?


Regards,
Nick Maclaren.

Quadibloc

Apr 10, 2018, 11:06:59 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

On Monday, April 9, 2018 at 10:45:15 AM UTC-6, Nick Maclaren wrote:
> In article <pag2qg$lq4$1...@gioia.aioe.org>,
> Walter Banks <wal...@bytecraft.com> wrote:

> >- Direct compiling to machine code and not using intermediate assembler
> >to get away from the two copy problem with code generation ISA restrictions.

> Well, er, yes, in theory. But suitable intermediate non-text languages
> (assembler is, I agree, outdated) are a vast simplification of compilers
> that are designed for multiple source languages and multiple target
> machines. gcc is one such.

Also, that's hardly a tactic that postdates Aho, Hopcroft, and Ullman. Fortran G
may have compiled to a P-code like form, being written by an external company
that made compilers to order for whatever architecture - but Fortran H went
directly to 360 machine code.

John Savard

Walter Banks

Apr 10, 2018, 11:07:46 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

On 2018-04-01 11:52 AM, Tim Rentsch wrote:
> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> A responsible software maintainer does not change behaviour that
>> users make use of. See, e.g.,
>> <https://felipec.wordpress.com/2013/10/07/the-linux-way/>.
>
> Interesting article. Thank you for posting it.
>
>> Unfortunately, there's an epidemic of irresponsibility among C
>> compiler maintainers.
>
> I can't completely agree with this reaction. In some ways, sure, but
> for choices that are allowed because of undefined behavior the
> question is not so black-and-white. Some of the responsibility
> belongs to the ISO C standard (and the people who produce it).
> Unfortunately it's a difficult problem; I know there is interest in
> the ISO C group to find a middle ground, somewhere between
> unspecified behavior and undefined behavior, but it isn't easy to
> find that. For example, consider this reasonable-sounding rule: no
> library interface should ever result in undefined behavior, not
> counting things like bad pointer inputs (and null pointers should
> never be in the set of bad inputs). But what about printf()? In
> printf() we have an interface with large parts of the input domain
> that give undefined behavior. POSIX takes advantage of this to
> define the behavior of positional format specifications, which are
> quite useful in some contexts. But, and here is the important part,
> formats other than those allowed in the POSIX spec /are still
> undefined behavior/. Moreover that freedom is important, to allow
> further extensions to be added at some later date.
>
> I should add that I am mostly on your side. I think what compiler
> writers are doing with so-called "aggressive optimization" belongs
> more to the problem set than the solution set. But solving the
> problem has to include getting changes made to the ISO C standard, so
> that compiler writers have no choice if they want their stuff to be
> conforming. I know doing that is not an easy task; ultimately though
> it seems unavoidable if we are to get things to improve.
>
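
(An aside, not part of either post, for readers unfamiliar with the
POSIX extension Tim mentions: numbered "%n$" conversions select printf
arguments by position. A minimal sketch; it relies on POSIX, not ISO C,
so it is not guaranteed to work everywhere.)

    #include <stdio.h>

    int main(void)
    {
        /* POSIX-defined positional conversions: argument 2 is printed
           before argument 1, giving "world hello".  ISO C leaves this
           format undefined, which is Tim's point. */
        printf("%2$s %1$s\n", "hello", "world");
        return 0;
    }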

As someone who has done a significant amount of compiler development,
there is really never enough testing. Once past sanity tests and
detailed test suites, a lot can be gained just by running regression
tests even if the tests are not executed. Detailed metrics alone can be
a very revealing indication of significant compiler problems.

This can be especially true while developing optimizations: a
surprising number of new optimizations do not have the intended effect
on old, functioning programs.

w..

Tim Rentsch

Apr 10, 2018, 11:08:29 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

Walter Banks <wal...@bytecraft.com> writes:

> As someone who has done a significant amount of compiler development
> there really never enough testing. Once past sanity tests and
> detailed test suites a lot can be gained just by running regression
> tests even if the tests are not executed. Detailed metrics alone
> can be a very revealing indication of significant compiler problems.
>
> This can be especially true while developing optimization a
> surprising number of new optimizations do not have the intended
> effect on old functioning programs.

I think what I meant didn't come across the way I meant it. I
agree with what (I think) you are saying: it's hard to test
compilers effectively, and it is much harder to test heavily
optimizing compilers effectively. In fact this raises an
interesting question, namely, what is a good methodology for
testing compilers in the presence of heavy optimization and
programs that may be "broken" in the sense of having undefined
behavior. That strikes me as a topic worth pursuing (along
with all the other topics on that list, which wasn't short to
begin with).

However all that is orthogonal to what I was trying to say in my
previous comments. My comments are not about verifying how a
compiler behaves but about validating the requirements for how
the compiler should behave. More specifically, in what cases do
we /want/ a compiler to take advantage, or be allowed to take
advantage, of the freedom that "undefined behavior" provides? I
am convinced that in some cases compilers take greater advantage
than is helpful, in the sense of total cost of ownership. Let me
emphasize here: not greater advantage than the Standard allows,
but greater advantage than is helpful when considering the whole
picture. I don't know what is the solution to this problem, but
I have no doubt that there is a problem, and one that needs to
be addressed at a high level rather than a patchwork quilt of
local fixes.

Did that make things any clearer? I hope it did.

Walter Banks

Apr 10, 2018, 11:08:59 AM
[[ this string is copied from comp.arch because your moderation found it interesting ]]

I think we are all on the same page. Add in Terje's comments about
optimizations that work well on a dozen micro-tests but, when tested on
larger amounts of code, fail in many ways to help and often, for
various reasons, make the application less effective in some way.

Most of my comments are related to embedded systems tools and
applications, which partly changes the situation. Embedded systems are
compile once and run many times, as opposed to run a few times. This
makes it attractive to spend more cycles at compile time.

We do a lot of regression tests using real application code. In many
cases it isn't practical to execute this code to get execution data. It
is still worth building the code and looking at the compilation metrics.

What comes out of the metrics are clues about when a specific
optimization should not be used. In many cases we still keep Terje's
micro-tests and add the cases where the optimization did not help and
should not be used.

There are cases where two different optimizations each work well but
together reduce performance. The general cause of this type of
optimization failure is resource competition. It is not the general
case of the caching example but something more direct, for things like
processors with multiple parallel, functionally diverse ALUs competing
for a specialized, rare resource, for example a multiply-accumulate
unit.

We burn a lot of compiler cycles managing target resources (embedded
systems). We do whole-application strategy passes very early in the
compiling process, essentially doing two things: application data
management and execution flow analysis. Both of these help a lot in
deciding when and where an optimization should happen when the
application is being built.

Test new optimizations one addition at a time with regression testing;
otherwise many surprises get hidden.

w..

Gene Wirchenko

Apr 11, 2018, 1:21:29 PM
On Tue, 10 Apr 2018 11:07:44 -0400 (EDT), "Walter Banks"
<wal...@bytecraft.com> wrote:

[snip]

>This can be especially true while developing optimization a surprising
>number of new optimizations do not have the intended effect on old
>functioning programs.

I am a compiler non-expert. Could you give some non-trivial
examples (or point to some), please?

Sincerely,

Gene Wirchenko

Martin Ward

Apr 12, 2018, 11:16:02 AM
As I understand it, a major cause is the 199 or so cases
of "undefined behaviour" in the C standard. People write programs
which rely on the compiler doing a particular thing,
then an optimisation is introduced which "exploits" the undefined
behaviour (usually to delete code or tests), and the program
stops working as expected.

These posts give some examples:

https://blog.regehr.org/archives/213

https://blog.regehr.org/archives/759

Gcc may optimize out tests for buffer overflows
because of integer overflows:

https://lwn.net/Articles/278137/

Quote:

    if (buffer + len >= buffer_end)
        die_a_gory_death("len is out of range\n");

Here, the programmer is trying to ensure that len (which might come from
an untrusted source) fits within the range of buffer. There is a
problem, though, in that if len is very large, the addition could cause
an overflow, yielding a pointer value which is less than buffer. So a
more diligent programmer might check for that case by changing the code
to read:

    if (buffer + len >= buffer_end || buffer + len < buffer)
        loud_screaming_panic("len is out of range\n");

This code should catch all cases; ensuring that len is within range.
There is only one little problem: recent versions of GCC will optimize
out the second test (returning the if statement to the first form shown
above), making overflows possible again. So any code which relies upon
this kind of test may, in fact, become vulnerable to a buffer overflow
attack.

This behavior is allowed by the C standard, which states that, in a
correct program, pointer addition will not yield a pointer value outside
of the same object. So the compiler can assume that the test for
overflow is always false and may thus be eliminated from the expression.

[The desire for efficiency over mathematical analysis takes us
back to the other topic ("language design after Algol 60") :-)]

--
Martin

Dr Martin Ward | Email: mar...@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

bartc

Apr 12, 2018, 11:23:37 AM
On 10/04/2018 16:05, Walter Banks wrote:
>> On 09/04/18 13:30, Walter Banks wrote:

>>> GCC tools are for the most part using old compiler technology.
>>> Some of is decades old.
>>
>> You are fond of saying that, but I don't remember hearing any
>> details or examples.
>>
>
> - Strategy passes to determine how an applications should be compiled
> this time.
>
> - Direct compiling to machine code and not using intermediate assembler


> - Whole application building. Why is linking still being done when its
> purpose was to get around computer limitations?

Whole project compiling? I have a whole project compiler for one
language, and a half-completed one for another. Both need to be very
fast because they have to compile a whole application from scratch each
time (aiming for 0.1 secs build time per typical application).

But both languages have the features necessary to make that possible.

C doesn't; separate compilation and linking might still be the simplest
model for it.

Compilation speed is compromised anyway by needing to re-process header
files multiple times. (Precompiled headers aren't a solution because you
still have to process that precompiled header file; it might just be
faster than working with source code.)
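
(A trivial sketch of what that re-processing means in practice - the
file names are hypothetical; every translation unit that includes the
header pays the parsing cost again:)

    /* util.h - parsed afresh for every .c file that includes it */
    #ifndef UTIL_H
    #define UTIL_H
    int add(int a, int b);
    #endif

    /* a.c */
    #include "util.h"
    int f(void) { return add(1, 2); }

    /* b.c - the same header text is read and parsed a second time */
    #include "util.h"
    int g(void) { return add(3, 4); }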

I'm not saying it's not practical with C, but the language makes it harder.

(By 'whole project' I mean all the source modules that are normally
processed to end up with a single executable or shared library file.
External binary libraries stay external.)

--
bartc

Martin Ward

Apr 13, 2018, 1:18:42 PM
On 12/04/18 12:15, bartc wrote:
> I'm not saying it's not practical with C, but the language makes it harder.

I think you just summed up C in a single sentence! Congratulations :-)

Albert van der Horst

May 12, 2018, 1:07:05 PM
In article <18-0...@comp.compilers>, Martin Ward <mar...@gkc.org.uk> wrote:

[ discussing undefined behavior in C ]

>Gcc may optimize out tests for buffer overflows
>because of integer overflows:
>
>https://lwn.net/Articles/278137/
>
>Quote:
>
> if (buffer + len >= buffer_end)
> die_a_gory_death("len is out of range\n");
>
>Here, the programmer is trying to ensure that len (which might come from
>an untrusted source) fits within the range of buffer. There is a
>problem, though, in that if len is very large, the addition could cause
>an overflow, yielding a pointer value which is less than buffer. So a
>more diligent programmer might check for that case by changing the code
>to read:
>
> if (buffer + len >= buffer_end || buffer + len < buffer)
> loud_screaming_panic("len is out of range\n");
>

The diligent programmer gets nervous as he sees "buffer[len]" in his
code and realises that that may lead to problems if len is out of
range. So he adds code of the sort

    &buffer[len] >= buffer_end || &buffer[len] < buffer

He looks at this code and doesn't get nervous! That looks more like
the work of a complete moron than of a diligent programmer.

The reasonable solution is of course

    if ( len < 0 || len > sizeof(buffer) )
        panic("security breach: attempted out of buffer processing");

That makes perfect sense and will not be thrown out by any compiler.
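
(In the LWN example buffer is a pointer with a separate buffer_end
rather than an array, so the same idea would be expressed by comparing
lengths instead of using sizeof. A minimal sketch, assuming len has an
unsigned type such as size_t and keeping the article's names:)

    /* Pointer subtraction within one object is well defined, so this
       test cannot be optimized away on overflow grounds. */
    if (len >= (size_t)(buffer_end - buffer))
        loud_screaming_panic("len is out of range\n");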

>This code should catch all cases; ensuring that len is within range.
>There is only one little problem: recent versions of GCC will optimize
>out the second test (returning the if statement to the first form shown
>above), making overflows possible again. So any code which relies upon
>this kind of test may, in fact, become vulnerable to a buffer overflow
>attack.

There is another problem: someone tries to break your program, and you
try to execute the code without warning the authorities.

If GCC smokes out code like that, they have my blessing.

<SNIP>

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst