
Would this FP work for everyone?!


Steve Richfie1d

May 31, 2004, 7:09:00 PM
After LOTS of discussion on another thread, it occurred to me that the
following implementation would work for EVERYONE!

Suppose there were something like 4 bit-vectors of properties, one for
16-bit, one for 32-bit, one for 48-bit, and one for 64-bit
representation and arithmetic.

The bits in each of the vectors would be the various properties of the
floating point format of that precision, something like...

Hidden most significant bit/digit?
Length of guard digit (if any)?
Significance, normalized, unnormalized, or logarithmic arithmetic?
Gradual significance reduction a la the 1959 A&M JACM paper?
Normal or stochastic rounding?
Single value or interval (divided) arithmetic?
Full NaNs, stripped NaNs(present IEEE), or nearest numerical value?
Number Base (This one's for your decimal arithmetic, Mike)?
Conditions mask regarding what to interrupt to software.
Simulate ALL operations (needed for hardware diagnostics):
implemented as just a permanently unassigned bit!
Whatever I forgot to put into this list - any suggestions?
Unassigned bits for lots of future expansion.
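
To make the idea concrete, here is a minimal sketch in C of how such a
property vector might be encoded. Every name, field width, and bit
assignment below is hypothetical, not a proposal for the actual layout:

    #include <stdint.h>

    /* One property vector per representation width.  All-zero fields
       are defined to mean plain IEEE-754. */
    typedef uint32_t fp_props_t;

    #define FP_HIDDEN_BIT    (1u << 0)   /* hidden most significant bit/digit */
    #define FP_GUARD_MASK    (3u << 1)   /* guard-digit length, 0..3 digits */
    #define FP_ARITH_MASK    (3u << 3)   /* 0=normalized, 1=unnormalized,
                                            2=significance, 3=logarithmic */
    #define FP_STOCH_ROUND   (1u << 5)   /* stochastic instead of normal rounding */
    #define FP_INTERVAL      (1u << 6)   /* interval (divided) arithmetic */
    #define FP_NAN_MASK      (3u << 7)   /* 0=stripped NaNs (present IEEE),
                                            1=full NaNs, 2=nearest value */
    #define FP_BASE_10       (1u << 9)   /* base 10 instead of base 2 */
    #define FP_SIMULATE_ALL  (1u << 31)  /* permanently unassigned: always traps */

    /* [0]=16-bit, [1]=32-bit, [2]=48-bit, [3]=64-bit */
    fp_props_t fp_props[4];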

Mike, does your approach need anything not in this list? Isn't it
something like:

Exposed most significant digit.
No guard digit.
Unnormalized.
Normal rounding.
stripped NaNs.
Base 10.
Conditions mask = 0 (Same as IEEE-754).

Is this correct?

These would be assigned so that all zeros produces the present IEEE-754
operation - default on startup. Compatible implementations would REQUIRE
nothing more than maintaining the bit vector and interrupting on things
not handled by the hardware, if indeed the hardware is capable of
handling ANYTHING. The standard would (eventually) include C (ugh) coded
interrupt routines to perform any/all operations, so that anything that
wasn't implemented in hardware can simply interrupt to the
committee-supplied software.

IBM did somewhat the same thing in support of their 360-25 and 360-44
systems. The 360-25 was little more than a minicomputer programmed to
pretend that it was a 360, and the 360-44 was really just a 360-40 that
had a lot of instructions stripped out to make room for a better
floating-point implementation. Both of these computers trapped into the
operating system when they encountered an unimplemented instruction,
whereupon kernel code simulated the missing instruction. I'm guessing
that this code descended from the original 360 simulators they used
before there was any real 360 hardware. If the kernel code didn't
recognize the instruction, it was passed back to the application program
in the hopes that the program knew how to handle it.

Here, the instruction set is fixed, but in fact the appropriate bit
vector effectively becomes part of the instruction. Until someone
suggests a better approach, I suggest separate interrupt entry points
for each type of operation. These routines would then test the
bits in the vector to guide their operation, or decide that the selected
operation exceeds their programming and interrupt to a secondary vector
of routines that hopefully the user has supplied, much like the 360 did
with unimplemented instructions that its kernel code didn't recognize.
Any bits set in the "unassigned" list would cause such a branch to user
code, even if the other bits are serviceable.
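
A sketch in C of what one such entry point might look like (the names
and trap plumbing are hypothetical; this only shows the
test-the-bits-then-punt structure):

    #include <stdint.h>

    typedef uint32_t fp_props_t;             /* as in the earlier sketch */
    #define FP_UNASSIGNED_MASK 0xfffffc00u   /* bits not yet assigned */

    typedef void (*fp_handler_t)(void *operands);

    /* Secondary vector, hopefully supplied by the user. */
    extern fp_handler_t user_fp_add;

    /* Hypothetical committee-supplied trap entry for an FP add. */
    void trap_fp_add(fp_props_t props, void *operands)
    {
        if (props & FP_UNASSIGNED_MASK) {    /* any unassigned bit: punt */
            user_fp_add(operands);
            return;
        }
        /* ...test the assigned bits and emulate the add accordingly... */
    }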

Other than the "don't change anything" footdraggers, does anyone see a
problem with this sort of an approach, where we incorporate selected
modes rather than complete take-it-or-leave-it canned solutions, e.g.
IEEE-754? Further, this allows for a REALLY robust standard without
forcing anyone to build any particular thing in hardware.

Any objections? Any suggestions?

Steve Richfield

Dik T. Winter

May 31, 2004, 8:22:18 PM
In article <601848b7c6a1127d...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> The standard would (eventually) include C (ugh) coded
> interrupt routines to perform any/all operations, so that anything that
> wasn't implemented in hardware can simply interrupt to the
> committee-supplied software.

Forget this. The committee will *never* supply software implementing the
standard (or whatever standard). That is a huge effort and will cost
lots of money.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Steve Richfie1d

May 31, 2004, 9:41:15 PM
Dik,

> In article <601848b7c6a1127d...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> > The standard would (eventually) include C (ugh) coded
> > interrupt routines to perform any/all operations, so that anything that
> > wasn't implemented in hardware can simply interrupt to the
> > committee-supplied software.
>
> Forget this. The committee will *never* supply software implementing the
> standard (or whatever standard). That is a huge effort and will cost
> lots of money.

Maybe it's time to pass the hat to Intel, AMD, etc., who would be the
obvious beneficiaries.

This also sounds like a good candidate for open source, where everyone
contributes the part that they really care about.

How about the rest of the plan? Did it look OK to you?

Steve Richfield

Raymond Toy

Jun 1, 2004, 12:03:45 PM
>>>>> "Dik" == Dik T Winter <Dik.W...@cwi.nl> writes:

Dik> In article <601848b7c6a1127d...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
>> The standard would (eventually) include C (ugh) coded
>> interrupt routines to perform any/all operations, so that anything that
>> wasn't implemented in hardware can simply interrupt to the
>> committee-supplied software.

Dik> Forget this. The committee will *never* supply software implementing the
Dik> standard (or whatever standard). That is a huge effort and will cost
Dik> lots of money.

Why not? Cellular standards for voice encoding very often now include
a reference implementation in C, in addition to a text description of
the algorithm.

Ray

Steve Richfie1d

Jun 1, 2004, 2:41:23 PM
After some on-and-off forum comments, I hereby make some amendments to
my proposal:

To support efficient multiple precision implementations:

1. Additional roundoff type: None.
2. Additional mode: Remainder. Throws the result away and returns the
numerical value of the bits discarded through denormalization and/or
rounding. The sign may differ from that of the result where rounding
upward has taken place.
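
For ordinary binary addition, the quantity this Remainder mode would
return is just the rounding error of the add, which in round-to-nearest
can already be computed exactly with Knuth's TwoSum trick. A short
illustration in C (not part of the proposal itself):

    #include <stdio.h>

    /* Knuth's TwoSum: s = fl(a+b) and e = the exact rounding error, so
       that a + b == s + e exactly in round-to-nearest binary FP. */
    static double two_sum(double a, double b, double *e)
    {
        double s  = a + b;
        double bp = s - a;                /* part of b actually absorbed */
        *e = (a - (s - bp)) + (b - bp);   /* what rounding threw away */
        return s;
    }

    int main(void)
    {
        double e, s = two_sum(1.0, 1e-20, &e);
        printf("sum = %g, discarded remainder = %g\n", s, e);
        return 0;
    }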

The conversion from one representation to another is a function of the
programming interface. There needs to be some facility to convert
anything to anything, with the perverse cases being handled in software.
This will be addressed once the programming interface that supports
multiple modes has been firmed up.

Some languages, like APL and the original BASIC, have matrix operations.
These GREATLY simplify many applications. Indeed, the first compilers
for many array processing computers are APL compilers. There should be
some support for this in the FP programming interface, especially since
it can simply be left to the simulation software if desired. This should
facilitate the inclusion of array processing capability into the many
languages that are now missing this.

To assure availability of a C-coded simulation to manufacturers, I
hereby offer copies of the source code of the to-be-coded copiously
commented interrupt routines along with perpetual updates and support
and rights to include binaries with their products for ~$20K each. After
the first shipment of this, I will offer binaries only for ~$100 each.

Order yours now to get a jump on your competition!

I believe that these prices are low enough to be affordable by the
people who need this software, yet high enough to tempt others to offer
it cheaper, thereby saving me from having to program and support it. Any
takers?

Steve Richfield
================

Dik T. Winter

Jun 1, 2004, 8:13:29 PM
In article <sxdu0xv...@edgedsp4.rtp.ericsson.se> Raymond Toy <t...@rtp.ericsson.se> writes:
> Dik> Forget this. The committee will *never* supply software
> Dik> implementing the standard (or whatever standard). That is a
> Dik> huge effort and will cost lots of money.

>
> Why not? Cellular standards for voice encoding very often now include
> a reference implementation in C, in addition to a text description of
> the algorithm.

I have no idea how long such an algorithm is, but making a reference standard
in C of IEEE floating point implies a fairly complete multiple precision
package with quite a few quirks.

Steve Richfie1d

Jun 1, 2004, 9:05:18 PM
Dik,

> > Dik> Forget this. The committee will *never* supply software
> > Dik> implementing the standard (or whatever standard). That is a
> > Dik> huge effort and will cost lots of money.
> >
> > Why not? Cellular standards for voice encoding very often now include
> > a reference implementation in C, in addition to a text description of
> > the algorithm.
>
> I have no idea how long such an algorithm is, but making a reference standard
> in C of IEEE floating point implies a fairly complete multiple precision
> package with quite a few quirks.

Remember, the reference implementation in C does NOT have to run fast.

Doesn't sound too tough, just lay the argument fields out as integers,
do the computations on them, and then wrap them back up as an FP result.
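
As a taste of that: unpacking the fields of a 64-bit IEEE-754 double
with nothing but integer operations takes only a few lines of C (field
positions per the standard binary64 layout):

    #include <stdint.h>
    #include <string.h>

    /* Pull a double apart into sign, biased exponent, and significand. */
    void unpack_double(double x, int *sign, int *exponent, uint64_t *frac)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);   /* reinterpret, no conversion */
        *sign     = (int)(bits >> 63);
        *exponent = (int)((bits >> 52) & 0x7ff);   /* biased by 1023 */
        *frac     = bits & 0xfffffffffffffULL;     /* 52 stored bits */
        if (*exponent != 0)
            *frac |= 1ULL << 52;          /* restore the hidden bit */
    }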

Would you write it for $20K? That is what is available as soon as the
first order comes in. I'll write it if myself if no one else wants the
project.

Any interest?

Steve Richfield

Dik T. Winter

Jun 1, 2004, 9:35:03 PM
In article <309901eec51fbdc9...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> > I have no idea how long such an algorithm is, but making a reference
> > standard in C of IEEE floating point implies a fairly complete multiple
> > precision package with quite a few quirks.
>
> Remember, the reference implementation in C does NOT have to run fast.

I know, I know.

> Doesn't sound too tough, just lay the argument fields out as integers,
> do the computations on them, and then wrap them back up as an FP result.

Yes, I know how it should be done, there is even a paper out there (by
David Goldberg from Xerox Palo Alto Research Center), that can be found
(I think) at Sun's website, and also as an appendix in Hennesey and
Patterson. That will be a big help. But it is a lot of work.

> Would you write it for $20K? That is what is available as soon as the
> first order comes in. I'll write it if myself if no one else wants the
> project.
>
> Any interest?

No. I have a regular job.

Steve Richfie1d

Jun 2, 2004, 1:00:37 AM
Dik,

> > Doesn't sound too tough, just lay the argument fields out as integers,
> > do the computations on them, and then wrap them back up as an FP result.
>
> Yes, I know how it should be done, there is even a paper out there (by
> David Goldberg from Xerox Palo Alto Research Center), that can be found
> (I think) at Sun's website, and also as an appendix in Hennessy and
> Patterson. That will be a big help. But it is a lot of work.

There are several versions of this same paper on Sun's web site. I'm not
sure whether your comments are version sensitive. That having been said...

I was for 2 years the in-house numerical analysis consultant to the
University of Washington Physics and Astronomy departments. To deal with
the continuous flow of good-looking published equations that didn't
quite work right on computers, I had to cut through the notation to get
down to what they were REALLY trying to do. Hence, I've learned to be
suspicious of long equations, preferring simple explanations instead.
Often, I must do my own translation from one to the other. I found that
carrying significance and dimensionality through analysis helped a lot,
as there are a LOT of problems with equations that aren't dimensionally
sound, e.g. E = M x C^2

However, Goldberg seems to go out of his way to obfuscate his arguments
with equations. Sure, an equation sometimes helps to prove, for example,
that something comes out EXACTLY right, but he seems to bury everything
in equations, something I've learned to distrust.

Has anyone written IEEE-754 in C yet? Something like that might be a
good place to start.

> No. I have a regular job.

Lucky you! Do you need any help?! This looks like a good forum to hire
from. Just post your ad, then look through the postings of those who
respond to see where their head is at.

Steve Richfield

Nick Maclaren

Jun 2, 2004, 10:23:26 AM

In article <sxdu0xv...@edgedsp4.rtp.ericsson.se>,
Raymond Toy <t...@rtp.ericsson.se> wrote:
>
> Why not? Cellular standards for voice encoding very often now include
> a reference implementation in C, in addition to a text description of
> the algorithm.

Ever since Algol 68 proved that it was a bad idea, people have
kept reinventing the mistake of making reference implementations
the primary specification. This does not work.

The reason is that no programming language (and DEFINITELY not
C) is precise enough to specify the intent. People end up forever
arguing over whether the anomaly is an error in the reference
implementation, part of the specification or an error in the
language standard that it is using. And that is EVEN if the code
is written by one of the VERY rare people who is into serious
portability.

Furthermore, clarity is almost incompatible with robustness
(performance is actually less of an issue). A good reference
implementation will be 50%+ error detection and diagnosis, both
for external and internal errors, which usually includes a lot of
special cases that obscure the main path.

Reference implementations have their place as examples, but are
not a good idea as specifications. They might be, if there was a
suitable language to write them in - but C is about as far from
being such a language as it is possible to be.


Regards,
Nick Maclaren.

Raymond Toy

Jun 2, 2004, 12:12:44 PM
>>>>> "Nick" == Nick Maclaren <nm...@cus.cam.ac.uk> writes:

Nick> In article <sxdu0xv...@edgedsp4.rtp.ericsson.se>,

Nick> |> Why not? Cellular standards for voice encoding very often now include
Nick> |> a reference implementation in C, in addition to a text description of
Nick> |> the algorithm.

Nick> Ever since Algol 68 proved that it was a bad idea, people have

How did Algol 68 prove that it was a bad idea?

Nick> kept reinventing the mistake of making reference implementations
Nick> the primary specification. This does not work.

I think the text description is the actual true reference. The C
reference is a description of how to do the arithmetic in
fixed-point. Invariably, if the C reference is given, a large set of
test vectors is also given. Which just proves that the test vectors
are self-consistent with the C code. :-(

And the C reference is just a reference. No DSP compiler today can
produce a fast version of the implementation; these are typically
hand-written in assembly, taking advantage of the architecture of the
DSP.

Ray

Raymond Toy

Jun 2, 2004, 12:15:05 PM
>>>>> "Steve" == Steve Richfie1d <St...@NOSPAM.smart-life.net> writes:

Steve> Has anyone written IEEE-754 in C yet? Something like that might be a
Steve> good place to start.

Don't know about C, but there is (used to be?) a software
emulation of the Intel FPU in Linux.

Ray

Raymond Toy

Jun 2, 2004, 12:19:41 PM
>>>>> "Dik" == Dik T Winter <Dik.W...@cwi.nl> writes:

Dik> In article <sxdu0xv...@edgedsp4.rtp.ericsson.se> Raymond Toy <t...@rtp.ericsson.se> writes:
Dik> Forget this. The committee will *never* supply software
Dik> implementing the standard (or whatever standard). That is a
Dik> huge effort and will cost lots of money.
>>
>> Why not? Cellular standards for voice encoding very often now include
>> a reference implementation in C, in addition to a text description of
>> the algorithm.

Dik> I have no idea how long such an algorithm is, but making a reference standard
Dik> in C of IEEE floating point implies a fairly complete multiple precision
Dik> package with quite a few quirks.

FWIW, the text specification of the algorithm is usually on the order
of 100 pages or more. The IEEE 754 spec is 20 some pages, right?

But this doesn't really say much about the relative complexity.

Ray

Nick Maclaren

Jun 2, 2004, 2:23:53 PM
In article <sxdvfia...@edgedsp4.rtp.ericsson.se>,

Raymond Toy <t...@rtp.ericsson.se> wrote:
>
>How did Algol 68 prove that it was a bad idea?

Because there were bugs in the code, it didn't do what the authors
thought it did, and what it did was perverse. All possible in text,
but much easier to do in sample code - and Algol 68 is a MUCH better
software engineering language than C!

>I think the text description is the actual true reference. The C
>reference is a description of how to do the arithmetic in
>fixed-point. Invariably, if the C reference is given, a large set of
>test vectors is also given. Which just proves that the test vectors
>are self-consistent with the C code. :-(

Quite. That is the sane way to proceed.

>And the C reference is just a reference. No DSP compiler today can
>produce a fast version of the implementation; these are typically
>hand-written in assembly, taking advantage of the architecture of the
>DSP.

Never mind the performance - a reference implementation that gets it
wrong, or is more obscure than the text, is what I dislike.


Regards,
Nick Maclaren.

glen herrmannsfeldt

Jun 2, 2004, 2:28:50 PM

Microsoft math libraries used to include three versions.
One for x87 only, one that would allow emulation through
interrupts, and one that did it all in software at reduced
precision (but faster than doing it exactly).

Linux/390 (for IBM System/390) includes emulation of binary
(IEEE) floating point as it was added partway though the ESA/390
series. I believe it is done in the usual IBM way, though
the interrupt routine for an illegal opcode.

-- glen

Nick Maclaren

Jun 2, 2004, 2:30:08 PM
In article <sxdn03m...@edgedsp4.rtp.ericsson.se>,

Raymond Toy <t...@rtp.ericsson.se> wrote:
>
>FWIW, the text specification of the algorithm is usually on the order
>of 100 pages or more. The IEEE 754 spec is 20 some pages, right?

Less than a dozen, originally.

>But this doesn't really say much about the relative complexity.

The IEEE 754 specification isn't complex, just confusingly written.

There are at least five people posting to this group who would have
little difficulty in writing a reference implementation for IEEE 754.
At least two of us have done something very similar previously. If
performance is not the issue, it is not a lot of work.

It would be much harder to add the current decimal arithmetic and/or
interval arithmetic, but the exercise would improve the current draft
of the new standard.

This is all separate from whether it is a good idea.


Regards,
Nick Maclaren.

glen herrmannsfeldt

Jun 2, 2004, 2:35:04 PM
Nick Maclaren wrote:

(snip)

> Ever since Algol 68 proved that it was a bad idea, people have
> kept reinventing the mistake of making reference implementations
> the primary specification. This does not work.

IBM S/360 was described in APL before APL was a programming
language. I suppose all the operators were defined as exactly
as they needed to be.

> The reason is that no programming language (and DEFINITELY not
> C) is precise enough to specify the intent. People end up forever
> arguing over whether the anomaly is an error in the reference
> implementation, part of the specification or an error in the
> language standard that it is using. And that is EVEN if the code
> is written by one of the VERY rare people who is into serious
> portability.

If you fix the widths of the different data types, and don't
use floating point operations, I wouldn't think it would be
so bad. Probably define that it must use twos complement,
and also the result of division of negative numbers.

> Furthermore, clarity is almost incompatible with robustness
> (performance is actually less of an issue). A good reference
> implementation will be 50%+ error detection and diagnosis, both
> for external and internal errors, which usually includes a lot of
> special cases that obscure the main path.

Well, yes, but those errors are an important part of the
specification.

> Reference implementations have their place as examples, but are
> not a good idea as specifications. They might be, if there was a
> suitable language to write them in - but C is about as far from
> being such a language as it is possible to be.

I could think of worse ones...

-- glen

Nick Maclaren

Jun 2, 2004, 3:29:10 PM
In article <s5pvc.832$uY.740@attbi_s53>,

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
>> The reason is that no programming language (and DEFINITELY not
>> C) is precise enough to specify the intent. People end up forever
>> arguing over whether the anomaly is an error in the reference
>> implementation, part of the specification or an error in the
>> language standard that it is using. And that is EVEN if the code
>> is written by one of the VERY rare people who is into serious
>> portability.
>
>If you fix the widths of the different data types, and don't
>use floating point operations, I wouldn't think it would be
>so bad. Probably define that it must use twos complement,
>and also the result of division of negative numbers.

No, those issues are too easy. Anyone experienced in numerical
portability wouldn't have much trouble with them - though I agree
that we are rare birds nowadays!

I was thinking of the numerous ambiguities in the standard, and
places where it is impossible to avoid undefined behaviour. The
latter are PROBABLY avoidable for asuch code, but I should be VERY
chary of assuming it.

>> Reference implementations have their place as examples, but are
>> not a good idea as specifications. They might be, if there was a
>> suitable language to write them in - but C is about as far from
>> being such a language as it is possible to be.
>
>I could think of worse ones...

I can't. Seriously. I know of few languages where the gulf between
what the specification says, unambiguously, and what people think
that it says is anywhere near as wide.


Regards,
Nick Maclaren.

Jeff Kenton

Jun 2, 2004, 4:53:21 PM
Nick Maclaren wrote:

>>>Reference implementations have their place as examples, but are
>>>not a good idea as specifications. They might be, if there was a
>>>suitable language to write them in - but C is about as far from
>>>being such a language as it is possible to be.
>>
>>I could think of worse ones...
>
>
> I can't. Seriously. I know of few languages where the gulf between
> what the specification says, unambiguously, and what people think
> that it says is anywhere near as wide.

Oh good, a subject for a new thread ;-)

I nominate XSLT/XPath. Among its many faults as a spec is a decimal data
type whose arithmetic is implementation-defined. How many other
languages do you know that have implementation-defined arithmetic?

--

-------------------------------------------------------------------------
= Jeff Kenton Consulting and software development =
= http://home.comcast.net/~jeffrey.kenton =
-------------------------------------------------------------------------

Steve Richfie1d

Jun 2, 2004, 5:31:04 PM
Nick

> The IEEE 754 specification isn't complex, just confusingly written.
>
> There are at least five people posting to this group who would have
> little difficulty in writing a reference implementation for IEEE 754.
> At least two of us have done something very similar previously. If
> performance is not the issue, it is not a lot of work.

Agreed. Of course we need a long list of optional additional tweaks,
many of which are reductions in functionality.

Also, I'll check a contact at Microsoft. Maybe I could get them to
donate their old 8087 emulation source code?!

> It would be much harder to add the current decimal arithmetic and/or
> interval arithmetic, but the exercise would improve the current draft
> of the new standard.

As far as I'm concerned, these get written by the first person who stands
up and says they want them. Given that the rest of the code is there to
see and they presumably know what they want, they should be able to
produce it.


>
> This is all separate from whether it is a good idea.

Yes. Nobody has yet addressed THAT issue. From what I've seen here, we
could as a group repair all of the repairable problems. Does that leave
something that is good? Do you want a piece of the credit/blame for
this? As I mentioned in my earlier posting, there could also be some
money in this.

It appeared to me that the minimum necessary addition over a bare-bones
754 implementation to get full functionality was the ability to force
all floating-point instructions to trap. Then, when a user says that
they want something besides the default 754, the traps get turned on and
everything gets handled in software until the user again restores the
mode to default 754.

Later when partial support is supplied in hardware, the hardware traps
whenever it is asked to do something that it wasn't wired to do.

Note that in DSP implementations, they may only have logarithmic
hardware, so they would have to switch to that before attempting any
computation to avoid trapping.

A special wrapper could be provided for current non-trapping hardware,
that would simply produce an error on any attempt to change the format.
This same approach could also work for existing DSPs.

Is anyone (else) interested in this presently unfunded project?

Steve Richfield

Steve Richfie1d

Jun 2, 2004, 5:39:16 PM
Nick,


>>If you fix the widths of the different data types, and don't
>>use floating point operations, I wouldn't think it would be
>>so bad. Probably define that it must use twos complement,
>>and also the result of division of negative numbers.
>
>
> No, those issues are too easy. Anyone experienced in numerical
> portability wouldn't have much trouble with them - though I agree
> that we are rare birds nowadays!
>
> I was thinking of the numerous ambiguities in the standard, and
> places where it is impossible to avoid undefined behaviour. The
> latter are PROBABLY avoidable for asuch code, but I should be VERY
> chary of assuming it.

Remember, stochastic rounding is on the list of options. Do we specify a
particular pseudo-random sequence generator, or just allow the
implementer to pump in any bits they want? Any thoughts? I don't really
give a damn, as I think it only affects testing.

>>>Reference implementations have their place as examples, but are
>>>not a good idea as specifications. They might be, if there was a
>>>suitable language to write them in - but C is about as far from
>>>being such a language as it is possible to be.
>>
>>I could think of worse ones...

> I can't. Seriously. I know of few languages where the gulf between
> what the specification says, unambiguously, and what people think
> that it says is anywhere near as wide.

COBOL! Joking aside, the only thing that C has going for it is its very
large sucker base. However, that is a pretty big thing.

Steve Richfield

Nick Maclaren

Jun 2, 2004, 6:19:51 PM
In article <57rvc.33141$pt3.11331@attbi_s03>,

Jeff Kenton <Jeffrey...@comcast.net> wrote:
>Nick Maclaren wrote:
>
>>>>Reference implementations have their place as examples, but are
>>>>not a good idea as specifications. They might be, if there was a
>>>>suitable language to write them in - but C is about as far from
>>>>being such a language as it is possible to be.
>>>
>>>I could think of worse ones...
>>
>> I can't. Seriously. I know of few languages where the gulf between
>> what the specification says, unambiguously, and what people think
>> that it says is anywhere near as wide.
>
>Oh good, a subject for a new thread ;-)
>
>I nominate XSLT /XPath. Among its many faults as a spec is a decimal data
>type that has arithmetic that is implementation defined. How many other
>languages do you know that have implementation defined arithmetic?

Most languages have undefined arithmetic! Seriously. Look at them.


Regards,
Nick Maclaren.

Steve Richfie1d

Jun 2, 2004, 6:29:01 PM
Nick,

> Most languages have undefined arithmetic! Seriously. Look at them.

Not only that, even the LENGTHS are undefined beyond "short" or "long",
"single' or "double", etc.

Probably the biggest step backwards in computer architecture was the
move from 6/9 bit characters, 18 bit short integers, and 36 bit long
integers and FP, to the present 8/16/32 scheme. There aren't enough
characters, the short integers are too short for many/most things, and
the FP is so starved for bits that we are pushed into hidden bits.

The vagueness of the languages made this transition possible - which I
think in retrospect was unfortunate! I long for the old 36-bit FP.

The safe thing to do in reference software is to stick to long forms and
only use the bits you need. The tiny loss of space won't make any
difference. Ditto for the slight loss in speed. Indeed, with the most
modern processors, forms shorter than the memory width actually run SLOWER.

Steve Richfield

glen herrmannsfeldt

Jun 2, 2004, 6:46:18 PM
Nick Maclaren wrote:

> In article <s5pvc.832$uY.740@attbi_s53>,
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

(snip)
(someone wrote)

>>>Reference implementations have their place as examples, but are
>>>not a good idea as specifications. They might be, if there was a
>>>suitable language to write them in - but C is about as far from
>>>being such a language as it is possible to be.

>>I could think of worse ones...

> I can't. Seriously. I know of few languages where the gulf between
> what the specification says, unambiguously, and what people think
> that it says is anywhere near as wide.

Well, the ones I think of are BASIC, Fortran, PL/I, for
example. With BASIC you really have no idea, though it is often
floating point only. Fortran has at least one INTEGER
type, sometimes more but the standard only requires one.
I believe the negative divide question is still there,
though I am not sure about that one. PL/I allows you
to very precisely specify what you want, but it doesn't
guarantee that is what you will get. Decimal is often
implemented with binary arithmetic, though you get to
specify it in decimal digits.

As far as said gulf, I would say it is PL/I because the
specifications of the results of differing mode, scale,
precision, etc., are so complex. Constants have the
mode, scale, and precision that they are written in,
not necessarily the one desired.

-- glen

Steve Richfie1d

Jun 2, 2004, 7:17:15 PM
Glen,

If you can find some way to give it an environment to run in, APL is the
perfect language to specify hardware, because specifying hardware was
just what it was invented for. However, I suspect that there is no
smooth path to link APL modules into the kernel. Perhaps someone else
here knows something about this that I don't, like how to do it?

Steve Richfield
=================

Terje Mathisen

Jun 3, 2004, 2:29:52 AM
Nick Maclaren wrote:
> The IEEE 754 specification isn't complex, just confusingly written.
>
> There are at least five people posting to this group who would have
> little difficulty in writing a reference implementation for IEEE 754.
> At least two of us have done something very similar previously. If
> performance is not the issue, it is not a lot of work.

I wrote a 128-bit version of the full IEEE instruction set, in 90%
inline asm + a little C, using integer operations, just so I could
verify the patch I wrote for the FDIV (and Arctan) bug in the original
Pentium cpu.

I skipped one part: I hard-coded (inline code) rounding to always be
nearest, instead of having a central rounding function.

This took less than a week of afternoons.

Doing it in really portable C is tough if you also care about
performance. I.e. if you can depend on having t_int64/t_uint64 available
then it becomes _much_ easier, particularly if 64-bit ints are
guaranteed to have the same endian ordering as 64-bit fp values.

> It would be much harder to add the current decimal arithmetic and/or
> interval arithmetic, but the exercise would improve the current draft
> of the new standard.
>
> This is all separate from whether it is a good idea.

Indeed. :-)

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

glen herrmannsfeldt

Jun 3, 2004, 4:40:07 AM
Terje Mathisen wrote:

(snip)

> I wrote a 128-bit version of the full IEEE instruction set, in 90%
> inline asm + a little C, using integer operations, just so I could
> verify the patch I wrote for the FDIV (and Arctan) bug in the original
> Pentium cpu.

(snip)

> Doing it in really portable C is tough if you also care about
> performance. I.e. if you can depend on having t_int64/t_uint64 available
> then it becomes _much_ easier, particularly if 64-bit ints are
> guaranteed to have the same endian ordering as 64-bit fp values.

You mean like D and G float for VAX?

-- glen

Steve Richfie1d

Jun 3, 2004, 10:08:06 AM
There IS a way to simulate this proposal to enhance the FP methodology
on current UNmodified IEEE-754 hardware, though a small part of the
simulator would have to be written in assembly language and modified
from platform to platform.

When you call the library routine to change the operation away from the
default IEEE-754, the routine would save the bit-vector and start
interpretively executing the program, instruction by instruction, much
as a debugger might do, all so that it can recognize FP instructions and
simulate traps from them. When sometime later a call to the library
routine restores IEEE-754 mode, the interpreter would be stopped and
normal operation restored.
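
The core loop of such an interpreter might look roughly like this
(every name below is hypothetical; decode and the native single-step
are the small platform-specific assembly pieces):

    #include <stdint.h>

    typedef uint32_t fp_props_t;
    typedef uint32_t insn_t;

    typedef struct {
        uintptr_t pc;
        int       restored_754;   /* set when the app switches back to 754 */
        /* ...register file, FP state... */
    } cpu_state_t;

    /* Platform-specific primitives, assumed provided elsewhere: */
    insn_t decode(uintptr_t pc);
    int    is_fp_insn(insn_t insn);
    void   emulate_fp(insn_t insn, cpu_state_t *st, fp_props_t props);
    void   step_native(insn_t insn, cpu_state_t *st);

    void interpret_until_restored(cpu_state_t *st, fp_props_t props)
    {
        while (!st->restored_754) {
            insn_t insn = decode(st->pc);
            if (is_fp_insn(insn))
                emulate_fp(insn, st, props);  /* software FP per the bit vector */
            else
                step_native(insn, st);        /* execute one real instruction */
        }
    }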

While this may sound **REALLY** horrendous, I don't think that it is
nearly as bad as it might first appear, because...

An efficiently written interpreter like this would probably run at
something like ~10% speed. However, sloppy software-simulated FP
operations would be lucky to run at ~1% speed. Hence, if ~10% of the
instructions are FP instructions, then this sort of interpretation would
slow execution down to ~half of the speed of a computer that can enable
trapping on FP instructions to software.

If these percentages are anywhere close, software-simulated FP would
slow a program to ~10% speed, and doing this with an interpretive system
w/o traps would run at ~5% speed. Not pretty, but good enough for
concept testing and use by some critical applications.
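
Spelling that arithmetic out under the stated assumptions (interpreter
overhead ~10x, software FP ~100x, 10% FP mix): with hardware trapping,
0.9 x 1 + 0.1 x 100 = 10.9x slowdown, i.e. ~9% speed; with the
interpreter, 0.9 x 10 + 0.1 x 100 = 19x slowdown, i.e. ~5% speed,
roughly half the trap-based figure.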

Of course, if the percentage of FP instructions is higher than 10%, then
the overhead of this approach becomes less.

This should provide THE key for immediate implementation, as it works
with or without current IEEE-754 hardware.

Any thoughts? ideas? suggestions?

Steve Richfield
=====================

glen herrmannsfeldt

Jun 3, 2004, 3:56:14 PM
Steve Richfie1d wrote:

> There IS a way to simulate this proposal to enhance the FP methodology
> on current UNmodified IEEE-754 hardware, though a small part of the
> simulator would have to be written in assembly language and modified
> from platform to platform.
>
> When you call the library routine to change the operation away from the
> default IEEE-754, the routine would save the bit-vector and start
> interpretively executing the program, instruction by instruction,

(snip)

Some processors have a flag that will interrupt on executing
any FP instruction. I believe the purpose is to avoid unnecessary
save/restore of the FP state, but it should be useful for this.
Only the FP instructions would then need to be interpretively
executed.

-- glen

Steve Richfie1d

Jun 3, 2004, 6:40:06 PM
Glen,

> Some processors have a flag that will interrupt on executing
> any FP instruction. I believe the purpose is to avoid unnecessary
> save/restore of the FP state, but it should be useful for this.
> Only the FP instructions would then need to be interpretively
> executed.

YES, that WOULD seem to do it. Does anyone know which machines do and/or
do not have this flag?

This also raises other curious questions. Exactly how is this flag
currently being used? We'd have to track and simulate the existing use
of this flag, i.e. trap into the existing trap-handling code in the
cases where existing code might enable this flag. This could be
difficult/impossible from non-kernel code.

Also, is this flag even available from non-kernel code? It may be both
necessary and adequate to just add the trap code to the kernel.

Also, do the trap-manipulating instructions themselves trap?! If not, we
could be having our FP traps manipulated without our knowledge and
permission.

Also, unrelated traps out of interpreted code will NOT be subject to
interpretation, which should cause no problem. However, hardware
solutions, including the use of a flag that forces trapping all FP
instructions, must save/restore the FP mode control bits if the FP is
used at all in the interrupt routines. Hence, there is this "little"
compatibility issue when switching from software to later hardware
solutions. I don't think that this is a big problem - unless I've missed
something critical here. Have I?

As usual, such things raise the usual "porting" issues. I ported a UNIX
kernel from a Motorola single processor system to a dual processor
system a few years back, and was awash in such issues.

Steve Richfield

Terje Mathisen

Jun 4, 2004, 5:05:52 AM
glen herrmannsfeldt wrote:

> Terje Mathisen wrote:
>> Doing it in really portable C is tough if you also care about
>> performance. I.e. if you can depend on having t_int64/t_uint64
>> available then it becomes _much_ easier, particularly if 64-bit ints
>> are guaranteed to have the same endian ordering as 64-bit fp values.
>
>
> You mean like D and G float for VAX?

I don't know particularly about those, but I have heard about some weird
("middle-endian" anyone?) orderings perpetrated by DEC.

Raymond Toy

Jun 4, 2004, 9:46:03 AM
>>>>> "Steve" == Steve Richfie1d <St...@NOSPAM.smart-life.net> writes:

Steve> Glen,

>> Some processors have a flag that will interrupt on executing
>> any FP instruction. I believe the purpose is to avoid unnecessary
>> save/restore of the FP state, but it should be useful for this.
>> Only the FP instructions would then need to be interpretively
>> executed.

Steve> YES, That WOULD seem to do it. Does anyone know which machines do
Steve> and/or do not have this flag?

All x86 processors before the 486 obviously had such a flag because
the FP unit was a separate chip. I think the flag still exists on
later chips even though the FPU is integrated.

Earlier versions of RISC chips also had a similar flag for the same
reason. Sparc has a floating-point-enable bit, even for current
chips.

Don't know if any of these are accessible to user code.

Ray

Dale Morris

Jun 4, 2004, 11:34:48 AM
"Steve Richfie1d" <St...@NOSPAM.smart-life.net> wrote in message
news:d9bf488351aceb4e...@news.teranews.com...

> Glen,
>
> > Some processors have a flag that will interrupt on executing
> > any FP instruction. I believe the purpose is to avoid unnecessary
> > save/restore of the FP state, but it should be useful for this.
> > Only the FP instructions would then need to be interpretively
> > executed.
>
> YES, That WOULD seem to do it. Does anyone know which machines do and/or
> do not have this flag?
>
> This also raises other curious questions. Exactly how is this flag
> currently being used?

Itanium has a set of flags (one for the lower 32 FRs and one for the upper
96 FRs) that cause any FP instruction referencing a particular register to
trap. The flags are only available to privileged code.

These sorts of flags are generally supported in modern architectures to
enable lazy state save and restore of the FRs. This shortens the dynamic
path through the context switch path in the OS for applications which don't
use floating-point (or which use it only a little - Itanium has two flags
because the lower 32 FRs are used when we do integer multiplication and
division, which is done on the FPU).
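
The lazy-save pattern such flags enable looks roughly like this
(hypothetical kernel pseudocode in C, not Itanium-specific):

    typedef struct task task_t;

    /* Assumed primitives provided by the architecture/kernel: */
    extern task_t *fp_owner;        /* task whose FP state is live */
    void disable_fp_access(void);   /* arm the "trap on FP use" flag */
    void enable_fp_access(void);
    void switch_integer_state(task_t *prev, task_t *next);
    void save_fp_state(task_t *t);
    void load_fp_state(task_t *t);

    /* Context switch: skip the FP registers entirely, just arm the trap. */
    void context_switch(task_t *prev, task_t *next)
    {
        disable_fp_access();
        switch_integer_state(prev, next);
    }

    /* The new task's first FP instruction lands here; only now do we
       pay for swapping the FP state. */
    void fp_disabled_trap(task_t *current)
    {
        if (fp_owner != current) {
            save_fp_state(fp_owner);
            load_fp_state(current);
            fp_owner = current;
        }
        enable_fp_access();   /* clear the flag and retry the instruction */
    }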

- Dale Morris
Itanium processor architect
Hewlett Packard


Steve Richfie1d

Jun 4, 2004, 12:46:05 PM
Dale,

> Itanium has a set of flags (one for the lower 32 FRs and one for the upper
> 96 FRs) that cause any FP instruction referencing a particular register to
> trap. The flags are only available to privileged code.
>
> These sorts of flags are generally supported in modern architectures to
> enable lazy state save and restore of the FRs. This shortens the dynamic
> path through the context switch path in the OS for applications which don't
> use floating-point (or which use it only a little - Itanium has two flags
> because the lower 32 FRs are used when we do integer multiplication and
> division, which is done on the FPU).

As I understand what you are saying:

1. These traps are currently unused except to identify defective kernel
code, if any.

2. Users can't mess them up.

Unless I've missed some detail, this is **EXACTLY** what is needed to
emulate new FP features. Right? All that is needed is a new OS call to
identify the features that are wanted, and the OS then turns the traps
on to emulate it until another system call is issued to restore
operation to the default IEEE-754, whereupon the traps are turned back off.

Defective kernel code could still be recognized and handled, because
there would be an FP trap from protected code without the new
feature-manager having set the traps, if indeed it is EVER used in
protected code. There, control would simply be turned over to wherever
it now goes on an FP trap.

A few OS tweaks would obviously be needed, like saving and restoring the
FP trapping status instead of just presuming that FP traps were not
enabled, turning off FP traps when starting a new task, etc.

Is there some way of reading whether FP traps are enabled short of
performing suitable but time-consuming experiments?

Do you have any idea how this is handled in Pentiums?

BTW, any opinions about the general concept of a wide-open feature-bit
driven FP system that is backwards compatible with 754, uses whatever
hardware capability that happens to be present, and falls back to
software as needed?

This would seem (to me) to provide absolutely the best of both of our
respective positions, as I and others could then develop our
applications without stressing over whether there would be hardware to
support the features that our respective applications need (because
current hardware would still work), and Intel could just sit back and
see which capabilities come into common use before committing to
silicon. Whatever is left out of the silicon still works - it just slows
applications that use unimplemented features down to ~10% speed. Also,
note that some of the "features" are really just optionally disabling
some facet of IEEE-754, and so would be really trivial to implement,
especially since speed isn't a critical concern for obscure modes of
operation, so there is no need to shorten pipelines where a stage is no
longer needed, etc.

Do you see any potential flaws in this "best of both worlds" concept?

Thanks yet again for your help here.

Steve Richfield
==================

Nick Maclaren

Jun 4, 2004, 3:11:27 PM
In article <febab880da5a9034...@news.teranews.com>,

Steve Richfie1d <St...@NOSPAM.smart-life.net> wrote:
>
>As I understand what you are saying:
>
>1. These traps are currently unused except to identify defective kernel
>code, if any.
>
>2. Users can't mess them up.
>
>Unless I've missed some detail, this is **EXACTLY** what is needed to
>emulate new FP features. Right? All that is needed is a new OS call to
>identify the features that are wanted, and the OS then turns the traps
>on to emulate it until another system call is issued to restore
>operation to the default IEEE-754, whereupon the traps are turned back off.

No, it is a bloody awful way of doing it!

Yes, it is functionally adequate, but it is precisely NOT how to
add such a feature - at least if you have a modicum of knowledge
of previous, proven, better approaches. It has at least the
following major disadvantages:

1) It is privileged, and therefore cannot be done by run-time
systems, applications and so on. Why on earth not?

2) It halts the CPU in a particularly horrible way, and then
the kernel decides that it didn't need to after all. This causes
unnecessary serialisation and even system hangs.

3) Largely because of (2), it runs like a drain.


Regards,
Nick Maclaren.

Steve Richfie1d

Jun 4, 2004, 5:19:12 PM
Nick

> Yes, it is functionally adequate, but it is precisely NOT how to
> add such a feature - at least if you have a modicum of knowledge
> of previous, proven, better approaches.

What better way is there given the assortment of existing hardware that
this must run on? The big miracle here is that there is SOME way to run
at all, however horrible.

I'm expecting that there won't be a lot of computationally intensive
applications until better hardware arrives, so I'm not really concerned
about performance, only a smooth growth path from next fall when the
software could be working, to several years from now when this is in all
new hardware.

> It has at least the
> following major disadvantages:
>
> 1) It is privileged, and therefore cannot be done by run-time
> systems, applications and so on.

Of course a new system call, as I described in my previous email, would
make the link between application and hardware control. There are lots
of good reasons not to turn trap control directly over to users.

Another alternative would be for users to just use the privileged
instruction, which would then trap to the kernel, which needs to know
about such usage anyway for smooth save/restore on other traps into the
kernel. Of course this is the "Virtual Machine" approach of letting
users specify what they want by using privileged instructions that trap.

I don't see a lot of overhead switching between FP modes, so trapping to
the kernel seems to be a non-issue/problem (to me).

> Why on earth not?

Because of a past severe lack of creativity and respect for the needs of
various users among the IEEE FP committee members. This should have all
been figured out 20 years ago. The IEEE FP was obsolete at least 25
years before its creation!

My goal is to find a way out of the mess that has been created by IEEE
FP. If you have a better path to this end, then PLEASE, tell me what it
is. Until then, I'll take **ANY** way that works, regardless of how slow
and inelegant it might appear.

> 2) It halts the CPU in a particularly horrible way, and then
> the kernel decides that it didn't need to after all. This causes
> unnecessary serialisation and even system hangs.

I sure see how it kills performance, and thoroughly expect that trapped
instructions will take ~100X the time of a similar untrapped
instruction, even where the trap code is still in the cache. However, I
don't see how it would hang a system. Please explain. Is there some sort
of nasty bug here that clever software can't get past?

> 3) Largely because of (2), it runs like a drain.

OK with me until better hardware arrives. I'm sure that the processor
manufacturers will LOVE this, because it will help the sale of new
processors to those who need it, which again is OK with me. I'll be one
of the first happy customers.

Steve Richfield
==================

> Regards,
> Nick Maclaren.

Brian Inglis

Jun 5, 2004, 12:47:23 PM
On Wed, 02 Jun 2004 21:31:04 GMT in comp.arch.arithmetic, Steve
Richfie1d <St...@NOSPAM.smart-life.net> wrote:

>Nick
>
>> The IEEE 754 specification isn't complex, just confusingly written.
>>
>> There are at least five people posting to this group who would have
>> little difficulty in writing a reference implementation for IEEE 754.
>> At least two of us have done something very similar previously. If
>> performance is not the issue, it is not a lot of work.
>
>Agreed. Of course we need a long list of optional additional tweaks,
>many of which are reductions in functionality.
>
>Also, I'll check a contact at Microsoft. Maybe I could get them to
>donate their old 8087 emulation source code?!

There are open source 387 emulations available in C; this one in a
single 70KB 4KLOC source file is part of the DJGPP project:
http://www.ludd.luth.se/~ams/djgpp/cvs/djgpp/src/libemu/src/emu387.cc

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian....@CSi.com (Brian dot Inglis at SystematicSw dot ab dot ca)
fake address use address above to reply

Dik T. Winter

Jun 5, 2004, 10:13:06 PM
In article <65155456351ef12a...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> > 2) It halts the CPU in a particularly horrible way, and then
> > the kernel decides that it didn't need to after all. This causes
> > unnecessary serialisation and even system hangs.
>
> I sure see how it kills performance, and thoroughly expect that trapped
> instructions will take ~100X the time of a similar untrapped
> instruction, even where the trap code is still in the cache. However, I
> don't see how it would hang a system. Please explain. Is there some sort
> of nasty bug here that clever software can't get past?

When the processor traps on a FP instruction, it will save the state and
switch to kernel mode to enter a trap routine. The kernel has to check
whether the user actually wanted the trap, set up a stack frame for the
trap in user space, and enter a user supplied routine back in user mode.
By this time any expectation of the trap code still being in the cache
can be forgotten. Moreover, such kernel codes are extremely sensitive.
If the kernel traps when in the trap routine there are serious security
problems. (I still remember the divide by 0 bug on Sparc which could
give the user root access to the system.)
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Steve Richfie1d

Jun 5, 2004, 11:33:17 PM
Two important improvements to this developing concept:

1. In each of the bit vector fields, one of the combinations says
"don't care", equivalent to "fastest way". This causes the selection of
the corresponding feature to be whatever runs fastest. In the event
there is more than one equal-speed possibility, the selection would be
whatever 754 uses. If this isn't compatible with the other
specifications, then the "best" would be selected. Then, the
corresponding bits in the bit vector would be changed to the selection
that was actually made, so that the software can determine how "don't
care" was interpreted by simply reading the bits back out (see the
first sketch after item 2).

2. Somewhat unusual simulator construction (sketched below): The
simulator core could be written using conditional compilation statements
instead of IF statements, as though it were going to be compiled for
just one particular configuration bit-vector combination. This code
could then be placed into an INCLUDE file that is included many times
with different common combinations of parameters. Clever code would
interpret calls to set the bit vectors to see if one of the common
combinations has been specified. If so, the number of the appropriate
incarnation of the simulator core would be stored into a STATIC
variable. Subsequent trapped arithmetic instructions would first perform
a computed GOTO on the static variable to jump to the appropriate
incarnation of the simulator code. This should provide nearly the same
speed as though the code were custom-written for each combination of
configuration bits, and probably run twice as fast as, or faster than,
traditional IF-coded simulator structures, yet it would probably be
easier to read and understand than IF-coded logic.
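
To illustrate point 1, the read-back rule in hypothetical C (the API
name and the two-bit rounding field below are made up for the example):

    #include <stdint.h>

    typedef uint32_t fp_props_t;

    /* Hypothetical two-bit rounding field, for illustration only: */
    #define FP_ROUND_MASK        (3u << 4)
    #define FP_ROUND_STOCHASTIC  (1u << 4)
    #define FP_ROUND_DONT_CARE   (3u << 4)   /* "fastest way" */

    /* Assumed system call: applies the vector and writes back the
       choices actually made. */
    void fp_set_props(int width, fp_props_t *props);

    int stochastic_was_chosen(void)
    {
        fp_props_t props = FP_ROUND_DONT_CARE;
        fp_set_props(64, &props);    /* actual selection written back */
        return (props & FP_ROUND_MASK) == FP_ROUND_STOCHASTIC;
    }

And a sketch of the point-2 construction (file and macro names are
hypothetical; the #include of the core file is shown inside a comment
so the fragment stands alone):

    typedef double (*fp_core_t)(double, double);

    /* fpcore.inc would hold the simulator core, written against macros
       like CORE_NAME and ROUNDING; each inclusion stamps out one
       specialized copy:

           #define CORE_NAME fp_add_nearest
           #define ROUNDING  ROUND_NEAREST
           #include "fpcore.inc"
           #undef  CORE_NAME
           #undef  ROUNDING

       ...repeated for each common combination of parameters...       */

    /* Fully general, IF-coded slow path for uncommon combinations. */
    static double fp_general(double a, double b) { return a + b; /* ... */ }

    /* Filled in as specialized cores are instantiated: */
    static fp_core_t cores[] = { fp_general /* , fp_add_nearest, ... */ };

    static int core_id;   /* set once, when the bit vector is changed */

    double fp_add_trap(double a, double b)
    {
        return cores[core_id](a, b);   /* the "computed GOTO" dispatch */
    }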

Better, faster, easier - that's what software design is all about.

Any comments?

Steve Richfield
=======================

Steve Richfie1d

Jun 6, 2004, 12:16:28 AM
Dik,

> > > 2) It halts the CPU in a particularly horrible way, and then
> > > the kernel decides that it didn't need to after all. This causes
> > > unnecessary serialisation and even system hangs.
> >
> > I sure see how it kills performance, and thoroughly expect that trapped
> > instructions will take ~100X the time of a similar untrapped
> > instruction, even where the trap code is still in the cache. However, I
> > don't see how it would hang a system. Please explain. Is there some sort
> > of nasty bug here that clever software can't get past?
>
> When the processor traps on a FP instruction, it will save the state and
> switch to kernel mode to enter a trap routine. The kernel has to check
> > whether the user actually wanted the trap, set up a stack frame for the
> trap in user space, and enter a user supplied routine back in user mode.

Why not just put the simulator into the kernel? This would sure save a
bunch of context switching.

> By this time any expectation of the trap code still being in the cache
> can be forgotten.

Why? The cache doesn't know/care what sort of code is being executed. So
long as you haven't run through some fraction of a megabyte of code
between FP traps, things should still be in cache. I'd expect that most
FP would be done by FP instructions that are fairly close together, and
that very little kernel code outside of the simulator would be executed;
unless of course there is some I/O, whereupon who cares how slow the FP
is? Am I missing something here?

All this aside, I don't see speed as being a big issue, at least at the
beginning. Many applications only need a few thousand special format
operations to get their critical work done, so even taking a millisecond
each wouldn't be a big problem. Remember, they can still use IEEE FP at
full speed for everything that it works for.

> Moreover, such kernel codes are extremely sensitive.
> If the kernel traps when in the trap routine there are serious security
> problems.

This should be recognized and handled as a system crash as I described,
as unrelated trap routines now routinely set the FP traps (where
available), especially where they do NOT use FP instructions in the
unrelated trap routines. This saves them from saving the FP registers,
and keeps them from screwing up user programs if some kernel coder
inadvertently uses an unsaved FP register. In short, this shouldn't be a
problem with reasonable coding practices, and screwups here should be
routinely caught and produce the usual BSOD type of display indicating
the problem.

> (I still remember the divide by 0 bug on Sparc which could
> give the user root access to the system.)

Just because someone was able to screw something up doesn't mean that
there is any real problem in NOT screwing it up. Kernel code is
fragile, and anything you write in C is VERY fragile. The
combination is REALLY nasty. It takes a very special kind of fool to
rush in where angels fear to tread.

Yes, I understand that I may have to code the critical code myself, ugh.

Steve Richfield

Boudewijn Dijkstra

Jun 6, 2004, 11:03:57 AM
"Steve Richfie1d" <St...@NOSPAM.smart-life.net> schreef in bericht
news:3e5b538e2c804942...@news.teranews.com...
> [snip]
>
> Has anyone written IEEE-754 in C yet? Something like that might be a
> good place to start.

Yes. John R. Hauser has written the SoftFloat library, which claims to be
conforming to IEEE 754 and is written in ISO-compatible C. The package
provides configurations for 386-Win32-GCC and SPARC-Solaris-GCC.

Besides the compiler, it requires a machine that is binary, with a word
size that is a power of 2, with 8-bit bytes, and with signed integers
that wrap around modularly on overflow. Support for the extended
double-precision and quadruple-precision formats depends on the C
compiler implementing a 64-bit integer type.

These seem very reasonable requirements, don't they? :-)

URL: http://www.jhauser.us/arithmetic/SoftFloat.html


Mike Cowlishaw

Jun 4, 2004, 2:04:54 PM
Steve Richfie1d wrote:
> Also, note that some of the "features" are really just optionally disabling
> some facet of IEEE-754, and so would be really trivial to implement,
> especially since speed isn't a critical concern for obscure modes of
> operation, so there is no need to shorten pipelines where a stage is
> no longer needed, etc.
>
> Do you see any potential flaws in this ...

This would seem to be a major flaw in itself. History has shown over and
over again that as soon as a standard describes a feature as 'optional' then
some/many/all implementers just do not bother to implement it. Hence (in
general) users cannot rely on it being there, and hence the feature may as
well not be in the standard at all. One ends up programming to the
'required' part of the standard.

There are parts of the standard which are really an essential aspect, such
as the requirement that results be correctly rounded for the base arithmetic
operations. Yet even this is questioned by some, who argue that it
would be cheaper to give results to within one ulp.

I, personally, would like to see the standard nailed down to the level
of detail that, given a certain context and certain operands, there is
only one possible encoded result. 754r goes further towards that than
754 did, but I suspect it will not attain that level of detail. Adding
more features and options could make that goal harder to attain.

mfc


Dik T. Winter

unread,
Jun 6, 2004, 7:46:40 PM6/6/04
to
In article <d1dc1b4474e7a6b5...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> > When the processor traps on a FP instruction, it will save the state and
> > switch to kernel mode to enter a trap routine. The kernel has to check
> > whether the user actually wanted the trap, sep up a stack frame for the
> > trap in user space, and enter a user supplied routine back in user mode.
>
> Why not just put the simulator into the kernel? This would sure save a
> bunch of context switching.

Security problems, accountability, and whatever you want.


>
> > By this time any expectation of the trap code still being in the cache
> > can be forgotten.
>
> Why? The cache doesn't know/care what sort of code is being executed. So
> long as you haven't run through some fraction of a megabyte of code
> between FP traps, things should still be in cache. I'd expect that most
> FP would be done by FP instructions that are fairly close together, and
> that very little kernel code outside of the simulator would be executed;
> unless of course there is some I/O, whereupon who cares how slow the FP
> is? Am I missing something here?

Obviously every part of code is some fraction of a megabyte, so I think
you mean a significant part of a megabyte. I confess that I have no
idea how large current caches are for instructions, but I do think they
are not particularly large.

> All this aside, I don't see speed as being a big issue, at least at the
> beginning. Many applications only need a few thousand special format
> operations to get their critical work done, so even taking a millisecond
> each wouldn't be a big problem. Remember, they can still use IEEE FP at
> full speed for everything that it works for.

If you trap on each FP instruction and it is detected that the special
code is not needed, you certainly have to restart in user mode; otherwise
you may experience a trap in kernel mode.

> > Moreover, such kernel codes are extremely sensitive.
> > If the kernel traps when in the trap routine there are serious security
> > problems.
>
> This should be recognized and handled as a system crash as I described,

This is *not* easy to recognise, and that is where the divide by 0
SPARC bug came from. But system crashes are also not really nice.
(I also remember that you could get a system panic when you executed
a defined, but illegal, floating-point instruction in the debugger
on the SPARC. Not nice for other people that were working on the
system concurrently.)

One more objection: doing the emulation in kernel mode may lock
out other processing during the execution.

> In short, this shouldn't be a
> problem with reasonable coding practices, and screwups here should be
> routinely caught and produce the usual BSOD type of display indicating
> the problem.

I never experience a BSOD, and would not like it at all. Yes, I experience
system crashes sometimes, but I expect that they do not occur more than
once every few months. If they are more frequent I will wonder at the
robustness of the system.

> > (I still remember the divide by 0 bug on Sparc which could
> > give the user root access to the system.)
>
> Just because someone was able to screw something up doesn't mean that
> there is any real problem in NOT screwing it up.

There *is* a serious problem. Only a very slight screw-up (as in this
case) can lead to very serious consequences when you execute code for
the user in kernel mode.

> Whenever you code
> kernel code, and anything you write in C is VERY fragile. The
> combination is REALLY nasty. It takes a very special kind of fool to
> rush in where angels fear to tread.

The SPARC code was in assembler, an even nastier combination.

glen herrmannsfeldt

unread,
Jun 7, 2004, 12:00:38 AM6/7/04
to
Terje Mathisen wrote:

> glen herrmannsfeldt wrote:
>
>> Terje Mathisen wrote:
>>
>>> Doing it in really portable C is tough if you also care about
>>> performance. I.e. if you can depend on having t_int64/t_uint64
>>> available then it becomes _much_ easier, particularly if 64-bit ints
>>> are guaranteed to have the same endian ordering as 64-bit fp values.

>> You mean like D and G float for VAX?

> I don't know particularly about those, but I have heard about some weird
> ("middle-endian" anyone?) orderings perpetrated by DEC.

The PDP-11 is a byte-addressed, 16-bit, little-endian machine.
When floating point was added, it was added on a 16-bit
bus, with the 16-bit words stored in big-endian order.

For compatibility reasons, this format was adopted by VAX.

I did once see a VAX Fortran program initializing floating
point variables with hex (Z type) constants, which are
converted to little endian 32 bit words in storage.
(They might have been EQUIVALENCED if it didn't allow them
directly.) So in hex form, the sign is in the fifth
nybble of an eight nybble hex constant. It looks very
strange that way.
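
A small C sketch (illustrative only, not DEC's conversion code) makes
the layout concrete:

    /* Illustrative only: reassemble the four stored bytes of a
     * PDP-11/VAX F-float into the logical sign|exponent|fraction
     * pattern.  The value lives in memory as two 16-bit little-endian
     * words with the sign/exponent word FIRST, which is why the sign
     * shows up in the fifth nybble of a little-endian hex dump. */
    #include <stdint.h>
    #include <stdio.h>

    uint32_t f_float_logical(const uint8_t b[4])
    {
        uint16_t hi = (uint16_t)(b[0] | (b[1] << 8)); /* sign/exp/high fraction */
        uint16_t lo = (uint16_t)(b[2] | (b[3] << 8)); /* low fraction           */
        return ((uint32_t)hi << 16) | lo;
    }

    int main(void)
    {
        /* 1.0 is logically 0x40800000; in storage the bytes run
         * 80 40 00 00, i.e. 0x00004080 read as a little-endian int32. */
        const uint8_t one[4] = { 0x80, 0x40, 0x00, 0x00 };
        printf("%08X\n", (unsigned) f_float_logical(one)); /* 40800000 */
        return 0;
    }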

-- glen

Nick Maclaren

unread,
Jun 7, 2004, 4:31:05 AM6/7/04
to

In article <65155456351ef12a...@news.teranews.com>,

Steve Richfie1d <St...@NOSPAM.smart-life.net> writes:
|>
|> > Yes, it is functionally adequate, but it is precisely NOT how to
|> > add such a feature - at least if you have a modicum of knowledge
|> > of previous, proven, better approaches.
|>
|> What better way is there given the assortment of existing hardware that
|> this must run on? The big miracle here is that there is SOME horrible
|> way to run.

None and not really.

|> I'm expecting that there won't be a lot of computationally intensive
|> applications until better hardware arrives, so I'm not really concerned
|> about performance, only a smooth growth path from next fall when the
|> software could be working, to several years from now when this is in all
|> new hardware.

The mind boggles at the first part of that :-)

|> Of course a new system call, as I described in my previous email, would
|> make the link between application and hardware control. There are lots
|> of good reasons not to turn trap control directly over to users.

Please name ONE good reason not to allow applications to control
their own floating-point model. It used to be possible and, to some
extent, still is.

|> Another alternative would be for users to just use the privileged
|> instruction, which would then trap the processor, which needs to know
|> about such usage anyway for smooth save/restore for other traps into the
|> kernel. Of course this is the "Virtual Machine" approach of letting
|> users specify what they want by using privileged instructions that trap.
|>
|> I don't see a lot of overhead switching between FP modes, so trapping to
|> the kernel seems to be a non-issue/problem (to me).

Hmm. Think serialisation. In the past few years, I have seen
systems brought down by large numbers of floating-point interrupts.

|> > 2) It halts the CPU in a particularly horrible way, and then
|> > the kernel decides that it didn't need to after all. This causes
|> > unnecessary serialisation and even system hangs.
|>
|> I sure see how it kills performance, and thoroughly expect that trapped
|> instructions will take ~100X the time of a similar untrapped
|> instruction, even where the trap code is still in the cache. However, I
|> don't see how it would hang a system. Please explain. Is there some sort
|> of nasty bug here that clever software can't get past?

Because modern hardware uses a single trap mechanism for everything,
it has to halt the processor dead (in case it is a machine check or
something similarly horrible). In most SMP designs, this means that
other processors may potentially block until it has then released
enough privilege to reallow interrupts.

This is not how to do it.


Regards,
Nick Maclaren.

Dale Morris

unread,
Jun 7, 2004, 12:21:03 PM6/7/04
to
"Steve Richfie1d" <St...@NOSPAM.smart-life.net> wrote in message
news:febab880da5a9034...@news.teranews.com...

> Dale,
>
> > Itanium has a set of flags (one for the lower 32 FRs and one for the
upper
> > 96 FRs) that cause any FP instruction referencing a particular register
to
> > trap. The flags are only available to privileged code.
> >
> > These sorts of flags are generally supported in modern architectures to
> > enable lazy state save and restore of the FRs. This shortens the dynamic
> > path through the context switch path in the OS for applications which
don't
> > use floating-point (or which use it only little - Itanium has two flags
> > because the lower 32 FRs are used when we do integer multiplication and
> > division, which is done on the FPU).
>
> As I understand what you are saying:
>
> 1. These traps are currently unused except to identify defective kernel
> code, if any.

I have no idea how you got that idea from what I said. I said they are used
for lazy save and restore of floating-point registers. This is not a bug,
but rather a performance feature of OSs that implement it.

Since you seem now to be on a path of gathering information so you can
convince some OS team to implement some of your ideas, I'd recommend
reading up on how OSs work to help your arguments. "ia-64 linux kernel"
by Mosberger and Eranian is quite a good start.

Steve Richfie1d

unread,
Jun 7, 2004, 3:07:39 PM6/7/04
to
Dale

> I have no idea how you got that idea from what I said. I said they are used
> for lazy save and restore of floating-point registers. This is not bug, but
> rather a peformance feature of OSs that implement it.
>
> Since you seem now to be on a path of gathering information so you can
> convince some OS team to implement some of your ideas, I'd recommend reading
> some about how OSs work to help your arguments. "ia-64 linux kernel" by
> Mosberger and Eranian is a quite good start.

I actually have quite a bit of OS experience, including porting a UNIX
kernel from a single to a dual processor system.

When proposing a new way of doing things to run on ALL computers, I must
look at the worst of cases, not the best or average of situations. Of
course, here the worst of cases is that there are no traps to use. Maybe
this is enough of a mess that there really ARE no usable traps? By my
guesstimates, the absence of traps slows emulation from around 10% speed
to around 5% as fast as if the arithmetic features were in the hardware.
I think you are saying that, at least in the case of the Itanium, I should
forget the traps and eat the ~2:1 loss in speed until better hardware is
available, because I'd probably end up slowing down just as much if I
used the traps, but put in a LOT more work, have to build something into
each OS, and possibly end up with something with some VERY nasty bugs.
Is THIS the message you were trying to communicate?

Yes, I now see how you get lazy save traps when operating normally.
Clearly there would have to be some logic to distinguish between a lazy
save and an FP-emulation situation. Indeed, if this is implemented
in the kernel, there could be more than one task emulating DIFFERENT FP
features. Hence, the FP emulation mode (if any) would probably have to
be stored in the task table to make this decision.

I'm beginning to see that this is the end of a long piece of spaghetti
hanging off of the plate. This can't just be dropped into an OS, rather
it would have to be integrated into it. Not a big integration, but still
an integration just the same. Hmmm.

Thanks for the "heads up" on this issue. Now that we're on the same
page, your advice is to forget using the traps because there is little
to gain and a LOT to lose?

Steve Richfield
================

Steve Richfie1d

unread,
Jun 7, 2004, 3:21:32 PM6/7/04
to
Nick,

> |> I'm expecting that there won't be a lot of computationally intensive
> |> applications until better hardware arrives, so I'm not really concerned
> |> about performance, only a smooth growth path from next fall when the
> |> software could be working, to several years from now when this is in all
> |> new hardware.
>
> The mind boggles at the first part of that :-)

Given John Hauser's solid SoftFloat library to start with, the only
programming needed is to add the new features and write probably the
simplest interpreter that has EVER been written. The HARD parts of this
are the people-parts, like getting the world to know that this exists.

> |> Of course a new system call, as I described in my previous email, would
> |> make the link between application and hardware control. There are lots
> |> of good reasons not to turn trap control directly over to users.
>
> Please name ONE good reason not to allow applications to control
> their own floating-point model. It used to be possible and, to some
> extent, still is.

I agree. I was looking to use some "features" (problems) in existing
hardware that made some things only available to kernel code. Maybe I
should just eat the 2:1 loss in speed from not using these and hope to
gain some large part of this back by staying out of the kernel.

> |> Another alternative would be for users to just use the privileged
> |> instruction, which would then trap the processor, which needs to know
> |> about such usage anyway for smooth save/restore for other traps into the
> |> kernel. Of course this is the "Virtual Machine" approach of letting
> |> users specify what they want by using privileged instructions that trap.
> |>
> |> I don't see a lot of overhead switching between FP modes, so trapping to
> |> the kernel seems to be a non-issue/problem (to me).
>
> Hmm. Think serialisation. In the past few years, I have seen
> systems brought down by large numbers of floating-point interrupts.

I wonder which systems have this problem, and which don't? Any idea?

> |> > 2) It halts the CPU in a particularly horrible way, and then
> |> > the kernel decides that it didn't need to after all. This causes
> |> > unnecessary serialisation and even system hangs.
> |>
> |> I sure see how it kills performance, and thoroughly expect that trapped
> |> instructions will take ~100X the time of a similar untrapped
> |> instruction, even where the trap code is still in the cache. However, I
> |> don't see how it would hang a system. Please explain. Is there some sort
> |> of nasty bug here that clever software can't get past?
>
> Because modern hardware uses a single trap mechanism for everything,
> it has to halt the processor dead (in case it is a machine check or
> something similarly horrible). In most SMP designs, this means that
> other processors may potentially block until it has then released
> enough privilege to reallow interrupts.
>
> This is not how to do it.

The alternative is to have an interpreter interpret ALL instructions
from the point where an alternative FP mode is selected, until switching
back to IEEE-754. This is not very expensive in code where many/most of
the operations are FP operations, but slows computers WAY down where the
FP operations are sparse. It sounds like you are saying that the FP
operations would have to be pretty sparse before trapping FP operations
would outrun interpretation? Did I get it right?
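
Concretely, the interpreter core could be as small as this toy dispatch
loop (hypothetical opcodes, not a real instruction set; soft_fadd()
stands in for the committee-supplied emulator):

    /* Toy sketch of the "interpret everything while a special FP mode
     * is active" scheme.  Opcodes and format are hypothetical. */
    #include <stdint.h>

    enum { OP_HALT, OP_MOV, OP_FADD, OP_FMODE };    /* hypothetical */

    typedef struct { uint8_t op, dst, src; } insn;  /* toy format   */

    static uint64_t soft_fadd(uint64_t a, uint64_t b, int fp_mode)
    {
        (void) fp_mode;   /* placeholder for the real soft-FP emulator */
        return a + b;
    }

    /* Interpret until OP_HALT, or until the program switches the FP
     * mode back to IEEE-754 and native execution can resume. */
    void interpret(const insn *pc, uint64_t *reg, int fp_mode)
    {
        for (;; ++pc) {
            switch (pc->op) {
            case OP_HALT:  return;
            case OP_MOV:   reg[pc->dst] = reg[pc->src]; break;
            case OP_FADD:  /* every FP op goes through soft-FP */
                reg[pc->dst] = soft_fadd(reg[pc->dst], reg[pc->src],
                                         fp_mode);
                break;
            case OP_FMODE: /* mode 0 = IEEE-754: stop interpreting */
                fp_mode = (int) reg[pc->src];
                if (fp_mode == 0) return;
                break;
            }
        }
    }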

Steve Richfield
================
> Regards,
> Nick Maclaren.

Nick Maclaren

unread,
Jun 7, 2004, 4:21:34 PM6/7/04
to
In article <1fba6070d40dab9c...@news.teranews.com>,

Steve Richfie1d <St...@NOSPAM.smart-life.net> wrote:
>
>I agree. I was looking to use some "features" (problems) in existing
>hardware that made some things only available to kernel code. Maybe I
>should just eat the 2:1 loss in speed from not using these and hope to
>gain some large part of this back by staying out of the kernel.

Experience on the early workstations is that a good library
implementation is comparable in performance with a bad hardware
plus interrupt one. Yes, really.

>> Hmm. Think serialisation. In the past few years, I have seen
>> systems brought down by large numbers of floating-point interrupts.
>
>I wonder which systems have this problem, and which don't? Any idea?

Yes. Most of them, to one degree or another. Currently, the only
large-SMP Unix that I know of that serialises is Solaris (and I am
trying to get Sun to regard it as a bug). OSF/1 crashed because of
this, and both IRIX and AIX got very, very sick - including daemons
failing because of missed timeouts. I have never used HP-UX, Tru64
Unix or Linux on large SMP boxes, but would expect comparable issues.
And I would expect Windows NT/XP/PDQ/NBG/etc. to be, if anything,
worse. I haven't used z/OS recently, but that had the same problem
up until at least MVS/XA.

>> Because modern hardware uses a single trap mechanism for everything,
>> it has to halt the processor dead (in case it is a machine check or
>> something similarly horrible). In most SMP designs, this means that
>> other processors may potentially block until it has then released
>> enough privilege to reallow interrupts.
>>
>> This is not how to do it.
>
>The alternative is to have an interpreter interpret ALL instructions
>from the point where an alternative FP mode is selected, until switching
>back to IEEE-754. This is not very expensive in code where many/most of
>the operations are FP operations, but slows computers WAY down where the
>FP operations are sparse. It sounds like you are saying that the FP
>operations would have to be pretty sparse before trapping FP operations
>would outrun interpretation? Did I get it right?

No. And that is not the alternative, if one can design hardware,
which is what I was referring to.

Your comments about sparsity and the tradeoffs are correct, of course.
Given current hardware, that is the choice. Well, except that Alpha
style JIT compilation is a good way of doing the interpretation.


Regards,
Nick Maclaren.

Steve Richfie1d

unread,
Jun 7, 2004, 6:56:49 PM6/7/04
to
Nick,

>>back to IEEE-754. This is not very expensive in code where many/most of
>>the operations are FP operations, but slows computers WAY down where the
>>FP operations are sparse. It sounds like you are saying that the FP
>>operations would have to be pretty sparse before trapping FP operations
>>would outrun interpretation? Did I get it right?
>
> No. And that is not the alternative, if one can design hardware,
> which is what I was referring to.

I'm just looking for a ~2-year "bridge", during which time hopefully
some manufacturer will make the hardware. At least with such a bridge,
the applications that really need these features and can stand running
at 5% speed can proceed.

I believe that MOST of the significance-needing financial projection
applications, i.e. the REALLY high-value applications, can stand this
loss of speed. Others, like NN applications desiring logarithmic
representation, will just have to wait for better hardware for
deployment, but they can start testing very soon.

> Your comments about sparsity and the tradeoffs are correct, of course.
> Given current hardware, that is the choice. Well, except that Alpha
> style JIT compilation is a good way of doing the interpretation.

Getting more complex: only the jump and FP instructions would need to be
JIT'd. The no-longer-an-interpreter, now more like a debugger, could scan
ahead, breakpoint the next FP or jump instruction(s), and then resume
native execution. Of course, some self-modifying code wouldn't work
right, but I doubt whether that would break many things.
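
A toy sketch of that scan-ahead step (again with hypothetical
fixed-width opcodes; scanning real x86 would mean decoding
variable-length instructions):

    /* Illustrative only: scan forward from pc and plant a breakpoint
     * on the first FP or jump opcode, letting everything before it run
     * natively.  Assumes the code does not read or modify itself --
     * exactly the caveat mentioned above. */
    #include <stddef.h>
    #include <stdint.h>

    enum { OP_NOP, OP_ADD, OP_FADD, OP_JMP, OP_BKPT }; /* hypothetical */

    /* Returns the index of the planted breakpoint, or len if the rest
     * of the code holds no FP or jump instruction.  The displaced
     * opcode is saved so it can be emulated or single-stepped later. */
    size_t plant_breakpoint(uint8_t *code, size_t pc, size_t len,
                            uint8_t *saved_op)
    {
        for (size_t i = pc; i < len; ++i) {
            if (code[i] == OP_FADD || code[i] == OP_JMP) {
                *saved_op = code[i];   /* remember the real opcode    */
                code[i]   = OP_BKPT;   /* trap back to the supervisor */
                return i;
            }
        }
        return len;
    }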

Store instructions could also be breakpointed to see if they touched
code, and if so, flush JIT code appropriately. However, this still
misses code that reads itself in preparation for modification, which
could inadvertently read one of the breakpoints. Also, what if a JIT'd
instruction trapped? It may still work OK, but the address would be
wrong for debugging. I get the feeling that complexity begets complexity
without end here.

I'm not sure just where to strike the compromise between complexity and
fragility to obtain speed, and stark simplicity to get something out
ASAP that works 100%. With any luck the last copy of this will be
deleted in 3-4 years, so I'm drawn to simplicity, especially where
everyone already expects this to run like crap.

Any thoughts?

Steve Richfield

Nick Maclaren

unread,
Jun 8, 2004, 3:27:47 AM6/8/04
to

In article <10b603a6c8b02c9e...@news.teranews.com>,

Steve Richfie1d <St...@NOSPAM.smart-life.net> writes:
|>
|> >>back to IEEE-754. This is not very expensive in code where many/most of
|> >>the operations are FP operations, but slows computers WAY down where the
|> >>FP operations are sparse. It sounds like you are saying that the FP
|> >>operations would have to be pretty sparse before trapping FP operations
|> >>would outrun interpretation? Did I get it right?
|> >
|> > No. And that is not the alternative, if one can design hardware,
|> > which is what I was referring to.
|>
|> I'm just looking for a ~2-year "bridge", during which time hopefully
|> some manufacturer will make the hardware. At least with such a bridge,
|> the applications that really need these features and can stand running
|> at 5% speed can proceed.
|>
|> Any thoughts?

Fat hope. Considering the near-negligible amount of thinking that has
been put into the software engineering aspects of floating-point in
four decades, expecting improvement on a short time scale is excessively
optimistic.


Regards,
Nick Maclaren.

Mike Cowlishaw

unread,
Jun 8, 2004, 6:08:20 AM6/8/04
to
Steve Richfie1d wrote:
> I'm just looking for a ~2-year "bridge", during which time hopefully
> some manufacturer will make the hardware.

That's a little unrealistic. If a manufacturer started work on something
like this today, one might expect to see the first silicon in about 3-4
years and products a year or two later.

mfc


Steve Richfie1d

unread,
Jun 8, 2004, 9:37:36 PM6/8/04
to
Nick,

> |>
> |> >>back to IEEE-754. This is not very expensive in code where many/most of
> |> >>the operations are FP operations, but slows computers WAY down where the
> |> >>FP operations are sparse. It sounds like you are saying that the FP
> |> >>operations would have to be pretty sparse before trapping FP operations
> |> >>would outrun interpretation? Did I get it right?
> |> >
> |> > No. And that is not the alternative, if one can design hardware,
> |> > which is what I was referring to.
> |>
> |> I'm just looking for a ~2-year "bridge", during which time hopefully
> |> some manufacturer will make the hardware. At least with such a bridge,
> |> the applications that really need these features and can stand running
> |> at 5% speed can proceed.
> |>
> |> Any thoughts?
>
> Fat hope. Consider the near-negligible amount of thinking that has
> been put into the software engineering aspects of floating-point in
> four decades, expecting improvement on a short scale is excessively
> optimistic.

My hope was that they would see that, by making the format and
functionality variable, it could later be made to work any way they
selected, whether or not my particular standard became THE standard.
Hence, the risk of implementation would be a **LOT** lower than with any
fixed standard like IEEE-754.

On the other hand, I agree that logic seems to have little to do with
the evolution of semiconductors. Sadly, you are probably right.

Steve Richfield

Steve Richfie1d

unread,
Jun 8, 2004, 9:39:57 PM6/8/04
to
Mike,

Such is the difference between optimism and pessimism. Considering my
past success (or lack thereof) with optimism, I bow to your realism.

Steve Richfield

Herman Rubin

unread,
Jun 9, 2004, 11:08:19 AM6/9/04
to
In article <ca3ppj$a1r$1...@pegasus.csx.cam.ac.uk>,
Nick Maclaren <nm...@cus.cam.ac.uk> wrote:

....................

>|> I'm just looking for a ~2-year "bridge", during which time hopefully
>|> some manufacturer will make the hardware. At least with such a bridge,
>|> the applications that really need these features and can stand running
>|> at 5% speed can proceed.

>|> Any thoughts?

>Fat hope. Consider the near-negligible amount of thinking that has
>been put into the software engineering aspects of floating-point in
>four decades, expecting improvement on a short scale is excessively
>optimistic.

It seems that both the hardware and software thinking is
concerned only with items for which it is deemed the market
will be huge, except for what goes into supercomputers.
Even there, the approach is limited.

Notation for anything not in the "immediate language"
becomes hopelessly long and confusing for humans, placing
the ease of understanding by the much faster, but
unintelligent, computers first. Hardware instructions
disappear because the simplistic languages do not have ways
of properly using them. Simple software to enable users to
quickly type in and edit mathematical expressions and
papers, for example, does not exist, although some did
before; typesetting languages, etc., are fine for printed
works, but not for preliminary stages. Why can we not get
fixed-width multi-font editors? They are certainly simpler
than markup languages and typesetters.

As for "who needs it", put in the relatively simple flexibility,
and intelligent users will find a way to use it.

>Regards,
>Nick Maclaren.


--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Mike Cowlishaw

unread,
Jun 9, 2004, 4:30:54 PM6/9/04
to
> It seems that both the hardware and software thinking is
> concerned only with items for which it is deemed the market
> will be huge, except for what goes into supercomputers.
> Even there, the approach is limited.

I suspect that is a very simple economic calculation. To add
a new arithmetic unit to a current processor takes (let's be
optimistic here) 30-60 people working full time for 5 years.
That then requires enhancements to operating systems,
compilers, middleware, and applications to support it (say
500 people, 30% of their time, over a shifted 5 years).

Ignoring manufacturing (fabs are not cheap), publicity,
documentation, marketing, and all those sorts of things,
we're in the area of 200 person-years just to develop the
thing. In the USA, at a conservative $150k per person
per year (salaries plus burden, post dot-com), that
is an investment of $30 million .. before the first product
is out of the door. So you need a market which will
return as profit a lot more than that (in order to fund the
next step), just to cover the development costs. Is that
huge? Maybe not. But it's not trivial.

mfc


Steve Richfie1d

unread,
Jun 9, 2004, 7:52:01 PM6/9/04
to
Mike,

Which is EXACTLY why you want to fix ALL of the problems in one shot,
and not just add one non-essential feature like decimal arithmetic for
which there are workarounds, when there are whole segments of users who
are presently UNABLE to use IEEE-754, like financial simulation,
cryptographic, and DSP users.

The one-at-a-timers would incur this cost time and time again, rather
than just fixing all the problems at the same time. THIS is why I see
your decimal proposal, if implemented separately, as a net negative. Not
because it isn't a good thing for the users it was designed for, but
because it incurs the expenditure of large sums of money while leaving
other more serious problems unaddressed that could be addressed at
little additional cost.

The other argument is that by addressing more users with a set of fixes,
you have a better shot at paying off your R&D.

Surely you can see the obvious economic logic in fixing EVERYTHING in
one shot?

Hence, individual details should NOT be decided on their present
individual market sizes, but rather be taken as a group. Also, some of
the markets that are presently unaddressed by IEEE-754, like logarithmic
DSP arithmetic, may be even LARGER than the ones that ARE addressed by
754! Consider, are there more cell phones or PCs?!

Much of the costs you mentioned go into assuring absolute perfection.
With (much more than now) clever trap-control logic, anything that turns
out not to work right can simply be trapped and emulated in software,
thereby removing the astronomical risks of imperfect silicon, and
thereby reducing the total cost.

Further, with enough different ways of doing things, you gain the
"windows effect". Windows was revolutionary, because for the first time
in history really buggy software could still work for people. If
something didn't work right, just do it some other way. With my FP
proposal, you get much the same thing with FP. If stochastic rounding
doesn't work, use IEEE-754 style rounding as it is probably still good
enough, and vice versa if IEEE-754 rounding fails. If significance
doesn't work, then use unnormalized and let the emulator handle multiply
and divide where these work differently, etc.

In short, why have only decimal, when you can have decimal and
everything else for the same OR LESS total cost? Potentially less cost
because of the tolerability of bugs with the new structure.

Your argument regarding the size of hardware engineering projects is
absolutely valid, and following it makes an excellent case for you
proceeding in a very different direction than you have been traveling.

Steve Richfield

Herman Rubin

unread,
Jun 10, 2004, 10:21:58 AM6/10/04
to
In article <ca7s22$3t4$1...@news.btv.ibm.com>,

Mike Cowlishaw <m...@uk.ibm.com> wrote:
>> It seems that both the hardware and software thinking is
>> concerned only with items for which it is deemed the market
>> will be huge, except for what goes into supercomputers.
>> Even there, the approach is limited.

>I suspect that is a very simple economic calculation. To add
>a new arithmetic unit to a current processor takes (let's be
>optimistic here) 30-60 people working full time for 5 years.
>That then requires enhancements to operating systems,
>compilers, middleware, and applications to support it (say
>500 people, 30% of their time, over a shifted 5 years).

When a new arithmetic unit is being produced, a small
amount of thinking about increasing the flexibility
will do a great deal. The same holds for software.

As far as enhancements to operating systems, I think a
few "dehancements" might be a good idea. The current
operating systems have more glitz than real useful
features. Instead of attempting to get compilers
which will limit users to what the compiler writers
can come up with, and try to guess what is wanted in
the program, how about a "high level" assembler, which
will have types, overloaded operators, but with user
control and type override, macros which are easy to
write and use from the user's standpoint, etc.? From
what my computer friends tell me, this would not be
too difficult to produce. Optimization should be done
from this type of assembler programs, not trying to
force the user, and should often be interactive.

>Ignoring manufacturing (fabs are not cheap), publicity,
>documentation, marketing, and all those sorts of things,
>we're in the area of 200 person-years just to develop the
>thing. In the USA, at a conservative $150k per person
>per year (salaries plus burden, post dot-com), that
>is an investment of $30 million .. before the first product
>is out of the door. So you need a market which will
>return as profit a lot more than that (in order to fund the
>next step), just to cover the development costs. Is that
>huge? Maybe not. But it's not trivial.

Which is why you need lots of input from people who can
use these hardware features before just putting out what
satisfies the current poorly designed languages and
compilers. I have been told that a couple of undergraduate
CS students could produce a good multifont fixed-width
editor in a few months; the Unicode people seem to be
sitting on it, with the idea of producing a full-fledged
typesetting language. I do NOT want a typeSETTER; I
want a typeWRITER which will cut my time to produce
mathematical type in half. This is cheap software I am
asking for, not the current complicated stuff which
costs much for deciding where to break lines, how to
align expressions, etc., which I want to manage myself.

What we have now is like requiring the whole deluxe
package to get a windshield wiper.

Mike Cowlishaw

unread,
Jun 10, 2004, 1:25:09 PM6/10/04
to
> and not just add one non-essential feature like decimal arithmetic for
> which there are workarounds, when there are whole segments of users
> who are presently UNABLE to use IEEE-754, like financial simulation

Please explain why financial simulation cannot use decimal arithmetic?
[No hurry, I'm just off on vacation, will check back here when I
return in a week or so. :-)]

mfc


Steve Richfie1d

unread,
Jun 10, 2004, 11:25:13 PM6/10/04
to
Mike,

You've worked for years developing your stock market forecasting
package. You have thousands of lines of formulas to factor in everything
that you know relates to both individual stocks and to whole segments of
the stock market. Now, you turn it on...

Let's track Mike's Decimal Technologies (MDT) stock. Your program knows
the stock's price today, and there is no reason to expect anything
drastic to happen in the next day or so, but your program expects a
~0.1% daily appreciation. There is every reason to expect your program's
projection for tomorrow to be good to at least 3 decimal places, but of
course there is no way for anyone to know that because the computer is
displaying 8 digit answers.

OK, so let's let the program run for a week. Of course, this will be 7
iterations through the same loop to see what will be happening to MDT
stock in a week. Unfortunately, not EVERYTHING gets factored into the
formulas, and there are always unexpected events, not to mention a LOT
of process noise. Even if both MDT's situation and the market as a whole
are pretty stable, the program's output will only be good to, maybe, 2
decimal places, but again, without the computer tracking significance,
you get 8 digits?

Now, let's let the program run for 3 months. There are SO many things
that can and will happen over 3 months that the projections probably
aren't good for more than one decimal digit. Of course, the computer
still prints out 8 digits, so how would anyone know that only the first
one is worth anything? Of course, one digit is NOT enough to make most
rational investment decisions, but nonetheless users now throw
millions/billions of dollars at companies based on the garbage 8-digit
numbers that programs now spew out, because present limitations in
computer arithmetic fail to tell them how limited the projections truly are.

If you simulate just a little longer, maybe 5-6 months, the uncertainty
completely swamps the answer, so that any numerical output, even the
first digit, is meaningless/wrong. Without any sort of traditional
computational "fault", the correct answer should be a NaN!

To avoid people challenging me for leaving out an essential detail here,
"volatility" is HIGHLY variable. Some days things are very stable,
whereas other days things are wild. There has been LOTS of study of
market volatility and ways of predicting it. The bottom line here is
that volatility means loss of significance. With enough volatility, you
can't reliably predict much of anything. Now, most people simply give up
predicting past a point of high volatility, but it is often possible to
retain SOME significance through these events. Unfortunately, with
present non-significance arithmetic, there is simply NO WAY of knowing
whether there is any accuracy at all in your surviving projections.

The methodology for tracking significance loss is COMPLEX, and generally
quite beyond having users build it into their programs. Interval
arithmetic presumes intervals, whereas most projections are concerned
with standard deviations in probable answers rather than the sharp
boundaries that interval arithmetic brings.

This problem was studied and explained in a 1959 article by Ashenhurst
and Metropolis in the JACM, and implemented at Remote Time-Sharing in
1970. System (but not necessarily hardware) support is REQUIRED for
projection applications to determine at what point their projections
become worthless, and this is not presently available ANYWHERE.

I recently watched a TV special that explored the potential effects of
global warming. The special examined 5 different simulations by 5
different teams, each of which was familiar with the work of the others.
These came up with 5 drastically DIFFERENT projected results! It was
obvious to me, but apparently not to the researchers, that the uncertainty
simply exceeded the answer in at least 4 of the cases. Without an
arithmetic system that tracks significance loss, there is currently NO
WAY to identify which if any of the five simulations is correct. With
this prevailing situation, would YOU believe any of these simulations?
If you were a government official, would YOU act on any of them? If
instead some corporation who pays part of your campaign expenses said
that a particular action was best, wouldn't you be inclined to take
their word over any of these mutually contradictory simulations? THAT is
why the lack of significance arithmetic has destroyed the value of
computer projections to guide governmental action with tragic results.

What would it be worth to our and other governments to know what to do
to avoid stock market crashes? ... to know what to do about global
warming? To know what to do about the pending exhaustion of oil
supplies? The value of significance arithmetic to provide believable and
defendable answers to these questions may actually be in the TRILLIONS
of dollars! Obviously, the value here goes far beyond the sale prices of
the processors used. Without significance, present programs are pretty
much worthless to guide governmental action.

As you can see, decimal arithmetic really wouldn't help these sorts of
computations much if at all. They need significance arithmetic to succeed.

As an aside, I worked on the corporate model for Airborne Freight a few
years back. The projections were worthless, but no one cared! They would
run the simulation while changing assumptions as to what they would do
in the future and see which actions worked most to the company's favor.
While the actual projections were worthless, good strategies produced
better worthless numbers than bad strategies did. This sort of analysis
is pretty common in the corporate world. Of course, had they had
significance, they could have seen the range of probabilities and
possibly made some projections that weren't completely worthless!

Steve Richfield

Steve Richfie1d

unread,
Jun 10, 2004, 11:38:45 PM6/10/04
to
Herman,

> As far as enhancements to operating systems, I think a
> few "dehancements" might be a good idea. The current
> operating systems have more glitz than real useful
> features. Instead of attempting to get compilers
> which will limit users to what the compiler writers
> can come up with, and try to guess what is wanted in
> the program, how about a "high level" assembler, which
> will have types, overloaded operators, but with user
> control and type override, macros which are easy to
> write and use from the user's standpoint, etc.? From
> what my computer friends tell me, this would not be
> too difficult to produce. Optimization should be done
> from this type of assembler programs, not trying to
> force the user, and should often be interactive.

There are some VERY good assemblers out there. So far, the best I've
seen is IBM's mainframe assembler. I had a macro I often used that did
dynamic register assignments, including the block assignments that some
instructions needed!

I've worked on several projects where most of the programming was in
high-level macros. The most interesting was the FORTRAN supercomputer
compiler at CDC. They used macros that, for example, you could specify a
list of fields you needed extracted, and the macros would conjure up the
OPTIMUM sequence to extract those fields.

I have an unbuilt design for a bidirectional macro assembler, that can
go from source to object, and back. This transforms one-way
assembler-programming into an interactive experience, without all of the
complexity of current "visual" approaches. This should be capable of
"assembling" (vs. "compiling") most present high-level languages. With
this, you could stop a program, dump it out, carry it to an entirely
different type of processor, load it up, and resume right where you left
off. Now, to find someone who wants this enough to pay for me to build it!

>>Ignoring manufacturing (fabs are not cheap), publicity,
>>documentation, marketing, and all those sorts of things,
>>we're in the area of 200 person-years just to develop the
>>thing. In the USA, at a conservative $150k per person
>>per year (salaries plus burden, post dot-com), that
>>is an investment of $30 million .. before the first product
>>is out of the door. So you need a market which will
>>return as profit a lot more than that (in order to fund the
>>next step), just to cover the development costs. Is that
>>huge? Maybe not. But it's not trivial.
>
> Which is why you need lots of input from people who can
> use these hardware features before just putting out what
> satisfies the current poorly designed languages and
> compilers.

Which of course is why I am here!

Steve Richfield

Dik T. Winter

unread,
Jun 11, 2004, 7:57:41 AM6/11/04
to
In article <e9820d6f755a1cdb...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
...

> If you were a government official, would YOU act on any of them? If
> instead some corporation who pays part of your campaign expenses said
> that a particular action was best, wouldn't you be inclined to take
> their word over any of these mutually contradictory simulations?

If a corporation pays part of my campaign expenses, it had better not tell
me that a particular action is best, because if I followed it, it would
make me liable to prosecution.

> What would it be worth to our and other governments to know what to do
> to avoid stock market crashes? ... to know what to do about global
> warming? To know what to do about the pending exhaustion of oil
> supplies? The value of significance arithmetic to provide believable and
> defendable answers to these questions may actually be in the TRILLIONS
> of dollars!

From what you wrote up to now, the only thing it provides is making some
answers unbelievable and undefendable.

> Without significance, present programs are pretty
> much worthless to guide governmental action.

And what if the significance arithmetic just says that the answer is
reliable? Has that more value to guide governmental action?

Steve Richfie1d

unread,
Jun 11, 2004, 9:57:23 AM6/11/04
to
Dik,

> > If you were a government official, would YOU act on any of them? If
> > instead some corporation who pays part of your campaign expenses said
> > that a particular action was best, wouldn't you be inclined to take
> > their word over any of these mutually contradictory simulations?
>
> If a corporation pays part of my campaign expenses it better not tell
> me that a particular action is best, because if I follow it it would
> make me liable to prosecution.

Not in the US! Here, we operate on a vast system of legalized bribery.
THAT is why we invaded Iraq, why in the face of global warming our
government still gives tax breaks to the purchasers of Humvees and other
such gas guzzlers, why job outsourcing is actually ENCOURAGED by our
screwed up tax system, and why things will probably just keep getting
worse. When the next American Revolution comes, and if things just keep
getting worse like they are it WILL come, it will be over our system of
legalized bribery. I don't think it is impossible to have something like
the French Revolution here to kill them all!

> > What would it be worth to our and other governments to know what to do
> > to avoid stock market crashes? ... to know what to do about global
> > warming? To know what to do about the pending exhaustion of oil
> > supplies? The value of significance arithmetic to provide believable and
> > defendable answers to these questions may actually be in the TRILLIONS
> > of dollars!
>
> From what you wrote upto now, the only thing it provides is making some
> answers unbelievalbe and undefendable.

As significance deteriorates, you get a succession of answers, like:

1.23E3 1.2E3 1.E3 E3 Indefinite

If your program isn't too screwed up, you can believe what (little) you
get out of it. Somewhere on the way to Indefinite, people lose interest
in the output. This is a **LOT** better than producing, say, 2597.8657
instead of, say, E3 if the exponent is the only thing that is still
actually calculable.
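
A small sketch of a printer that produces exactly that succession (the
formatting rules are invented for the example):

    /* Invented formatting rules, for illustration: print a value to
     * only the decimal digits its uncertainty supports, degrading
     * through 1.23E3 -> 1.2E3 -> 1E3 -> E3 -> Indefinite. */
    #include <math.h>
    #include <stdio.h>

    void print_significant(double value, double sigma)
    {
        if (!(sigma < fabs(value))) {      /* nothing survives */
            puts("Indefinite");
            return;
        }
        int digits = (int) floor(log10(fabs(value) / sigma));
        int exp10  = (int) floor(log10(fabs(value)));

        if (digits <= 0)
            printf("E%d\n", exp10);        /* only the magnitude left */
        else
            printf("%.*fE%d\n", digits - 1, value / pow(10.0, exp10),
                   exp10);
    }

    int main(void)
    {
        print_significant(1234.0,    1.0); /* 1.23E3     */
        print_significant(1234.0,   10.0); /* 1.2E3      */
        print_significant(1234.0,  100.0); /* 1E3        */
        print_significant(1234.0,  900.0); /* E3         */
        print_significant(1234.0, 2000.0); /* Indefinite */
        return 0;
    }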

> > Without significance, present programs are pretty
> > much worthless to guide governmental action.
>
> And what if the significance arithmetic just says that the answer is
> reliable? Has that more value to guide governmental action?

Yes. It certainly would be a LOT better if not for 20 years of
IEEE-754 destroying the perceived value of simulations. With
significance, different simulations should produce pretty much the SAME
results, or alternatively, provide the means to debug why they don't.
Retraining people to understand significance will definitely take a LOT
of time and effort.

The whole idea of IEEE-754 was to produce consistency in computed
results, but here it completely drops the ball. Two people can now write
simulators to simulate the same process and produce vastly different
results due to what should be insignificant differences in their programs.

Now, IEEE-754 has become SO pervasive there isn't even a good way to
demonstrate significance.

Steve Richfield

Herman Rubin

unread,
Jun 11, 2004, 10:41:26 AM6/11/04
to
In article <9907d786c6f5fc67...@news.teranews.com>,

Steve Richfie1d <St...@NOSPAM.smart-life.net> wrote:
>Herman,

>> As far as enhancements to operating systems, I think a
>> few "dehancements" might be a good idea. The current
>> operating systems have more glitz than real useful
>> features. Instead of attempting to get compilers
>> which will limit users to what the compiler writers
>> can come up with, and try to guess what is wanted in
>> the program, how about a "high level" assembler, which
>> will have types, overloaded operators, but with user
>> control and type override, macros which are easy to
>> write and use from the user's standpoint, etc.? From
>> what my computer friends tell me, this would not be
>> too difficult to produce. Optimization should be done
>> from this type of assembler programs, not trying to
>> force the user, and should often be interactive.

>There are some VERY good assemblers out there. So far, the best I've
>seen is IBM's mainframe assembler. I had a macro I often used that did
>dynamic register assignments, including the block assignments that some
>instructions needed!

The problem is not so much with the assemblers, but with
the assembler SYNTAX. In this respect, Seymour Cray's
assemblers seem to be about the best.

The early assemblers had to be adjusted to be easy for
the computer, as there was essentially no character
handling capability; the word was the usual unit.
The situation now is quite different, so assemblers
can use some features now limited to compilers. Also,
the assembler does not need to have the arguments in
the order they will appear in the machine instruction.

For example, take the hardware instruction x = y - z.
The current assembler for this is

sub(mod) a,b,c

where a,b,c is some permutation of x,y,z. What I would
like to see is, instead,

x{sx} ={t} y{sy} - z{sz}

where the optional t is used if the type of subtraction
is not the one corresponding to the type of x, and the
s fields are used if the arguments are to be read in a
different mode than the usual addressing mode. This
can at least be the case if the arguments are registers,
and there are short and long register parts.

With this approach, I was able to write almost all of
the CDC 205 vector instructions in one format, using
short symbolic methods for the types, operations, and
the 8 bit fields in the instruction modifier, in an
easily understood format; this included bit vector fields,
and the instructions could have six arguments.

BTW, the instruction description section for most hardware
is at least 10 times as long as necessary. And why are
register numbers, and instruction fields, expressed in
decimal? A field of "513" is "201" in hex, and this is
much more intelligible, and if there are register banks,
a decimal number is a mess. The same holds with respect
to addresses.

I know that you are calling for decimal arithmetic, and
why. But for the purposes of assembler programming,
binary is natural. Incidentally, why do the HLL's
insist that data in the program be input as decimal?
There are cases where precise binary is needed.

Macros also should be written in the above format; typing
the macro name should rarely be necessary. Assume that
an extra character typed is another chance for error.

>I've worked on several projects where most of the programming was in
>high-level macros. The most interesting was the FORTRAN supercomputer
>compiler at CDC. They used macros that, for example, you could specify a
>list of fields you needed extracted, and the macros would conjure up the
>OPTIMUM sequence to extract those fields.

This is as it should be. Many compilers claim that the
insertion of assembler instructions turns off optimization;
it should be the other way around. BTW, did you ever have
problems with optimizers doing what they should not? Try
having an optimizing compiler include an empty loop, or
try to have it recognize that separate calls to a random
number generator do not get the same result.

>I have an unbuilt design for a bidirectional macro assembler, that can
>go from source to object, and back. This transforms one-way
>assembler-programming into an interactive experience, without all of the
>complexity of current "visual" approaches. This should be capable of
>"assembling" (vs. "compiling") most present high-level languages. With
>this, you could stop a program, dump it out, carry it to an entirely
>different type of processor, load it up, and resume right where you left
>off. Now, to find someone who wants this enough to pay for me to build it!

It would be a nice exercise for a CS student. That is why
nobody is going to do it.

>>>Ignoring manufacturing (fabs are not cheap), publicity,
>>>documentation, marketing, and all those sorts of things,
>>>we're in the area of 200 person-years just to develop the
>>>thing. In the USA, at a conservative $150k per person
>>>per year (salaries plus burden, post dot-com), that
>>>is an investment of $30 million .. before the first product
>>>is out of the door. So you need a market which will
>>>return as profit a lot more than that (in order to fund the
>>>next step), just to cover the development costs. Is that
>>>huge? Maybe not. But it's not trivial.

>> Which is why you need lots of input from people who can
>> use these hardware features before just putting out what
>> satisfies the current poorly designed languages and
>> compilers.

>Which of course is why I am here!

>Steve Richfield


Tim Peters

unread,
Jun 11, 2004, 6:49:14 PM6/11/04
to
[Steve Richfield]
...

> What would it be worth to our and other governments to know what to do
> to avoid stock market crashes? ... to know what to do about global
> warming? To know what to do about the pending exhaustion of oil
> supplies? The value of significance arithmetic to provide believable
> and defendable answers to these questions may actually be in the
> TRILLIONS of dollars!

Do people working in these fields have the same complaints about current fp
arithmetic? It's possible, but I haven't seen it. What I do see is along
the lines of this, a Congressional Budget Office study of Social Security
long-term finances, using (what appear to me to be appropriate) Monte Carlo
techniques:

http://www.cbo.gov/showdoc.cfm?index=3235&sequence=0

That's a long paper, and while they have long discussions about
uncertainties in the inputs to the models, and the models themselves, they
don't complain that they're afraid to trust the arithmetic.

If you read long enough to get to the results, they're "what you expect":
the distribution of predicted outcomes spreads out over time. The remaining
sources of uncertainty are identified as the model and its inputs.


Dik T. Winter

unread,
Jun 11, 2004, 7:16:45 PM6/11/04
to
In article <cacgam$3p...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> The early assemblers had to be adjusted to be easy for
> the computer, as there was essentially no character
> handling capability; the word was the usual unit.

But that is very early, say before 1968. The earliest assembler I encountered
was quite readable for the user.

> This is as it should be. Many compilers claim that the
> insertion of assembler instructions turns off optimization;
> it should be the other way around.

But that is because the assembler instructions can screw up optimisation.

> BTW, did you ever have
> problems with optimizers doing what they should not?

Yup. On almost all systems I ever did work on.

> Try
> having an optimizing compiler include an empty loop, or
> try to have it recognize that separate calls to a random
> number generator do not get the same result.

Depends on the language. In C, rand() + rand() specifies two different
calls to the random number generator. In Fortran, RAND(0.) + RAND(0.)
specifies that the compiler may assume there is only a single call to
the random number generator (this is just part of the standard).
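
A trivial C illustration of the C side of this:

    /* rand() has a side effect, so the two calls below are distinct
     * and may not be merged by the compiler, though their relative
     * order of evaluation is unspecified. */
    #include <stdlib.h>

    int sum_of_two_draws(void)
    {
        return rand() + rand();  /* two calls, two state advances */
    }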

Dik T. Winter

unread,
Jun 11, 2004, 8:29:55 PM6/11/04
to
In article <9bKdnbYX1sa...@comcast.com> "Tim Peters" <tim...@comcast.net> writes:
...
> http://www.cbo.gov/showdoc.cfm?index=3235&sequence=0
...

> If you read long enough to get to the results, they're "what you expect":
> the distribution of predicted outcomes spreads out over time. The remaining
> sources of uncertainty are identifed as the model and its inputs.

That is quite common in economic predictions. But they predict quite well,
given the uncertainty. Most effort is in improving the model (which is not
clear-cut at all).

BTW, the same is true in weather prediction. Increasing the number of
inputs, improving the model, and whatever. No complaint about f-p at all.

On the other hand, the stock-exchange market is for a large part not
predictable. There are too many external factors that can play a
very large role. Consider the shares of Ahold (a Dutch company).
When the income of the new executive officer was made public this
resulted in a boycott of the shops and a huge drop in share values.
They are still attempting to recover from it. Even significance
arithmetic would not have predicted that.

Steve Richfie1d

unread,
Jun 11, 2004, 11:37:53 PM6/11/04
to
Dik,

The *BIG* problem with Social Security investments around the world is
that they don't understand self-hedging investments. To illustrate, if
they make an investment in, say, steel production and some other company
finds a cheaper way to do it, then the entire investment can be lost.
However, if they invest in retirement homes and geriatric care
facilities and someone finds a way to offer the services cheaper, then
they STILL have their paid-off facilities along with a captive
"customer" base. People, corporations, and governments should invest in
what they do, and not just "in the market".

> ...
> > If you read long enough to get to the results, they're "what you expect":
> > the distribution of predicted outcomes spreads out over time. The remaining
> > sources of uncertainty are identifed as the model and its inputs.
>
> That is quite common in economic predictions. But they predict quite well,
> given the uncertainty. Most effort is in improving the model (which is not
> clear-cut at all).
>
> BTW, the same is true in weather prediction. Increasing the number of
> inputs, improving the model, and whatever. No complaint about f-p at all.

Weather seldom has the sorts of "volatility" that investments do, though
there ARE cases of competing weather systems, etc.

> On the other hand, the stock-exchange market is for a large part not
> predictable. There are too many external factors that can play a
> very large role. Consider the shares of Ahold (a Dutch company).
> When the income of the new executive officer was made public this
> resulted in a boycott of the shops and a huge drop in share values.
> They are still attempting to recover from it. Even significance
> arithmetic would not have predicted that.

No, significance arithmetic can't solve every problem, but what
significance arithmetic WOULD do in this case is spot when the actual
performance of Ahold went outside of the expected window established by
the significance computations. It's been my experience that people
aren't very good at judging when things go "out of control" in the
control-theory sense, and so significance arithmetic might recognize the
magnitude of such a problem a day or a few days earlier than a person would, so
you could beat the others out of the market.

Steve Richfield

Steve Richfie1d

unread,
Jun 11, 2004, 11:55:43 PM6/11/04
to
Tim,

Regarding...

> Do people working in these fields have the same complaints about current fp
> arithmetic? It's possible, but I haven't seen it.

and...

> they don't complain that they're afraid to trust the arithmetic.

You're right. These people are NOT computational theorists. They'll
gladly tell you all of the details about IEEE-754, but won't know about
ANY of its shortcomings, unless they read someone else's report about
them. For example, they probably have no present clue that all measures
have value, significance, and dimensionality.

The several fields that currently rely on predictive simulation suffer from
common problems, the fundamentals of which appear to remain invisible to
those who wrestle with them on a daily basis. That these people can't
see the sources of their difficulties because they BELIEVE (in a
religious sense) in their present arithmetic does NOT mean that these
aren't the real sources of their difficulties.

In the case of long-term investments like Social Security, there are
enough problems over the long haul of boom and bust cycles that their
total effect can be estimated, even though the individual events that
influence their ups and downs are beyond estimation.

Significance arithmetic will not change ANY results - it will just tell
you which ones are reaching beyond the capability of the methodology
being used. This electronic finger-pointing should hopefully keep a lot
of erroneous simulation results out of the public and governmental view.
No, it's not perfect, but it is a **LOT** better than the present
situation where all erroneous results are available to 8 places for the
asking.
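
To make the idea concrete, here is a toy C sketch of one possible
significance rule (my own illustration, not any published or standard
scheme): each quantity carries an estimate of its significant decimal
digits, and subtraction charges for cancellation.

    #include <math.h>
    #include <stdio.h>

    /* A value together with an estimate of its significant digits. */
    struct sigval { double v; double sig; };

    static struct sigval sig_sub(struct sigval a, struct sigval b)
    {
        struct sigval r;
        double base = (a.sig < b.sig) ? a.sig : b.sig;
        double loss;
        r.v = a.v - b.v;
        /* Digits lost to cancellation: roughly log10(|a| / |a-b|). */
        loss = (r.v != 0.0) ? log10(fabs(a.v) / fabs(r.v)) : HUGE_VAL;
        if (loss < 0.0) loss = 0.0;
        r.sig = base - loss;
        return r;
    }

    int main(void)
    {
        struct sigval a = { 1.2345678, 8.0 };
        struct sigval b = { 1.2345600, 8.0 };
        struct sigval d = sig_sub(a, b);
        printf("diff = %g, about %.1f significant digits left\n",
               d.v, d.sig);
        return 0;
    }

A result whose digit estimate falls to zero or below is exactly the
electronic finger-pointing described above: the value is still
computed, but flagged as reaching beyond the methodology.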

Steve Richfield

Nick Maclaren

unread,
Jun 12, 2004, 7:24:38 AM6/12/04
to
In article <Hz63B...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
>In article <cacgam$3p...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> > The early assemblers had to be adjusted to be easy for
> > the computer, as there was essentially no character
> > handling capability; the word was the usual unit.
>
>But that is very early, say before 1968. The earliest assembler I encountered
>was quite readable for the user.

Agreed. That was a difference between assembler and machine code;
the former was usually as readable as autocodes (if written by a
programmer with Clue).

> > This is as it should be. Many compilers claim that the
> > insertion of assembler instructions turns off optimization;
> > it should be the other way around.
>
>But that is because the assembler instructions can screw up optimisation.

Some have done better, by specifying precisely what is allowed,
and even making it part of the language.

> > BTW, did you ever have
> > problems with optimizers doing what they should not?
>
>Yup. On almost all systems I ever did work on.

And I. On every one of the hundred or so optimising compilers for
the half dozen or so languages I have used heavily.

> > Try
> > having an optimizing compiler include an empty loop, or
> > try to have it recognize that separate calls to a random
> > number generator do not get the same result.
>
>Depends on the language. In C, srand() + srand() specifies two different
>calls to the random number generator. In Fortran, RAND(0.) + RAND(0.)
>specifies that the compiler may assume there is only a single call to
>the random number generator (this is just part of the standard).

It's not that simple :-( Both languages make a complete and utter
pig's ear of this, and both are undefined behaviour - assuming that
you mean rand, not srand, of course. C99 closes the minor loophole
in this regard, but leaves the main one open.

To people who think that they know C: find wording in either C90
or C99 that EXPLICITLY requires fred()+joe() to execute fred to
completion before starting joe or vice versa.
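
A hypothetical illustration of what is at stake (the names fred, joe,
and the shared counter are mine): if both functions touch the same
global, the question is whether the standard's wording ever forbids an
implementation from interleaving their bodies.

    static int state = 0;        /* shared by both calls */

    static int fred(void) { state = state * 2 + 1; return state; }
    static int joe(void)  { state = state * 3 + 2; return state; }

    int demo(void)
    {
        /* The ORDER of the two calls is unspecified; the challenge
           above asks where the standard says either call must run
           to completion before the other starts. */
        return fred() + joe();
    }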


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Jun 12, 2004, 7:32:41 AM6/12/04
to
In article <2204dc4b91715ea5...@news.teranews.com>,

Steve Richfie1d <St...@NOSPAM.smart-life.net> wrote:
>
>Significance arithmetic will not change ANY results - it will just tell
>you which ones are reaching beyond the capability of the methodology
>being used. This electronic finger-pointing should hopefully keep a lot
>of erroneous simulation results out of the public and governmental view.
>No, it's not perfect, but it is a **LOT** better than the present
>situation where all erroneous results are available to 8 places for the
>asking.

There are two reasons why that is very dubious:

Before worrying about significance flagging, of ANY type, it is
obviously necessary to get error detection right. IEEE 754 makes a
mess of it, the new draft is (if anything) worse, Fortran and C++
leave it almost completely undefined, and both Java and C90 require
many important numeric errors to be ignored (and the significance
restored to infinite). And C99 beggars description, it is so bad.

There are a lot of important methods that do VASTLY better than
significance flagging would indicate, and its use would deprecate
many good, reliable methods in favour of bad, unreliable ones that
have a lower apparent loss of significance.


Regards,
Nick Maclaren.

Dik T. Winter

unread,
Jun 12, 2004, 11:51:25 AM6/12/04
to
In article <55ed756d621b7d01...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> Dik,
>
> > In article <9bKdnbYX1sa...@comcast.com> "Tim Peters" <tim...@comcast.net> writes:
> > ...
> > > http://www.cbo.gov/showdoc.cfm?index=3235&sequence=0
>
> The *BIG* problem with Social Security investments around the world is
> that they don't understand self-hedging investments. To illustrate, if
> they make an investment in, say, steel production and some other company
> finds a cheaper way to do it, then the entire investment can be lost.
> However, if they invest in retirement homes and geriatric care
> facilities and someone finds a way to offer the services cheaper, then
> they STILL have their paid-off facilities along with a captive
> "customer" base. People, corporations, and governments should invest in
> what they do, and not just "in the market".

On the other hand, the social security funds have too much money to invest
it all in retirement homes and geriatric care facilities. Consider the largest
old-age pension institution in the Netherlands. It has to invest billions
of euros. And retirement homes and geriatric care facilities are paid
mostly from tax money; that is the case in the Netherlands. (BTW, most
of its money is invested in the US. If I understand it correctly, it is one
of the largest non-US investors in the US.)

> > BTW, the same is true in weather prediction. Increasing the number of
> > inputs, improving the model, and whatever. No complaint about f-p at all.
>
> Weather seldom has the sorts of "volatility" that investments do, though
> there ARE cases of competing weather systems, etc.

There is still quite a bit of volatility in weather. For instance, nobody
is able to predict (not even an hour in advance) whether a tornado will
occur in the Netherlands. Nevertheless, they do occur. Also, weather
forecasts that go beyond two or three days get increasingly
unreliable, as the weather bureaus themselves will say, and they may
even give margins. But as far as I know they do not use significance
arithmetic, but error analysis.

> > On the other hand, the stock-exchange market is for a large part not
> > predictable. There are too many external factors that can play a
> > very large role. Consider the shares of Ahold (a Dutch company).
> > When the income of the new executive officer was made public this
> > resulted in a boycott of the shops and a huge drop in share values.
> > They are still attempting to recover from it. Even significance
> > arithmetic would not have predicted that.
>
> No, significance arithmetic can't solve every problem, but what
> significance arithmetic WOULD do in this case is spot when the actual
> performance of Ahold went outside of the expected window established by
> the significance computations.

Would not have helped. Once the salary of the executive officer was
published, the next day the boycott started and share values started
to drop. In general the big swings on the stock market (and they are
the most important) are not predictable.

Dik T. Winter

unread,
Jun 12, 2004, 11:56:51 AM6/12/04
to
In article <caep5m$nad$1...@pegasus.csx.cam.ac.uk> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> In article <Hz63B...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
...

> >Depends on the language. In C, srand() + srand() specifies two different
> >calls to the random number generator. In Fortran, RAND(0.) + RAND(0.)
> >specifies that the compiler may assume there is only a single call to
> >the random number generator (this is just part of the standard).
>
> It's not that simple :-( Both languages make a complete and utter
> pig's ear of this, and both are undefined behaviour - assuming that
> you mean rand, not srand, of course.

Eh? Fortran does not make a mess of this. It is clearly stated that
when you have two identical function calls, the functions ought
to return the same value on both calls, so the compiler need only
make a single call and use the return value twice.

That C90 is not watertight is unfortunate, but in C90 it is clear that
there are two calls.

Fred J. Tydeman

unread,
Jun 12, 2004, 12:43:44 PM6/12/04
to
Nick Maclaren wrote:
>
> To people who think that they know C: find wording in either C90
> or C99 that EXPLICITLY requires fred()+joe() to execute fred to
> completion before starting joe or vice versa.

C99:

6.5.2.2 Function calls

10 The order of evaluation of the function designator, the actual
arguments, and subexpressions within the actual arguments is
unspecified, but there is a sequence point before the actual call.

12 EXAMPLE In the function call
(*pf[f1()]) (f2(), f3() + f4())
the functions f1, f2, f3, and f4 may be called in any order. All side
effects have to be completed before the function pointed to by
pf[f1()] is called.
---
Fred J. Tydeman Tydeman Consulting
tyd...@tybor.com Programming, testing, numerics
+1 (775) 287-5904 Vice-chair of J11 (ANSI "C")
Sample C99+FPCE tests: ftp://jump.net/pub/tybor/
Savers sleep well, investors eat well, spenders work forever.

Nick Maclaren

unread,
Jun 12, 2004, 4:55:02 PM6/12/04
to
In article <Hz7DM...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
>In article <caep5m$nad$1...@pegasus.csx.cam.ac.uk> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> > In article <Hz63B...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
>...
> > >Depends on the language. In C, srand() + srand() specifies two different
> > >calls to the random number generator. In Fortran, RAND(0.) + RAND(0.)
> > >specifies that the compiler may assume there is only a single call to
> > >the random number generator (this is just part of the standard).
> >
> > It's not that simple :-( Both languages make a complete and utter
> > pig's ear of this, and both are undefined behaviour - assuming that
> > you mean rand, not srand, of course.
>
>Eh? Fortran does not make a mess of this. It is clearly stated that
>when you have two identical function calls, the functions ought
>to return the same value on both calls, so the compiler need only
>make a single call and use the return value twice.

I am afraid that you are wrong. Ever since Fortran 77, the standard
has specified that any global entity that becomes defined by a call
to a function becomes undefined after an undefined subset of such calls.
Therefore all calls to random number generators may leave the random
state undefined.

I wish it were not so :-(

>That C90 is not watertight is unfortunate, but in C90 it is clear that
>there are two calls.

Yes, it is. But it is NOT clear that they don't overlap.


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Jun 12, 2004, 5:01:08 PM6/12/04
to
In article <40CB32C0...@tybor.com>,

Fred J. Tydeman <tyd...@tybor.com> wrote:
>Nick Maclaren wrote:
>>
>> To people who think that they know C: find wording in either C90
>> or C99 that EXPLICITLY requires fred()+joe() to execute fred to
>> completion before starting joe or vice versa.
>
>C99:
>
>6.5.2.2 Function calls
>
>10 The order of evaluation of the function designator, the actual
>arguments, and subexpressions within the actual arguments is
>unspecified, but there is a sequence point before the actual call.
>
>12 EXAMPLE In the function call
>(*pf[f1()]) (f2(), f3() + f4())
>the functions f1, f2, f3, and f4 may be called in any order. All side
>effects have to be completed before the function pointed to by
>pf[f1()] is called.

Sigh. Fred, you know better than that!

While there are sequence points before each call (and, in C99 but
not C90, after each call), there is no specified ordering of the
calls. There is therefore no reason that they should not overlap.

Note that the sequence point rules implicitly introduce the possibility
of overlapping, and that there is no obvious reason to prefer a Von
Neumann model over a functional model if one is relying on the
common conventions of the field.


Regards,
Nick Maclaren.

Herman Rubin

unread,
Jun 12, 2004, 5:11:40 PM6/12/04
to
In article <Hz63B...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
>In article <cacgam$3p...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> > The early assemblers had to be adjusted to be easy for
> > the computer, as there was essentially no character
> > handling capability; the word was the usual unit.

>But that is very early, say before 1968. The earliest assembler I encountered
>was quite readable for the user.

Even the present assemblers are not that hard to read in
most cases; they are if there are a lot of tag fields.
But assembler instructions are hard to write, and the
manuals for both assembler instructions and hardware are
usually written with the assumption that each instruction
must be described separately, and that the reader does
not know enough mathematics to understand what the
instructions are doing. Probably about 10 pages could
handle the non-privileged instructions.

Intelligent use of symbols and short methods of writing
are important in the development of mathematics and
good use of it. Current assembler syntax is like having
to write plus(x,y) instead of x+y everywhere, etc. It
is even worse: one has to write pluslong or plusshort
or plusfloat or plusdouble. A well-designed assembler
language is almost as easy to write as any HLL, especially
if there is a good user macro setup.

> > This is as it should be. Many compilers claim that the
> > insertion of assembler instructions turns off optimization;
> > it should be the other way around.

>But that is because the assembler instructions can screw up optimisation.

Usually bad optimization. It is not necessary to restrict
optimization to the assumptions of the compiler writer,
which is what we have now. Most compiler writers do not
know enough about what the mathematician wants to do, and
it is not necessary for the compiler to guess.

> > BTW, did you ever have
> > problems with optimizers doing what they should not?

>Yup. On almost all systems I ever did work on.

Quite some time ago, I wrote a program intended for much use
in assembler. After Knuth's paper came out about Fortran
efficiency, I tried the main loop in Fortran, using Knuth's
definitions, and seeing what the local compilers would do.
The "ideal" compiler would achieve 45% efficiency; to do
better required keeping track of registers across blocks,
not allowed by Knuth. The non-optimizing compilers achieved
33%, and the optimizing ones 22%. Half of the weakness of
the non-optimizers was in one short part not using a tricky
concoction based on knowing exactly what some machine
instructions could do.

> > Try
> > having an optimizing compiler include an empty loop, or
> > try to have it recognize that separate calls to a random
> > number generator do not get the same result.

>Depends on the language. In C, srand() + srand() specifies two different
>calls to the random number generator. In Fortran, RAND(0.) + RAND(0.)
>specifies that the compiler may assume there is only a single call to
>the random number generator (this is just part of the standard).

Which is why the programmer needs to be able to decide.

Fred J. Tydeman

unread,
Jun 12, 2004, 6:41:44 PM6/12/04
to
Nick Maclaren wrote:
>
> While there are sequence points before each call (and, in C99 but
> not C90, after each call), there is no specified ordering of the
> calls. There is therefore no reason that they should not overlap.

6.8#2 Except as indicated, statements are executed in sequence.

If some of the statements in a first function are executed, and then
some statements in a second function are executed (eg, overlap), then
all of the statements in the first function were not executed in
sequence.

> Note that the sequence point rules implicitly introduce the possibility
> of overlapping, and that there is no obvious reason to prefer a Von
> Neumann model over a functional model if one is relying on the
> common conventions of the field.

Overlapping within a statement (between sequence points) is allowed.
Overlapping between statements (across a sequence point) is not
allowed.

Dik T. Winter

unread,
Jun 12, 2004, 8:08:24 PM6/12/04
to
In article <cafqj6$jlv$1...@pegasus.csx.cam.ac.uk> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> In article <Hz7DM...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
> >In article <caep5m$nad$1...@pegasus.csx.cam.ac.uk> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
...

> > > It's not that simple :-( Both languages make a complete and utter
> > > pig's ear of this, and both are undefined behaviour - assuming that
> > > you mean rand, not srand, of course.
> >
> > Eh? Fortran does not make a mess of this. It is clearly stated that
> > when you have two identical function calls, the functions ought
> > to return the same value on both calls, so the compiler need only
> > make a single call and use the return value twice.
>
> I am afraid that you are wrong. Ever since Fortran 77, the standard
> has specified that any global entity that becomes defined by a call
> to a function becomes undefined after an undefined subset of such calls.
> Therefore all calls to random number generators may leave the random
> state undefined.

But I did not say anything like that. Ever since Fortran 66 (I think)
it has been clear that functions are pure, i.e. cannot have side effects.
That is why a compiler may assume that two identical calls return the
same value. That is why random number generators should be subroutines,
not functions. (There are quite a few places where it is stated that
variables become undefined...)

I believe that in the development of Pascal, too, there was discussion
about whether functions should be pure or not. And I am quite sure there was
a similar discussion with Algol 60.

Dik T. Winter

unread,
Jun 12, 2004, 8:35:22 PM6/12/04
to
In article <cafric$35...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> In article <Hz63B...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
> >In article <cacgam$3p...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> > > The early assemblers had to be adjusted to be easy for
> > > the computer, as there was essentially no character
> > > handling capability; the word was the usual unit.
>
> >But that is very early, say before 1968. The earliest assembler I encountered
> >was quite readable for the user.
>
> Even the present assemblers are not that hard to read in
> most cases; they are if there are a lot of tag fields.
> But assembler instructions are hard to write, and the
> manuals for both assembler instructions and hardware are
> usually written with the assumption that each instruction
> must be described separately, and that the reader does
> not know enough mathematics to understand what the
> instructions are doing. Probably about 10 pages could
> handle the non-privileged instructions.

The only assembler I found hard to write was for the 205. The 40+ other
machines for which I did write assembler were reasonably easy. Even the
compiler writers for the 205 apparently had problems with the machine.
Otherwise, why should the subtraction of two real arrays use a
different instruction than the subtraction of two complex arrays?
Why did it not use a single instruction for vector
A(I) = - B(I) - C(I)
? Instructions were complex and sometimes bizarre.

> Intelligent use of symbols and short methods of writing
> are important in the development of mathematics and
> good use of it. Current assembler syntax is like having
> to write plus(x,y) instead of x+y everywhere, etc.

In the first assembler I encountered, the addition of the G register to
the A register was simply:
A + G
(the machine had instructions with designators for two registers).

> > > This is as it should be. Many compilers claim that the
> > > insertion of assembler instructions turns off optimization;
> > > it should be the other way around.
>
> > But that is because the assembler instructions can screw up optimisation.
>
> Usually bad optimization. It is not necessary to restrict
> optimization to the assumptions of the compiler writer,
> which is what we have now.

Compilers have (in general) no idea what the interspersed assembler
instructions do. They leave them untouched until the final phase
when the assembler comes in. By that time optimisation is already
done. On the other hand, I have seen optimising assemblers (MIPS)
that *did* screw up some code. (And I have seen an assembler that
even did screw up a particular instruction ;-).)

> > > BTW, did you ever have
> > > problems with optimizers doing what they should not?
>
> > Yup. On almost all systems I ever did work on.
>
> Quite some time ago, I wrote a program intended for much use
> in assembler.

I have a program that checks a number for primality (not probable,
but a real primality prover). It can be done in C, Fortran (it is
in both languages), and (to increase speed) in various levels of
assembler. It also checks vectorisation.

There are only very few of the 40+ systems to which I ported it that
did so without problem. Mostly these were optimisation errors in the
compilers. But porting to the Cray Y-MP was most interesting: it
showed a bug in the implementation of one of the instructions.
(The half precision multiply did not shift the result if it did
overflow. So 0.0 was returned rather than 1.0.)

Getting all these things correct is *hard*. It was, of course, a
pretty pathological case, but I ran into it when I was testing with
a Mersenne prime.

> After Knuth's paper came out about Fortran
> efficiency, I tried the main loop in Fortran, using Knuth's
> definitions, and seeing what the local compilers would do.
> The "ideal" compiler would achieve 45% efficiency; to do
> better required keeping track of registers across blocks,
> not allowed by Knuth. The non-optimizing compilers achieved
> 33%, and the optimizing ones 22%. Half of the weakness of
> the non-optimizers was in one short part not using a tricky
> concoction based on knowing exactly what some machine
> instructions could do.

I once played with the CDC Algol 68 compiler. It could do
really amazing optimisations, but the source was tweaked so that
the optimisation level was quite low. I have tested what would
happen if the level was set higher (yes, I had access to the
source of the compiler). Really, it would tweak branches of
switches such that the same variables ended up in the same
registers. But it did take a horribly long time to get anything
compiled.

> > Depends on the language. In C, srand() + srand() specifies two different
> > calls to the random number generator. In Fortran, RAND(0.) + RAND(0.)
> > specifies that the compiler may assume there is only a single call to
> > the random number generator (this is just part of the standard).
>
> Which is why the programmer needs to be able to decide.

But you *can*.

Toby Thain

unread,
Jun 13, 2004, 4:06:24 AM6/13/04
to
hru...@odds.stat.purdue.edu (Herman Rubin) wrote in message news:<ca9qq6$3v...@odds.stat.purdue.edu>...
> ...

> CS students could produce a good multifont fixed-width
> editor in a few months; the Unicode people seem to be
> sitting on it, with the idea of producing a full-fledged
> typesetting language. I do NOT want a typeSETTER; I
> want a typeWRITER which will cut my time to produce
> mathematical type in half. This is cheap software I am
> asking for, not the current complicated stuff which
> costs much for deciding where to break lines, how to
> align expressions, etc., which I want to manage myself.

[OT] You don't like TeX?

Toby Thain

unread,
Jun 13, 2004, 4:29:49 AM6/13/04
to
nm...@cus.cam.ac.uk (Nick Maclaren) wrote in message news:<cafqj6$jlv$1...@pegasus.csx.cam.ac.uk>...
> ...

>
> >That C90 is not watertight is unfortunate, but in C90 it is clear that
> >there are two calls.
>
> Yes, it is. But it is NOT clear that they don't overlap.

Isn't a single thread assumed - which would imply sequential calls?
(in whatever order)

T

>
>
> Regards,
> Nick Maclaren.

Nick Maclaren

unread,
Jun 13, 2004, 5:40:47 AM6/13/04
to
In article <Hz80E...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
> >
> > I am afraid that you are wrong. Ever since Fortran 77, the standard
> > has specified that any global entity that becomes defined by a call
> > to a function becomes undefined after an undefined subset of such calls.
> > Therefore all calls to random number generators may leave the random
> > state undefined.
>
>But I did not say anything like that. Ever since Fortran 66 (I think)
>it has been clear that functions are pure, i.e. cannot have side effects.
>That is why a compiler may assume that two identical calls return the
>same value. That is why random number generators should be subroutines,
>not functions. (There are quite a few places where it is stated that
>variables become undefined...)

Sorry, but that is NOT true. Firstly, it was Fortran 77 that introduced
the constraint - in Fortran 66, there was no such condition. Secondly,
there are lots of other places in the standard where the converse is
implied. During the life of Fortran 77, it wasn't clear which of
the two parts of the standard should be fixed - it wasn't until
Fortran 90 that X3J3 came down clearly on the side of random number
functions not being conforming.

I designed the NAG random number generators (in 1972), and we had a
LONG debate over what the standard specified. Our eventual conclusion
(and that of pretty well everyone else) was that they WERE allowed,
but that there was no guarantee that two identical calls in the same
statement would result in two calls. We (and many other groups) had
similar debates during the late 1970s and early 1980s, and the
majority belief was that this constraint would be removed in the
next release of Fortran.

Fortran 200x has introduced the concept of PURE procedures, and I
pleaded for a clarification of this mess, but it was rejected on the
grounds that it was too much of a wormcan for X3J3 to open :-(

Does that or does that not add up to Fortran making a mess of
this area?


Regards,
Nick Maclaren.

glen herrmannsfeldt

unread,
Jun 13, 2004, 5:53:45 AM6/13/04
to
Nick Maclaren wrote:

(snip)

> Sorry, but that is NOT true. Firstly, it was Fortran 77 that introduced
> the constraint - in Fortran 66, there was no such condition.

(snip)

> I designed the NAG random number generators (in 1972), and we had a
> LONG debate over what the standard specified. Our eventual conclusion
> (and that of pretty well everyone else) was that they WERE allowed,
> but that there was no guarantee that two identical calls in the same
> statement would result in two calls.

(snip)

I might remember some from F66 days where the new seed was
explicitly resupplied each time, such that it would have to
work.

      ISEED = 12345

      DO 10 I = 1, 10
      ISEED = RANDOM(ISEED)
   10 WRITE(6,20) MOD(ISEED/997,6)+1
   20 FORMAT(1X,I1)

This form would not allow more than one call in the
same statement, though I am not sure that optimizers
aren't allowed to work between statements.

-- glen

Nick Maclaren

unread,
Jun 13, 2004, 5:54:13 AM6/13/04
to
In article <40CB86A8...@tybor.com>,

Fred J. Tydeman <tyd...@tybor.com> wrote:
>Nick Maclaren wrote:
>>
>> While there are sequence points before each call (and, in C99 but
>> not C90, after each call), there is no specified ordering of the
>> calls. There is therefore no reason that they should not overlap.
>
>6.8#2 Except as indicated, statements are executed in sequence.
>
>If some of the statements in a first function are executed, and then
>some statements in a second function are executed (eg, overlap), then
>all of the statements in the first function were not executed in
>sequence.

That is wrong. They ARE executed in sequence - just not contiguously.
One of the main problems of the C standard is that it uses standard
terminology in specialist ways, but without defining the precise
meaning. This is one such place, where you and similar people seem
to believe that the standard mathematical concept "in sequence"
implies contiguity. Well, the problem is that it doesn't normally
do so, which introduced the ambiguity.

Don't you remember the long debates on the reflector over whether the
following is undefined behaviour?

(A,i=1,B)+(C,i=2,D)

In the following order, each expression is executed in sequence and
all sequence points are honoured, but the expression is undefined.

A B
<synchronise>
i=1 i=2
<synchronise>
C D

>> Note that the sequence point rules implicitly introduce the possibility
>> of overlapping, and that there is no obvious reason to prefer a Von
>> Neumann model over a functional model if one is relying on the
>> common conventions of the field.
>
>Overlapping within a statement (between sequence points) is allowed.
>Overlapping between statements (across a sequence point) is not
>allowed.

That is true. Now look at my example again, which is the exact
analogue of two function calls. In C90, the following was clearly
undefined, because there was no sequence point at the end of the
calls. This was closed by C99, and was the minor loophole that
I referred to, but the major one is the one above.

sin(-1.0)+exp(DBL_MAX)
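
As one concrete reading (an illustration, not a claim about any
particular implementation): both calls may store to errno, so if the
calls could overlap, the final errno would not be pinned down.

    #include <errno.h>
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double x;
        errno = 0;
        x = sin(-1.0) + exp(DBL_MAX);  /* exp overflows, sets ERANGE */
        printf("x = %g, errno = %d\n", x, errno);
        return 0;
    }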


Regards,
Nick Maclaren.

Dan Nagle

unread,
Jun 13, 2004, 8:13:14 AM6/13/04
to
Hello,

On 13 Jun 2004 09:40:47 GMT, nm...@cus.cam.ac.uk (Nick Maclaren) wrote:

<snip>

>Fortran 200x has introduced the concept of PURE procedures, and I
>pleaded for a clarification of this mess, but it was rejected on the
>grounds that it was too much of a wormcan for X3J3 to open :-(

Well, first the nitpicking: f95, not f03, introduced the HPF concept
of a pure procedure to standard Fortran.

The reason for not considering Nick's suggestion was that,
as the standard now reads, a PURE procedure may be assumed
to have no side effects, but other procedures clearly may have some.
The 'right' solution was thought to be to introduce the notion
of a 'VOLATILE' procedure, which would be required to be referenced
once per textual appearance. However, at the late date at which
the suggestion was made (during the public comment period), there was
considered to be too little time to do the job right. So J3
decided to delay consideration until f03++.

>Does that or does that not add up to Fortran making a mess of
>this area?

It may be a mess, but many working codes depend on whatever
the compiler in use is doing now. So to specify exactly how
to do things without providing a way to preserve existing behavior
is probably not the way to go.

If the volatile functions proposal passes (the planning stage
of f03++ is _way_ too early to even guess what it will have),
then a declaration of PURE will mean "no side effects- compiler
may elide 'redundant' references"; a declaration of VOLATILE
will mean "guaranteed side effects- compiler may not elide";
unspecified will mean "compiler may do what it's doing now
whatever that is".

And we hope that covers the bases.

>Regards,
>Nick Maclaren.

--
Cheers!

Dan Nagle
Purple Sage Computing Solutions, Inc.

Steve Richfie1d

unread,
Jun 13, 2004, 9:21:07 AM6/13/04
to
Regarding redundant pure procedure calls:

In observing this thread, I stopped to consider when/where I made lots
of redundant pure procedure calls that I would like combined. The only
case I could conjure up was my periodic use of multi-dimensional
subscripting functions to get past the lack of EQUIVALENCE statements in
many languages. An excellent example:

A Neural Network program written in Visual Basic (mostly because of its
easily switchable error handling between stop-on-all-exceptions and
compile-without-any-checking) needed to run the computationally
intensive core on singly-dimensioned arrays to avoid the overhead of
multi-dimensional arrays, whereas the I/O needed to refer to things
according to their true multi-dimensional structure. A simple function
would convert multiple subscripts to the single subscript needed to
access into a single-dimensioned array that really contained the
multiply-dimensioned information. With complex I/O logic, programming
was MUCH easier with many redundant calls to the subscripting routine,
often within a single complex statement. I doubt that VB was able to
optimize these into single calls, but speed wasn't really a critical
issue in the I/O.
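
For readers who have not used the trick, here is a sketch in C of that
kind of subscripting function (the name idx3 and the row-major layout
are my illustration, not taken from the VB program):

    #include <stddef.h>

    /* Map (i, j, k) within an ni-by-nj-by-nk volume onto a single
       index into a flat, singly-dimensioned array (row-major).   */
    static size_t idx3(size_t i, size_t j, size_t k,
                       size_t nj, size_t nk)
    {
        return (i * nj + j) * nk + k;
    }

    /* Usage: a[idx3(i, j, k, NJ, NK)] in the I/O code, while the
       computationally intensive core indexes a[] directly.       */

Each textual use is a pure call, so an optimizer that could prove
purity would be entitled to combine the redundant ones.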

However, Fortran DOES have EQUIVALENCE statements, so this technique
isn't needed there. Does anyone have a good FORTRAN example where
multiple redundant pure procedure calls that should be combined are
routinely made for some good reason?

Steve Richfield

Steve Richfie1d

unread,
Jun 13, 2004, 9:37:56 AM6/13/04
to
Dik,

> The only assembler I found hard to write was for the 205. The 40+ other
> machines for which I did write assembler were reasonably easy. Even the
> compiler writers for the 205 apparently had problems with the machine.
> Otherwise, why should the subtraction of two real arrays use a
> different instruction than the subtraction of two complex arrays?
> Why did it not use a single instruction for vector
> A(I) = - B(I) - C(I)
> ? Instructions were complex and sometimes bizarre.

To understand the 205, you must first understand the 201. The 201 had a
MUCH richer instruction set that was designed to facilitate COBOL and
other business programming. Unfortunately, this pushed their price too
high, so no one wanted to purchase them. Subsequently, the extra
instructions were stripped out. However, we down in the compiler
trenches always expected the extra instructions to return someday as the
price of electronics dropped and dropped, so the compiler was never
reconceptualized to operate without these instructions and their
numerous side-effects.

> Compilers have (in general) no idea what the interspeced assembler
> instructions do. They leave them untouched until the final phase
> when the assembler comes in. By that time optimisation is already
> done.

Most optimizations in most compilers are done on "parse trees" long
before there is any consideration of actually making instructions from
them. However, on the 205, once the instructions were tentatively issued
and in-line instructions were merged in, then the whole thing went to
instruction optimization and scheduling, which then squeezed what it
could, e.g. overlapping the in-line instructions with other compiled code.

Steve Richfield

Fred J. Tydeman

unread,
Jun 13, 2004, 11:34:57 AM6/13/04
to
Nick Maclaren wrote:
>
> >Nick Maclaren wrote:
> >>
> >> While there are sequence points before each call (and, in C99 but
> >> not C90, after each call), there is no specified ordering of the
> >> calls. There is therefore no reason that they should not overlap.
> >
> >6.8#2 Except as indicated, statements are executed in sequence.
> >
> >If some of the statements in a first function are executed, and then
> >some statements in a second function are executed (eg, overlap), then
> >all of the statements in the first function were not executed in
> >sequence.
>
> That is wrong. They ARE executed in sequence - just not contiguously.
> One of the main problems of the C standard is that it uses standard
> terminology in specialist ways, but without defining the precise
> meaning. This is one such place, where you and similar people seem
> to believe that the standard mathematical concept "in sequence"
> implies contiguity. Well, the problem is that it doesn't normally
> do so, which introduced the ambiguity.

C90, Defect Report (DR) 087 has as part of its response:
... function calls do not overlap.

Herman Rubin

unread,
Jun 13, 2004, 3:16:48 PM6/13/04
to
In article <Hz80E...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
>In article <cafqj6$jlv$1...@pegasus.csx.cam.ac.uk> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> > In article <Hz7DM...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
> > >In article <caep5m$nad$1...@pegasus.csx.cam.ac.uk> nm...@cus.cam.ac.uk (Nick Maclaren) writes:

.......................

> > I am afraid that you are wrong. Ever since Fortran 77, the standard
> > has specified that any global entity that becomes defined by a call
> > to a function becomes undefined after an undefined subset of such calls.
> > Therefore all calls to random number generators may leave the random
> > state undefined.

>But I did not say anything like that. Ever since Fortran 66 (I think)
>it has been clear that functions are pure, i.e. cannot have side effects.
>That is why a compiler may assume that two identical calls return the
>same value. That is why random number generators should be subroutines,
>not functions. (There are quite a few places where it is stated that
>variables become undefined...)

They should either be read from buffers or should be
OPEN subroutine calls. BTW, which of the current languages
even allows this quite important idea at this time? It is
often the case that a subroutine call can even take more
space in the program than the entire open subroutine, let
alone the time for the call and return.

Nick Maclaren

unread,
Jun 13, 2004, 3:28:03 PM6/13/04
to
In article <d6ce4a6c.04061...@posting.google.com>,

Toby Thain <to...@telegraphics.com.au> wrote:
>nm...@cus.cam.ac.uk (Nick Maclaren) wrote in message news:<cafqj6$jlv$1...@pegasus.csx.cam.ac.uk>...
>>
>> >That C90 is not watertight is unfortunate, but in C90 it is clear that
>> >there are two calls.
>>
>> Yes, it is. But it is NOT clear that they don't overlap.
>
>Isn't a single thread assumed - which would imply sequential calls?
>(in whatever order)

Yes, it is assumed - but it is not specified!

Furthermore, no, it doesn't imply that. I posted an interpretation
to the server that used a single thread and still had overlapping
code. I.e. each block of code (between sequence points) was atomic,
they were executed in order, but could be shuffled ad lib. while
maintaining the standard's promises.


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Jun 13, 2004, 3:33:30 PM6/13/04
to
In article <49goc0l9joad5bevi...@4ax.com>,

Dan Nagle <dna...@erols.com> wrote:
>On 13 Jun 2004 09:40:47 GMT, nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
>
>>Fortran 200x has introduced the concept of PURE procedures, and I
>>pleaded for a clarification of this mess, but it was rejected on the
>>grounds that it was too much of a wormcan for X3J3 to open :-(
>
>Well, first the nitpicking: f95, not f03 introduced the HPF concept
>of a pure procedure to standard Fortran.

Thanks for the correction.

>The reason for not considering Nick's suggestion was that,
>as the standard now reads, a PURE procedure may be assumed
>to have no side effects, but other procedures clearly may have some.
>The 'right' solution was thought to be to introduce the notion
>of a 'VOLATILE' procedure, which would be required to be referenced
>once per textual appearance. However, at the late date at which
>the suggestion was made (during the public comment period), there was
>considered to be too little time to do the job right. So J3
>decided to delay consideration until f03++.

Hmm. I find the difference between that and "too much of a wormcan
for X3J3 to open" to be close to nitpicking ....

>>Does that or does that not add up to Fortran making a mess of
>>this area?
>
>It may be a mess, but many working codes depend on whatever
>the compiler in use is doing now. So to specify exactly how
>to do things without providing a way to preserve existing behavior
>is probably not the way to go.

If you remember, my comment was more-or-less "I don't know how to
sort out this mess, but PLEASE make it explicit!"

Yes, you are right. I am a little disappointed that nobody else
seems to have picked it up before, as it was a known and major
issue both when Fortran 77 and Fortran 90 came in.


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Jun 13, 2004, 3:37:18 PM6/13/04
to
In article <40CC7420...@tybor.com>,

Fred J. Tydeman <tyd...@tybor.com> wrote:
>
>C90, Defect Report (DR) 087 has as part of its response:
>... function calls do not overlap.

As I understand it, ISO rules state that a new version of a standard
cancels all Defect Reports on the previous one, as they are assumed
to have been merged into the new standard. This one has not been.

The claim of some people that all DRs carry through was shown to
be false by someone on the BSI panel who provided an example of one
that clearly was not.

So far, the BSI's request for a list of DRs that are deemed to have
been carried through, with a statement that the rest have been
cancelled, has fallen on deaf ears.


Regards,
Nick Maclaren.

Herman Rubin

unread,
Jun 13, 2004, 3:50:31 PM6/13/04
to
In article <Hz81M...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
>In article <cafric$35...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> > In article <Hz63B...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
> > >In article <cacgam$3p...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> > > > The early assemblers had to be adjusted to be easy for
> > > > the computer, as there was essentially no character
> > > > handling capability; the word was the usual unit.

> > >But that is very early, say before 1968. The earliest assembler I encountered
> > >was quite readable for the user.

> > Even the present assemblers are not that hard to read in
> > most cases; they are if there are a lot of tag fields.
> > But assembler instructions are hard to write, and the
> > manuals for both assembler instructions and hardware are
> > usually written with the assumption that each instruction
> > must be described separately, and that the reader does
> > not know enough mathematics to understand what the
> > instructions are doing. Probably about 10 pages could
> > handle the non-privileged instructions.

>The only assembler I found hard to write was for the 205.

The CALLQ8 instructions for the Fortran compiler were
easier for me than the assembler instructions, even with
the need to compute the hexadecimal for the eight bits in
the G field. The tag field mnemonics were not what I would
have used.

But the problem with writing in assembler is keeping track
of the mnemonics and the order of the arguments; the syntax
is still "making it easier for the compiler". Some of the
other syntactical problems were also difficult, like having
variable names different in the assembler instructions and
the HLL instructions; this might be why the compilers have
problems with assembler instructions. Another one is the
use of decimal for register labels; it is rare that one
has to do arithmetic on them, and their structure is well
suited for hex, occasionally octal. The number of registers
is, in most machines, a power of 2, or if not, a small
integer times such a power. In any case, one should try
to design syntax, both for languages and assemblers, to
come close to minimizing the number of characters which
need to be typed.

>The 40+ other
>machines for which I did write assembler were reasonably easy. Even the
>compiler writers for the 205 apparently had problems with the machine.

I am not surprised; I do not expect compiler writers to
really handle what intelligent users can do. It seems
that they are trying to make things foolproof, which
usually means that only a fool can follow their reasoning.
But as Einstein said, Nature seems to be winning at producing
better fools.

>Otherwise, why should the subtraction of two real arrays use a
>different instruction than the subtraction of two complex arrays?
>Why did it not use a single instruction for vector
> A(I) = - B(I) - C(I)
>? Instructions were complex and sometimes bizarre.

> > Intelligent use of symbols and short methods of writing
> > are important in the development of mathematics and
> > good use of it. Current assembler syntax is like having
> > to write plus(x,y) instead of x+y everywhere, etc.

>In the first assembler I encountered, the addition of the G register to
>the A register was simply:
> A + G
>(the machine had instructions with designators for two registers).

With the advent of more triple address machines, and with
registers no longer being where the arithmetic is performed, but
rather a small, usually separate, memory with some special properties,
this is no longer a good idea.

> > > > This is as it should be. Many compilers claim that the
> > > > insertion of assembler instructions turns off optimization;
> > > > it should be the other way around.

> > > But that is because the assembler instructions can screw up optimisation.

> > Usually bad optimization. It is not necessary to restrict
> > optimization to the assumptions of the compiler writer,
> > which is what we have now.

>Compilers have (in general) no idea what the interspersed assembler
>instructions do.

Then they are poorly designed compilers.

>They leave them untouched until the final phase
>when the assembler comes in. By that time optimisation is already
>done. On the other hand, I have seen optimising assemblers (MIPS)
>that *did* screw up some code. (And I have seen an assembler that
>even did screw up a particular instruction ;-).)

To quote (from memory) from the assembler manual for the
VAX, which is the first machine owned by the Statistics
Department,

This manual is for compiler writers and assembler
maintainers only. It is not intended for users.

It is this attitude which is the problem. The arrogance
to assume that the ideas of the language designer and the
compiler writer give the best way to carry out a user's
program is the real problem. This then ends up telling
hardware designers that only those operations are important.

> > > > BTW, did you ever have
> > > > problems with optimizers doing what they should not?

> > > Yup. On almost all systems I ever did work on.

> > Quite some time ago, I wrote a program intended for much use
> > in assembler.

>I have a program that checks a number for primality (not probable,
>but a real primality prover). It can be done in C, Fortran (it is
>in both languages), and (to increase speed) in various levels of
>assembler. It also checks vectorisation.

>There are only very few of the 40+ systems to which I ported it that
>did so without problem. Mostly these were optimisation errors in the
>compilers. But porting to the Cray Y-MP was most interesting: it
>showed a bug in the implementation of one of the instructions.
>(The half precision multiply did not shift the result if it did
>overflow. So 0.0 was returned rather than 1.0.)

That is what happens if one does not have double length
multiply in the first place. But few languages had
this originally; it is a place where a simple need, which should
have been noticed on day 2 of language design, went unmet: the need
for having a list of items on the left
of the equal sign. This is NOT the same as a struct;
there is no statement about what goes where, and the
types can be arbitrary. Of course, it can be done by
a subroutine call, but this has all the previously named
problems, and who would recognize a subroutine call as
giving the results of a single hardware instruction?
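
To make the wished-for feature concrete, here is a C sketch of the
usual workaround (the helper name and the out-parameter convention are
mine, purely for illustration):

    #include <stdint.h>

    /* Fake the two-result instruction in C: both halves of a
       32 x 32 -> 64-bit product, computed via a wider intermediate
       and returned through pointer out-parameters. */
    static void mul32x32(uint32_t a, uint32_t b,
                         uint32_t *hi, uint32_t *lo)
    {
        uint64_t p = (uint64_t)a * b;
        *hi = (uint32_t)(p >> 32);
        *lo = (uint32_t)p;
    }

With a list of items on the left of the equal sign, one could instead
write something like (hi, lo) = a * b and let the compiler map it onto
the one hardware instruction.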

>But you *can*.

You can, but it is clumsy.

Herman Rubin

unread,
Jun 13, 2004, 4:08:30 PM6/13/04
to
>hru...@odds.stat.purdue.edu (Herman Rubin) wrote in message news:<ca9qq6$3v...@odds.stat.purdue.edu>...
>> ...
>> CS students could produce a good multifont fixed-width
>> editor in a few months; the Unicode people seem to be
>> sitting on it, with the idea of producing a full-fledged
>> typesetting language. I do NOT want a typeSETTER; I
>> want a typeWRITER which will cut my time to produce
>> mathematical type in half. This is cheap software I am
>> asking for, not the current complicated stuff which
>> costs much for deciding where to break lines, how to
>> align expressions, etc., which I want to manage myself.

>[OT] You don't like TeX?

Very definitely NOT. I use it, but it can be hellish to
find an error with the horribly restrictive syntax. Knuth
did not even consider that all of the escape characters
were currently used for many mathematical purposes at the
time TeX was created. The only ASCII characters which I
myself have not used for mathematical purposes are

` @ $ % _ " ?

>> What we have now is like requiring the whole deluxe
>> package to get a windshield wiper.

Dik T. Winter

unread,
Jun 13, 2004, 8:01:45 PM6/13/04
to
In article <03bd84818585d081...@news.teranews.com> St...@NOSPAM.smart-life.net writes:
> Dik,
>
> > The only assembler I found hard to write was for the 205. The 40+ other
> > machines for which I did write assembler were reasonably easy. Even the
> > compiler writers for the 205 apparently had problems with the machine.
> > Otherwise, why should the subtraction of two real arrays use a
> > different instruction than the subtraction of two complex arrays?
> > Why did it not use a single instruction for vector
> > A(I) = - B(I) - C(I)
> > ? Instructions were complex and sometimes bizarre.
>
> To understand the 205, you must first understand the 201.

The 201? I think you either mean the STAR-100 or the 203.

> The 201 had a
> MUCH richer instruction set that was designed to facilitate COBOL and
> other business programming.

Indeed, string instructions, and if I remember right, also decimal string
instructions.

> Unfortunately, this pushed their price too
> high, so no one wanted to purchase them. Subsequently, the extra
> instructions were stripped out. However, we down in the compiler
> trenches always expected the extra instructions to return someday as the
> price of electronics dropped and dropped, so the compiler was never
> reconceptualized to operate without these instructions and their
> numerous side-effects.

I have no idea what this has to do with a real vector subtract using a
completely different instruction from a complex vector subtract, nor
why A(I) = - B(I) - C(I) used two instructions rather than one.
(Of the first two one used the vector subtract instruction, the other
used the vector add instruction with sign control. In some pathological
cases the results could be different. The third operation could be
done with a vector subtract with sign control.)

> > Compilers have (in general) no idea what the interspersed assembler
> > instructions do. They leave them untouched until the final phase
> > when the assembler comes in. By that time optimisation is already
> > done.
>
> Most optimizations in most compilers are done on "parse trees" long
> before there is any consideration of actually making instructions from
> them.

Yup, that is why interspersed assembler code inhibits optimisation across
that code.

> However, on the 205, once the instructions were tentatively issued
> and in-line instructions were merged in, then the whole thing went to
> instruction optimization and scheduling, which then squeezed what it
> could, e.g. overlapping the in-line instructions with other compiled code.

That is pretty standard with optimising assemblers too (see the MIPS
assembler). But in general these optimisations do not go beyond
peephole optimisation.

Dik T. Winter

unread,
Jun 13, 2004, 8:04:27 PM6/13/04
to
In article <cai970$10...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
...

> >But I did not say anything like that. Ever since Fortran 66 (I think)
> >it has been clear that functions are pure, i.e. cannot have side effects.
> >That is why a compiler may assume that two identical calls return the
> >same value. That is why random number generators should be subroutines,
> >not functions. (There are quite a few places where it is stated that
> >variables become undefined...)
>
> They should either be read from buffers or should be
> OPEN subroutine calls. BTW, which of the current languages
> even allows this quite important idea at this time?

Almost all Algol based languages. Algol 60, Algol 68, Pascal, C, ...

Dik T. Winter

unread,
Jun 13, 2004, 8:28:07 PM6/13/04
to
In article <caib67$6f...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> In article <Hz81M...@cwi.nl>, Dik T. Winter <Dik.W...@cwi.nl> wrote:
...

> >In the first assembler I encountered, the addition of the G register to
> >the A register was simply:
> > A + G
> >(the machine had instructions with designators for two registers).
>
> With the advent of more triple address machines,

IX1 X2+X3
makes perfect sense to me. More so than
add x1,x2,x3
with its permutations, depending on assembler and architecture
(cf. the AT&T assembler for the i386; on the other hand, it made
sense for AT&T.)

> and with
> registers not being where the arithmetic is performed, but
> a small usually separate memory with some special properties,
> this is no longer a good idea.

I have no idea what kind of machines you are pointing at. I know
there are such machines, but they are scarce.

> > Compilers have (in general) no idea what the interspersed assembler
> > instructions do.
>
> Then they are poorly designed compilers.

Nope. A compiler should have *no* idea about the assembler instructions
at all. It generates them only in the final phase. It can be
problematic to know everything about every assembler instruction that
may be present. Moreover, different processor generations have
different instruction sets or properties, and redoing compiler knowledge
for each and every processor is a beast.

> >They leave them untouched until the final phase
> >when the assembler comes in. By that time optimisation is already
> >done. On the other hand, I have seen optimising assemblers (MIPS)
> >that *did* screw up some code. (And I have seen an assembler that
> >even did screw up a particular instruction ;-).)
>
> To quote (from memory) from the assembler manual for the
> VAX, which is the first machine owned by the Statistics
> Department,
>
> This manual is for compiler writers and assembler
> maintainers only. It is not intended for users.

Yes, good old 'as(1)'. Only Unix assembler manuals had that quote,
and it was even false in the old days. Kernel writers also had use
for it.

> It is this attitude which is the problem. The arrogance
> to assume that the ideas of the language designer and the
> compiler writer give the best way to carry out a user's
> program is the real problem. This then ends up by telling
> hardware designers that only those operations are important.

How wrong you are. On nearly *every* processor there are instructions
that *no* compiler will generate.

> > There are only very few of the 40+ systems to which I ported it that
> > did so without problem. Mostly these were optimisation errors in the
> > compilers. But porting to the Cray Y-MP was most interesting: it
> > showed a bug in the implementation of one of the instructions.
> > (The half precision multiply did not shift the result if it did
> > overflow. So 0.0 was returned rather than 1.0.)
>
> That is what happens if one does not have double length
> multiply in the first place.

Wrong again. I did *not* use that instruction because of the lack of
double length multiply. And I think that for the purposes for which it
was intended, it did satisfy the needs. (The sequence in the assembler
manual showing how to use the instruction would not result in wrong
results. I just put it to a different use, depending on the
description.)

> But few languages had
> this originally; it is a place where a lack of a simple
> need which should have been noticed on day 2 of language
> design; the need for having a list of items on the left
> of the equal sign. This is NOT the same as a struct;
> there is no statement about what goes where, and the
> types can be arbitrary. Of course, it can be done by
> a subroutine call, but this has all the previously named
> problems, and who would recognize a subroutine call as
> giving the results of a single hardware instruction?

Herman, meet META. META, meet Herman. But apparently nobody
noted that need, so I do not know how strong that need actually
was. (BTW, META was from Xerox PARC and strongly based on
Algol 68.)

Nick Maclaren

unread,
Jun 14, 2004, 3:10:55 AM6/14/04
to

In article <Hz9uv...@cwi.nl>,

"Dik T. Winter" <Dik.W...@cwi.nl> writes:
|> In article <cai970$10...@odds.stat.purdue.edu> hru...@odds.stat.purdue.edu (Herman Rubin) writes:
|> ...
|> > >But I did not say anything like that. Ever since Fortran 66 (I think)
|> > >it has been clear that functions are pure, i.e. cannot have side effects.
|> > >That is why a compiler may assume that two identical calls return the
|> > >same value. That is why random number generators should be subroutines,
|> > >not functions. (There are quite a few places where it is stated that
|> > >variables become undefined...)
|> >
|> > They should be either read from buffers or should be
|> > OPEN subroutine calls. BTW. which of the current languages
|> > even allow this quite important idea at this time?
|>
|> Almost all Algol based languages. Algol 60, Algol 68, Pascal, C, ...

C is not an Algol-based language, any more than PL/I is. It is
CPL-based.

I am a little lost about which idea we are discussing.


Regards,
Nick Maclaren.
