
Index register and accumulator


legi...@gmail.com

Feb 18, 2013, 10:52:57 AM
Hello,

I am learning about index registers and accumulators. I was wondering if someone could express the difference between these terms in a metaphor that I might understand.

Thanks

MitchAlsup

Feb 18, 2013, 12:24:47 PM
An index register helps form memory addresses
An accumulator helps with the calculations

Stephen Fuld

Feb 18, 2013, 12:26:39 PM
On 2/18/2013 7:52 AM, legi...@gmail.com wrote:
> Hello,
>
> I am learning about index registers and accumulators. I was wondering if someone could express the difference between these terms in a metaphor that I might understand.


Think about adding up a column of one digit numbers. The accumulator
holds the values so far, the index register tells you how far down the
column you have gotten.
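
The same picture in C, as a rough sketch (the variable names are mine,
not any particular machine's): "sum" plays the accumulator and "i"
plays the index register.

int column_sum(const int *column, int n)
{
    int sum = 0;                    /* accumulator: the total so far         */
    for (int i = 0; i < n; i++)     /* index register: how far down we are   */
        sum += column[i];           /* column[i] is an indexed memory access */
    return sum;
}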




--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Andy (Super) Glew

Feb 18, 2013, 9:15:08 PM
On 2/18/2013 9:26 AM, Stephen Fuld wrote:
> On 2/18/2013 7:52 AM, legi...@gmail.com wrote:
>> Hello,
>>
>> I am learning about index registers and accumulators. I was wondering
>> if someone could express the difference between these terms in a
>> metaphor that I might understand.
>
>
> Think about adding up a column of one digit numbers. The accumulator
> holds the values so far, the index register tells you how far down the
> column you have gotten.

I like this metaphor, Stephen. It may well be more than a metaphor -
it may be historically accurate.

Let me pile on with:

Stuff already written:
* http://semipublic.comp-arch.net/wiki/Accumulators
* http://semipublic.comp-arch.net/wiki/Accumulator_ISA_vs_Uarch
** talking about src/dst and extra width...

Stuff that I need to write:
* http://semipublic.comp-arch.net/wiki/Special_Purpose_Registers

Starting a list for the last:


* [[Accumulators]]
* [[Index Registers]]

* [[Hi/Lo Registers]]
** [[Multiply Hi/Lo Registers]]
** [[Divide Hi/Lo Registers]]

* [[Shift Count]]

* [[REP Count]]

* [[X and Y registers]]

Inviting additions - what are your favorite examples of special purpose
registers?



--
The content of this message is my personal opinion only. Although I am
an employee (currently of MIPS Technologies; in the past of companies
such as Intellectual Ventures and QIPS, Intel, AMD, Motorola, and
Gould), I reveal this only so that the reader may account for any
possible bias I may have towards my employer's products. The statements
I make here in no way represent my employers' positions on the issue,
nor am I authorized to speak on behalf of my employers, past or present.

Ivan Godard

Feb 18, 2013, 11:10:54 PM
Here are a few from the Mill I can talk about (there are no computation
registers):

cLwbReg = 10, // code region lower bound
coreNumReg = 11, // ordinal of core on chip
cpReg = 12, // code pointer
cppReg = 13, // constant pool pointer
cUpbReg = 14, // code region upper bound
cycReg = 15, // cycle counter (issue or not)
dLwbReg = 16, // static data region lower bound
dpReg = 17, // static data pointer
dUpbReg = 18, // static data region upper bound
entryReg = 19, // entry address of current ebb
faultReg = 21, // fault vector base
floatReg = 22, // floating-point state
fpReg = 24, // frame pointer
funcReg = 26, // function entry address
inpReg = 27, // inbound argument pointer
issueReg = 28, // instructions issued counter
noCacheLwbReg = 30, // MMIO region lower bound
noCacheUpbReg = 31, // MMIO region upper bound
processReg = 32, // process ID
rtcReg = 33, // real time clock
runReg = 34, // start-stop control
spReg = 40, // stack pointer, and stack region upper
stepReg = 42, // issue cycles within current ebb
threadReg = 43, // thread ID
tpReg = 44, // task (thread) pointer
trapReg = 45, // trap vector base




Joe Pfeiffer

Feb 19, 2013, 1:02:13 AM
Stephen Fuld <SF...@alumni.cmu.edu.invalid> writes:

> On 2/18/2013 7:52 AM, legi...@gmail.com wrote:
>> Hello,
>>
>> I am learning about index registers and accumulators. I was wondering if someone could express the difference between these terms in a metaphor that I might understand.
>
>
> Think about adding up a column of one digit numbers. The accumulator
> holds the values so far, the index register tells you how far down the
> column you have gotten.

Very nicely described indeed. It should be mentioned that in some
architectures the accumulators and index registers may be separate,
while in others there might be just one set of registers, and the
difference is in how you use them.

Quadibloc

Feb 19, 2013, 2:23:16 AM
On Feb 18, 8:52 am, legitb...@gmail.com wrote:
> I am learning about index registers and accumulators. I was wondering if someone could express the difference between these terms in a metaphor that I might understand.

Well, in an older computer, the accumulator corresponded to the
display in a pocket calculator. So, usually, when the computer did any
arithmetic, the accumulator was where it put the result.

Some computers also had a "multiplier quotient register", which
corresponded exactly to a counter register in a mechanical adding
machine for doing multiplication and division.

An index register was added to computers later on, as an innovation
that avoided the need to modify the address part of an instruction,
and then execute it, in order to make programs that performed the same
operation on different locations in memory, such as when referencing
arrays. Instead, an instruction could be marked as "indexed", and then
the contents of the index register would be added to the address
before use.
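
A toy model of that in C, purely for illustration (the names are made
up, not from any real machine): the address field is used as written
unless the instruction is marked as indexed, in which case the index
register is added first.

#include <stdint.h>

uint16_t mem[65536];

uint16_t load(uint16_t addr_field, uint16_t x, int indexed)
{
    uint16_t ea = indexed ? (uint16_t)(addr_field + x) : addr_field;
    return mem[ea];                 /* the value goes to the accumulator */
}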

But this is probably what had already been said to you, which you are
having trouble understanding.

John Savard

nm...@cam.ac.uk

Feb 19, 2013, 2:48:19 AM
In article <51b9b85e-39de-4c4a...@j9g2000vbz.googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>On Feb 18, 8:52 am, legitb...@gmail.com wrote:
>> I am learning about index registers and accumulators. I was wondering if someone could express the difference between these terms in a metaphor that I might understand.
>
>Well, in an older computer, the accumulator corresponded to the
>display in a pocket calculator. So, usually, when the computer did any
>arithmetic, the accumulator was where it put the result.

Sometimes. The accumulator was often the ONLY 'register' and
was also used for indexing.


Regards,
Nick Maclaren.

Paul A. Clayton

Feb 19, 2013, 9:02:03 AM
On Monday, February 18, 2013 9:15:08 PM UTC-5, Andy (Super) Glew wrote:
[snip]
> Stuff that I need to write:
> * http://semipublic.comp-arch.net/wiki/Special_Purpose_Registers
>
> Starting a list for the last:
>
>
> * [[Accumulators]]
> * [[Index Registers]]
>
> * [[Hi/Lo Registers]]
> ** [[Multiply Hi/Lo Registers]]
> ** [[Divide Hi/Lo Registers]]
>
> * [[Shift Count]]
>
> * [[REP Count]]
>
> * [[X and Y registers]]
>
> Inviting additions - what are your favorite examples of special purpose
> registers?

How could you leave out link register and
stack pointer register? :-) (These might
be true SPRs or dedicated GPRs with special
instructions. The zero register is a
SPR mapped to GPR space, though it is read-
only [unless used as a single-use value
store that returns to zero after each use
:-)].)

(Other SPRs mapped into the GPR space might
include OS temporary/scratch registers which
can be overwritten by interrupt handlers,
but that is an ABI matter, so I would not
count them as SPRs. For MIPS, the SPR
[Stack Pointer register] is not an SPR :-),
but the Link Register could be considered
such since JAL and JALR both implicitly use
the LR.)

Register frames might not count even if they
can be swapped in and out flexibly.

If a register stack save area pointer allowed
non-privileged writes (requiring privileged
use to use non-privileged stores or load a
trusted value [possibly only if a modified
status bit is set]--or be shadowed), it
might be considered an SPR (though the benefit
of non-privileged writes seems small).

You are probably not looking for configuration
and status registers, but it is not clear
how these would be distinguished from 'ordinary'
SPRs (perhaps by requiring privilege for
explicit--not side-effect--writes?). Is a
flags register a status register (written as a
side effect of operations but perhaps in some
ISAs otherwise not writable by unprivileged
code)? (The Current Instruction Pointer is
probably my favorite status register. :-) )

Presumably you are also distinguishing type
registers (like FP, predicate/condition,
address+data, branch target [Itanium had
multiple BRs]) from SPRs (perhaps by
singularity? [but then Power's condition
registers might count as SPR--or a single
32-bit SPR??--since, apart from non-SIMD
compares, which CR is used depends on the
type of instruction]).

If a GPR is used as a Global/TLS pointer
register but only as a hint for a Knapsack
Cache, is it a SPR?

SPARC64 VIIIfx has the Extended Arithmetic
Register to hold prefixes for following
instructions. (It is claimed to be a
non-privileged register, but it is not clear
if it can be written--apart from instruction
fetch--by unprivileged code. I do not know
how it is handled for exceptions.)

Instruction registers have been proposed
(along with execute-register), but unless
such is singular it might be counted as a
different type register rather than a SPR.

timca...@aol.com

Feb 20, 2013, 5:00:00 PM, to nm...@cam.ac.uk
On Tuesday, February 19, 2013 2:48:19 AM UTC-5, nm...@cam.ac.uk wrote:
> Sometimes. The accumulator was often the ONLY 'register' and
> was also used for indexing.

And in other systems there was no index register; some kind
of indirect addressing was used for that purpose. I think
Whirlwind worked this way. Lots of spin-offs from that
architecture (PDP-8, HP 1000, CDC PPs on the 6000s).

- Tim

Quadibloc

Feb 20, 2013, 10:25:16 PM
On Feb 20, 3:00 pm, timcaff...@aol.com wrote:

> And in other systems there was no index register, some kind
> of indirect addressing was used for that purpose. I think
> Whirlwind worked this way.  Lots of spin-offs from that
> architecture (PDP-8, HP 1000, CDC PPs on the 6000s).

I tend to think of the PDP-8 and many other machines as spin-offs from
the IBM 704 instead. Yes, it had an index register - but it also had
indirect addressing.

John Savard

ken...@cix.compulink.co.uk

Feb 21, 2013, 6:41:19 AM
In article <b881e9a2-1040-422f...@googlegroups.com>,
timca...@aol.com () wrote:

> And in other systems there was no index register,

When the 8080 was designed it added direct addressing; the 8008 could
only address memory via the HL register. The Z80 added two index
registers. Of course a lot also depended on the instruction set. Someone
has claimed that the index register on the 6809 was far more useful than
the Z80 ones due to additional addressing modes.

Ken Young

timca...@aol.com

Feb 21, 2013, 12:57:22 PM
I believe a) Whirlwind predated the 704, and b) Ken Olsen worked on
Whirlwind (as a graduate student, I think).

- Tim

Tom Gardner

Feb 21, 2013, 6:35:41 PM
Even the 6800's index register was better than the Z80's! I remember
trying to code a doubly linked list, and finding that it was faster
and used fewer bytes to use the HL register to point to the nodes.
And you couldn't push the IX register onto the stack except bytewise
via another register!

The Z80 had an easier hardware interface than the 6800, which
appealed to the hardware engineers that built computers. The 6809
rectified that but was too late in the marketplace.

bert

Feb 23, 2013, 4:45:10 AM, to an...@spam.comp-arch.net
On Tuesday, February 19, 2013 3:15:08 AM UTC+1, Andy (Super) Glew wrote:
> Inviting additions - what are your favorite examples
> of special purpose registers?

The 'excess constants' register on the LEO 3. Loading it with
hex 66666 converted the arithmetic to packed decimal. Other
suitable values converted it to UK pounds, shillings and pence;
hours, minutes and seconds; or tons, hundredweights and stones.
--

Ivan Godard

Feb 23, 2013, 5:22:37 AM
Oh really! Can you link to an engineering description? Google is
getting me only the business/historical story, nothing useful technically.

Ivan

Terje Mathisen

Feb 23, 2013, 8:01:03 AM
Seems rather obvious:

The 'excess constant' was added to the results of the regular add, with
the per-nybble carries generated by this operation instead of just the
A+B+Incoming_Carry terms.

I.e. each nybble adder would consist of two parts, one to generate the
regular result and the other would merge in the excess part to generate
any outgoing carry.
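
In software the same nybble trick looks something like this - the
standard 0x66666666 method for eight packed-BCD digits, offered as a
sketch rather than a claim about the LEO hardware. The digits sit in
the low 32 bits of a 64-bit word so the carry out of the top digit
stays visible.

#include <stdint.h>

uint64_t bcd_add8(uint64_t a, uint64_t b, int *carry_out)
{
    uint64_t t1 = a + 0x66666666ULL;               /* pre-bias every digit by 6    */
    uint64_t t2 = t1 + b;                          /* one plain binary add         */
    uint64_t c  = (t2 ^ t1 ^ b) & 0x111111110ULL;  /* carries out of each digit    */
    uint64_t m  = ~c & 0x111111110ULL;             /* digits that did NOT carry    */
    uint64_t fix = (m >> 2) | (m >> 3);            /* 6 in each such digit         */
    if (carry_out)
        *carry_out = (int)((c >> 32) & 1);         /* decimal carry out of digit 7 */
    return (t2 - fix) & 0xFFFFFFFFULL;
}

E.g. bcd_add8(0x19, 0x26, NULL) returns 0x45.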

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Ivan Godard

Feb 23, 2013, 12:43:34 PM
Not obvious to me.

The well-known 6666 technique for packed decimal puts one digit in each
four-bit nibble, and adds 6 to generate the carry as you say. One could
do the same for h/m/s (for example) with a low order nibble size of 6
bits and adding 4 for the carry. And the nibble is 4 bits for l/s/d with
an added 4 in the low order, and 5 bits with an added 12 in the next
nibble. As you say, it's obvious how to do any one of those.

However, I don't see how a single piece of hardware does *all* of them
because not only does the constant change, the nibble size does too.
Hence which bit of the adder is to supply the carry differs among the
cases. I suppose you could mux to select the right bit, but that's more
than just picking a constant as you implied was the case.

Or you could select the widest nibble of all interesting cases (6 bits
in your examples) and adjust the constant accordingly. However, that
makes the packed decimal constant be 54-54-54, not 6-6-6 as you gave;
such a design wastes a lot of adder bits; and you need circuitry to
explode 4-bit packed decimal up to the 6-bit adder width.

There are probably other ways it could be done too, especially if the
adder was bit sequential rather than bit parallel, as seems likely for
the day. I'd like to know how they *actually* did it, so if you can find
a tech doc source please cite it.

Ivan

Terje Mathisen

Feb 23, 2013, 2:24:26 PM
Ivan Godard wrote:
> On 2/23/2013 5:01 AM, Terje Mathisen wrote:
>> Seems rather obvious:
>>
>> The 'excess constant' was added to the results of the regular add, with
>> the per-nybble carries generated by this operation instead of just the
>> A+B+Incoming_Carry terms.
>>
>> I.e. each nybble adder would consist of two parts, one to generate the
>> regular result and the other would merge in the excess part to generate
>> any outgoing carry.
>
> Not obvious to me.
>
> The well-known 6666 technique for packed decimal puts one digit in each
> four-bit nibble, and adds 6 to generate the carry as you say. One could
> do the same for h/m/s (for example) with a low order nibble size of 6
> bits and adding 4 for the carry. And the nibble is 4 bits for l/s/d with
> an added 4 in the low order, and 5 bits with an added 12 in the next
> nibble. As you say, it's obvious how to do any one of those.
>
> However, I don't see how a single piece of hardware does *all* of them
> because not only does the constant change, the nibble size does too.
> Hence which bit of the adder is to supply the carry differs among the
> cases. I suppose you could mux to select the right bit, but that's more
> than just picking a constant as you implied was the case.

Hmmm...

I would do mm:ss as 4 bcd digits, with 0xA6A6 in the excess register,
but the hours part is harder: How do we get from 23 to 00 + carry?

(Just doing AM/PM + 12-hour clocks is of course easy to fit within a
nybble scenario with 0xE4 as the excess but I can't see any (obvious)
way to do it for a 24-hour clock.)

Afair the old UK system had 20 shillings and 12 pence, right?

That would also fit with a single nybble for the pence count, with 0xE64
as the excess.

With 8 stones in a hundredweight and 14 pounds in a stone we need just
two nybbles for those two parts and an excess of 0x82.

> Or you could select the widest nibble of all interesting cases (6 bits
> in your examples) and adjust the constant accordingly. However, that
> makes the packed decimal constant be 54-54-54, not 6-6-6 as you gave;
> such a design wastes a lot of adder bits; and you need circuitry to
> explode 4-bit packed decimal up to 6-bit adder.
>
> There are probably other ways it could be done too, especially if the
> adder was bit sequential rather than bit parallel, as seems likely for
> the day. I'd like to know how they *actually* did it, so if you can find
> a tech doc source please cite it.

I can't see how to do it for arbitrary step boundaries, but at least for
those imperial values I would use one or two nybbles for each part and
adjust the excess amounts.

This does imply that you can't do all the calculations in BCD all the
time, i.e. 13 stones would be stored in a single nybble as 0xC.
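
To make that concrete, here is a rough C sketch of a per-field excess
add for shillings and pence only, using the 0xE64 excess above (my own
reconstruction, not a description of LEO circuitry). The layout is one
nybble of pence (radix 12), one of shilling units (radix 10), and one
of shilling tens (radix 2); a carry out of the top field is a carry
into the pounds.

#include <stdint.h>

uint32_t sd_add(uint32_t a, uint32_t b, int *pound_carry)
{
    uint32_t t1 = a + 0xE64u;                /* bias each field by (16 - radix)   */
    uint32_t t2 = t1 + b;                    /* one plain binary add              */
    uint32_t c  = (t2 ^ t1 ^ b) & 0x1110u;   /* which fields produced a carry     */
    uint32_t m  = (~c & 0x1110u) >> 4;       /* one bit per field that did not    */
    uint32_t fix = 0xE64u & (m * 0xFu);      /* remove the bias from those fields */
    if (pound_carry)
        *pound_carry = (int)((c >> 12) & 1);
    return (t2 - fix) & 0xFFFu;
}

E.g. 2s 6d + 3s 7d: sd_add(0x026, 0x037, NULL) returns 0x061, i.e. 6s 1d.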

Ivan Godard

Feb 23, 2013, 2:58:32 PM
Yes, all of that - but how did *they* do it?

Ivan

Terje Mathisen

Feb 23, 2013, 5:45:56 PM
That would be nice to know, but only after trying to find a nice way
to do it ourselves. :-)

Have you thought of any interesting tricks?

Ivan Godard

Feb 23, 2013, 6:01:05 PM
On 2/23/2013 2:45 PM, Terje Mathisen wrote:
> Ivan Godard wrote:
>> On 2/23/2013 11:24 AM, Terje Mathisen wrote:
>>> I can't see how to do it for arbitrary step boundaries, but at least for
>>> those imperial values I would use one or two nybbles for each part and
>>> adjust the excess amounts.
>>>
>>> This does imply that you can't do all the calculations in BCD all the
>>> time, i.e. 13 stones would be stored in a single nybble as 0xC.
>>
>> Yes, all of that - but how did *they* do it?
>
> That would be nice to know, but only after trying to find out a nice way
> to it ourselves. :-)
>
> Have you thought of any interesting tricks?

If I had then I wouldn't be banging on you so hard :-)

Ivan


Quadibloc

Feb 23, 2013, 9:44:01 PM
It seems simple enough to me as well.

If loading it with sixes causes decimal arithmetic, then presumably it
controls when carries take place out of groups of four bits.

The comparison would be to excess-three arithmetic. The carries take
place at the right time, but if no carry goes out, the four bits
contain an excess-six number; if one does, the result is normal BCD.

So, given that, in this case, it seems to me that the register would
work as follows:

Addition is binary addition of the two operands and the excess digits
register.

Whenever a group of four bits does not have a carry out of it, the
corresponding four bits from the excess digits register are subtracted
out again.

Much simpler than the convert instructions on the 7090...

John Savard

Ivan Godard

Feb 23, 2013, 10:43:56 PM
Yes, of course; so far so good. Now how do you make the *same* adder add
hours/minutes/seconds using Binary Code Sexagesimal with 8 bits per
digit? And then also make it add pounds/shillings/pence using Binary
Coded Base20/12, with five bits for one digit and four for another?

I can't figure it out, or at least haven't yet, so if you can then
please enlighten me. The best I have requires a set of carry-bit select
muxes that you tell what base you are using, which is a good deal more
complicated than the simple "choose a constant" that Terje recalled.

Ivan

EricP

Feb 24, 2013, 12:57:21 AM
It took a lot of rummaging about the web (there is actually
a lot of LEO info out there), but I found some info that may be helpful:

Mathematics and Software at Leo Computers
John Gosden, 1995
http://www.cs.man.ac.uk/CCS/res/res17.htm#e

about the design of LEO 3

"At this time I was unhappy about the difficulties of using binary
machines for clerical work, especially the need to convert both
input and output, and the need to match Cobol facilities to handle
digits and characters individually. However I was even more unhappy
with the way decimal machines could not handle sterling.

As a result I conceived and designed the "mixed radix register" scheme
to allow mixed radix arithmetic. This register in the arithmetic unit
could specify a separate radix for each digit position. For example,
in sterling, the pence were represented by one digit with a radix of
12 and shillings by two digits with radices 10 and 2. As I recall,
it used one extra register, and for a typical "add" instruction just
a few extra steps in the microcode. Thus it had little speed penalty.
It was really a very simple scheme to implement. Fantl recently
recalled it as "fiendishly cunning"."

Systems architectures for the LEO III computer
http://www.ourcomputerheritage.org/ccs-L3x2.pdf

Instruction set and instruction times for the LEO III computer.
http://www.ourcomputerheritage.org/L3x3.pdf

enjoy
Eric



Ivan Godard

Feb 24, 2013, 2:58:20 AM
Bingo! Many thanks for your diligent digging.

They didn't do one base20 digit, they did two digits of base 10 and 2,
and likewise (no doubt) two digits (base 10 and 6) for the base60
numbers. So it was really packed BCD blocked into digit groups.

Works fine, although I'm somewhat disappointed - I was hoping somebody
had done real sexagesimal arithmetic! :-)

Incidentally, the IBM/IEEE representation of decimal floating point does
something similar, storing base1000 numbers in 10-bit fields called
"declets".

Ivan

Terje Mathisen

Feb 24, 2013, 6:09:28 AM
Ivan Godard wrote:
> On 2/23/2013 9:57 PM, EricP wrote:
>> As a result I conceived and designed the "mixed radix register" scheme
>> to allow mixed radix arithmetic. This register in the arithmetic unit
>> could specify a separate radix for each digit position. For example,
>> in sterling, the pence were represented by one digit with a radix of
>> 12 and shillings by two digits with radices 10 and 2. As I recall,

I.e. exactly as I guessed in my last post: Allowing any radix up to 16
for BCD coded numbers.

:-)

>> it used one extra register, and for a typical "add" instruction just
>> a few extra steps in the microcode. Thus it had little speed penalty.
>> It was really a very simple scheme to implement. Fantl recently
>> recalled it as "fiendishly cunning"."
>>
>> Systems architectures for the LEO III computer
>> http://www.ourcomputerheritage.org/ccs-L3x2.pdf
>>
>> Instruction set and instruction times for the LEO III computer.
>> http://www.ourcomputerheritage.org/L3x3.pdf
>
> Bingo! Many thanks for your diligent digging.
>
> They didn't do one base20 digit, they did two digits of base 10 and 2,
> and likewise (no doubt) two digits (base 10 and 6) for the base60
> numbers. So it was really packed BCD blocked into digit groups.

Right.

>
> Works fine, although I'm somewhat disappointed - I was hoping somebody
> had done real sexagesimal arithmetic! :-)

That would have required more hw and a much more complicated conversion
to printable (COBOL) ascii.
>
> Incidentally, the IBM/IEEE representation of decimal floating point does
> something similar, storing base1000 numbers in 10-bit fields called
> "declets".

Yes and no:

Decimal FP has two alternative encodings for the 10-bit groups, either
mod 1000 or a really interesting/funky modified BCD which uses 9 bits
for 3 BCD digits of 0-7 and the 10th bit to signal alternate encodings
when one or more digit is 8 or 9.

This encoding has been selected in order to allow conversion to/from a
3-digit (12-bit) BCD format with a minimum amount of hw.

In sw the best approach is a pair of 10/12-bit lookup tables, and then
to work in base1000 internally, unless you want it to be fast:

In that case I would use an internal format with extra storage so I
could skip the digit-aligned normalization step after every operation:

Store the numbers as binary integers with a decimal exponent, do any
required scaling with reciprocal muls.
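
I.e. something along these lines (the names are mine, and overflow
handling is omitted):

#include <stdint.h>

struct dec_internal {
    int64_t coeff;     /* binary integer significand */
    int32_t exp10;     /* value = coeff * 10^exp10   */
};

/* multiply with no per-digit work at all; rescaling and rounding are
   deferred until the value has to leave this internal format */
static struct dec_internal dec_mul(struct dec_internal a, struct dec_internal b)
{
    struct dec_internal r = { a.coeff * b.coeff, a.exp10 + b.exp10 };
    return r;
}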

nm...@cam.ac.uk

Feb 24, 2013, 6:37:36 AM
In article <809pv9-...@ntp-sure.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Ivan Godard wrote:
>>
>> Incidentally, the IBM/IEEE representation of decimal floating point does
>> something similar, storing base1000 numbers in 10-bit fields called
>> "declets".
>
>Yes and no:
>
>Decimal FP have two alternative encodings for the 10-bit groups, either
>mod 1000 or a really interesting/funky modified BCD which uses 9 bits
>for 3 BCD digits of 0-7 and the 10th bit to signal alternate encodings
>when one or more digit is 8 or 9.
>
>This encoding has been selected in order to allow conversion to/from a
>3-digit (12-bit) BCD format with a minimum amount of hw.

Er, no. That was one iteration, if I recall. The final version
has the declet one, and a modified binary form where MOST of the
mantissa is in plain binary. My understanding is that the latter
was wanted by Intel (the other supporter), which has now backed off.

>In sw the best approach is a pair of 10/12-bit lookup tables, and then
>to work in base1000 internally, unless you want it to be fast:
>
>In that case I would use an internal format with extra storage so I
>could skip the digit-aligned normalization step after every operation:
>
>Store the numbers as binary integers with a decimal exponent, do any
>required scaling with reciprocal muls.

Or just throw the whole, horrible mess out of the window, use a
binary format, and convert when needed. Decimal floating-point
is a marketing solution to a known technical non-requirement.


Regards,
Nick Maclaren.

Ivan Godard

Feb 24, 2013, 11:49:34 AM
On 2/24/2013 3:09 AM, Terje Mathisen wrote:
> Ivan Godard wrote:


<snip>

>>
>> Incidentally, the IBM/IEEE representation of decimal floating point does
>> something similar, storing base1000 numbers in 10-bit fields called
>> "declets".
>
> Yes and no:
>
> Decimal FP have two alternative encodings for the 10-bit groups, either
> mod 1000 or a really interesting/funky modified BCD which uses 9 bits
> for 3 BCD digits of 0-7 and the 10th bit to signal alternate encodings
> when one or more digit is 8 or 9.
>
> This encoding has been selected in order to allow conversion to/from a
> 3-digit (12-bit) BCD format with a minimum amount of hw.
>
> In sw the best approach is a pair of 10/12-bit lookup tables, and then
> to work in base1000 internally, unless you want it to be fast:
>
> In that case I would use an internal format with extra storage so I
> could skip the digit-aligned normalization step after every operation:
>
> Store the numbers as binary integers with a decimal exponent, do any
> required scaling with reciprocal muls.

What you describe for software is essentially the Intel/IEEE version of
decimal float. Right - IEEE has *two* incompatible representations for
decimal float, the IBM and Intel versions. I was on the IEEE committee
that did the new 754 standard, and that outcome was after the worst case
of standards-committee packing I have ever seen.

Ivan

Ivan Godard

Feb 24, 2013, 12:34:18 PM
Not so. It is a very technical requirement, with real world consequences.

Many business computations are required *by law* to be exact and to have
exactly specified rounding, in decimal to match the real world of pounds
and dollars. Because many quantities that are exact in decimal are
non-terminating fractions in binary, this technical requirement cannot
be satisfied as you suggest, because round-off error in converted binary
will not provide the same result for a computation that is exact in
decimal. Scientific applications cannot tolerate computation that is
*almost* accurate but gets the rounding wrong, and the same is true of
business applications.

As for the impact, we on the committee (mostly HPC types with little
business background) wondered if there was really a need to upgrade 854
as part of our work. Then we saw measurements of real-world database
applications. If a column is typed "numeric" then the database keeps the
values in decimal, and the programs - yes, often COBOL - do the
arithmetic in decimal too. Over a sample of several thousand
applications, umptillion CPU cycles, and billions of dollars of
hardware, some 31% of all processor cycles were spent in *decimal
arithmetic emulation routines*.

Frankly, we had unknowingly shared the snobbery of scientific
programmers and CS types. Our eyes were opened, and the revised standard
merges the two radices.

Interesting to me, it turns out that decimal floating point has better
behavior than binary for many scientific codes as well. The reasons are
esoteric and this is not my expertise, but it boils down to the fact
that IEEE decimal is not really "floating" point in the sense that
binary is. Instead, decimal is really a scaled fixed-point
representation, because it doesn't do normalization. Consequently, in
decimal 1.0, 1.00, and 1.000 are three different numbers with three
different representations, whereas in binary they are all the same.

Because decimal keeps the quantum (effectively, the number of trailing
zeros) while binary normalizes it away, in decimal you can detect loss
of significance that cannot be discovered in binary absent heroic side
computations that are never done until after the plane has crashed. A
64-bit answer that is garbage (except, maybe, for the leading bit or
two) is no answer, and an alarming number of scientific results display
this behavior in regions of the problem space.

It helps that the standard mandates 64- and 128-bit decimal, and doesn't
support 32-bit; the programmer is less likely to be caught up in the
rush to "wrong answers faster" tradition that dates back at least as far
as Seymour Cray.

My compliments to the members of the COBOL standards committee who were
very helpful, and patient, with us. FWIW, our Mill can be configured
with hardware IEEE decimal, although I doubt there will be a decimal
Mill any time soon, due to those "marketing" reasons you disparage.
Selling to the big-iron database market requires a multi-billion dollar
company's sales force, while the scientific/engineering/floating point
markets buy on the basis of BLAS benchmarks that ignore loss of
significance.

For some truly inspired rants about round-off and significance issues in
binary floating point, Google "William Kahan floating point".

nm...@cam.ac.uk

Feb 24, 2013, 12:37:20 PM
In article <kgdg8u$k8f$1...@dont-email.me>,
I'll see you with the Virtual Terminal Protocol :-) In an attempt
to extend terminal functionality to 'full-screen', it eventually
decided on two options, based on DEC's VT-100 and IBM's 3270.
An implementation was fully conforming with either - and, if you
know them, thinking of points of commonality isn't an easy task.

It sank like the proverbial lead balloon, even among communities
that had been involved in the committee.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

Feb 24, 2013, 1:10:33 PM
In article <kgdisp$39r$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>>
>> Or just throw the whole, horrible mess out of the window, use a
>> binary format, and convert when needed. Decimal floating-point
>> is a marketing solution to a known technical non-requirement.
>
>Not so. It is a very technical requirement, with real world consequences.

No. Decimal FIXED-POINT is the requirement. It doesn't help.

>Many business computations are required *by law* to be exact and to have
>exactly specified rounding, in decimal to match the real world of pounds
>and dollars. Because many quantities that are exact in decimal are
>non-terminating fractions in binary, this technical requirement cannot
>be satisfied as you suggest, because round-off error in converted binary
>will not provide the same result for a computation that is exact in
>decimal. ...

You are mistaken, because you assume that I meant binary floating-
point. I didn't. This requirement has been met since time immemorial
by using scaled integers. Improving the hardware support for fixed-
point emulation would help.
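
In C terms the scaled-integer approach is just this kind of thing (a
toy sketch, not tied to any particular law or package): keep money as
an integer number of pence, do exact integer arithmetic, and only put
the point back in on output.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t price = 1999;            /* 19.99 held exactly as 1999 pence */
    int64_t total = 3 * price;       /* exact: 5997 pence                */
    printf("%lld.%02lld\n",
           (long long)(total / 100), (long long)(total % 100));
    return 0;
}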

Decimal floating-point doesn't cut the mustard for a large number
of reasons. Here are SOME of the reasons:

a) The rounding modes include many that are not in IEEE 754;
some even require carry values.

b) There are also strict rules on precision and overflow,
at specific limits.

c) Many of them constrain multiplication and division; the
former has to be done by a double-precision operation and rounding
to the specified precision, and the latter is too horrible to contemplate
starting from floating-point.

>As for the impact, we on the committee (mostly HPC types with little
>business background) wondered if there was really a need to upgrade 854
>as part of our work. Then we saw measurements of real-world database
>applications. If a column is typed "numeric" then the database keeps the
>values in decimal, and the programs - yes, often COBOL - do the
>arithmetic in decimal too. Over a sample of several thousand
>applications, umptillion CPU cycles, and billions of dollars of
>hardware, some 31% of all processor cycles were spent in *decimal
>arithmetic emulation routines*.

I don't believe that statistic. Waiting for memory and recovering
from glitches is VASTLY higher. But, even if it were true, my
point is that what has been standardised won't meet even Cobol's
basic requirements, let alone those of all of the laws, and it
will STILL require an emulation layer!

>Interesting to me, it turns our that decimal floating point has better
>behavior than binary for many scientific codes as well. The reasons are
>esoteric and this is not my expertise, but it boils down to the fact
>that IEEE decimal is not really "floating" point in the sense that
>binary is. Instead, decimal is really a scaled fixed-point
>representation, because it doesn't do normalization. Consequently, in
>decimal 1.0, 1.00, and 1.000 are three different numbers with three
>different representations, whereas in binary they are all the same.

Well, it is my area, and it isn't. Back in the days when there were
a lot of radices in use (including decimal, though I never personally
used it), the investigations found that binary was marginally better
than anything else. Only marginally.

What you say sounds like a garbling of the numerical differences
between scaled fixed point and floating-point, which was well-known
(to expert numerical analysts) by 1970. And, in general, floating-
point is much better. For the few codes where fixed-point is better,
the IEEE 754 form won't work because of multiplication and division.

Quanta are merely an extension of the old unnormalised floating-
point and that was well-known to be a numerical disaster area,
except in microkernels written by top numerical experts. As I
understand it, IEEE 754 has essentially specified prenormalisation
for multiplication and division, so it does avoid the number one
disaster.

>For some truly inspired rants about round-off and significance issues in
>binary floating point, Google "William Kahan floating point".

Which apply, even more strongly, to decimal floating-point, because
of the "wobbling significance" problem. I can't remember offhand
references to the numerical problems with fixed-point, but they
are in the context of pivoting. Nor can I remember any references
to the binary versus other investigations, though I did a few.


Regards,
Nick Maclaren.

Terje Mathisen

Feb 24, 2013, 2:10:58 PM
Ivan Godard wrote:
> Interesting to me, it turns our that decimal floating point has better
> behavior than binary for many scientific codes as well. The reasons are
> esoteric and this is not my expertise, but it boils down to the fact
> that IEEE decimal is not really "floating" point in the sense that
> binary is. Instead, decimal is really a scaled fixed-point
> representation, because it doesn't do normalization. Consequently, in
> decimal 1.0, 1.00, and 1.000 are three different numbers with three
> different representations, whereas in binary they are all the same.

This means that you store those three numbers as 10e-1, 100e-2 and
1000e-3, right?

> Because decimal keeps the quantum (effectively, the number of trailing
> zeros) while binary normalizes it away, in decimal you can detect loss
> of significance that cannot be discovered in binary absent heroic side
> computations that are never done until after the plane has crashed. A
> 64-bit answer that is garbage (except, maybe, for the leading bit or
> two) is no answer, and an alarming number of scientific results display
> this behavior in regions of the problem space.

There have of course been non-IEEE fp formats which did sort of the same
thing in binary, i.e. the mantissa was always an integer.
>
> It helps that the standard mandates 64- and 128-bit decimal, and doesn't
> support 32-bit; the programmer is less likely to be caught up in the
> rush to "wrong answers faster" tradition that dates back at least as far
> as Seymour Cray.

:-(

I still like truncate-to-zero instead of denormalization, mostly because
it makes for a cleaner implementation.

If you ever get into these ranges, and depend upon gradual underflow,
then I _really_ hope you know what you are doing.

nm...@cam.ac.uk

Feb 24, 2013, 2:24:58 PM
In article <kgdl2p$1jb$1...@needham.csi.cam.ac.uk>, <nm...@cam.ac.uk> wrote:
>
>Which apply, even more strongly, to decimal floating-point, because
>of the "wobbling significance" problem. I can't remember offhand
>references to the numerical problems with fixed-point, but they
>are in the context of pivoting. Nor can I remember any references
>to the binary versus other investigations, though I did a few.

Stupid of me. I should have thought. Try "wobbling precision".
Nick Higham may well also describe the floating- versus fixed-
point numerical issues.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

Feb 24, 2013, 2:33:49 PM
In article <275qv9-...@ntp-sure.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>This means that you store those three numbers as 10e-1, 100e-2 and
>1000e-3, right?

No, but close enough :-)

>I still like truncate-to-zero instead of denormalization, mostly because
>it makes for a cleaner implementation.

And, generally, cleaner numerical analysis.

>If you ever get into these ranges, and depend upon gradual underflow,
>then I _really_ hope you know what you are doing.

The advantage of gradual underflow is that it has slightly fewer
'gotchas' than hard underflow. But it still has some that hard
underflow doesn't have. Algorithms that work on one may not work
on the other, and that catches quite a few people (including the
authors of LAPACK!)

And, of course, any base other than two has a gotcha that base 2
doesn't have: neither (a+b)/2 nor (a/2+b/2) are guaranteed to
be in the range [a,b] any longer. (For example, with three
significant decimal digits, a = 6.01 and b = 6.03 give a+b
rounded to 12.0, so the computed midpoint is 6.00, below a.)


Regards,
Nick Maclaren.

Ivan Godard

Feb 24, 2013, 3:23:30 PM
On 2/24/2013 10:10 AM, nm...@cam.ac.uk wrote:
> In article <kgdisp$39r$1...@dont-email.me>,
> Ivan Godard <iv...@ootbcomp.com> wrote:
>>>
>>> Or just throw the whole, horrible mess out of the window, use a
>>> binary format, and convert when needed. Decimal floating-point
>>> is a marketing solution to a known technical non-requirement.
>>
>> Not so. It is a very technical requirement, with real world consequences.
>
> No. Decimal FIXED-POINT is the requirement. It doesn't help.

Don't be misled by the names of things. IEEE decimal is in fact decimal
fixed point.

>
>> Many business computations are required *by law* to be exact and to have
>> exactly specified rounding, in decimal to match the real world of pounds
>> and dollars. Because many quantities that are exact in decimal are
>> non-terminating fractions in binary, this technical requirement cannot
>> be satisfied as you suggest, because round-off error in converted binary
>> will not provide the same result for a computation that is exact in
>> decimal. ...
>
> You are mistaken, because you assume that I meant binary floating-
> point. I didn't. This requirement has been met since time immemorial
> by using scaled integers. Improving the hardware support for fixed-
> point emulation would help.

Scaled integers (as built on ordinary integer arithmetic) suffer from
the problem that scale must be maintained statically, whereas one wants
(and many algorithms require) scale to be maintained dynamically. The
intent of the quanta parts of IEEE decimal is to permit automatic
scaling, as binary floating point does, while preserving significance
information. Loss of significance is detectable by the hardware in IEEE
decimal, and ignored in IEEE binary.

You are absolutely right that better hardware support for fixed point
would be a good thing. However, neither the IEEE charter nor the
practicalities of legacy languages and applications would permit a
redefinition of integer. Consequently fixed-point support was defined in
the only domain in which the languages and practice already supported
extensive fixed-point, namely the commercial computing world.

You will find all the "hardware support for fixed point emulation" you
want in IEEE decimal.

>
> Decimal floating-point doesn't cut the mustard for a large number
> of reasons. Here are SOME of the reasons:
>
> a) The rounding modes include many that are not in IEEE 754;
> some even require carry values.

International accounting standards that long predate computers mandate
so-called "banker's rounding" for currency calculations; in many
countries, including your own, this requirement is also by statute. This
rounding mode is essential if hardware decimal is to be usable at all.
It is of course meaningless in binary where the rounding will be wrong
anyway. We considered adding it to binary anyway simply for the
symmetry, but decided that there was no reason to burden the hardware
guys with a pointless mode that would potentially break binary codes
that assume the existing set of modes.
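
For the record, a minimal sketch of what round-half-to-even does,
written on scaled integers rather than IEEE decimal and restricted to
non-negative values:

#include <stdint.h>

/* divide numer by denom, rounding exact halves to the even quotient;
   assumes numer >= 0 and denom > 0 */
int64_t div_half_even(int64_t numer, int64_t denom)
{
    int64_t q = numer / denom;
    int64_t r = numer % denom;
    if (2 * r > denom || (2 * r == denom && (q & 1)))
        q++;
    return q;
}

E.g. div_half_even(12345, 10) gives 1234 and div_half_even(12355, 10)
gives 1236 - both trailing halves go to the even neighbour.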


> b) There are also strict rules on precision and overflow,
> at specific limits.

You would prefer no rules?

> c) Many of them constrain multiplication and division; the
> former has to be done by a double precision operating and rounding
> to s precision, and the latter is too horrible to contemplate
> starting from floating-point.

There are no "many of them". There is one standard, and almost nothing
is "up to the implementation". Implementation-defined things include the
bitwise representation of values (which you cannot detect with decimal
operations; you'd have to cast to integer and look at the bits). The
semantics of trap handlers is also implementation-defined, but that's
true of binary FP as well, and the definitions and facilities for traps
are identical in the two radices.

>
>> As for the impact, we on the committee (mostly HPC types with little
>> business background) wondered if there was really a need to upgrade 854
>> as part of our work. Then we saw measurements of real-world database
>> applications. If a column is typed "numeric" then the database keeps the
>> values in decimal, and the programs - yes, often COBOL - do the
>> arithmetic in decimal too. Over a sample of several thousand
>> applications, umptillion CPU cycles, and billions of dollars of
>> hardware, some 31% of all processor cycles were spent in *decimal
>> arithmetic emulation routines*.
>
> I don't believe that statistic. Waiting for memory and recovering
> from glitches is VASTLY higher. But, even if it were true, my
> point is that what has been standardised won't meet even Cobol's
> basic requirements, let alone those of all of the laws, and it
> will STILL require an emulation layer!

Well, I didn't do the studies, but I believe the counts were of issued
instructions and hence would not have included memory stalls that were
not hidden by the OOO. I *have* looked at the emulation codes that were
consuming the cycles, and I believe the numbers for apps in which
essentially all the computation is being done in decimal, which is the
routine case. YMMV.

I *can* address the COBOL question. The IEEE committee worked very
closely with the COBOL committee on the new standard, and they said they
were completely satisfied. They had to make one important change to the
COBOL standard, from mandating 34 digit precision to only 33-digit, so
as to get something that would fit in a quad (the former size dated from
when numbers were BCD strings on a 1401), but they were happy to do so
to get the added range and be free from the task of explicitly coding
the decimal point maintenance.

>> Interesting to me, it turns our that decimal floating point has better
>> behavior than binary for many scientific codes as well. The reasons are
>> esoteric and this is not my expertise, but it boils down to the fact
>> that IEEE decimal is not really "floating" point in the sense that
>> binary is. Instead, decimal is really a scaled fixed-point
>> representation, because it doesn't do normalization. Consequently, in
>> decimal 1.0, 1.00, and 1.000 are three different numbers with three
>> different representations, whereas in binary they are all the same.
>
> Well, it is my area, and it isn't. Back in the days when there were
> a lot of radices in use (including decimal, though I never personally
> used it), the investigations found that binary was marginally better
> than anything else. Only marginally.
>
> What you say sounds like a garbling of the numerical differences
> between scaled fixed point and floating-point, which was well-known
> (to expert numerical analysts) by 1970. And, in general, floating-
> point is much better. For the few codes where fixed-point is better,
> the IEEE 754 form won't work because of multiplication and division.

I'm not sure why you single out multiplication and division, which work
just fine on IEEE decimal.

There is only one case I know of in which a standard that preserves
significance requires added thought (and code). That's in algorithms
that actually *increase* significance, although they don't look like it.
A Newton-Raphson sqrt is adding 10 bits or so of significance with
every iteration, yet in standard decimal the quanta will be shrinking.
You have to use the (standard) operations to assert the new and correct
significance of the result.

Of course, I doubt many COBOL finance programs are doing
Newton-Raphson. :-)


> Quanta are merely an extension of the old unnormalised floating-
> point and that was well-known to be a numerical disaster area,
> except in microkernels written by top numerical experts. As I
> understand it, IEEE 754 has essentially specified prenormalisation
> for multiplication and division, so it does avoid the number one
> disaster.

Unnormalized floating point with improper rounding is a disaster for
computations that approximate the real line; you tend to get cumulative
roundoff errors. Used to be that the same occurred in binary FP too. That
is fixed in the standard by requiring that IEEE decimal offer
"round-to-nearest-even" just like IEEE binary; use that and there is no
problem with unnormalization. The applications for IEEE decimal do *not*
approximate the real line, as you will discover if you walk into a bank
and try to cash a check for 2pi pounds. :-)

There is no specification for prenormalization; not sure where you got
the idea. Instead the standard uses the same terminology as it uses for
binary: computation shall be *as if* the operation were performed
exactly, and the result is then rounded as specified by the mode. I'm
reasonably confident that doing prenormalization followed by a hardware
binary operation and then post un-normalization would not conform to
standard; I do not recommend that you buy any machine that does it that
way.

If you are interested in how to do it right, I suggest you contact Mike
Cowlishaw at IBM Cambridge who has a full software emulation package
that is public domain (and was used as the base for the IBM hardware
implementation). You can download the package at ICU:
http://download.icu-project.org/files/decNumber/

>> For some truly inspired rants about round-off and significance issues in
>> binary floating point, Google "William Kahan floating point".
>
> Which apply, even more strongly, to decimal floating-point, because
> of the "wobbling significance" problem. I can't remember offhand
> references to the numerical problems with fixed-point, but they
> are in the context of pivoting. Nor can I remember any references
> to the binary versus other investigations, though I did a few.

You are complaining that a sewing machine is not a hammer. I doubt many
will use IEEE decimal for LU decomposition. :-)

However, we did consider it apt for stability exploration. Kahan had a
technique that he had found very useful in heavy numeric applications as
a way to get a sense of the "goodness" of a result. He would run the
application twice, once with rounding mode set to round-up (toward
positive infinity) and once to round-down. The difference between the
two results gave an approximation to the significance. He had quite a
few harmless-looking examples where the difference was of the same
magnitude as the result itself.
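
The trick is easy to try on any IEEE binary machine; a rough sketch
(the workload here is just a stand-in, and FENV_ACCESS support varies
by compiler):

#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

static double work(void)
{
    double s = 0.0;
    for (int i = 1; i <= 1000000; i++)
        s += 1.0 / i;                 /* toy workload: a harmonic sum */
    return s;
}

int main(void)
{
    fesetround(FE_UPWARD);   double hi = work();
    fesetround(FE_DOWNWARD); double lo = work();
    fesetround(FE_TONEAREST);
    printf("spread = %g\n", hi - lo);  /* a rough significance estimate */
    return 0;
}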

Applied here, one could run an existing app again in decimal and then
try to figure out why the quantum had shrunk to zero :-)


Ivan Godard

Feb 24, 2013, 3:59:00 PM
On 2/24/2013 11:10 AM, Terje Mathisen wrote:
> Ivan Godard wrote:
>> Interesting to me, it turns our that decimal floating point has better
>> behavior than binary for many scientific codes as well. The reasons are
>> esoteric and this is not my expertise, but it boils down to the fact
>> that IEEE decimal is not really "floating" point in the sense that
>> binary is. Instead, decimal is really a scaled fixed-point
>> representation, because it doesn't do normalization. Consequently, in
>> decimal 1.0, 1.00, and 1.000 are three different numbers with three
>> different representations, whereas in binary they are all the same.
>
> This means that you store those three numbers as 10e-1, 100e-2 and
> 1000e-3, right?

yes

>> Because decimal keeps the quantum (effectively, the number of trailing
>> zeros) while binary normalizes it away, in decimal you can detect loss
>> of significance that cannot be discovered in binary absent heroic side
>> computations that are never done until after the plane has crashed. A
>> 64-bit answer that is garbage (except, maybe, for the leading bit or
>> two) is no answer, and an alarming number of scientific results display
>> this behavior in regions of the problem space.
>
> There has of course been non-ieee fp formats which did sort of the same
> in binary, i.e. the mantissa was always an integer.

And were also best thought of as scaled-integer rather than floating point.

>> It helps that the standard mandates 64- and 128-bit decimal, and doesn't
>> support 32-bit; the programmer is less likely to be caught up in the
>> rush to "wrong answers faster" tradition that dates back at least as far
>> as Seymour Cray.
>
> :-(
>
> I still like truncate-to-zero instead of denormalization, mostly because
> it makes for a cleaner implementation.
>
> If you ever get into these ranges, and depend upon gradual underflow,
> then I _really_ hope you know what you are doing.

Bill Kahan, who invented gradual underflow, certainly knew what he was
doing. His was the last generation of serious numerical analysts,
though. Nick teaches students who need to know that FP is not "real",
and I bet he'd tell you that the students aren't interested.

The idea behind gradual underflow was to have the density of
representable numbers be similar around zero to the density further out.
However, when I told him he could have FTZ quad for the cost of denormal
single, Kahan himself told me he'd take the quad.
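
A tiny illustration of what gradual underflow preserves (this assumes
IEEE doubles with subnormals; under flush-to-zero the second value
simply becomes zero):

#include <stdio.h>
#include <float.h>

int main(void)
{
    double tiny = DBL_MIN;                  /* smallest normal double          */
    printf("%g %g\n", tiny, tiny / 16.0);   /* the second value is subnormal   */
    printf("%d\n", tiny / 16.0 > 0.0);      /* 1 with gradual underflow, 0 FTZ */
    return 0;
}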


Ivan Godard

Feb 24, 2013, 4:02:05 PM
That's a generic problem. Except in base 10, neither (a+b)/10 nor (a/10
+ b/10) are guaranteed to be in the range [a,b] either.

nm...@cam.ac.uk

Feb 24, 2013, 4:05:02 PM
In article <kgdsq1$7ck$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>On 2/24/2013 10:10 AM, nm...@cam.ac.uk wrote:
>> In article <kgdisp$39r$1...@dont-email.me>,
>> Ivan Godard <iv...@ootbcomp.com> wrote:
>>>>
>>>> Or just throw the whole, horrible mess out of the window, use a
>>>> binary format, and convert when needed. Decimal floating-point
>>>> is a marketing solution to a known technical non-requirement.
>>>
>>> Not so. It is a very technical requirement, with real world consequences.
>>
>> No. Decimal FIXED-POINT is the requirement. It doesn't help.
>
>Don't be misled by the names of things. IEEE decimal is in fact decimal
>fixed point.

Hmm. Not as I read the standard. It's a truly ghastly kludge
that attempts to be fixed-point to people who want that and
floating-point to people who want that.

>> You are mistaken, because you assume that I meant binary floating-
>> point. I didn't. This requirement has been met since time immemorial
>> by using scaled integers. Improving the hardware support for fixed-
>> point emulation would help.
>
>Scaled integers (as built on ordinary integer arithmetic) suffer from
>the problem that scale must be maintained statically, whereas one wants
>(and many algorithms require) scale to be maintained dynamically.

That is, after all, why floating-point was adopted. My first
computing book was Modern Computing Methods :-) However, my point
is that those business requirements you refer to positively want
static scaling, because the laws require primarily a fixed
precision, secondarily rounding, and tertially overflow.

>The
>intent of the quanta parts of IEEE decimal is to permit automatic
>scaling, as binary floating point does, while preserving significance
>information. Loss of significance is detectable by the hardware in IEEE
>decimal, and ignored in IEEE binary.

I know :-(

Attempting to be all things to all men usually ends up with meeting
none of the requirements.

>You are absolutely right that better hardware support for fixed point
>would be a good thing. However, neither the IEEE charter nor the
>practicalities of legacy languages and applications would permit a
>redefinition of integer.

What does fixed-point arithmetic have to do with integer arithmetic?

>You will find all the "hardware support for fixed point emulation" you
>want in IEEE decimal.

I am sorry, but no. Yes, I could use it, but it doesn't help to
actually get it right. I would still need to emulate. And, if you
have ever tried to reverse the double rounding effect, you will
know what foul programming is :-(

>> Decimal floating-point doesn't cut the mustard for a large number
>> of reasons. Here are SOME of the reasons:
>>
>> a) The rounding modes include many that are not in IEEE 754;
>> some even require carry values.
>
>International accounting standards that long predate computers mandate
>so-called "banker's rounding" for currency calculations; in many
>countries, including your own, this requirement is also by statute. This
>rounding mode is essential if hardware decimal is to be usable at all.

That is merely one rounding of many specified in various statutes.
So emulation is needed.

>> b) There are also strict rules on precision and overflow,
>> at specific limits.
>
>You would prefer no rules?

But they aren't the same rules as IEEE 754 decimal provides!
So emulation is needed.

>> c) Many of them constrain multiplication and division; the
>> former has to be done by a double precision operating and rounding
>> to s precision, and the latter is too horrible to contemplate
>> starting from floating-point.
>
>There are no "many of them". There is one standard, and almost nothing
>is "up to the implementation".

I was referring to the laws.

>Well, I didn't do the studies, but I believe the counts were of issued
>instructions and hence would not have included memory stalls that were
>not hidden by the OOO. I *have* looked at the emulation codes that were
>consuming the cycles, and I believe the numbers for apps in which
>essentially all the computation is being done in decimal, which is the
>routine case. YMMV.

So it's 31% of a very small percentage. Yes, that makes sense.

>I *can* address the COBOL question. ...

I am not going to recheck the Cobol and IEEE 754 standards again,
so will accept your statement. I could have made an error when
I did.

>> What you say sounds like a garbling of the numerical differences
>> between scaled fixed point and floating-point, which was well-known
>> (to expert numerical analysts) by 1970. And, in general, floating-
>> point is much better. For the few codes where fixed-point is better,
>> the IEEE 754 form won't work because of multiplication and division.
>
>I'm not sure why you single out multiplication and division, which work
>just fine on IEEE decimal.

Because they don't do fixed-point arithmetic - they will rescale
to preserve precision.
>
>There is only one case I know of in which a standard that preserves
>significance requires added thought (and code). That's in algorithms
>that actually *increase* significance, although they don't look like it.
>A Newton-Raphson sqrt is adding 10 bits or so of significance with
>every iteration, yet in standard decimal the quanta will be shrinking.
>You have to use the (standard) operations to assert the new and correct
>significance of the result.

That's not actually the problem.

>Unnormalized floating point with improper rounding is a disaster for
>computations that approximate the real line; you tend to get cumulative
>roundoff errors. Used to be that the same occured in binary FP too. That
>is fixed in the standard by requiring that IEEE decimal offer
>"round-to-nearest-even" just like IEEE binary; use that and there is no
>problem with unnormalization. The applications for IEEE decimal do *not*
>approximate the real line, as you will discover if you walk into a bank
>and try to cash a check for 2pi pounds. :-)

That was never the problem with unnormalised numbers. I was talking
about ones with 'proper' rounding but no prenormalisation.

>There is no specification for prenormalization; not sure where you got
>the idea. Instead the standard uses the same terminology as it uses for
>binary: computation shall be *as if* the operation were performed
>exactly, and the result is then rounded as specified by the mode.

That's the point.

>I'm
>reasonably confident that doing prenormalization followed by a hardware
>binary operation and then post un-normalization would not conform to
>standard; I do not recommend that you buy any machine that does it that
>way.

I am absolutely certain that it wouldn't. That wasn't my point,
anyway.

>If you are interested in how to do it right, I suggest you contact Mike
>Cowlishaw at IBM Cambridge who has a full software emulation package
>that is public domain (and was used as the base for the IBM hardware
>implementation).

Yes, I know - and I know him, though I didn't know he had moved up
here. I have had this 'debate' with him, too :-(

>> Which apply, even more strongly, to decimal floating-point, because
>> of the "wobbling significance" problem. I can't remember offhand
>> references to the numerical problems with fixed-point, but they
>> are in the context of pivoting. Nor can I remember any references
>> to the binary versus other investigations, though I did a few.
>
>You are complaining that a sewing machine is not a hammer. I doubt many
>will use IEEE decimal for LU decomposition. :-)

No, what I am doing is saying that the claims of extra accuracy for
decimal are known to be the converse of the truth. However, as is well
known, decimal floating-point isn't MUCH less accurate than binary.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

unread,
Feb 24, 2013, 4:08:06 PM2/24/13
to
In article <kgdusj$kpl$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>On 2/24/2013 11:10 AM, Terje Mathisen wrote:
>>
>> I still like truncate-to-zero instead of denormalization, mostly because
>> it makes for a cleaner implementation.
>>
>> If you ever get into these ranges, and depend upon gradual underflow,
>> then I _really_ hope you know what you are doing.
>
>Bill Kahan, who invented gradual underflow, certainly knew what he was
>doing. His was the last generation of serious numerical analysts,
>though. Nick teaches students who need to know that FP is not "real",
>and I bet he'd tell you that the students aren't interested.

I would. But I also knew quite a lot of that generation of serious
numerical analysts, and almost all belonged to the camp that
preferred FTZ.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

unread,
Feb 24, 2013, 4:10:00 PM2/24/13
to
In article <kgdv2b$kpl$2...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>>
>> And, of course, any base other than two has a gotcha that base 2
>> doesn't have: neither (a+b)/2 nor (a/2+b/2) are guaranteed to
>> be in the range [a,b] any longer.
>
>That's a generic problem. Except in base 10, neither (a+b)/10 nor (a/10
>+ b/10) are guaranteed to be in the range [a,b] either.

This is getting ridiculous. Do you REALLY not know the relative
importance? Think root finders and similar algorithms.


Regards,
Nick Maclaren.

Quadibloc

unread,
Feb 24, 2013, 5:12:35 PM2/24/13
to
On Feb 23, 8:43 pm, Ivan Godard <i...@ootbcomp.com> wrote:

> Yes, of course; so far so good. Now how do you make the *same* adder add
> hours/minutes/seconds using Binary Code Sexagesimal with 8 bits per
> digit? And then also make it add pounds/shillings/pence using Binary
> Coded Base20/12, with five bits for one digit and four for another?

But that should be obvious. If "excess-three" does base-ten
arithmetic, "excess-two" does base-twelve arithmetic. So put 4 in the
excess register instead of 6.

You can make each digit have any base you like (as long as it's 16 or
less), and the same rule will apply: add the excess digit, and if
there's no carry, subtract it out again.

So twenty shillings to the pound works simply because 20 has no prime
factors greater than 16.

The tricky thing is, how do you *multiply* using a register like that?
(Answer: multiply by regular binary numbers, getting partial products
by repeatedly adding the register contents to themselves - Russian
Peasant Multiplication, which also serves as a technique often used by
computers for fast exponentiation.)
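
[Illustrative aside, not part of the original post: a minimal Python sketch
of the doubling-and-adding structure John describes, where the only operation
applied to the special-format value is whatever addition that format supports.
The function names and the integer example are assumptions made for
illustration.]

def russian_peasant_multiply(value, n, add):
    # 'add' is the format's own addition (e.g. an excess-constant add);
    # 'n' is an ordinary binary integer multiplier (n >= 1).
    result = None
    while n:
        if n & 1:
            result = value if result is None else add(result, value)
        value = add(value, value)   # double the value by adding it to itself
        n >>= 1
    return result

# With plain integer addition this is just ordinary multiplication:
print(russian_peasant_multiply(37, 13, lambda x, y: x + y))   # 481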

John Savard

Quadibloc

unread,
Feb 24, 2013, 5:16:39 PM2/24/13
to
On Feb 24, 4:37 am, n...@cam.ac.uk wrote:

> Or just throw the whole, horrible mess out of the window, use a
> binary format, and convert when needed.  Decimal floating-point
> is a marketing solution to a known technical non-requirement.

I don't know what decimal floating-point is good for either. However,
it has one potential application, even if it doesn't apply to real
life.

If we were so poor that twenty people had to share one computer to do
spreadsheets at the same time, using a native format for the
arithmetic in spreadsheets (assuming it has to be floating-point, has
to be exact decimal arithmetic, so that naive users won't hurt
themselves) would make them use less resources.

Unfortunately, just like the world doesn't have a decent 5-cent cigar,
it doesn't have computationally weak thin clients that are cheap
enough to justify not giving everyone their own computer.

So decimal floating-point would be a great idea, if this was 1972.

John Savard

Quadibloc

unread,
Feb 24, 2013, 5:25:27 PM2/24/13
to
On Feb 24, 12:58 am, Ivan Godard <i...@ootbcomp.com> wrote:

> They didn't do one base20 digit, they did two digits of base 10 and 2,
> and likewise (no doubt) two digits (base 10 and 6) for the base60
> numbers. So it was really packed BCD blocked into digit groups.
>
> Works fine, although I'm somewhat disappointed - I was hoping somebody
> had done real sexagesimal arithmetic!  :-)

This suggests to me an obvious improvement in the design.

Instead of monitoring every fourth bit for carries, have an extra mask
register which indicates which bit to look at. Thus, you could fill
the excess register with the bits 01100, and the mask register with
10000, and, voila, five bits of the accumulator are now used for a
base-20 digit.

John Savard

EricP

unread,
Feb 24, 2013, 5:37:33 PM2/24/13
to
Ivan Godard wrote:
>
> Of course, I doubt many COBOL finance programs are doing
> Newton-Raphson. :-)

For bonds, given a coupon bond price, calculate its yield.

i.e. reverse the equation that calculates Price given Yield
(simplified for illustration)

P = SUM[k=1..N] (100*R/M) / (1 + Y/M)^k

P = price, Y = Yield, R = Interest Rate, M = Coupons per Year

Hint: It helps if you recognize that is the sum of
the first N terms of a geometric series.

There are different formulas depending on
the type of bond and type of coupon.
These formulas are not mentioned in most books
but are covered by securities industry standards.

Accuracies are specified usually as digits after the decimal point,
like 7 or 8. Others talk of digits of significance like 10.
Rounding is specified as taking place just prior to display.

And don't forget that you could need to convert to Zimbabwean dollars
(see http://en.wikipedia.org/wiki/Hyperinflation).
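
[Illustrative aside, not from Eric's post: a small Python sketch of the
inversion he describes, applying Newton-Raphson to the simplified coupon-only
price formula above (it ignores principal redemption, as the post's formula
does). The function names, the numeric-derivative step size and the tolerance
are assumptions for illustration only.]

def price_given_yield(y, rate, m, n):
    # P = SUM[k=1..N] (100*rate/m) / (1 + y/m)^k, via the geometric-series sum
    c = 100.0 * rate / m
    i = y / m
    return c * (1.0 - (1.0 + i) ** -n) / i

def solve_yield(price, rate, m, n, y0=0.05, tol=1e-10):
    # Newton-Raphson on f(y) = price_given_yield(y) - price, with a
    # central-difference derivative to keep the sketch short.
    y = y0
    for _ in range(50):
        f = price_given_yield(y, rate, m, n) - price
        h = 1e-7
        df = (price_given_yield(y + h, rate, m, n) -
              price_given_yield(y - h, rate, m, n)) / (2.0 * h)
        step = f / df
        y -= step
        if abs(step) < tol:
            break
    return y

# Semi-annual 6% coupons for 10 years (20 coupons); the coupon stream priced at 50:
print(solve_yield(50.0, 0.06, 2, 20))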

Eric

nm...@cam.ac.uk

unread,
Feb 24, 2013, 5:45:40 PM2/24/13
to
In article <2847910b-8501-4ddb...@y4g2000yqa.googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>
>> Or just throw the whole, horrible mess out of the window, use a
>> binary format, and convert when needed. Decimal floating-point
>> is a marketing solution to a known technical non-requirement.
>
>I don't know what decimal floating-point is good for either. However,
>it has one potential application, even if it doesn't apply to real
>life.
>
>If we were so poor that twenty people had to share one computer to do
>spreadsheets at the same time, using a native format for the
>arithmetic in spreadsheets (assuming it has to be floating-point, has
>to be exact decimal arithmetic, so that naive users won't hurt
>themselves) would make them use less resources.
>
>So decimal floating-point would be a great idea, if this was 1972.

Not really. Back then, we still had experience of such things,
and the near-universal consensus was that they were best abandoned
in favour of binary. That was on the grounds of the (marginal)
numerical benefit of binary.

The killer with using it to protect naive users is that it does
so when they are on courses and writing test codes, and only bites
when they run code for real. The point is that it does what they
expect only up to addition, subtraction and trivial multiplication.
Exactly the same is true of determinism, but at a later stage still
(typically serious optimisation or parallelism).

I have given my arithmetic course for 7 years now, and invariably
ask the students whether they were taught how floating-point differs
from true real arithmetic, and so far have not had a single yes.
And the rules I am describing are base-independent. And I have
seen users get confused by why their (real) code doesn't behave
according to the rules of real arithmetic more times than I care to
think.


Regards,
Nick Maclaren.

Ivan Godard

unread,
Feb 24, 2013, 6:13:04 PM2/24/13
to
I think you have not understood something, probably the expected usage
and perhaps the standard itself.

Computations that do not overflow maintain a fixed precision according
to well defined rules for quanta. Unless you are computing with numbers
that exceed 99,999,999,999,999,999,999,999,999,999,999 (place the
decimal point where you will) then the precision is preserved in useful,
common sense ways. If however you are the Zimbabwean National Bank then
you will get gradual overflow, which is perhaps not a problem in
Zimbabwe. I would assert that gradual overflow (duly trapped if desired)
is better than wrong answers even in Zimbabwe.

Because quantum size is controllable, the program can also control at
which digit position rounding is applied. If the program is calculating
in mills (which many tax codes are required to do) then property tax
will be rounded at the mill. Code to do the same in binary is non-trivial.
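
[Illustrative aside, not from the post: Python's decimal module follows the
same General Decimal Arithmetic model as IEEE 754 decimal, so the quantum
control described above can be sketched directly. The assessed value and the
mill rate below are made-up numbers.]

from decimal import Decimal, ROUND_HALF_EVEN

assessed = Decimal("187350.00")
rate = Decimal("0.01235")        # 12.35 mills per dollar of assessed value
tax = (assessed * rate).quantize(Decimal("0.001"), ROUND_HALF_EVEN)
print(tax)    # exact product 2313.7725 rounds at the mill to 2313.772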

>> The
>> intent of the quanta parts of IEEE decimal is to permit automatic
>> scaling, as binary floating point does, while preserving significance
>> information. Loss of significance is detectable by the hardware in IEEE
>> decimal, and ignored in IEEE binary.
>
> I know :-(
>
> Attempting to be all things to all men usually ends up with meeting
> none of the requirements.

A worthy epigram. However check before applying it :-)

>
>> You are absolutely right that better hardware support for fixed point
>> would be a good thing. However, neither the IEEE charter nor the
>> practicalities of legacy languages and applications would permit a
>> redefinition of integer.
>
> What does fixed-point arithmetic have to do with integer arithmetic?

As a language designer I have always felt that fixed-point rightly
belonged to the integral group of types; integer could be extended to
fixed point more naturally than float could be cut down to fixed, or
rather, than fixed could be extended to float. YMMV.

>> You will find all the "hardware support for fixed point emulation" you
>> want in IEEE decimal.
>
> I am sorry, but no. Yes, I could use it, but it doesn't help to
> actually get it right. I would still need to emulate. And, if you
> have ever tried to reverse the double rounding effect, you will
> know what foul programming is :-(

I'm confused. What do you feel the need to emulate? And where do you see
double rounding (other than where it also occurs in binary)?

Perhaps again you have misunderstood the standard and how it can be
used? Double rounding (and anything else that differs from the effect of
exact single rounding) is explicitly disallowed, just as it is for
binary. And as the operation and modes sets of decimal are supersets of
those for binary, I can't imagine anything that you might think needed
emulation. Please enlighten me.



>
>>> Decimal floating-point doesn't cut the mustard for a large number
>>> of reasons. Here are SOME of the reasons:
>>>
>>> a) The rounding modes include many that are not in IEEE 754;
>>> some even require carry values.
>>
>> International accounting standards that long predate computers mandate
>> so-called "banker's rounding" for currency calculations; in many
>> countries, including your own, this requirement is also by statute. This
>> rounding mode is essential if hardware decimal is to be usable at all.
>
> That is merely one rounding of many specified in various statutes.
> So emulation is needed.

In general no. In particular, no more than would be required if the
operation were performed in integer with programmatically maintained
implied radix point, and much less than in binary.

Because the standard defines operations that will produce a value of the
same quantum as some other value, it is trivial to generate a correct
addend for a rounding at any quantum. In the case of some esoteric
rounding requirement, the code does exactly the same thing that a
computer of the kind that sits on a high stool and wears a green eye
shade would do, and will compute the same result.
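
[Illustrative aside, not from the post: a sketch, in Python's decimal module,
of programming one "esoteric" statutory rounding out of the standard
operations, here rounding a cash amount to the nearest 0.05 by rescaling,
quantizing at the chosen quantum, and rescaling back. The 0.05 rule and the
names are assumptions chosen for illustration.]

from decimal import Decimal, ROUND_HALF_UP

def round_to_nearest_0_05(amount):
    step = Decimal("0.05")
    # Scale so the target quantum becomes 1, round there, then scale back.
    return (amount / step).quantize(Decimal("1"), ROUND_HALF_UP) * step

print(round_to_nearest_0_05(Decimal("7.23")))   # 7.25
print(round_to_nearest_0_05(Decimal("7.22")))   # 7.20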


>
>>> b) There are also strict rules on precision and overflow,
>>> at specific limits.
>>
>> You would prefer no rules?
>
> But they aren't the same rules as IEEE 754 decimal provides!
> So emulation is needed.

I think you've not understood how to code in decimal. The standard does
not dictate a precision (other than that implied by representation
width, and I hope quad is enough for you) nor overflow behavior. It
provides the tools by which you (or rather your code) can control the
precision and define the overflow behavior.

Saying what you want to happen is called "programming", not "emulation".

<snip>

>>
>> I'm not sure why you single out multiplication and division, which work
>> just fine on IEEE decimal.
>
> Because they don't do fixed-point arithmetic - they will rescale
> to preserve precision.

Only if they would otherwise have lost precision due to overflow. I
grant you that true fixed point would have faulted on overflow, or
(following the history of integer overflow behavior on present
computers) silently lost bits.

I know which behavior I prefer. YMMV.



Ivan Godard

unread,
Feb 24, 2013, 6:19:54 PM2/24/13
to
Sounds reasonable, but...

IBM is not known for throwing away money. Not only was the DPD
representation of IEEE decimal invented at an IBM lab (and the hardware
implementation too), but they have added a hardware decimal FP unit to
all their zSeries mainframes. The most recent iteration even added more
surrounding logic to make the decimal pipe go faster.

http://researchweb.watson.ibm.com/journal/abstracts/rd/531/schwarz.html

Robert Wessel

unread,
Feb 24, 2013, 6:43:57 PM2/24/13
to
On Sun, 24 Feb 2013 15:19:54 -0800, Ivan Godard <iv...@ootbcomp.com>
wrote:
POWER too.

James Van Buskirk

unread,
Feb 24, 2013, 6:48:48 PM2/24/13
to
"Ivan Godard" <iv...@ootbcomp.com> wrote in message
news:kgdv2b$kpl$2...@dont-email.me...
Do you mean something more like SUM(a(1:10))/10 and SUM(a(1:10)/10)?

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


Quadibloc

unread,
Feb 24, 2013, 8:16:52 PM2/24/13
to
On Feb 24, 4:19 pm, Ivan Godard <i...@ootbcomp.com> wrote:

> IBM is not known for throwing away money.

This is true. But from the viewpoint of a corporation, money is not
thrown away if it is spent on features that appeal to customers and
increase sales... regardless of their ultimate technical merit.

I don't object to decimal floating point, and it appears that in
recent iterations of Excel, Microsoft has moved away from using
standard data types in spreadsheets, as was the old practice, towards
using a software format of data which is essentially equivalent to
decimal floating point.

Database programs often store numerical data in printed form, rather
than in internal binary format - at least older microcomputer ones,
like dBase II, did this.

So it is indeed possible that decimal floating-point could meet some
real need for business data processing for those situations where
scaled decimal is inadequate, or, at least, where scaled decimal would
impose a burden on the programmer.

At least one thing is definitely positive: the original packed decimal
instructions on the IBM 360 made it, basically, a 7090 and a 1401
nailed together. That meant it was very efficient at binary
computations and very inefficient at decimal computations. (That,
though, depended on the implementation. Being 1401-like suited the
smaller models of the 360 quite well.)

Now that "all computers are made this way" - the way the 360/195 was
made, with cache, pipelining, Wallace trees, and advanced division
algorithms - DFP is, in a way, a move towards a 7090 and a 7070 nailed
together, which is considerably more harmonious. That is, decimal
computations are done a word at a time, just like binary ones, not a
digit at a time.

John Savard

Quadibloc

unread,
Feb 24, 2013, 8:18:05 PM2/24/13
to
On Feb 24, 4:43 pm, Robert Wessel <robertwess...@yahoo.com> wrote:

> POWER too.

Yes, and Intel was supposed to be including DFP in its next generation
of x86 chips - and when I heard that was long enough ago, that I
wonder if it is in their current chips.

John Savard

John Levine

unread,
Feb 24, 2013, 9:00:07 PM2/24/13
to
>> Or just throw the whole, horrible mess out of the window, use a
>> binary format, and convert when needed. Decimal floating-point
>> is a marketing solution to a known technical non-requirement.

I gather it is somewhat useful in financial calculations, where the
requirement is to get the same answer as everyone else who's done
the same calculation over the past century, not the mathematically
correct answer.

Back in the 1980s, I did the financial functions for a modelling
package, and it was quite tricky to get the right answers to bond
yield and price calculations, because they were defined in terms of
decimal rounding.

R's,
John
--
Regards,
John Levine, jo...@iecc.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. http://jl.ly

Ivan Godard

unread,
Feb 24, 2013, 10:30:49 PM2/24/13
to
On 2/24/2013 10:10 AM, nm...@cam.ac.uk wrote:
I had to go look up "wobbling significance"; it wasn't a term of art to
me. Turns out that it refers to the sawtooth distribution of values
across the real line, a problem I knew well, and is a very apt name for
it indeed.

Some early machines got more mantissa bits by using a bigger radix,
usually hexadecimal; every time you increased the exponent you lost four
hard-won bits, only to gradually gain the precision again as you
proceeded to the next increase. The sawtooth is worse the smaller the
underlying significance, quite noticeable at single and sub-single
precisions and pretty invisible at quad and super-quad precisions. The
problem afflicts all bases, but is worse the larger the radix, so these
days binary radix is used in floating point that is approximating real.

Note "floating point" and "approximating real". Wobbling significance is
irrelevant in fixed point and when not approximating real, i.e. IEEE
decimal's intended application domain.

In the case in which decimal is in fact used as true floating point in
an application that does in fact approximate real, then the significance
does wobble within its factor-of-10 range within the significance of the
representation, whereas binary wobbles by only a factor of two within
the representation significance.

If that wobble is significant (ahem), then you have a clear choice: in
double precision (say) you can have a standard binary double that
wobbles as one part in ~10**15, a (non-standard but permitted extension)
double extended that wobbles as one part in ~10**19, a standard decimal
double which wobbles as one part in ~10**33.

Some people will care about extreme significance enough to buy an IBM
product (or, someday, a Mill product) to get the extra significance of
decimal double. Others may not care that much but still care enough to
set compiler flags to use double extended in the x87 unit. And some
people don't care and will use binary double despite the lower
precision. And some of the last group will try to score points about
wobbling in the higher precision they choose not to use :-)


Robert Wessel

unread,
Feb 25, 2013, 12:29:22 AM2/25/13
to
It's certainly not anything they've documented or announced for any
shipping implementations.

nm...@cam.ac.uk

unread,
Feb 25, 2013, 3:05:47 AM2/25/13
to
In article <jhtli8t8h5qsv57b5...@4ax.com>,
I asked at EMEA a couple of years back and was told that there
were no plans, which I deduce means that it has been dropped.
As far as I know, nobody except IBM is planning to put it into
hardware, and nobody (not even IBM) is planning to make it
available on systems for small or medium business use.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

unread,
Feb 25, 2013, 3:08:46 AM2/25/13
to
In article <kgegj7$208$1...@leila.iecc.com>, John Levine <jo...@iecc.com> wrote:
>>> Or just throw the whole, horrible mess out of the window, use a
>>> binary format, and convert when needed. Decimal floating-point
>>> is a marketing solution to a known technical non-requirement.
>
>I gather it is somewhat useful in financial calculations, where the
>requirement is to get the same answer as everyone else who's done
>the same calculation over the past century, not the mathematically
>correct answer.
>
>Back in the 1980s, I did the financial functions for a modelling
>package, and it was quite tricky to get the right answers to bond
>yield and price calculations, because they were defined in terms of
>decimal rounding.

I know :-( And my point is that this doesn't do what you were
required to do. It can be made to, fairly easily, for addition,
subtraction and (with reservations) multiplication - but don't
even think of trying to do currency conversion using it if you
need to both follow specific rules and get it right.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

unread,
Feb 25, 2013, 3:12:55 AM2/25/13
to
In article <lc9li8l0d0140ba9g...@4ax.com>,
Robert Wessel <robert...@yahoo.com> wrote:
>On Sun, 24 Feb 2013 15:19:54 -0800, Ivan Godard <iv...@ootbcomp.com>
>wrote:
>
>>IBM is not known for throwing away money. Not only was the DPD
>>representation of IEEE decimal invented at an IBM lab (and the hardware
>>implementation too), but they have added a hardware decimal FP unit to
>>all their zSeries mainframes. The most recent iteration even added more
>>surrounding logic to make the decimal pipe go faster.
>>
>>http://researchweb.watson.ibm.com/journal/abstracts/rd/531/schwarz.html
>
>POWER too.

zOS is based on a modified POWER architecture nowadays. However,
while IBM is not known for DELIBERATELY wasting money, they are
well-known for doing so because they take a wrong decision at a
high level. IBM's disk business shambles is well-recorded, and
the POWER4 shambles was comparable but better covered up.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

unread,
Feb 25, 2013, 4:04:21 AM2/25/13
to
I shall not follow up after this, unless there is a particularly
pressing requirement.


In article <kge6nv$3h7$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>
>I think you have not understood something, probably the expected usage
>and perhaps the standard itself.

Possibly. However, saying that a flawed design is correct because
people will use only the parts that are is not good mathematics or
software engineering.

>Computations that do not overflow maintain a fixed precision according
>to well defined rules for quanta. Unless you are computing with numbers
>that exceed 99,999,999,999,999,999,999,999,999,999,999 (place the
>decimal point where you will) then the precision is preserved in useful,
>common sense ways. If however you are the Zimbabwean National Bank then
>you will get gradual overflow, which is perhaps not a problem in
>Zimbabwe. I would assert that gradual overflow (duly trapped if desired)
>is better than wrong answers even in Zimbabwe.

Not as I read 5.2. Addition and subtraction are not the only operations,
you know. Repeated multiplication or (in general) a single division
will invariably exhaust the available precision. To take an example:

Multiply 1.2 by 3.4 in 5 digits with a preferred precision of 1.

I think that will deliver 4.08, not 4.1. That is not fixed-point
arithmetic.
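
[Illustrative aside, not from the post: Python's decimal module follows the
same General Decimal Arithmetic rules for preferred exponents, so the example
above can be checked directly; the product keeps exponent -2 rather than the
operands' one-decimal-place quantum, and 4.1 only appears if explicitly asked
for.]

from decimal import Decimal

p = Decimal("1.2") * Decimal("3.4")
print(p)                              # 4.08
print(p.quantize(Decimal("0.1")))     # 4.1, but only via an explicit quantize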

>Because quantum size is controllable, the program can also control at
>which digit position rounding is applied. If the program is calculating
>in mills (which many tax codes are required to do) then property tax
>will be rounded at the mill. Code to do the same in binary is non-trivial.

Yes, it can, but not in the basic operations as I read it. 5.2 again,
especially the last sentence of paragraph 4: "If the result's cohort
does not include a member with the preferred exponent, the member with
the exponent closest to the preferred exponent is used."

>> Attempting to be all things to all men usually ends up with meeting
>> none of the requirements.
>
>A worthy epigram. However check before applying it :-)

I have, repeatedly and fairly carefully. I usually do. I may have
made a mistake, but I don't think so.

>As a language designer I have always felt that fixed-point rightly
>belonged to the integral group of types; integer could be extended to
>fixed point more naturally than float could be cut down to fixed, or
>rather than fixed could be extended to float. YMMV.

As a numerical mathematician, I can explain why that is correct at the
hardware level but a mistake at the language design level.

>Perhaps again you have misunderstood the standard and how it can be
>used? Double rounding (and anything else that differs from the effect of
>exact single rounding) is explicitly disallowed, just as it is for
>binary. And as the operation and modes sets of decimal are supersets of
>those for binary, I can't imagine anything that you might think needed
>emulation. Please enlighten me.

I never said that binary supported it, either - I am disputing your
claims of benefits for use for fixed-point decimal emulation. Let's
say that I have two numbers, 1 < A, B < 10, with 20 digits of
significance and 15 digits after the decimal point. I want to
multiply them and get back to the same state. When I do the
multiplication, I will have to select SOME rounding (which could
be truncation), but it will be in the wrong place. So I have to
round again - Q.E.D.

That one's soluble by multiplying into double precision and then
rounding, but the division equivalent isn't. That's hell to fix up.
Try it :-)
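
[Illustrative aside, not from the post: a small Python sketch of the
double-rounding effect being described, with an artificially tiny working
precision so it shows up immediately. The statutory rule is assumed here to
be round-half-up at two decimal places.]

from decimal import Decimal, getcontext, ROUND_HALF_UP

getcontext().prec = 4                     # working precision of the multiply
exact = Decimal("1.0449")                 # mathematically exact product of 3 * 0.3483
once = exact.quantize(Decimal("0.01"), ROUND_HALF_UP)     # 1.04
prod = Decimal("3") * Decimal("0.3483")   # first rounding: 1.045 at 4 digits
twice = prod.quantize(Decimal("0.01"), ROUND_HALF_UP)     # second rounding: 1.05
print(once, twice)                        # 1.04 1.05 - the two roundings disagree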

>> That is merely one rounding of many specified in various statutes.
>> So emulation is needed.
>
>In general no. In particular, no more than would be required if the
>operation were performed in integer with programmatically maintained
>implied radix point, and much less than in binary.

Try fixing up after division. Nobody with even a modicum of Clue
would dream of trying to emulate fixed point using binary floating-
point. It's been known to be a moronic idea since the beginning
of time.

>Because the standard defines operations that will produce a value of the
>same quantum as some other value, it is trivial to generate a correct
>addend for a rounding at any quantum. In the case of some esoteric
>rounding requirement, the code does exactly the same thing that a
>computer of the kind that sits on a high stool and wears a green eye
>shade would do, and will compute the same result.

Er, no. NOT when you hit the double rounding problem.

>I think you've not understood how to code in decimal. The standard does
>not dictate a precision (other than that implied by representation
>width, and I hope quad is enough for you) nor overflow behavior. It
>provides the tools by which you (or rather your code) can control the
>precision and define the overflow behavior.

Again, WHEN it provides something that maps into what is needed,
which is not always.

>> Because they don't do fixed-point arithmetic - they will rescale
>> to preserve precision.
>
>Only if they would otherwise have lost precision due to overflow. I
>grant you that true fixed point would have faulted on overflow, or
>(following the history of integer overflow behavior on present
>computers) silently lost bits.

Again, no. That's one reason. The other is the build-up of
significant digits in multiplication and division.

>I know which behavior I prefer. YMMV.

And I know which Cobol mandates, and it's the other one.

>I had to go look up "wobbling significance"; it wasn't a term of art to
>me. Turns out that it refers to the sawtooth distribution of values
>across the real line, a problem I knew well, and is a very apt name for
>it indeed.

Yes, but do you know of the numerical problems it causes?

>If that wobble is significant (ahem), then you have a clear choice: in
>double precision (say) you can have a standard binary double that
>wobbles as one part in ~10**15, a (non-standard but permitted extension)
>double extended that wobbles as one part in ~10**19, a standard decimal
>double which wobbles as one part in ~10**33.

That's pure polemic. There is also a standard binary format with the
last precision, and it is more widely available than decimal in
hardware.

It's also not the point. Using base K in a fixed-width storage format
loses a little precision (about log_2(K)-log_2(log_2(K)), +1 if the
hidden bit trick is used). But that's minor. The real problem is
that wobbling precision/significance can lose accuracy at a higher
rate than binary (which is almost, but not quite, immune for fairly
seriously arcane reasons) for some real and important uses.

Again, this has been known since certainly the late 1960s, and was
one of the main reasons that numerical experts wanted to abandon
other bases in favour of binary. The problem is that it was never
well-known :-(



Regards,
Nick Maclaren.

Robert Wessel

unread,
Feb 25, 2013, 4:17:39 AM2/25/13
to
On Mon, 25 Feb 2013 08:12:55 +0000 (GMT), nm...@cam.ac.uk wrote:

>In article <lc9li8l0d0140ba9g...@4ax.com>,
>Robert Wessel <robert...@yahoo.com> wrote:
>>On Sun, 24 Feb 2013 15:19:54 -0800, Ivan Godard <iv...@ootbcomp.com>
>>wrote:
>>
>>>IBM is not known for throwing away money. Not only was the DPD
>>>representation of IEEE decimal invented at an IBM lab (and the hardware
>>>implementation too), but they have added a hardware decimal FP unit to
>>>all their zSeries mainframes. The most recent iteration even added more
>>>surrounding logic to make the decimal pipe go faster.
>>>
>>>http://researchweb.watson.ibm.com/journal/abstracts/rd/531/schwarz.html
>>
>>POWER too.
>
>zOS is based on a modified POWER architecture nowadays.


Not really. While there are some shared bits (the last couple of Z
FPUs were largely ripped from POWER), and certainly a fair bit of
cross pollination, the microarchitectures remain radically different.

There was speculation about an "eCLipz" project or plan that was to
unify all the platforms, but other than a few mentions on slides half
a decade ago, that's really all disappeared (although the AS/400 and
POWER lines are now fully converged, though that has been a work in
progress for much longer than that).

One thing that's new with the last couple of generations of Z is the
ability to house and manage a rack of non-Z servers for various
purposes, but no zOS code runs there.

IBM's Systems Journal and Journal of R&D have had decent issues
devoted to these chips. Unfortunately they're no longer available
free.

nm...@cam.ac.uk

unread,
Feb 25, 2013, 4:33:02 AM2/25/13
to
In article <8f9mi8h00rg810l52...@4ax.com>,
Robert Wessel <robert...@yahoo.com> wrote:
>>
>>zOS is based on a modified POWER architecture nowadays.
>
>Not really. While there are some shared bits (the last couple of Z
>FPUs were largely ripped from POWER), and certainly a fair bit of
>cross pollination, the microarchitectures remain radically different.

Thanks for the correction. I wasn't desperately interested, so
didn't read the articles carefully. In the context of this thread,
of course, the FPU is the relevant bit :-)

>There was speculation about an "eCLipz" project or plan that was to
>unify all the platforms, but other than a few mentions on slides half
>a decade ago, that's really all disappeared (although the AS/400 an
>POWER lines are now full converged, although that's been a work in
>progress for much longer than that).

Yes. I was involved with IBM when that started. I did hear from
an IBMer (not a salesdroid) that the plan to make zOS use POWER
had nearly been completed, but that was a long time ago now and
it might have been one of the projects that got to a late stage
before being shelved in a dungeon and left to the rats.


Regards,
Nick Maclaren.

Quadibloc

unread,
Feb 25, 2013, 7:06:12 AM2/25/13
to
I have done a search, and I see that my memory must have been playing
tricks on me. Intel does support decimal floating-point by having made
a software implementation available. That software uses an alternative
encoding format, the BID format, thus while the exponent is a power of
ten, everything else is binary.

This method of allowing computers without decimal support to handle
DFP dates back to the JOSS interpreter of John von Neumann.

John Savard

nm...@cam.ac.uk

unread,
Feb 25, 2013, 7:34:35 AM2/25/13
to
In article <223c373d-2fba-4f15...@9g2000yqy.googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>On Feb 24, 10:29 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
>
>> >Yes, and Intel was supposed to be including DFP in its next generation
>> >of x86 chips - and when I heard that was long enough ago, that I
>> >wonder if it is in their current chips.
>>
>> It's certainly not anything they've documented or announced for any
>> shipping implementations.
>
>I have done a search, and I see that my memory must have been playing
>tricks on me. Intel does support decimal floating-point by having made
>a software implementation available. That software uses an alternative
>encoding format, the BID format, thus while the exponent is a power of
>ten, everything else is binary.

No, your memory was NOT playing tricks. There was indeed just such
a statement, though I never saw Intel state formally that they
would put anything into hardware. It was confirmed as a plan by
people within Intel, so it wasn't just the rumour mill.

As you point out, software emulation dates from time immemorial,
though I am pretty sure that it predated JOSS by a decade. I have
no idea what bases were used for the really early versions, but
my guess is that 10 was one of them. But it isn't what most people
mean when they talk about Intel supporting something.


Regards,
Nick Maclaren.

Michael S

unread,
Feb 25, 2013, 9:06:58 AM2/25/13
to
No, DFP is not in current (IvyB) chips and not in the next generation
(Haswell).
And since the one after next generation (Broadwell) is a tick, we can
be pretty sure that it also does not contain DFP.

Michael S

unread,
Feb 25, 2013, 10:39:18 AM2/25/13
to
On Feb 25, 3:16 am, Quadibloc <jsav...@ecn.ab.ca> wrote:
>
> Now that "all computers are made this way" - the way the 360/195 was
> made, with cache, pipelining, Wallace trees, and advanced division
> algorithms - DFP is, in a way, a move towards a 7090 and a 7070 nailed
> together, which is considerably more harmonious. That is, decimal
> computations are done a word at a time, just like binary ones, not a
> digit at a time.
>
> John Savard

I am not sure about POWER7 and the newest zArch chip (zEC12, or whatever
its official name is), but on POWER6 and z10 the DFP hardware can be
described as processing approximately 1 digit at a time.
z196 DFP hardware is faster than that, especially for addition/
subtraction, but, according to my understanding, still not pipelined.
So even on z196 the throughput of DFP calculations is far away from
binary floating point or from BCD fixed-point on the same machine.

If you are interested, here are more details than I, personally, want
to know:
http://www.acsel-lab.com/arithmetic/papers/ARITH20/ARITH20_Carlough.pdf



Ivan Godard

unread,
Feb 25, 2013, 11:57:02 AM2/25/13
to
Well, they supported it at least to the extent of paying for 20 or so
people to attend the three IEEE meetings required to be eligible to
vote, so the BID decimal format would be part of the official IEEE
floating point standard.

nm...@cam.ac.uk

unread,
Feb 25, 2013, 12:11:46 PM2/25/13
to
In article <kgg52s$199$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>>>
>>>>> Yes, and Intel was supposed to be including DFP in its next generation
>>>>> of x86 chips - and when I heard that was long enough ago, that I
>>>>> wonder if it is in their current chips.
>>>>
>>>> It's certainly not anything they've documented or announced for any
>>>> shipping implementations.
>>>
>>> I have done a search, and I see that my memory must have been playing
>>> tricks on me. Intel does support decimal floating-point by having made
>>> a software implementation available. That software uses an alternative
>>> encoding format, the BID format, thus while the exponent is a power of
>>> ten, everything else is binary.
>>
>> No, your memory was NOT playing tricks. There was indeed just such
>> a statement, though I never saw Intel state formally that they
>> would put anything into hardware. It was confirmed as a plan by
>> people within Intel, so it wasn't just the rumour mill.
>>
>> As you point out, software emulation dates from time immemorial,
>> though I am pretty sure that it predated JOSS by a decade. I have
>> no idea what bases were used for the really early versions, but
>> my guess is that 10 was one of them. But it isn't what most people
>mean when they talk about Intel supporting something.
>
>Well, they supported it at least to the extent of paying for 20 or so
>people to attend the three IEEE meetings required to be eligible to
>vote, so the BID decimal format would be part of the official IEEE
>floating point standard.

Well, yes, I know that. So?

There are four possibilities, all of which make excellent economic
sense for Intel:

1) They intended to implement it, but did a cost-benefit analysis
and thought better of it.

2) They wanted a specification that they COULD implement without
busting a gut (and without IBM IP problems?), sometime, manana.

3) They wanted a specification that was feasibly implementable
by emulation using binary integers.

4) They wanted to hedge their bets between (1)-(3).

Allocating 50 man-years or so to ensure that you don't get caught
in a cleft stick in an area (numeric computation) that is a major
source of revenue is a no-brainer for a company like Intel. IBM
had just such a policy on standards, and probably still do.

But that is even less what most people mean when they talk about
Intel supporting something.


Regards,
Nick Maclaren.

Andy (Super) Glew

unread,
Feb 25, 2013, 12:44:19 PM2/25/13
to Ivan Godard

On 2/23/2013 9:43 AM, Ivan Godard wrote:
> On 2/23/2013 5:01 AM, Terje Mathisen wrote:
>> Ivan Godard wrote:
>>> On 2/23/2013 1:45 AM, bert wrote:
>>>> On Tuesday, February 19, 2013 3:15:08 AM UTC+1, Andy (Super) Glew
>>>> wrote:
>>>>> Inviting additions - what are your favorite examples
>>>>> of special purpose registers?
>>>>
>>>> The 'excess constants' register on the LEO 3. Loading it with
>>>> hex 66666 converted the arithmetic to packed decimal. Other
>>>> suitable values converted it to UK pounds, shillings and pence;
>>>> hours, minutes and seconds; or tons, hundredweights and stones.
>>>>
>>>
>>> Oh really! Can you link to an engineering description - Google is
>>> getting my only the business/historical story, nothing useful
>>> technically.
>>
>> Seems rather obvious:
>>
>> The 'excess constant' was added to the results of the regular add, with
>> the per-nybble carries generated by this operation instead of just the
>> A+B+Incoming_Carry terms.
>>
>> I.e. each nybble adder would consist of two parts, one to generate the
>> regular result and the other would merge in the excess part to generate
>> any outgoing carry.
>
> Not obvious to me.
>
> The well-known 6666 technique for packed decimal puts one digit in each
> four-bit nibble, and adds 6 to generate the carry as you say. One could
> do the same for h/m/s (for example) with a low order nibble size of 6
> bits and adding 4 for the carry. And the nibble is 4 bits for l/s/d with
> an added 4 in the low order, and 5 bits with an added 12 in the next
> nibble. As you say, it's obvious how to do any one of those.
>
> However, I don't see how a single piece of hardware does *all* of them
> because not only does the constant change, the nibble size does too.
> Hence which bit of the adder is to supply the carry differs among the
> cases. I suppose you could mux to select the right bit, but that's more
> than just picking a constant as you implied was the case.
>
> Or you could select the widest nibble of all interesting cases (6 bits
> in your examples) and adjust the constant accordingly. However, that
> makes the packed decimal constant be 54-54-54, not 6-6-6 as you gave;
> such a design wastes a lot of adder bits; and you need circuitry to
> explode 4-bit packed decimal up to 6-bit adder.
>
> There are probably other ways it could be done too, especially if the
> adder was bit sequential rather than bit parallel, as seems likely for
> the day. I'd like to know how they *actually* did it, so if you can find
> a tech doc source please cite it.
>
> Ivan

Wow! This topic got many responses, not all of which I have read yet.

My first stab at it: a mask to indicate chunk size (what Ivan calls
"nibble" size), and the value to be added - such that overflow the
reduced digit set would send an overflow all the way to the bit boundary
indicated by the mask.

E.g. for a 32 bit register, with BCD arithmetic:

mask = 1000_1000_1000_1000_1000_1000_1000_1000
addin = 0110_0110_0110_0110_0110_0110_0110_0110

(sorry, I think in binary better than I do in hex:
mask = 88888888
addin = 66666666
)

or possibly the mask might be better inverted
mask = 0111_0111_0111_0111_0111_0111_0111_0111
or
mask = 1110_1110_1110_1110_1110_1110_1110_1110
the latter indicating "no carry in"
the former "no carry out"
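
[Illustrative aside, not from Andy's post: a Python sketch of the same
mask-plus-addin idea, written with explicit (width, base) fields rather than
a packed mask word so the sketch stays short. Each group gets its excess
added, one binary add is performed, and the excess is subtracted back out of
every group that produced no carry. All names are assumptions.]

def excess_add(a, b, fields):
    # fields = [(width_bits, base), ...], least significant field first.
    excess, shift, groups = 0, 0, []
    for width, base in fields:
        excess |= ((1 << width) - base) << shift
        groups.append((shift, width, base))
        shift += width
    x = a + excess                # "add the excess digit" to every group
    s = x + b                     # the single binary add
    carries = x ^ b ^ s           # bit i = carry INTO bit i of the add x + b
    for off, width, base in groups:
        if not (carries >> (off + width)) & 1:      # no carry out of this group
            s -= ((1 << width) - base) << off       # take the excess back out
    return s & ((1 << shift) - 1)

# Two-digit BCD: 0x39 + 0x04 -> 0x43, i.e. 39 + 4 = 43
print(hex(excess_add(0x39, 0x04, [(4, 10), (4, 10)])))
# Pounds/shillings/pence (4-bit pence base 12, 5-bit shillings base 20, 4-bit pounds):
# 1 pound 19s 11d plus 1d carries all the way up, giving 2 pounds 0s 0d (0x400).
print(hex(excess_add((1 << 9) | (19 << 4) | 11, 1, [(4, 12), (5, 20), (4, 16)])))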



E.g. for y:d365:m12:d31:h:m:s
- i.e. for a contrived situation,
years of exactly 365 days (no leap years)
months of exactly 31 days (no SepAprJunNovFeb)

mask
011111111_0111_01111_01111_011111_011111
addin
512-365 . 16-12 . 32-31 . 32-24 . 64-60 . 64-60
    147 .     4 .     1 .     8 .     4 .     4

now, can I figure out a way to encode this in a single N-bit number?

... actually, perhaps I don't WANT to encode
this in a single number. The mask is convenient for shifting,
and allows different addin values to be used - e.g. if correcting
for non-31day months

512-365 . 16-12 . 32-31 . 32-24 . 64-60 . 64-60
    147 .     4 .     1 .     8 .     4 .     4
(e.g. a day-field addin of 2 for the 30-day months 9,4,6,11, and a
different one again for month 2)

I would not seriously suggest doing much arithmetic in this format.
Handling different month sizes might be okay,
but handling leapyears (let alone leapseconds)
would be slow.

But... a polynomial evaluation instruction that did

y:d365:m12:d31:h:m:s
mask
011111111_0111_01111_01111_011111_011111
Horner's rule coefficient
365. 12. 31. 24. 60. 60

might be efficient in computing a canonical
y:d365:m12:d31:h:m:s
representation - from which a table lookup
might compute a corrected true time in seconds.

Hmmm... I think I can see how to use the mask to control a multiplier
array to evaluate such a polynomial. But it looks like it is still a
bit more expensive than a non-grouped multiplier.
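
[Illustrative aside, not from the post: the Horner's-rule flattening
suggested above, sketched in Python for a contrived no-leap-year y:d:h:m:s
layout. The field order and the fixed radices are the post's simplifying
assumptions.]

def to_seconds(digits, radices):
    # digits and radices are most-significant first, e.g. radices (365, 24, 60, 60).
    total = digits[0]
    for d, r in zip(digits[1:], radices):
        total = total * r + d            # one Horner step per field
    return total

print(to_seconds((1, 2, 3, 4, 5), (365, 24, 60, 60)))   # 1y 2d 3h 4m 5s in seconds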


--

See also

https://www.semipublic.comp-arch.net/wiki/Operations_Under_Mask#Arithmetic_operations_under_mask



--
The content of this message is my personal opinion only. Although I am
an employee (currently of MIPS Technologies; in the past of companies
such as Intellectual Ventures and QIPS, Intel, AMD, Motorola, and
Gould), I reveal this only so that the reader may account for any
possible bias I may have towards my employer's products. The statements
I make here in no way represent my employers' positions on the issue,
nor am I authorized to speak on behalf of my employers, past or present.

Ivan Godard

unread,
Feb 25, 2013, 12:58:31 PM2/25/13
to
Fascinating paper; a very clean and well done design IMO.

It's a 4 stage pipe. They do a 16-digit add in 6 clocks and a 34-digit
add in 8, a bit faster than your understanding :-) (There's a table on
the last page with latencies of common operations).

The design was optimized for area and BCD latency, not for FP, so
there's an iterative multiplication - although the text does say that
they rejected a 4x faster multiplier (that sounded like if was a decimal
Wallace tree) because the result would have been 50% bigger than all the
rest of the unit.

The unit duplicates much of the logic for RAS, and has parity, excess-3
and excess-9 checking throughout. All that and it runs at 5.2GHz. I'm
impressed.

Terje Mathisen

unread,
Feb 25, 2013, 1:43:15 PM2/25/13
to
Ivan Godard wrote:
> On 2/24/2013 10:10 AM, nm...@cam.ac.uk wrote:
>> Decimal floating-point doesn't cut the mustard for a large number
>> of reasons. Here are SOME of the reasons:
>>
>> a) The rounding modes include many that are not in IEEE 754;
>> some even require carry values.
>
> International accounting standards that long predate computers mandate
> so-called "banker's rounding" for currency calculations; in many
> countries, including your own, this requirement is also by statute. This
> rounding mode is essential if hardware decimal is to be usable at all.

According to wikipedia this is 'round-to-nearest-even', which is of
course the same as the binary fp default.

The main difference being _where_ you round, of course!

If I'm required to have 4 fractional digits (common for many banking
applications), then I can scale the value by 10000, then add a binary
bias value so that the two top bits are set and the actual value is
aligned with the least significant mantissa bits.

The key is of course that the addition of the bias forces the required
rounding to happen at the correct spot.

It is probably easier to make this correct in a purely scaled integer
code...
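
[Illustrative aside, not from Terje's post: a sketch of the bias trick in
Python, whose floats are IEEE binary64. Scale so the unit to be rounded at
becomes 1.0, then park the value in the low mantissa bits so the FPU's
round-to-nearest-even does the rounding. It assumes the arithmetic really is
done in binary64 with no extended intermediate precision; names and the
4-digit example are illustrative.]

BIAS = 1.5 * 2.0 ** 52          # sets the two top mantissa bits

def round_half_even(x):
    # Adding BIAS forces rounding at the units position; subtracting restores scale.
    return (x + BIAS) - BIAS

def round_at_4_digits(amount):
    return round_half_even(amount * 10000.0) / 10000.0

print(round_half_even(2.5), round_half_even(3.5))   # 2.0 4.0 (ties go to even)
print(round_at_4_digits(1.23456789))                # 1.2346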

> It is of course meaningless in binary where the rounding will be wrong
> anyway. We considered adding it to binary anyway simply for the
> symmetry, but decided that there was no reason to burden the hardware
> guys with a pointless mode that would potentially break binary codes
> that assume the existing set of modes.
>
>
>> b) There are also strict rules on precision and overflow,
>> at specific limits.
>
> You would prefer no rules?
>
>> c) Many of them constrain multiplication and division; the
>> former has to be done by a double precision operation and rounding
>> to s precision, and the latter is too horrible to contemplate
>> starting from floating-point.
>
> There are no "many of them". There is one standard, and almost nothing
> is "up to the implementation". Implementation-defined things include the
> bitwise representation of values (which you cannot detect with decimal
> operations; you'd have to cast to integer and look at the bits). The
> semantics of trap handlers is also implementation-defined, but that's
> true of binary FP as well, and the definitions and facilities for traps
> are identical in the two radices.
>
>>
>>> As for the impact, we on the committee (mostly HPC types with little
>>> business background) wondered if there was really a need to upgrade 854
>>> as part of our work. Then we saw measurements of real-world database
>>> applications. If a column is typed "numeric" then the database keeps the
>>> values in decimal, and the programs - yes, often COBOL - do the
>>> arithmetic in decimal too. Over a sample of several thousand
>>> applications, umptillion CPU cycles, and billions of dollars of
>>> hardware, some 31% of all processor cycles were spent in *decimal
>>> arithmetic emulation routines*.
>>
>> I don't believe that statistic. Waiting for memory and recovering
>> from glitches is VASTLY higher. But, even if it were true, my
>> point is that what has been standardised won't meet even Cobol's
>> basic requirements, let alone those of all of the laws, and it
>> will STILL require an emulation layer!
>
> Well, I didn't do the studies, but I believe the counts were of issued
> instructions and hence would not have included memory stalls that were
> not hidden by the OOO. I *have* looked at the emulation codes that were
> consuming the cycles, and I believe the numbers for apps in which
> essentially all the computation is being done in decimal, which is the
> routine case. YMMV.

This is the real problem: All those applications that traditionally have
done everything in decimal (or even ascii) are really broken, they would
run one or two orders of magnitude faster if you did everything with
64-bit binary integers. :-(

I.e. I am talking about the (in)famous telecom invoicing benchmark which
processes horrible amounts of (ascii) call line items: It is faster to
convert everything to binary than to do the calculations in
(effectively) BCD.

> There is only one case I know of in which a standard that preserves
> significance requires added thought (and code). That's in algorithms
> that actually *increase* significance, although they don't look like it.
> A Newton-Raphson sqrt is adding 10 bits or so of significance with
> every iteration, yet in standard decimal the quanta will be shrinking.
> You have to use the (standard) operations to assert the new and correct
> significance of the result.
>
> Of course, I doubt many COBOL finance programs are doing
> Newton-Raphson. :-)

I bet there's a lot of (binary) fp calculations in the High Frequency
Trading programs, they are probably responsible for a big part of the
total cycles used in finance.

> There is no specification for prenormalization; not sure where you got
> the idea. Instead the standard uses the same terminology as it uses for
> binary: computation shall be *as if* the operation were performed
> exactly, and the result is then rounded as specified by the mode. I'm
> reasonably confident that doing prenormalization followed by a hardware
> binary operation and then post un-normalization would not conform to
> standard; I do not recommend that you buy any machine that does it that
> way.

I disagree: I'm quite confident that I can take any standard decimal
operation and come up with a set of pre/post operations that would cause
the correct result to come out after a binary fp operation. :-)

> However, we did consider it apt for stability exploration. Kahan had a
> technique that he had found very useful in heavy numeric applications as
> a way to get a sense of the "goodness" of a result. He would run the
> application twice, once with rounding mode set to round-up (toward
> positive infinity) and once to round-down. The difference between the
> two results gave an approximation to the significance. He had quite a
> few harmless-looking examples where the difference was of the same
> magnitude as the result itself.
>
> Applied here, one could run an existing app again in decimal and then
> try to figure out why the quantum had shrunk to zero :-)
>

Always an interesting problem...

Terje


--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Bill Findlay

unread,
Feb 25, 2013, 1:53:01 PM2/25/13
to

On 25/02/2013 17:44, in article 512BA2F3...@SPAM.comp-arch.net, "Andy
(Super) Glew" <an...@SPAM.comp-arch.net> wrote:

>
> But... a polynomial evaluation instruction that did
>
> y:d365:m12:d31:h:m:s
> mask
> 011111111_0111_01111_01111_011111_011111
> Horner's rule coefficient
> 365. 12. 31. 24. 60. 60
>
> might be efficient in computing a a canonical
> y:d365:m12:d31:h:m:s
> representation - from which a table lookup
> might compute a corrected true time in seconds.

The KDF9 had a pair of instructions TOB (To Binary) and FRB (From Binary)
that used a radix word to direct conversion. It did not allow fields of
varying length: they were all 6 bits (one character).

TOB did this:
C := 0;
for i in 1 .. 8 loop
A := rotate_word_left(A, 6);
B := rotate_word_left(B, 6);
C := C*(B and 8#77#) + (A and 8#77#);
end loop;

Here the value of C is the result, B is the radix word and A is the word of
8 characters in the corresponding radices.

And FRB did this:
C := 0;
for i in 1 .. 8 loop
C := C or (A mod (B and 8#77#));
A := A / (B and 8#77#);
B := shift_word_right(B, 6);
C := rotate_word_right(C, 6);
end loop;

Here, A is the original binary value.

The KDF9 was first installed 50 years ago next month.

--
Bill Findlay
with blueyonder.co.uk;
use surname & forename;


Terje Mathisen

unread,
Feb 25, 2013, 2:04:00 PM2/25/13
to
Ivan Godard wrote:
> On 2/24/2013 11:10 AM, Terje Mathisen wrote:
>> I still like truncate-to-zero instead of denormalization, mostly because
>> it makes for a cleaner implementation.
>>
>> If you ever get into these ranges, and depend upon gradual underflow,
>> then I _really_ hope you know what you are doing.
>
> Bill Kahan, who invented gradual underflow, certainly knew what he was
> doing. His was the last generation of serious numerical analysts,
> though. Nick teaches students who need to know that FP is not "real",
> and I bet he'd tell you that the students aren't interested.
>
> The idea behind gradual underflow was to have the density of
> representable numbers be similar around zero to the density further out.
> However, when I told him he could have FTZ quad for the cost of denormal
> single, Kahan himself told me he'd take the quad.

Exactly.

I wrote a 128-bit fp library (two or three days/a weekend?) just so I
could verify the FDIV/FPATAN sw replacement we developed during the FDIV
affair (Dec 1994-Jan 1995).

I found that ignoring underflow made my code quite a bit simpler, and it
was still more than accurate enough to check 64 and 80-bit fp operations.

Actually writing the code made it clear to me why fp people talk about
guard bits and a 'sticky' bit. :-)

nm...@cam.ac.uk

unread,
Feb 25, 2013, 3:10:38 PM2/25/13
to
In article <3vnsv9-...@ntp-sure.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>> However, we did consider it apt for stability exploration. Kahan had a
>> technique that he had found very useful in heavy numeric applications as
>> a way to get a sense of the "goodness" of a result. He would run the
>> application twice, once with rounding mode set to round-up (toward
>> positive infinity) and once to round-down. The difference between the
>> two results gave an approximation to the significance. He had quite a
>> few harmless-looking examples where the difference was of the same
>> magnitude as the result itself.
>>
>> Applied here, one could run an existing app again in decimal and then
>> try to figure out why the quantum had shrunk to zero :-)
>
>Always an interesting problem...

However, he was being over-simplistic, and it would work well for
only very well-behaved code. Directed rounding is notorious,
and my rule of thumb was that IBM System/370 64-bit was about as
accurate as ICL 1900 48-bit. The technique is useful, but NOT
by using directed rounding - for reasonable results, you need to
use a reasonable rounding method. Here are some:

Use a form of nearest rounding (it doesn't matter which)

Use a slightly reduced precision

Use the 'rounding' of truncating and forcing the last bit to 1

Use probabilistic rounding of the CS hack form

Use true probabilistic rounding

But the last two caused steam to come out of Kahan's ears!
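
For reference, the mechanics of the run-it-twice idea quoted above look like
this in C99 (<fenv.h>). Whether directed rounding is the right perturbation is
exactly what is in dispute here, and the compiler must be told to honour the
runtime rounding mode (strict FP/fenv options), so treat it purely as an
illustration of the mechanism:

#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

/* stand-in for the real application; purely illustrative */
static double work(void)
{
    double s = 0.0;
    for (int i = 1; i <= 1000000; i++)
        s += 1.0 / i;
    return s;
}

int main(void)
{
    fesetround(FE_UPWARD);
    double hi = work();
    fesetround(FE_DOWNWARD);
    double lo = work();
    fesetround(FE_TONEAREST);
    printf("spread = %g\n", hi - lo);   /* crude estimate of the significance */
    return 0;
}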


Regards,
Nick Maclaren.

Ivan Godard

unread,
Feb 25, 2013, 3:34:23 PM2/25/13
to
On 2/25/2013 10:43 AM, Terje Mathisen wrote:
> Ivan Godard wrote:



<snip>

>
>> There is no specification for prenormalization; not sure where you got
>> the idea. Instead the standard uses the same terminology as it uses for
>> binary: computation shall be *as if* the operation were performed
>> exactly, and the result is then rounded as specified by the mode. I'm
>> reasonably confident that doing prenormalization followed by a hardware
>> binary operation and then post un-normalization would not conform to
>> standard; I do not recommend that you buy any machine that does it that
>> way.
>
> I disagree: I'm quite confident that I can take any standard decimal
> operation and come up with a set of pre/post operations that would cause
> the correct result to come out after a binary fp operation. :-)

It's not even true for the identity operation.

There are numbers in either radix that you cannot convert to the other
radix and back and have the result match the input. The problem is that
each conversion must go to the nearest adjacent value. Say the input is
A in radix r1, and the nearest value in radix r2 is B. However, it is
quite possible for the nearest value to B in r1 is not A but some other
value A'.

This problem exists in every pair of radices; it's a consequence of the
wobbling precision problem (my thanks to Nick Maclaren for the term.
It's always nice to get a word for a long-known concept).
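
The pigeonhole effect is easy to see in code. Below, 6-significant-digit
decimal text stands in for a decimal format of roughly binary-single
precision; a round-trip failure turns up almost immediately:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    for (float x = 1.0f; x < 2.0f; x = nextafterf(x, 2.0f)) {
        char buf[32];
        snprintf(buf, sizeof buf, "%.6g", x);   /* nearest 6-digit decimal */
        float back = strtof(buf, NULL);         /* nearest float to that   */
        if (back != x) {
            printf("%.9g -> %s -> %.9g\n", x, buf, back);
            return 0;
        }
    }
    puts("no example found");
    return 0;
}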

Ivan

Ivan Godard

unread,
Feb 25, 2013, 3:37:45 PM2/25/13
to
An apt description :-)

We had stochastic rounding in an early Mill proposal. We took it out
after Kahan (and several other analysts on the committee) took me in
hand and explained why it was a seductive but fundamentally bad idea.

Ivan

nm...@cam.ac.uk

unread,
Feb 25, 2013, 3:50:47 PM2/25/13
to
In article <kggi0l$gna$2...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>
>We had stochastic rounding in an early Mill proposal. We took it out
>after Kahan (and several other analysts on the committee) took me in
>hand and explained why it was a seductive but fundamentally bad idea.

Hmm. I am not of his calibre, but I am pretty sure that I could
hold my own in that debate, and people who WERE of his calibre
disagreed. What were the reasons given, because I know of no
fundamental reason that holds water?

I accept that neither he nor I want to have to deal with users
failing to debug their programs with that turned on :-)


Regards,
Nick Maclaren.

Ivan Godard

unread,
Feb 25, 2013, 4:12:33 PM2/25/13
to
On 2/25/2013 1:04 AM, nm...@cam.ac.uk wrote:
> I shall not follow up after this, unless there is a particularly
> pressing requirement.
>
>
> In article <kge6nv$3h7$1...@dont-email.me>,
> Ivan Godard <iv...@ootbcomp.com> wrote:
>>
>> I think you have not understood something, probably the expected usage
>> and perhaps the standard itself.
>
> Possibly. However, saying that a flawed design is correct because
> people will use only the parts that are is not good mathematics or
> software engineering.
>
>> Computations that do not overflow maintain a fixed precision according
>> to well defined rules for quanta. Unless you are computing with numbers
>> that exceed 99,999,999,999,999,999,999,999,999,999,999 (place the
>> decimal point where you will) then the precision is preserved in useful,
>> common sense ways. If however you are the Zimbabwean National Bank then
>> you will get gradual overflow, which is perhaps not a problem in
>> Zimbabwe. I would assert that gradual overflow (duly trapped if desired)
>> is better than wrong answers even in Zimbabwe.
>
> Not as I read 5.2. Addition and subtraction are not the only operations,
> you know. Repeated multiplication or (in general) a single division
> will invariably exhaust the available precision. To take an example:
>
> Multiply 1.2 by 3.4 in 5 digits with a preferred precision of 1.
>
> I think that will deliver 4.08, not 4.1. That is not fixed-point
> arithmetic.

Sorry, can't make sense of your example. The standard does not have
"preferred precision", and "preferred exponent" doesn't make sense here.
Also, what do you mean by "in 5 digits"? There are no 5-digit formats;
the smallest is 16 digits.

>> Because quantum size is controllable, the program can also control at
>> which digit position rounding is applied. If the program is calculating
>> in mills (which many tax codes are required to do) then property tax
>> will be rounded at the mill. Code to do the same in binary is non-trivial.
>
> Yes, it can, but not in the basic operations as I read it. 5.2 again,
> especially the last sentence of paragraph 4: "If If the result's cohort
> does not include a member with the preferred exponent, the member with
> the exponent closest to the preferred exponent is used."

This occurs only in the case of overflow, where it defines gradual
overflow. Would you have preferred a trap/NaN?


>> Perhaps again you have misunderstood the standard and how it can be
>> used? Double rounding (and anything else that differs from the effect of
>> exact single rounding) is explicitly disallowed, just as it is for
>> binary. And as the operation and modes sets of decimal are supersets of
>> those for binary, I can't imagine anything that you might think needed
>> emulation. Please enlighten me.
>
> I never said that binary supported it, either - I am disputing your
> claims of benefits for use for fixed-point decimal emulation. Let's
> say that I have two numbers, 1 < A, B < 10, with 20 digits of
> significance and 15 digits after the decimal point. I want to
> multiply them and get back to the same state. When I do the
> multiplication, I will have to selected SOME rounding (which could
> be truncation), but it will be in the wrong place. So I have to
> round again - Q.E.D.
>
> That one's soluble by multiplying into double precision and then
> rounding, but the division equivalent isn't. That's hell to fix up.
> Try it :-)

Again the example makes no sense in the standard. If the value is
between 1 and 10 and there are 20 digits of significance then there are
19 digits after the point; 15 is impossible in the representation.

In general for *any* multiplication with exact result there will be no
rounding and the preferred exponent is used, which is the sum of the
original exponents. That is, exact multiplication is
significance-preserving. Because the result is exact, when you then use
the quantize operation to round down to the cohort the application
wants there will have been only one rounding, not two.
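
Concretely, here is Nick's 1.2 x 3.4 case in a toy (significand, exponent)
model - not the standard's encodings, and using round-half-away-from-zero for
brevity - showing the exact product followed by one explicit quantize:

#include <stdio.h>
#include <stdlib.h>

typedef struct { long long sig; int exp; } dec_t;   /* value = sig * 10^exp */

/* exact product: significands multiply, preferred exponent is the sum */
static dec_t dmul(dec_t a, dec_t b)
{
    dec_t r = { a.sig * b.sig, a.exp + b.exp };
    return r;
}

/* quantize to the requested exponent in one rounding (half away from zero);
   only handles reducing precision, i.e. exp >= a.exp */
static dec_t dquantize(dec_t a, int exp)
{
    long long div = 1;
    for (int e = a.exp; e < exp; e++) div *= 10;
    long long q = a.sig / div, rem = llabs(a.sig % div);
    if (2 * rem >= div) q += (a.sig < 0) ? -1 : 1;
    dec_t r = { q, exp };
    return r;
}

int main(void)
{
    dec_t a = { 12, -1 }, b = { 34, -1 };   /* 1.2 and 3.4           */
    dec_t p = dmul(a, b);                   /* 408e-2 = 4.08, exact  */
    dec_t q = dquantize(p, -1);             /* 41e-1  = 4.1, rounded */
    printf("%lldE%d -> %lldE%d\n", p.sig, p.exp, q.sig, q.exp);
    return 0;
}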

When the result is not exact the standard is significance-preserving
insofar as that can be done. If the full-significance result is longer
than is desired then it may be rounded (with quantize) to the desired
precision. While this involves double rounding, it is the same
double-rounding that occurs in binary when a rounded result has more
precision than desired, for example when a double computation is
narrowed to single.

If you wish to argue that the standard should have an operation
"multiply for narrow result" in both binary and decimal then I have some
sympathy. But it is unreasonable to decry decimal without also decrying
binary for the same sin.


Terje Mathisen

unread,
Feb 25, 2013, 4:18:44 PM2/25/13
to
Andy (Super) Glew wrote:
> I would not seriously suggest doing much arithmetic in this format.
> Handling different month sizes might be okay,
> but handling leapyears (let alone leapseconds)
> would be slow.
>
> But... a polynomial evaluation instruction that did
>
> y:d365:m12:d31:h:m:s
> mask
> 011111111_0111_01111_01111_011111_011111
> Horner's rule coefficient
> 365. 12. 31. 24. 60. 60
>
> might be efficient in computing a canonical
> y:d365:m12:d31:h:m:s
> representation - from which a table lookup
> might compute a corrected true time in seconds.
>
> Hmmm... I think I can see how to use the mask to control a multiplier
> array to evaluate such a polynomial. But it looks like it is still a
> bit more expensive than a non-grouped multiplier.

Date calculations are a bit too irregular; besides, the fastest approach
is probably close to what I posted a few years ago, i.e. to get from
julian day nr to YMD you start by figuring out the correct 400-year
cycle (using a reciprocal multiplication), then within that cycle all
the numbers are relatively small.

The key is to make a very fast guess of the year, then do the reverse
calculation which is trivial and adjust for those fraction of a percent
which is off by one:

/* Useful and faster approximation to the number of years, will be
* wrong (one too high) 910 times
* in a 400-year cycle, i.e. in 0.62% of all calls.
*
* Requires unsigned values with up to 29 significant bits!
*/
y = (days * 2871 + 1983) >> 20;

(Yes, I wrote a program to search for those constants! :-) )

/* Calculate # of centuries:
* Since y will be in the 0 to 400 range, the following
* approximation can be used instead of a division by 100:
* y100 = y / 100; ~ (y * 41) >> 12;
*/
y100 = (y * 41) >> 12;

/* # of days in those full years */
gd = y * 365 /* Normal years */
+ (y >> 2) /* + leap years every 4 years */
- y100 /* - missing century leap years */
+ (y100 >> 2); /* + every 400 years */

/* test for the small chance (0.62%) of a wrong year: */
if (gd > days) {
y--; /* y will be in the [0-399] range! */
y100 = (y * 41) >> 12;
/* The 400-year correction can be skipped! */
gd = y * 365 + (y >> 2) - y100 /* + (y100 >> 2) */;
}
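
If you want to sanity-check the miss count for that year guess, a brute-force
sweep over one 400-year cycle does it. This assumes day 0 is the first day of
a year divisible by 400; the exact count depends on where the cycle is
anchored:

#include <stdio.h>

int main(void)
{
    unsigned days = 0, wrong = 0;
    for (unsigned y = 0; y < 400; y++) {
        unsigned len = 365 + (y % 4 == 0) - (y % 100 == 0) + (y % 400 == 0);
        for (unsigned d = 0; d < len; d++, days++) {
            unsigned guess = (days * 2871u + 1983u) >> 20;
            if (guess != y)
                wrong++;
        }
    }
    printf("%u mismatches in %u days\n", wrong, days);   /* days == 146097 */
    return 0;
}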

Joe keane

unread,
Feb 25, 2013, 4:23:07 PM2/25/13
to
In article <kgdisp$39r$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>Then we saw measurements of real-world database applications.

Dude you got snowed...

>Consequently, in decimal 1.0, 1.00, and 1.000 are three different
>numbers with three different representations, whereas in binary they
>are all the same.

>Because decimal keeps the quantum (effectively, the number of trailing
>zeros) while binary normalizes it away, in decimal you can detect loss
>of significance that cannot be discovered in binary absent heroic side
>computations that are never done until after the plane has crashed.

You keep using those words. I don't think they mean what you think they
mean.

"IEEE Standard for Floating-Point Arithmetic"?

Ivan Godard

unread,
Feb 25, 2013, 4:24:12 PM2/25/13
to
As I recall the problem was instability in the vicinity of a zero. If
there is consistent rounding then the zero can be approached without the
possibility of tunneling through to the other side. The problem was
topological invariants, not precision.

It's some years ago, and I might be misrecalling. And remember, I am in
no sense a numerics guy - I was on the committee to make sure that
whatever the numerics guys wanted would be possible in practical
hardware. I didn't completely succeed in that; a strict interpretation
of the standard is impossible on modern HPC hardware. But it was clear
that every hardware manufacturer would lie about that, and any customer
who understood the problem would want them to lie, so it didn't matter.

Ivan

Michael S

unread,
Feb 25, 2013, 7:26:16 PM2/25/13
to
On Feb 25, 7:58 pm, Ivan Godard <i...@ootbcomp.com> wrote:
> On 2/25/2013 7:39 AM, Michael S wrote:
>
> > On Feb 25, 3:16 am, Quadibloc <jsav...@ecn.ab.ca> wrote:
>
> >> Now that "all computers are made this way" - the way the 360/195 was
> >> made, with cache, pipelining, Wallace trees, and advanced division
> >> algorithms - DFP is, in a way, a move towards a 7090 and a 7070 nailed
> >> together, which is considerably more harmonious. That is, decimal
> >> computations are done a word at a time, just like binary ones, not a
> >> digit at a time.
>
> >> John Savard
>
> > I am not sure about POWER7 and the newest zArch chip (zEC12 or what is
> > official name?), but on POWER6 and z10 DFP hardware can be described
> > as processing approximately 1 digit at a time.
> > z196 DFP hardware is faster than that, especially for addition/
> > subtraction, but, according to my understanding, still not pipelined.
> > So even on z196 the throughput of DFP calculations is far away from
> > binary floating point or from BCD fix-point on the same machine.
>
> > If you are interested, here are more details than I, personally, want
> > to know:
> >http://www.acsel-lab.com/arithmetic/papers/ARITH20/ARITH20_Carlough.pdf
>
> Fascinating paper; a very clean and well done design IMO.
>
> It's a 4 stage pipe. They do a 16-digit add in 6 clocks and a 34-digit
> add in 8, a bit faster than your understanding :-)  (There's a table on
> the last page with latencies of common operations).
>

I said that POWER6 and z10 do approximately 1 digit at a time.
And I said that z196 is faster than that.

Terje Mathisen

unread,
Feb 26, 2013, 1:33:08 AM2/26/13
to
Ivan Godard wrote:
>> I disagree: I'm quite confident that I can take any standard decimal
>> operation and come up with a set of pre/post operations that would cause
>> the correct result to come out after a binary fp operation. :-)
>
> It's not even true for the identity operation.
>
> There are numbers in either radix that you cannot convert to the other
> radix and back and have the result match the input. The problem is that

Sure I can, for the special case when both numbers are integers!

Fractional numbers are of course (in general) inexact, and only those
numbers which match the current base can be represented. I didn't think
I needed to state this. :-(

> each conversion must go to the nearest adjacent value. Say the input is
> A in radix r1, and the nearest value in radix r2 is B. However, it is
> quite possible for the nearest value to B in r1 is not A but some other
> value A'.

Ivan, I am suggesting that all my numbers will be (in effect) scaled
integers!

As long as I have enough free bits in the mantissa, I can store any
given value, in any radix, by first scaling the input so that it becomes
an integer. I then store that number and remember the scale factor.

I am NOT saying that I can use binary fp as-is to handle arbitrary
decimal numbers, far from it!
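
A two-line illustration of the point, in case it helps: 0.1 has no exact
double, but the pair (integer significand, remembered decimal scale) holds it
exactly - which is essentially the BID shape:

#include <stdio.h>

int main(void)
{
    double direct = 0.1;                /* nearest double, not exactly 1/10 */
    double sig = 1.0; int scale = -1;   /* exact: the integer 1 plus a      */
                                        /* remembered power-of-ten scale    */
    printf("%.20f\n", direct);          /* 0.10000000000000000555...        */
    printf("%g * 10^%d\n", sig, scale);
    return 0;
}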

Terje Mathisen

unread,
Feb 26, 2013, 1:42:55 AM2/26/13
to
Ivan Godard wrote:
> It's some years ago, and I might be misrecalling. And remember, I am in
> no sense a numerics guy - I was on the committee to make sure that
> whatever the numerics guys wanted would be possible in practical
> hardware. I didn't completely succeed in that; a strict interpretation
> of the standard is impossible on modern HPC hardware. But it was clear
> that every hardware manufacturer would lie about that, and any customer
> who understood the problem would want them to lie, so it didn't matter.

There _has_ to be a great story behind that paragraph, please tell!

Terje Mathisen

unread,
Feb 26, 2013, 1:49:27 AM2/26/13
to
Ivan Godard wrote:
> When the result is not exact the standard is significance-preserving
> insofar as that can be done. If the full-significance result is longer
> than is desired then it may be rounded (with quantize) to the desired
> precision. While this involves double rounding, it is the same
> double-rounding that occurs in binary when a rounded result has more
> precision than desired, for example when a double computation is
> narrowed to single.
>
> If you wish to argue that the standard should have an operation
> "multiply for narrow result" in both binary and decimal then I have some
> sympathy. But it is unreasonable to decry decimal without also decrying
> binary for the same sin.

That is not needed (at least not in binary, and I believe not in any
base) as long as the long format mantissa is _more_ than twice as long
as the short format:

In this specific case you can never get the wrong final result after
using the same rounding format both for the original (long) result and
for reducing to the short representation.

It is one of the key reasons for keeping a double exponent less than
twice as long as the float exponent: it leaves room for several extra
bits in the mantissa after doubling the float mantissa length.

The Intel 80-bit format is a bastard: it has exactly those double
rounding problems because the mantissa length is just 64 bits.

Terje Mathisen

unread,
Feb 26, 2013, 2:16:09 AM2/26/13
to
Ivan Godard wrote:
> On 2/25/2013 7:39 AM, Michael S wrote:
>> I am not sure about POWER7 and the newest zArch chip (zEC12 or what is
>> official name?), but on POWER6 and z10 DFP hardware can be described
>> as processing approximately 1 digit at a time.
>> z196 DFP hardware is faster than that, especially for addition/
>> subtraction, but, according to my understanding, still not pipelined.
>> So even on z196 the throughput of DFP calculations is far away from
>> binary floating point or from BCD fix-point on the same machine.
>>
>> If you are interested, here are more details than I, personally, want
>> to know:
>> http://www.acsel-lab.com/arithmetic/papers/ARITH20/ARITH20_Carlough.pdf
>
> Fascinating paper; a very clean and well done design IMO.
>
> It's a 4 stage pipe. They do a 16-digit add in 6 clocks and a 34-digit
> add in 8, a bit faster than your understanding :-) (There's a table on
> the last page with latencies of common operations).

That is actually required in order to beat a pure SW BID format
implementation, at least if the compiler inlines the code so that it (or
the OoO hw) can overlap multiple decimal operations:

Pick up the exponents, special-case when they are identical.

Do a scaling operation based on a lookup table on the exp difference.

Do the binary mantissa add/sub, keeping the exact result.

Normalize if needed.

Round based on the least significant bit of the final mantissa plus the
following bits.

I.e. this looks like 15-30 cycles for the general 16-digit case and just
5-10 more cycles for 30+ digits, with an important special case of
identical input and output exponents and no rounding needed, where the
code would run in approximately the same 8 cycles as the DFP hw above.
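
In code, the shape of that fast path might look roughly like this - a toy
unpacked form, significands assumed to fit in 64 bits, exponent difference at
most 18, with overflow, renormalization and rounding omitted:

#include <stdint.h>

typedef struct { int64_t sig; int exp; } dec64u_t;   /* value = sig * 10^exp */

static const int64_t pow10_tab[19] = {
    1LL, 10LL, 100LL, 1000LL, 10000LL, 100000LL, 1000000LL, 10000000LL,
    100000000LL, 1000000000LL, 10000000000LL, 100000000000LL,
    1000000000000LL, 10000000000000LL, 100000000000000LL,
    1000000000000000LL, 10000000000000000LL, 100000000000000000LL,
    1000000000000000000LL
};

static dec64u_t dec_add(dec64u_t a, dec64u_t b)
{
    if (a.exp == b.exp) {                    /* the equal-exponent special case */
        dec64u_t r = { a.sig + b.sig, a.exp };
        return r;
    }
    if (a.exp < b.exp) { dec64u_t t = a; a = b; b = t; }
    int d = a.exp - b.exp;                   /* table-driven scaling            */
    dec64u_t r = { a.sig * pow10_tab[d] + b.sig, b.exp };
    /* real code must detect overflow here, then renormalize and round */
    return r;
}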

nm...@cam.ac.uk

unread,
Feb 26, 2013, 4:21:40 AM2/26/13
to
In article <kggknp$3lc$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:

Thanks for the clarification.

>As I recall the problem was instability in the vicinity of a zero. If
>there is consistent rounding then the zero can be approached without the
>possibility of tunneling through to the other side. The problem was
>topological invariants, not precision.

Eh? That doesn't make sense. My suspicion is that the real cause
of that problem is the signed zero dogma, but that's only a guess.
Or it may be denormalised numbers, which DO exhibit that form of
instability.

The point here is that probabilistic methods, asynchronism and
parallelism are very good at exposing issues that already existed
but had been swept under the carpet.

>It's some years ago, and I might be misrecalling. And remember, I am in
>no sense a numerics guy - I was on the committee to make sure that
>whatever the numerics guys wanted would be possible in practical
>hardware. I didn't completely succeed in that; a strict interpretation
>of the standard is impossible on modern HPC hardware. But it was clear
>that every hardware manufacturer would lie about that, and any customer
>who understood the problem would want them to lie, so it didn't matter.

Is it? I didn't know that. What I do know is that it makes little
sense in any post-1960 programming language, and is incompatible
with well-established forms of (compiler) optimisation. In the
early days of the revision, I believe that there was quite strong
support for producing a standard that COULD be adopted by ordinary
programming languages, and it was definitely strongly supported by
several Email correspondents with expertise in those areas, but it
was abandoned.

I did think of joining IEEE and getting more closely involved, but
I thought better of it. Spending 5,000 quid of my own money and
a lot of effort in a (probably futile) attempt to stop a bandwagon
didn't seem attractive.


Regards,
Nick Maclaren.

Ivan Godard

unread,
Feb 26, 2013, 4:31:55 AM2/26/13
to
On 2/25/2013 10:42 PM, Terje Mathisen wrote:
> Ivan Godard wrote:
>> It's some years ago, and I might be misrecalling. And remember, I am in
>> no sense a numerics guy - I was on the committee to make sure that
>> whatever the numerics guys wanted would be possible in practical
>> hardware. I didn't completely succeed in that; a strict interpretation
>> of the standard is impossible on modern HPC hardware. But it was clear
>> that every hardware manufacturer would lie about that, and any customer
>> who understood the problem would want them to lie, so it didn't matter.
>
> There _has_ to be a great story behind that paragraph, please tell!

Not that much of a story, but I'll try to explain.

The original 754 was done before there was any real inkling of
parallelism in the myriad forms we see today. The standard assumed
serial execution, and was defined in terms of single operations, assumed
atomic.

The revision, done largely by people of the same orientation (and often
the same people) as those that did the original, kept the same approach.
Hence there is no overt reference to parallel semantics in the revised
standard.

However, there are a few things, in the original and retained in the
revision, that if enforced would have an extreme impact on available
parallelism. There are some subtle examples, but to my mind the most
glaring are the flags, the set of boolean condition indicators that are
defined to be implicit reference arguments of every operation. There is
only one set of flags, global to the program.

Think about that a minute. Then take a typical BLAS floating point
application and let the compiler (not the source) chop the matrix into
100,000 rows fanned out over a roomful of processors, all executing
(different parts of) a single program. Then realize that the function
applied to one of those rows may query the (defined to be unified)
single global state of those flags, at any time.

Maintaining a consistent global set of flags is clearly impossible to
realize physically when the program no longer follows the sequential
execution model of the standard. Everybody knows this; any reasonable
program will let each core maintain a local set of flags, which may be
merged at the end if desired. But standards are not qualified with
"applies only to reasonable programs".

Consequently every big iron vendor's Fortran will violate the standard.
Not one of them will admit it of course; they will all lie and claim
compliance. The people who buy big iron Fortran are no dummies; they
know they are being lied to. But they don't care, because the only way
to actually conform is to run the program on a monocore, and they would
rather have their iron, standards be damned, thank you.

I fought long and hard to remove all the globals from the standard. We
could have required the flags to be attached to data and propagate
dataflow-wise through computation. We could have attached the modes to
scopes and required them to be static with the scope. That big iron
Fortran would then truly be conforming, no lie. I didn't succeed.

For what it's worth, the Mill lies too. We are in good company.

Ivan

Ivan Godard

unread,
Feb 26, 2013, 4:49:58 AM2/26/13
to
Are you allowing for the mispredicts in the tests? NaN etc will be rare
enough that prediction will be good, but the exponent shuffling and
post-op quantum selection will be megamorphic. And scaling is a matter
of mul/div by powers of 10. One divide takes longer than your estimate.
The divide will have to be software too, unless your underlying machine has
quad integer divide; don't know of any that do.

I'd like to measure a software DPD implementation using software BCD
underneath vs. BID, both on binary hardware, because the scaling problem
is just a shift in underlying BCD. BCD add/sub is pretty cheap on a
binary machine, the question is the multiplier. Intel sure thought BID
was better, but I don't have a clear sense of whether they are right.
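
For what it's worth, the "pretty cheap" add can be done branch-free on a
plain 64-bit register - 15 BCD digits per word, top nibble left clear to
catch the carry out. This is the classic pre-bias-by-6 trick, sketched here,
not anything vendor-specific:

#include <stdint.h>

static uint64_t bcd_add(uint64_t a, uint64_t b)
{
    uint64_t t1 = a + 0x0666666666666666ULL;   /* pre-bias every digit by 6      */
    uint64_t t2 = t1 + b;                      /* plain binary add               */
    uint64_t t3 = t1 ^ b;
    uint64_t t4 = t2 ^ t3;                     /* carries across nibble borders  */
    uint64_t t5 = ~t4 & 0x1111111111111110ULL; /* nibbles that did NOT carry out */
    uint64_t t6 = (t5 >> 2) | (t5 >> 3);       /* ...get their bias of 6 removed */
    return t2 - t6;
}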

Ivan

Terje Mathisen

unread,
Feb 26, 2013, 5:17:04 AM2/26/13
to
Ivan Godard wrote:
> On 2/25/2013 11:16 PM, Terje Mathisen wrote:
>> Round based on the least significant bit of the final mantissa plus the
>> following bits.
>>
>> I.e. this looks like 15-30 cycles for the general 16-digit case and just
>> 5-10 more cycles for 30+ digits, with an important special case of
>> identical input and output exponents and no rounding needed, where the
>> code would run in approximately the same 8 cycles as the DFP hw above.
>
> Are you allowing for the mispredicts in the tests? NaN etc will be rare
> enough that prediction will be good, but the exponent shuffling and

I'm expecting good BP for the equal vs different exp, so on the order of
2-3 cycles average cost here.

> post-op quantum selection will be megamorphic. And scaling is a matter

This is table driven except for extreme exponents/exp differences.

> of mul/div by powers of 10. One divide takes longer than your estimate.
> The divide will have to software too, unless your underlying machine has
> quad integer divide; don't know of any that do.

All division scaling is handled with reciprocal mul of course: There's a
limited number of factors to handle, and all of them can be
pre-calculated. I.e. no branching and no DIV.

Here we will also gain a lot by special casing short effective mantissa
lengths: I'm willing to bet that even when using the 34-digit format,
most of the involved operations will use far fewer digits - 19 or less -
short enough to fit in a single 64-bit variable.
>
> I'd like to measure a software DPD implementation using software BCD
> underneath vs. BID, both on binary hardware, because the scaling problem
> is just a shift in underlying BCD. BCD add/sub is pretty cheap on a
> binary machine, the question is the multiplier. Intel sure thought BID
> was better, but I don't have a clear sense of whether they are right.

There is one good reason for doing BCD, and that is to use the SIMD hw!

With native byte ops only the nybble boundary carries must be handled,
or you could even do a nybble->byte split and work with the 256-bit AVX
registers to hold 34-digit values.

However, the overhead for any FDMUL operation will be pretty bad, since
you either have to emulate paper multiplication, or convert to 128-bit
binary for the operation.

Terje Mathisen

unread,
Feb 26, 2013, 5:21:01 AM2/26/13
to
Ivan Godard wrote:
> Consequently every big iron vendor's Fortran will violate the standard.
> Not one of them will admit it of course; they will all lie and claim
> compliance. The people who buy big iron Fortran are no dummies; they
> know they are being lied to. But they don't care, because the only way
> to actually conform is to run the program on a monocore, and they would
> rather have their iron, standards be damned, thank you.

Ouch!

Yes, I had forgotten (or rather suppressed) that part of the standard;
it is so obviously incompatible with any form of optimization that the
only reasonable approach is to disregard it. :-(
>
> I fought long and hard to remove all the globals from the standard. We
> could have required the flags to be attached to data and propagate
> dataflow-wise through computation. We could have attached the modes to
> scopes and required them to be static with the scope. That big iron
> Fortran would then truly be conforming, no lie. I didn't succeed.

That would have been very useful.
>
> For what it's worth, the Mill lies too. We are in good company.

:-)

nm...@cam.ac.uk

unread,
Feb 26, 2013, 5:26:25 AM2/26/13
to
In article <kghvc7$red$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>
>The original 754 was done before there was any real inkling of
>parallelism in the myriad forms we see today. The standard assumed
>serial execution, and was defined in terms of single operations, assumed
>atomic.

Jesus wept!

I am sorry, but that is absolute nonsense (and I know that it wasn't
you who said it). Automatic vectorisation was well-established by
1970 and near-ubiquitous on supercomputers by 1980. Automatic
parallelisation of the modern form was a major research topic in
the 1970s and was starting to appear in commercial products (IBM
Fortran Q and Alliant) by 1984.

But, most of all, the arithmetic models of expression meaning were
near-ubiquitous in high-level languages by the late 1950s, and were
clearly and unequivocally stated in the leading numerical language
of the period 1960-1985 (i.e. Fortran). In particular, the
unsupportability of the ghastly signed zero and infinity mess,
as well as the global flags, were and are clearly non-starters.

>Consequently every big iron vendor's Fortran will violate the standard.
>Not one of them will admit it of course; they will all lie and claim
>compliance. The people who buy big iron Fortran are no dummies; they
>know they are being lied to. But they don't care, because the only way
>to actually conform is to run the program on a monocore, and they would
>rather have their iron, standards be damned, thank you.

Eh? No, they don't. And it's not just big iron. Users don't want
IEEE 754 exception handling either because (a) they don't make any
errors (!) or (b) because they have enough Clue to know that it's
a useless specification in the context of a high-level language.
Fortran and C++ don't even ATTEMPT to claim conformance with the
full IEEE 754 standard, which is at least better than C's ghastly
mess.

>I fought long and hard to remove all the globals from the standard. We
>could have required the flags to be attached to data and propagate
>dataflow-wise through computation. We could have attached the modes to
>scopes and required them to be static with the scope. That big iron
>Fortran would then truly be conforming, no lie. I didn't succeed.

Yes. And there were a lot of people who would have supported you.
But, no, no high-level language (and, in particular, not Fortran)
would be fully conforming unless the other messes were fixed as
well. The tragedy is that they would have been possible almost
compatibly, and would have improved IEEE 754's usefulness for
software engineering by a near-infinite factor.


Regards,
Nick Maclaren.

Ivan Godard

unread,
Feb 26, 2013, 1:04:48 PM2/26/13
to
On 2/26/2013 2:26 AM, nm...@cam.ac.uk wrote:
> In article <kghvc7$red$1...@dont-email.me>,
> Ivan Godard <iv...@ootbcomp.com> wrote:
>>
>> The original 754 was done before there was any real inkling of
>> parallelism in the myriad forms we see today. The standard assumed
>> serial execution, and was defined in terms of single operations, assumed
>> atomic.
>
> Jesus wept!
>
> I am sorry, but that is absolute nonsense (and I know that it wasn't
> you who said it). Automatic vectorisation was well-established by
> 1970 and near-ubiquitous on supercomputers by 1980. Automatic
> parallelisation of the modern form was a major research topic in
> the 1970s and was starting to appear in commercial products (IBM
> Fortran Q and Alliant) by 1984.
>
> But, most of all, the arithmetic models of expression meaning were
> near-ubiquitous in high-level languages by the late 1950s, and were
> clearly and unequivocally stated in the leading numerical language
> of the period 1960-1985 (i.e. Fortran). In particular, the
> unsupportability of the ghastly signed zero and infinity mess,
> as well as the global flags, were and are clearly non-starters.

I am not enough of a numerics guy to contribute to the affine vs.
projective infinity question. However, I know a religious fight when I
see one and that one sure fits. As is usual in religious arguments, I
assume that there is merit on both sides, to go along with the
intolerance and bullheadedness.

Plausibly both models could have been supported for uses, and users,
that worked better with one or the other. Exponent handling would have
largely been the same, and magnitude is close enough (in hardware terms)
to two's complement that the same unit could have been used for both
with an added few muxes here and there. Such relatively cheap
compromises are rarely found in religious dispute.

I understand (perhaps wrongly) that, after the monumental food fight
when the original 754 got sign-magnitude, many of those who were so
single-zero intransigent discovered that things weren't quite so bad
after all; a change in way of thinking perhaps required, but not
wholesale failure. Marranos as it were. Others, more Puritan in
persuasion, retired to gloom and bile. However, it seems ancient history
now; I doubt if one in ten that wrote a line of FP during my time on
this board could explain why it makes a difference.

Perhaps they should be able to; I'm sure that the original schismatics
think so. It's the fate of the old and devout to bewail the young,
secular, and uncaring.

Ivan

nm...@cam.ac.uk

unread,
Feb 26, 2013, 1:55:16 PM2/26/13
to
In article <kgitdr$867$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>>
>> But, most of all, the arithmetic models of expression meaning were
>> near-ubiquitous in high-level languages by the late 1950s, and were
>> clearly and unequivocally stated in the leading numerical language
>> of the period 1960-1985 (i.e. Fortran). In particular, the
>> unsupportability of the ghastly signed zero and infinity mess,
>> as well as the global flags, were and are clearly non-starters.
>
>I am not enough of a numerics guy to contribute to the affine vs.
>projective infinity question. However, I know a religious fight when I
>see one and that one sure fits. As is usual in religious arguments, I
>assume that there is merit on both sides, to go along with the
>intolerance and bullheadedness.

I am, but that wasn't and isn't my point. The simple answer to
the question is that affine makes most sense for the real line,
projective is the only one that makes sense for complex numbers,
and the two cannot be made conformant :-( I am agnostic, except
for the complex plane where the mathematics is clear.

My point was the simple design error of doing BOTH of the following:

Confounding true zero and approximate zero with a positive
infinitesimal

Defining 1/0 to be +infinity and 1/-0 to be -infinity

The point is that the combination completely breaks many of
the most important and obvious mathematical invariants assumed
by programmers, used by optimising compilers, needed by numerical
analysts and built-in to most programming languages. And it is
THAT which makes it a ghastly mess!

Kahan's response (in person) was that this wasn't a problem
because all programs should check the divide-by-zero exception
flag after every operation that might do a division. I failed
to make him even begin to understand that it was infeasible,
both in terms of human factors and in terms of programming
languages. In its general form, even for the purest of pure
functions f(), 'x == y' can be true and 'f(x) == f(y)' false.
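
The smallest possible instance of that broken invariant, with f being 1/x:

#include <stdio.h>

int main(void)
{
    double x = 0.0, y = -0.0;
    printf("x == y      : %d\n", x == y);        /* 1: +0 and -0 compare equal */
    printf("1/x == 1/y  : %d\n", 1/x == 1/y);    /* 0: +inf versus -inf        */
    return 0;
}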

In the early days of the IEEE 754 revision, there was strong support
for making division by zero deliver a NaN, which entirely eliminates
the problem.

>I understand (perhaps wrongly) that, after the monumental food fight
>when the original 754 got sign-magnitude, many of those who were so
>single-zero intransigent discovered that things weren't quite so bad
>after all; a change in way of thinking perhaps required, but not
>wholesale failure. Marranos as it were. Others, more Puritan in
>persuasion, retired to gloom and bile. However, it seems ancient history
>now; I doubt if one in ten that wrote a line of FP during my time on
>this board could explain why it makes a difference.

Obviously, I can, and I have seen it cause wrong answers more times
than I care to think. But the point that the fanatics (of both sides)
miss is that it isn't the preservation of the sign bit (such as it is)
that causes the trouble, but the mathematically bogus treatment
of a zero of unknown or no sign (and true zero has no sign) as a
positive infinitesimal at poles and other discontinuities (i.e.
where a true zero value is mathematically invalid).

Just as, if anyone took any notice of it, the exception flag handling
is incompatible with performance. I know the reasons, but they were
a solution to a problem that had ceased to be the main obstacle in
the 1970s :-(

The overall result is that the IEEE 754 specification is incompatible
with the numerical reliability of real codes, exactly the opposite
of the intention. And that, even today, most of the specification
is ignored by all programming languages, compilers and programmers.
I don't know if anyone explicitly predicted that in 1984, but it
was obvious to everyone by 1994.


Regards,
Nick Maclaren.

Quadibloc

unread,
Feb 26, 2013, 2:51:20 PM2/26/13
to
On Feb 26, 11:04 am, Ivan Godard <i...@ootbcomp.com> wrote:

> I am not enough of a numerics guy to contribute to the affine vs.
> projective infinity question.

You should get a projective infinity when you divide by zero, and an
affine infinity when you overflow.

Unfortunately, it's a mode setting, so you can only have one kind of
infinity at a time.

> Exponent handling would have
> largely been the same, and magnitude is close enough (in hardware terms)
> to two's complement that the same unit could have been used for both
> with an added few muxes here and there. Such relatively cheap
> compromises are rarely found in religious dispute.
>
> I understand (perhaps wrongly) that, after the monumental food fight
> when the original 754 got sign-magnitude, many of those who were so
> single-zero intransigent discovered that things weren't quite so bad
> after all; a change in way of thinking perhaps required, but not
> wholesale failure.

I happen to be on the side of sign-magnitude for floating-point,
although I'll happily accept one's complement if it's desired to have
floating-point numbers collate as if they were two's complement
integers.

The bad thing about two's complement floating-point, in my opinion, is
that you suddenly don't have an "exponent field" and a "mantissa
field" (oops, sorry, a "significand" field) because when the mantissa
is 1.0, a carry out of it increments the exponent. That makes it hard
to explain the floating-point format to people. I don't like making
formats messy that way. (Yes, I also am strongly big-endian. How did
you guess?)

John Savard

Quadibloc

unread,
Feb 26, 2013, 2:53:27 PM2/26/13
to
On Feb 26, 11:55 am, n...@cam.ac.uk wrote:
> But the point that the fanatics (of both sides)
> miss is that it isn't the preservation of the sign bit (such as it is)
> that causes the trouble, but the mathematically bogus treatment
> of a zero of unknown or no sign (and true zero has no sign) as a
> positive infinitesimal at poles and other discontinuities (i.e.
> where a true zero value is mathematically invalid).

Indeed. Those zeroes should produce NaNs or at least projective
infinities when they divide something. Save the overflows of known
sign for division by infinitesimals of known sign.

John Savard

Quadibloc

unread,
Feb 26, 2013, 3:02:43 PM2/26/13
to
On Feb 26, 11:55 am, n...@cam.ac.uk wrote:

> I am, but that wasn't and isn't my point.  The simple answer to
> the question is that affine makes most sense for the real line,
> projective is the only one that makes sense for complex numbers,
> and the two cannot be made conformant :-(  I am agnostic, except
> for the complex plane where the mathematics is clear.

I'm sure there might be applications where you do want to preserve the
angle to a point in an infinitely far-away circle where a complex
infinity can be thought to reside. It's true that the complex plane
can be nicely transformed to a sphere, but it can also be nicely
transformed to a circle as well.

But having two infinities which have a precisely defined ratio between
them requires wasting too many bits in a floating-point
representation. If someone were to have a standard for the hardware
representation of complex numbers - rather than just plonking two
reals together, one after another - such questions could be discussed.

What I will say is this:

When I was an undergrad in Physics, someone in Comp Sci asked me what
I thought multiplying two arrays together should do. I said it should
be element by element, as in APL. He said that it obviously should be
matrix multiplication.

At the time, I wasn't sophisticated enough to give a good counter-
argument.

Now, I can.

Element-by-element array multiplication involves adding no semantics
to the computational object that is an array. It doesn't force people
to conceptualize collections of numbers as always being used for one
particular purpose.

If you want a matrix, you should have to *declare* a matrix, the same
way you declare a complex number. It should be a new datatype, and it
should be entirely natural to have arrays of matrices, just as you
might have arrays of strings.

It might be allowed to EQUIVALENCE a matrix to an array - I'm not
trying to enforce encapsulation on people - but if you want a matrix,
you should say so.

John Savard

Ivan Godard

unread,
Feb 26, 2013, 3:05:31 PM2/26/13
to
As previously noted, the standard reflects an (unfortunate IMO)
sequential mind-set.

> In its general form, even for the purest of pure
> functions f(), 'x == y' can be true and 'f(x) == f(y)' false.
>
> In the early days of the IEEE 754 revision, there was strong support
> for making division by zero deliver a NaN, which entirely eliminates
> the problem.

You won that one as it turns out. The handling of exceptions in the
revision was explicitly concerned with so-called "substitution handlers"
in which the excepting result would be replaced by a different value. An
early proposal would have required a trap to software; the eventual
standard does not, and permits hardware substitution. Our Mill gives you
a choice of NaN and several other possibly relevant values, software
trap, or silent flag setting.

Unfortunately, heavy vendor lobbying prevented the standard from making
the exception-handling capability mandatory; it is only a recommended
facility. The commercial realities of legacy hardware trump mathematics
every time.

Ivan

nm...@cam.ac.uk

unread,
Feb 26, 2013, 3:37:47 PM2/26/13
to
In article <kgj4g6$kcu$1...@dont-email.me>,
Ivan Godard <iv...@ootbcomp.com> wrote:
>>
>> Kahan's response (in person) was that this wasn't a problem
>> because all programs should check the divide-by-zero exception
>> flag after every operation that might do a division. I failed
>> to make him even begin to understand that it was infeasible,
>> both in terms of human factors and in terms of programming
>> languages.
>
>As previously noted, the standard reflects an (unfortunate IMO)
>sequential mind-set.

Well, yes, but that's minor compared to the assumptions that fail
in that case :-(

> In its general form, even for the purest of pure
>> functions f(), 'x == y' can be true and 'f(x) == f(y)' false.
>>
>> In the early days of the IEEE 754 revision, there was strong support
>> for making division by zero deliver a NaN, which entirely eliminates
>> the problem.
>
>You won that one as it turns out. The handling of exceptions in the
>revision was explicitly concerned with so-called "substitution handlers"
>in which the excepting result would be replaced by a different value. An
>early proposal would have required a trap to software; the eventual
>standard does not, and permits hardware substitution. Our Mill gives you
>a choice of NaN and several other possibly relevant values, software
>trap, or silent flag setting.
>
>Unfortunately, heavy vendor lobbying prevented the standard from making
>the exception-handling capability mandatory; it is only a recommended
>facility.The commercial realities of legacy hardware trumps mathematics
>every time.

Sorry, but my side completely lost that one. The point is that
the broken behaviour is mandatory and the fixed behaviour is
optional, needing action by all of the hardware vendor, compiler
vendor and programmer to make it usable. But it's MUCH worse
than that :-(

The fact that the programmer can specify the value means that the
facility is equivalent to trap-fixup-and-recover, which is (to a
good first approximation) unimplementable and unimplemented. It
hasn't been available in a usable and reliable form on any general
purpose system that I know of since the demise of mainframes.
Yes, it is often CLAIMED to be available - but it often doesn't
work at all and is completely incompatible with out-of-order
or SIMD execution and reasonable performance. And that's the
least of the problems when using it from a high-level language :-(

The only alternate form that is reliably implementable was not
included: trap-diagnose-and-terminate.


Regards,
Nick Maclaren.

Andy (Super) Glew

unread,
Feb 26, 2013, 6:12:12 PM2/26/13
to
On 2/25/2013 1:23 PM, Joe keane wrote:
> In article <kgdisp$39r$1...@dont-email.me>,
> Ivan Godard <iv...@ootbcomp.com> wrote:
>> Then we saw measurements of real-world database applications.
>
> Dude you got snowed...

I haven't seen the studies of FP decimal arithmetic in database
applications. (Or, rather, I have seen the IBM studies; I haven't done
my own or seen them verified.)

But... I *have* seen studies I trust that make a damned good case for
doing arithmetic in ASCII or Unicode. Without converting to binary
formats first.

It's a question of how complex the computation is. If you convert to
binary (or even to decimal FP), and then do many calculations, great,
the conversion cost is amortized. But if all you are doing is adding +1
to a number represented by a text string, or adding a list of numbers in
a text (XML) object, then the conversion costs often far outweigh the
arithmetic.

So often we waste a lot of time and power pushing bits around, shuffling
bytes and nibbles.
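
The flavour of the thing - adding 1 to a decimal number that never leaves its
ASCII representation (a toy, ignoring signs and non-digit characters):

#include <stdio.h>
#include <string.h>

/* increment the digit string in place; returns 1 if it was all nines
   and the caller must prepend a leading '1' */
static int ascii_inc(char *num)
{
    for (size_t i = strlen(num); i-- > 0; ) {
        if (num[i] != '9') { num[i]++; return 0; }
        num[i] = '0';
    }
    return 1;
}

int main(void)
{
    char n[16] = "1299";
    ascii_inc(n);
    printf("%s\n", n);   /* prints 1300 */
    return 0;
}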

Andy (Super) Glew

unread,
Feb 26, 2013, 6:18:09 PM2/26/13
to
On 2/25/2013 6:06 AM, Michael S wrote:
> On Feb 25, 3:18 am, Quadibloc <jsav...@ecn.ab.ca> wrote:
>> On Feb 24, 4:43 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
>>
>>> POWER too.
>>
>> Yes, and Intel was supposed to be including DFP in its next generation
>> of x86 chips - and when I heard that was long enough ago, that I
>> wonder if it is in their current chips.
>>
>> John Savard
>
> No, DFP is not in current (IvyB) chips and not in the next generation
> (Haswell).
> And since the one after next generation (Broadwell) is a tick, we can
> be pretty sure that it also does not contain DFP.


Actually, DFP is the sort of thing that could be put into a tick.

It requires new execution units, but does not necessarily require new
datapaths - assuming that the DFP bits can be placed in existing
registers. (Adding new registers would make it a tock.)

E.g. MMX was a tick, in P55C and Pentium III, because it reused the
existing x87 registers.

--
The content of this message is my personal opinion only. Although I am
an employee (currently of MIPS Technologies; in the past of companies
such as Intellectual Ventures and QIPS, Intel, AMD, Motorola, and
Gould), I reveal this only so that the reader may account for any
possible bias I may have towards my employer's products. The statements
I make here in no way represent my employers' positions on the issue,
nor am I authorized to speak on behalf of my employers, past or present.