IEEE 754r, decimal floating point, BID and DPD

Jeremy Linton

Feb 19, 2009, 12:04:10 AM
Holy sh*t, it looks like there are _TWO_ standard "incompatible"
formats in IEEE 754r. Who thought that would be a good idea? Apparently
IBM pushed a hardware centric Densely Packed Decimal (DPD), and Intel
pushed a software centric Binary Integer Decimal (BID). One is just a
more tightly packed BCD type format, while the other is a binary integer
based system.
Nothing really surprising there; the big surprise is that the committee
(unanimously!) accepted the standard with _TWO_ completely different
formats to encode the same data! What were they thinking? Is there
somewhere in the spec where you can unconditionally determine whether a value
is DPD or BID encoded? It's like consciously making a decision that
having two major endian formats is a good idea. Except instead of some
bit swizzling we are talking about a pretty major chunk of conversion code, or
some really fancy hardware.

Terje Mathisen

Feb 19, 2009, 3:47:24 AM

If those formats are only used internally, i.e. in hw and all
communication happens in a canonical format (probably ASCII or BCD since
we're using base 10), then it shouldn't matter.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Jan Vorbrüggen

Feb 19, 2009, 4:20:45 AM
I would say the intent and task of standardisation is to define a
standard way of doing something - whether that is actually implemented
or used is, obviously, a different matter. Given that two major players
in the relevant market clearly had different ideas of what they wanted,
I suppose the committee preferred to standardize both while making them
as compatible as possible - are they? (e.g., regarding range and
precision) - rather than doing only one (which would likely have been
politically impossible) or none.

It's just a fact of life that such activities are rarely for the benefit
of all mankind and more for the benefit of some subset...

Jan

nm...@cam.ac.uk

Feb 19, 2009, 8:46:37 AM
In article <eY6dnRCZ3_gAgQDU...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Jeremy Linton wrote:
>
>> Holy sh*t, it looks like there are _TWO_ standard "incompatible"
>> formats in IEEE 754r. Who thought that would be a good idea? Apparently
>> IBM pushed a hardware centric Densely Packed Decimal (DPD), and Intel
>> pushed a software centric Binary Integer Decimal (BID). ...
>>
>
>If those formats are only used internally, i.e. in hw and all
>communication happens in a canonical format (probably ASCII or BCD since
>we're using base 10), then it shouldn't matter.

Terje, that's not like you!

It makes a hell of a difference to anyone who needs to encode or decode
them, especially as there doesn't seem to be a canonical way of finding
out which is being used (or whether binary floating-point is). There
may not be all that many people who do that, but the code that we write
underlies everything else.

Inter alia, it is going to lead to debuggers, transfer utilities (from
NetCDF to MPI), compilers, language run-time systems and so on all
having to guess which format they have or need to produce. And you
can't tell from the bit pattern alone, so expect, er, amusement! As
you know, given even a small number of IEEE 754 floating-point values,
it is trivial to check for big- versus little-endian with a very high
probability of being right. Distinguishing the three IEEE 754R formats
for the same endianness is something that you, I and one or two other
posters to this group could do - but not all that many experienced
programmers could!


Regards,
Nick Maclaren.

nm...@cam.ac.uk

Feb 19, 2009, 8:48:06 AM
In article <gnj0l7$9ui$1...@s1.news.oleane.net>,

Jan Vorbrüggen <Jan.Vor...@not-thomson.net> wrote:
>I would say the intent and task of standardisation is to define a
>standard way of doing something - whether that is actually implemented
>or used is, obviously, a different matter. ...

Ah - the ISO Virtual Terminal Protocol, perhaps?


Regards,
Nick Maclaren.

Jan Vorbrüggen

Feb 19, 2009, 11:40:31 AM
>> I would say the intent and task of standardisation is to define a
>> standard way of doing something - whether that is actually implemented
>> or used is, obviously, a different matter. ...
>
> Ah - the ISO Virtual Terminal Protocol, perhaps?

Dunno about that one, but there are a lot of unused - and unuseable,
which is quite different - standards. Some of them are even used _and_
unuseable, which is the worst thing that can happen to you - X509v3
being a case in point.

Jan

nm...@cam.ac.uk

Feb 19, 2009, 11:51:20 AM
In article <gnjqci$k1u$1...@s1.news.oleane.net>,
Jan Vorbrüggen wrote:

That one was a classic case of taking off like a lead balloon, for
good and sufficient reasons :-) My normal examples of used and
unusable standards are the X Windowing System interface and POSIX
threads, but I am not going to deny the reality of your example!
They are regrettably common :-(


Regards,
Nick Maclaren.

Terje Mathisen

Feb 19, 2009, 3:05:56 PM
nm...@cam.ac.uk wrote:
> In article <eY6dnRCZ3_gAgQDU...@giganews.com>,
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> If those formats are only used internally, i.e. in hw and all
>> communication happens in a canonical format (probably ASCII or BCD since
>> we're using base 10), then it shouldn't matter.
>
> Terje, that's not like you!
>
> It makes a hell of a difference to anyone who needs to encode or decode
> them, especially as there doesn't seem to be a canonical way of finding
> out which is being used (or whether binary floating-point is). There
> may not be all that many people who do that, but the code that we write
> underlies everything else.

As long as canonical conversion opcodes/operations are defined,
debuggers and compilers can use those without needing to care about the
internal format, right?

Without those conversion operations however, the problem is indeed at
least as bad as you indicate.


>
> Inter alia, it is going to lead to debuggers, transfer utilities (from
> NetCDF to MPI), compilers, language run-time systems and so on all
> having to guess which format they have or need to produce. And you
> can't tell from the bit pattern alone, so expect, er, amusement! As
> you know, given even a small number of IEEE 754 floating-point values,
> it is trivial to check for big- versus little-endian with a very high
> probability of being right.

With a "normal" set of fp values, this is indeed correct: Particularly
values close to 1.0 in magnitude are very obvious when looking at the
hex values.
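
A minimal C sketch of that kind of check (my own illustration, nothing
from the standard): for ordinary magnitudes the exponent byte sits in a
narrow band around 0x3F-0x41, so a handful of samples is enough to vote
on which end the exponent lives at.

#include <stdio.h>
#include <string.h>

/* Guess the byte order of a buffer of raw IEEE 754 doubles.  For
   "ordinary" magnitudes the top exponent byte (sign bit masked off)
   falls near 0x3F-0x41, while the low mantissa byte is essentially
   arbitrary. */
static const char *guess_endian(const unsigned char *buf, size_t nvals)
{
    size_t le = 0, be = 0;
    for (size_t i = 0; i < nvals; i++) {
        unsigned last  = buf[i * 8 + 7] & 0x7F;  /* exponent byte if little-endian */
        unsigned first = buf[i * 8 + 0] & 0x7F;  /* exponent byte if big-endian    */
        if (last  >= 0x3E && last  <= 0x41) le++;
        if (first >= 0x3E && first <= 0x41) be++;
    }
    return le > be ? "little-endian" : be > le ? "big-endian" : "unclear";
}

int main(void)
{
    double sample[] = { 1.0, 1.5, 3.14159, 0.25, 42.0, 1013.25 };
    unsigned char raw[sizeof sample];
    memcpy(raw, sample, sizeof sample);   /* pretend this came from a file */
    printf("%s\n", guess_endian(raw, sizeof sample / sizeof sample[0]));
    return 0;
}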

> Distinguishing the three IEEE 754R formats
> for the same endianness is something that you, I and one or two other
> posters to this groups could do - but not all that many experienced
> programmers could!

Which is why I'm hoping the quite trivial conversion operations are a
required part of any implementation!

MitchAlsup

Feb 19, 2009, 3:34:48 PM
On Feb 18, 11:04 pm, Jeremy Linton <reply-to-l...@nospam.org> wrote:
>         Holy sh*t, it looks like there are _TWO_ standard "incompatible"
> formats in IEEE 754r. Who thought that would be a good idea? Apparently
> IBM pushed a hardware centric Densely Packed Decimal (DPD), and Intel
> pushed a software centric Binary Integer Decimal (BID).

DPD is designed for business applications
BID is designed to be integrated into binary FPUs

The biggest difference is that DPD is an unnormalized format, designed
to allow decimal calculations without the need for pre-alignment or
post-normalization. Thus, it is expected that the data being processed
remains right-aligned (although there is no requirement for it to
remain so). So, in a reasonable implementation of DPD, one can survey
the uppermost section of the fraction, and if it contains zeros, there
is an excellent chance that you are processing DPD.

To a certain extent this is my fault. The architectural review board
at AMD was asked to support DPD (from IBM) and BID (from Intel). We
decided that we did not understand enough of the nuanced issues to
vote one way or the other. And due to this, the Decimal FP committee
basically broke down without a winner. This is a disastrous mistake
for the architectural community, a disastrous mistake for the business
community, and a disastrous mistake for joe-random human. After
studying the nuances for a couple of months, I became firmly
convinced that DPD is the "right" way to do Decimal FP calculations for
business applications, and for innocent people playing around with
Excel-like spreadsheets where decimal results are expected by the non-
numerical analyst. By the time I realized my error it was too late to
change the situation. And for this I am quite sorry.

mitch

nm...@cam.ac.uk

Feb 19, 2009, 3:38:38 PM
In article <1qadndXoC8I4JgDU...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nm...@cam.ac.uk wrote:
>> In article <eY6dnRCZ3_gAgQDU...@giganews.com>,
>> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>>> If those formats are only used internally, i.e. in hw and all
>>> communication happens in a canonical format (probably ASCII or BCD since
>>> we're using base 10), then it shouldn't matter.
>>
>> Terje, that's not like you!
>>
>> It makes a hell of a difference to anyone who needs to encode or decode
>> them, especially as there doesn't seem to be a canonical way of finding
>> out which is being used (or whether binary floating-point is). There
>> may not be all that many people who do that, but the code that we write
>> underlies everything else.
>
>As long as canonical conversion opcodes/operations are defined,
>debuggers and compilers can use those without needing to care about the
>internal format, right?

And just how do you convert a file that has been imported from another
system using those opcodes?

> > Distinguishing the three IEEE 754R formats
>> for the same endianness is something that you, I and one or two other
>> posters to this groups could do - but not all that many experienced
>> programmers could!
>
>Which is why I'm hoping the quite trivial conversion operations are a
>required part of any implementation!

As I read it, the requirement is only on systems that use decimal
floating-point, and then only to convert to a specified one of the
DPD or binary decimal formats. The only way I can see to find out
which is the native format is to compare the bit-patterns, and I am
not at all sure that is required to work.


Regards,
Nick Maclaren.

Glen Herrmannsfeldt

Feb 19, 2009, 4:05:05 PM
MitchAlsup wrote:

> On Feb 18, 11:04 pm, Jeremy Linton <reply-to-l...@nospam.org> wrote:

>> Holy sh*t, it looks like there are _TWO_ standard "incompatible"
>>formats in IEEE 754r. Who thought that would be a good idea? Apparently
>>IBM pushed a hardware centric Densely Packed Decimal (DPD), and Intel
>>pushed a software centric Binary Integer Decimal (BID).

> DPD is designed for business applications
> BID is designed to be integrated into binary FPUs

> The biggest difference is the DPD is an unnormalized format, designed
> to allow decimal calculations without the need for pre-alignment nor
> post-normalization. Thus, it is expected that the dat being processed
> remains right-aligned (although there is no requirement for it to
> remain so). So, in a reasonable implementation of DPD, one can survey
> the upper most section of the fraction and if it contains zeros, there
> is an excellent chance that you are processing DPD.

I hadn't heard this one before. There were binary floating point
processors years ago that would keep integer results right aligned,
and with a biased exponent of zero. That allows the same format
to be used for integer and floating point. (I believe some Burroughs
and CDC machines did that.)

As I understand it, it is very easy to convert DPD to BCD, maybe
even when stored in processor registers. That makes it fairly
easy to normalize values; either left or right doesn't make much
difference.

I only heard about BID a few days ago. I suppose it is easier
to process in a binary ALU, but normalization (pre and post) will
be much harder and slower. (Multiply and divide by powers of 10.)

> To a certain extent this is my fault. The architectural review board
> at AMD was ask to support DPD (from IBM) and BID (from Intel). We
> decided that we did not understant enough of the nuanced issues to
> vote one way or the other. And due to this, the Decimal FP comittee
> basically broke down without a winner. This is a disasterous mistake
> for the architecural community, a diasterous mistake for the business
> community, and a diasterous mistake for joe-random human. After
> studying the nuances for a couple of months, I became firmly
> supportive of DPD is the "right" way to do Decimal FP calculations for
> business applications, and for innocent people playing around with
> eXcel-like spreadsheets where decimal results are expected by the non-
> numerical analyst. By the time I realized my error it was too late to
> change the situation. And for this I am quite sorry.

I believe IBM is producing DPD processors. I only heard about BID
recently, and have no information that Intel is producing processors
supporting it. To me it looks like DPD makes efficient use of the
bits, and should be able to be implemented such that it runs about
as fast as binary, and maybe faster. (Fewer levels of logic for
a barrel shifter for pre/post normalization.)

-- glen

Andrew Reilly

Feb 19, 2009, 6:59:12 PM
On Thu, 19 Feb 2009 13:46:37 +0000, nmm1 wrote:
> Inter alia, it is going to lead to debuggers, transfer utilities (from
> NetCDF to MPI), compilers, language run-time systems and so on all
> having to guess which format they have or need to produce.

I agree that it sounds like an ugly situation, but it's one that I plan
to stay away from. I expect that if NetCDF ever supports either format,
then it will be type-tagged in the headers, just like other formats.
Endianness is already defined (big) by the XDR underpinnings, I believe.
Do you expect that any serious use of MPI will involve either of the
decimal formats at all, ever? I can't see an argument for scientific
work to use anything other than binary. Do the financial HPC folk use
MPI? In any case, within an MPI application, can't the programmer
arrange agreement on the format by fiat?

Cheers,

--
Andrew

MitchAlsup

Feb 19, 2009, 7:37:55 PM
On Feb 19, 3:05 pm, Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> MitchAlsup wrote:
> > On Feb 18, 11:04 pm, Jeremy Linton <reply-to-l...@nospam.org> wrote:
> >>        Holy sh*t, it looks like there are _TWO_ standard "incompatible"
> >>formats in IEEE 754r. Who thought that would be a good idea? Apparently
> >>IBM pushed a hardware centric Densely Packed Decimal (DPD), and Intel
> >>pushed a software centric Binary Integer Decimal (BID).
> > DPD is designed for business applications
> > BID is designed to be integrated into binary FPUs
> > The biggest difference is the DPD is an unnormalized format, designed
> > to allow decimal calculations without the need for pre-alignment nor
> > post-normalization. Thus, it is expected that the dat being processed
> > remains right-aligned (although there is no requirement for it to
> > remain so). So, in a reasonable implementation of DPD, one can survey
> > the upper most section of the fraction and if it contains zeros, there
> > is an excellent chance that you are processing DPD.
>
> I hadn't heard this one before.  There were binary floating point
> processors years ago that would keep integer results right aligned,
> and with a biased exponent of zero.  That allows the same format
> to be used for integer and floating point.  (I believe some Burroughs
> and CDC machines did that.)

Burroughs and to a certain extent CDC

> As I understand it, it is very easy to convert DPD to BCD, maybe
> even when stored in processor registers.  That makes it fairly
> easy to normalize values, either left or right doesn't make much
> difference.

The trick is that DPD can get 3 decimal digits into 10 binary bits
(that is less than 2.4% waste) and expand/compress back into storage
form in 2 gates of delay. (It would not surprise me to find an IBM
patent on this scheme.)


>
> I only heard about BID a few days ago.  I suppose it is easier
> to process in a binary ALU, but normalization (pre and post) will
> be much harder and slower.  (Multiply and divide by powers of 10.)

The real problems with BID are I/O (in ASCII) and rounding. In DPD,
rounding only requires decimal shifting, whereas in BID it requires
multiplication (and maybe division - hazy here).

> I believe IBM is producing DPD processors.  I only heard about BID
> recently, and have no information that Intel is producing processors
> supporting it.   To me it looks like DPD makes efficient use of the
> bits, and should be able to be implemented such that it runs about
> as fast as binary, and maybe faster.  (Fewer levels of logic for
> a barrel shifter for pre/post normalization.)

More importantly, one can set up the DFU pipeline to assume that no
pre-alignment and no post-normalization are required and handle these
as micro-faults.

I also suspect that IBM is doing 128-bit DPD calculations, as this
basically gets rid of overflow and underflow for any reasonable
business application--including computing the world GDP in the least
valued currency in the world by adding up every single unit of value
transacted that year. I would counsel anyone contemplating doing a DPD
implementation to just build the 128-bit DFU and be done with it. With a
128-bit DFU and respectable performance, one could capture a considerable
number of physics calculations that are problematic in 64-bit IEEE 754.

Mitch

Glen Herrmannsfeldt

Feb 19, 2009, 10:43:51 PM
MitchAlsup wrote:
(snip)

> The trick is that DPD can get 3 decimal digits into 10-binary bits
> (that is less than 2.4% waste) and expand/compress back into storage
> form in 2 gates of delay. It would not surprise me to find an IBM
> patent on this scheme.)

IBM patents many things to be sure that they can build them
without being sued by others. If they want this to spread,
then they should license it free or at minimal cost.

>>I only heard about BID a few days ago. I suppose it is easier
>>to process in a binary ALU, but normalization (pre and post) will
>>be much harder and slower. (Multiply and divide by powers of 10.)

> The real problems with BID is I/O (in ASCII) and rounding. In DPD
> rounding only requires decimal shifting, wherease in BID it requires
> multiplication (and maybe division (hazy here)).

I suppose for business use that is I/O bound. For CPU bound problems,
normalizing will get to you fast. For addition and subtraction,
and to some extent multiplication you might get away with keeping
them unnormalized as long as possible, but with divide you will
(most of the time) have to normalize.

Looking at the Intel web site, it looks like BID is meant for
software implementation, and Intel supplied gcc with a library
of routines to do it. Also, conversions to/from DPD.

(snip)

> More importantly, one can set up the DFU pipeline to assume that no
> pre-alignment and no post-normalization are required and handle these
> as micro-faults.

> I also suspect that IBM is doing 128-bit DPD calculations, as this
> basically gets rid of overflow and undeflow for any reasonable
> business application--including computing the world GDP in the least
> valued currency in the world by adding up every single unit of value
> transacted that year. I would council anyone contemplating doing a DPD
> to just build the 128-bit DFU and have it done with. With 128-bit DFU
> and respectable performance, one could garner a considerable amount of
> physics calculations that are problematic in 64-bit IEEE 754.

IBM has supported 128 bit floating point since the 360/85 in
about 1968. Except for divide it was standard on S/370 and later.
Somewhere in ESA/390 DXR (extended precision divide) was added.

Other than that, others have not provided much in the way
of hardware implementations. Some VAX had it, but most did it
as software emulation on the illegal instruction trap.

If 128 bit DPD began to be supported in hardware, and began
to be used, that would greatly speed up its popularity.

-- glen

nm...@cam.ac.uk

Feb 20, 2009, 3:25:56 AM
In article <8fd40e82-28fb-49d3...@o11g2000yql.googlegroups.com>,
MitchAlsup <Mitch...@aol.com> wrote:
>On Feb 19, 3:05 pm, Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>>
>> I hadn't heard this one before. There were binary floating point
>> processors years ago that would keep integer results right aligned,
>> and with a biased exponent of zero. That allows the same format
>> to be used for integer and floating point. (I believe some Burroughs
>> and CDC machines did that.)
>
>Buroughs and to a certain extent CDC

And Ferranti.

>> I only heard about BID a few days ago. I suppose it is easier
>> to process in a binary ALU, but normalization (pre and post) will
>> be much harder and slower. (Multiply and divide by powers of 10.)
>
>The real problems with BID is I/O (in ASCII) and rounding. In DPD
>rounding only requires decimal shifting, wherease in BID it requires
>multiplication (and maybe division (hazy here)).

Division, I think. While I am no logic person, I should be surprised
if division by a fixed, small number (10 in this case) took all that
many gates or much time.
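
In software, at least, the observation certainly holds: division by a
constant 10 collapses to one widening multiply and a shift. A small
sketch (assuming a GCC/Clang-style __uint128_t; the magic constant is
ceil(2^67/10), the same reciprocal compilers themselves emit for n/10):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Unsigned 64-bit division by 10 as multiply-high plus shift:
   0xCCCCCCCCCCCCCCCD == ceil(2^67 / 10), so (n * c) >> 67 == n / 10
   for every 64-bit n. */
static uint64_t div10(uint64_t n)
{
    return (uint64_t)(((__uint128_t)n * 0xCCCCCCCCCCCCCCCDull) >> 67);
}

int main(void)
{
    uint64_t tests[] = { 0, 9, 10, 99, 1000000007,
                         12345678901234567890ull, UINT64_MAX };
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
        assert(div10(tests[i]) == tests[i] / 10);
    puts("ok");
    return 0;
}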

>I also suspect that IBM is doing 128-bit DPD calculations, as this
>basically gets rid of overflow and undeflow for any reasonable
>business application--including computing the world GDP in the least
>valued currency in the world by adding up every single unit of value
>transacted that year. I would council anyone contemplating doing a DPD
>to just build the 128-bit DFU and have it done with. With 128-bit DFU
>and respectable performance, one could garner a considerable amount of
>physics calculations that are problematic in 64-bit IEEE 754.

Yes. I was told that they are. And it's critical, because fixed-point
overflow shows up as floating-point inexact! As I pointed out to the
relevant people, that meant that IEEE 754R effectively forbids error
detection when 'using' decimal floating-point to emulate fixed-point,
except when the compiler knows which mode it is in.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

Feb 20, 2009, 3:28:35 AM
In article <gnl8te$6v4$1...@aioe.org>,

Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>MitchAlsup wrote:
>
>> The trick is that DPD can get 3 decimal digits into 10-binary bits
>> (that is less than 2.4% waste) and expand/compress back into storage
>> form in 2 gates of delay. It would not surprise me to find an IBM
>> patent on this scheme.)
>
>IBM patents many things to be sure that they can build them
>without being sued by others. If they want this to spread,
>then they should license if free or minimal cost.

I believe that is the intent.

>IBM has supported 128 bit floating point since the 360/85 in
>about 1968. Except for divide it was standard on S/370 and later.
>Somewhere in ESA/390 DXR (extended precision divide) was added.

And it was a real crock. The minuscule exponent range meant that
it was virtually impossible to use without over- and underflow.


Regards,
Nick Maclaren.

Terje Mathisen

Feb 20, 2009, 3:52:04 AM
nm...@cam.ac.uk wrote:
> In article <1qadndXoC8I4JgDU...@giganews.com>,
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> As long as canonical conversion opcodes/operations are defined,
>> debuggers and compilers can use those without needing to care about the
>> internal format, right?
>
> And just how do you convert a file that has been imported from another
> system using those opcodes?

I would make it illegal to ever store the internal representation to any
kind of file, but I know that's somewhat naive. :-(

Terje Mathisen

Feb 20, 2009, 4:03:07 AM

The way I understand the US patent system (i.e. mostly shaking my head
in disbelief), you should not be able to patent something that can only
ever be done in one particular way. Is this correct?

Anyway, there are at least two more or less equivalent ways to do the
000-999 to 10-bit encoding; one of the IBM guys here challenged me to
figure it out a few years ago. I then came up with something which was
slightly different from the IBM way:

My setup, afair, was slightly easier to convert in SW, while the IBM
method is probably optimal for a HW converter.

There might be a few more ways to do it, but I believe they will turn
out to be mostly trivial modifications to one of the two above.

>> I only heard about BID a few days ago. I suppose it is easier
>> to process in a binary ALU, but normalization (pre and post) will
>> be much harder and slower. (Multiply and divide by powers of 10.)
>
> The real problems with BID is I/O (in ASCII) and rounding. In DPD
> rounding only requires decimal shifting, wherease in BID it requires
> multiplication (and maybe division (hazy here)).

Binary can be

>
>> I believe IBM is producing DPD processors. I only heard about BID
>> recently, and have no information that Intel is producing processors
>> supporting it. To me it looks like DPD makes efficient use of the
>> bits, and should be able to be implemented such that it runs about
>> as fast as binary, and maybe faster. (Fewer levels of logic for
>> a barrel shifter for pre/post normalization.)
>
> More importantly, one can set up the DFU pipeline to assume that no
> pre-alignment and no post-normalization are required and handle these
> as micro-faults.
>
> I also suspect that IBM is doing 128-bit DPD calculations, as this
> basically gets rid of overflow and undeflow for any reasonable
> business application--including computing the world GDP in the least
> valued currency in the world by adding up every single unit of value
> transacted that year. I would council anyone contemplating doing a DPD
> to just build the 128-bit DFU and have it done with. With 128-bit DFU
> and respectable performance, one could garner a considerable amount of
> physics calculations that are problematic in 64-bit IEEE 754.
>
> Mitch

Terje Mathisen

Feb 20, 2009, 4:20:30 AM
MitchAlsup wrote:
> On Feb 19, 3:05 pm, Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>> I only heard about BID a few days ago. I suppose it is easier
>> to process in a binary ALU, but normalization (pre and post) will
>> be much harder and slower. (Multiply and divide by powers of 10.)
>
> The real problems with BID is I/O (in ASCII) and rounding. In DPD
> rounding only requires decimal shifting, wherease in BID it requires
> multiplication (and maybe division (hazy here)).

(I hit send by accident on my previous msg. :-()

Maybe 10 years ago I discovered a very fast way to convert unsigned
binary numbers into decimal; AMD used to show it in their optimization
manuals (unfortunately without attribution :-().

You never need to use division for this conversion operation;
multiplication by suitably scaled reciprocal powers of 10 is sufficient.

For a 32-bit value my method needs 3 or 4 MULs; all the remaining
operations are fast shift/mask/add/sub, so the total time for 32 bits to
10 ASCII digits is about 25-50 cycles depending upon the speed of an
integer MUL.

In my conversion code the input value is first split into two base 1e5
numbers, both of them scaled by 2^28/1e4 (rounded up).

This means that the top decimal digit ends up in the top 4 bits, and can
be extracted with a shift right operation.

Afterwards I mask away the top 4 bits, multiply the result by 5 (i.e.
LEA on x86, shift + add on other architectures) and get the next decimal
digit in the top 5 bits.

When converting very wide binary decimal mantissas I would apply the
above process recursively, splitting the number into 2, 4 or 8 parts
before the final stages can happen in parallel on all the parts.

With 128-bit DFP (with ~112 bits of mantissa?) the speed limiter would
be the initial double-wide reciprocal multiplication.
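
A minimal C sketch of this style of reciprocal digit extraction, for a
single 5-digit group (my own constants, chosen to make the correctness
margin easy to argue: a 2^30 scaling rather than the 2^28 described
above):

#include <stdint.h>
#include <stdio.h>

/* Convert n in [0, 99999] to five ASCII digits with no division.
   107375 == ceil(2^30 / 1e4), so (n * 107375) >> 30 is the leading
   digit; the remaining digits fall out of the 30-bit fraction by
   repeated multiply-by-10 (done as *5 with the shift reduced by one,
   mirroring the trick described above). */
static void five_digits(uint32_t n, char out[6])
{
    uint64_t x = (uint64_t)n * 107375;
    out[0] = (char)('0' + (x >> 30));
    for (int i = 1; i < 5; i++) {
        x = (x & ((1ull << (31 - i)) - 1)) * 5;
        out[i] = (char)('0' + (x >> (30 - i)));
    }
    out[5] = '\0';
}

int main(void)
{
    char buf[6];
    uint32_t samples[] = { 0, 7, 99999, 10000, 54321 };
    for (int i = 0; i < 5; i++) {
        five_digits(samples[i], buf);
        printf("%u -> %s\n", (unsigned)samples[i], buf);
    }
    return 0;
}

Splitting a 32-bit value at 1e5 (itself one reciprocal multiply) and
running two such groups would presumably account for the 3 or 4 MULs
mentioned above.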

Terje

Bernd Paysan

Feb 20, 2009, 6:00:18 AM
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> The trick is that DPD can get 3 decimal digits into 10-binary bits
>> (that is less than 2.4% waste) and expand/compress back into storage
>> form in 2 gates of delay. It would not surprise me to find an IBM
>> patent on this scheme.)
>
> The way I understand the US patent system (i.e. mostly shaking my head
> in disbelief), you should not be able to patent something that can only
> ever be done in one particular way. Is this correct?
>
> Anyway, there are at least two more or less equivalent ways to do the
> 000-999 to 10 bits encoding, one of the IBM guys here challenged me to
> figure it out a few years ago. I then came up with something which was
> slightly different from the IBM way:
>
> My setup, afair, was slightly easier to convert in SW, while the IBM
> method is probably optimal for a HW converter.
>
> There might be a few more ways to do it, but I believe they will turn
> out to be mostly trivial modifications to one of the two above.

For software, the easiest way is to use a lookup table. There are an awful
lot of possible encodings that fit 1000 states into 10 bits (1024!/24!=
~8.73e2615), all equally simple to encode/decode with lookup tables, and
the one where you simply store the integers 0..999 in 10 bits even can be
converted algorithmically (and easily can be converted to binary FP). Now
if you want to find the easiest way to do hardware encoding, you probably
really have to throw all those possible lookup tables into a term
minimizer, and compare the results (you probably can find a heuristic like
alpha-beta min-max to prune the tree ;-). You might even pose restrictions
like "find the encoding with least amount of gates where the collating
sequence is preserved" (there are still ~2.17e48 possible encodings with
preserved collating sequence - maybe there are some which have a
sufficiently simple en- and decoding logic).

Even if you stay with the Chen-Ho/DPD idea of compressing by keeping 0-7 as
3 bits, and 8-9 as 1 bit, leaving space to tell the position, you still
have an awful lot of possible permutations, and you still can change the
constant values.

IMHO, the IBM method is a rather trivial remix of the Chen-Ho encoding, but
the claimed advantage is unreal - Chen-Ho is left-aligned, DPD right-
aligned; that's the main difference (so if you want a 7-bit Chen-Ho subset,
you drop the right digit, whereas in DPD you drop the left).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

MitchAlsup

Feb 20, 2009, 12:12:53 PM
On Feb 20, 2:25 am, n...@cam.ac.uk wrote:
> In article <8fd40e82-28fb-49d3-b006-6cf12d836...@o11g2000yql.googlegroups.com>,

>
> MitchAlsup  <MitchAl...@aol.com> wrote:
> >On Feb 19, 3:05 pm, Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> >> I only heard about BID a few days ago. I suppose it is easier
> >> to process in a binary ALU, but normalization (pre and post) will
> >> be much harder and slower. (Multiply and divide by powers of 10.)
>
> >The real problems with BID is I/O (in ASCII) and rounding. In DPD
> >rounding only requires decimal shifting, wherease in BID it requires
> >multiplication (and maybe division (hazy here)).
>
> Division, I think.  While I am no logic person, I should be surprised
> if division by a fixed, small number (10 in this case) took all that
> many gates or much time.

Consider DPD versus BID: generate a very small 64-bit DFP number
with lots of digits: 1.234567890123456E-256 (*)

Now multiply this number by 10 five hundred and twelve times.

Do you get exactly 1.234567890123456E+256 ?

In DPD the answer is actually yes. And this occurs because the
calculation actually takes place in decimal. And it does not matter
which digits make up the fraction.

In DPD one can start with the large number and divide it by 10 five
hundred and twelve times and end up with exactly the smaller number.
In DPD one can start with the large number and multiply it by 0.1 five
hundred and twelve times and end up with exactly the smaller number.

Mitch

(*) Any suitable fraction that uses every single bit of its
representation.
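
This is easy to try on a current toolchain. A throwaway sketch,
assuming GCC's built-in decimal floating types on x86-64 (a software
BID implementation, so it says nothing about any particular hardware):

#include <stdio.h>

/* Multiply a 16-digit decimal64 value by 10 five hundred and twelve
   times, then compare against the directly written large value.
   Because each multiplication is exact (only the exponent needs to
   move), an IEEE 754r-conforming implementation - DPD or BID - should
   report an exact match. */
int main(void)
{
    _Decimal64 x = 1.234567890123456E-256DD;
    for (int i = 0; i < 512; i++)
        x = x * 10.0DD;
    printf("exact: %s\n", x == 1.234567890123456E+256DD ? "yes" : "no");
    return 0;
}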

nm...@cam.ac.uk

Feb 20, 2009, 12:23:10 PM
In article <63c6d94e-5e26-413c...@j38g2000yqa.googlegroups.com>,

MitchAlsup <Mitch...@aol.com> wrote:
>
>Consider DPD versus BID: and generate a very small 64-bit DFP number
>with lots of digits:: 1.234567890123456E-256 (*)
>
>Now multiply this number by 10 five hundred and twelve times.
>
>Do you get exactly 1.234567890123456E+256 ?
>
>In DPD the answer is actually yes. And this occurs because the
>calculation actually takes place in decimal. And it does not mater
>which digits make up the fraction.
>
>(*) Any suitable fraction that uses every single bit of its
>representation.

I am pretty certain that the answer is yes in BID, too, because the
calculation takes place in scaled integers. If the answer ISN'T yes,
then the IEEE 754R people made a spectacular mess of things that I
am pretty certain they know a lot about. Also, when I looked at it,
I thought they had got that aspect right.

While I can be very rude about their design, with very good reason,
I don't think that there is anything to criticise about the consistency
of the arithmetic between DPD and BID. I could be wrong, of course.


Regards,
Nick Maclaren.

Terje Mathisen

Feb 20, 2009, 1:19:47 PM
Bernd Paysan wrote:
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> There might be a few more ways to do it, but I believe they will turn
>> out to be mostly trivial modifications to one of the two above.
>
> For software, the easiest way is to use a lookup table. There are an awful
> lot of possible encodings that fit 1000 states into 10 bits (1024!/24!=

The best software method is the one that needs the least storage space,
as long as the processing time is more or less identical, right?

> ~8.73e2615), all equally simple to encode/decode with lookup tables, and
> the one where you simply store the integers 0..999 in 10 bits even can be
> converted algorithmically (and easily can be converted to binary FP). Now
> if you want to find the easiest way to do hardware encoding, you probably
> really have to throw all those possible lookup tables into a term
> minimizer, and compare the results (you probably can find a heuristic like
> alpha-beta min-max to prune the tree ;-). You might even pose restrictions
> like "find the encoding with least amount of gates where the collating
> sequence is preserved" (there are still ~2.17e48 possible encodings with
> preserved collating sequence - maybe there are some which have a
> sufficiently simple en- and decoding logic).

Interesting...


>
> Even if you stay with the Chen-Ho/DPD idea of compressing by keeping 0-7 as
> 3 bits, and 8-9 as 1 bit, leaving space to tell the position, you still
> have an awful lot of possible permutations, and you still can change the
> constant values.

I just looked for encodings that kept as much of the logic as plain
pass-through as possible, and then minimized the size of the required
lookup tables.

It is trivial to halve the table size by making the parity bit
passthrough: This needs a 9-bit index into 512 12-bit BCD results, while
the opposite operation would need 2048 9-bit DPD values.

I don't remember now exactly how far down I managed to get those
numbers, but some more should be possible afair.

> IMHO, the IBM method is a rather trivial remix of the Chen-Ho encoding, but
> the claimed advantage is IMHO unreal - Chen-Ho is left aligned, DPD right
> aligned, that's the main difference (so if you want a 7 bit Chen-Ho subset,
> you drop the right digit, whereas in DPD, you drop the left).

Keeping the flag bit on the right end might make a tiny bit of
difference for a compact sw algorithm, if that bit can be used for
predication around a table lookup for the special cases.

Terje

Terje Mathisen

Feb 20, 2009, 1:43:25 PM
MitchAlsup wrote:
> Consider DPD versus BID: and generate a very small 64-bit DFP number
> with lots of digits:: 1.234567890123456E-256 (*)
>
> Now multiply this number by 10 five hundred and twelve times.
>
> Do you get exactly 1.234567890123456E+256 ?

You must, which means that a binary encoded representation _must_ keep
the mantissa as an integer and store the decimal exponent separately, right?


>
> In DPD the answer is actually yes. And this occurs because the
> calculation actually takes place in decimal. And it does not mater
> which digits make up the fraction.
>
> In DPD one can start with the large number and divide it by 10 five
> hundred and twelve times and end up with exactly the smaller number.

This should be trivial in both representations, since only the exponent
field is modified. (I really have to check how BID is stored!)
...
OK, I looked at one of the Intel papers, showing that their library is
about an order of magnitude faster than decNumber, the only well-known
sw implementation, and the one GCC can use.

decNumber is based on the BCD/DPD packing, so that factor of 10
represents the sw overhead vs simply working in scaled binary all the time.

> In DPD one can start with the large number and multiply it by 0.1 five
> hundred and twelve times and end up with exactly the smaller number.

Since 0.1 in a binary-significand decimal format is simply 1E-1
(coefficient 1, exponent -1), that multiplication becomes an exponent
decrement.

Wilco Dijkstra

Feb 20, 2009, 5:33:52 PM

<nm...@cam.ac.uk> wrote in message news:gnmotu$p5b$1...@soup.linux.pwf.cam.ac.uk...

> In article <63c6d94e-5e26-413c...@j38g2000yqa.googlegroups.com>,
> MitchAlsup <Mitch...@aol.com> wrote:
>>
>>Consider DPD versus BID: and generate a very small 64-bit DFP number
>>with lots of digits:: 1.234567890123456E-256 (*)
>>
>>Now multiply this number by 10 five hundred and twelve times.
>>
>>Do you get exactly 1.234567890123456E+256 ?
>>
>>In DPD the answer is actually yes. And this occurs because the
>>calculation actually takes place in decimal. And it does not mater
>>which digits make up the fraction.

Same for BID - unlike the current binary format.

>>(*) Any suitable fraction that uses every single bit of its
>>representation.
>
> I am pretty certain that the answer is yes in BID, too, because the
> calculation takes place in scaled integers. If the answer ISN'T yes,
> then the IEEE 754R people made a spectacular mess of things that I
> am pretty certain they know a lot about. Also, when I looked at it,
> I thought they had got that aspect right.
>
> While I can be very rude about their design, with very good reason,
> I don't think that there is anything to criticise about the consistency
> of the arithmetic between DPD and BID. I could be wrong, of course.

Indeed, the key is that both use a decimal exponent. Whether the mantissa
is decimal or binary doesn't actually matter; DPD and BID are just different
encodings of the same value. I'm sure it is possible to convert between the
formats without any loss.

Wilco


Bernd Paysan

Feb 21, 2009, 4:33:54 PM
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> For software, the easiest way is to use a lookup table. There are an
>> awful lot of possible encodings that fit 1000 states into 10 bits
>> (1024!/24!=
>
> The best software method is the one that needs the least storage space,
> as long as the processing time is more or less identical, right?

Yes. My go at code-space efficiency for this would be a purely algorithmic
one:

>> and
>> the one where you simply store the integers 0..999 in 10 bits even can be
>> converted algorithmically

You first multiply by a constant (16 bits are sufficient, the constant is 41
or 0x29, which means two leas) to get the first BCD digit in the top 4
bits, and then you do two masks+leas to multiply by 5 to extract the
remaining two digits; this should be a lot simpler than DPD from a software
point of view (see your own posting on fast conversion of integer to
decimal ;-). It has the advantage that you also can use binary string
compares for sorting.
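
A small C sketch of that recipe (my own code: plain C rather than the
two-LEA assembly, but the same constant 41 and the same multiply-by-5
steps):

#include <stdio.h>

/* Split n in [0, 999] into three decimal digits without division.
   41/4096 is just above 1/100, so (n * 41) >> 12 is the hundreds
   digit for every n below 1000; the remaining digits come from the
   12-bit fraction by multiplying by 5 and halving the shift. */
static void to_digits(unsigned n, unsigned d[3])
{
    unsigned x = n * 41;            /* fits in 16 bits for n <= 999 */
    d[0] = x >> 12;                 /* hundreds */
    x = (x & 0xFFF) * 5;
    d[1] = x >> 11;                 /* tens */
    x = (x & 0x7FF) * 5;
    d[2] = x >> 10;                 /* units */
}

int main(void)
{
    unsigned d[3], fails = 0;
    for (unsigned n = 0; n < 1000; n++) {
        to_digits(n, d);
        if (d[0] * 100 + d[1] * 10 + d[2] != n)
            fails++;
    }
    printf("%u failures\n", fails);   /* expect 0 */
    return 0;
}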

Converting BCD back to the 0..999 format is even simpler, because
multiplying by 10 is very cheap (lea+shift/add, and since you have to add
anyway, just use another lea for that). You can always store integers from
0..99 in a 7 bit block, so you get all the advantages of DPD ;-).

I would call this format BCM, binary coded millesimal.

Terje Mathisen

Feb 21, 2009, 4:52:30 PM
Bernd Paysan wrote:
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>>> For software, the easiest way is to use a lookup table. There are an
>>> awful lot of possible encodings that fit 1000 states into 10 bits
>>> (1024!/24!=
>> The best software method is the one that needs the least storage space,
>> as long as the processing time is more or less identical, right?
>
> Yes. My go on code space efficiency for this would be a purely algorithmic
> one:
>
>>> and
>>> the one where you simply store the integers 0..999 in 10 bits even can be
>>> converted algorithmically
>
> You first multiply by a constant (16 bits are sufficient, the constant is 41
> or 0x29, which means two leas) to get the first BCD digit in the top 4

I like 41; I use it instead of div 100 to get the century in my date code:

41/4096 is close enough to 1/100 (rounded up) to give exactly the right
result for all years in a 400-year cycle.
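
A throwaway check of that claim (my own snippet):

#include <stdio.h>

/* Exhaustively confirm that (y * 41) >> 12 equals y / 100 for every
   year offset y in a 400-year Gregorian cycle. */
int main(void)
{
    int bad = 0;
    for (int y = 0; y < 400; y++)
        if ((y * 41) >> 12 != y / 100)
            bad++;
    printf("%d mismatches\n", bad);   /* expect 0 */
    return 0;
}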

> bits, and then you do two masks+leas to multiply by 5 to extract the
> remaining two digits; this should be a lot simpler than DPD from a software
> point of view (see your own posting on fast conversion of integer to
> decimal ;-). It has the advantage that you also can use binary string
> compares for sorting.

What's important here is that this style of code can work in a wide SIMD
register, while lookup tables really don't scale.

Storing each 10-bit field inside a 16-bit sub-register would allow these
sorts of tricks while working on 8 groups (24 decimal digits) in parallel.

Graphics-optimized vector units, as used in GPUs and in Larrabee, have
special hw to unpack common graphics texture formats, possibly including
2:10:10:10, which could then be (ab)used for DPD?

> Converting BCD back to the 0..999 format is even simpler, because
> multiplying by 10 is very cheap (lea+shift/add, and since you have to add
> anyway, just use another lea for that). You can always store integers from
> 0..99 in a 7 bit block, so you get all the advantages of DPD ;-).
>
> I would call this format BCM, binary coded millelimal.

This is my preferred internal representation for a sw DPD library.

Bernd Paysan

Feb 21, 2009, 4:55:26 PM
Bernd Paysan wrote:
> I would call this format BCM, binary coded millesimal.

I forgot to mention, but it should be obvious: you can use BCM directly
(i.e. by applying all the usual tricks for operating on BCD with digital
logic), instead of converting to BCD before operating on it.

ha...@watson.ibm.com

Feb 21, 2009, 10:08:11 PM
On Feb 19, 4:20 am, Jan Vorbrüggen <Jan.Vorbrueg...@not-thomson.net>
wrote:
[on BID vs DPD decimal FP encodings]
> as compatible as possible - are they? (e.g., regarding range and
> precision)

They are. For each precision, the two encodings represent exactly the
same finite set of representations (which is a stronger statement than
the same set of values). Arithmetic and exceptions are also the same,
so the only way to tell them apart is by looking at the raw bits (e.g.
via type overlay). The standard also requires the availability of
conversion functions between the native encoding and the other one.

There is an interesting difference in non-canonical encodings. There
are more encodings than representations, though each representation
has exactly one canonical encoding (and operations return canonical
encodings for canonical operands; in fact, arithmetic operations always
return canonical results). Each bit pattern (canonical or not) is
valid, i.e. maps to a valid representation. For BID, the value of
non-canonical finite numbers is zero, but for DPD it is not (a
non-canonical declet -- group of 10 bits of the significand --
corresponds to a triple of decimal digits made up of 8's and 9's).

As for why we ended up with two encodings: IBM already had hardware
that implemented DPD (announced and shipped near the end of 754r
development). Intel had published some interesting papers on BID.
(I am not speaking for either company, of course. I was a member of
the P754 committee for the last three years of its long existence.)

Michel.

Terje Mathisen

Feb 22, 2009, 6:52:28 AM
ha...@watson.ibm.com wrote:
> On Feb 19, 4:20 am, Jan Vorbrüggen <Jan.Vorbrueg...@not-thomson.net>
> wrote: [on BID vs DPD decimal FP encodings]
>> as compatible as possible - are they? (e.g., regarding range and
>> precision)
>
> They are. For each precision, the two encodings represent exactly
> the same finite set of representations (which is a stronger statement
> than the same set of values).

Thanks, this is the way I expected it had to be done.

> Arithmetic and exceptions are also the same, so the only way to tell
> them apart is by looking at the raw bits (e.g. via type overlay).
> The standard also requires the availability of conversion functions
> between the native encoding and the other one.

<BG> Nick, did you see that?

> As for why we ended up with two encodings: IBM already had hardware
> that implemented DPD (announced and shipped near the end of 754r
> development). Intel had published some interesting papers on BID. (I

I have advocated using pure binary mantissas for decimal math here on
c.arch for many years now. :-)

Each time one of the Decimal evangelists tries to come up with a benchmark
to "prove" that some form of BCD-style encoding is both needed and the
fastest way to solve a particular problem, I've been able to show that a
BID-style approach can be at least as fast, and much faster when you
don't have hardware 754r/DPD support.

nm...@cam.ac.uk

Feb 22, 2009, 8:04:14 AM
In article <6N6dnW9ysvPgoTzU...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>ha...@watson.ibm.com wrote:
>
>> Arithmetic and exceptions are also the same, so the only way to tell
>> them apart is by looking at the raw bits (e.g. via type overlay).
>> The standard also requires the availability of conversion functions
>> between the native encoding and the other one.
>
><BG> Nick, did you see that?

I thought that I had posted an equivalent statement! Maybe I managed
to make it totally confusing ....

>Each time one of the Decimal evangelists try to come up with a benchmark
>to "prove" that some form of BCD-style encoding is both needed and the
>fastest way to solve a particular problem, I've been able to show that a
>BID-style approach can be at least as fast, and much faster when you
>don't have hardware 754r/DPD support.

As I have every time one of them claimed that decimal floating-point
will help with the emulation of decimal fixed-point :-(

This whole thing is a completely ridiculous idea, based on a shallow
analysis of the requirements. Sorry (Mike and others), but that is so.
The upside is that it isn't anywhere near as harmful as was made out in
the 1960s and 1970s - I already teach the few extra techniques needed
to cope with it.

What I regret is that the opportunity was not taken to fix the serious
defects of IEEE 754 at the same time, which will continue to block
any attempts to reintroduce numerical software engineering for another
decade :-(


Regards,
Nick Maclaren.

Glen Herrmannsfeldt

Feb 24, 2009, 3:57:48 PM
Jeremy Linton wrote:

> Holy sh*t, it looks like there are _TWO_ standard "incompatible"
> formats in IEEE 754r. Who thought that would be a good idea? Apparently
> IBM pushed a hardware centric Densely Packed Decimal (DPD), and Intel

> pushed a software centric Binary Integer Decimal (BID). One is just a
> more tightly packed BCD type format, while the other is a binary integer
> based system.

A little late now, but I was wondering why this post didn't go
into comp.arch.arithmetic?

-- glen
