64 bit code

bob

Oct 10, 2012, 11:22:56 AM

Was the move to 64 bit code primarily to break the 4 gig memory barrier?

Or were there other equally compelling incentives?

Jongware

Oct 10, 2012, 11:35:51 AM

On 10-Oct-12 17:22 PM, bob wrote:
> Was the move to 64 bit code primarily to break the 4 gig memory barrier?
>
> Or were there other equally compelling incentives?

Memory is typically served by a separate bus. There used to be 16/32 bit
CPUs that internally worked with 16 bit words and externally with 32
bits (or possibly the other way around).

One compelling reason could be that Bigger = Better. A native 64 bit
processor can handle integers up to 2^64 ~ 10^19, which admittedly is
not a big advantage (I don't think an integer of such size is useful for
arithmetic), but they also can process 64 bit floating point numbers in
one stride. As far as floating point goes, bigger (more precision!) is
most definitely better.

[Jw]

BartC

Oct 10, 2012, 12:37:27 PM

"Jongware" <jong...@no-spam.plz> wrote in message
news:507595db$0$6898$e4fe...@news2.news.xs4all.nl...

Floating point processors could deal with 64-bit numbers even with 32-bit
processors (in fact even with 16-bit ones).

And 64-bit data/instruction busses on a processor didn't really need it to
be 64-bits either.

And the downside of 64-bit processors is *having* to use 64-bits for things
for which 32-bits was perfectly adequate.

They can address over 4GB in one virtual address space, sure, but how many
programs actually need to do that?

--
Bartc

Robert Wessel

Oct 10, 2012, 1:54:29 PM

OSes do, and the OS guys really hate the more complicated schemes for
addressing more than the native amount of memory. In addition, some
important applications do. Databases, for example, many HPC
workloads...

But to the OP, yes, the 64 bit transition was mainly driven by memory.
Once you had 64 bit addressing, you needed 64 bit registers, then 64
bit operations on those registers, etc. Not that those are bad things
(in some cases they're quite useful), but it's much easier to simulate
longer operations than bigger addressing.
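
To illustrate why longer operations are easy to simulate, here is a minimal sketch of a 64-bit add built from 32-bit operations only. The `U64Pair` type and `add64` function are names made up for this example; a 32-bit compiler would emit the equivalent add / add-with-carry pair.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, as a 32-bit CPU would keep it.
struct U64Pair {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Add two 64-bit values using only 32-bit operations.
U64Pair add64(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;                             // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // a wrapped sum means carry out
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

There is no comparably cheap trick for addresses: a pointer wider than a register changes every load and store, not just the arithmetic.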

Note that some CPUs brought other useful enhancements with 64 bit
mode. For example, x86 doubled the number of GPRs, but that really is
orthogonal to the addressing issue.

Fritz Wuehler

Oct 10, 2012, 3:37:30 PM

The main driver for 4GB+ address spaces has been databases and even some
filesystems. Other than that, there aren't so many applications that need it
aside from number crunching (which is still possible, just slower) and
probably multimedia stuff like transcoding.

Rui Maciel

Oct 10, 2012, 3:38:05 PM

Jongware wrote:

> One compelling reason could be that Bigger = Better. A native 64 bit
> processor can handle integers up to 2^64 ~ 10^19, which admittedly is
> not a big advantage (I don't think an integer of such size is useful for
> arithmetic), but they also can process 64 bit floating point numbers in
> one stride. As far as floating point goes, bigger (more precision!) is
> most definitely better.

Not quite. There is a significant tradeoff between the computational
cost of an algorithm and the precision of the floating point data type
that is used. In a significant number of number crunching applications,
the double precision type brings an insignificant improvement in
accuracy. Being forced to waste more time crunching some numbers to end
up with practically the exact same result is something which pretty much
everyone wants to avoid.

So no, greater precision is not always better, and can actually be worse.

Rui Maciel

Rui Maciel

Oct 10, 2012, 3:43:09 PM

BartC wrote:

> Floating point processors could deal with 64-bit numbers even with 32-bit
> processors (in fact even with 16-bit ones).

And let's not forget about the ever elusive 80-bit extended precision
floating point format, also known as "why bother".

Rui Maciel

Rui Maciel

Oct 10, 2012, 4:04:56 PM

BartC wrote:

> And the downside of 64-bit processors is *having* to use 64-bits for
> things for which 32-bits was perfectly adequate.

Is that really a downside if we can have it all essentially for free?

> They can address over 4GB in one virtual address space, sure, but how
> many programs actually need to do that?

If a boundary ceases to be there then people start to go through it. For
example, I've been developing a number crunching application that
frequently needs to address more than 4GB of memory. When I designed it
I paid no attention to any memory addressing limit because there simply
isn't one. If it was there then I would be forced to jump through a
number of hoops just to go around that limit, and that wouldn't be fun.

It must also be pointed out that the x86-64 ISA brought more to the table
than a larger address space. For example, it has twice the number of
general purpose registers, and they are all 64-bit.

Rui Maciel

Ben Pfaff

Oct 10, 2012, 4:17:18 PM

"BartC" <b...@freeuk.com> writes:

> And the downside of 64-bit processors is *having* to use 64-bits for
> things for which 32-bits was perfectly adequate.
>
> They can address over 4GB in one virtual address space, sure, but how
> many programs actually need to do that?

This is why Linux on x86 now has the "x32" ABI that runs in
64-bit mode but only uses 32 bits of address space.

BartC

Oct 10, 2012, 4:52:29 PM

"Rui Maciel" <rui.m...@gmail.com> wrote in message
news:k54kd4$g1o$1...@speranza.aioe.org...

> BartC wrote:
>
>> And the downside of 64-bit processors is *having* to use 64-bits for
>> things for which 32-bits was perfectly adequate.
>
> Is that really a downside if we can have it all essentially for free?

But it's not free. You need double the memory bandwidth for a start; you've
got twice the throughput, but if your data has to be twice as wide, then
you're back where you started! The stack, for example, might insist on
64-bit values only, ensuring your data is spread out and requiring more
accesses.

>> They can address over 4GB in one virtual address space, sure, but how
>> many programs actually need to do that?
>
> If a boundary ceases to be there then people start to go through it. For
> example, I've been developing a number crunching application that
> frequently needs to address more than 4GB of memory. When I designed it
> I paid no attention to any memory addressing limit because simply there
> isn't one. If it was there then I would be forced to jump through a
> number of hoops just to go around that limit, and that wouldn't be fun.

But it might be everyone else who has to jump through hoops, if there isn't
an easy way of using 32-bit pointers in 64-bit code. And *their* application
might not involve number-crunching, where the overheads of the calculations
would hide any manipulations (via an indirect register for example) needed
to get at the data.
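
One common way to get "32-bit pointers" in 64-bit code, sketched here as an illustration rather than as anything proposed in this thread, is to store 32-bit indices into a pool instead of real pointers. `Node`, `kNil` and `sum_list` are hypothetical names for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node type: the link is a 32-bit index into a pool, not a pointer.
struct Node {
    std::int32_t value;
    std::uint32_t next;  // index of the next node, or kNil at the end
};

constexpr std::uint32_t kNil = 0xFFFFFFFFu;

// Sum a list stored in `pool`, starting at index `head`.
std::int64_t sum_list(const std::vector<Node>& pool, std::uint32_t head) {
    std::int64_t total = 0;
    for (std::uint32_t i = head; i != kNil; i = pool[i].next) {
        total += pool[i].value;  // each access is base + index, not a direct pointer
    }
    return total;
}

static_assert(sizeof(Node) == 8, "two 32-bit fields, no 64-bit pointer");
```

The node stays at 8 bytes instead of growing to 16 with a 64-bit pointer, at the price of the extra base-plus-index step on every access.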

> It must also be pointed out that the x86-64 ISA brought more to the table
> than a larger address space. For example, it has twice the number of
> general purpose registers, and they are all 64-bit.

When the x86-32 (386) came out, many of the new features could be
immediately accessed even from 16-bit code, eg. using a d32 prefix to make
use of 32-bit registers and operations. (Using 32-bit address modes was
harder without OS support.)

With the x86-64, it seems to be all or nothing: either run everything as
64-bits, or run in compatible 32-bit mode, where you don't have access to
the extra, wider registers, nor to 64-bit arithmetic.

--
Bartc

Ian Collins

Oct 10, 2012, 5:50:11 PM

On 10/11/12 09:52, BartC wrote:
>
> With the x86-64, it seems to be all or nothing: either run everything as
> 64-bits, or run in compatible 32-bit mode, where you don't have access to
> the extra, wider registers, nor to 64-bit arithmetic.

Which equates to a net loss of nothing and net gain of the 64 bit mode!

Now you have the ability to build and test 32 and 64 bit versions of
your code to see which performs best.

--
Ian Collins

Robin Vowels

Oct 10, 2012, 6:22:38 PM

64-bit floating-point numbers have been available from early PC days
(without 64-bit processor). If you mean FPNs stored in 64-bit form,
they have been around since the 1960s.
FPNs with 64-bit mantissa are part of the PC's FPU, whereby each
FPN requires 80 bits.
None of those required 64-bit processor.

BGB

Oct 10, 2012, 10:46:02 PM

many games are pushing towards or going beyond the 4GB limit.

in the not-too-distant future, many games may require a 64-bit OS to
work, especially if higher-density voxel based worlds start getting popular.

say a person makes a Minecraft style game, but chooses 0.25 meters as
the block size, and suddenly the terrain takes 8x as much RAM (a
moderate sized chunk of world needing, say, 8 or 16GB, to really be
played effectively).

as-is, this is pretty much already the case for people running
multiplayer Minecraft servers, where they often need around 32GB or 64GB
of RAM for things to really run effectively.

BGB

Oct 10, 2012, 10:54:34 PM

it is worth noting that there is also the x32 ABI, which is basically a
32-bit sub-mode of x86-64 (mostly off in Linux land, it is based on the
SysV/AMD64 ABI).

in this sub-mode, basically the extended registers and arithmetic are
still available, but there is an ABI-level convention to only use 32-bit
pointers and stay within the low 4GB of memory.

BartC

Oct 11, 2012, 8:11:42 AM

"BGB" <cr8...@hotmail.com> wrote in message
news:k55cjk$joa$1...@news.albasani.net...

Is this mode supported by the hardware, or does it depend on software
zero-extending pointers as needed to the 64-bits that are expected?

I suspect the latter, from what I have been able to find out.

--
Bartc

BGB

Oct 11, 2012, 12:17:29 PM

it is implemented in software, but it does make use of a little quirk
that exists in x86-64, namely that whenever a 32-bit operation is
performed, the register is automatically zero-extended. this allows the
code to essentially largely ignore the upper 32-bits, which are then
basically automatically cleared by any arithmetic ops.

there are a few cases where things may need to be handled specially, as,
since the CPU doesn't actually "know" that the space is supposed to
wraparound after 4GB, it is possible for code to "accidentally" go
outside the 4GB window in certain cases.

this can generally be handled by using an extra step using a register
(we load the target address in a register and then use this to access
memory, rather than doing it directly).
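
A rough model of the wraparound issue in plain C++ (this is not ABI code, and `addr32` / `addr64` are names invented for the sketch): doing the address arithmetic in a 32-bit register wraps at 4GB, while letting the hardware combine the same base and offset as a 64-bit effective address does not.

```cpp
#include <cstdint>

// Address arithmetic done in a 32-bit register wraps modulo 2^32;
// the zero-extended result stays inside the low 4GB.
std::uint32_t addr32(std::uint32_t base, std::uint32_t offset) {
    return base + offset;
}

// The same base and offset combined as a 64-bit effective address do not
// wrap, so the access can land just above the 4GB window.
std::uint64_t addr64(std::uint32_t base, std::uint32_t offset) {
    return static_cast<std::uint64_t>(base) + offset;
}
```

With a base of `0xFFFFFFF0` and an offset of `0x20`, the first form gives `0x10` and the second `0x100000010`, which is the "accidental" escape described above.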

or such...

Fritz Wuehler

Oct 11, 2012, 1:36:14 PM

But I heard this doesn't really work

Robert Miles

Oct 12, 2012, 3:13:03 AM

On 10/10/2012 10:22 AM, bob wrote:
> Was the move to 64 bit code primarily to break the 4 gig memory barrier?
>
> Or were there other equally compelling incentives?

Some of the server versions of Windows used a different method to get past the
4 gig memory barrier - using two registers to hold a complete memory address,
not just one.

Some of the early 8-bit processors used a similar method to get past the 256
byte barrier, and make the new barrier 64 k for two registers, or higher
for three.

Chris Uppal

Oct 12, 2012, 3:40:59 AM

Rui Maciel wrote:

> And let's not forget about the ever elusive 80-bit extended precision
> floating point format, also known as "why bother".

For one answer, people might like to read:

http://www.cs.berkeley.edu/~wkahan/Stnfrd50.pdf

linked to from Kahan's page:

http://www.cs.berkeley.edu/~wkahan/

-- chris

Ben Pfaff

Oct 12, 2012, 11:13:22 AM

Fritz Wuehler <fr...@spamexpire-201210.rodent.frell.theremailer.net>
writes:

Cite? I don't know of a reason why it shouldn't work.

(But the x32 ABI is very new, so it's possible that some wrinkles
are not yet ironed out.)

BGB

Oct 12, 2012, 1:36:07 PM

the upside of 80-bit floats: higher precision than 64-bit doubles;
the downside of 80-bit floats: slightly bigger than 64-bit doubles, and
they are an inconvenient size.

but, I remember vaguely recently running into an issue:
32-bit floats are pretty much the standard in gaming.

so, originally, I used floats, and all was well (actually, many places
were sub-float, as I had a "float28" which basically shaved the low 4
bits off, mostly to allow essentially twiddling a float into a 32-bit
pointer, or a so-called "flonum").

but, then there was a problem:
much more than about 1km from the origin (in a game from a first-person
POV), there was obvious/noticeable jitter.

there was some rendering jitter (twitching geometry), and also the
camera movement would jitter/shake, and if a weapon was fired the
projectile itself would jitter and shake around as it moved, ...

so, there was a problem then: floats were not really sufficient.

second problem:
doubles don't actually work in the graphics hardware (they all assume
"floats are plenty good enough" here as well).

so, partial solution:
many intermediate (server-end) calculations (and storage), are done
using doubles (mostly things involving the object origin);
the client-side scene-graph also uses doubles.

however, there is a "reference point" which is now subtracted out of
origins when sending them between the client and server (the main place
where the 28-bit truncation occurs, but sending full doubles through
here would be much more costly), and also the client-side rendering is
performed relative to this reference-point (it is subtracted out when
things are sent to the renderer).

note that only about a float-28 worth of accuracy is sent, but since it
is relative to a point near the camera, they are more "the bits that count".
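
The reference-point idea can be sketched roughly as below. This is an illustration only, not the engine's actual code; `naive_offset` and `relative_offset` are hypothetical names, and a single coordinate stands in for a full origin vector.

```cpp
// Convert each coordinate to float first, then subtract: the low bits are
// already lost before the subtraction happens.
float naive_offset(double world, double reference) {
    return static_cast<float>(world) - static_cast<float>(reference);
}

// Subtract in double first, then narrow: the float only has to represent
// the small camera-relative distance.
float relative_offset(double world, double reference) {
    return static_cast<float>(world - reference);
}
```

For a point at 1048576.03125 with the reference at 1048576.0, the first version returns 0 (floats near 2^20 are spaced 0.125 apart), while the second returns 0.03125 exactly.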

suddenly, now, the scene is no longer limited to around 1km or so (or,
at least, not by jitter), however the second problem is that scenes are
voxel-based, similar to Minecraft, and going too much over 1km^2
requires large amounts of RAM. a partial way to combat this issue
essentially involves RLE-compressing the voxel-chunks in RAM, and
decompressing the chunks as-needed, though, there is still a limit as
there is currently still no mechanism to "unload" chunks or regions
which are out of range.
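
For readers unfamiliar with the RLE trick, a minimal sketch might look like this. It assumes one byte per voxel purely for brevity, and `Run`, `rle_encode` and `rle_decode` are illustrative names, not the real implementation.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// (run length, voxel value) pairs; the voxel type is a stand-in byte here.
using Run = std::pair<std::uint32_t, std::uint8_t>;

std::vector<Run> rle_encode(const std::vector<std::uint8_t>& voxels) {
    std::vector<Run> runs;
    for (std::uint8_t v : voxels) {
        if (!runs.empty() && runs.back().second == v) {
            ++runs.back().first;         // extend the current run
        } else {
            runs.push_back(Run(1u, v));  // start a new run
        }
    }
    return runs;
}

std::vector<std::uint8_t> rle_decode(const std::vector<Run>& runs) {
    std::vector<std::uint8_t> voxels;
    for (const Run& r : runs) {
        voxels.insert(voxels.end(), r.first, r.second);  // append r.first copies
    }
    return voxels;
}
```

Chunks that are mostly air or mostly stone collapse to a handful of runs, which is where the memory saving comes from; noisy chunks can end up larger than the raw data.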

or such...

Rui Maciel

Oct 12, 2012, 3:57:36 PM

Chris Uppal wrote:

> Rui Maciel wrote:
>
>> And let's not forget about the ever elusive 80-bit extended precision
>> floating point format, also known as "why bother".
>
> For one answer, people might like to read:
>
> http://www.cs.berkeley.edu/~wkahan/Stnfrd50.pdf

It appears that this text omits some important details which weigh
significantly on this issue. For example, in the text's part 2, where the
author refers to discretized elliptic boundary-value problems, it is said:

«(...) For many reasons not necessarily spawned by roundoff, the solution u
has to be computed by an iteration, and that always entails the computation
of a residual

r := b – (A + Diag(q))·u

The final accuracy of the computed u is limited by the accuracy with which
the residual r can be computed. The accuracy of u , which is typically a
potential, has to be sufficient to support differencing to estimate the
Gradient Grad U(x), a field strength, without too much loss of accuracy to
cancellation.»

What must be taken under consideration in this regard is that the
coefficients of A, q and b tend to assume the form of integral expressions,
whose values are generally obtained by employing quadrature and cubature
rules.

The problem with this approach is that, in general, these
quadrature/cubature rules were developed so that their result is only exact
if they are used to integrate functions of a specific type. For example, a
popular choice, the set of Gauss-Legendre quadrature rules, was devised to
return the exact value of an integral if and only if the integrand function
is a polynomial up to a certain degree (2n-1 for an n-point rule). When these
quadrature rules are
employed to integrate functions other than the one they actually can
integrate, which is what happens in practice, they only return approximate
results.
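
A small sketch of that behaviour, using the two-point Gauss-Legendre rule on [-1, 1]. The rule is deliberately low-order so the error is easy to see; `gauss_legendre_2`, `square` and `fourth` are names made up for the example.

```cpp
#include <cmath>

// Two-point Gauss-Legendre rule on [-1, 1]: nodes at +/- 1/sqrt(3), weights 1.
// It is exact for polynomials up to degree 3 and approximate otherwise.
template <typename F>
double gauss_legendre_2(F f) {
    const double x = 1.0 / std::sqrt(3.0);
    return f(-x) + f(x);
}

double square(double x) { return x * x; }          // exact integral: 2/3
double fourth(double x) { return x * x * x * x; }  // exact integral: 2/5
```

Integrating `square` reproduces 2/3 up to rounding, whereas integrating `fourth` gives 2/9 instead of 2/5. That gap is an error of the method, and no floating point format will remove it.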

In other words, in this example, Matrix A and vectors q and b generally are
themselves the result of an approximate calculation, and their error isn't
small. To put things in perspective, it's not uncommon for these errors to
be in the 1%-0.1% range. In some commercial number crunching applications
that implement this class of methods, the error even goes up to 3%-4% in
some cases.

Meanwhile, discussing whether floats or doubles should be used is equivalent
to discussing the significance of an error in the range of 0.0001%-0.00001%.

Hence, why bother?

To add insult to injury, discretized elliptic boundary-value problems only
provide approximate solutions, unless we are dealing with a very specific
problem. This means that if somehow it was possible to run all floating
point operations without introducing any rounding error or any precision
degradation, the end result would always be an approximate solution, not the
exact one.

So, although it might be nice to have heaps of precision at our disposal and
a profound knowledge on how to leverage the existing tools to minimize this
type of errors, spending time on this type of optimization, considering the
returns that can be had from that investment, represents a complete waste of
time.

Rui Maciel

BartC

Oct 12, 2012, 4:30:49 PM

"Rui Maciel" <rui.m...@gmail.com> wrote in message

news:k59snb$oqm$1...@speranza.aioe.org...

> Chris Uppal wrote:
>
>> Rui Maciel wrote:
>>
>>> And let's not forget about the ever elusive 80-bit extended precision
>>> floating point format, also known as "why bother".
>>
>> For one answer, people might like to read:
>>
>> http://www.cs.berkeley.edu/~wkahan/Stnfrd50.pdf

> In other words, in this example, Matrix A and vectors q and b generally are
> themselves the result of an approximate calculation, and their error isn't
> small. To put things in perspective, it's not uncommon for these errors to
> be in the 1%-0.1% range. In some commercial number crunching applications
> that implement this class of methods, the error even goes up to 3%-4% in
> some cases.
>
> Meanwhile, discussing whether floats or doubles should be used is equivalent
> to discussing the significance of an error in the range of 0.0001%-0.00001%.
>
> Hence, why bother?

I didn't understand any of your argument. Using more precision is always
going to help some types of calculations. The difference between float and
double is a matter of 29 or so bits, about 500 million times more precision,
not the 10 to 100 million your figures suggest.
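
The "29 or so bits" figure can be checked from the significand widths the compiler reports. A minimal sketch, assuming IEEE 754 `float` and `double` (true on mainstream hardware, but not guaranteed by the language); the constant names are invented here.

```cpp
#include <limits>

// Significand widths in bits, including the implicit leading bit.
constexpr int float_bits  = std::numeric_limits<float>::digits;   // 24 for IEEE 754 binary32
constexpr int double_bits = std::numeric_limits<double>::digits;  // 53 for IEEE 754 binary64

// Ratio of the two resolutions: 2^(53 - 24) = 2^29 = 536870912.
constexpr long long precision_ratio = 1LL << (double_bits - float_bits);
```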

And the real reason for 80 bits, is probably that the mantissa is exactly 64
bits; that makes it attractive for the same sorts of reasons that 64-bit
processors are more popular than 52-bit ones. An 80-bit floating-point value
can also represent a 64-bit integer exactly (signed *or* unsigned I
believe). It's worth bothering with!

--
Bartc

BGB

Oct 12, 2012, 5:33:18 PM

but it is an inconvenient size in that it requires 80 bits to store,
which is not much more than 64 bits, but kills power-of-2 alignment,
whereas a 64-bit double is still aligned by a power-of-2.

the next power-of-2 size is 128 bits, but using a 128-bit space to store
an 80-bit value isn't very good, as nearly 1/2 of it is wasted (this is
often what ends up in compilers though).

a float128 value is better, but then the drawback is a lack of hardware
support, making it slow.

so, the cost/benefit tradeoffs point to double, which can exactly
represent integers up to 53 bits, still goes pretty fast, and has a
convenient storage size.
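
A quick way to see where a double stops representing integers exactly, assuming an IEEE 754 `double`. The helper `survives_double` is a name made up for this sketch.

```cpp
#include <cstdint>

// True if converting n to double and back returns n unchanged.
// Only meaningful for n well below 2^64: values that round up to 2^64
// cannot be converted back to a 64-bit integer.
bool survives_double(std::uint64_t n) {
    return static_cast<std::uint64_t>(static_cast<double>(n)) == n;
}
```

Every integer up to 2^53 survives the round trip; 2^53 + 1 is the first one that does not.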

or such...

BartC

Oct 12, 2012, 7:44:35 PM

"BGB" <cr8...@hotmail.com> wrote in message

news:k5a2hk$dve$1...@news.albasani.net...

> On 10/12/2012 3:30 PM, BartC wrote:
>> "Rui Maciel" <rui.m...@gmail.com> wrote in message

>>> Hence, why bother?

>> It's worth bothering with!
>>
>
> but it is an inconvenient size in that it requires 80 bits to store, which
> is not much more than 64 bits, but kills power-of-2 alignment, whereas a
> 64-bit double is still aligned by a power-of-2.

80-bits might have been intended for intermediate results only, hence no
need to store them in that form in memory, although it was possible to do
so. And when they were stored, it was more likely for individual results,
rather than an array of such values.

> the next power-of-2 size is 128 bits, but using a 128-bit space to store
> an 80-bit value isn't very good, as nearly 1/2 of it is wasted (this is
> often what ends up in compilers though).

But, this thread also discussed shorter pointers (such as 40-bit ones,
enough to address approx 1TB of RAM) needing 64 bits to store them in. That
40/64 ratio is the same as 80/128! While the spare 48 bits between 80-bit
floats could be used to store useful data in a way not possible with the
pointers.

--
Bartc

Ben Pfaff

Oct 12, 2012, 11:07:11 PM

BGB <cr8...@hotmail.com> writes:

> On 10/12/2012 3:30 PM, BartC wrote:
>> And the real reason for 80 bits, is probably that the mantissa is exactly 64
>> bits; that makes it attractive for the same sorts of reasons that 64-bit
>> processors are more popular than 52-bit ones. An 80-bit floating-point
>> value can also represent a 64-bit integer exactly (signed *or* unsigned
>> I believe). It's worth bothering with!
>
> but it is an inconvenient size in that it requires 80 bits to store,
> which is not much more than 64 bits, but kills power-of-2 alignment,
> whereas a 64-bit double is still aligned by a power-of-2.

The 8087, that introduced this 80-bit format, was coupled with
the 8086, which did not benefit from alignment beyond 16 bits.
And the 8088 that the 8087 was perhaps more often coupled with
did not benefit from alignment beyond 8-bit.

Dmitry A. Kazakov

Oct 13, 2012, 4:37:10 AM

Great reading, thanks!

I remember George Forsythe’s excellent analysis of numeric computations on
the example of solving a quadratic equation. Basically it showed how
accuracy may flow from point to point. In the example it was one root (very
accurate) and another (catastrophic).
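
The quadratic example can be reproduced along these lines. This is a sketch from memory of the general idea, not Forsythe's own presentation; it assumes b < 0 and real roots, and both function names are invented.

```cpp
#include <cmath>

// Smaller-magnitude root of x^2 + b*x + c = 0 via the textbook formula.
// For b < 0 and b*b >> 4*c this subtracts two nearly equal numbers.
double small_root_naive(double b, double c) {
    return (-b - std::sqrt(b * b - 4.0 * c)) / 2.0;
}

// Same root, computed by first forming the large root without cancellation
// and then using the product of the roots (x1 * x2 = c).
double small_root_stable(double b, double c) {
    const double large = (-b + std::sqrt(b * b - 4.0 * c)) / 2.0;
    return c / large;
}
```

With b = -1e8 and c = 1 the small root is very close to 1e-8. The stable version gets it to nearly full double precision, while the textbook formula loses most of its digits to cancellation.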

Nonetheless, it is worth mentioning that configurable rounding is not an
answer. The answer is IMO interval computations, which always deliver an
*accurate* result (interval certainly containing the exact result).
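
As a very rough sketch of the idea (real interval libraries use directed rounding modes; here each bound is simply pushed one step outward with `std::nextafter`, and `Interval` / `add` are illustrative names):

```cpp
#include <cmath>
#include <limits>

// Closed interval [lo, hi] meant to enclose the exact real result.
struct Interval {
    double lo;
    double hi;
};

// Add two intervals, then widen each bound by one representable step so that
// round-to-nearest error in the sums cannot push the true value outside.
Interval add(Interval a, Interval b) {
    const double inf = std::numeric_limits<double>::infinity();
    Interval r;
    r.lo = std::nextafter(a.lo + b.lo, -inf);
    r.hi = std::nextafter(a.hi + b.hi, inf);
    return r;
}
```

Adding the degenerate intervals for 0.1 and 0.2 yields a narrow interval that contains 0.3, rather than a single value that is slightly off.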

"Therefore we must (re)design computer architectures, languages
and program-development environments to diminish rather than
enlarge the capture cross-section for numerical misadventure of
programs written by clever but numerically naive programmers"

Very true, and not limited to numerical programs only.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Rui Maciel

Oct 13, 2012, 6:11:22 AM

BartC wrote:

> I didn't understand any of your argument. Using more precision is always
> going to help some types of calculations.

Some, not all. And if we really look into it, probably not most, at least
in a meaningful way.

Then, the argument I made in the previous post was that in a significant
number of applications, the precision loss brought by relying on single
precision types instead of higher-precision ones is completely irrelevant,
as it is dwarfed when compared to other errors introduced by the
implementation. So, as it was the case in the "part 2" example I referred
to, we see tons of attention invested in floating point while, at best, it
is responsible for introducing an error which is orders of magnitude smaller
than the error already introduced by the method itself.

Finally, when we factor in the performance penalty associated with using
higher precision types, the irrelevant effect on precision factored with the
higher computational cost does show that picking a higher precision floating
point type can and does pose a problem.

Rui Maciel

BartC

Oct 13, 2012, 8:11:00 AM

"Rui Maciel" <rui.m...@gmail.com> wrote in message

news:k5beod$t0h$1...@speranza.aioe.org...

> BartC wrote:
>
>> I didn't understand any of your argument. Using more precision is always
>> going to help some types of calculations.
>
> Some, not all. And if we really look into it, probably not most, at least
> in a meaningful way.

> Finally, when we factor in the performance penalty associated with using
> higher precision types, the irrelevant effect on precision factored with the
> higher computational cost does show that picking a higher precision floating
> point type can and does pose a problem.

If the hardware can directly deal with 64-bit floats, then there is little
performance loss (compared with attempting 64-bit arithmetic on 32-bit
hardware, or in software).

The only issue might be memory bandwidth when dealing with large numbers of
floating point values, although if they are accessed mainly for calculation,
then it's possible the calculation will dominate the timings.

Anyway, I remember using 32-bit floats in an application (originally using
software emulation), and they weren't of sufficient accuracy for some types
of problems (mapping for example). They only have 1 in 8 million precision
after all. Switching to 64-bits solved that problem.

Sometimes there are ways to get around such problems, and continue using
32-bits, but sometimes also it's simpler to just use 64!

--
Bartc

Willem

Oct 13, 2012, 8:34:57 AM

BartC wrote:
) If the hardware can directly deal with 64-bit floats, then there is little
) performance loss (compared with attempting 64-bit arithmetic on 32-bit
) hardware, or in software).
)
) The only issue might be memory bandwidth when dealing with large numbers of
) floating point values, although if they are accessed mainly for calculation,
) then it's possible the calculation will dominate the timings.

Most CPU's nowadays can do parallel computations on 4 32-bit floats at the
same time, thus effectively quadrupling the speed.

) Anyway, I remember using 32-bit floats in an application (originally using
) software emulation), and they weren't of sufficient accuracy for some types
) of problems (mapping for example). They only have 1 in 8 million precision
) after all. Switching to 64-bits solved that problem.
)
) Sometimes there are ways to get around such problems, and continue using
) 32-bits, but sometimes also it's simpler to just use 64!

The point is that sometimes, using 32-bit floats is the best option.
Showing cases where 64-bit is the best option is irrelevant to that
position.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

BGB

Oct 13, 2012, 10:50:16 AM

On 10/13/2012 7:34 AM, Willem wrote:
> BartC wrote:
> ) If the hardware can directly deal with 64-bit floats, then there is little
> ) performance loss (compared with attempting 64-bit arithmetic on 32-bit
> ) hardware, or in software).
> )
> ) The only issue might be memory bandwidth when dealing with large numbers of
> ) floating point values, although if they are accessed mainly for calculation,
> ) then it's possible the calculation will dominate the timings.
>
> Most CPU's nowadays can do parallel computations on 4 32-bit floats at the
> same time, thus effectively quadrupling the speed.
>

yep.

though in some cases there ends up being the funkiness of multiple types
of FPUs:
the x87 style FPU;
the SSE SIMD FPU.

though in other cases, there just ends up being a lot more FPUs
in-general, with lots of pipelining for x87 code.

> ) Anyway, I remember using 32-bit floats in an application (originally using
> ) software emulation), and they weren't of sufficient accuracy for some types
> ) of problems (mapping for example). They only have 1 in 8 million precision
> ) after all. Switching to 64-bits solved that problem.
> )
> ) Sometimes there are ways to get around such problems, and continue using
> ) 32-bits, but sometimes also it's simpler to just use 64!
>
> The point is that sometimes, using 32-bit floats is the best option.
> Showing cases where 64-bit is the best option is irrelevant to that
> position.
>

yep.

in some cases, double is needed.

in many cases, float is plenty sufficient, and a person is hard-pressed
to justify the huge memory waste that using doubles would bring.

sometimes, we can't even really afford the memory used by floats, and
have to resort to more compact representations (storing values in 8 or
16-bit representations).

for example, my recent foray into colored-lighting for voxels, left me
with this particular scheme:
4 bits, light intensity, following a power curve;
4 bits, light color, based on a modified 16-color palette (6 pure
colors, 6 half-saturation colors, white, orange/'greenish'/'sky').

and to speed up calculations ended up resorting to (*cough*)
color-blending tables.
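
The 4+4 bit layout could be packed into a byte roughly like this. The nibble order and the function names are assumptions made for the sketch, not the actual engine code.

```cpp
#include <cstdint>

// Pack a 4-bit intensity (high nibble) and a 4-bit palette index (low nibble)
// into one byte. The nibble order is an arbitrary choice for this sketch.
std::uint8_t pack_light(unsigned intensity, unsigned color) {
    return static_cast<std::uint8_t>(((intensity & 0xFu) << 4) | (color & 0xFu));
}

unsigned light_intensity(std::uint8_t packed) { return packed >> 4; }
unsigned light_color(std::uint8_t packed)     { return packed & 0xFu; }
```

Since the packed value has only 256 possible states, blending two lights can be a lookup in a 256x256 byte table, which is presumably what makes color-blending tables attractive here.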

why all this? because these voxels eat up a fair chunk of RAM.
presently, each voxel is only about 8 bytes, but uses up a good chunk of
a 4GB address space doing so (hence, why the previous RLE hack). such is
the cost of x^3.

now, what about all the vertex-arrays?...
well, these are big as well, can't really afford doubles there either
(nor does the graphics hardware really support them).

the vertex arrays eat lots of RAM, but have a more limited view-distance.

like, having lots more RAM doesn't help much when one puts lots more
into it. a lot of this stuff couldn't really be done on older hardware.

or such...

Rui Maciel

Oct 13, 2012, 12:05:20 PM

BartC wrote:

> If the hardware can directly deal with 64-bit floats, then there is little
> performance loss (compared with attempting 64-bit arithmetic on 32-bit
> hardware, or in software).

It really depends on the hardware and how the code was compiled. For
example, the x86-64 ISA includes instructions which take operands that pack
four single-precision numbers, while the double precision version can only
pack two double-precision ones, which means that compilers are able to pull
some magic tricks. For instance, take the following functions:

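<code>
// Sum ten elements from each of four single-precision arrays.
float foo(float const A[], float const B[], float const C[],
          float const D[])
{
    float value = 0;
    for (int i = 0; i < 10; i++)
        value += A[i] + B[i] + C[i] + D[i];

    return value;
}

// Same computation in double precision.
double bar(double const A[], double const B[], double const C[],
           double const D[])
{
    // Accumulate in double: a float accumulator here would add a
    // double-to-float conversion to every iteration.
    double value = 0;
    for (int i = 0; i < 10; i++)
        value += A[i] + B[i] + C[i] + D[i];

    return value;
}
</code>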

With g++ 4.6, compiling this code with -mtune=k8 -O2, foo() results in a
loop with 9 instructions, while bar() results in a loop with 11
instructions. If -mtune=k8 -O3 is used, foo() unrolls to 50 instructions
while bar() is unrolled to 71.

This example might be very specific, but it indicates that the performance
penalty associated with the use of double-precision numbers instead of
single-precision ones may not be that small.
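
The packing argument can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch, assuming 128-bit (16-byte) SSE registers; `lanes` and `vector_steps` are hypothetical helper names, and real compilers make their own unrolling decisions.

```cpp
#include <cstddef>

// Number of elements of type T that fit in one SIMD register of the given
// width. 16 bytes matches a 128-bit SSE register.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bytes = 16) {
    return register_bytes / sizeof(T);
}

// Packed operations needed to cover n elements, rounding up for the tail.
template <typename T>
constexpr std::size_t vector_steps(std::size_t n) {
    return (n + lanes<T>() - 1) / lanes<T>();
}
```

For the ten-element loops above this gives 3 packed steps for `float` against 5 for `double`, before counting any loads, shuffles or tail handling.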

> Anyway, I remember using 32-bit floats in an application (originally using
> software emulation), and they weren't of sufficient accuracy for some
> types of problems (mapping for example). They only have 1 in 8 million
> precision after all. Switching to 64-bits solved that problem.

That isn't a FP error, only a blatant design error. Single-precision
numbers simply weren't capable of representing the numbers that needed to be
represented. It's like trying to represent unicode characters in 7-bit
ASCII.

Rui Maciel

BartC

Oct 13, 2012, 12:25:11 PM

"Willem" <wil...@turtle.stack.nl> wrote in message
news:slrnk7invh...@turtle.stack.nl...

> BartC wrote:
> ) If the hardware can directly deal with 64-bit floats, then there is little
> ) performance loss (compared with attempting 64-bit arithmetic on 32-bit
> ) hardware, or in software).
> )
> ) The only issue might be memory bandwidth when dealing with large numbers of
> ) floating point values, although if they are accessed mainly for calculation,
> ) then it's possible the calculation will dominate the timings.
>
> Most CPU's nowadays can do parallel computations on 4 32-bit floats at the
> same time, thus effectively quadrupling the speed.

So? Perhaps they can do parallel computations on integer or fixed point
values even faster, so you need to look at that option too.

If speed is that much of an issue, then you look at all the options. But
given a floating point requirement which I can't immediately parallelise, I
tend to use the 64-bit floating point unit on my machine unless there is an
advantage to using 32-bits (which of course applies to the representation
in memory, since the calculation is always 64-bits (or 80-bits) anyway).

> ) Sometimes there are ways to get around such problems, and continue using
> ) 32-bits, but sometimes also it's simpler to just use 64!
>
> The point is that sometimes, using 32-bit floats is the best option.
> Showing cases where 64-bit is the best option is irrelevant to that
> position.

And maybe 24- or 16-bit floats are best sometimes. Or maybe I don't
understand your point! I was replying to the implication that using higher
precision always imposes a "performance penalty"; I'm saying that sometimes
it doesn't!

--
Bartc

Rui Maciel

unread,

Oct 13, 2012, 12:53:52 PM10/13/12

to

BartC wrote:

>> Most CPU's nowadays can do parallel computations on 4 32-bit floats at
>> the same time, thus effectively quadrupling the speed.
>
> So? Perhaps they can do parallel computations on integer or fixed point
> values even faster, so you need to look at that option too.
>
> If speed is that much of an issue, then you look at all the options.

In number crunching applications, speed is always an issue, only taking
second place to correctness.

> But given a floating point requirement which I can't immediately
> parallelise, I tend to use the 64-bit floating point unit on my machine
> unless there is an advantage to using 32-bits (which of course applies to
> the representation in memory, since the calculation is always 64-bits (or
> 80-bits) anyway).

Nowadays, a programmer doesn't necessarily have to explicitly parallelize
their code to benefit from that. Compilers are able to pull some tricks
automatically, even with the default options.

>> ) Sometimes there are ways to get around such problems, and continue using
>> ) 32-bits, but sometimes also it's simpler to just use 64!
>>
>> The point is that sometimes, using 32-bit floats is the best option.
>> Showing cases where 64-bit is the best option is irrelevant to that
>> position.
>
> And maybe 24 or 16-bits floats are best sometimes. Or maybe I don't
> understand your point! I was replying to the implication that using higher
> precision always imposes a "performance penalty"; I'm saying that
> sometimes it doesn't!

The main point is that this "higher precision is always better" mantra is
patently false. There are plenty of cases where higher precision types are
only able to provide an insignificant improvement while causing a
significant performance penalty. Having to pay a higher cost to get nothing
in return is always a bad thing.

Rui Maciel

Chris Uppal

unread,

Oct 27, 2012, 7:18:47 AM10/27/12

to

Dmitry A. Kazakov wrote:

> Nonetheless, it is worth to mention that configurable rounding is not an
> answer. The answer is IMO interval computations, which always deliver an
> *accurate* result (interval certainly containing the exact result).

I've hardly used interval computation at all, but my impression from small
experiments and what I've read on the web (Kahan's page and elsewhere) is that
the intervals quickly become unusably wide (i.e. far too pessimistic).

In any case, if you are using floating-point to /implement/ your interval
arithmetic, then I think you need control over the rounding mode -- since you
can't risk having the lower bound rounded up, or the upper bound rounded down.
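
To make that concrete, here is a rough sketch of what I mean by controlling
the rounding mode, assuming a C++11 <cfenv> and hardware that honours
FE_DOWNWARD and FE_UPWARD. The names Bounds and add_bounds are made up for
this post.

```cpp
#include <cassert>
#include <cfenv>

struct Bounds { double lo, hi; };

// Sketch: enclose the exact value of a + b by computing the sum twice,
// once rounded downward and once rounded upward.
// The volatile qualifiers are there to discourage the compiler from folding
// or reusing the sum across the rounding-mode changes; some compilers also
// want a flag such as GCC's -frounding-math for this to be fully reliable.
Bounds add_bounds(double a, double b)
{
    const int saved = std::fegetround();
    volatile double x = a, y = b;

    std::fesetround(FE_DOWNWARD);
    volatile double lo = x + y;      // lower bound, rounded towards -infinity
    std::fesetround(FE_UPWARD);
    volatile double hi = x + y;      // upper bound, rounded towards +infinity

    std::fesetround(saved);          // restore the caller's rounding mode

    Bounds r;
    r.lo = lo;
    r.hi = hi;
    assert(r.lo <= r.hi);
    return r;
}
```

When the sum is exactly representable the two bounds coincide; otherwise they
are adjacent doubles that bracket the exact result.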

I have a package for my PLoC[*] -- Smalltalk -- which uses rational arithmetic
for the end-points (tagged exclusive or inclusive). It's really aimed at
symbolic maths over significant ranges (not typically error bounds, but "large"
sets like [1,3]), but I have occasionally tried to [ab]use it to get an
impression of the likely error bounds on some floating-point computation.
It hasn't been a lot of help to me so far (in that way), but then I don't do a
lot of floating-point stuff, and I'm in no way at all a numerical analyst.

-- chris

Chris Uppal

unread,

Oct 27, 2012, 7:54:12 AM10/27/12

to

I wrote:

> I have a package for my PLoC[*] -- Smalltalk -- which uses rational

Forgot to add the footnote: PLoC == Programming Language of Choice

-- chris

Dmitry A. Kazakov

unread,

Oct 27, 2012, 9:13:35 AM10/27/12

to

On Sat, 27 Oct 2012 12:18:47 +0100, Chris Uppal wrote:

> Dmitry A. Kazakov wrote:
>
>> Nonetheless, it is worth to mention that configurable rounding is not an
>> answer. The answer is IMO interval computations, which always deliver an
>> *accurate* result (interval certainly containing the exact result).
>
> I've hardly used interval computation at all, but my impression from small
> experiments and what I've read on the web (Kahan's page and elsewhere) is that
> the intervals quickly become unusable wide (i.e. far too pessimistic).

But what he described in the lecture was trying all sorts of rounding
behaviors and comparing the results. This is just what interval arithmetic
does automatically. It considers all possible outcomes of rounding and the
resulting interval includes all of them.

Now in this context the argument of being too pessimistic is bogus. Suppose
rounding r1 yields x1 and rounding r2 yields x2. If x1<<x2 there is nothing
to do about that without additional information. How is that better than
the interval [x1, x2]? It is not a problem of computation, it is a problem
of interpreting the results.

> In any case, if you are using floating-point to /implement/ your interval
> arithmetic, then I think you need control over the rounding mode -- since you
> can't risk having the lower bound rounded up, or the upper bound rounded down.

Not really. The only thing you need is a guarantee that the exact result of
the operation is adjacent to the returned value, which all modern CPUs
provide. E.g. if the machine operation, say, a+b, returns c, then
[c-eps, c+eps] contains the exact result.

Interval arithmetic is easier to implement when rounding is fixed, e.g.
towards negative infinity. Then you could take [c, c+eps], which is half as
wide as what you get when it rounds to the nearest value, because in that
case you don't know whether the exact value lies to the left or to the right.
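
A minimal sketch of the [c-eps, c+eps] idea, assuming the default
round-to-nearest mode and taking "eps" to mean one step to the neighbouring
double; Interval and add_widened are names invented for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct Interval { double lo, hi; };

// Sketch: add in the default rounding mode, then step one representable
// double outwards on each side. With round-to-nearest the exact sum lies
// within half a step of c, so the widened interval contains it.
Interval add_widened(double a, double b)
{
    const double c = a + b;
    assert(std::isfinite(c));
    const double inf = std::numeric_limits<double>::infinity();
    return { std::nextafter(c, -inf), std::nextafter(c, inf) };
}
```

No rounding-mode switching is needed here, at the price of an interval two
steps wide instead of one.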

Patricia Shanahan

unread,

Oct 27, 2012, 11:51:36 AM10/27/12

to

On 10/13/2012 1:37 AM, Dmitry A. Kazakov wrote:
...

> Nonetheless, it is worth to mention that configurable rounding is not an
> answer. The answer is IMO interval computations, which always deliver an
> *accurate* result (interval certainly containing the exact result).

Do you have some references for non-trivial interval arithmetic
computations that produced a narrow enough range for the answer to be
useful?

Patricia

LudovicoVan

unread,

Oct 27, 2012, 12:23:40 PM10/27/12

to

"Patricia Shanahan" <pa...@acm.org> wrote in message
news:AM2dnUzGRPuOnhHN...@earthlink.com...

Interval arithmetic whose intervals are certain to contain the solution is
*closed* interval arithmetic, and it uses outwardly directed rounding. See
E. Hansen, G.W. Walster, "Global Optimization Using Interval Analysis".
Systems can be more or less "sharp", i.e. provide more or less narrow
intervals (at the cost of extra complexity). Some important problems,
including the Kepler conjecture, were eventually proved this way.

-LV

Dmitry A. Kazakov

unread,

Oct 27, 2012, 2:43:27 PM10/27/12

to

http://www.cs.utep.edu/interval-comp

I would expect iterative algorithms to converge in terms of interval width.

The point is that if interval computation yields a result whose width makes
it useless, then a result obtained without intervals is garbage unless a
rounding error analysis has been made. Intervals do not solve the original
mathematical problem, they add safety.

Robert Miles

unread,

Oct 28, 2012, 5:12:51 AM10/28/12

to

On 10/27/2012 8:13 AM, Dmitry A. Kazakov wrote:
> On Sat, 27 Oct 2012 12:18:47 +0100, Chris Uppal wrote:
>
>> Dmitry A. Kazakov wrote:

[snip]

>> In any case, if you are using floating-point to /implement/ your interval
>> arithmetic, then I think you need control over the rounding mode -- since you
>> can't risk having the lower bound rounded up, or the upper bound rounded down.
> Not really. The only thing you need is a guarantee that the exact result of
> the operation is adjacent to the returned value, which all modern CPUs do.
> E.g. if the machine operation, say, a+b, returns c, then [c-eps, c+eps]
> contains the exact result.

Is that adequate for a-b when a and b are nearly the same?

Dmitry A. Kazakov

unread,

Oct 28, 2012, 8:10:04 AM10/28/12

to

Yes, + or - makes no difference. In general, for intervals:
[a,b]-[c,d]=[a-d,b-c] where a-d is rounded towards negative infinity, b-c
is rounded towards positive infinity.
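
A small sketch of that rule. As a portable stand-in for true directed
rounding it steps each end-point one double outwards with std::nextafter,
which is an assumption of this example (it gives slightly wider bounds than
switching the rounding mode would). Interval and sub_outward are made-up
names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct Interval { double lo, hi; };

// Sketch of [a,b] - [c,d] = [a-d, b-c] with outward rounding emulated by
// moving the lower bound one double down and the upper bound one double up.
Interval sub_outward(Interval x, Interval y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);
    const double inf = std::numeric_limits<double>::infinity();
    return { std::nextafter(x.lo - y.hi, -inf),
             std::nextafter(x.hi - y.lo,  inf) };
}
```

For nearly equal operands the result is a narrow interval around the small
difference; whatever precision was lost earlier is already carried in the
widths of the input intervals.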

LudovicoVan

unread,

Oct 29, 2012, 12:38:33 AM10/29/12

to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1cyg6bpqcn1uu$.w8upkuzwty9o$.dlg@40tude.net...

Some care is needed because, although different formulations can be
mathematically equivalent, depending on how exactly a formula is written
down errors may or may not amplify, i.e. intervals may or may not widen.
This is called the "dependence problem": each occurrence of a variable in an
interval computation is treated as a different variable. Of course, there
are techniques to tackle this problem.

-LV

Dmitry A. Kazakov

unread,

Oct 29, 2012, 4:30:47 AM10/29/12

to

Yes, e.g. x*x /= x**2.

But the question, as far as I understood it, was about what happens in the
situations close to underflow. The answer is: nothing wrong. Interval
bounds will capture precision loss.
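
To illustrate the x*x versus x**2 remark, a minimal sketch that ignores
rounding of the end-points altogether (an assumption made to keep it short).
Interval, mul and sqr are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>

struct Interval { double lo, hi; };

// General product: the two operands are treated as independent, so all
// four end-point products have to be considered.
Interval mul(Interval x, Interval y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);
    const double p1 = x.lo * y.lo, p2 = x.lo * y.hi;
    const double p3 = x.hi * y.lo, p4 = x.hi * y.hi;
    return { std::min({p1, p2, p3, p4}), std::max({p1, p2, p3, p4}) };
}

// Square: both factors are known to be the same x, so the result can
// never be negative.
Interval sqr(Interval x)
{
    assert(x.lo <= x.hi);
    if (x.lo >= 0.0) return { x.lo * x.lo, x.hi * x.hi };
    if (x.hi <= 0.0) return { x.hi * x.hi, x.lo * x.lo };
    return { 0.0, std::max(x.lo * x.lo, x.hi * x.hi) };
}
```

With X = [-1, 2], mul(X, X) gives [-2, 4] while sqr(X) gives [0, 4].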

LudovicoVan

unread,

Oct 30, 2012, 6:09:54 PM10/30/12

to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

news:yq9rpdkhtkbq.1a...@40tude.net...

The typical example of the dependence problem is X-X where X is, for
instance, [0;1]. By the rule above, we get [-1;1] and not the more obvious
and quite sharper [0;0].

Again, there are techniques to tackle the problem: rewriting the expression
as you hint at above, but also using "improper" intervals (I think that was
the term, but I am going from memory), i.e. relaxing the rule that the left
end-point must be <= to the right end-point, so that we can rewrite
[0;1]-[0;1] as [0;1]-[1;0] and get, still by the same rule, the sought for
result [0;0].

As said, just from the top of my head: details and rules are of course in
the technical treatments.
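
For illustration only, the two computations above in code. This is a sketch
that ignores rounding; Interval, make_proper, swap_endpoints and sub are
hypothetical names, and the swap is just the end-point exchange described
above, not a full implementation of any particular interval system.

```cpp
#include <cassert>

struct Interval { double lo, hi; };

// Builds an ordinary ("proper") interval with lo <= hi.
Interval make_proper(double lo, double hi)
{
    assert(lo <= hi);
    return { lo, hi };
}

// Exchanges the end-points, which turns a proper interval into an
// "improper" one (left end-point greater than the right one).
Interval swap_endpoints(Interval x)
{
    return { x.hi, x.lo };
}

// The usual subtraction rule [a,b] - [c,d] = [a-d, b-c], applied blindly
// to whatever end-points it is given.
Interval sub(Interval x, Interval y)
{
    return { x.lo - y.hi, x.hi - y.lo };
}
```

With X = make_proper(0, 1), sub(X, X) yields [-1;1], whereas
sub(X, swap_endpoints(X)) yields [0;0].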

-LV

LudovicoVan

unread,

Oct 30, 2012, 6:16:35 PM10/30/12

to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

news:1le7t8ij5jkfp$.4y8buibrxvv7$.dlg@40tude.net...

Sorry, but that is not strictly correct either: _closed_ interval arithmetic
(*) can solve in polynomial time problems that are simply infeasible for
traditional approaches. Informally speaking, this arithmetic works by
removing non-solutions rather than by selecting solutions. Again, a most
famous example is the solution of Kepler's conjecture, which had remained
open for 300+ years. And, of course, entirely new computational
perspectives open up for tackling the general problem of global optimization.

(*) Closed interval arithmetic is not interval arithmetic tout court. The
reference text is E. Hansen, G.W. Walster, "Global Optimization Using
Interval Analysis".

-LV

Dmitry A. Kazakov

unread,

Oct 31, 2012, 4:47:23 AM10/31/12

to

On Tue, 30 Oct 2012 22:09:54 -0000, LudovicoVan wrote:

> Again, there are techniques to tackle the problem: rewriting the expression
> as you hint at above,

There is no magic in dependency analysis, except that it becomes very
complex.

I only wanted to stress that dependency analysis is an equivalent of
rounding analysis. One cannot blame intervals for being too wide without
doing the latter.

> but also using "improper" intervals (I think that was
> the term, but I am going from memory), i.e. relaxing the rule that the left
> end-point must be <= to the right end-point, so that we can rewrite
> [0;1]-[0;1] as [0;1]-[1;0] and get, still by the same rule, the sought for
> result [0;0].

I believe here you mean extended interval arithmetic, where [1,0] means
]-oo,1[U]0,+oo[.

AFAIK it is difficult to deploy them because their arithmetic is not closed
under standard operations.

P.S. There is much research done in the direction of fuzzy arithmetic,
where various sets of numbers are taken to represent estimations of imprecise
values. From that point of view intervals are such numbers with a
rectangular membership function.

Dmitry A. Kazakov

unread,

Oct 31, 2012, 4:53:15 AM10/31/12

to

On Tue, 30 Oct 2012 22:16:35 -0000, LudovicoVan wrote:

> Informally speaking, this arithmetic works by
> removing non-solutions rather than by selecting solutions.

Interesting, this looks like "intuitionistic" numbers describing both
inclusion and/or non-inclusion.

Dmitry A. Kazakov

unread,

Oct 31, 2012, 4:55:57 AM10/31/12

to

On Wed, 31 Oct 2012 09:47:23 +0100, Dmitry A. Kazakov wrote:

> I believe here you mean extended interval arithmetic, where [1,0] means
> ]-oo,1[U]0,+oo[.

]-oo,0[U]1,+oo[ of course.

LudovicoVan

unread,

Oct 31, 2012, 10:32:00 AM10/31/12

to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

news:ggvw01gqi246$.byym06hqymay$.dlg@40tude.net...

> On Tue, 30 Oct 2012 22:09:54 -0000, LudovicoVan wrote:
>
>> Again, there are techniques to tackle the problem: rewriting the
>> expression as you hint at above,
>
> There is no magic in dependency analysis, except that it becomes very
> complex.

Did I mention magic?

> I only wanted to stress that dependency analysis is an equivalent of
> rounding analysis. One cannot blame intervals for being too wide without
> doing the latter.
>
>> but also using "improper" intervals (I think that was
>> the term, but I am going from memory), i.e. relaxing the rule that the
>> left end-point must be <= to the right end-point, so that we can rewrite
>> [0;1]-[0;1] as [0;1]-[1;0] and get, still by the same rule, the sought
>> for result [0;0].
>
> I believe here you mean extended interval arithmetic, where [1,0] means
> ]-oo,1[U]0,+oo[.

If it is extended, then you should rather write [-oo ... +oo], i.e. the
end-points at infinity are included.

Anyway no, I meant and said *closed* interval arithmetic, which is also
extended, but that was not the point. The example of X-X, which does not give
zero or even anything near zero, does not need an extended setting.

> AFAIK it is difficult to deploy them because their arithmetic is not
> closed upon standard operations.

In *closed* interval arithmetic there are no undefined operator-operand
combinations.

> P.S. There is much research done in the direction of fuzzy arithmetic,
> where various sets of numbers taken to represent estimations of imprecise
> values. From that point of view intervals are such numbers with a
> rectangular membership function.

Which is not what I was talking about.

-LV

LudovicoVan

unread,

Oct 31, 2012, 10:33:14 AM10/31/12

to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

news:1vdd6sl1vuknj.1...@40tude.net...

> On Tue, 30 Oct 2012 22:16:35 -0000, LudovicoVan wrote:
>
>> Informally speaking, this arithmetic works by
>> removing non-solutions rather than by selecting solutions.
>
> Interesting, this looks like "intuitionistic" numbers describing both
> inclusion and/or non-inclusion.

It looks more like fried bananas.

-LV

Fritz Wuehler

unread,

Oct 31, 2012, 7:25:27 PM10/31/12

to

But it tastes like chicken.

Bananas! The "other" white meat!