float vs. double

MacRules
Jun 22, 2008, 12:37:54 PM

I have inherited some plain old C code written back in '94 that uses
only floats. I did a little empirical testing on my Mac (2 x 2.8 GHz
Quad-Core Intel Xeon) to see whether it would be worthwhile to change
everything to double. I would like the greater accuracy, but don't
want any kind of performance hit.

I'm using Xcode 3.0 with gcc 4.0.

My first experiment was to take a look at the number of information-
bearing digits with float vs. double:

printf("float: %.30f\n",(float)1/9);
printf("double: %.58lf\n",(double)1/9);

Result:
float: 0.111111111938953399658203125000
double: 0.1111111111111111049432054187491303309798240661621093750000

>> Question: I was a bit surprised to see a substantial amount of (what looks like) garbage in the least significant digits of these values - should I be?
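
As a sanity check, the limits in <float.h> say how many of those digits
are actually meaningful. This little sketch is separate from the
inherited code:

#include <stdio.h>
#include <float.h>

int main(void) {
    /* FLT_DIG / DBL_DIG: decimal digits guaranteed to survive a round
       trip through each type; everything printed past that point is
       just the binary fraction rounded out to decimal. */
    printf("float:  %d digits, epsilon = %g\n", FLT_DIG, FLT_EPSILON);
    printf("double: %d digits, epsilon = %g\n", DBL_DIG, DBL_EPSILON);
    return 0;
}

On this setup I'd expect it to report 6 digits for float and 15 for
double, which lines up with where the printed values above stop making
sense.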

>> My next experiment was to measure latency. Here's the code:

void ASG_fd_timing_test(void) {
        ASG_TIMER timer;
        long num_iter;
        float a_f,b_f,c_f,d_f,e_f,f_f;
        double a_d,b_d,c_d,d_d,e_d,f_d;
        int duration_float;
        int duration_dble;
        int n;

        printf("Number of iterations? ");
        scanf("%ld",&num_iter);

        // --- start timer ---

        ASG_timing_init(
                &timer);

        ASG_timing_stamp_advance(
                &timer);

        // --- do float testing ---

        a_f = 1.6;
        b_f = -0.33;
        for (n=0; n<num_iter; n++) {
                c_f = a_f + b_f;
                d_f = a_f - b_f;
                e_f = a_f * b_f;
                f_f = a_f / b_f;
                a_f += 3.2;
                b_f -= 1.05; }

        // -- record stop time --

        ASG_timing_stamp_advance(
                &timer);

        // --- do double testing ---

        a_d = 1.6;
        b_d = -0.33;
        for (n=0; n<num_iter; n++) {
                c_d = a_d + b_d;
                d_d = a_d - b_d;
                e_d = a_d * b_d;
                f_d = a_d / b_d;
                a_d += 3.2;
                b_d -= 1.05; }

        // -- record stop time --

        ASG_timing_stamp_advance(
                &timer);

        // -- find float duration --

        ASG_timing_find_delta(
                &timer.timestamp[0],
                &timer.timestamp[1],
                &duration_float);

        // -- find double duration --

        ASG_timing_find_delta(
                &timer.timestamp[1],
                &timer.timestamp[2],
                &duration_dble);

        printf("\nDuration float, double = %d %d ms\n",
                duration_float, duration_dble);
}

The results:

Number of iterations? 1000000000

Duration float, double = 359 358 ms

>> It appears that using doubles has no performance hit. If I took a look at the ASM code and knew more about Intel processors, perhaps I would know why. My tentative conclusion is that I should go ahead and upgrade. Any comments?

TIA,
Steve


MacRules
Jun 22, 2008, 10:55:39 PM

<snip>

>
> // --- do float testing ---
>
>         a_f = 1.6;
>         b_f = -0.33;
>         for (n=0; n<num_iter; n++) {
>                 c_f = a_f + b_f;
>                 d_f = a_f - b_f;
>                 e_f = a_f * b_f;
>                 f_f = a_f / b_f;
>                 a_f += 3.2;
>                 b_f -= 1.05; }
>

<snip>

> // --- do double testing ---
>
>         a_d = 1.6;
>         b_d = -0.33;
>         for (n=0; n<num_iter; n++) {
>                 c_d = a_d + b_d;
>                 d_d = a_d - b_d;
>                 e_d = a_d * b_d;
>                 f_d = a_d / b_d;
>                 a_d += 3.2;
>                 b_d -= 1.05; }
>

Sorry - stupid mistake. I took a look at the ASM code - the compiler
optimizer was smart enough to figure out that the variables inside
these loops aren't used for anything, and deleted them. However, the
optimizer wasn't smart enough to figure out that the resulting empty
FOR loops aren't doing anything, either, so it left these in.

I added some printf() calls to the bottom of the function, making all
of the variables above 'relevant' and thus forcing the optimizer to
leave the FOR loop contents intact:

printf("a_f b_f c_f d_f e_f f_f = %f %f %f %f %f %f
\n",a_f,b_f,c_f,d_f,e_f,f_f);
printf("a_d b_d c_d d_d e_d f_d = %lf %lf %lf %lf %lf %lf
\n",a_d,b_d,c_d,d_d,e_d,f_d);

I then verified that the compiler is now doing the requested
mathematical operations with floats and doubles.
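
Another way to keep the optimizer honest - noted here only as a sketch,
not something I actually ran - is to store the results into a volatile
variable so the stores count as observable side effects:

volatile float sink_f;  /* hypothetical sink - not in the original code */

for (n=0; n<num_iter; n++) {
        c_f = a_f + b_f;
        d_f = a_f - b_f;
        e_f = a_f * b_f;
        f_f = a_f / b_f;
        a_f += 3.2;
        b_f -= 1.05;
        sink_f = c_f + d_f + e_f + f_f; }  /* same idea for the double loop */

It does add a store (and three adds) per iteration, though, so the
printf() approach is probably the cleaner way to time the raw arithmetic.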

The new results:

Number of iterations? 50000000

Duration float, double = 643 361 ms
a_f b_f c_f d_f e_f f_f = 67108864.000000 -33554432.000000
33554432.000000 100663296.000000 -2251799813685248.000000 -2.000000
a_d b_d c_d d_d e_d f_d = 160000001.592189 -52500000.293033
107499999.149156 212499997.635222 -8399999794475229.000000 -3.047619

I don't know why double would be twice as fast as float, considering
that the ASM code shows a separate instruction for +,-,*,/ for both
floats and doubles. Perhaps the double math instructions take fewer
internal processor cycles than the float instructions. In any case, my
original conclusion still holds: I should upgrade to double.

-Steve

Gregory Weston
Jun 22, 2008, 11:42:28 PM

In article
<0c2cd374-bfef-468f...@k37g2000hsf.googlegroups.com>,
MacRules <dode...@aol.com> wrote:

> I don't know why double would be twice as fast as float, considering
> that the ASM code shows a separate instruction for +,-,*,/ for both
> floats and doubles. Perhaps the double math instructions take fewer
> internal processor cycles than the float instructions. In any case, my
> original conclusion still holds: I should upgrade to double.
>
> -Steve

Do you actually have an identified problem that you believe this would
solve, or are you doing it for the hell of it?

--
"Harry?" Ron's voice was a mere whisper. "Do you smell something ... burning?"
- Harry Potter and the Odor of the Phoenix

MacRules
Jun 23, 2008, 8:31:21 AM

>
> Do you actually have an identified problem that you believe this would
> solve, or are you doing it for the hell of it?
>

The code I inherited is poorly written and fairly complex. In fact,
it's a nightmare. The last thing I want to do is make unnecessary
changes. Even the upgrade from float to double makes me a bit nervous.
The additional accuracy that doubles would bring would be handy, as
long as performance doesn't suffer.

- Steve

Reinder Verlinde
Jun 23, 2008, 12:56:28 PM


> <snip>


> Sorry - stupid mistake. I took a look at the ASM code - the compiler
> optimizer was smart enough to figure out that the variables inside
> these loops aren't used for anything, and deleted them. However, the
> optimizer wasn't smart enough to figure out that the resulting empty
> FOR loops aren't doing anything, either, so it left these in.
>
> I added some printf() calls to the bottom of the function, making all
> of the variables above 'relevant' and thus forcing the optimizer to
> leave the FOR loop contents intact:
>
> printf("a_f b_f c_f d_f e_f f_f = %f %f %f %f %f %f
> \n",a_f,b_f,c_f,d_f,e_f,f_f);
> printf("a_d b_d c_d d_d e_d f_d = %lf %lf %lf %lf %lf %lf
> \n",a_d,b_d,c_d,d_d,e_d,f_d);
>
> I then verified that the compiler is now doing the requested
> mathematical operations with floats and doubles.
>
> The new results:
>
> Number of iterations? 50000000
>
> Duration float, double = 643 361 ms
>

> I don't know why double would be twice as fast as float

> [...]


> In any case, my original conclusion still holds: I should upgrade
> to double.

Based on what you have told us here, I would not conclude that. Your
logic may still be flawed. I would guess that, if you changed your code
to read and write lots of different memory locations, you could reverse
the timing difference.

For example,

   float * a_fs = new float[1000000];
   float * b_fs = new float[1000000];
   float * c_fs = new float[1000000];
   ...
   for( int i = 0; i < 1000000; ++i) {
     c_fs[i] = a_fs[i] * b_fs[i];
   }

The reason would be that, for this, the number of bytes that have to be
read and written will be the limiting factor in the speed you get.

I haven't checked, but there also may be effects of the placement of
your variables in cache lines.
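
To put rough numbers on it (assuming the usual 4-byte float and 8-byte
double; just a sketch, not tested against your timing code):

#include <stdio.h>

int main(void) {
    unsigned long n = 1000000;
    /* bytes touched per array, per pass over the data */
    printf("float : %lu bytes\n", n * sizeof(float));   /* ~4 MB */
    printf("double: %lu bytes\n", n * sizeof(double));  /* ~8 MB */
    return 0;
}

With three such arrays live at once, the double version has to move
twice as many bytes through the memory system for the same arithmetic.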

Reinder

MacRules
Jun 23, 2008, 9:32:47 PM

>    float * a_fs = new float[1000000];
>    float * b_fs = new float[1000000];
>    float * c_fs = new float[1000000];
>    ...
>    for( int i = 0; i < 1000000; ++i) {
>      c_fs[i] = a_fs[i] * b_fs[ i];
>    }
>
> The reason would be that, for this, the number of bytes that have to be
> read and written will be the limiting factor in the speed you get.
>

Thanks for the feedback. This is a valid point. The code I inherited
is heavily memory intensive. I have plans to make it less so, which is
why I concentrated on the efficiency of arithmetic operations first.

For fun, I added two more tests, focusing only on float vs. double
memory I/O:

#define ASG_FD_TEST_ARRAY_LENGTH 2000000        // second attempt
//#define ASG_FD_TEST_ARRAY_LENGTH 32768        // first attempt

        test_f_e = (float *)malloc(ASG_FD_TEST_ARRAY_LENGTH*sizeof(float));

        m1 = 0;
        m2 = ASG_FD_TEST_ARRAY_LENGTH - 1;

        for (n=0; n<num_iter; n++) {
                test_f_e[m1] = test_f_e[m2];
                m1++;
                if (m1 == ASG_FD_TEST_ARRAY_LENGTH)
                        m1 = 0;
                m2--;
                if (m2 < 0)
                        m2 = ASG_FD_TEST_ARRAY_LENGTH - 1; }

        free(test_f_e);

        ASG_timing_stamp_advance(
                &timer);

        test_d_e = (double *)malloc(ASG_FD_TEST_ARRAY_LENGTH*sizeof(double));

        m1 = 0;
        m2 = ASG_FD_TEST_ARRAY_LENGTH - 1;

        for (n=0; n<num_iter; n++) {
                test_d_e[m1] = test_d_e[m2];
                m1++;
                if (m1 == ASG_FD_TEST_ARRAY_LENGTH)
                        m1 = 0;
                m2--;
                if (m2 < 0)
                        m2 = ASG_FD_TEST_ARRAY_LENGTH - 1; }

        free(test_d_e);

        ASG_timing_stamp_advance(
                &timer);

Result #1: when the float and double test array lengths are 32768

MEM I/O: duration float, double = 117 116 ms

Result #2: when the float and double test array lengths are 2000000

MEM I/O: duration float, double = 106 225 ms

Discussion: for result #1, I suspect that an internal cache in the
processor was intercepting my external memory read/write attempts,
leading to no performance difference between floats and doubles. For
result #2, the increased array length defeated the internal cache,
forcing many external memory reads and writes and therefore longer
latency for the doubles - as you predicted.
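
A rough working-set calculation (assuming 4-byte floats and 8-byte
doubles; I haven't checked the actual cache sizes on this Xeon) points
the same way:

    32768 floats    : 32768 * 4   =    131,072 bytes  (~128 KB)
    32768 doubles   : 32768 * 8   =    262,144 bytes  (~256 KB)
    2000000 floats  : 2000000 * 4 =  8,000,000 bytes  (~7.6 MB)
    2000000 doubles : 2000000 * 8 = 16,000,000 bytes  (~15.3 MB)

The two small arrays fit comfortably on-chip; of the two large ones, the
double array is the one most likely to spill out to main memory, which
would explain why only the double timing got worse.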

Whether this matters for my particular application is unclear. I'm
still leaning toward an upgrade to doubles.

Thanks,
Steve

Korchkidu
Jun 24, 2008, 10:55:57 AM

> I don't know why double would be twice as fast as float, considering
> that the ASM code shows a separate instruction for +,-,*,/ for both
> floats and doubles. Perhaps the double math instructions take fewer
> internal processor cycles than the float instructions. In any case, my
> original conclusion still holds: I should upgrade to double.
Hi,

As far as I know, using double is normally faster than using float.
Indeed, all calculations are done using doubles internally. When you
use floats, they are first cast into doubles, then the computation is
done, and the result is cast back into float. The only reason I see to
use float is for memory purposes.

K.

MacRules
Jun 24, 2008, 12:39:46 PM

> Indeed, all calculations are done using doubles internally. When you
> use floats, they are first cast into doubles, then the computation is
> done, and the result is cast back into float.

It depends upon what you mean by 'internally'. If you mean internal to
the CPU, then your observation matches what I've seen so far. If you
mean internal to the compiler, then that doesn't match because the
ASM code I looked at had separate float and double arithmetic
instructions, so the compiler didn't have to do any extra work to
convert float to/from double.
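
For anyone who wants to check this on their own machine, a minimal test
case (a sketch, separate from the inherited code) compiled with
'gcc -O2 -S' makes the distinction easy to see. When gcc targets SSE it
emits separate scalar instructions such as addss/mulss for float and
addsd/mulsd for double; the older x87 path looks quite different:

/* fd_asm_check.c - compile with: gcc -O2 -S fd_asm_check.c
   then read fd_asm_check.s; exact mnemonics depend on the target */

float  add_f(float a, float b)   { return a + b; }  /* e.g. addss */
double add_d(double a, double b) { return a + b; }  /* e.g. addsd */
float  mul_f(float a, float b)   { return a * b; }  /* e.g. mulss */
double mul_d(double a, double b) { return a * b; }  /* e.g. mulsd */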

- Steve


Korchkidu
Jun 29, 2008, 11:14:36 AM


Hi,

What I meant by "internally" was that the registers in your CPU are
"built" for doubles. The assembler code makes the distinction between
floats and doubles, so the CPU "knows" that if a value is a float it has
to be converted first before the operations are performed. The reality
is of course far more complex, but I think this is a reasonably good
summary of what happens inside. Anyway, as far as I know, use floats
only if you have memory problems, and that should be very rare these
days... ;)

K.

glenn andreas
Jun 29, 2008, 11:35:42 AM

In article
<4b2376d9-e1d5-40d5...@i76g2000hsf.googlegroups.com>,
Korchkidu <korc...@gmail.com> wrote:

> Anyway, as far as I know, use floats only if you have memory
> problems, and that should be very rare these days... ;)
>
> K.

Or if you want to vectorize your app, since PPC Altivec doesn't support
double vectors at all, and SSE only allows 2 doubles/vector, which
really isn't all that much of an improvement (and perhaps not even
enough to offset the extra overhead that vectorization can require).
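
To make the width difference concrete, here's a minimal SSE sketch using
intrinsics from <xmmintrin.h>/<emmintrin.h> (not tied to the original
code): one 128-bit register holds four floats but only two doubles, so
the float path does twice the work per instruction.

#include <xmmintrin.h>  /* SSE:  __m128,  four floats per vector */
#include <emmintrin.h>  /* SSE2: __m128d, two doubles per vector */

void mul4_float(const float *a, const float *b, float *c) {
        /* one multiply covers four float lanes */
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(c, _mm_mul_ps(va, vb));
}

void mul2_double(const double *a, const double *b, double *c) {
        /* the same 128-bit register holds only two doubles */
        __m128d va = _mm_loadu_pd(a);
        __m128d vb = _mm_loadu_pd(b);
        _mm_storeu_pd(c, _mm_mul_pd(va, vb));
}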
