x86 performance: 32-bit vs 64-bit


Feb 27, 2011, 9:21:55 AM
to golang-nuts
It seems the x86 32-bit compiler produces much slower (30% slower)
code than the x86 64-bit compiler. I am talking about integer code,
not floating-point code.

Can the readers of the golang-nuts group who are running their
applications on both 32-bit and 64-bit machines confirm that 32-bit is
approximately 30% slower than 64-bit?

Dave Cheney

Feb 27, 2011, 9:38:51 AM
to ⚛, golang-nuts
Hard to say; I don't have any remaining 32-bit machines of comparable performance to my current 64-bit ones.

If I had to guess the reason for the disparity, I would suggest the paucity of registers in 32-bit mode, which causes more register spills.

Dave



peterGo

Feb 27, 2011, 3:33:54 PM
to golang-nuts
Atom Symbol,

> It seems the x86 32-bit compiler produces much slower (30% slower)
> code than the x86 64-bit compiler. I am talking about integer code,
> not floating-point code.

Which is what you should expect. Take a look at the Go pseudo assembly
code generated by the Go compiler (flag -S) for the functions i32 and
i64 compiled with 8g (32-bit) and 6g (64-bit) respectively.

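package main

import (
	"fmt"
	"testing"
)

// i32 performs four arithmetic operations on a 32-bit integer.
func i32() {
	var i int32
	i += 1
	i -= 1
	i *= 1
	i /= 1
}

// i64 performs the same operations on a 64-bit integer.
func i64() {
	var i int64
	i += 1
	i -= 1
	i *= 1
	i /= 1
}

func bm32(b *testing.B) {
	for i := 0; i < b.N; i++ {
		i32()
	}
}

func bm64(b *testing.B) {
	for i := 0; i < b.N; i++ {
		i64()
	}
}

func main() {
	// The first line printed is the int32 result, the second the int64 result.
	fmt.Println(testing.Benchmark(bm32))
	fmt.Println(testing.Benchmark(bm64))
}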

Benchmarks run on an Intel Q8300.
8g && 8l (32-bit):
  bm32: 500000000  5 ns/op
  bm64: 100000000 23 ns/op
6g && 6l (64-bit):
  bm32: 500000000  5 ns/op
  bm64: 100000000 16 ns/op

Peter

Feb 28, 2011, 5:27:26 AM
to golang-nuts
On Feb 27, 8:33 pm, peterGo <go.peter...@gmail.com> wrote:
> Atom Symbol,

Peter,

> > It seems the x86 32-bit compiler produces much slower (30% slower)
> > code than the x86 64-bit compiler. I am talking about integer code,
> > not floating-point code.
>
> Which is what you should expect.

No, I don't expect it.

Maybe I should have mentioned that I am talking about code making only
*rudimentary* use of 'int64' and 'uint64'. Most integer types and
operations in the code are 8-, 16- or 32-bit, or 'int' and 'uint'.
Obviously, 64-bit operations in 32-bit mode are much slower than
64-bit operations in 64-bit mode.

The most likely cause for the slowdown is that the x86-32 compiler has
received less attention from the developers than the x86-64 compiler.

Russ Cox

Feb 28, 2011, 10:19:10 AM
to ⚛, golang-nuts
> The most likely cause for the slowdown is that the x86-32 compiler has
> received less attention from the developers than the x86-64 compiler.

At this point they've received about the same amount
of attention. If you can identify a small example that
demonstrates the slowdown, it would help focus said
attentions.

Russ

Ken Thompson

Feb 28, 2011, 7:46:11 PM
to r...@golang.org, ⚛, golang-nuts
the difference is the number of registers available
for optimization. 32-bit compiler has approximately
zero available, while the 64-bit compiler has over
eight. they are used to registerize hot variables.
nothing much can be done.

Mar 5, 2011, 8:34:39 AM
to golang-nuts
On Mar 1, 12:46 am, Ken Thompson <k...@google.com> wrote:
> the difference is the number of registers available
> for optimization. 32-bit compiler has approximately
> zero available, while the 64-bit compiler has over
> eight. they are used to registerize hot variables.
> nothing much can be done.

I tried to take a look at the code generated by 8g. This is difficult,
because I am unable to disassemble the code in GDB (GDB terminates due
to a segfault), and I am unable to run Valgrind on the code (Valgrind
terminates due to a segfault). But from the code I have been able to
analyze so far, it seems that the problem might be related to
instructions such as [MOVB BP,DI], [MOVB BP,(BX)], [MOVB (BP),BP] and
similar instructions. On x86-32, the *byte* manipulation instructions
are incompatible with the BP register, while x86-64 has no such
problem. I may be wrong, but it seems to me that this particular
problem has already been mentioned somewhere (but I don't recall
exactly where).

There are two solutions to this problem (the 1st solution is better
from a long-term perspective, the 2nd one is simpler to implement):

1. Don't let the 32-bit compiler generate code with BP/SI/DI in byte
manipulation instructions

2. Some of the pseudo-instructions can be converted to a form
compatible with x86-32. For example, the pseudo-instruction [MOVB
(BP),BP] can be converted to [movzbl (%ebp),%ebp].
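
The equivalence behind the second option can be sketched in Go. This is only a toy model, not compiler code: the helper names loadByte and loadByteZX are made up for illustration, and a register is modelled as a plain uint32. The point is that a byte load and a zero-extending load leave the same low 8 bits in the destination, so they are interchangeable as long as nothing reads the upper 24 bits.

```go
package main

import "fmt"

// loadByte models the pseudo-instruction MOVB (mem), reg:
// only the low 8 bits of the register change (hypothetical helper).
func loadByte(reg uint32, mem byte) uint32 {
	return reg&^0xFF | uint32(mem)
}

// loadByteZX models a zero-extending load (movzbl in AT&T syntax):
// the whole register is overwritten (hypothetical helper).
func loadByteZX(mem byte) uint32 {
	return uint32(mem)
}

func main() {
	old := uint32(0xDEADBEEF) // previous register contents
	m := byte(0x7F)           // byte read from memory

	a := loadByte(old, m)
	b := loadByteZX(m)

	// The full registers differ, but their low bytes are identical.
	fmt.Printf("MOVB: %#x  movzbl: %#x  same low byte: %v\n", a, b, uint8(a) == uint8(b))
}
```

The upper bits differ between the two forms, which is why the rewrite is only safe under the assumption that the compiler never relies on them.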

Russ Cox

Mar 5, 2011, 11:14:49 AM
to ⚛, golang-nuts
> I tried to take a look at the code generated by 8g. It is complicated,
> because I am unable to disassemble the code in GDB (GDB terminates due
> to a segfault), and I am unable to run Valgrind on the code (Valgrind
> terminates due to a segfault).

Please file a bug on the issue tracker explaining how to
reproduce these crashes. Thanks.

You can generate an assembly listing while linking the binary
by using 8l's -a flag.

I can't tell if you are blaming the MOVB instructions for the
crashes of gdb and Valgrind or for the performance slowdown.
The latter seems more likely, so I am assuming that.

It is true that, to make the compilers' jobs easier, the linker
allows them to ask for instructions like MOVB BP, BX, which the
linker implements as three actual x86 instructions, swapping
registers around the actual move
(say, XCHG AX, BP; MOVB AX, BX; XCHG AX, BP).
They only appear as one instruction in the 8l -a output, but
you can see the ruse in the actual instruction bytes displayed.
I would be very interested to see evidence that this trick
is causing a performance problem.
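
To make the three-instruction expansion concrete, here is a small Go simulation. It is a sketch under stated assumptions: the regs struct and the xchg/movb helpers are hypothetical stand-ins for 32-bit registers and instructions, not anything taken from 8l. It follows the operand order used above (source first, destination second).

```go
package main

import "fmt"

// regs is a toy model of three 32-bit registers (illustration only).
type regs struct{ AX, BX, BP uint32 }

// xchg swaps the contents of two registers.
func xchg(a, b *uint32) { *a, *b = *b, *a }

// movb copies the low 8 bits of src into the low 8 bits of dst,
// leaving the upper 24 bits of dst untouched.
func movb(src uint32, dst *uint32) { *dst = *dst&^0xFF | src&0xFF }

// movbBPtoBX emulates the pseudo-instruction MOVB BP, BX using the
// sequence XCHG AX, BP; MOVB AX, BX; XCHG AX, BP.
func movbBPtoBX(r *regs) {
	xchg(&r.AX, &r.BP) // AX now holds the old BP
	movb(r.AX, &r.BX)  // byte move through a byte-addressable register
	xchg(&r.AX, &r.BP) // restore AX and BP
}

func main() {
	r := regs{AX: 0x11111111, BX: 0x22222222, BP: 0x333333AB}
	movbBPtoBX(&r)
	fmt.Printf("AX=%#x BX=%#x BP=%#x\n", r.AX, r.BX, r.BP)
}
```

Running it prints AX=0x11111111 BX=0x222222ab BP=0x333333ab: only the low byte of BX changes, and AX and BP end up as they started. The model says nothing about cost; it only shows that one pseudo-instruction stands for three real ones.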

In C this rarely comes up, because all of the "usual arithmetic
conversions" convert up to int before any work happens.
In Go the 8-bit arithmetic operations exercise this workaround
more frequently. If it does turn out that this is causing a
performance problem, the easiest solution is probably to
say that the 8-bit (and maybe 16-bit) values are still represented
as 32-bit registers and just make sure to get the operations right
(only divide and right shift would need special care, I think).
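
The claim about which operations need care can be checked exhaustively for 8-bit values. The Go sketch below is an illustration, not compiler code: wide, agree and results are hypothetical names, and the junk pattern 0xABCDEF00 is an arbitrary choice standing in for leftover upper register bits. It compares each operation on junk-carrying 32-bit values against the true 8-bit result.

```go
package main

import "fmt"

// wide models an 8-bit value held in a 32-bit register whose upper
// 24 bits contain leftover junk (an assumption for illustration).
func wide(v uint8) uint32 { return 0xABCDEF00 | uint32(v) }

// agree reports whether op32 on junk-carrying registers always yields
// the same low byte as op8 on true 8-bit values.
func agree(op32 func(a, b uint32) uint32, op8 func(a, b uint8) uint8) bool {
	for a := 0; a < 256; a++ {
		for b := 1; b < 256; b++ { // b >= 1 keeps division defined
			x, y := uint8(a), uint8(b)
			if uint8(op32(wide(x), wide(y))) != op8(x, y) {
				return false
			}
		}
	}
	return true
}

// results runs the check for each operation.
func results() map[string]bool {
	return map[string]bool{
		"add": agree(func(a, b uint32) uint32 { return a + b }, func(a, b uint8) uint8 { return a + b }),
		"sub": agree(func(a, b uint32) uint32 { return a - b }, func(a, b uint8) uint8 { return a - b }),
		"mul": agree(func(a, b uint32) uint32 { return a * b }, func(a, b uint8) uint8 { return a * b }),
		"div": agree(func(a, b uint32) uint32 { return a / b }, func(a, b uint8) uint8 { return a / b }),
		"shr": agree(func(a, _ uint32) uint32 { return a >> 1 }, func(a, _ uint8) uint8 { return a >> 1 }),
	}
}

func main() {
	r := results()
	for _, op := range []string{"add", "sub", "mul", "div", "shr"} {
		fmt.Printf("%s: low byte survives junk in upper bits: %v\n", op, r[op])
	}
}
```

In this model add, sub and mul report true, while div and shr report false, because both pull upper bits down into the low byte. Those two would need the operand zero- or sign-extended first.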

Russ

Mar 7, 2011, 4:33:54 AM
to golang-nuts
On Mar 5, 4:14 pm, Russ Cox <r...@golang.org> wrote:
> I can't tell if you are blaming the MOVB instructions for the
> crashes of gdb and Valgrind or for the performance slowdown.
> The latter seems more likely, so I am assuming that.

Your assumption was correct.

> It is true that, to make the compilers' jobs easier, the linker
> allows them to ask for instructions like MOVB BP, BX, which the
> linker implements as three actual x86 instructions, swapping
> registers around the actual move
> (say, XCHG AX, BP; MOVB AX, BX; XCHG AX, BP).
> They only appear as one instruction in the 8l -a output, but
> you can see the ruse in the actual instruction bytes displayed.
> I would be very interested to see evidence that this trick
> is causing a performance problem.

I applied two changes to "src/pkg/8g" on my computer:

- add an optimization pass that converts instructions such as MOVB
BP,BX into MOVL BP,BX, and instructions such as MOVB (BX),BP into
MOVBLZX (BX),BP

- change the peep-hole optimizer so that optimizations applied to MOVB
and MOVW are exactly the same as the ones applied to MOVL. This should
produce correct code because the compiler is never using the upper
parts of the registers where it stored 8/16-bit values (so MOVB/MOVW
can be treated as if they were equivalent to MOVL). Before this
change, the usage of MOVB and MOVW instructions was preventing the
compiler from performing certain optimizations.
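
The first of these changes can be illustrated with a toy rewriting pass in Go. This is not the actual patch: the inst type and the widenByteMoves function are hypothetical, instructions are reduced to three strings, and byte stores to memory are simply left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// inst is a hypothetical, simplified instruction: op src, dst.
type inst struct{ op, src, dst string }

// isMem reports whether an operand is a memory reference such as (BX).
func isMem(operand string) bool { return strings.HasSuffix(operand, ")") }

// widenByteMoves rewrites MOVB instructions whose destination is a
// register: register sources become MOVL, memory sources become the
// zero-extending MOVBLZX. Everything else is copied unchanged.
func widenByteMoves(prog []inst) []inst {
	out := make([]inst, len(prog))
	for i, in := range prog {
		if in.op == "MOVB" && !isMem(in.dst) {
			if isMem(in.src) {
				in.op = "MOVBLZX"
			} else {
				in.op = "MOVL"
			}
		}
		out[i] = in
	}
	return out
}

func main() {
	prog := []inst{
		{"MOVB", "BP", "BX"},
		{"MOVB", "(BX)", "BP"},
		{"MOVB", "BP", "(BX)"}, // byte store to memory: not rewritten
	}
	for _, in := range widenByteMoves(prog) {
		fmt.Printf("%s %s, %s\n", in.op, in.src, in.dst)
	}
}
```

The rewrite is only valid under the assumption stated above, namely that the upper parts of registers holding 8/16-bit values are never read.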

The program (gospeccy) I am using for testing these optimizations
makes heavy use of the types 'byte' and 'uint16'. After the
changes, the speed increased by 9% on average, and by 16% in one
particular case.

As expected, there are no measurable performance changes in any other
Go programs, because "normal" Go programs are mostly using the 'int'
and 'uint' types.

I can send you the patch if the Go team is interested in incorporating
the changes into mainline Go.

Mar 10, 2011, 12:02:10 PM
to golang-nuts
On Mar 7, 9:33 am, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> I can send you the patch if the Go team is interested in incorporating
> the changes into mainline Go.

A small note for those who want to know whether the patch
will appear in Go. The answer is: no. This conclusion was reached
after some off-the-list discussion about the licensing process.

But don't panic, it may happen that the Go team will implement the
relatively simple source code changes based on the descriptions I
provided in this forum thread so far.