
Performance difference 32bit/64bit userland

Christoph Biedl

Feb 7, 2017, 12:30:03 PM

Hi there,

while preparing other tests I created two installations on two hosts
with identical hardware (LPARs on IBM POWER):

- powerpc (64 bit kernel, 32 bit userland)
- ppc64 (64 bit kernel, 64 bit userland)
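
Because the kernel is 64-bit on both hosts, a quick way to confirm which userland a shell is really running is to ask a userland program for its own pointer width. Below is a minimal Python sketch. It assumes a distribution-packaged Python 3 on both hosts, and the helper name `userland_bits` is only illustrative:

```python
import platform
import struct
import sys


def userland_bits() -> int:
    """Pointer width, in bits, of the running Python interpreter.

    The interpreter is itself a userland binary, so on a 32-bit userland
    with a 64-bit kernel this should report 32, even though `uname -m`
    (platform.machine() below) usually still names the 64-bit machine.
    """
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("kernel machine:  ", platform.machine())
    print("userland pointer:", userland_bits(), "bits")
    print("byte order:      ", sys.byteorder)
```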

Both are up-to-date sid with systemd held to 232-10 (#852811).

Now the surprise: using the 32-bit userland, CPU-bound operations like
gzip or xz are significantly faster (5 to 10 percent). By comparison, on
x86, i386 is 10 to 15 percent slower than amd64. Running debootstrap
also showed a similar pattern. All tests were repeated a few times to
rule out any caching or similar effects.
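
Repeating each run and keeping the best time is a sound way to take this kind of measurement. The Python sketch below shows the idea. It uses Python's `zlib` and `lzma` modules as stand-ins for the gzip and xz tools. Those modules call into the distribution's compiled zlib and liblzma, so they exercise the userland being compared, but the absolute numbers will not match the command-line tools. The payload, the seed, and the helper name `best_of` are illustrative assumptions, and `randbytes` requires Python 3.9 or later:

```python
import lzma
import random
import time
import zlib


def best_of(fn, data: bytes, repeats: int = 5) -> float:
    """Call fn(data) `repeats` times and return the fastest run in seconds.

    Taking the minimum rather than the mean discards runs slowed down by
    unrelated system activity; the first run also warms up the caches.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Hypothetical workload: 512 KiB of pseudo-random bytes (fixed seed, so
    # both hosts compress identical input) followed by 512 KiB of zeros.
    payload = random.Random(42).randbytes(512 * 1024) + bytes(512 * 1024)
    print("zlib level 6: ", best_of(lambda d: zlib.compress(d, 6), payload))
    print("lzma preset 6:", best_of(lambda d: lzma.compress(d, preset=6), payload))
```

Run the same script on both hosts and compare the ratios rather than the raw times.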

This is not quite satisfying. Does anyone have an explanation for this?

Christoph

Lennart Sorensen

Feb 7, 2017, 12:40:02 PM

64-bit pointers take twice the cache space (and memory bandwidth) of 32-bit
pointers. There is essentially no other difference between 32-bit and
64-bit powerpc.
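
Here is the cache argument in concrete terms. A structure made up mostly of pointers roughly doubles in size under the 64-bit ABI, so only about half as many fit in each cache line. The Python sketch below computes the layout by hand. It assumes natural alignment and uses a hypothetical doubly linked list node (an `int` key plus two pointers). The 128-byte cache line is also an assumption, so check the value for your CPU:

```python
def struct_size(fields, pointer_size):
    """Size in bytes of a C struct under natural alignment.

    `fields` lists field kinds: "int" (4 bytes) or "ptr" (pointer_size bytes).
    Assumes each field is aligned to its own size, which holds for int and
    pointer fields on the common ILP32 and LP64 ABIs.
    """
    sizes = {"int": 4, "ptr": pointer_size}
    offset = 0
    max_align = 1
    for kind in fields:
        size = sizes[kind]
        offset = (offset + size - 1) // size * size  # pad up to alignment
        offset += size
        max_align = max(max_align, size)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


NODE = ["int", "ptr", "ptr"]  # hypothetical list node: key, next, prev
CACHE_LINE = 128  # assumed cache-line size in bytes

for name, ptr in (("32-bit (ILP32)", 4), ("64-bit (LP64)", 8)):
    size = struct_size(NODE, ptr)
    print(f"{name}: {size} bytes per node, {CACHE_LINE // size} nodes per cache line")
# 32-bit (ILP32): 12 bytes per node, 10 nodes per cache line
# 64-bit (LP64): 24 bytes per node, 5 nodes per cache line
```

The same arithmetic explains the appeal of x32 on x86_64: it keeps the 12-byte layout while still getting the larger register set.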

x86 is different. AMD did a nice job fixing a lot of mistakes in x86
as part of designing the 64-bit version of x86. The biggest change was
doubling the number of general-purpose registers (from 8 to 16), since
x86 has always been terribly register-starved. This alone accounts for
most of the performance improvement you see on 64-bit x86. As a result,
x32 ought to actually be slightly faster than amd64, since it uses the
instructions and registers of x86_64 while using only 32-bit pointers.
Smaller pointers mean less cache usage, leaving room for more useful data
to be cached, and less memory bandwidth spent moving the pointers around
in the first place. Mandating SSE2 and using it instead of the x87 FPU for
floating point is another huge improvement of x86_64 over x86, although
some systems were already doing SSE floating point in 32-bit builds if
they were specifically compiled for newer 32-bit x86 chips (I think that
would mean P3 and P4 only).

Most architectures other than x86 lose a bit of speed in 64-bit mode
compared to 32-bit mode unless they make other architectural improvements
at the same time (as x86 did). In some cases those changes were big
enough to justify a 32-bit mode using those enhancements (as x32 does
for x86 and n32 does for MIPS) to get all the advantages without
the disadvantage of 64-bit pointers.

You really have no need for 64-bit pointers except for programs that
need to use more than 2 or 3 GB of memory space. For everything else,
32-bit pointers are better.
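
Here is the arithmetic behind the 2 or 3 GB figure. A 32-bit pointer can address 2^32 bytes, that is 4 GiB, and in practice part of that space is not available to the program. Below is a rough Python sketch. The 0.75 "usable fraction" is an assumption standing in for the "2 or 3 GB", not a measured limit, and the helper names are illustrative:

```python
GiB = 1024 ** 3


def address_space_bytes(pointer_bits: int) -> int:
    """Total number of bytes a flat pointer of the given width can address."""
    return 2 ** pointer_bits


def needs_64bit(working_set_bytes: int, usable_fraction: float = 0.75) -> bool:
    """Rough check of whether a process outgrows a 32-bit address space.

    `usable_fraction` is an assumed share of the 4 GiB that the program can
    actually use; the rest may go to the kernel (on a 32-bit kernel), shared
    libraries, stacks and fragmentation.
    """
    return working_set_bytes > address_space_bytes(32) * usable_fraction


print(address_space_bytes(32) // GiB)  # 4
print(needs_64bit(2 * GiB))            # False
print(needs_64bit(6 * GiB))            # True
```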

So yes this is normal and expected.

--
Len Sorensen

Gatis Visnevskis

Feb 7, 2017, 3:20:02 PM

Well explained!

In the enterprise world, Java applications will benefit from 64-bit addressing.
Another example is Oracle: the 32-bit server side was dropped a while ago, so
it also makes sense to compile all apps linked against the Oracle client with
a 64-bit compiler. This might be off-topic, but all of the above is also true
for AIX, where most binaries are 32-bit.

Gasha

John Paul Adrian Glaubitz

Feb 7, 2017, 3:40:02 PM

On 02/07/2017 09:05 PM, Gatis Visnevskis wrote:
> In the enterprise world, Java applications will benefit from 64-bit addressing.
> Another example is Oracle: the 32-bit server side was dropped a while ago, so
> it also makes sense to compile all apps linked against the Oracle client with
> a 64-bit compiler. This might be off-topic, but all of the above is also true
> for AIX, where most binaries are 32-bit.

Yes, but many platforms are fully migrating to 64-bit userlands these days.
Like PowerPC, SPARC used to have a 32-bit userland on a 64-bit kernel.
However, Oracle is shifting Linux for SPARC to 64-bit these days.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - glau...@debian.org
`. `' Freie Universitaet Berlin - glau...@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

John Paul Adrian Glaubitz

Feb 7, 2017, 4:00:03 PM

On 02/07/2017 09:44 PM, Riccardo Mottola wrote:
> A hypothetical amd32 would be faster.

It actually isn't hypothetical at all:

> https://en.wikipedia.org/wiki/X32_ABI
> https://wiki.debian.org/X32Port

Riccardo Mottola

Feb 7, 2017, 4:00:03 PM

Hi Christoph,

Christoph Biedl wrote:
> Now the surprise: using the 32-bit userland, CPU-bound operations like
> gzip or xz are significantly faster (5 to 10 percent). By comparison, on
> x86, i386 is 10 to 15 percent slower than amd64. Running debootstrap
> also showed a similar pattern. All tests were repeated a few times to
> rule out any caching or similar effects.
>
> This is not quite satisfying. Does anyone have an explanation for this?

What do you actually expect? Why do you think there should be an
improvement in performance?
Your memory addresses are bigger, so there is more data to shuffle around,
also in instructions which have memory operands. You usually also enable
wider math, so again you shuffle more data around. More data means more
cache thrashing and more registers filled up.

I think this is not only valid for PPC, but was experienced first on
MIPS and then on SPARC when these platforms went 64-bit many years ago,
and for that reason they always had a 32-bit environment available.
PPC was designed for 64-bit from the beginning, even though the 32-bit
version was made public first.

Why is it different on x86? It is not just i386-64, it is actually
amd64: AMD enhanced the architecture, including the registers, so it is a
new revision of the CPU. A hypothetical amd32 would be faster.


Riccardo

Lennart Sorensen

Feb 7, 2017, 4:30:02 PM

On Tue, Feb 07, 2017 at 09:39:21PM +0100, John Paul Adrian Glaubitz wrote:
> Yes, but many platforms are fully migrating to 64-bit userlands these days.
> Like PowerPC, SPARC used to have a 32-bit userland on a 64-bit kernel.
> However, Oracle is shifting Linux for SPARC to 64-bit these days.

I think eventually maintaining the dual library builds and the toolchain
and such starts to be a hassle, when things like web browsers and
graphics editing programs start to have reasons to want more
than 2 or 3GB of RAM. So the tradeoff between performance and the hassle
of keeping a dual set of libraries around can shift towards keeping
things simpler once enough applications have reason to want 64-bit.

Of course, the very existence of the x32 architecture suggests that
some people still consider 32-bit pointers worthwhile.

--
Len Sorensen

Lennart Sorensen

Feb 7, 2017, 4:30:02 PM

On Tue, Feb 07, 2017 at 10:05:38PM +0200, Gatis Visnevskis wrote:
> Well explained!
>
> In the enterprise world, Java applications will benefit from 64-bit addressing.
> Another example is Oracle: the 32-bit server side was dropped a while ago, so
> it also makes sense to compile all apps linked against the Oracle client with
> a 64-bit compiler. This might be off-topic, but all of the above is also true
> for AIX, where most binaries are 32-bit.

Certainly part of what has kept much work from going into ppc64 is probably
that 32-bit PowerPC with a 64-bit kernel gives the best performance
for most uses, and just installing a few things 64-bit where needed is
the best balance. SPARC, I believe, has been similar in the past: a
64-bit kernel was useful to allow more overall memory, but keeping most
of user space 32-bit was best for performance.
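
On a biarch system like this, where most of userland is 32-bit and a few programs are installed as 64-bit where needed, it can be useful to check which kind a given binary is. The ELF header records this in two bytes of `e_ident`: `EI_CLASS` (32-bit or 64-bit) and `EI_DATA` (byte order). The `file` command reports the same information. Below is a minimal Python sketch; the helper name `elf_kind` is illustrative:

```python
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS, EI_DATA = 4, 5  # byte offsets into e_ident
CLASSES = {1: "32-bit", 2: "64-bit"}  # ELFCLASS32, ELFCLASS64
ENCODINGS = {1: "little-endian", 2: "big-endian"}  # ELFDATA2LSB, ELFDATA2MSB


def elf_kind(header: bytes) -> str:
    """Describe word size and byte order from the first bytes of an ELF file."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = CLASSES.get(header[EI_CLASS], "unknown class")
    order = ENCODINGS.get(header[EI_DATA], "unknown byte order")
    return f"{bits} {order}"


if __name__ == "__main__":
    # Usage: python3 elfkind.py /bin/ls /usr/bin/xz ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, elf_kind(f.read(16)))
```

On the powerpc host described at the start of the thread, most binaries should report "32-bit big-endian", and any 64-bit extras should report "64-bit big-endian".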

ppc64el is different since it is targeting systems where the applications
are expected to need a lot of RAM, and switching to little endian
improves performance when working with GPUs (which are all designed
for the x86 little-endian world). At the same time, being able to
assume POWER8 as a minimum CPU means you can gain some instruction set
improvements that normal powerpc can't assume. ppc64 can't make such
assumptions since it will run on a much wider range of 64-bit powerpc
chips.

--
Len Sorensen

John Paul Adrian Glaubitz

Feb 7, 2017, 4:40:02 PM

On 02/07/2017 10:26 PM, Lennart Sorensen wrote:
> I think eventually maintaining the dual library builds and the toolchain
> and such starts to be a hassle, when things like web browsers and
> graphics editing programs start to have reasons to want more
> than 2 or 3GB of RAM.

Not sure. At least on Debian, biarch doesn't involve too much work. I
think people just default to 64-bit binaries because in most cases
the possible performance advantage of 32-bit binaries isn't important,
and making everything 64-bit by default means you only have to install
32-bit libraries when you actually need them.

> Of course, the very existence of the x32 architecture suggests that
> some people still consider 32-bit pointers worthwhile.

I think the original motivation behind the port was actually mobile
devices with x86 CPUs. But since Intel was never successful with
x86 in mobile devices, people lost interest. We're still maintaining
it in Debian in any case, as it doesn't require much attention and
some people still like to use it.

Intel is actually still using x32 with the "autoilp32" feature in
their compilers. With autoilp32, the compiler will generate 32-bit
code when possible to improve performance.

Breno Leitao

Feb 8, 2017, 8:40:03 AM

On 02/07/2017 07:22 PM, Lennart Sorensen wrote:
> ppc64el is different since it is targeting systems where the applications
> are expected to need a lot of RAM, and switching to little endian
> improves performance when working with GPUs (which are all designed
> for the x86 little-endian world). At the same time, being able to
> assume POWER8 as a minimum CPU means you can gain some instruction set
> improvements that normal powerpc can't assume.

Correct. We also can't do a 32-bit userspace, since we do not have a
32-bit little-endian ABI. For ppc64el, biarch does not seem to be a viable
feature.

John Paul Adrian Glaubitz

Feb 8, 2017, 8:50:01 AM

On 02/08/2017 02:21 PM, Breno Leitao wrote:
> Correct. We also can't do a 32-bit userspace, since we do not have a
> 32-bit little-endian ABI. For ppc64el, biarch does not seem to be a viable
> feature.

But 32-bit PowerPC, little-endian, seems to be possible in general as Helmut
Grohne has this setup in rebootstrap. He added it with DEB_HOST_ARCH=powerpcel,
but I have no idea whether it's actually POWER8 or something older.

Breno Leitao

Feb 8, 2017, 9:10:01 AM

Hi Adrian,

On 02/08/2017 11:41 AM, John Paul Adrian Glaubitz wrote:
> On 02/08/2017 02:21 PM, Breno Leitao wrote:
>> Correct. We also can't do a 32-bit userspace, since we do not have a
>> 32-bit little-endian ABI. For ppc64el, biarch does not seem to be a viable
>> feature.
>
> But 32-bit PowerPC, little-endian, seems to be possible in general as Helmut
> Grohne has this setup in rebootstrap. He added it with DEB_HOST_ARCH=powerpcel,
> but I have no idea whether it's actually POWER8 or something older.

Hmm, which ABI is this based on? We do not have a 32-bit little-endian
Linux ABI.

Mathieu Malaterre

Feb 8, 2017, 9:10:03 AM

I always assumed it was possible to run ppc32 (be) or ppc64 (be)
userlands on a ppc64el system. Is this a restriction at the Linux level or
the hardware level?

-M

Breno Leitao

Feb 8, 2017, 9:50:02 AM

I think it is not easy to run cross-endian userspace applications on Linux.

I am mainly thinking about kernel interfaces, where the kernel will continue
to run in LE while you have a big-endian userspace. What would happen when
you are interrupted? You would need a 'fix-endianness' layer to get it done, IMO.

From a hardware perspective, endianness is basically a register bit on
POWER, so it is very simple for the hardware to change endianness.
In fact, we used this feature at the beginning of the bootstrap, mainly to
avoid 'porting' the endianness-dependent code.
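
The following shows why such a layer would be needed. Every multi-byte value crossing the boundary, including system call arguments, structures passed by pointer, and signal frames, would have to be byte-swapped. For structures, the layer would also need to know the width of each field. Below is a minimal Python sketch of the problem for a single 32-bit field. It is an illustration only, not how any kernel implements this, and `swap32` is a hypothetical helper:

```python
import struct


def swap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


# A value written by a big-endian process...
raw = struct.pack(">I", 0x12345678)
# ...and read back by a little-endian kernel without any conversion:
misread = struct.unpack("<I", raw)[0]
print(hex(misread))          # 0x78563412
print(hex(swap32(misread)))  # 0x12345678
```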

Lennart Sorensen

Feb 8, 2017, 11:40:02 AM

Yes, I don't think the ppc64el architecture has defined a 32-bit variant.
I don't think they see any point.

Of course, Linux could run little endian on powerpc if someone wanted to.
After all, Windows NT did, so the CPUs (for the most part) can do it.

--
Len Sorensen

Lennart Sorensen

Feb 8, 2017, 12:00:01 PM

On Wed, Feb 08, 2017 at 03:08:41PM +0100, Mathieu Malaterre wrote:
> I always assumed it was possible to run ppc32 (be) or ppc64 (be)
> userlands on a ppc64el system. Is this a restriction at the Linux level or
> the hardware level?

I thought it was possible at least in KVM virtual machines. Not sure
about running big-endian binaries on a little-endian kernel.

The endian switching feature of powerpc is certainly unusual. It is
the only CPU I know of that allows each process to set its endianness.
prctl has a PR_SET_ENDIAN option that is powerpc only.
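
The companion option `PR_GET_ENDIAN` lets a process query its current mode from user space. Below is a minimal sketch using Python's ctypes. It assumes glibc's `prctl` wrapper and the constant values from `<linux/prctl.h>`, and the helper name `process_endian` is illustrative. On Linux the option is only implemented for powerpc, so on other architectures the call is expected to fail and the sketch returns None. The sketch only reads the mode and never changes it:

```python
import ctypes
import ctypes.util

# Constants from <linux/prctl.h>.
PR_GET_ENDIAN = 19
# PR_ENDIAN_BIG, PR_ENDIAN_LITTLE, PR_ENDIAN_PPC_LITTLE
ENDIAN_NAMES = {0: "big", 1: "little", 2: "ppc-little"}


def process_endian():
    """Return this process's endian mode via prctl(PR_GET_ENDIAN), or None.

    None means libc or prctl could not be found, or the kernel rejected the
    request (expected with EINVAL on non-powerpc Linux).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    libc = ctypes.CDLL(libc_name, use_errno=True)
    prctl = getattr(libc, "prctl", None)  # absent on non-Linux systems
    if prctl is None:
        return None
    mode = ctypes.c_int(-1)
    # prctl is variadic, so pass the unused arguments explicitly as zeros.
    zero = ctypes.c_ulong(0)
    if prctl(PR_GET_ENDIAN, ctypes.byref(mode), zero, zero, zero) != 0:
        return None
    return ENDIAN_NAMES.get(mode.value, f"unknown ({mode.value})")


print(process_endian())
```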

--
Len Sorensen