Bug#1033643: debvm: consider using cpu=cortex-a57 instead of cpu=max on arm64

Emanuele Rocca

Mar 29, 2023, 6:10:04 AM

Package: debvm
Version: 0.2.9

Hi,

some arm64 hosts unfortunately do not have KVM support:

kvm [1]: HYP mode not available

On those systems, running qemu with -cpu=cortex-a57 results in
significantly improved performance compared to -cpu=max.

For example: here is how long it takes debvm-run to reach the point
where the hostname is being set when using -cpu=max:

[ 34.838074] systemd[1]: Hostname set to <testvm>.

Modifying /usr/bin/debvm-run to set CPU=cortex-a57 instead:

[ 12.450115] systemd[1]: Hostname set to <testvm>.

Please consider using CPU=cortex-a57 instead of CPU=max for arm64,
and/or adding a command line switch to override the value of CPU.

Clearly, if kvm support *is* present CPU=host keeps on being the best
choice.

Thank you for writing debvm!

Helmut Grohne

Mar 29, 2023, 6:50:04 AM

Hi Emanuele,

On Wed, Mar 29, 2023 at 11:58:24AM +0200, Emanuele Rocca wrote:
> some arm64 hosts unfortunately do not have KVM support:
>
> kvm [1]: HYP mode not available
>
> On those systems, running qemu with -cpu=cortex-a57 results in
> significantly improved performance compared to -cpu=max.
>
> For example: here is how long it takes debvm-run to reach the point
> where the hostname is being set when using -cpu=max:
>
> [ 34.838074] systemd[1]: Hostname set to <testvm>.
>
> Modifying /usr/bin/debvm-run to set CPU=cortex-a57 instead:
>
> [ 12.450115] systemd[1]: Hostname set to <testvm>.

Thanks for the report. Since I mainly use it on amd64, I am not affected
by this change and seek feedback from Arnd and Johannes as they will be
affected.

My initial reaction is that I am slightly opposed to changing this,
because architectures tend to bump baselines and at some point arm64
will likely have a baseline that is not met by cortex-a57. So the max
value poses less maintenance cost on my side.

Now this is two arguments (performance vs maintenance) and we need to
strike a good balance.

> Please consider using CPU=cortex-a57 instead of CPU=max for arm64,
> and/or adding a command line switch to override the value of CPU.

Are you aware that any options passed after a double dash are forwarded
to qemu? Are you also aware that when you pass -cpu twice, the last -cpu
wins? So overriding is as simple as appending "-- --cpu=cortex-a57" to
your debvm-run invocation.

Helmut
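
As a rough illustration of the override described above, the following
Python sketch builds a debvm-run command line that forwards a -cpu option
to qemu after the double dash, and models the "last -cpu wins" rule. The
helper names build_command and effective_cpu are hypothetical, and the
"defaults" list is an assumption made for the example, not what debvm-run
actually passes. The sketch uses the "-cpu VALUE" spelling that appears in
the measurements later in this thread.

```python
def build_command(cpu, debvm_args=()):
    """Return an argv list for debvm-run that forwards -cpu to qemu.

    Everything after "--" is handed to qemu unchanged.
    """
    return ["debvm-run", *debvm_args, "--", "-cpu", cpu]


def effective_cpu(qemu_args):
    """Model the 'last -cpu wins' rule on a list of qemu arguments."""
    cpu = None
    # Stop one short of the end so that qemu_args[i + 1] always exists.
    for i, arg in enumerate(qemu_args[:-1]):
        if arg == "-cpu":
            cpu = qemu_args[i + 1]
    return cpu


# Assumed default (illustrative only), followed by the user's override.
defaults = ["-cpu", "max"]
command = build_command("cortex-a57")
forwarded = command[command.index("--") + 1:]
print(effective_cpu(defaults + forwarded))
```

Keeping the override as a plain pass-through means no dedicated switch is
needed for every qemu setting.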

Arnd Bergmann

Mar 29, 2023, 7:20:05 AM

On Wed, Mar 29, 2023, at 12:41, Helmut Grohne wrote:
> On Wed, Mar 29, 2023 at 11:58:24AM +0200, Emanuele Rocca wrote:
>
> Thanks for the report. Since I mainly use it on amd64, I am not affected
> by this change and seek feedback from Arnd and Johannes as they will be
> affected.

The machine I use has KVM support for 64-bit guests, so
I think you have this the wrong way around: If you run
the arm64 guest with TCG, I would expect to see the
same effect on an arm64 host and an x86 host.

> My initial reaction is that I am slightly opposed to changing this,
> because architectures tend to bump baselines and at some point arm64
> will likely have a baseline that is not met by cortex-a57. So the max
> value poses less maintenance cost on my side.

It will be a very long time before Debian/arm64 can
consider bumping the baseline, as the oldest Cortex-A53
and Cortex-A57 cores are only ten years old at this point,
and the Cortex-A53 is still the most popular core in
currently shipping SoCs, by a wide margin.

> Now this is two arguments (performance vs maintenance) and we need to
> strike a good balance.

The maintenance argument goes both ways I think: Having
the A57 or A53 as the baseline makes it easier when
a package accidentally relies on a feature of a later
core without doing a runtime feature check, so that
would favor using A57 over cpu=max. The advantage
of using cpu=max is normally that this enables additional
features to be used in the guest that may provide
better performance or security, and allow testing those
features.

I don't know why the system performs poorly with cpu=max,
this may be a known issue with one of the features this
enables, or it may be a bug in the kernel or in qemu
that we should fix.

I have no objections to changing the default to
cpu=cortex-a57 for non-KVM runs, but I think more
importantly we should

a) try to reproduce the behavior on an x86-64 host, and

b) figure out the underlying issue.

Arnd

Emanuele Rocca

Mar 29, 2023, 11:30:05 AM

Hi,

On 2023-03-29 01:12, Arnd Bergmann wrote:
> a) try to reproduce the behavior on an x86-64 host

Good point. On an x86-64 host, cpu=cortex-a57 is also significantly
faster:

max:
[ 30.086331] systemd[1]: Hostname set to <testvm>.

cortex-a57:
[ 13.870771] systemd[1]: Hostname set to <testvm>.
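
For comparing boot logs like these, a minimal sketch along the following
lines extracts the kernel timestamp from each "Hostname set" line and
computes how much longer cpu=max takes. The function name
hostname_timestamp is made up for this example; the log lines are the ones
quoted in this thread.

```python
import re


def hostname_timestamp(line):
    """Return the kernel timestamp in seconds from a dmesg-style line."""
    match = re.match(r"\[\s*(\d+\.\d+)\]", line)
    if match is None:
        raise ValueError(f"no timestamp in {line!r}")
    return float(match.group(1))


# (cpu=max line, cpu=cortex-a57 line) for each host type in the thread.
logs = {
    "arm64 host": (
        "[ 34.838074] systemd[1]: Hostname set to <testvm>.",
        "[ 12.450115] systemd[1]: Hostname set to <testvm>.",
    ),
    "x86-64 host": (
        "[ 30.086331] systemd[1]: Hostname set to <testvm>.",
        "[ 13.870771] systemd[1]: Hostname set to <testvm>.",
    ),
}

ratios = {}
for host, (max_line, a57_line) in logs.items():
    ratios[host] = hostname_timestamp(max_line) / hostname_timestamp(a57_line)
    print(f"{host}: cpu=max takes {ratios[host]:.1f}x as long as cortex-a57")
```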

Helmut Grohne

Mar 29, 2023, 1:10:06 PM

Hi Arnd,

On Wed, Mar 29, 2023 at 01:12:45PM +0200, Arnd Bergmann wrote:
> The machine I use has KVM support for 64-bit guests, so
> I think you have this the wrong way around: If you run
> the arm64 guest with TCG, I would expect to see the
> same effect on an arm64 host and an x86 host.

Thanks for correcting me.

> It will be a very long time before Debian/arm64 can
> consider bumping the baseline, as the oldest Cortex-A53
> and Cortex-A57 cores are only ten years old at this point,
> and the Cortex-A53 is still the most popular core in
> currently shipping SoCs, by a wide margin.

This is also useful to know. Johannes mentioned that autopkgtest uses
cortex-a53.

> The maintenance argument goes both ways I think: Having
> the A57 or A53 as the baseline makes it easier when
> a package accidentally relies on a feature of a later
> core without doing a runtime feature check, so that
> would favor using A57 over cpu=max. The advantage
> of using cpu=max is normally that this enables additional
> features to be used in the guest that may provide
> better performance or security, and allow testing those
> features.

At this time, I am convinced that -cpu max is a suboptimal choice.

> I don't know why the system performs poorly with cpu=max,
> this may be a known issue with one of the features this
> enables, or it may be a bug in the kernel or in qemu
> that we should fix.
>
> I have no objections to changing the default to
> cpu=cortex-a57 for non-KVM runs, but I think more
> importantly we should
>
> a) try to reproduce the behavior on an x86-64 host, and

Yes, it is fully reproducible there. I ran some tests with suggestions
from Arnd:

-cpu max
Startup finished in 23.590s (kernel) + 18.210s (userspace) = 41.800s
-cpu cortex-a53
Startup finished in 6.080s (kernel) + 9.808s (userspace) = 15.889s
-cpu cortex-a57
Startup finished in 6.090s (kernel) + 8.460s (userspace) = 14.551s
-cpu cortex-a76
Startup finished in 6.415s (kernel) + 8.300s (userspace) = 14.715s
-cpu a64fx
Startup finished in 6.373s (kernel) + 9.048s (userspace) = 15.422s
-cpu neoverse-n1
Startup finished in 6.078s (kernel) + 8.367s (userspace) = 14.446s
-cpu max,sve=off,sme=off,pmu=off,lpa2=off,pauth=off
Startup finished in 4.357s (kernel) + 5.405s (userspace) = 9.763s
-cpu max,lpa2=off
Startup finished in 20.854s (kernel) + 18.848s (userspace) = 39.703s
-cpu max,pauth=off
Startup finished in 4.756s (kernel) + 5.678s (userspace) = 10.435s
-cpu max,sme=off
Startup finished in 22.018s (kernel) + 18.335s (userspace) = 40.353s
-cpu max,pmu=off
Startup finished in 21.032s (kernel) + 17.974s (userspace) = 39.007s
-cpu max,pauth-impdef=on
Startup finished in 6.077s (kernel) + 7.241s (userspace) = 13.319s

So pauth seems to be the culprit. This is kinda known, see:
https://qemu-project.gitlab.io/qemu/system/arm/cpu-features.html#tcg-vcpu-features

> b) figure out the underlying issue.

I think we did.

So choosing pauth-impdef over pauth should mostly fix performance. So
given that for kvm we choose cpu=host, I think going higher than
cortex-something would still be sensible. At this point, my preference
is max,pauth-impdef=on. Does anyone disagree? Would someone confirm that
this also speeds up on arm64?

Helmut
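
The measurements in the message above can be ranked with a short Python
sketch such as the one below. It only restates the reported totals and
flags which single-feature toggles on top of "max" cut the startup time by
more than half. The variable names and the "more than half" threshold are
arbitrary choices for the illustration.

```python
# Total startup times in seconds, copied from the measurements above.
totals = {
    "max": 41.800,
    "cortex-a53": 15.889,
    "cortex-a57": 14.551,
    "cortex-a76": 14.715,
    "a64fx": 15.422,
    "neoverse-n1": 14.446,
    "max,sve=off,sme=off,pmu=off,lpa2=off,pauth=off": 9.763,
    "max,lpa2=off": 39.703,
    "max,pauth=off": 10.435,
    "max,sme=off": 40.353,
    "max,pmu=off": 39.007,
    "max,pauth-impdef=on": 13.319,
}

baseline = totals["max"]

# Print the configurations from fastest to slowest.
for cpu, seconds in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{seconds:7.3f}s  {baseline / seconds:4.1f}x vs max  -cpu {cpu}")

# Single-feature toggles on top of "max" that more than halve the total.
fast_toggles = [
    cpu
    for cpu, seconds in totals.items()
    if cpu.startswith("max,") and cpu.count(",") == 1 and seconds < baseline / 2
]
print(fast_toggles)
```

Only the two pauth-related toggles pass the filter, which matches the
conclusion drawn in the message.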

Emanuele Rocca

Mar 29, 2023, 2:00:04 PM

On 2023-03-29 06:55, Helmut Grohne wrote:
> At this point, my preference is max,pauth-impdef=on.

Agreed.

> Would someone confirm that this also speeds up on arm64?

Confirmed.

Thanks!
ema

Arnd Bergmann

Mar 29, 2023, 3:20:05 PM

On Wed, Mar 29, 2023, at 18:55, Helmut Grohne wrote:

> -cpu max
> Startup finished in 23.590s (kernel) + 18.210s (userspace) = 41.800s

> -cpu cortex-a57
> Startup finished in 6.090s (kernel) + 8.460s (userspace) = 14.551s

> -cpu max,pauth=off
> Startup finished in 4.756s (kernel) + 5.678s (userspace) = 10.435s

> -cpu max,pauth-impdef=on
> Startup finished in 6.077s (kernel) + 7.241s (userspace) = 13.319s

Ok, so max,pauth-impdef=on is no slower than cortex-a57, but
it's slower than cpu=max was with an old kernel or an old
qemu before the addition of pauth.

> So choosing pauth-impdef over pauth should mostly fix performance. So
> given that for kvm we choose cpu=host, I think going higher than
> cortex-something would still be sensible. At this point, my preference
> is max,pauth-impdef=on. Does anyone disagree?

I think the two most sensible options are max,pauth-impdef=on
or max,pauth=off, which is a tradeoff between performance
and features. With pauth-impdef, it becomes a lot safer
to run untrusted userspace code in the guest, as well
as catching buggy code that triggers the pauth checks
by accident, but 30% slowdown is also quite significant.

Between the two, it depends on which use case you want to optimize for.

Arnd
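
To put a number on this tradeoff, the sketch below computes the slowdown
of max,pauth-impdef=on relative to max,pauth=off from the totals quoted
above, and expresses the options discussed in the thread as a small
selection function. The function choose_cpu and its prefer_speed parameter
are hypothetical and only summarize the discussion; they are not debvm's
implementation.

```python
PAUTH_OFF = 10.435     # seconds, -cpu max,pauth=off
PAUTH_IMPDEF = 13.319  # seconds, -cpu max,pauth-impdef=on

# Relative cost of keeping pointer authentication enabled (impdef variant).
slowdown = PAUTH_IMPDEF / PAUTH_OFF - 1
print(f"pauth-impdef is {slowdown:.0%} slower than pauth=off")


def choose_cpu(kvm_available, prefer_speed=False):
    """Pick a -cpu value along the lines discussed in this thread."""
    if kvm_available:
        # With KVM, the host CPU model remains the best choice.
        return "host"
    if prefer_speed:
        # Fastest TCG option measured, at the cost of the pauth feature.
        return "max,pauth=off"
    # Keeps pauth checks in the guest while avoiding most of the slowdown.
    return "max,pauth-impdef=on"


print(choose_cpu(kvm_available=False))
```

On these numbers the slowdown comes out a little under the 30% mentioned
above.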