[RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi

Stephen Boyd

unread,

Nov 20, 2015, 8:24:48 PM11/20/15

to linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

This is a respin of a patch series from about a year ago[1]. I realized
that we already had most of the code in recordmcount to figure out
where we make calls to particular functions, so recording where
we make calls to the integer division functions should be easy enough
to add support for in the same codepaths. Looking back on the thread
it seems like Mans was thinking along the same lines, although it wasn't
obvious to me back then or even over the last few days when I wrote this.

This series extends recordmcount to record the locations of the calls
to the library functions on ARM builds, and puts those locations into a
table that we use to patch instructions at boot. The first two patches
are the recordmcount changes, while the last patch implements the runtime
patching for modules and kernel code. The module part also hooks into the
relocation patching code we already have.

The RFC tag is because I'm thinking of splitting the recordmcount changes
into a new program based on recordmcount so that we don't drag in a lot
of corner cases and stuff when we don't need to. I suspect it will be
cleaner that way too. Does anyone prefer one way or the other?

Comments/feedback appreciated.

[1] http://lkml.kernel.org/r/1383951632-6090-1-g...@codeaurora.org

Cc: Nicolas Pitre <ni...@fluxnic.net>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Måns Rullgård <ma...@mansr.com>

Stephen Boyd (3):
scripts: Allow recordmcount to be used without tracing enabled
recordmcount: Record locations of __aeabi_{u}idiv() calls on ARM
ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

Makefile | 7 +
arch/arm/Kconfig | 14 ++
arch/arm/kernel/module.c | 44 ++++++
arch/arm/kernel/setup.c | 34 +++++
arch/arm/kernel/vmlinux.lds.S | 13 ++
kernel/trace/Kconfig | 4 +
scripts/Makefile.build | 15 +-
scripts/recordmcount.c | 10 +-
scripts/recordmcount.h | 337 +++++++++++++++++++++++++++++++++---------
scripts/recordmcount.pl | 11 +-
10 files changed, 406 insertions(+), 83 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Arnd Bergmann

unread,

Nov 21, 2015, 3:40:57 PM11/21/15

to Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Steven Rostedt, Måns Rullgård

On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
> This is a respin of a patch series from about a year ago[1]. I realized
> that we already had most of the code in recordmcount to figure out
> where we make calls to particular functions, so recording where
> we make calls to the integer division functions should be easy enough
> to add support for in the same codepaths. Looking back on the thread
> it seems like Mans was thinking along the same lines, although it wasn't
> obvious to me back then or even over the last few days when I wrote this.

Shouldn't we start by allowing to build the kernel for -march=armv7ve
on platforms that allow it? That would seem like a simpler change
and likely generate better code for most people, except when you actually
care about running the same binary kernel on older platforms.

I tried to get a complete list of CPU cores with idiv, lpae and
virtualization support at some point, but I don't remember the
details for all Qualcomm and Marvell cores any more, to create the
complete configuration matrix. IIRC, all CPUs that support
virtualization also do lpae (they have to) and all CPUs that
do lpae also do idiv, but the opposite is not true.

Arnd

Måns Rullgård

unread,

Nov 21, 2015, 3:46:19 PM11/21/15

to Arnd Bergmann, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Steven Rostedt

On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann <ar...@arndb.de> wrote:
>On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
>> This is a respin of a patch series from about a year ago[1]. I
>realized
>> that we already had most of the code in recordmcount to figure out
>> where we make calls to particular functions, so recording where
>> we make calls to the integer division functions should be easy enough
>> to add support for in the same codepaths. Looking back on the thread
>> it seems like Mans was thinking along the same lines, although it
>wasn't
>> obvious to me back then or even over the last few days when I wrote
>this.
>
>Shouldn't we start by allowing to build the kernel for -march=armv7ve
>on platforms that allow it? That would seem like a simpler change
>and likely generate better code for most people, except when you
>actually
>care about running the same binary kernel on older platforms.
>
>I tried to get a complete list of CPU cores with idiv, lpae and
>virtualization support at some point, but I don't remember the
>details for all Qualcomm and Marvell cores any more, to create the
>complete configuration matrix. IIRC, all CPUs that support
>virtualization also do lpae (they have to) and all CPUs that
>do lpae also do idiv, but the opposite is not true.
>
> Arnd

The ARM ARM says anything with virt has idiv, lpae doesn't matter. ARMv7-R also has idiv. I've no idea if anyone runs Linux on those though.
--
Måns Rullgård

Arnd Bergmann

unread,

Nov 21, 2015, 4:01:23 PM11/21/15

to linux-ar...@lists.infradead.org, Måns Rullgård, Stephen Boyd, linux-...@vger.kernel.org, Steven Rostedt, linux-...@vger.kernel.org, Nicolas Pitre

On Saturday 21 November 2015 20:45:38 Måns Rullgård wrote:
> On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann <ar...@arndb.de> wrote:
> >On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
> >> This is a respin of a patch series from about a year ago[1]. I
> >realized
> >> that we already had most of the code in recordmcount to figure out
> >> where we make calls to particular functions, so recording where
> >> we make calls to the integer division functions should be easy enough
> >> to add support for in the same codepaths. Looking back on the thread
> >> it seems like Mans was thinking along the same lines, although it
> >wasn't
> >> obvious to me back then or even over the last few days when I wrote
> >this.
> >
> >Shouldn't we start by allowing to build the kernel for -march=armv7ve
> >on platforms that allow it? That would seem like a simpler change
> >and likely generate better code for most people, except when you
> >actually
> >care about running the same binary kernel on older platforms.
> >
> >I tried to get a complete list of CPU cores with idiv, lpae and
> >virtualization support at some point, but I don't remember the
> >details for all Qualcomm and Marvell cores any more, to create the
> >complete configuration matrix. IIRC, all CPUs that support
> >virtualization also do lpae (they have to) and all CPUs that
> >do lpae also do idiv, but the opposite is not true.
> >
>

> The ARM ARM says anything with virt has idiv, lpae doesn't matter.

Ok, and anything with virt also has lpae by definition. The question is
whether we care about using idiv on cores that do not have lpae, or that
have neither lpae nor virt.

We have a related problem at the moment where we don't handle configuration
of lpae correctly in Kconfig: you can simply turn that on for any
ARMv7-only kernel, but it breaks running on Cortex-A8, Cortex-A9
and at least some subset of PJ4/Scorpion/Krait (not sure which).

If we add a way to configure idiv support, we should do it right and
handle lpae correctly too. If we are lucky, each CPU we support
either has both or neither, and then we just need one additional
Kconfig option. We don't need another option for virt, because KVM
support can be handled in a way that it doesn't break on cores with
lpae but without virt (it requires lpae).

> ARMv7-R also has idiv. I've no idea if anyone runs Linux on those though.

Not mainline at least. There were patches at some point, but they
never got merged.

Arnd

Måns Rullgård

unread,

Nov 21, 2015, 5:11:53 PM11/21/15

to Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, linux-...@vger.kernel.org, Steven Rostedt, linux-...@vger.kernel.org, Nicolas Pitre

The question is, are there any such cores? GCC doesn't know of any, but
then it's missing most non-ARM designs.

--
Måns Rullgård
ma...@mansr.com

Arnd Bergmann

unread,

Nov 21, 2015, 6:15:48 PM11/21/15

to Måns Rullgård, linux-ar...@lists.infradead.org, Stephen Boyd, linux-...@vger.kernel.org, Steven Rostedt, linux-...@vger.kernel.org, Nicolas Pitre

On Saturday 21 November 2015 22:11:36 Måns Rullgård wrote:
> Arnd Bergmann <ar...@arndb.de> writes:
> > On Saturday 21 November 2015 20:45:38 Måns Rullgård wrote:
> >> On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann <ar...@arndb.de> wrote:
> >>
> >> The ARM ARM says anything with virt has idiv, lpae doesn't matter.
> >
> > Ok, and anything with virt also has lpae by definition. The question is
> > whether we care about using idiv on cores that do not have lpae, or that
> > have neither lpae nor virt.
>
> The question is, are there any such cores? GCC doesn't know of any, but
> then it's missing most non-ARM designs.

Exactly. Stephen should be able to find out about the Qualcomm cores,
and http://comments.gmane.org/gmane.linux.ports.arm.kernel/426289 has
some information about the others:
* Brahma-B15 supports all three.
* Dove (PJ4) reports idiv only in thumb mode, which I'm tempted to ignore
for the kernel, as it supports neither lpae nor idiva.
* Armada 370/XP (PJ4B) reports support for idiva and idivt, but according to
https://groups.google.com/a/dartlang.org/forum/#!topic/reviews/9wvsJvq0YYY
that may be a lie.
* According to the same source, Krait fails to report idiva and idivt,
but supports both anyway. However, I found reports on the web where
/proc/cpuinfo correctly contains the flags on the same SoC (APQ8064)
that was mentioned there, so maybe they were just running an old
kernel.

Arnd

Arnd Bergmann

unread,

Nov 21, 2015, 6:22:10 PM11/21/15

to Måns Rullgård, linux-ar...@lists.infradead.org, Stephen Boyd, linux-...@vger.kernel.org, Steven Rostedt, linux-...@vger.kernel.org, Nicolas Pitre

This has some more information:

commit 120ecfafabec382c4feb79ff159ef42a39b6d33b
Author: Stepan Moskovchenko <ste...@codeaurora.org>
Date: Mon Mar 18 19:44:16 2013 +0100

ARM: 7678/1: Work around faulty ISAR0 register in some Krait CPUs

Some early versions of the Krait CPU design incorrectly indicate
that they only support the UDIV and SDIV instructions in Thumb
mode when they actually support them in ARM and Thumb mode. It
seems that these CPUs follow the DDI0406B ARM ARM which has two
possible values for the divide instructions field, instead of the
DDI0406C document which has three possible values.

Work around this problem by checking the MIDR against Krait CPUs
with this faulty ISAR0 register and force the hwcaps to indicate
support in both modes.

[sboyd: Rewrote commit text to reflect real reasoning now that
we autodetect udiv/sdiv]

Signed-off-by: Stepan Moskovchenko <ste...@codeaurora.org>
Acked-by: Will Deacon <will....@arm.com>
Signed-off-by: Stephen Boyd <sb...@codeaurora.org>
Signed-off-by: Russell King <rmk+k...@arm.linux.org.uk>

so Krait clearly supports them, and this also explains why some
machines misreport it depending on the CPU version and kernel
release running on it.

Regarding PJ4, it's still unclear whether that has the same
problem and it only reports idivt when it actually supports idiva,
or whether the lack of idiva support on PJ4 is instead the reason
why the ARM ARM was updated to have separate flags.

Peter Maydell

unread,

Nov 22, 2015, 8:30:03 AM11/22/15

to Arnd Bergmann, Måns Rullgård, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On 21 November 2015 at 23:21, Arnd Bergmann <ar...@arndb.de> wrote:
> Regarding PJ4, it's still unclear whether that has the same
> problem and it only reports idivt when it actually supports idiva,
> or whether the lack of idiva support on PJ4 is instead the reason
> why the ARM ARM was updated to have separate flags.

SDIV/IDIV were originally introduced for R and M profile only
and there the Thumb encodings of SDIV/IDIV are mandatory
whereas the ARM ones are optional (and weren't initially
defined at all). So if you're looking for CPUs with only the
Thumb encodings I would try checking older R profile cores
like the Cortex-R4.

thanks
-- PMM

Arnd Bergmann

unread,

Nov 22, 2015, 2:26:31 PM11/22/15

to Peter Maydell, Måns Rullgård, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On Sunday 22 November 2015 13:29:29 Peter Maydell wrote:
> On 21 November 2015 at 23:21, Arnd Bergmann <ar...@arndb.de> wrote:
> > Regarding PJ4, it's still unclear whether that has the same
> > problem and it only reports idivt when it actually supports idiva,
> > or whether the lack of idiva support on PJ4 is instead the reason
> > why the ARM ARM was updated to have separate flags.
>
> SDIV/IDIV were originally introduced for R and M profile only
> and there the Thumb encodings of SDIV/IDIV are mandatory
> whereas the ARM ones are optional (and weren't initially
> defined at all). So if you're looking for CPUs with only the
> Thumb encodings I would try checking older R profile cores
> like the Cortex-R4.

The question is really about Marvell Dove, MMP and Armada 370,
which are all based on PJ4 or PJ4B (CPU part : 0x581), so ARMv7-A
and report idivt support but idiva.

There are a couple of explanations here:

a) Marvell really implemented only idivt but not idiva
and reports it correctly, and the people from
https://groups.google.com/a/dartlang.org/forum/#!topic/reviews/9wvsJvq0YYY
just misinterpreted the flags

b) the dartlag.org folks are correct, and it supports neither
idivt nor idiva, and the /proc/cpuinfo flag is just wrong
and requires a fixup

c) like Krait, it actually implements both idiva and idivt but
gets the reporting wrong.

Arnd

Måns Rullgård

unread,

Nov 22, 2015, 2:30:27 PM11/22/15

to Arnd Bergmann, Peter Maydell, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

It's trivial to test for someone who has one.

--
Måns Rullgård
ma...@mansr.com

Russell King - ARM Linux

unread,

Nov 22, 2015, 2:47:33 PM11/22/15

to Arnd Bergmann, Peter Maydell, Måns Rullgård, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On Sun, Nov 22, 2015 at 08:25:27PM +0100, Arnd Bergmann wrote:
> The question is really about Marvell Dove, MMP and Armada 370,
> which are all based on PJ4 or PJ4B (CPU part : 0x581), so ARMv7-A
> and report idivt support but idiva.

Well, it's pretty hard to test when binutils blocks your ability to
write assembly using the instructions.

root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a9+idiv' -marm
/tmp/cc8WPQiB.s: Assembler messages:
/tmp/cc8WPQiB.s:32: Error: selected processor does not support ARM mode `udiv ip,r5,r4'
root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a9+idiv' -mthumb
/tmp/ccRzgAlM.s: Assembler messages:
/tmp/ccRzgAlM.s:36: Error: selected processor does not support Thumb mode `udiv r6,r5,r4'
root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='marvell-pj4+idiv' -mthumb
/tmp/cc1JYyFD.s: Assembler messages:
/tmp/cc1JYyFD.s:36: Error: selected processor does not support Thumb mode `udiv r6,r5,r4'
root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='marvell-pj4+idiv' -marm
/tmp/ccEQbQpp.s: Assembler messages:
/tmp/ccEQbQpp.s:32: Error: selected processor does not support ARM mode `udiv ip,r5,r4'

That's binutils 2.24 and gcc 4.8.4 as found on Ubuntu 14.04. I'm
sorry, but I don't have spare time to work out what the opcodes
would be.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Arnd Bergmann

unread,

Nov 22, 2015, 2:59:36 PM11/22/15

to Russell King - ARM Linux, Peter Maydell, Måns Rullgård, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On Sunday 22 November 2015 19:47:05 Russell King - ARM Linux wrote:
> On Sun, Nov 22, 2015 at 08:25:27PM +0100, Arnd Bergmann wrote:
> > The question is really about Marvell Dove, MMP and Armada 370,
> > which are all based on PJ4 or PJ4B (CPU part : 0x581), so ARMv7-A
> > and report idivt support but idiva.
>
> Well, it's pretty hard to test when binutils blocks your ability to
> write assembly using the instructions.
>
> root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a9+idiv' -marm
> /tmp/cc8WPQiB.s: Assembler messages:
> /tmp/cc8WPQiB.s:32: Error: selected processor does not support ARM mode `udiv ip,r5,r4'
> root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a9+idiv' -mthumb
> /tmp/ccRzgAlM.s: Assembler messages:
> /tmp/ccRzgAlM.s:36: Error: selected processor does not support Thumb mode `udiv r6,r5,r4'
> root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='marvell-pj4+idiv' -mthumb
> /tmp/cc1JYyFD.s: Assembler messages:
> /tmp/cc1JYyFD.s:36: Error: selected processor does not support Thumb mode `udiv r6,r5,r4'
> root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='marvell-pj4+idiv' -marm
> /tmp/ccEQbQpp.s: Assembler messages:
> /tmp/ccEQbQpp.s:32: Error: selected processor does not support ARM mode `udiv ip,r5,r4'
>
> That's binutils 2.24 and gcc 4.8.4 as found on Ubuntu 14.04. I'm
> sorry, but I don't have spare time to work out what the opcodes
> would be.
>

does it work with -mcpu=cortex-a15? I've tried crosstool as versions
2.23.52.20130913, 2.24.0.20141017 and 2.25.51.20150518, and they
all seem to behave as expected, failing with -mcpu=cortex-a9 and
marvell-pj4 but succeeding with -mcpu=cortex-a15 or marvell-pj4+idiv.

I've also found some /proc/cpuinfo output to cross-reference SoCs
to their core names.

variant part revision name features
mmp2: 0 0x581 5 PJ4 idivt
dove: 0 0x581 5 PJ4 idivt
Armada 370 1 0x581 1 PJ4B idivt
mmp3: 2 0x584 2 PJ4-MP idiva idivt lpae
Armada XP 2 0x584 2 PJ4-MP idiva idivt lpae
Berlin 2 0x584 2 PJ4-MP idiva idivt lpae

Arnd

Russell King - ARM Linux

unread,

Nov 22, 2015, 3:03:50 PM11/22/15

to Arnd Bergmann, Peter Maydell, Måns Rullgård, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On Sun, Nov 22, 2015 at 08:58:08PM +0100, Arnd Bergmann wrote:
> does it work with -mcpu=cortex-a15? I've tried crosstool as versions
> 2.23.52.20130913, 2.24.0.20141017 and 2.25.51.20150518, and they
> all seem to behave as expected, failing with -mcpu=cortex-a9 and
> marvell-pj4 but succeeding with -mcpu=cortex-a15 or marvell-pj4+idiv.

Appears not:

root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a15+idiv' -marm
/tmp/ccSovg32.s: Assembler messages:
/tmp/ccSovg32.s:32: Error: selected processor does not support ARM mode `udiv ip,r5,r4'
root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a15+idiv' -mthumb
/tmp/cchbT3EE.s: Assembler messages:
/tmp/cchbT3EE.s:36: Error: selected processor does not support Thumb mode `udiv r6,r5,r4'

Same without the +idiv.

> I've also found some /proc/cpuinfo output to cross-reference SoCs
> to their core names.
>
> variant part revision name features
> mmp2: 0 0x581 5 PJ4 idivt
> dove: 0 0x581 5 PJ4 idivt

Yes, that agrees with my dove.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Måns Rullgård

unread,

Nov 22, 2015, 3:40:15 PM11/22/15

to Arnd Bergmann, Russell King - ARM Linux, Peter Maydell, Nicolas Pitre, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

Arnd Bergmann <ar...@arndb.de> writes:

> arnd@wuerfel:/tmp$ arm-linux-gnueabihf-gcc -Wall -O2 -mcpu=cortex-a15 idiv.c -c -o idiv-arm.o
> arnd@wuerfel:/tmp$ objdump -dr idiv-arm.o
>
> idiv-arm.o: file format elf32-littlearm
>
> Disassembly of section .text:
>
> 00000000 <udiv>:
> 0: fbb0 f0f1 udiv r0, r0, r1
> 4: 4770 bx lr
> 6: bf00 nop
>
> 00000008 <sdiv>:
> 8: fb90 f0f1 sdiv r0, r0, r1
> c: 4770 bx lr
> e: bf00 nop

Your compiler seems to default to thumb so you should add -marm.

--
Måns Rullgård
ma...@mansr.com

Nicolas Pitre

unread,

Nov 22, 2015, 9:36:57 PM11/22/15

to Arnd Bergmann, Russell King - ARM Linux, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On Sun, 22 Nov 2015, Arnd Bergmann wrote:

> I've also found some /proc/cpuinfo output to cross-reference SoCs
> to their core names.
>
> variant part revision name features

> dove: 0 0x581 5 PJ4 idivt

I just managed to boot my dusty Dove DB and ran a quick test programon
it. Its cpuinfo corresponds to the above.

$ cat m.c
#include <stdio.h>
int mydiv(int, int);
int main()
{
printf("div test\n");
printf("%d\n", mydiv(12345678, 37));
return 0;
}
$ cat d.c
int mydiv(int x, int y)
{
return x/y;
}
$ gcc -o test m.c d.c
$ ./test
div test
333666
$ gcc -o test m.c d.c -march=armv7ve -mthumb
$ ./test
div test
333666
$ gcc -o test m.c d.c -march=armv7ve -marm
$ ./test
div test
Illegal instruction (core dumped)
$

Nicolas

Arnd Bergmann

unread,

Nov 23, 2015, 3:15:58 AM11/23/15

to Nicolas Pitre, Russell King - ARM Linux, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

Ok, thanks a lot! So the reporting in /proc/cpuinfo clearly matches
the actual features, and we can just treat this as no LPAE / no IDIV
for kernel compilation, as nobody ever seems to use THUMB2_KERNEL
in practice.

PJ4-MP is like Cortex-A15/A7/A12/A17 and supports both IDIV and LPAE,
which leaves the question whether Scorpion or Krait do the same as
well, or whether they are outliers and need a special configuration.

Arnd

Christopher Covington

unread,

Nov 23, 2015, 9:14:51 AM11/23/15

to Arnd Bergmann, Nicolas Pitre, Russell King - ARM Linux, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

LPAE is only supported in the Krait 450.

http://www.anandtech.com/show/7537/qualcomms-snapdragon-805-25ghz-128bit-memory-interface-d3d11class-graphics-more

I'm pretty sure idiv support came earlier, but I don't have the
specifics on hand.

Regards,
Christopher Covington

--
Qualcomm Innovation Center, Inc.

The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project

Arnd Bergmann

unread,

Nov 23, 2015, 10:33:22 AM11/23/15

to Christopher Covington, Nicolas Pitre, Russell King - ARM Linux, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On Monday 23 November 2015 09:14:39 Christopher Covington wrote:
> On 11/23/2015 03:15 AM, Arnd Bergmann wrote:
> > On Sunday 22 November 2015 21:36:45 Nicolas Pitre wrote:
> >> On Sun, 22 Nov 2015, Arnd Bergmann wrote:
> >
> > Ok, thanks a lot! So the reporting in /proc/cpuinfo clearly matches
> > the actual features, and we can just treat this as no LPAE / no IDIV
> > for kernel compilation, as nobody ever seems to use THUMB2_KERNEL
> > in practice.
> >
> > PJ4-MP is like Cortex-A15/A7/A12/A17 and supports both IDIV and LPAE,
> > which leaves the question whether Scorpion or Krait do the same as
> > well, or whether they are outliers and need a special configuration.
>
> LPAE is only supported in the Krait 450.
>
> http://www.anandtech.com/show/7537/qualcomms-snapdragon-805-25ghz-128bit-memory-interface-d3d11class-graphics-more
>
> I'm pretty sure idiv support came earlier, but I don't have the
> specifics on hand.

I have seen that article, but didn't trust it as a canonical
source of information here.

If you can confirm that it's right, that would mean that we
don't support LPAE on mach-qcom, as the only SoC with Krait 450
seems to be APQ8084, and mainline Linux doesn't run on that.

The ones we do support are MSM8x60 (Scorpion), MSM8960
(Krait-without-number),and MSM7874 (Krait 400). Do those all
support IDIV but not LPAE?

Arnd

Stephen Boyd

unread,

Nov 23, 2015, 3:38:58 PM11/23/15

to Arnd Bergmann, Christopher Covington, Nicolas Pitre, Russell King - ARM Linux, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, arm-mail-list

On 11/23, Arnd Bergmann wrote:
> On Monday 23 November 2015 09:14:39 Christopher Covington wrote:
> > On 11/23/2015 03:15 AM, Arnd Bergmann wrote:
> > > On Sunday 22 November 2015 21:36:45 Nicolas Pitre wrote:
> > >> On Sun, 22 Nov 2015, Arnd Bergmann wrote:
> > >
> > > Ok, thanks a lot! So the reporting in /proc/cpuinfo clearly matches
> > > the actual features, and we can just treat this as no LPAE / no IDIV
> > > for kernel compilation, as nobody ever seems to use THUMB2_KERNEL
> > > in practice.
> > >
> > > PJ4-MP is like Cortex-A15/A7/A12/A17 and supports both IDIV and LPAE,
> > > which leaves the question whether Scorpion or Krait do the same as
> > > well, or whether they are outliers and need a special configuration.
> >
> > LPAE is only supported in the Krait 450.
> >
> > http://www.anandtech.com/show/7537/qualcomms-snapdragon-805-25ghz-128bit-memory-interface-d3d11class-graphics-more
> >
> > I'm pretty sure idiv support came earlier, but I don't have the
> > specifics on hand.
>
> I have seen that article, but didn't trust it as a canonical
> source of information here.
>
> If you can confirm that it's right, that would mean that we
> don't support LPAE on mach-qcom, as the only SoC with Krait 450
> seems to be APQ8084, and mainline Linux doesn't run on that.

arch/arm/boot/dts/qcom-apq8084.dtsi exists in the mainline
kernel. We support more than what's in the Kconfig language
under mach-qcom. And yes LPAE is supported by apq8084 (as is
IDIV). Here's the /proc/cpuinfo on that device.

# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x3
CPU part : 0x06f
CPU revision : 1

processor : 1
model name : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x3
CPU part : 0x06f
CPU revision : 1

processor : 2
model name : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x3
CPU part : 0x06f
CPU revision : 1

processor : 3
model name : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x3
CPU part : 0x06f
CPU revision : 1

Hardware : Qualcomm (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000

>
> The ones we do support are MSM8x60 (Scorpion), MSM8960
> (Krait-without-number),and MSM7874 (Krait 400). Do those all
> support IDIV but not LPAE?
>

Krait supports IDIV for all versions. Scorpion doesn't support
IDIV or lpae. Here's the output of /proc/cpuinfo on that device.

# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 13.50
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x0
CPU part : 0x02d
CPU revision : 2

processor : 1
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 13.50
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x0
CPU part : 0x02d
CPU revision : 2

Hardware : Qualcomm (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,

a Linux Foundation Collaborative Project

Arnd Bergmann

unread,

Nov 23, 2015, 4:20:14 PM11/23/15

to linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

On Monday 23 November 2015 12:38:47 Stephen Boyd wrote:
> On 11/23, Arnd Bergmann wrote:
> > On Monday 23 November 2015 09:14:39 Christopher Covington wrote:
> > > On 11/23/2015 03:15 AM, Arnd Bergmann wrote:
> > > LPAE is only supported in the Krait 450.
> > >
> > > http://www.anandtech.com/show/7537/qualcomms-snapdragon-805-25ghz-128bit-memory-interface-d3d11class-graphics-more
> > >
> > > I'm pretty sure idiv support came earlier, but I don't have the
> > > specifics on hand.
> >
> > I have seen that article, but didn't trust it as a canonical
> > source of information here.
> >
> > If you can confirm that it's right, that would mean that we
> > don't support LPAE on mach-qcom, as the only SoC with Krait 450
> > seems to be APQ8084, and mainline Linux doesn't run on that.
>
> arch/arm/boot/dts/qcom-apq8084.dtsi exists in the mainline
> kernel. We support more than what's in the Kconfig language
> under mach-qcom.

Ok, cool. I'm sometimes confused by the model numbers, could you
do a separate patch to update the Kconfig help text?

> And yes LPAE is supported by apq8084 (as is
> IDIV). Here's the /proc/cpuinfo on that device.
> # cat /proc/cpuinfo
> processor : 0
> model name : ARMv7 Processor rev 1 (v7l)
> BogoMIPS : 38.40
> Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm

Ok.

> > The ones we do support are MSM8x60 (Scorpion), MSM8960
> > (Krait-without-number),and MSM7874 (Krait 400). Do those all
> > support IDIV but not LPAE?
> >
>
> Krait supports IDIV for all versions. Scorpion doesn't support
> IDIV or lpae. Here's the output of /proc/cpuinfo on that device.
>
> # cat /proc/cpuinfo
> processor : 0
> model name : ARMv7 Processor rev 2 (v7l)
> BogoMIPS : 13.50
> Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
> CPU implementer : 0x51
> CPU architecture: 7
> CPU variant : 0x0
> CPU part : 0x02d
> CPU revision : 2

Ok, that leaves just one missing puzzle piece: can you confirm that
no supported Krait variant other than Krait 450 / apq8084 has LPAE?

Arnd

Stephen Boyd

unread,

Nov 23, 2015, 4:32:20 PM11/23/15

to Arnd Bergmann, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

On 11/23, Arnd Bergmann wrote:
> On Monday 23 November 2015 12:38:47 Stephen Boyd wrote:
> > On 11/23, Arnd Bergmann wrote:
> > > On Monday 23 November 2015 09:14:39 Christopher Covington wrote:
> > > > On 11/23/2015 03:15 AM, Arnd Bergmann wrote:
> > > > LPAE is only supported in the Krait 450.
> > > >
> > > > http://www.anandtech.com/show/7537/qualcomms-snapdragon-805-25ghz-128bit-memory-interface-d3d11class-graphics-more
> > > >
> > > > I'm pretty sure idiv support came earlier, but I don't have the
> > > > specifics on hand.
> > >
> > > I have seen that article, but didn't trust it as a canonical
> > > source of information here.
> > >
> > > If you can confirm that it's right, that would mean that we
> > > don't support LPAE on mach-qcom, as the only SoC with Krait 450
> > > seems to be APQ8084, and mainline Linux doesn't run on that.
> >
> > arch/arm/boot/dts/qcom-apq8084.dtsi exists in the mainline
> > kernel. We support more than what's in the Kconfig language
> > under mach-qcom.
>
> Ok, cool. I'm sometimes confused by the model numbers, could you
> do a separate patch to update the Kconfig help text?

What did you have in mind? I'm also confused by the model numbers
so I don't know how helpful I will be.

It would be nice to drop the ARCH_MSM* configs entirely. If we
could select the right timers from kconfig without using selects
then we could drop them. Or we could just select both types of
timers when building qcom platforms.

>
> > > The ones we do support are MSM8x60 (Scorpion), MSM8960
> > > (Krait-without-number),and MSM7874 (Krait 400). Do those all
> > > support IDIV but not LPAE?
> > >
> >
> > Krait supports IDIV for all versions. Scorpion doesn't support
> > IDIV or lpae. Here's the output of /proc/cpuinfo on that device.
> >
> > # cat /proc/cpuinfo
> > processor : 0
> > model name : ARMv7 Processor rev 2 (v7l)
> > BogoMIPS : 13.50
> > Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
> > CPU implementer : 0x51
> > CPU architecture: 7
> > CPU variant : 0x0
> > CPU part : 0x02d
> > CPU revision : 2
>
> Ok, that leaves just one missing puzzle piece: can you confirm that
> no supported Krait variant other than Krait 450 / apq8084 has LPAE?
>

Right, apq8084 is the only SoC with a Krait CPU that supports
LPAE.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Arnd Bergmann

unread,

Nov 23, 2015, 4:57:59 PM11/23/15

to Stephen Boyd, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano

Ok, dropping the specific Kconfig entries is actually an awesome
idea, as it completely solves the other problem as well, more on
that below.

In that case, don't worry about listing all the models, once
we stop listing a subset of them, the confusion is already
reduced by the fact that one has to look at the .dts files
so see which models we support, and I assume there will be
additional ones coming in for at least a few more years (before
you stop caring about 32-bit MSM and compatibles).

Regarding the timers:
HAVE_ARM_ARCH_TIMER is already user-selectable, so maybe something
like

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index b251013eef0a..bad6343c34d5 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -324,8 +324,9 @@ config EM_TIMER_STI
such as EMEV2 from former NEC Electronics.

config CLKSRC_QCOM
- bool "Qualcomm MSM timer" if COMPILE_TEST
+ bool "Qualcomm MSM timer" if ARCH_QCOM || COMPILE_TEST
depends on ARM
+ default ARCH_QCOM
select CLKSRC_OF
help
This enables the clocksource and the per CPU clockevent driver for the

would make both of them equally configurable and not clutter up
the Kconfig file when ARCH_QCOM is not selected. I've added
Daniel Lezcano to Cc, he probably has an opinion on this too.

> > > > The ones we do support are MSM8x60 (Scorpion), MSM8960
> > > > (Krait-without-number),and MSM7874 (Krait 400). Do those all
> > > > support IDIV but not LPAE?
> > > >
> > >
> > > Krait supports IDIV for all versions. Scorpion doesn't support
> > > IDIV or lpae. Here's the output of /proc/cpuinfo on that device.
> > >
> > > # cat /proc/cpuinfo
> > > processor : 0
> > > model name : ARMv7 Processor rev 2 (v7l)
> > > BogoMIPS : 13.50
> > > Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
> > > CPU implementer : 0x51
> > > CPU architecture: 7
> > > CPU variant : 0x0
> > > CPU part : 0x02d
> > > CPU revision : 2
> >
> > Ok, that leaves just one missing puzzle piece: can you confirm that
> > no supported Krait variant other than Krait 450 / apq8084 has LPAE?
> >
>
> Right, apq8084 is the only SoC with a Krait CPU that supports
> LPAE.

Ok, thanks for the confirmation.

Summarizing what we've found, I think we can get away with just
introducing two Kconfig symbols ARCH_MULTI_V7VE and CPU_V7VE.
Most CPUs fall clearly into one category or the other, and then
we can allow LPAE to be selected for V7VE-only build but not
for plain V7, and we can unconditionally build the kernel with

arch-$(CONFIG_CPU_32v7VE) = -D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7ve,-march=armv7-a -mcpu=cortex-a15)

This works perfectly for Cortex-A5, -A8, -A9, -A12, -A15, -A17, Brahma-B15,
PJ4B-MP, Scorpion and Krait-450, which all clearly fall into one of
the two other categories.

The two exceptions that don't quite fit are still "good enough":

- PJ4/PJ4B (not PJ4B-MP) has a different custom opcode for udiv and sdiv
in ARM mode. We don't support that with true multiplatform kernels
because those opcodes work nowhere else, though with your proposed
series we could easily do that for dynamic patching.

- Krait (pre-450) won't run kernels with LPAE disabled, but if we only
have one global ARCH_QCOM option that can be enabled for both
ARCH_MULTI_V7VE and ARCH_MULTI_V7, we still win: a mach-qcom
kernel with only ARCH_MULTI_V7VE will use IDIV by default, and
give you the option to enable LPAE. If you pick LPAE, it will
still work fine on Krait-450 but not the older ones, and that is
a user error. If you enable ARCH_MULTI_V7 / CPU_V7, you get neither
LPAE nor IDIV, and the kernel will be able to run on both Scorpion
and Krait, as long as you have the right drivers too.

Arnd

Stephen Boyd

unread,

Nov 23, 2015, 6:14:03 PM11/23/15

to Arnd Bergmann, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano

Yeah I think that architected timers are an outlier. I recall
some words from John Stultz that platforms should select the
clocksources they use, but maybe things have changed. For this
kind of thing I wouldn't mind putting it in the defconfig though.
I'll put the patches on the list to get the discussion started.

Do you have the information on these custom opcodes? I can work
that into the patches assuming the MIDR is different.

>
> - Krait (pre-450) won't run kernels with LPAE disabled, but if we only
> have one global ARCH_QCOM option that can be enabled for both
> ARCH_MULTI_V7VE and ARCH_MULTI_V7, we still win: a mach-qcom
> kernel with only ARCH_MULTI_V7VE will use IDIV by default, and
> give you the option to enable LPAE. If you pick LPAE, it will
> still work fine on Krait-450 but not the older ones, and that is
> a user error. If you enable ARCH_MULTI_V7 / CPU_V7, you get neither
> LPAE nor IDIV, and the kernel will be able to run on both Scorpion
> and Krait, as long as you have the right drivers too.
>

So if I have built mach-qcom with ARCH_MULTI_V7VE won't I get a
kernel that uses idiv instructions that could be run on Scorpion,
where the instruction doesn't exist? Or is that a user error
again like picking LPAE?

It seems fine to me to go ahead with this approach. Should I take
care of cooking up the patches? I can package this all up into a
series that adds the new CPU type, updates the affected
platforms, and layers the runtime patching on top when plain V7
is a selected CPU type.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Stephen Boyd

unread,

Nov 23, 2015, 7:13:17 PM11/23/15

to Arnd Bergmann, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano

On 11/23, Arnd Bergmann wrote:
>
> Ok, thanks for the confirmation.
>
> Summarizing what we've found, I think we can get away with just
> introducing two Kconfig symbols ARCH_MULTI_V7VE and CPU_V7VE.
> Most CPUs fall clearly into one category or the other, and then
> we can allow LPAE to be selected for V7VE-only build but not
> for plain V7, and we can unconditionally build the kernel with
>
> arch-$(CONFIG_CPU_32v7VE) = -D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7ve,-march=armv7-a -mcpu=cortex-a15)
>

This causes compiler spew for me:

warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch

Removing -march=armv7-a from there makes it quiet.

Also, it's sort of feels wrong to have -mcpu in a place where
we're exclusively doing -march. Perhaps the fallback should be
bog standard -march=armv7-a? (or the fallback for that one
"-march=armv5t -Wa$(comma)-march=armv7-a")?

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Stephen Boyd

unread,

Nov 24, 2015, 3:54:06 AM11/24/15

to Arnd Bergmann, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, linux-ar...@lists.infradead.org

On 11/23, Stephen Boyd wrote:
> On 11/23, Arnd Bergmann wrote:
> >
> > Ok, thanks for the confirmation.
> >
> > Summarizing what we've found, I think we can get away with just
> > introducing two Kconfig symbols ARCH_MULTI_V7VE and CPU_V7VE.
> > Most CPUs fall clearly into one category or the other, and then
> > we can allow LPAE to be selected for V7VE-only build but not
> > for plain V7, and we can unconditionally build the kernel with
> >
> > arch-$(CONFIG_CPU_32v7VE) = -D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7ve,-march=armv7-a -mcpu=cortex-a15)
> >
>
> This causes compiler spew for me:
>
> warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch
>
> Removing -march=armv7-a from there makes it quiet.
>
> Also, it's sort of feels wrong to have -mcpu in a place where
> we're exclusively doing -march. Perhaps the fallback should be
> bog standard -march=armv7-a? (or the fallback for that one
> "-march=armv5t -Wa$(comma)-march=armv7-a")?
>

And adding CPU_V7VE causes a cascade of changes to wherever
CPU_V7 is being used today. Here's the patch I currently have,
without the platform changes:

---8<----
arch/arm/Kconfig | 68 +++++++++++++++++++++-----------------
arch/arm/Kconfig-nommu | 2 +-
arch/arm/Makefile | 1 +
arch/arm/boot/compressed/head.S | 2 +-
arch/arm/boot/compressed/misc.c | 2 +-
arch/arm/include/asm/cacheflush.h | 2 +-
arch/arm/include/asm/glue-cache.h | 2 +-
arch/arm/include/asm/glue-proc.h | 2 +-
arch/arm/include/asm/switch_to.h | 2 +-
arch/arm/include/debug/icedcc.S | 2 +-
arch/arm/kernel/entry-armv.S | 6 ++--
arch/arm/kernel/perf_event_v7.c | 4 +--
arch/arm/kvm/Kconfig | 2 +-
arch/arm/mm/Kconfig | 41 ++++++++++++++++-------
arch/arm/mm/Makefile | 1 +
arch/arm/probes/kprobes/test-arm.c | 2 +-
drivers/bus/Kconfig | 6 ++--
17 files changed, 86 insertions(+), 61 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9e2d2adcc85b..ccd0d5553d38 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -32,7 +32,7 @@ config ARM
select HANDLE_DOMAIN_IRQ
select HARDIRQS_SW_RESEND
select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
- select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
+ select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7 || CPU_32_v7VE) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32
select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
@@ -46,12 +46,12 @@ config ARM
select HAVE_DMA_ATTRS
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL) && !CPU_ENDIAN_BE32
- select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
+ select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7VE) && MMU
select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
select HAVE_FUNCTION_GRAPH_TRACER if (!THUMB2_KERNEL)
select HAVE_FUNCTION_TRACER if (!XIP_KERNEL)
select HAVE_GENERIC_DMA_COHERENT
- select HAVE_HW_BREAKPOINT if (PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7))
+ select HAVE_HW_BREAKPOINT if (PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7VE))
select HAVE_IDE if PCI || ISA || PCMCIA
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KERNEL_GZIP
@@ -805,6 +805,12 @@ config ARCH_MULTI_V7
select CPU_V7
select HAVE_SMP

+config ARCH_MULTI_V7VE
+ bool "ARMv7 w/ virtualization extensions based platforms (Cortex-A, PJ4-MP, Krait)"
+ select ARCH_MULTI_V6_V7
+ select CPU_V7VE
+ select HAVE_SMP
+
config ARCH_MULTI_V6_V7
bool
select MIGHT_HAVE_CACHE_L2X0
@@ -1069,7 +1075,7 @@ config ARM_ERRATA_411920

config ARM_ERRATA_430973
bool "ARM errata: Stale prediction on replaced interworking branch"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
help
This option enables the workaround for the 430973 Cortex-A8
r1p* erratum. If a code sequence containing an ARM/Thumb
@@ -1085,7 +1091,7 @@ config ARM_ERRATA_430973

config ARM_ERRATA_458693
bool "ARM errata: Processor deadlock when a false hazard is created"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
depends on !ARCH_MULTIPLATFORM
help
This option enables the workaround for the 458693 Cortex-A8 (r2p0)
@@ -1099,7 +1105,7 @@ config ARM_ERRATA_458693

config ARM_ERRATA_460075
bool "ARM errata: Data written to the L2 cache can be overwritten with stale data"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
depends on !ARCH_MULTIPLATFORM
help
This option enables the workaround for the 460075 Cortex-A8 (r2p0)
@@ -1112,7 +1118,7 @@ config ARM_ERRATA_460075

config ARM_ERRATA_742230
bool "ARM errata: DMB operation may be faulty"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
depends on !ARCH_MULTIPLATFORM
help
This option enables the workaround for the 742230 Cortex-A9
@@ -1125,7 +1131,7 @@ config ARM_ERRATA_742230

config ARM_ERRATA_742231
bool "ARM errata: Incorrect hazard handling in the SCU may lead to data corruption"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
depends on !ARCH_MULTIPLATFORM
help
This option enables the workaround for the 742231 Cortex-A9
@@ -1140,7 +1146,7 @@ config ARM_ERRATA_742231

config ARM_ERRATA_643719
bool "ARM errata: LoUIS bit field in CLIDR register is incorrect"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
default y
help
This option enables the workaround for the 643719 Cortex-A9 (prior to
@@ -1151,7 +1157,7 @@ config ARM_ERRATA_643719

config ARM_ERRATA_720789
bool "ARM errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
help
This option enables the workaround for the 720789 Cortex-A9 (prior to
r2p0) erratum. A faulty ASID can be sent to the other CPUs for the
@@ -1163,7 +1169,7 @@ config ARM_ERRATA_720789

config ARM_ERRATA_743622
bool "ARM errata: Faulty hazard checking in the Store Buffer may lead to data corruption"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
depends on !ARCH_MULTIPLATFORM
help
This option enables the workaround for the 743622 Cortex-A9
@@ -1177,7 +1183,7 @@ config ARM_ERRATA_743622

config ARM_ERRATA_751472
bool "ARM errata: Interrupted ICIALLUIS may prevent completion of broadcasted operation"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
depends on !ARCH_MULTIPLATFORM
help
This option enables the workaround for the 751472 Cortex-A9 (prior
@@ -1188,7 +1194,7 @@ config ARM_ERRATA_751472

config ARM_ERRATA_754322
bool "ARM errata: possible faulty MMU translations following an ASID switch"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
help
This option enables the workaround for the 754322 Cortex-A9 (r2p*,
r3p*) erratum. A speculative memory access may cause a page table walk
@@ -1199,7 +1205,7 @@ config ARM_ERRATA_754322

config ARM_ERRATA_754327
bool "ARM errata: no automatic Store Buffer drain"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
help
This option enables the workaround for the 754327 Cortex-A9 (prior to
r2p0) erratum. The Store Buffer does not have any automatic draining
@@ -1222,7 +1228,7 @@ config ARM_ERRATA_364296

config ARM_ERRATA_764369
bool "ARM errata: Data cache line maintenance operation by MVA may not succeed"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
help
This option enables the workaround for erratum 764369
affecting Cortex-A9 MPCore with two or more processors (all
@@ -1236,7 +1242,7 @@ config ARM_ERRATA_764369

config ARM_ERRATA_775420
bool "ARM errata: A data cache maintenance operation which aborts, might lead to deadlock"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
help
This option enables the workaround for the 775420 Cortex-A9 (r2p2,
r2p6,r2p8,r2p10,r3p0) erratum. In case a date cache maintenance
@@ -1246,7 +1252,7 @@ config ARM_ERRATA_775420

config ARM_ERRATA_798181
bool "ARM errata: TLBI/DSB failure on Cortex-A15"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
help
On Cortex-A15 (r0p0..r3p2) the TLBI*IS/DSB operations are not
adequately shooting down all use of the old entries. This
@@ -1256,7 +1262,7 @@ config ARM_ERRATA_798181

config ARM_ERRATA_773022
bool "ARM errata: incorrect instructions may be executed from loop buffer"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
help
This option enables the workaround for the 773022 Cortex-A15
(up to r0p4) erratum. In certain rare sequences of code, the
@@ -1337,7 +1343,7 @@ config HAVE_SMP

config SMP
bool "Symmetric Multi-Processing"
- depends on CPU_V6K || CPU_V7
+ depends on CPU_V6K || CPU_V7 || CPU_V7VE
depends on GENERIC_CLOCKEVENTS
depends on HAVE_SMP
depends on MMU || ARM_MPU
@@ -1373,7 +1379,7 @@ config SMP_ON_UP

config ARM_CPU_TOPOLOGY
bool "Support cpu topology definition"
- depends on SMP && CPU_V7
+ depends on SMP && (CPU_V7 || CPU_V7VE)
default y
help
Support ARM cpu topology definition. The MPIDR register defines
@@ -1403,7 +1409,7 @@ config HAVE_ARM_SCU

config HAVE_ARM_ARCH_TIMER
bool "Architected timer support"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
select ARM_ARCH_TIMER
select GENERIC_CLOCKEVENTS
help
@@ -1417,7 +1423,7 @@ config HAVE_ARM_TWD

config MCPM
bool "Multi-Cluster Power Management"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
help
This option provides the common power management infrastructure
for (multi-)cluster based systems, such as big.LITTLE based
@@ -1434,7 +1440,7 @@ config MCPM_QUAD_CLUSTER

config BIG_LITTLE
bool "big.LITTLE support (Experimental)"
- depends on CPU_V7 && SMP
+ depends on (CPU_V7 || CPU_V7VE) && SMP
select MCPM
help
This option enables support selections for the big.LITTLE
@@ -1501,7 +1507,7 @@ config HOTPLUG_CPU

config ARM_PSCI
bool "Support for the ARM Power State Coordination Interface (PSCI)"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
select ARM_PSCI_FW
help
Say Y here if you want Linux to communicate with system firmware
@@ -1579,7 +1585,7 @@ config SCHED_HRTICK

config THUMB2_KERNEL
bool "Compile the kernel in Thumb-2 mode" if !CPU_THUMBONLY
- depends on (CPU_V7 || CPU_V7M) && !CPU_V6 && !CPU_V6K
+ depends on (CPU_V7 || CPU_V7VE || CPU_V7M) && !CPU_V6 && !CPU_V6K
default y if CPU_THUMBONLY
select AEABI
select ARM_ASM_UNIFIED
@@ -1642,7 +1648,7 @@ config AEABI

config ARM_PATCH_UIDIV
bool "Runtime patch calls to __aeabi_{u}idiv() with udiv/sdiv"
- depends on CPU_V7 && !XIP_KERNEL && AEABI
+ depends on CPU_32v7 && !XIP_KERNEL && AEABI
help
Some v7 CPUs have support for the udiv and sdiv instructions
that can be used in place of calls to __aeabi_uidiv and __aeabi_idiv
@@ -1843,7 +1849,7 @@ config XEN_DOM0
config XEN
bool "Xen guest support on ARM"
depends on ARM && AEABI && OF
- depends on CPU_V7 && !CPU_V6
+ depends on (CPU_V7 || CPU_V7VE) && !CPU_V6
depends on !GENERIC_ATOMIC64
depends on MMU
select ARCH_DMA_ADDR_T_64BIT
@@ -2132,7 +2138,7 @@ config FPE_FASTFPE

config VFP
bool "VFP-format floating point maths"
- depends on CPU_V6 || CPU_V6K || CPU_ARM926T || CPU_V7 || CPU_FEROCEON
+ depends on CPU_V6 || CPU_V6K || CPU_ARM926T || CPU_V7 || CPU_V7VE || CPU_FEROCEON
help
Say Y to include VFP support code in the kernel. This is needed
if your hardware includes a VFP unit.
@@ -2145,11 +2151,11 @@ config VFP
config VFPv3
bool
depends on VFP
- default y if CPU_V7
+ default y if CPU_V7 || CPU_V7VE

config NEON
bool "Advanced SIMD (NEON) Extension support"
- depends on VFPv3 && CPU_V7
+ depends on VFPv3 && (CPU_V7 || CPU_V7VE)
help
Say Y to include support code for NEON, the ARMv7 Advanced SIMD
Extension.
@@ -2174,7 +2180,7 @@ source "kernel/power/Kconfig"

config ARCH_SUSPEND_POSSIBLE
depends on CPU_ARM920T || CPU_ARM926T || CPU_FEROCEON || CPU_SA1100 || \
- CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7M || CPU_XSC3 || CPU_XSCALE || CPU_MOHAWK
+ CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7M || CPU_V7VE || CPU_XSC3 || CPU_XSCALE || CPU_MOHAWK
def_bool y

config ARM_CPU_SUSPEND
diff --git a/arch/arm/Kconfig-nommu b/arch/arm/Kconfig-nommu
index aed66d5df7f1..aa04be6b29b9 100644
--- a/arch/arm/Kconfig-nommu
+++ b/arch/arm/Kconfig-nommu
@@ -53,7 +53,7 @@ config REMAP_VECTORS_TO_RAM

config ARM_MPU
bool 'Use the ARM v7 PMSA Compliant MPU'
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
default y
help
Some ARM systems without an MMU have instead a Memory Protection
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 2c2b28ee4811..c553862e26c8 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -67,6 +67,7 @@ KBUILD_CFLAGS += $(call cc-option,-fno-ipa-sra)
# macro, but instead defines a whole series of macros which makes
# testing for a specific architecture or later rather impossible.
arch-$(CONFIG_CPU_32v7M) =-D__LINUX_ARM_ARCH__=7 -march=armv7-m -Wa,-march=armv7-m
+arch-$(CONFIG_CPU_32v7VE) =-D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7ve,-mcpu=cortex-a15)
arch-$(CONFIG_CPU_32v7) =-D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7-a,-march=armv5t -Wa$(comma)-march=armv7-a)
arch-$(CONFIG_CPU_32v6) =-D__LINUX_ARM_ARCH__=6 $(call cc-option,-march=armv6,-march=armv5t -Wa$(comma)-march=armv6)
# Only override the compiler option if ARMv6. The ARMv6K extensions are
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 06e983f59980..e51ef838947c 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -26,7 +26,7 @@

#if defined(CONFIG_DEBUG_ICEDCC)

-#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7)
+#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7) || defined (CONFIG_CPU_V7VE)
.macro loadsp, rb, tmp
.endm
.macro writeb, ch, rb
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index d4f891f56996..0e0300c25008 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -29,7 +29,7 @@ extern void error(char *x);

#ifdef CONFIG_DEBUG_ICEDCC

-#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7)
+#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE)

static void icedcc_putc(int ch)
{
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index d5525bfc7e3e..ff5e71c45809 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -193,7 +193,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
* Optimized __flush_icache_all for the common cases. Note that UP ARMv7
* will fall through to use __flush_icache_all_generic.
*/
-#if (defined(CONFIG_CPU_V7) && \
+#if ((defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE)) && \
(defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K))) || \
defined(CONFIG_SMP_ON_UP)
#define __flush_icache_preferred __cpuc_flush_icache_all
diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h
index cab07f69382d..4a2f076dbcc3 100644
--- a/arch/arm/include/asm/glue-cache.h
+++ b/arch/arm/include/asm/glue-cache.h
@@ -109,7 +109,7 @@
# endif
#endif

-#if defined(CONFIG_CPU_V7)
+#if defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE)
# ifdef _CACHE
# define MULTI_CACHE 1
# else
diff --git a/arch/arm/include/asm/glue-proc.h b/arch/arm/include/asm/glue-proc.h
index 74be7c22035a..345a32137117 100644
--- a/arch/arm/include/asm/glue-proc.h
+++ b/arch/arm/include/asm/glue-proc.h
@@ -239,7 +239,7 @@
# endif
#endif

-#ifdef CONFIG_CPU_V7
+#if defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE)
/*
* Cortex-A9 needs a different suspend/resume function, so we need
* multiple CPU support for ARMv7 anyway.
diff --git a/arch/arm/include/asm/switch_to.h b/arch/arm/include/asm/switch_to.h
index 12ebfcc1d539..79fd3fc09d45 100644
--- a/arch/arm/include/asm/switch_to.h
+++ b/arch/arm/include/asm/switch_to.h
@@ -9,7 +9,7 @@
* to ensure that the maintenance completes in case we migrate to another
* CPU.
*/
-#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7)
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) && (defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE))
#define __complete_pending_tlbi() dsb(ish)
#else
#define __complete_pending_tlbi()
diff --git a/arch/arm/include/debug/icedcc.S b/arch/arm/include/debug/icedcc.S
index 43afcb021fa3..6b3b2d2f3694 100644
--- a/arch/arm/include/debug/icedcc.S
+++ b/arch/arm/include/debug/icedcc.S
@@ -14,7 +14,7 @@
.macro addruart, rp, rv, tmp
.endm

-#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7)
+#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE)

.macro senduart, rd, rx
mcr p14, 0, \rd, c0, c5, 0
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 3ce377f7251f..317de38c357e 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -504,7 +504,7 @@ __und_usr:
__und_usr_thumb:
@ Thumb instruction
sub r4, r2, #2 @ First half of thumb instr at LR - 2
-#if CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && CONFIG_CPU_V7
+#if CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && (CONFIG_CPU_V7 || CONFIG_CPU_V7VE)
/*
* Thumb-2 instruction handling. Note that because pre-v6 and >= v6 platforms
* can never be supported in a single kernel, this code is not applicable at
@@ -549,7 +549,7 @@ ARM_BE8(rev16 r0, r0) @ little endian instruction
.arch armv6
#endif
#endif /* __LINUX_ARM_ARCH__ < 7 */
-#else /* !(CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && CONFIG_CPU_V7) */
+#else /* !(CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && (CONFIG_CPU_V7 || CONFIG_CPU_V7VE)) */
b __und_usr_fault_16
#endif
UNWIND(.fnend)
@@ -565,7 +565,7 @@ ENDPROC(__und_usr)
.popsection
.pushsection __ex_table,"a"
.long 1b, 4b
-#if CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && CONFIG_CPU_V7
+#if CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && (CONFIG_CPU_V7 || CONFIG_CPU_V7VE)
.long 2b, 4b
.long 3b, 4b
#endif
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 126dc679b230..6c3c4b269e90 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -16,7 +16,7 @@
* counter and all 4 performance counters together can be reset separately.
*/

-#ifdef CONFIG_CPU_V7
+#if defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE)

#include <asm/cp15.h>
#include <asm/cputype.h>
@@ -1900,4 +1900,4 @@ static int __init register_armv7_pmu_driver(void)
return platform_driver_register(&armv7_pmu_driver);
}
device_initcall(register_armv7_pmu_driver);
-#endif /* CONFIG_CPU_V7 */
+#endif /* CONFIG_CPU_V7 || CONFIG_CPU_V7VE */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 95a000515e43..ea62ada144b1 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -5,7 +5,7 @@
source "virt/kvm/Kconfig"

menuconfig VIRTUALIZATION
- bool "Virtualization"
+ bool "Virtualization" if CPU_V7VE
---help---
Say Y here to get to see options for using your Linux host to run
other operating systems inside virtual machines (guests).
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c21941349b3e..e4ff161da98f 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -407,6 +407,21 @@ config CPU_V7M
select CPU_PABRT_LEGACY
select CPU_THUMBONLY

+# ARMv7ve
+config CPU_V7VE
+ bool "Support ARM V7 processor w/ virtualization extensions" if (!ARCH_MULTIPLATFORM || ARCH_MULTI_V7VE) && (ARCH_INTEGRATOR || MACH_REALVIEW_EB || MACH_REALVIEW_PBX)
+ select CPU_32v6K
+ select CPU_32v7VE
+ select CPU_ABRT_EV7
+ select CPU_CACHE_V7
+ select CPU_CACHE_VIPT
+ select CPU_COPY_V6 if MMU
+ select CPU_CP15_MMU if MMU
+ select CPU_CP15_MPU if !MMU
+ select CPU_HAS_ASID if MMU
+ select CPU_PABRT_V7
+ select CPU_TLB_V7 if MMU
+
config CPU_THUMBONLY
bool
# There are no CPUs available with MMU that don't implement an ARM ISA:
@@ -450,6 +465,9 @@ config CPU_32v6K
config CPU_32v7
bool

+config CPU_32v7VE
+ bool
+
config CPU_32v7M
bool

@@ -626,8 +644,7 @@ comment "Processor Features"

config ARM_LPAE
bool "Support for the Large Physical Address Extension"
- depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
- !CPU_32v4 && !CPU_32v3
+ depends on MMU && CPU_32v7VE
help
Say Y if you have an ARMv7 processor supporting the LPAE page
table format and you would like to access memory beyond the
@@ -652,7 +669,7 @@ config ARM_THUMB
CPU_ARM925T || CPU_ARM926T || CPU_ARM940T || CPU_ARM946E || \
CPU_ARM1020 || CPU_ARM1020E || CPU_ARM1022 || CPU_ARM1026 || \
CPU_XSCALE || CPU_XSC3 || CPU_MOHAWK || CPU_V6 || CPU_V6K || \
- CPU_V7 || CPU_FEROCEON || CPU_V7M
+ CPU_V7 || CPU_FEROCEON || CPU_V7M || CPU_V7VE
default y
help
Say Y if you want to include kernel support for running user space
@@ -666,7 +683,7 @@ config ARM_THUMB

config ARM_THUMBEE
bool "Enable ThumbEE CPU extension"
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
help
Say Y here if you have a CPU with the ThumbEE extension and code to
make use of it. Say N for code that can run on CPUs without ThumbEE.
@@ -674,7 +691,7 @@ config ARM_THUMBEE
config ARM_VIRT_EXT
bool
depends on MMU
- default y if CPU_V7
+ default y if CPU_V7VE
help
Enable the kernel to make use of the ARM Virtualization
Extensions to install hypervisors without run-time firmware
@@ -686,7 +703,7 @@ config ARM_VIRT_EXT

config SWP_EMULATE
bool "Emulate SWP/SWPB instructions" if !SMP
- depends on CPU_V7
+ depends on CPU_V7 || CPU_V7VE
default y if SMP
select HAVE_PROC_CPU if PROC_FS
help
@@ -723,7 +740,7 @@ config CPU_BIG_ENDIAN
config CPU_ENDIAN_BE8
bool
depends on CPU_BIG_ENDIAN
- default CPU_V6 || CPU_V6K || CPU_V7
+ default CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7VE
help
Support for the BE-8 (big-endian) mode on ARMv6 and ARMv7 processors.

@@ -789,7 +806,7 @@ config CPU_CACHE_ROUND_ROBIN

config CPU_BPREDICT_DISABLE
bool "Disable branch prediction"
- depends on CPU_ARM1020 || CPU_V6 || CPU_V6K || CPU_MOHAWK || CPU_XSC3 || CPU_V7 || CPU_FA526
+ depends on CPU_ARM1020 || CPU_V6 || CPU_V6K || CPU_MOHAWK || CPU_XSC3 || CPU_V7 || CPU_V7VE || CPU_FA526
help
Say Y here to disable branch prediction. If unsure, say N.

@@ -835,7 +852,7 @@ config KUSER_HELPERS

config VDSO
bool "Enable VDSO for acceleration of some system calls"
- depends on AEABI && MMU && CPU_V7
+ depends on AEABI && MMU && (CPU_V7 || CPU_V7VE)
default y if ARM_ARCH_TIMER
select GENERIC_TIME_VSYSCALL
help
@@ -984,7 +1001,7 @@ config CACHE_XSC3L2

config ARM_L1_CACHE_SHIFT_6
bool
- default y if CPU_V7
+ default y if CPU_V7 || CPU_V7VE
help
Setting ARM L1 cache line size to 64 Bytes.

@@ -994,10 +1011,10 @@ config ARM_L1_CACHE_SHIFT
default 5

config ARM_DMA_MEM_BUFFERABLE
- bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7
+ bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !(CPU_V7 || CPU_V7VE)
depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \
MACH_REALVIEW_PB11MP)
- default y if CPU_V6 || CPU_V6K || CPU_V7
+ default y if CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7VE
help
Historically, the kernel has used strongly ordered mappings to
provide DMA coherent memory. With the advent of ARMv7, mapping
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 57c8df500e8c..4f542b29137c 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -93,6 +93,7 @@ obj-$(CONFIG_CPU_FEROCEON) += proc-feroceon.o
obj-$(CONFIG_CPU_V6) += proc-v6.o
obj-$(CONFIG_CPU_V6K) += proc-v6.o
obj-$(CONFIG_CPU_V7) += proc-v7.o
+obj-$(CONFIG_CPU_V7VE) += proc-v7.o
obj-$(CONFIG_CPU_V7M) += proc-v7m.o

AFLAGS_proc-v6.o :=-Wa,-march=armv6
diff --git a/arch/arm/probes/kprobes/test-arm.c b/arch/arm/probes/kprobes/test-arm.c
index 8866aedfdea2..c2591b8b8718 100644
--- a/arch/arm/probes/kprobes/test-arm.c
+++ b/arch/arm/probes/kprobes/test-arm.c
@@ -192,7 +192,7 @@ void kprobe_arm_test_cases(void)
TEST_BF_R ("mov pc, r",0,2f,"")
TEST_BF_R ("add pc, pc, r",14,(2f-1f-8)*2,", asr #1")
TEST_BB( "sub pc, pc, #1b-2b+8")
-#if __LINUX_ARM_ARCH__ == 6 && !defined(CONFIG_CPU_V7)
+#if __LINUX_ARM_ARCH__ == 6 && !(defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7VE))
TEST_BB( "sub pc, pc, #1b-2b+8-2") /* UNPREDICTABLE before and after ARMv6 */
#endif
TEST_BB_R( "sub pc, pc, r",14, 1f-2f+8,"")
diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 0ebca8ba7bc4..33fa47dc03a8 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -17,7 +17,7 @@ config ARM_CCI400_COMMON

config ARM_CCI400_PMU
bool "ARM CCI400 PMU support"
- depends on (ARM && CPU_V7) || ARM64
+ depends on (ARM && (CPU_V7 || CPU_V7VE)) || ARM64
depends on PERF_EVENTS
select ARM_CCI400_COMMON
select ARM_CCI_PMU
@@ -28,7 +28,7 @@ config ARM_CCI400_PMU

config ARM_CCI400_PORT_CTRL
bool
- depends on ARM && OF && CPU_V7
+ depends on ARM && OF && (CPU_V7 || CPU_V7VE)
select ARM_CCI400_COMMON
help
Low level power management driver for CCI400 cache coherent
@@ -36,7 +36,7 @@ config ARM_CCI400_PORT_CTRL

config ARM_CCI500_PMU
bool "ARM CCI500 PMU support"
- depends on (ARM && CPU_V7) || ARM64
+ depends on (ARM && (CPU_V7 || CPU_V7VE)) || ARM64
depends on PERF_EVENTS
select ARM_CCI_PMU
help

Arnd Bergmann

unread,

Nov 24, 2015, 5:18:17 AM11/24/15

to Stephen Boyd, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano, Thomas Petazzoni

Ok, thanks!

> > This works perfectly for Cortex-A5, -A8, -A9, -A12, -A15, -A17, Brahma-B15,
> > PJ4B-MP, Scorpion and Krait-450, which all clearly fall into one of
> > the two other categories.
> >
> > The two exceptions that don't quite fit are still "good enough":
> >
> > - PJ4/PJ4B (not PJ4B-MP) has a different custom opcode for udiv and sdiv
> > in ARM mode. We don't support that with true multiplatform kernels
> > because those opcodes work nowhere else, though with your proposed
> > series we could easily do that for dynamic patching.
>
> Do you have the information on these custom opcodes? I can work
> that into the patches assuming the MIDR is different.

Thomas Petazzoni said this in a private mail:

| According to the datasheet, the PJ4B has integer signed and unsigned
| divide, similar to the sdiv and udiv ARM instructions. But the way to
| access it is by doing a MRC instruction.
|
| MRC<cond> p6, 1, Rd , CRn , CRm, 4
|
|for PJ4B is the same as:
|
| SDIV Rd , Rn, Rm
|
| on ARM cores.
|
|And:
|
| MRC<cond> p6, 1, Rd , CRn , CRm, 0
|
|for PJ4B is the same as:
|
| UDIV Rd , Rn, Rm
|
|on ARM cores.
|
|This is documented in the "Extended instructions" section of the
|PJ4B datasheet.

I assume what he meant was that this is true for both PJ4 and PJ4B
but not for PJ4B-MP, which has the normal udiv/sdiv instructions.

IOW, anything with CPU implementer 0x56 part 0x581 should use those,
while part 0x584 can use the sdiv/udiv that it reports correctly.

> > - Krait (pre-450) won't run kernels with LPAE disabled, but if we only
> > have one global ARCH_QCOM option that can be enabled for both
> > ARCH_MULTI_V7VE and ARCH_MULTI_V7, we still win: a mach-qcom
> > kernel with only ARCH_MULTI_V7VE will use IDIV by default, and
> > give you the option to enable LPAE. If you pick LPAE, it will
> > still work fine on Krait-450 but not the older ones, and that is
> > a user error. If you enable ARCH_MULTI_V7 / CPU_V7, you get neither
> > LPAE nor IDIV, and the kernel will be able to run on both Scorpion
> > and Krait, as long as you have the right drivers too.
> >
>
> So if I have built mach-qcom with ARCH_MULTI_V7VE won't I get a
> kernel that uses idiv instructions that could be run on Scorpion,
> where the instruction doesn't exist? Or is that a user error
> again like picking LPAE?

Right. If you want to run on Scorpion, you have to select ARCH_MULTI_V7.
If both are set, we should build with -march=armv7-a and not use
the idiv instructions.

> It seems fine to me to go ahead with this approach. Should I take
> care of cooking up the patches? I can package this all up into a
> series that adds the new CPU type, updates the affected
> platforms, and layers the runtime patching on top when plain V7
> is a selected CPU type.

That would be nice, yes.

Arnd

Russell King - ARM Linux

unread,

Nov 24, 2015, 5:37:48 AM11/24/15

to Stephen Boyd, Arnd Bergmann, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano

On Mon, Nov 23, 2015 at 04:13:06PM -0800, Stephen Boyd wrote:
> On 11/23, Arnd Bergmann wrote:
> >
> > Ok, thanks for the confirmation.
> >
> > Summarizing what we've found, I think we can get away with just
> > introducing two Kconfig symbols ARCH_MULTI_V7VE and CPU_V7VE.
> > Most CPUs fall clearly into one category or the other, and then
> > we can allow LPAE to be selected for V7VE-only build but not
> > for plain V7, and we can unconditionally build the kernel with
> >
> > arch-$(CONFIG_CPU_32v7VE) = -D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7ve,-march=armv7-a -mcpu=cortex-a15)
> >
>
> This causes compiler spew for me:
>
> warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch
>
> Removing -march=armv7-a from there makes it quiet.
>
> Also, it's sort of feels wrong to have -mcpu in a place where
> we're exclusively doing -march. Perhaps the fallback should be
> bog standard -march=armv7-a? (or the fallback for that one
> "-march=armv5t -Wa$(comma)-march=armv7-a")?

How it was explained to me years ago is that -march selects the
instruction set, -mtune selects the instruction scheduling, and -mcpu
selects both.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Arnd Bergmann

unread,

Nov 24, 2015, 5:39:44 AM11/24/15

to linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

On Tuesday 24 November 2015 00:53:49 Stephen Boyd wrote:
> On 11/23, Stephen Boyd wrote:
> > On 11/23, Arnd Bergmann wrote:
> > >
> > > Ok, thanks for the confirmation.
> > >
> > > Summarizing what we've found, I think we can get away with just
> > > introducing two Kconfig symbols ARCH_MULTI_V7VE and CPU_V7VE.
> > > Most CPUs fall clearly into one category or the other, and then
> > > we can allow LPAE to be selected for V7VE-only build but not
> > > for plain V7, and we can unconditionally build the kernel with
> > >
> > > arch-$(CONFIG_CPU_32v7VE) = -D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7ve,-march=armv7-a -mcpu=cortex-a15)
> > >
> >
> > This causes compiler spew for me:
> >
> > warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch
> >
> > Removing -march=armv7-a from there makes it quiet.
> >
> > Also, it's sort of feels wrong to have -mcpu in a place where
> > we're exclusively doing -march. Perhaps the fallback should be
> > bog standard -march=armv7-a? (or the fallback for that one
> > "-march=armv5t -Wa$(comma)-march=armv7-a")?

I suggested using -mcpu=cortex-a15 because there are old gcc versions
that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
that do understand -mcpu=cortex-a15. I might be misremembering the
exact options we want though, and we could now just decided that if
your gcc is too old, you get no idiv even if it supports that on
Cortex-A15.

> And adding CPU_V7VE causes a cascade of changes to wherever
> CPU_V7 is being used today. Here's the patch I currently have,
> without the platform changes:

Thanks a lot for looking into this. I think we can simplify a couple
of them, as long as we also fix all the platforms to have the correct
dependencies:

> @@ -1069,7 +1075,7 @@ config ARM_ERRATA_411920
>
> config ARM_ERRATA_430973
> bool "ARM errata: Stale prediction on replaced interworking branch"
> - depends on CPU_V7
> + depends on CPU_V7 || CPU_V7VE
> help
> This option enables the workaround for the 430973 Cortex-A8
> r1p* erratum. If a code sequence containing an ARM/Thumb

If it's a Cortex-A8 erratum, we don't need the "|| CPU_V7VE". Same for
a lot of the following errata.

> @@ -1246,7 +1252,7 @@ config ARM_ERRATA_775420
>
> config ARM_ERRATA_798181
> bool "ARM errata: TLBI/DSB failure on Cortex-A15"
> - depends on CPU_V7 && SMP
> + depends on (CPU_V7 || CPU_V7VE) && SMP
> help
> On Cortex-A15 (r0p0..r3p2) the TLBI*IS/DSB operations are not
> adequately shooting down all use of the old entries. This
> @@ -1256,7 +1262,7 @@ config ARM_ERRATA_798181
>
> config ARM_ERRATA_773022
> bool "ARM errata: incorrect instructions may be executed from loop buffer"
> - depends on CPU_V7
> + depends on CPU_V7 || CPU_V7VE
> help
> This option enables the workaround for the 773022 Cortex-A15
> (up to r0p4) erratum. In certain rare sequences of code, the

And conversely, these will only need to depend on CPU_V7VE.

> @@ -1373,7 +1379,7 @@ config SMP_ON_UP
>
> config ARM_CPU_TOPOLOGY
> bool "Support cpu topology definition"
> - depends on SMP && CPU_V7
> + depends on SMP && (CPU_V7 || CPU_V7VE)
> default y
> help
> Support ARM cpu topology definition. The MPIDR register defines

Does topology make sense with Cortex-A5/A9 or Scorpion?
Otherwise it can also be V7VE-only.

> @@ -1417,7 +1423,7 @@ config HAVE_ARM_TWD
>
> config MCPM
> bool "Multi-Cluster Power Management"
> - depends on CPU_V7 && SMP
> + depends on (CPU_V7 || CPU_V7VE) && SMP
> help
> This option provides the common power management infrastructure
> for (multi-)cluster based systems, such as big.LITTLE based
> @@ -1434,7 +1440,7 @@ config MCPM_QUAD_CLUSTER
>
> config BIG_LITTLE
> bool "big.LITTLE support (Experimental)"
> - depends on CPU_V7 && SMP
> + depends on (CPU_V7 || CPU_V7VE) && SMP
> select MCPM
> help
> This option enables support selections for the big.LITTLE

multi-cluster and big.little are also V7VE-only by definition,
as pre-VE ARMv7 cores are all limited to one cluster.

> @@ -1642,7 +1648,7 @@ config AEABI
>
> config ARM_PATCH_UIDIV
> bool "Runtime patch calls to __aeabi_{u}idiv() with udiv/sdiv"
> - depends on CPU_V7 && !XIP_KERNEL && AEABI
> + depends on CPU_32v7 && !XIP_KERNEL && AEABI
> help
> Some v7 CPUs have support for the udiv and sdiv instructions
> that can be used in place of calls to __aeabi_uidiv and __aeabi_idiv

How about making this

depends on (CPU_V6 || CPU_V6K || CPU_V7) && CPU_V7VE

> @@ -1843,7 +1849,7 @@ config XEN_DOM0
> config XEN
> bool "Xen guest support on ARM"
> depends on ARM && AEABI && OF
> - depends on CPU_V7 && !CPU_V6
> + depends on (CPU_V7 || CPU_V7VE) && !CPU_V6
> depends on !GENERIC_ATOMIC64
> depends on MMU
> select ARCH_DMA_ADDR_T_64BIT

This is also V7VE-only. The !CPU_V6 check is there to avoid a compile-time
bug with some instructions that are V7-only. I think this should
be

depends on CPU_V7VE && !CPU_V6

> diff --git a/arch/arm/Kconfig-nommu b/arch/arm/Kconfig-nommu
> index aed66d5df7f1..aa04be6b29b9 100644
> --- a/arch/arm/Kconfig-nommu
> +++ b/arch/arm/Kconfig-nommu
> @@ -53,7 +53,7 @@ config REMAP_VECTORS_TO_RAM
>
> config ARM_MPU
> bool 'Use the ARM v7 PMSA Compliant MPU'
> - depends on CPU_V7
> + depends on CPU_V7 || CPU_V7VE
> default y
> help
> Some ARM systems without an MMU have instead a Memory Protection

Not sure about this one. It's strictly speaking for V7-R, and we don't have
a Kconfig option for that at all.

> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 95a000515e43..ea62ada144b1 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -5,7 +5,7 @@
> source "virt/kvm/Kconfig"
>
> menuconfig VIRTUALIZATION
> - bool "Virtualization"
> + bool "Virtualization" if CPU_V7VE
> ---help---
> Say Y here to get to see options for using your Linux host to run
> other operating systems inside virtual machines (guests).

We already have 'depends on ARM_VIRT_EXT' here. I guess we should instead
add the dependency on CPU_V7VE there.

> @@ -626,8 +644,7 @@ comment "Processor Features"
>
> config ARM_LPAE
> bool "Support for the Large Physical Address Extension"
> - depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
> - !CPU_32v4 && !CPU_32v3
> + depends on MMU && CPU_32v7VE

This looks wrong, we have to ensure at least that CPU_32v6 and CPU_32v7
are not set, though we can get rid of the v5/v4/v3 checks.

Pretty sure that these only work with ARMv7VE capable cores, the older ones only
have one cluster.

Arnd

Russell King - ARM Linux

unread,

Nov 24, 2015, 5:40:17 AM11/24/15

to Stephen Boyd, Arnd Bergmann, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, linux-ar...@lists.infradead.org

NAK on all this. The fact that you're having to add CPU_V7VE at all
sites which have CPU_V7 shows that this is a totally broken way of
approaching this.

Make CPU_V7VE be an _add-on_ to CPU_V7. In other words, when CPU_V7VE
is enabled, CPU_V7 should also be enabled, just like we do for CPU_V6K.

Note that v7M is different because that's not an add-on feature, it's
a different CPU class from (what should be) v7A.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,

Nov 24, 2015, 5:43:46 AM11/24/15

to Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
> I suggested using -mcpu=cortex-a15 because there are old gcc versions
> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
> that do understand -mcpu=cortex-a15.

That's not all. The bigger problem is that there are toolchains out
there which accept these options, but do _not_ support IDIV in either
ARM or Thumb mode. I'm afraid that makes it impossible to add this
feature to the mainline kernel in this form: we need to run a test
build to check that -march=armv7ve or what-not really does work
through both GCC and GAS.

Given that Ubuntu 14.04 LTS suffers with this, it's something that
must be resolved.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Måns Rullgård

unread,

Nov 24, 2015, 7:10:22 AM11/24/15

to Russell King - ARM Linux, Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

Russell King - ARM Linux <li...@arm.linux.org.uk> writes:

> On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> that do understand -mcpu=cortex-a15.
>
> That's not all. The bigger problem is that there are toolchains out
> there which accept these options, but do _not_ support IDIV in either
> ARM or Thumb mode. I'm afraid that makes it impossible to add this
> feature to the mainline kernel in this form: we need to run a test
> build to check that -march=armv7ve or what-not really does work
> through both GCC and GAS.

If the compiler accepts the option but doesn't actually emit any div
instructions, is there any real harm?

--
Måns Rullgård
ma...@mansr.com

Måns Rullgård

unread,

Nov 24, 2015, 7:15:26 AM11/24/15

to Arnd Bergmann, Stephen Boyd, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano, Thomas Petazzoni

Or we could simply ignore those and they'd be no worse off than they are
now.

--
Måns Rullgård
ma...@mansr.com

Russell King - ARM Linux

unread,

Nov 24, 2015, 7:23:33 AM11/24/15

to Måns Rullgård, Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

On Tue, Nov 24, 2015 at 12:10:02PM +0000, Måns Rullgård wrote:
> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
>
> > On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
> >> I suggested using -mcpu=cortex-a15 because there are old gcc versions
> >> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
> >> that do understand -mcpu=cortex-a15.
> >
> > That's not all. The bigger problem is that there are toolchains out
> > there which accept these options, but do _not_ support IDIV in either
> > ARM or Thumb mode. I'm afraid that makes it impossible to add this
> > feature to the mainline kernel in this form: we need to run a test
> > build to check that -march=armv7ve or what-not really does work
> > through both GCC and GAS.
>
> If the compiler accepts the option but doesn't actually emit any div
> instructions, is there any real harm?

That's not what I've found. I've found that asking the assembler
to accept idiv instructions appears to be ignored, which is something
completely different.

Further to this, what it comes down to is the stupid idea that the
compiler should embed .arch / .cpu in the assembly output, which has
the effect of overriding the command line arguments given to it via
-Wa. So, giving -Wa,-mcpu=cortex-a15 is a total no-op, because the
first thing in the assembly output is:

.arch armv7-a

which kills off any attempt to set the assembly level ISA from the
command line.

It does appear after all that Ubuntu 14.04 does support sdiv/idiv with
-mcpu=cortex-a15, but there is no -march=armv7ve.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Måns Rullgård

unread,

Nov 24, 2015, 7:29:19 AM11/24/15

to Russell King - ARM Linux, Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

Oh, you mean the compiler knows about the instructions but the assembler
doesn't or isn't passed the right options. It's infuriating when that
happens.

--
Måns Rullgård
ma...@mansr.com

Arnd Bergmann

unread,

Nov 24, 2015, 8:46:04 AM11/24/15

to linux-ar...@lists.infradead.org, Måns Rullgård, Nicolas Pitre, Peter Maydell, Russell King - ARM Linux, linux-...@vger.kernel.org, Daniel Lezcano, Stephen Boyd, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Thomas Petazzoni

On Tuesday 24 November 2015 12:15:13 Måns Rullgård wrote:
> Arnd Bergmann <ar...@arndb.de> writes:
> > On Monday 23 November 2015 15:13:52 Stephen Boyd wrote:
> >> On 11/23, Arnd Bergmann wrote:
> >> > On Monday 23 November 2015 13:32:06 Stephen Boyd wrote:
> >> > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> >> > index b251013eef0a..bad6343c34d5 100644

> >> Do you have the information on these custom opcodes? I can work
> >> that into the patches assuming the MIDR is different.
> >
> > Thomas Petazzoni said this in a private mail:
> >
> > | According to the datasheet, the PJ4B has integer signed and unsigned
> > | divide, similar to the sdiv and udiv ARM instructions. But the way to
> > | access it is by doing a MRC instruction.
> > |
> > | MRC<cond> p6, 1, Rd , CRn , CRm, 4
> > |
> > |for PJ4B is the same as:
> > |
> > | SDIV Rd , Rn, Rm
> > |
> > | on ARM cores.
> > |
> > |And:
> > |
> > | MRC<cond> p6, 1, Rd , CRn , CRm, 0
> > |
> > |for PJ4B is the same as:
> > |
> > | UDIV Rd , Rn, Rm
> > |
> > |on ARM cores.
> > |
> > |This is documented in the "Extended instructions" section of the
> > |PJ4B datasheet.
> >
> > I assume what he meant was that this is true for both PJ4 and PJ4B
> > but not for PJ4B-MP, which has the normal udiv/sdiv instructions.
> >
> > IOW, anything with CPU implementer 0x56 part 0x581 should use those,
> > while part 0x584 can use the sdiv/udiv that it reports correctly.
>
> Or we could simply ignore those and they'd be no worse off than they are
> now.

Well, if we add all the infrastructure to do dynamic patching, we
might as well use it here, that is a very little extra effort.

I'm not convinced that the dynamic patching for idiv is actually needed
but I'm not objecting either, and Stephen has done the work already.

Arnd

Russell King - ARM Linux

unread,

Nov 24, 2015, 9:01:33 AM11/24/15

to Måns Rullgård, Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

No, that isn't what I meant.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Måns Rullgård

unread,

Nov 24, 2015, 9:03:46 AM11/24/15

to Russell King - ARM Linux, Arnd Bergmann, linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington

Then what do you mean? The compiler either emits the instructions or it
doesn't. If it doesn't, what the assembler accepts is irrelevant.

--
Måns Rullgård
ma...@mansr.com

Stephen Boyd

unread,

Nov 24, 2015, 3:07:43 PM11/24/15

to Russell King - ARM Linux, Arnd Bergmann, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, linux-ar...@lists.infradead.org

On 11/24, Russell King - ARM Linux wrote:
> On Tue, Nov 24, 2015 at 12:53:49AM -0800, Stephen Boyd wrote:
> >
> > And adding CPU_V7VE causes a cascade of changes to wherever
> > CPU_V7 is being used today. Here's the patch I currently have,
> > without the platform changes:

> > @@ -1069,7 +1075,7 @@ config ARM_ERRATA_411920
> >
> > config ARM_ERRATA_430973
> > bool "ARM errata: Stale prediction on replaced interworking branch"
> > - depends on CPU_V7
> > + depends on CPU_V7 || CPU_V7VE
>
> NAK on all this. The fact that you're having to add CPU_V7VE at all
> sites which have CPU_V7 shows that this is a totally broken way of
> approaching this.
>
> Make CPU_V7VE be an _add-on_ to CPU_V7. In other words, when CPU_V7VE
> is enabled, CPU_V7 should also be enabled, just like we do for CPU_V6K.
>
> Note that v7M is different because that's not an add-on feature, it's
> a different CPU class from (what should be) v7A.
>

Ok. Presumably the order of arch-$(CONFIG) lines in the Makefile
are done in an order to allow the build to degrade to the lowest
common denominator among architecture support. CPU_V7 selects
CPU_32v7 and we're using that config to select -march=armv7-a in
the Makefile. The patch currently uses CPU_32v7VE to select
-march=armv7ve. If CPU_V7VE selects CPU_V7 we'll never be able to
use -march=armv7ve because CPU_V7 will be selecting CPU_32v7 and
that will come after CPU_32v7VE in the Makefile.

My understanding is that we want to support CPU_V7VE without
CPU_V7 enabled so that it uses the idiv instructions in that
configuration. When V7VE and V7 are both enabled, we should
degrade to the aeabi functions, and the same is true for when
V7VE is disabled.

I suppose we can fix this by making CPU_V7 a hidden option? Or I
need some coffee because I'm missing something.

---8<----
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ccd0d5553d38..158ffb983387 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -626,7 +626,7 @@ config ARCH_SHMOBILE_LEGACY
select ARCH_SHMOBILE
select ARM_PATCH_PHYS_VIRT if MMU
select CLKDEV_LOOKUP
- select CPU_V7
+ select CPU_V7_NOEXT
select GENERIC_CLOCKEVENTS
select HAVE_ARM_SCU if SMP
select HAVE_ARM_TWD if SMP
@@ -802,7 +802,7 @@ config ARCH_MULTI_V7
bool "ARMv7 based platforms (Cortex-A, PJ4, Scorpion, Krait)"
default y
select ARCH_MULTI_V6_V7
- select CPU_V7
+ select CPU_V7_NOEXT
select HAVE_SMP

config ARCH_MULTI_V7VE
diff --git a/arch/arm/mach-prima2/Kconfig b/arch/arm/mach-prima2/Kconfig
index 9ab8932403e5..7e1b36400e14 100644
--- a/arch/arm/mach-prima2/Kconfig
+++ b/arch/arm/mach-prima2/Kconfig
@@ -25,7 +25,7 @@ config ARCH_ATLAS7
bool "CSR SiRFSoC ATLAS7 ARM Cortex A7 Platform"
default y
select ARM_GIC
- select CPU_V7
+ select CPU_V7_NOEXT
select HAVE_ARM_SCU if SMP
select HAVE_SMP
help
diff --git a/arch/arm/mach-realview/Kconfig b/arch/arm/mach-realview/Kconfig
index 565925f37dc5..7e084c34071c 100644
--- a/arch/arm/mach-realview/Kconfig
+++ b/arch/arm/mach-realview/Kconfig
@@ -24,7 +24,7 @@ config MACH_REALVIEW_EB
config REALVIEW_EB_A9MP
bool "Support Multicore Cortex-A9 Tile"
depends on MACH_REALVIEW_EB
- select CPU_V7
+ select CPU_V7_NOEXT
select HAVE_ARM_SCU if SMP
select HAVE_ARM_TWD if SMP
select HAVE_SMP
@@ -93,7 +93,7 @@ config REALVIEW_PB1176_SECURE_FLASH
config MACH_REALVIEW_PBA8
bool "Support RealView(R) Platform Baseboard for Cortex(tm)-A8 platform"
select ARM_GIC
- select CPU_V7
+ select CPU_V7_NOEXT
select HAVE_PATA_PLATFORM
help
Include support for the ARM(R) RealView Platform Baseboard for
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index e4ff161da98f..02a887235155 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -350,11 +350,11 @@ config CPU_FEROCEON_OLD_ID
config CPU_PJ4
bool
select ARM_THUMBEE
- select CPU_V7
+ select CPU_V7_NOEXT

config CPU_PJ4B
bool
- select CPU_V7
+ select CPU_V7_NOEXT

# ARMv6
config CPU_V6
@@ -383,11 +383,9 @@ config CPU_V6K
select CPU_PABRT_V6
select CPU_TLB_V6 if MMU

-# ARMv7
config CPU_V7
- bool "Support ARM V7 processor" if (!ARCH_MULTIPLATFORM || ARCH_MULTI_V7) && (ARCH_INTEGRATOR || MACH_REALVIEW_EB || MACH_REALVIEW_PBX)
+ bool
select CPU_32v6K
- select CPU_32v7
select CPU_ABRT_EV7
select CPU_CACHE_V7
select CPU_CACHE_VIPT
@@ -398,6 +396,12 @@ config CPU_V7
select CPU_PABRT_V7
select CPU_TLB_V7 if MMU

+# ARMv7
+config CPU_V7_NOEXT
+ bool "Support ARM V7 processor" if (!ARCH_MULTIPLATFORM || ARCH_MULTI_V7) && (ARCH_INTEGRATOR || MACH_REALVIEW_EB || MACH_REALVIEW_PBX)
+ select CPU_32v7
+ select CPU_V7
+
# ARMv7M
config CPU_V7M
bool
@@ -410,17 +414,8 @@ config CPU_V7M
# ARMv7ve
config CPU_V7VE

bool "Support ARM V7 processor w/ virtualization extensions" if (!ARCH_MULTIPLATFORM || ARCH_MULTI_V7VE) && (ARCH_INTEGRATOR || MACH_REALVIEW_EB || MACH_REALVIEW_PBX)

- select CPU_32v6K
select CPU_32v7VE
- select CPU_ABRT_EV7
- select CPU_CACHE_V7
- select CPU_CACHE_VIPT
- select CPU_COPY_V6 if MMU
- select CPU_CP15_MMU if MMU
- select CPU_CP15_MPU if !MMU
- select CPU_HAS_ASID if MMU
- select CPU_PABRT_V7
- select CPU_TLB_V7 if MMU
+ select CPU_V7

config CPU_THUMBONLY
bool

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Russell King - ARM Linux

unread,

Nov 24, 2015, 3:35:40 PM11/24/15

to Stephen Boyd, Arnd Bergmann, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, linux-ar...@lists.infradead.org

On Tue, Nov 24, 2015 at 12:07:30PM -0800, Stephen Boyd wrote:
> Ok. Presumably the order of arch-$(CONFIG) lines in the Makefile
> are done in an order to allow the build to degrade to the lowest
> common denominator among architecture support.

Correct. Make processes the directives in the order listed in the
makefile, which means that a variable final value results from its
last assignment.

> CPU_V7 selects
> CPU_32v7 and we're using that config to select -march=armv7-a in
> the Makefile. The patch currently uses CPU_32v7VE to select
> -march=armv7ve. If CPU_V7VE selects CPU_V7 we'll never be able to
> use -march=armv7ve because CPU_V7 will be selecting CPU_32v7 and
> that will come after CPU_32v7VE in the Makefile.

Right, but look at how the V6K stuff is handled:

arch-$(CONFIG_CPU_32v6) =-D__LINUX_ARM_ARCH__=6 $(call cc-option,-march=armv6,-march=armv5t -Wa$(comma)-march=armv6)
# Only override the compiler option if ARMv6. The ARMv6K extensions are

# always available in ARMv7
ifeq ($(CONFIG_CPU_32v6),y)
arch-$(CONFIG_CPU_32v6K) =-D__LINUX_ARM_ARCH__=6 $(call cc-option,-march=armv6k,-march=armv5t -Wa$(comma)-march=armv6k)
endif

We'd need to do something similar for v7VE as well. As we're getting
more of this, I'd suggest we move to:

arch-v7a-y =$(call cc-option,-march=armv7-a,-march=armv5t -Wa$(comma)-march=armv7-a)
arch-v7a-$(CONFIG_CPU_32v7VE) =... whatever it was...
arch-$(CONFIG_CPU_32v7) =-D__LINUX_ARM_ARCH__=7 $(arch-v7a-y)
arch-v6-y =$(call cc-option,-march=armv6,-march=armv5t -Wa$(comma)-march=armv6)
arch-v6-$(CONFIG_CPU_32v6K) =$(call cc-option,-march=armv6k,-march=armv5t -Wa$(comma)-march=armv6k)
arch-$(CONFIG_CPU_32v6) =-D__LINUX_ARM_ARCH__=6 $(arch-v6-y)

> My understanding is that we want to support CPU_V7VE without
> CPU_V7 enabled so that it uses the idiv instructions in that
> configuration. When V7VE and V7 are both enabled, we should
> degrade to the aeabi functions, and the same is true for when
> V7VE is disabled.

Let me have another look at this, it's been a while since I touched these
options...

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Arnd Bergmann

unread,

Nov 24, 2015, 4:12:47 PM11/24/15

to Russell King - ARM Linux, Stephen Boyd, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, linux-ar...@lists.infradead.org

On Tuesday 24 November 2015 20:35:16 Russell King - ARM Linux wrote:
> We'd need to do something similar for v7VE as well. As we're getting
> more of this, I'd suggest we move to:
>
> arch-v7a-y =$(call cc-option,-march=armv7-a,-march=armv5t -Wa$(comma)-march=armv7-a)
> arch-v7a-$(CONFIG_CPU_32v7VE) =... whatever it was...
> arch-$(CONFIG_CPU_32v7) =-D__LINUX_ARM_ARCH__=7 $(arch-v7a-y)
> arch-v6-y =$(call cc-option,-march=armv6,-march=armv5t -Wa$(comma)-march=armv6)
> arch-v6-$(CONFIG_CPU_32v6K) =$(call cc-option,-march=armv6k,-march=armv5t -Wa$(comma)-march=armv6k)
> arch-$(CONFIG_CPU_32v6) =-D__LINUX_ARM_ARCH__=6 $(arch-v6-y)

I would argue that V7VE is different from V6K here: The instructions
that are added in V6K compared to V6 are not generated by gcc but
are typically used in assembly like

static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
{
..
#ifndef CONFIG_CPU_V6
asm volatile(...); /* v6k specific instruction */
#endif
}

while the logic in your example above would break normal v7 support when
both V7 and V7VE are enabled.

> > My understanding is that we want to support CPU_V7VE without
> > CPU_V7 enabled so that it uses the idiv instructions in that
> > configuration. When V7VE and V7 are both enabled, we should
> > degrade to the aeabi functions, and the same is true for when
> > V7VE is disabled.
>
> Let me have another look at this, it's been a while since I touched these
> options...

There is one idea that I've had in the back of my mind for a long
while, and probably mentioned on the list before:

We could decide to simplify the CPU architecture selection for
multiplatform a lot if we turn the somewhat overengineered ARCH_MULTI_*
options into a choice statement, where each of them implies the
higher architecture levels. That way we can linearize
ARMv6/v6k/v7/v7VE/v8/v8.1 so that you just pick which platforms you
want to see by selecting the minimum level, and all higher ones will
automatically be available (for v8 and v8.1 that means just MACH_VIRT,
as we don't really want to run 32-bit kernels on bare metal v8-A
machines).

That way, we can have LPAE and -march=armv7ve support depend on
CONFIG_ARCH_MULTI_V7VE, which would imply that we don't support
CPU_V7 based platforms.

Arnd

Stephen Boyd

unread,

Nov 24, 2015, 8:52:14 PM11/24/15

to Arnd Bergmann, linux-ar...@lists.infradead.org, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Daniel Lezcano, Thomas Petazzoni

It looks like we have some sort of function that mostly does
this, except it doesn't differentiate on that lower bit for 1 vs
4. I guess I'll write another one for that.

static inline int cpu_is_pj4(void)
{
unsigned int id;

id = read_cpuid_id();
if ((id & 0xff0fff00) == 0x560f5800)
return 1;

return 0;

}

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Arnd Bergmann

unread,

Nov 25, 2015, 2:22:20 AM11/25/15

to linux-ar...@lists.infradead.org, Stephen Boyd, Nicolas Pitre, Peter Maydell, Måns Rullgård, Russell King - ARM Linux, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, Thomas Petazzoni

On Tuesday 24 November 2015 17:51:37 Stephen Boyd wrote:
> On 11/24, Arnd Bergmann wrote:
> > On Monday 23 November 2015 15:13:52 Stephen Boyd wrote:
> > IOW, anything with CPU implementer 0x56 part 0x581 should use those,
> > while part 0x584 can use the sdiv/udiv that it reports correctly.
> >
>
> It looks like we have some sort of function that mostly does
> this, except it doesn't differentiate on that lower bit for 1 vs
> 4. I guess I'll write another one for that.
>
> static inline int cpu_is_pj4(void)
> {
> unsigned int id;
>
> id = read_cpuid_id();
> if ((id & 0xff0fff00) == 0x560f5800)
> return 1;
>
> return 0;
> }

Correct, thanks.

Arnd

Stephen Boyd

unread,

Nov 25, 2015, 4:51:19 PM11/25/15

to linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Thomas Petazzoni, Michal Marek, Russell King - ARM Linux

This is a respin of a patch series from about a year ago[1]. I realized
that we already had most of the code in recordmcount to figure out
where we make calls to particular functions, so recording where
we make calls to the integer division functions should be easy enough
to add support for using the same design. Looking back on the thread
it seems like Mans was thinking along the same lines, although it wasn't
obvious to me back then or even over the last few days when I wrote this.

This series copies and reworks recordmcount to record the locations of the
calls to the library integer division functions on ARM builds, and puts those
locations into a table that we use to patch instructions at boot. The first
patch adds the recorduidiv program, and the second patch implements the
runtime patching for modules and kernel code. The module part hooks into
the relocation patching code we already have.

The discussion surrounding adding ARMv7VE so that this patchset isn't
as necessary is still on going, so I'm just sending out these patches for
now. Assuming we can come to a conclusion on how to implement ARMv7VE
I'll finish up those patches.

Comments/feedback appreciated.

Changes from v1:
* Add suport for PJ4 udiv/sdiv mrc opcodes
* Create a new program for recording uidiv instead of modifying recordmcount

[1] http://lkml.kernel.org/r/1383951632-6090-1-g...@codeaurora.org

Cc: Nicolas Pitre <ni...@fluxnic.net>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Måns Rullgård <ma...@mansr.com>
Cc: Thomas Petazzoni <thomas.p...@free-electrons.com>
Cc: Michal Marek <mma...@suse.com>
Cc: Russell King - ARM Linux <li...@arm.linux.org.uk>

Stephen Boyd (2):
scripts: Add a recorduidiv program
ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

arch/arm/Kconfig | 14 +
arch/arm/include/asm/cputype.h | 8 +-
arch/arm/include/asm/setup.h | 3 +
arch/arm/kernel/module.c | 40 +++
arch/arm/kernel/setup.c | 68 ++++
arch/arm/kernel/vmlinux.lds.S | 13 +
scripts/Makefile | 1 +
scripts/Makefile.build | 17 +-
scripts/recorduidiv.c | 745 +++++++++++++++++++++++++++++++++++++++++
9 files changed, 906 insertions(+), 3 deletions(-)
create mode 100644 scripts/recorduidiv.c

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,

a Linux Foundation Collaborative Project

Stephen Boyd

unread,

Nov 25, 2015, 4:51:37 PM11/25/15

to linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Russell King - ARM Linux, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Thomas Petazzoni

The ARM compiler inserts calls to __aeabi_uidiv() and
__aeabi_idiv() when it needs to perform division on signed and
unsigned integers. If a processor has support for the udiv and
sdiv division instructions the calls to these support routines
can be replaced with those instructions. Now that recordmcount
records the locations of calls to these library functions in
two sections (one for udiv and one for sdiv), iterate over these
sections early at boot and patch the call sites with the
appropriate division instruction when we determine that the
processor supports the division instructions. Using the division
instructions should be faster and less power intensive than
running the support code.

Cc: Nicolas Pitre <ni...@fluxnic.net>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Måns Rullgård <ma...@mansr.com>
Cc: Thomas Petazzoni <thomas.p...@free-electrons.com>

Signed-off-by: Stephen Boyd <sb...@codeaurora.org>
---
arch/arm/Kconfig | 14 +++++++++
arch/arm/include/asm/cputype.h | 8 ++++-
arch/arm/include/asm/setup.h | 3 ++
arch/arm/kernel/module.c | 40 +++++++++++++++++++++++++
arch/arm/kernel/setup.c | 68 ++++++++++++++++++++++++++++++++++++++++++
arch/arm/kernel/vmlinux.lds.S | 13 ++++++++
6 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0365cbbc9179..aa8bc7da6331 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1617,6 +1617,20 @@ config AEABI

To use this you need GCC version 4.0.0 or later.

+config ARM_PATCH_UIDIV
+ bool "Runtime patch calls to __aeabi_{u}idiv() with udiv/sdiv"

+ depends on CPU_32v7 && !XIP_KERNEL && AEABI

+ help
+ Some v7 CPUs have support for the udiv and sdiv instructions
+ that can be used in place of calls to __aeabi_uidiv and __aeabi_idiv
+ functions provided by the ARM runtime ABI.
+
+ Enabling this option allows the kernel to modify itself to replace
+ branches to these library functions with the udiv and sdiv
+ instructions themselves. Typically this will be faster and less
+ power intensive than running the library support code to do
+ integer division.
+
config OABI_COMPAT
bool "Allow old ABI binaries to run with this kernel (EXPERIMENTAL)"
depends on AEABI && !THUMB2_KERNEL
diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
index 85e374f873ac..48c77d422a0d 100644
--- a/arch/arm/include/asm/cputype.h
+++ b/arch/arm/include/asm/cputype.h
@@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void)

return 0;
}
+
+static inline bool __attribute_const__ cpu_is_pj4_nomp(void)
+{
+ return read_cpuid_part() == 0x56005810;
+}
#else
-#define cpu_is_pj4() 0
+#define cpu_is_pj4() 0
+#define cpu_is_pj4_nomp() 0
#endif

static inline int __attribute_const__ cpuid_feature_extract_field(u32 features,
diff --git a/arch/arm/include/asm/setup.h b/arch/arm/include/asm/setup.h
index e0adb9f1bf94..a514552d5cbd 100644
--- a/arch/arm/include/asm/setup.h
+++ b/arch/arm/include/asm/setup.h
@@ -25,4 +25,7 @@ extern int arm_add_memory(u64 start, u64 size);
extern void early_print(const char *str, ...);
extern void dump_machine_table(void);

+extern void patch_udiv(void *addr, size_t size);
+extern void patch_sdiv(void *addr, size_t size);
+
#endif
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index efdddcb97dd1..aa59e5cfe6a0 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -20,7 +20,9 @@
#include <linux/string.h>
#include <linux/gfp.h>

+#include <asm/hwcap.h>
#include <asm/pgtable.h>
+#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/smp_plat.h>
#include <asm/unwind.h>
@@ -51,6 +53,38 @@ void *module_alloc(unsigned long size)
}
#endif

+#ifdef CONFIG_ARM_PATCH_UIDIV
+static int module_patch_aeabi_uidiv(unsigned long loc, const Elf32_Sym *sym)
+{
+ extern char __aeabi_uidiv[], __aeabi_idiv[];
+ unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
+ unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
+ unsigned int mask;
+
+ if (IS_ENABLED(CONFIG_THUMB2_KERNEL))
+ mask = HWCAP_IDIVT;
+ else
+ mask = HWCAP_IDIVA;
+
+ if (elf_hwcap & mask) {
+ if (sym->st_value == udiv_addr) {
+ patch_udiv(&loc, sizeof(loc));
+ return 1;
+ } else if (sym->st_value == sdiv_addr) {
+ patch_sdiv(&loc, sizeof(loc));
+ return 1;
+ }
+ }
+
+ return 0;
+}
+#else
+static int module_patch_aeabi_uidiv(unsigned long loc, const Elf32_Sym *sym)
+{
+ return 0;
+}
+#endif
+
int
apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
unsigned int relindex, struct module *module)
@@ -109,6 +143,9 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
return -ENOEXEC;
}

+ if (module_patch_aeabi_uidiv(loc, sym))
+ break;
+
offset = __mem_to_opcode_arm(*(u32 *)loc);
offset = (offset & 0x00ffffff) << 2;
if (offset & 0x02000000)
@@ -195,6 +232,9 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
return -ENOEXEC;
}

+ if (module_patch_aeabi_uidiv(loc, sym))
+ break;
+
upper = __mem_to_opcode_thumb16(*(u16 *)loc);
lower = __mem_to_opcode_thumb16(*(u16 *)(loc + 2));

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 20edd349d379..39a46059d5e8 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -32,6 +32,7 @@
#include <linux/compiler.h>
#include <linux/sort.h>
#include <linux/psci.h>
+#include <linux/module.h>

#include <asm/unified.h>
#include <asm/cp15.h>
@@ -375,6 +376,72 @@ void __init early_print(const char *str, ...)
printk("%s", buf);
}

+#ifdef CONFIG_ARM_PATCH_UIDIV
+/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */
+static u32 __attribute_const__ sdiv_instruction(void)
+{
+ if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
+ if (cpu_is_pj4_nomp())
+ return __opcode_to_mem_thumb32(0xee300691);
+ return __opcode_to_mem_thumb32(0xfb90f0f1);
+ }
+
+ if (cpu_is_pj4_nomp())
+ return __opcode_to_mem_arm(0xee300691);
+ return __opcode_to_mem_arm(0xe710f110);
+}
+
+/* "udiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 0" if we're on pj4 w/o MP */
+static u32 __attribute_const__ udiv_instruction(void)
+{
+ if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
+ if (cpu_is_pj4_nomp())
+ return __opcode_to_mem_thumb32(0xee300611);
+ return __opcode_to_mem_thumb32(0xfbb0f0f1);
+ }
+
+ if (cpu_is_pj4_nomp())
+ return __opcode_to_mem_arm(0xee300611);
+ return __opcode_to_mem_arm(0xe730f110);
+}
+
+static void __init_or_module patch(u32 **addr, size_t count, u32 insn)
+{
+ for (; count != 0; count -= 4)
+ **addr++ = insn;
+}
+
+void __init_or_module patch_udiv(void *addr, size_t size)
+{
+ patch(addr, size, udiv_instruction());
+}
+
+void __init_or_module patch_sdiv(void *addr, size_t size)
+{
+ return patch(addr, size, sdiv_instruction());
+}
+
+static void __init patch_aeabi_uidiv(void)
+{
+ extern char __start_udiv_loc[], __stop_udiv_loc[];
+ extern char __start_idiv_loc[], __stop_idiv_loc[];
+ unsigned int mask;
+
+ if (IS_ENABLED(CONFIG_THUMB2_KERNEL))
+ mask = HWCAP_IDIVT;
+ else
+ mask = HWCAP_IDIVA;
+
+ if (!(elf_hwcap & mask))
+ return;
+
+ patch_udiv(__start_udiv_loc, __stop_udiv_loc - __start_udiv_loc);
+ patch_sdiv(__start_idiv_loc, __stop_idiv_loc - __start_idiv_loc);
+}
+#else
+static void __init patch_aeabi_uidiv(void) { }
+#endif
+
static void __init cpuid_init_hwcaps(void)
{
int block;
@@ -642,6 +709,7 @@ static void __init setup_processor(void)
elf_hwcap = list->elf_hwcap;

cpuid_init_hwcaps();
+ patch_aeabi_uidiv();

#ifndef CONFIG_ARM_THUMB
elf_hwcap &= ~(HWCAP_THUMB | HWCAP_IDIVT);
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 8b60fde5ce48..bc87a2e04e6f 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -28,6 +28,18 @@
*(.hyp.idmap.text) \
VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;

+#ifdef CONFIG_ARM_PATCH_UIDIV
+#define UIDIV_REC . = ALIGN(8); \
+ VMLINUX_SYMBOL(__start_udiv_loc) = .; \
+ *(__udiv_loc) \
+ VMLINUX_SYMBOL(__stop_udiv_loc) = .; \
+ VMLINUX_SYMBOL(__start_idiv_loc) = .; \
+ *(__idiv_loc) \
+ VMLINUX_SYMBOL(__stop_idiv_loc) = .;
+#else
+#define UIDIV_REC
+#endif
+
#ifdef CONFIG_HOTPLUG_CPU
#define ARM_CPU_DISCARD(x)
#define ARM_CPU_KEEP(x) x
@@ -210,6 +222,7 @@ SECTIONS
.init.data : {
#ifndef CONFIG_XIP_KERNEL
INIT_DATA
+ UIDIV_REC
#endif
INIT_SETUP(16)
INIT_CALLS

Stephen Boyd

unread,

Nov 25, 2015, 4:51:45 PM11/25/15

to linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Russell King - ARM Linux, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Michal Marek

The ARM compiler inserts calls to __aeabi_uidiv() and
__aeabi_idiv() when it needs to perform division on signed and
unsigned integers. If a processor has support for the udiv and
sdiv division instructions the calls to these support routines

can be replaced with those instructions. Therefore, record the
location of calls to these library functions into two sections
(one for udiv and one for sdiv) similar to how we trace calls to
mcount. When the kernel boots up it will check to see if the
processor supports the instructions and then patch the call sites
with the instruction.

Cc: Nicolas Pitre <ni...@fluxnic.net>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Måns Rullgård <ma...@mansr.com>

Cc: Michal Marek <mma...@suse.com>

Signed-off-by: Stephen Boyd <sb...@codeaurora.org>
---

scripts/Makefile | 1 +
scripts/Makefile.build | 17 +-

scripts/recorduidiv.c | 745 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 761 insertions(+), 2 deletions(-)
create mode 100644 scripts/recorduidiv.c

diff --git a/scripts/Makefile b/scripts/Makefile
index fd0d53d4a234..26b9ebd8be31 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -15,6 +15,7 @@ hostprogs-$(CONFIG_KALLSYMS) += kallsyms
hostprogs-$(CONFIG_LOGO) += pnmtologo
hostprogs-$(CONFIG_VT) += conmakehash
hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
+hostprogs-$(CONFIG_ARM_PATCH_UIDIV) += recorduidiv
hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
hostprogs-$(CONFIG_ASN1) += asn1_compiler
hostprogs-$(CONFIG_MODULE_SIG) += sign-file
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 01df30af4d4a..142e5db29d3b 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -241,12 +241,25 @@ cmd_record_mcount = \
fi;
endif

+ifdef CONFIG_ARM_PATCH_UIDIV
+# Due to recursion, we must skip empty.o.
+# The empty.o file is created in the make process in order to determine
+# the target endianness and word size. It is made before all other C
+# files, including recorduidiv.
+cmd_record_uidiv = \
+ if [ $(@) != "scripts/mod/empty.o" ]; then \
+ $(objtree)/scripts/recorduidiv "$(@)"; \
+ fi;
+recorduidiv_source := $(srctree)/scripts/recorduidiv.c
+endif
+
define rule_cc_o_c
$(call echo-cmd,checksrc) $(cmd_checksrc) \
$(call echo-cmd,cc_o_c) $(cmd_cc_o_c); \
$(cmd_modversions) \
$(call echo-cmd,record_mcount) \
$(cmd_record_mcount) \
+ $(call echo-cmd,record_uidiv) $(cmd_record_uidiv) \
scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' > \
$(dot-target).tmp; \
rm -f $(depfile); \
@@ -254,13 +267,13 @@ define rule_cc_o_c
endef

# Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(recorduidiv_source) FORCE
$(call cmd,force_checksrc)
$(call if_changed_rule,cc_o_c)

# Single-part modules are special since we need to mark them in $(MODVERDIR)

-$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) $(recorduidiv_source) FORCE
$(call cmd,force_checksrc)
$(call if_changed_rule,cc_o_c)
@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
diff --git a/scripts/recorduidiv.c b/scripts/recorduidiv.c
new file mode 100644
index 000000000000..a2af5c46cb6d
--- /dev/null
+++ b/scripts/recorduidiv.c
@@ -0,0 +1,745 @@
+/*
+ * recorduidiv.c: construct a table of the locations of calls to '__aeabi_uidiv'
+ * and '__aeabi_idiv' so that the kernel can replace them with idiv and sdiv
+ * instructions.
+ *
+ * Copyright 2009 John F. Reiser <jre...@BitWagon.com>. All rights reserved.
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ *
+ * Restructured to fit Linux format, as well as other updates:
+ * Copyright 2010 Steven Rostedt <sros...@redhat.com>, Red Hat Inc.
+ *
+ * Copyright (c) 2015 The Linux Foundation. All rights reserved.
+ */
+
+/*
+ * Strategy: alter the .o file in-place.
+ *
+ * Append a new STRTAB that has the new section names, followed by a new array
+ * ElfXX_Shdr[] that has the new section headers, followed by the section
+ * contents for __udiv_loc and __idiv_loc and their relocations. The old
+ * shstrtab strings, and the old ElfXX_Shdr[] array, remain as "garbage"
+ * (commonly, a couple kilobytes.) Subsequent processing by /bin/ld (or the
+ * kernel module loader) will ignore the garbage regions, because they are not
+ * designated by the new .e_shoff nor the new ElfXX_Shdr[]. [In order to
+ * remove the garbage, then use "ld -r" to create a new file that omits the
+ * garbage.]
+ */
+
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+# define _align 3u
+# define _size 4
+
+#ifndef R_ARM_CALL
+#define R_ARM_CALL 28
+#endif
+
+#ifndef R_ARM_JUMP24
+#define R_ARM_JUMP24 29
+#endif
+
+static int fd_map; /* File descriptor for file being modified. */
+static int mmap_failed; /* Boolean flag. */
+static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */
+static struct stat sb; /* Remember .st_size, etc. */
+static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */
+
+/* setjmp() return values */
+enum {
+ SJ_SETJMP = 0, /* hardwired first return */
+ SJ_FAIL,
+ SJ_SUCCEED
+};
+
+/* Per-file resource cleanup when multiple files. */
+static void
+cleanup(void)
+{
+ if (!mmap_failed)
+ munmap(ehdr_curr, sb.st_size);
+ else
+ free(ehdr_curr);
+ close(fd_map);
+}
+
+static void __attribute__((noreturn))
+fail_file(void)
+{
+ cleanup();
+ longjmp(jmpenv, SJ_FAIL);
+}
+
+static void __attribute__((noreturn))
+succeed_file(void)
+{
+ cleanup();
+ longjmp(jmpenv, SJ_SUCCEED);
+}
+
+/* ulseek, uread, ...: Check return value for errors. */
+
+static off_t
+ulseek(int const fd, off_t const offset, int const whence)
+{
+ off_t const w = lseek(fd, offset, whence);
+ if (w == (off_t)-1) {
+ perror("lseek");
+ fail_file();
+ }
+ return w;
+}
+
+static size_t
+uread(int const fd, void *const buf, size_t const count)
+{
+ size_t const n = read(fd, buf, count);
+ if (n != count) {
+ perror("read");
+ fail_file();
+ }
+ return n;
+}
+
+static size_t
+uwrite(int const fd, void const *const buf, size_t const count)
+{
+ size_t const n = write(fd, buf, count);
+ if (n != count) {
+ perror("write");
+ fail_file();
+ }
+ return n;
+}
+
+static void *
+umalloc(size_t size)
+{
+ void *const addr = malloc(size);
+ if (addr == 0) {
+ fprintf(stderr, "malloc failed: %zu bytes\n", size);
+ fail_file();
+ }
+ return addr;
+}
+
+/*
+ * Get the whole file as a programming convenience in order to avoid
+ * malloc+lseek+read+free of many pieces. If successful, then mmap
+ * avoids copying unused pieces; else just read the whole file.
+ * Open for both read and write; new info will be appended to the file.
+ * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
+ * do not propagate to the file until an explicit overwrite at the last.
+ * This preserves most aspects of consistency (all except .st_size)
+ * for simultaneous readers of the file while we are appending to it.
+ * However, multiple writers still are bad. We choose not to use
+ * locking because it is expensive and the use case of kernel build
+ * makes multiple writers unlikely.
+ */
+static void *mmap_file(char const *fname)
+{
+ void *addr;
+
+ fd_map = open(fname, O_RDWR);
+ if (fd_map < 0 || fstat(fd_map, &sb) < 0) {
+ perror(fname);
+ fail_file();
+ }
+ if (!S_ISREG(sb.st_mode)) {
+ fprintf(stderr, "not a regular file: %s\n", fname);
+ fail_file();
+ }
+ addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
+ fd_map, 0);
+ mmap_failed = 0;
+ if (addr == MAP_FAILED) {
+ mmap_failed = 1;
+ addr = umalloc(sb.st_size);
+ uread(fd_map, addr, sb.st_size);
+ }
+ return addr;
+}
+
+/* w8rev, w8nat, ...: Handle endianness. */
+
+static uint64_t w8rev(uint64_t const x)
+{
+ return ((0xff & (x >> (0 * 8))) << (7 * 8))
+ | ((0xff & (x >> (1 * 8))) << (6 * 8))
+ | ((0xff & (x >> (2 * 8))) << (5 * 8))
+ | ((0xff & (x >> (3 * 8))) << (4 * 8))
+ | ((0xff & (x >> (4 * 8))) << (3 * 8))
+ | ((0xff & (x >> (5 * 8))) << (2 * 8))
+ | ((0xff & (x >> (6 * 8))) << (1 * 8))
+ | ((0xff & (x >> (7 * 8))) << (0 * 8));
+}
+
+static uint32_t w4rev(uint32_t const x)
+{
+ return ((0xff & (x >> (0 * 8))) << (3 * 8))
+ | ((0xff & (x >> (1 * 8))) << (2 * 8))
+ | ((0xff & (x >> (2 * 8))) << (1 * 8))
+ | ((0xff & (x >> (3 * 8))) << (0 * 8));
+}
+
+static uint32_t w2rev(uint16_t const x)
+{
+ return ((0xff & (x >> (0 * 8))) << (1 * 8))
+ | ((0xff & (x >> (1 * 8))) << (0 * 8));
+}
+
+static uint64_t w8nat(uint64_t const x)
+{
+ return x;
+}
+
+static uint32_t w4nat(uint32_t const x)
+{
+ return x;
+}
+
+static uint32_t w2nat(uint16_t const x)
+{
+ return x;
+}
+
+static uint64_t (*w8)(uint64_t);
+static uint32_t (*w)(uint32_t);
+static uint32_t (*w2)(uint16_t);
+
+/* Names of the sections that could contain calls to __aeabi_{u}idiv() */
+static int is_valid_section_name(char const *const txtname)
+{
+ return strcmp(".text", txtname) == 0 ||
+ strcmp(".ref.text", txtname) == 0 ||
+ strcmp(".sched.text", txtname) == 0 ||
+ strcmp(".spinlock.text", txtname) == 0 ||
+ strcmp(".irqentry.text", txtname) == 0 ||
+ strcmp(".kprobes.text", txtname) == 0 ||
+ strcmp(".text.unlikely", txtname) == 0 ||
+ strcmp(".init.text", txtname) == 0;
+}
+
+static uint32_t Elf32_r_sym(Elf32_Rel const *rp)
+{
+ return ELF32_R_SYM(w(rp->r_info));
+}
+
+static void append_section(uint32_t const *const mloc0,
+ uint32_t const *const mlocp,
+ Elf32_Rel const *const mrel0,
+ Elf32_Rel const *const mrelp,
+ char const *name,
+ unsigned int const rel_entsize,
+ unsigned int const symsec_sh_link,
+ uint32_t *name_offp,
+ uint32_t *tp,
+ unsigned *shnump
+ )
+{
+ Elf32_Shdr mcsec;
+ uint32_t name_off = *name_offp;
+ uint32_t t = *tp;
+ uint32_t loc_diff = (void *)mlocp - (void *)mloc0;
+ uint32_t rel_diff = (void *)mrelp - (void *)mrel0;
+ unsigned shnum = *shnump;
+
+ mcsec.sh_name = w((sizeof(Elf32_Rela) == rel_entsize) + strlen(".rel")
+ + name_off);
+ mcsec.sh_type = w(SHT_PROGBITS);
+ mcsec.sh_flags = w(SHF_ALLOC);
+ mcsec.sh_addr = 0;
+ mcsec.sh_offset = w(t);
+ mcsec.sh_size = w(loc_diff);
+ mcsec.sh_link = 0;
+ mcsec.sh_info = 0;
+ mcsec.sh_addralign = w(_size);
+ mcsec.sh_entsize = w(_size);
+ uwrite(fd_map, &mcsec, sizeof(mcsec));
+ t += loc_diff;
+
+ mcsec.sh_name = w(name_off);
+ mcsec.sh_type = (sizeof(Elf32_Rela) == rel_entsize)
+ ? w(SHT_RELA)
+ : w(SHT_REL);
+ mcsec.sh_flags = 0;
+ mcsec.sh_addr = 0;
+ mcsec.sh_offset = w(t);
+ mcsec.sh_size = w(rel_diff);
+ mcsec.sh_link = w(symsec_sh_link);
+ mcsec.sh_info = w(shnum);
+ mcsec.sh_addralign = w(_size);
+ mcsec.sh_entsize = w(rel_entsize);
+ uwrite(fd_map, &mcsec, sizeof(mcsec));
+ t += rel_diff;
+
+ shnum += 2;
+ name_off += strlen(name) + 1;
+
+ *tp = t;
+ *shnump = shnum;
+ *name_offp = name_off;
+}
+
+/*
+ * Append the new shstrtab, Elf32_Shdr[], __{udiv,idiv}_loc and their
+ * relocations.
+ */
+static void append_func(Elf32_Ehdr *const ehdr,
+ Elf32_Shdr *const shstr,
+ uint32_t const *const mloc0_u,
+ uint32_t const *const mlocp_u,
+ Elf32_Rel const *const mrel0_u,
+ Elf32_Rel const *const mrelp_u,
+ uint32_t const *const mloc0_i,
+ uint32_t const *const mlocp_i,
+ Elf32_Rel const *const mrel0_i,
+ Elf32_Rel const *const mrelp_i,
+ unsigned int const rel_entsize,
+ unsigned int const symsec_sh_link)
+{
+ /* Begin constructing output file */
+ char const *udiv_name = (sizeof(Elf32_Rela) == rel_entsize)
+ ? ".rela__udiv_loc"
+ : ".rel__udiv_loc";
+ char const *idiv_name = (sizeof(Elf32_Rela) == rel_entsize)
+ ? ".rela__idiv_loc"
+ : ".rel__idiv_loc";
+ unsigned old_shnum = w2(ehdr->e_shnum);
+ uint32_t const old_shoff = w(ehdr->e_shoff);
+ uint32_t const old_shstr_sh_size = w(shstr->sh_size);
+ uint32_t const old_shstr_sh_offset = w(shstr->sh_offset);
+ uint32_t new_e_shoff;
+ uint32_t t = w(shstr->sh_size);
+ uint32_t name_off = old_shstr_sh_size;
+ int udiv = 0, idiv = 0;
+ int num_sections;
+
+ if (mlocp_u != mloc0_u) {
+ t += 1 + strlen(udiv_name);
+ udiv = 1;
+ }
+ if (mlocp_i != mloc0_i) {
+ t += 1 + strlen(idiv_name);
+ idiv = 1;
+ }
+ num_sections = (udiv * 2) + (idiv * 2);
+
+ shstr->sh_size = w(t);
+ shstr->sh_offset = w(sb.st_size);
+ t += sb.st_size;
+ t += (_align & -t); /* word-byte align */
+ new_e_shoff = t;
+
+ /* body for new shstrtab */
+ ulseek(fd_map, sb.st_size, SEEK_SET);
+ uwrite(fd_map, old_shstr_sh_offset + (void *)ehdr, name_off);
+ if (udiv)
+ uwrite(fd_map, udiv_name, 1 + strlen(udiv_name));
+ if (idiv)
+ uwrite(fd_map, idiv_name, 1 + strlen(idiv_name));
+
+ /* old(modified) Elf32_Shdr table, word-byte aligned */
+ ulseek(fd_map, t, SEEK_SET);
+ t += sizeof(Elf32_Shdr) * old_shnum;
+ uwrite(fd_map, old_shoff + (void *)ehdr,
+ sizeof(Elf32_Shdr) * old_shnum);
+
+ t += num_sections * sizeof(Elf32_Shdr);
+
+ /* new sections __udiv_loc and .rel__udiv_loc */
+ if (udiv)
+ append_section(mloc0_u, mlocp_u, mrel0_u, mrelp_u, udiv_name,
+ rel_entsize, symsec_sh_link, &name_off, &t,
+ &old_shnum);
+
+ /* new sections __idiv_loc and .rel__idiv_loc */
+ if (idiv)
+ append_section(mloc0_i, mlocp_i, mrel0_i, mrelp_i, idiv_name,
+ rel_entsize, symsec_sh_link, &name_off, &t,
+ &old_shnum);
+
+ if (udiv) {
+ uwrite(fd_map, mloc0_u, (void *)mlocp_u - (void *)mloc0_u);
+ uwrite(fd_map, mrel0_u, (void *)mrelp_u - (void *)mrel0_u);
+ }
+ if (idiv) {
+ uwrite(fd_map, mloc0_i, (void *)mlocp_i - (void *)mloc0_i);
+ uwrite(fd_map, mrel0_i, (void *)mrelp_i - (void *)mrel0_i);
+ }
+
+ ehdr->e_shoff = w(new_e_shoff);
+ ehdr->e_shnum = w2(num_sections + w2(ehdr->e_shnum));
+ ulseek(fd_map, 0, SEEK_SET);
+ uwrite(fd_map, ehdr, sizeof(*ehdr));
+}
+
+static unsigned get_sym(Elf32_Sym const *const sym0,
+ Elf32_Rel const *relp,
+ char const *const str0, const char *find)
+{
+ unsigned sym = 0;
+ Elf32_Sym const *const symp = &sym0[Elf32_r_sym(relp)];
+ char const *symname = &str0[w(symp->st_name)];
+
+ if (strcmp(find, symname) == 0)
+ sym = Elf32_r_sym(relp);
+
+ return sym;
+}
+
+static void get_sym_str_and_relp(Elf32_Shdr const *const relhdr,
+ Elf32_Ehdr const *const ehdr,
+ Elf32_Sym const **sym0,
+ char const **str0,
+ Elf32_Rel const **relp)
+{
+ Elf32_Shdr *const shdr0 = (Elf32_Shdr *)(w(ehdr->e_shoff)
+ + (void *)ehdr);
+ unsigned const symsec_sh_link = w(relhdr->sh_link);
+ Elf32_Shdr const *const symsec = &shdr0[symsec_sh_link];
+ Elf32_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
+ Elf32_Rel const *const rel0 = (Elf32_Rel const *)(w(relhdr->sh_offset)
+ + (void *)ehdr);
+
+ *sym0 = (Elf32_Sym const *)(w(symsec->sh_offset)
+ + (void *)ehdr);
+
+ *str0 = (char const *)(w(strsec->sh_offset)
+ + (void *)ehdr);
+
+ *relp = rel0;
+}
+
+static void add_relocation(Elf32_Rel const *relp, uint32_t *mloc0, uint32_t **mlocpp,
+ uint32_t const recval, unsigned const recsym,
+ Elf32_Rel **const mrelpp, unsigned offbase,
+ unsigned rel_entsize)
+{
+ uint32_t *mlocp = *mlocpp;
+ Elf32_Rel *mrelp = *mrelpp;
+ uint32_t const addend = w(w(relp->r_offset) - recval);
+ mrelp->r_offset = w(offbase + ((void *)mlocp - (void *)mloc0));
+ mrelp->r_info = w(ELF32_R_INFO(recsym, R_ARM_ABS32));
+ if (rel_entsize == sizeof(Elf32_Rela)) {
+ ((Elf32_Rela *)mrelp)->r_addend = addend;
+ *mlocp++ = 0;
+ } else
+ *mlocp++ = addend;
+
+ *mlocpp = mlocp;
+ *mrelpp = (Elf32_Rel *)(rel_entsize + (void *)mrelp);
+}
+
+/*
+ * Look at the relocations in order to find the calls to __aeabi_{u}idiv.
+ * Accumulate the section offsets that are found, and their relocation info,
+ * onto the end of the existing arrays.
+ */
+static void sift_relocations(uint32_t **mlocpp_u, uint32_t *mloc_base_u,
+ Elf32_Rel **const mrelpp_u,
+ uint32_t **mlocpp_i,
+ uint32_t *mloc_base_i,
+ Elf32_Rel **const mrelpp_i,
+ Elf32_Shdr const *const relhdr,
+ Elf32_Ehdr const *const ehdr,
+ unsigned const recsym, uint32_t const recval)
+{
+ uint32_t *mlocp_u = *mlocpp_u;
+ unsigned const offbase_u = (void *)mlocp_u - (void *)mloc_base_u;
+ uint32_t *const mloc0_u = mlocp_u;
+ Elf32_Rel *mrelp_u = *mrelpp_u;
+ uint32_t *mlocp_i = *mlocpp_i;
+ unsigned const offbase_i = (void *)mlocp_i - (void *)mloc_base_i;
+ uint32_t *const mloc0_i = mlocp_i;
+ Elf32_Rel *mrelp_i = *mrelpp_i;
+ Elf32_Sym const *sym0;
+ char const *str0;
+ Elf32_Rel const *relp;
+ unsigned rel_entsize = w(relhdr->sh_entsize);
+ unsigned const nrel = w(relhdr->sh_size) / rel_entsize;
+ unsigned udiv_sym = 0, idiv_sym = 0;
+ unsigned t;
+
+ get_sym_str_and_relp(relhdr, ehdr, &sym0, &str0, &relp);
+
+ for (t = nrel; t; --t) {
+ if (!udiv_sym)
+ udiv_sym = get_sym(sym0, relp, str0,
+ "__aeabi_uidiv");
+ if (!idiv_sym)
+ idiv_sym = get_sym(sym0, relp, str0,
+ "__aeabi_idiv");
+
+ switch (relp->r_info & 0xff) {
+ case R_ARM_PC24:
+ case R_ARM_CALL:
+ case R_ARM_JUMP24:
+ if (udiv_sym == Elf32_r_sym(relp)) {
+ add_relocation(relp, mloc0_u, &mlocp_u, recval,
+ recsym, &mrelp_u, offbase_u,
+ rel_entsize);
+ } else if (idiv_sym == Elf32_r_sym(relp)) {
+ add_relocation(relp, mloc0_i, &mlocp_i, recval,
+ recsym, &mrelp_i, offbase_i,
+ rel_entsize);
+ }
+ }
+
+ relp = (Elf32_Rel const *)(rel_entsize + (void *)relp);
+ }
+ *mrelpp_i = mrelp_i;
+ *mrelpp_u = mrelp_u;
+ *mlocpp_i = mlocp_i;
+ *mlocpp_u = mlocp_u;
+}
+
+/*
+ * Find a symbol in the given section, to be used as the base for relocating
+ * the table of offsets of calls. A local or global symbol suffices,
+ * but avoid a Weak symbol because it may be overridden; the change in value
+ * would invalidate the relocations of the offsets of the calls.
+ * Often the found symbol will be the unnamed local symbol generated by
+ * GNU 'as' for the start of each section. For example:
+ * Num: Value Size Type Bind Vis Ndx Name
+ * 2: 00000000 0 SECTION LOCAL DEFAULT 1
+ */
+static unsigned find_secsym_ndx(unsigned const txtndx,
+ char const *const txtname,
+ uint32_t *const recvalp,
+ Elf32_Shdr const *const symhdr,
+ Elf32_Ehdr const *const ehdr)
+{
+ Elf32_Sym const *const sym0 = (Elf32_Sym const *)(w(symhdr->sh_offset)
+ + (void *)ehdr);
+ unsigned const nsym = w(symhdr->sh_size) / w(symhdr->sh_entsize);
+ Elf32_Sym const *symp;
+ unsigned t;
+
+ for (symp = sym0, t = nsym; t; --t, ++symp) {
+ unsigned int const st_bind = ELF32_ST_BIND(symp->st_info);
+
+ if (txtndx == w2(symp->st_shndx)
+ /* avoid STB_WEAK */
+ && (STB_LOCAL == st_bind || STB_GLOBAL == st_bind)) {
+ /* function symbols on ARM have quirks, avoid them */
+ if (ELF32_ST_TYPE(symp->st_info) == STT_FUNC)
+ continue;
+
+ *recvalp = w(symp->st_value);
+ return symp - sym0;
+ }
+ }
+ fprintf(stderr, "Cannot find symbol for section %d: %s.\n",
+ txtndx, txtname);
+ fail_file();
+}
+
+
+/* Evade ISO C restriction: no declaration after statement in has_rel. */
+static char const *
+__has_rel(Elf32_Shdr const *const relhdr, /* is SHT_REL or SHT_RELA */
+ Elf32_Shdr const *const shdr0,
+ char const *const shstrtab,
+ char const *const fname)
+{
+ /* .sh_info depends on .sh_type == SHT_REL[,A] */
+ Elf32_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
+ char const *const txtname = &shstrtab[w(txthdr->sh_name)];
+
+ if (strcmp("__idiv_loc", txtname) == 0 ||
+ strcmp("__udiv_loc", txtname) == 0)
+ succeed_file();
+ if (w(txthdr->sh_type) != SHT_PROGBITS ||
+ !(w(txthdr->sh_flags) & SHF_EXECINSTR))
+ return NULL;
+ return txtname;
+}
+
+static char const *has_rel(Elf32_Shdr const *const relhdr,
+ Elf32_Shdr const *const shdr0,
+ char const *const shstrtab,
+ char const *const fname)
+{
+ if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
+ return NULL;
+ return __has_rel(relhdr, shdr0, shstrtab, fname);
+}
+
+
+static unsigned tot_relsize(Elf32_Shdr const *const shdr0,
+ unsigned nhdr,
+ const char *const shstrtab,
+ const char *const fname)
+{
+ unsigned totrelsz = 0;
+ Elf32_Shdr const *shdrp = shdr0;
+ char const *txtname;
+
+ for (; nhdr; --nhdr, ++shdrp) {
+ txtname = has_rel(shdrp, shdr0, shstrtab, fname);
+ if (txtname && is_valid_section_name(txtname))
+ totrelsz += w(shdrp->sh_size);
+ }
+ return totrelsz;
+}
+
+
+/* Overall supervision for Elf32 ET_REL file. */
+static void
+do_func(Elf32_Ehdr *const ehdr, char const *const fname)
+{
+ Elf32_Shdr *const shdr0 = (Elf32_Shdr *)(w(ehdr->e_shoff)
+ + (void *)ehdr);
+ unsigned const nhdr = w2(ehdr->e_shnum);
+ Elf32_Shdr *const shstr = &shdr0[w2(ehdr->e_shstrndx)];
+ char const *const shstrtab = (char const *)(w(shstr->sh_offset)
+ + (void *)ehdr);
+
+ Elf32_Shdr const *relhdr;
+ unsigned k;
+
+ /* Upper bound on space: assume all relevant relocs are valid. */
+ unsigned const totrelsz = tot_relsize(shdr0, nhdr, shstrtab, fname);
+ Elf32_Rel *const mrel0_u = umalloc(totrelsz);
+ Elf32_Rel * mrelp_u = mrel0_u;
+
+ /* 2*sizeof(address) <= sizeof(Elf32_Rel) */
+ uint32_t *const mloc0_u = umalloc(totrelsz>>1);
+ uint32_t * mlocp_u = mloc0_u;
+
+ Elf32_Rel *const mrel0_i = umalloc(totrelsz);
+ Elf32_Rel * mrelp_i = mrel0_i;
+
+ /* 2*sizeof(address) <= sizeof(Elf32_Rel) */
+ uint32_t *const mloc0_i = umalloc(totrelsz>>1);
+ uint32_t * mlocp_i = mloc0_i;
+
+ unsigned rel_entsize = 0;
+ unsigned symsec_sh_link = 0;
+
+ for (relhdr = shdr0, k = nhdr; k; --k, ++relhdr) {
+ char const *const txtname = has_rel(relhdr, shdr0, shstrtab,
+ fname);
+ if (txtname && is_valid_section_name(txtname)) {
+ uint32_t recval = 0;
+ unsigned const recsym = find_secsym_ndx(
+ w(relhdr->sh_info), txtname, &recval,
+ &shdr0[symsec_sh_link = w(relhdr->sh_link)],
+ ehdr);
+
+ rel_entsize = w(relhdr->sh_entsize);
+ sift_relocations(&mlocp_u, mloc0_u, &mrelp_u,
+ &mlocp_i, mloc0_i, &mrelp_i,
+ relhdr, ehdr, recsym, recval);
+ }
+ }
+ if (mloc0_u != mlocp_u || mloc0_i != mlocp_i) {
+ append_func(ehdr, shstr, mloc0_u, mlocp_u, mrel0_u, mrelp_u,
+ mloc0_i, mlocp_i, mrel0_i, mrelp_i,
+ rel_entsize, symsec_sh_link);
+ }
+ free(mrel0_u);
+ free(mloc0_u);
+ free(mrel0_i);
+ free(mloc0_i);
+}
+
+static void
+do_file(char const *const fname)
+{
+ Elf32_Ehdr *const ehdr = mmap_file(fname);
+
+ ehdr_curr = ehdr;
+ w = w4nat;
+ w2 = w2nat;
+ w8 = w8nat;
+ switch (ehdr->e_ident[EI_DATA]) {
+ static unsigned int const endian = 1;
+ default:
+ fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
+ ehdr->e_ident[EI_DATA], fname);
+ fail_file();
+ break;
+ case ELFDATA2LSB:
+ if (*(unsigned char const *)&endian != 1) {
+ /* main() is big endian, file.o is little endian. */
+ w = w4rev;
+ w2 = w2rev;
+ w8 = w8rev;
+ }
+ break;
+ case ELFDATA2MSB:
+ if (*(unsigned char const *)&endian != 0) {
+ /* main() is little endian, file.o is big endian. */
+ w = w4rev;
+ w2 = w2rev;
+ w8 = w8rev;
+ }
+ break;
+ } /* end switch */
+ if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
+ || w2(ehdr->e_type) != ET_REL
+ || ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
+ fprintf(stderr, "unrecognized ET_REL file %s\n", fname);
+ fail_file();
+ }
+
+ if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
+ || w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
+ fprintf(stderr,
+ "unrecognized ET_REL file: %s\n", fname);
+ fail_file();
+ }
+ do_func(ehdr, fname);
+
+ cleanup();
+}
+
+int
+main(int argc, char *argv[])
+{
+ int n_error = 0; /* gcc-4.3.0 false positive complaint */
+ int i;
+
+ if (argc < 2) {
+ fprintf(stderr, "usage: recorduidiv file.o...\n");
+ return 0;
+ }
+
+ /* Process each file in turn, allowing deep failure. */
+ for (i = 1; i < argc; i++) {
+ char *file = argv[i];
+ int const sjval = setjmp(jmpenv);
+
+ switch (sjval) {
+ default:
+ fprintf(stderr, "internal error: %s\n", file);
+ exit(1);
+ break;
+ case SJ_SETJMP: /* normal sequence */
+ /* Avoid problems if early cleanup() */
+ fd_map = -1;
+ ehdr_curr = NULL;
+ mmap_failed = 1;
+ do_file(file);
+ break;
+ case SJ_FAIL: /* error in do_file or below */
+ ++n_error;
+ break;
+ case SJ_SUCCEED: /* premature success */
+ /* do nothing */
+ break;
+ } /* end switch */
+ }
+ return !!n_error;
+}

Nicolas Pitre

unread,

Nov 25, 2015, 6:24:27 PM11/25/15

to Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Russell King - ARM Linux, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Thomas Petazzoni

On Wed, 25 Nov 2015, Stephen Boyd wrote:

> The ARM compiler inserts calls to __aeabi_uidiv() and
> __aeabi_idiv() when it needs to perform division on signed and
> unsigned integers. If a processor has support for the udiv and
> sdiv division instructions the calls to these support routines
> can be replaced with those instructions. Now that recordmcount
> records the locations of calls to these library functions in
> two sections (one for udiv and one for sdiv), iterate over these
> sections early at boot and patch the call sites with the
> appropriate division instruction when we determine that the
> processor supports the division instructions. Using the division
> instructions should be faster and less power intensive than
> running the support code.

A few remarks:

1) The code assumes unconditional branches to __aeabi_idiv and
__aeabi_uidiv. What if there are conditional branches? Also, tail
call optimizations will generate a straight b opcode rather than a bl
and patching those will obviously have catastrophic results. I think
you should validate the presence of a bl before patching over it.

2) For those cases where a call to __aeabi_uidiv and __aeabi_idiv is not
patched due to (1), you could patch a uidiv/idiv plus "bx lr"
at those function entry points too.

3) In fact I was wondering if the overhead of the branch and back is
really significant compared to the non trivial cost of a idiv
instruction and all the complex infrastructure required to patch
those branches directly, and consequently if the performance
difference is actually worth it versus simply doing (2) alone.

4) Please add some printing to the boot log (debug level should be fine)
about the fact that you did modify n branch instances with a div
insn. That _could_ turn out to be a useful clue when debugging kernel
"strangeties".

Russell King - ARM Linux

unread,

Nov 25, 2015, 6:47:35 PM11/25/15

to Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Michal Marek

On Wed, Nov 25, 2015 at 01:51:03PM -0800, Stephen Boyd wrote:
> The ARM compiler inserts calls to __aeabi_uidiv() and
> __aeabi_idiv() when it needs to perform division on signed and
> unsigned integers. If a processor has support for the udiv and
> sdiv division instructions the calls to these support routines
> can be replaced with those instructions. Therefore, record the
> location of calls to these library functions into two sections
> (one for udiv and one for sdiv) similar to how we trace calls to
> mcount. When the kernel boots up it will check to see if the
> processor supports the instructions and then patch the call sites
> with the instruction.

Do we have any resolution on these programs which modify the object
files in-place, rather than breaking any hard-links which may be
present (eg, as a result of using ccache in hard-link mode) ?

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,

Nov 25, 2015, 7:06:15 PM11/25/15

to Nicolas Pitre, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Thomas Petazzoni

On Wed, Nov 25, 2015 at 06:09:13PM -0500, Nicolas Pitre wrote:
> 3) In fact I was wondering if the overhead of the branch and back is
> really significant compared to the non trivial cost of a idiv
> instruction and all the complex infrastructure required to patch
> those branches directly, and consequently if the performance
> difference is actually worth it versus simply doing (2) alone.

I definitely agree with you on this, given that modern CPUs which
are going to be benefitting from idiv are modern CPUs with a branch
predictor (and if it's not predicting such unconditional calls and
returns it's not much use as a branch predictor!)

I think what we need to see is the performance of existing kernels,
vs patching the idiv instructions at every callsite, vs patching
the called function itself.

Any reason the above aren't marked with __init_or_module as well, as
the compiler can choose not to inline them?

I'm left really concerned about this. We're modifying code with all
the caches on, and the above is not only missing any coherency of the
I/D paths, it's also missing any branch predictor maintanence. So, if
we've executed any divisions at this point, the predictor could already
predicted one of these branches that's being modified.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Måns Rullgård

unread,

Nov 25, 2015, 7:08:06 PM11/25/15

to Nicolas Pitre, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Russell King - ARM Linux, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

Nicolas Pitre <ni...@fluxnic.net> writes:

> On Wed, 25 Nov 2015, Stephen Boyd wrote:
>
>> The ARM compiler inserts calls to __aeabi_uidiv() and
>> __aeabi_idiv() when it needs to perform division on signed and
>> unsigned integers. If a processor has support for the udiv and
>> sdiv division instructions the calls to these support routines
>> can be replaced with those instructions. Now that recordmcount
>> records the locations of calls to these library functions in
>> two sections (one for udiv and one for sdiv), iterate over these
>> sections early at boot and patch the call sites with the
>> appropriate division instruction when we determine that the
>> processor supports the division instructions. Using the division
>> instructions should be faster and less power intensive than
>> running the support code.
>
> A few remarks:
>
> 1) The code assumes unconditional branches to __aeabi_idiv and
> __aeabi_uidiv. What if there are conditional branches? Also, tail
> call optimizations will generate a straight b opcode rather than a bl
> and patching those will obviously have catastrophic results. I think
> you should validate the presence of a bl before patching over it.

I did a quick check on a compiled kernel I had nearby, and there are no
conditional or tail calls to those functions, so although they should
obviously be checked for correctness, performance is unlikely to matter
for those.

However, there are almost half as many calls to __aeabi_{u}idivmod as to
the plain div functions, 129 vs 228 for signed and unsigned combined.
For best results, these functions should also be patched with the
hardware instructions. Obviously the call sites for these can't be
patched.

> 2) For those cases where a call to __aeabi_uidiv and __aeabi_idiv is not
> patched due to (1), you could patch a uidiv/idiv plus "bx lr"
> at those function entry points too.
>
> 3) In fact I was wondering if the overhead of the branch and back is
> really significant compared to the non trivial cost of a idiv
> instruction and all the complex infrastructure required to patch
> those branches directly, and consequently if the performance
> difference is actually worth it versus simply doing (2) alone.

Depending on the operands, the div instruction can take as few as 3
cycles on a Cortex-A7.

--
Måns Rullgård
ma...@mansr.com

Russell King - ARM Linux

unread,

Nov 25, 2015, 7:09:47 PM11/25/15

to Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård, Thomas Petazzoni

On Wed, Nov 25, 2015 at 01:51:04PM -0800, Stephen Boyd wrote:
> diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> index 85e374f873ac..48c77d422a0d 100644
> --- a/arch/arm/include/asm/cputype.h
> +++ b/arch/arm/include/asm/cputype.h
> @@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void)
>
> return 0;
> }
> +
> +static inline bool __attribute_const__ cpu_is_pj4_nomp(void)
> +{
> + return read_cpuid_part() == 0x56005810;

One other thing here, we really ought to avoid adding more magic numbers
to this file. We have the ARM processors nicely #defined, but not a lot
else. Maybe its time that we continued with the nice definitions for
these part numbers?

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Nicolas Pitre

unread,

Nov 25, 2015, 7:44:59 PM11/25/15

to Måns Rullgård, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Russell King - ARM Linux, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

I'm more worried about correctness than performance. This kind of
patching should be done defensively.

> However, there are almost half as many calls to __aeabi_{u}idivmod as to
> the plain div functions, 129 vs 228 for signed and unsigned combined.
> For best results, these functions should also be patched with the
> hardware instructions. Obviously the call sites for these can't be
> patched.

Current __aeabi_{u}idivmod implementations are simple wrappers around a
call to __aeabi_{u}idiv so they'd get the benefit automatically
regardless of the chosen approach.

> > 2) For those cases where a call to __aeabi_uidiv and __aeabi_idiv is not
> > patched due to (1), you could patch a uidiv/idiv plus "bx lr"
> > at those function entry points too.
> >
> > 3) In fact I was wondering if the overhead of the branch and back is
> > really significant compared to the non trivial cost of a idiv
> > instruction and all the complex infrastructure required to patch
> > those branches directly, and consequently if the performance
> > difference is actually worth it versus simply doing (2) alone.
>
> Depending on the operands, the div instruction can take as few as 3
> cycles on a Cortex-A7.

Even the current software based implementation can produce a result with
about 5 simple ALU instructions depending on the operands.

The average cycle count is more important than the easy-way-out case.
And then how significant the two branches around it are compared to idiv
alone from direct patching of every call to it.

Nicolas

Måns Rullgård

unread,

Nov 25, 2015, 7:50:55 PM11/25/15

to Nicolas Pitre, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Russell King - ARM Linux, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

If not calling the function saves an I-cache miss, the benefit can be
substantial. No, I have no proof of this being a problem, but it's
something that could happen.

Of course, none of this is going to be as good as letting the compiler
generate div instructions directly.

Russell King - ARM Linux

unread,

Nov 25, 2015, 8:29:28 PM11/25/15

to Måns Rullgård, Nicolas Pitre, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

On Thu, Nov 26, 2015 at 12:50:08AM +0000, Måns Rullgård wrote:
> If not calling the function saves an I-cache miss, the benefit can be
> substantial. No, I have no proof of this being a problem, but it's
> something that could happen.

That's a simplistic view of modern CPUs.

As I've already said, modern CPUs which have branch prediction, but
they also have speculative instruction fetching and speculative data
prefetching - which the CPUs which have idiv support will have.

With such features, the branch predictor is able to learn that the
branch will be taken, and because of the speculative instruction
fetching, it can bring the cache line in so that it has the
instructions it needs with minimal or, if working correctly,
without stalling the CPU pipeline.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Måns Rullgård

unread,

Nov 25, 2015, 9:20:07 PM11/25/15

to Russell King - ARM Linux, Nicolas Pitre, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

Russell King - ARM Linux <li...@arm.linux.org.uk> writes:

> On Thu, Nov 26, 2015 at 12:50:08AM +0000, Måns Rullgård wrote:
>> If not calling the function saves an I-cache miss, the benefit can be
>> substantial. No, I have no proof of this being a problem, but it's
>> something that could happen.
>
> That's a simplistic view of modern CPUs.
>
> As I've already said, modern CPUs which have branch prediction, but
> they also have speculative instruction fetching and speculative data
> prefetching - which the CPUs which have idiv support will have.
>
> With such features, the branch predictor is able to learn that the
> branch will be taken, and because of the speculative instruction
> fetching, it can bring the cache line in so that it has the
> instructions it needs with minimal or, if working correctly,
> without stalling the CPU pipeline.

It doesn't matter how many fancy features the CPU has. Executing more
branches and using more cache lines puts additional pressure on those
resources, reducing overall performance. Besides, the performance
counters readily show that the prediction is nothing near as perfect as
you seem to believe.

--
Måns Rullgård
ma...@mansr.com

Nicolas Pitre

unread,

Nov 26, 2015, 12:33:07 AM11/26/15

to Måns Rullgård, Russell King - ARM Linux, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

On Thu, 26 Nov 2015, Måns Rullgård wrote:

> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
>
> > On Thu, Nov 26, 2015 at 12:50:08AM +0000, Måns Rullgård wrote:
> >> If not calling the function saves an I-cache miss, the benefit can be
> >> substantial. No, I have no proof of this being a problem, but it's
> >> something that could happen.
> >
> > That's a simplistic view of modern CPUs.
> >
> > As I've already said, modern CPUs which have branch prediction, but
> > they also have speculative instruction fetching and speculative data
> > prefetching - which the CPUs which have idiv support will have.
> >
> > With such features, the branch predictor is able to learn that the
> > branch will be taken, and because of the speculative instruction
> > fetching, it can bring the cache line in so that it has the
> > instructions it needs with minimal or, if working correctly,
> > without stalling the CPU pipeline.
>
> It doesn't matter how many fancy features the CPU has. Executing more
> branches and using more cache lines puts additional pressure on those
> resources, reducing overall performance. Besides, the performance
> counters readily show that the prediction is nothing near as perfect as
> you seem to believe.

OK... Let's try to come up with actual numbers.

We know that letting gcc emit idiv by itself is the ultimate solution.
And it is free of maintenance on our side besides passing the
appropriate argument to gcc of course. So this is worth doing.

For the case where you have a set of target machines in your kernel that
may or may not have idiv, then the first step should be to patch
__aeabi_uidiv and __aeabi_idiv. This is a pretty small and simple
change that might turn out to be more than good enough. It is necessary
anyway as the full patching solution does not cover all cases.

Then, IMHO, it would be a good idea to get performance numbers to
compare that first step and the full patching solution. Of course the
full patching will yield better performance. It has to. But if the
difference is not significant enough, then it might not be worth
introducing the implied complexity into mainline. And it is not because
the approach is bad. In fact I think this is a very cool hack. But it
comes with a cost in maintenance and that cost has to be justified.

Just to have an idea, I produced the attached micro benchmark. I tested
on a TC2 forced to a single Cortex-A15 core and I got those results:

Testing INLINE_DIV ...

real 0m7.182s
user 0m7.170s
sys 0m0.000s

Testing PATCHED_DIV ...

real 0m7.181s
user 0m7.170s
sys 0m0.000s

Testing OUTOFLINE_DIV ...

real 0m7.181s
user 0m7.170s
sys 0m0.005s

Testing LIBGCC_DIV ...

real 0m18.659s
user 0m18.635s
sys 0m0.000s

As you can see, whether the div is inline or out-of-line, whether
arguments are moved into r0-r1 or not, makes no difference at all on a
Cortex-A15.

Now forcing it onto a Cortex-A7 core:

Testing INLINE_DIV ...

real 0m8.917s
user 0m8.895s
sys 0m0.005s

Testing PATCHED_DIV ...

real 0m11.666s
user 0m11.645s
sys 0m0.000s

Testing OUTOFLINE_DIV ...

real 0m13.065s
user 0m13.025s
sys 0m0.000s

Testing LIBGCC_DIV ...

real 0m51.815s
user 0m51.750s
sys 0m0.005s

So on A cortex-A7 the various overheads become visible. How significant
is it in practice with normal kernel usage? I don't know.

Nicolas

go

divtest.S

Måns Rullgård

unread,

Nov 26, 2015, 7:42:13 AM11/26/15

to Nicolas Pitre, Russell King - ARM Linux, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Michal Marek, linux-...@vger.kernel.org, Arnd Bergmann, Steven Rostedt, Thomas Petazzoni

Bear in mind that in a trivial test like this, everything fits in L1
caches and branch prediction works perfectly. It would be more
informative to measure the effect on a load that already has some cache
and branch prediction misses.

Michal Marek

unread,

Nov 30, 2015, 10:13:17 AM11/30/15

to Russell King - ARM Linux, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

On 2015-11-26 00:47, Russell King - ARM Linux wrote:
> On Wed, Nov 25, 2015 at 01:51:03PM -0800, Stephen Boyd wrote:
>> The ARM compiler inserts calls to __aeabi_uidiv() and
>> __aeabi_idiv() when it needs to perform division on signed and
>> unsigned integers. If a processor has support for the udiv and
>> sdiv division instructions the calls to these support routines
>> can be replaced with those instructions. Therefore, record the
>> location of calls to these library functions into two sections
>> (one for udiv and one for sdiv) similar to how we trace calls to
>> mcount. When the kernel boots up it will check to see if the
>> processor supports the instructions and then patch the call sites
>> with the instruction.
>
> Do we have any resolution on these programs which modify the object
> files in-place, rather than breaking any hard-links which may be
> present (eg, as a result of using ccache in hard-link mode) ?

Good point, but I do not think anybody is using CCACHE_HARDLINK with the
kernel. As the manpage says, it is going to confuse make, so the time
saved by ccache would be offset by make trying to recompile all *.c
files each time.

Michal

Russell King - ARM Linux

unread,

Nov 30, 2015, 10:32:49 AM11/30/15

to Michal Marek, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

On Mon, Nov 30, 2015 at 04:11:16PM +0100, Michal Marek wrote:
> On 2015-11-26 00:47, Russell King - ARM Linux wrote:
> > On Wed, Nov 25, 2015 at 01:51:03PM -0800, Stephen Boyd wrote:
> >> The ARM compiler inserts calls to __aeabi_uidiv() and
> >> __aeabi_idiv() when it needs to perform division on signed and
> >> unsigned integers. If a processor has support for the udiv and
> >> sdiv division instructions the calls to these support routines
> >> can be replaced with those instructions. Therefore, record the
> >> location of calls to these library functions into two sections
> >> (one for udiv and one for sdiv) similar to how we trace calls to
> >> mcount. When the kernel boots up it will check to see if the
> >> processor supports the instructions and then patch the call sites
> >> with the instruction.
> >
> > Do we have any resolution on these programs which modify the object
> > files in-place, rather than breaking any hard-links which may be
> > present (eg, as a result of using ccache in hard-link mode) ?
>
> Good point, but I do not think anybody is using CCACHE_HARDLINK with the
> kernel.

That's wrong then, because I've been using it for a very long time with
my nightly builds. :) Therefore, there is somebody!

> As the manpage says, it is going to confuse make, so the time
> saved by ccache would be offset by make trying to recompile all *.c
> files each time.

From what I've noticed, it makes a big difference when running nightly
builds. My nightly builds use O= and always build into an empty target
tree, so there are no old objects back-dated to confuse make.

Even if there were, make would spot that the object is older than the
source, and try to re-make the target again, at which point ccache
would re-hardlink the object after looking up the hashed preprocessed
source.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Michal Marek

unread,

Nov 30, 2015, 10:40:37 AM11/30/15

to Russell King - ARM Linux, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

On 2015-11-30 16:32, Russell King - ARM Linux wrote:
> On Mon, Nov 30, 2015 at 04:11:16PM +0100, Michal Marek wrote:
>> On 2015-11-26 00:47, Russell King - ARM Linux wrote:
>>> On Wed, Nov 25, 2015 at 01:51:03PM -0800, Stephen Boyd wrote:
>>>> The ARM compiler inserts calls to __aeabi_uidiv() and
>>>> __aeabi_idiv() when it needs to perform division on signed and
>>>> unsigned integers. If a processor has support for the udiv and
>>>> sdiv division instructions the calls to these support routines
>>>> can be replaced with those instructions. Therefore, record the
>>>> location of calls to these library functions into two sections
>>>> (one for udiv and one for sdiv) similar to how we trace calls to
>>>> mcount. When the kernel boots up it will check to see if the
>>>> processor supports the instructions and then patch the call sites
>>>> with the instruction.
>>>
>>> Do we have any resolution on these programs which modify the object
>>> files in-place, rather than breaking any hard-links which may be
>>> present (eg, as a result of using ccache in hard-link mode) ?
>>
>> Good point, but I do not think anybody is using CCACHE_HARDLINK with the
>> kernel.
>
> That's wrong then, because I've been using it for a very long time with
> my nightly builds. :) Therefore, there is somebody!

OK.

>> As the manpage says, it is going to confuse make, so the time
>> saved by ccache would be offset by make trying to recompile all *.c
>> files each time.
>
> From what I've noticed, it makes a big difference when running nightly
> builds. My nightly builds use O= and always build into an empty target
> tree, so there are no old objects back-dated to confuse make.
>
> Even if there were, make would spot that the object is older than the
> source, and try to re-make the target again, at which point ccache
> would re-hardlink the object after looking up the hashed preprocessed
> source.

That's what I meant. If you do a second make in a freshly built tree, it
will recompile all the files again. It will all be cache hits, but each
of the will be lot slower than comparing two timestamps. But in
throwaway build trees you do not care, I haven't considered that...

Michal

Michal Marek

unread,

Dec 1, 2015, 11:07:20 AM12/1/15

to Russell King - ARM Linux, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

On 2015-11-30 16:40, Michal Marek wrote:
> On 2015-11-30 16:32, Russell King - ARM Linux wrote:
>> On Mon, Nov 30, 2015 at 04:11:16PM +0100, Michal Marek wrote:
>>> On 2015-11-26 00:47, Russell King - ARM Linux wrote:
>>>> Do we have any resolution on these programs which modify the object
>>>> files in-place, rather than breaking any hard-links which may be
>>>> present (eg, as a result of using ccache in hard-link mode) ?
>>>
>>> Good point, but I do not think anybody is using CCACHE_HARDLINK with the
>>> kernel.
>>
>> That's wrong then, because I've been using it for a very long time with
>> my nightly builds. :) Therefore, there is somebody!
>
> OK.

So, both recordmcount and the new recordudiv program are idempotent.
They check if the to-be-added section is already present and do nothing.
So the result is correct even with CCACHE_HARDLINK, just the
intermediate file might be incorrect. If this still is considered an
issue, I suggest clearing CCACHE_HARDLINK when using any of these
postprocessors, so as not to penalize other use cases.

The perl recordmcount does not modify the file in place.

Russell King - ARM Linux

unread,

Dec 1, 2015, 11:20:08 AM12/1/15

to Michal Marek, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

On Tue, Dec 01, 2015 at 05:07:05PM +0100, Michal Marek wrote:
> On 2015-11-30 16:40, Michal Marek wrote:
> > On 2015-11-30 16:32, Russell King - ARM Linux wrote:
> >> On Mon, Nov 30, 2015 at 04:11:16PM +0100, Michal Marek wrote:
> >>> On 2015-11-26 00:47, Russell King - ARM Linux wrote:
> >>>> Do we have any resolution on these programs which modify the object
> >>>> files in-place, rather than breaking any hard-links which may be
> >>>> present (eg, as a result of using ccache in hard-link mode) ?
> >>>
> >>> Good point, but I do not think anybody is using CCACHE_HARDLINK with the
> >>> kernel.
> >>
> >> That's wrong then, because I've been using it for a very long time with
> >> my nightly builds. :) Therefore, there is somebody!
> >
> > OK.
>
> So, both recordmcount and the new recordudiv program are idempotent.
> They check if the to-be-added section is already present and do nothing.

They hardly "do nothing", as the (eg) recordmcount plasters the build
log with warnings. A solution to that would be to make recordmcount
silent if the section is already present.

> So the result is correct even with CCACHE_HARDLINK, just the
> intermediate file might be incorrect. If this still is considered an
> issue, I suggest clearing CCACHE_HARDLINK when using any of these
> postprocessors, so as not to penalize other use cases.

Another solution would be to have the top level make file unset the
CCACHE_HARDLINK environment variable if any of the options which enable
in-place editing of object files is enabled. Looking at the ccache
code, the environment variable has to be deleted from the environment
to turn off the option - and I'm not sure whether make can delete
environment variables. It certainly can override them, but I see
nothing in the info pages which suggests that environment variables
can be deleted by a makefile.

However, doing it outside of the kernel build system is likely error
prone especially as the kernel configuration options change and/or
their effect changes.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Michal Marek

unread,

Dec 1, 2015, 11:43:58 AM12/1/15

to Russell King - ARM Linux, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Steven Rostedt, Måns Rullgård

On 2015-12-01 17:19, Russell King - ARM Linux wrote:
> On Tue, Dec 01, 2015 at 05:07:05PM +0100, Michal Marek wrote:
>> So, both recordmcount and the new recordudiv program are idempotent.
>> They check if the to-be-added section is already present and do nothing.
>
> They hardly "do nothing", as the (eg) recordmcount plasters the build
> log with warnings. A solution to that would be to make recordmcount
> silent if the section is already present.

Right, there is a warning. The recorduidiv program exits silently.

>> So the result is correct even with CCACHE_HARDLINK, just the
>> intermediate file might be incorrect. If this still is considered an
>> issue, I suggest clearing CCACHE_HARDLINK when using any of these
>> postprocessors, so as not to penalize other use cases.
>
> Another solution would be to have the top level make file unset the
> CCACHE_HARDLINK environment variable if any of the options which enable
> in-place editing of object files is enabled.

This is what I meant, sorry for not being clear.

> Looking at the ccache
> code, the environment variable has to be deleted from the environment
> to turn off the option - and I'm not sure whether make can delete
> environment variables. It certainly can override them, but I see
> nothing in the info pages which suggests that environment variables
> can be deleted by a makefile.

unexport CCACHE_HARDLINK

will do the trick.

Michal

Steven Rostedt

unread,

Dec 1, 2015, 11:49:41 AM12/1/15

to Russell King - ARM Linux, Michal Marek, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Måns Rullgård

On Tue, 1 Dec 2015 16:19:44 +0000
Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> They hardly "do nothing", as the (eg) recordmcount plasters the build
> log with warnings. A solution to that would be to make recordmcount
> silent if the section is already present.

Note, that warning found plenty of bugs when modifications of the build
system was being done and broke recordmcount.c. I really don't want to
silent it.

But for some reason, your build is causing lots of warnings and not for
others. Perhaps we can add a "SILENT_RECORDMCOUNT" environment variable
and have it set when something like CCACHE_HARDLINK or whatever is
causing it to trigger when we don't care.

-- Steve

Russell King - ARM Linux

unread,

Dec 1, 2015, 12:10:35 PM12/1/15

to Steven Rostedt, Michal Marek, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Måns Rullgård

On Tue, Dec 01, 2015 at 11:49:29AM -0500, Steven Rostedt wrote:
> On Tue, 1 Dec 2015 16:19:44 +0000
> Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:
>
> > They hardly "do nothing", as the (eg) recordmcount plasters the build
> > log with warnings. A solution to that would be to make recordmcount
> > silent if the section is already present.
>
> Note, that warning found plenty of bugs when modifications of the build
> system was being done and broke recordmcount.c. I really don't want to
> silent it.
>
> But for some reason, your build is causing lots of warnings and not for
> others. Perhaps we can add a "SILENT_RECORDMCOUNT" environment variable
> and have it set when something like CCACHE_HARDLINK or whatever is
> causing it to trigger when we don't care.

The case is:

Build 1 runs with CCACHE_HARDLINK enabled.
- Each object ccache creates will be stored in ccache, and hard linked
into the throw-away object tree.
- recordmcount modifies in-place the object in the object tree, which
also modifies the object in the ccache repository.

The throw-away object tree is thrown away, and a new tree is created,
and the build re-run. It doesn't matter what CCACHE options are used,
the effect will now be the same:
- Each "hit" ccache object from the previous build will be linked or
copied to the new throw-away object tree.
- recordmcount will be re-run on the object, which now contains the
results of the previous recordmcount in-place modification. This
causes recordmcount to issue a warning.

There's two solutions to this: one is to disable CCACHE_HARDLINK for
all kernel builds which use in-place object modification. The other
solution is to avoid in-place object modification, instead doing a
read-write-rename.

I think I ought to ask another question though, before we decide what
to do. With recordmcount doing in-place object modification, what
happens if a SIGINT or similar is received half way through the
modification of an object? I would hope that make would delete the
object and not leave it around.

Another suggestion - maybe recordmcount, which fstat()s the file,
should check the st_nlink before modifying the file, and error out
with a helpful error message telling people not to use hardlinks,
which would stop nasty surprises (and make it a rule that this should
be implemented as a general principle for good build behaviour) - iow,
something like this (untested):

scripts/recordmcount.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 698768bdc581..bb7589fd7392 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -203,6 +203,10 @@ static void *mmap_file(char const *fname)

fprintf(stderr, "not a regular file: %s\n", fname);

fail_file();
}
+ if (sb.st_nlink != 1) {
+ fprintf(stderr, "file is hard linked: %s\n", fname);
+ fail_file();
+ }

addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,

fd_map, 0);
mmap_failed = 0;

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Steven Rostedt

unread,

Dec 1, 2015, 12:22:25 PM12/1/15

to Russell King - ARM Linux, Michal Marek, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Måns Rullgård

On Tue, 1 Dec 2015 17:10:14 +0000
Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> Another suggestion - maybe recordmcount, which fstat()s the file,
> should check the st_nlink before modifying the file, and error out
> with a helpful error message telling people not to use hardlinks,
> which would stop nasty surprises (and make it a rule that this should
> be implemented as a general principle for good build behaviour) - iow,

Actually I like this solution the best.

> something like this (untested):

Can you test it to see if it gives you the error, otherwise I need to
set up a CCACHE_HARDLINK environment :-)

I guess another solution is to do a copy instead of modifying in place
if it detects the multiple hard link?

-- Steve

>
> scripts/recordmcount.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
> index 698768bdc581..bb7589fd7392 100644
> --- a/scripts/recordmcount.c
> +++ b/scripts/recordmcount.c
> @@ -203,6 +203,10 @@ static void *mmap_file(char const *fname)
> fprintf(stderr, "not a regular file: %s\n", fname);
> fail_file();
> }
> + if (sb.st_nlink != 1) {
> + fprintf(stderr, "file is hard linked: %s\n", fname);
> + fail_file();
> + }
> addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
> fd_map, 0);
> mmap_failed = 0;
>
>

--

Russell King - ARM Linux

unread,

Dec 1, 2015, 1:17:10 PM12/1/15

to Steven Rostedt, Michal Marek, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Måns Rullgård

On Tue, Dec 01, 2015 at 12:22:12PM -0500, Steven Rostedt wrote:
> On Tue, 1 Dec 2015 17:10:14 +0000
> Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:
>
> > Another suggestion - maybe recordmcount, which fstat()s the file,
> > should check the st_nlink before modifying the file, and error out
> > with a helpful error message telling people not to use hardlinks,
> > which would stop nasty surprises (and make it a rule that this should
> > be implemented as a general principle for good build behaviour) - iow,
>
> Actually I like this solution the best.
>
> > something like this (untested):
>
> Can you test it to see if it gives you the error, otherwise I need to
> set up a CCACHE_HARDLINK environment :-)

It does indeed:

CC init/do_mounts_initrd.o
file is hard linked: init/do_mounts_initrd.o
/home/rmk/git/linux-rmk/scripts/Makefile.build:258: recipe for target 'init/do_mounts_initrd.o' failed
make[2]: *** [init/do_mounts_initrd.o] Error 1

> I guess another solution is to do a copy instead of modifying in place
> if it detects the multiple hard link?

That would be the "transparent" solution. If you think it's worth
persuing, I'll have a go at fixing recordmcount to do that.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Michal Marek

unread,

Dec 1, 2015, 4:39:20 PM12/1/15

to Russell King - ARM Linux, Steven Rostedt, Stephen Boyd, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Petazzoni, linux-...@vger.kernel.org, Nicolas Pitre, Arnd Bergmann, Måns Rullgård

Dne 1.12.2015 v 19:16 Russell King - ARM Linux napsal(a):

> On Tue, Dec 01, 2015 at 12:22:12PM -0500, Steven Rostedt wrote:
>> I guess another solution is to do a copy instead of modifying in place
>> if it detects the multiple hard link?
>
> That would be the "transparent" solution. If you think it's worth
> persuing, I'll have a go at fixing recordmcount to do that.

But in terms of number of file copies, it would on par with disabling
CCACHE_HARDLINK from within the Makefile.

Michal

Russell King - ARM Linux

unread,

Dec 2, 2015, 5:24:10 AM12/2/15

to Steven Rostedt, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Tue, Dec 01, 2015 at 06:16:43PM +0000, Russell King - ARM Linux wrote:
> On Tue, Dec 01, 2015 at 12:22:12PM -0500, Steven Rostedt wrote:
> > I guess another solution is to do a copy instead of modifying in place
> > if it detects the multiple hard link?
>
> That would be the "transparent" solution. If you think it's worth
> persuing, I'll have a go at fixing recordmcount to do that.

Well, copying the file is easy - I've tested this and the linker
appears happy with the result:

scripts/recordmcount.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 698768bdc581..91705ef30402 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -211,6 +211,20 @@ static void *mmap_file(char const *fname)
addr = umalloc(sb.st_size);
uread(fd_map, addr, sb.st_size);

}
+ if (sb.st_nlink != 1) {

+ /* file is hard-linked, break the hard link */
+ close(fd_map);
+ if (unlink(fname) < 0) {

+ perror(fname);
+ fail_file();
+ }

+ fd_map = open(fname, O_RDWR | O_CREAT, sb.st_mode);
+ if (fd_map < 0) {

+ perror(fname);
+ fail_file();
+ }

+ uwrite(fd_map, addr, sb.st_size);
+ }
return addr;

Steven Rostedt

unread,

Dec 2, 2015, 9:05:37 AM12/2/15

to Russell King - ARM Linux, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Wed, 2 Dec 2015 10:23:39 +0000
Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> On Tue, Dec 01, 2015 at 06:16:43PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Dec 01, 2015 at 12:22:12PM -0500, Steven Rostedt wrote:
> > > I guess another solution is to do a copy instead of modifying in place
> > > if it detects the multiple hard link?
> >
> > That would be the "transparent" solution. If you think it's worth
> > persuing, I'll have a go at fixing recordmcount to do that.
>
> Well, copying the file is easy - I've tested this and the linker
> appears happy with the result:

Want to make this into a proper patch and I'll start running it through
my tests?

-- Steve

Steven Rostedt

unread,

Dec 11, 2015, 9:31:39 AM12/11/15

to Russell King, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, 11 Dec 2015 12:09:03 +0000
Russell King <rmk+k...@arm.linux.org.uk> wrote:

> recordmcount edits the file in-place, which can cause problems when
> using ccache in hardlink mode. Arrange for recordmcount to break a
> hardlinked object.
>
> Signed-off-by: Russell King <rmk+k...@arm.linux.org.uk>
> ---
> Steven, sorry it took a while to get this out...

Should this be for stable, or is it fine to just add this to my 4.5
queue?

Russell King - ARM Linux

unread,

Dec 11, 2015, 9:46:02 AM12/11/15

to Steven Rostedt, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, Dec 11, 2015 at 09:31:25AM -0500, Steven Rostedt wrote:
> On Fri, 11 Dec 2015 12:09:03 +0000
> Russell King <rmk+k...@arm.linux.org.uk> wrote:
>
> > recordmcount edits the file in-place, which can cause problems when
> > using ccache in hardlink mode. Arrange for recordmcount to break a
> > hardlinked object.
> >
> > Signed-off-by: Russell King <rmk+k...@arm.linux.org.uk>
> > ---
> > Steven, sorry it took a while to get this out...
>
> Should this be for stable, or is it fine to just add this to my 4.5
> queue?

I thought you wanted to test it first - although I've been running with
this for a while now, my nightly builds have masked out the mcount
warning, and I suspect it'll take a while for ccache to purge itself
of the modified objects.

If you're happy to add a stable tag to it, then please do so.

--
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/

FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Steven Rostedt

unread,

Dec 11, 2015, 10:08:50 AM12/11/15

to Russell King - ARM Linux, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, 11 Dec 2015 14:45:41 +0000

Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> On Fri, Dec 11, 2015 at 09:31:25AM -0500, Steven Rostedt wrote:
> > On Fri, 11 Dec 2015 12:09:03 +0000
> > Russell King <rmk+k...@arm.linux.org.uk> wrote:
> >
> > > recordmcount edits the file in-place, which can cause problems when
> > > using ccache in hardlink mode. Arrange for recordmcount to break a
> > > hardlinked object.
> > >
> > > Signed-off-by: Russell King <rmk+k...@arm.linux.org.uk>
> > > ---
> > > Steven, sorry it took a while to get this out...
> >
> > Should this be for stable, or is it fine to just add this to my 4.5
> > queue?
>
> I thought you wanted to test it first - although I've been running with

You're right. I forgot I said that ;-)

> this for a while now, my nightly builds have masked out the mcount
> warning, and I suspect it'll take a while for ccache to purge itself
> of the modified objects.
>
> If you're happy to add a stable tag to it, then please do so.
>

I'm fine with you taking it too, but let me go ahead and run it through
my tests now. I'll let you know the results. Takes several hours.

Thanks,

-- Steve

Steven Rostedt

unread,

Dec 11, 2015, 1:10:42 PM12/11/15

to Russell King - ARM Linux, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, 11 Dec 2015 14:45:41 +0000

Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> On Fri, Dec 11, 2015 at 09:31:25AM -0500, Steven Rostedt wrote:
> > On Fri, 11 Dec 2015 12:09:03 +0000
> > Russell King <rmk+k...@arm.linux.org.uk> wrote:
> >
> > > recordmcount edits the file in-place, which can cause problems when
> > > using ccache in hardlink mode. Arrange for recordmcount to break a
> > > hardlinked object.
> > >
> > > Signed-off-by: Russell King <rmk+k...@arm.linux.org.uk>
> > > ---
> > > Steven, sorry it took a while to get this out...
> >
> > Should this be for stable, or is it fine to just add this to my 4.5
> > queue?
>
> I thought you wanted to test it first - although I've been running with
> this for a while now, my nightly builds have masked out the mcount
> warning, and I suspect it'll take a while for ccache to purge itself
> of the modified objects.
>
> If you're happy to add a stable tag to it, then please do so.
>

I ran it through most my tests (it's still running and is at 20 of 33
tests). If there was anything wrong with this patch, I'm sure one of my
tests would have crashed by now.

Do you want to take it, or shall I?

If you want to take it, you can add my:

Reviewed-by: Steven Rostedt <ros...@goodmis.org>

-- Steve

Russell King - ARM Linux

unread,

Dec 11, 2015, 1:33:50 PM12/11/15

to Steven Rostedt, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, Dec 11, 2015 at 01:10:29PM -0500, Steven Rostedt wrote:
> I ran it through most my tests (it's still running and is at 20 of 33
> tests). If there was anything wrong with this patch, I'm sure one of my
> tests would have crashed by now.

Thanks for testing.

> Do you want to take it, or shall I?

I'm easy - it probably makes more sense if take it.

--

FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Steven Rostedt

unread,

Dec 11, 2015, 1:51:29 PM12/11/15

to Russell King - ARM Linux, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, 11 Dec 2015 18:33:27 +0000

Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> On Fri, Dec 11, 2015 at 01:10:29PM -0500, Steven Rostedt wrote:
> > I ran it through most my tests (it's still running and is at 20 of 33
> > tests). If there was anything wrong with this patch, I'm sure one of my
> > tests would have crashed by now.
>
> Thanks for testing.
>
> > Do you want to take it, or shall I?
>
> I'm easy - it probably makes more sense if take it.
>

Heh, you got me hanging in suspense. You missed a word. Is it:

"makes more sense if *you* take it"

or

"makes more sense if *I* take it"

??

Or you could be replying to my sentence of "take it" or "shall I" in
which it would be *you* take it.

/me confused

-- Steve

Russell King - ARM Linux

unread,

Dec 11, 2015, 1:58:29 PM12/11/15

to Steven Rostedt, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, Dec 11, 2015 at 01:51:15PM -0500, Steven Rostedt wrote:
> On Fri, 11 Dec 2015 18:33:27 +0000
> Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:
>
> > On Fri, Dec 11, 2015 at 01:10:29PM -0500, Steven Rostedt wrote:
> > > I ran it through most my tests (it's still running and is at 20 of 33
> > > tests). If there was anything wrong with this patch, I'm sure one of my
> > > tests would have crashed by now.
> >
> > Thanks for testing.
> >
> > > Do you want to take it, or shall I?
> >
> > I'm easy - it probably makes more sense if take it.
> >
>
> Heh, you got me hanging in suspense. You missed a word. Is it:
>
> "makes more sense if *you* take it"
>
> or
>
> "makes more sense if *I* take it"
>
> ??

Oops, sorry. "makes more sense if *you* take it" was what I thought I
typed!

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Steven Rostedt

unread,

Dec 11, 2015, 2:28:53 PM12/11/15

to Russell King - ARM Linux, Michal Marek, Thomas Petazzoni, Måns Rullgård, Arnd Bergmann, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Nicolas Pitre, Stephen Boyd, linux-...@vger.kernel.org, linux-ar...@lists.infradead.org

On Fri, 11 Dec 2015 18:58:09 +0000
Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:

> Oops, sorry. "makes more sense if *you* take it" was what I thought I
> typed!
>

OK, will do.

Thanks!

-- Steve

Stephen Boyd

unread,

Jan 12, 2016, 8:51:16 PM1/12/16

to Russell King - ARM Linux, Arnd Bergmann, Nicolas Pitre, Peter Maydell, Måns Rullgård, linux-...@vger.kernel.org, Daniel Lezcano, lkml - Kernel Mailing List, Steven Rostedt, Christopher Covington, linux-ar...@lists.infradead.org

On 11/24, Russell King - ARM Linux wrote:
> On Tue, Nov 24, 2015 at 12:07:30PM -0800, Stephen Boyd wrote:
>
> > My understanding is that we want to support CPU_V7VE without
> > CPU_V7 enabled so that it uses the idiv instructions in that
> > configuration. When V7VE and V7 are both enabled, we should
> > degrade to the aeabi functions, and the same is true for when
> > V7VE is disabled.
>
> Let me have another look at this, it's been a while since I touched these
> options...
>

Any ideas on how to move forward on this? I don't have a problem
removing the duplication, but I'm not sure what else can be done.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

[RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

Stephen Boyd

Arnd Bergmann

Måns Rullgård

Arnd Bergmann

Måns Rullgård

Arnd Bergmann

Arnd Bergmann

Peter Maydell

Arnd Bergmann

Måns Rullgård

Russell King - ARM Linux

Arnd Bergmann

Russell King - ARM Linux

Måns Rullgård

Nicolas Pitre

Arnd Bergmann

Christopher Covington

Arnd Bergmann

Stephen Boyd

Arnd Bergmann

Stephen Boyd

Arnd Bergmann

Stephen Boyd

Stephen Boyd

Stephen Boyd

Arnd Bergmann

Russell King - ARM Linux

Arnd Bergmann

Russell King - ARM Linux

Russell King - ARM Linux

Måns Rullgård

Måns Rullgård

Russell King - ARM Linux

Måns Rullgård

Arnd Bergmann

Russell King - ARM Linux

Måns Rullgård

Stephen Boyd

Russell King - ARM Linux

Arnd Bergmann

Stephen Boyd

Arnd Bergmann

Stephen Boyd

Stephen Boyd

Stephen Boyd

Nicolas Pitre

Russell King - ARM Linux

Russell King - ARM Linux

Måns Rullgård

Russell King - ARM Linux

Nicolas Pitre

Måns Rullgård

Russell King - ARM Linux

Måns Rullgård

Nicolas Pitre

Måns Rullgård

Michal Marek

Russell King - ARM Linux

Michal Marek

Michal Marek

Russell King - ARM Linux

Michal Marek

Steven Rostedt

Russell King - ARM Linux

Steven Rostedt

Russell King - ARM Linux

Michal Marek

Russell King - ARM Linux

Steven Rostedt

Steven Rostedt

Russell King - ARM Linux

Steven Rostedt

Steven Rostedt

Russell King - ARM Linux

Steven Rostedt

Russell King - ARM Linux

Steven Rostedt

Stephen Boyd