[PATCH] crypto: ecc - Unbreak the build on arm with CONFIG_KASAN_STACK=y

Lukas Wunner

Apr 8, 2026, 2:16:09 AM
to Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko
Andrew reports the following build breakage of arm allmodconfig,
reproducible with gcc 14.2.0 and 15.2.0:

crypto/ecc.c: In function 'ecc_point_mult':
crypto/ecc.c:1380:1: error: the frame size of 1360 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

gcc excessively inlines functions called by ecc_point_mult() (without
there being any explicit inline declarations) and doesn't seem smart
enough to stay below CONFIG_FRAME_WARN.

clang does not exhibit the issue.

The issue only occurs with CONFIG_KASAN_STACK=y because it enlarges the
frame size. This has been a controversial topic a couple of times:

https://lore.kernel.org/r/CAK8P3a3_Tdc-XVPXrJ69j3S9...@mail.gmail.com/

Prevent gcc from going overboard with inlining to unbreak the build.
The maximum inline limit to avoid the error is 101. Use 100 to get a
nice round number per Andrew's preference.

Reported-by: Andrew Morton <ak...@linux-foundation.org> # off-list
Signed-off-by: Lukas Wunner <lu...@wunner.de>
---
crypto/Makefile | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/crypto/Makefile b/crypto/Makefile
index 04e269117589..b3ac7f29153e 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -181,6 +181,11 @@ obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
obj-$(CONFIG_CRYPTO_ECC) += ecc.o
obj-$(CONFIG_CRYPTO_ESSIV) += essiv.o

+# Avoid exceeding stack frame due to excessive gcc inlining in ecc_point_mult()
+ifeq ($(ARCH)$(CONFIG_KASAN_STACK)$(LLVM),army)
+CFLAGS_ecc.o += $(call cc-option,-finline-limit=100)
+endif
+
ecdh_generic-y += ecdh.o
ecdh_generic-y += ecdh_helper.o
obj-$(CONFIG_CRYPTO_ECDH) += ecdh_generic.o
--
2.51.0

Andy Shevchenko

Apr 8, 2026, 7:31:29 AM
to Lukas Wunner, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
On Wed, Apr 08, 2026 at 08:15:49AM +0200, Lukas Wunner wrote:
> Andrew reports the following build breakage of arm allmodconfig,
> reproducible with gcc 14.2.0 and 15.2.0:
>
> crypto/ecc.c: In function 'ecc_point_mult':
> crypto/ecc.c:1380:1: error: the frame size of 1360 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
>
> gcc excessively inlines functions called by ecc_point_mult() (without
> there being any explicit inline declarations) and doesn't seem smart
> enough to stay below CONFIG_FRAME_WARN.
>
> clang does not exhibit the issue.
>
> The issue only occurs with CONFIG_KASAN_STACK=y because it enlarges the
> frame size. This has been a controversial topic a couple of times:
>
> https://lore.kernel.org/r/CAK8P3a3_Tdc-XVPXrJ69j3S9...@mail.gmail.com/
>
> Prevent gcc from going overboard with inlining to unbreak the build.
> The maximum inline limit to avoid the error is 101. Use 100 to get a
> nice round number per Andrew's preference.

I think this is not the best solution. We can still refactor the code and avoid
being dependent on the (useful) kernel options.

--
With Best Regards,
Andy Shevchenko


Lukas Wunner

Apr 8, 2026, 9:36:50 AM
to Andy Shevchenko, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
Refactor how? Mark functions "noinline"? That may negatively impact
performance for everyone.

Note that this is a different kind of stack frame exhaustion than the one
in drivers/mtd/chips/cfi_cmdset_0001.c:do_write_buffer(): The latter
is a single function with lots of large local variables, whereas
ecc_point_mult() itself has a reasonable number of variables on the stack,
but gcc inlines numerous function calls that each increase the stack frame.
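To make the second shape concrete, here is a minimal C sketch. The names (`helper_add()`, `caller()`, `NDIGITS`) are hypothetical and not taken from crypto/ecc.c. A caller with a modest frame of its own calls a `static` helper several times, and each helper invocation needs its own scratch array. Nothing is declared `inline`, yet gcc may inline every call at -O2. In that case, each copy of `tmp[]` may get its own slot in the caller's frame.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

#define NDIGITS 4 /* hypothetical: 4 x 64-bit limbs = one 256-bit operand */

/*
 * Hypothetical helper with its own scratch array. There is no "inline"
 * keyword, but gcc is free to inline it into every caller anyway.
 */
static void helper_add(u64 *r, const u64 *a, const u64 *b)
{
	u64 tmp[NDIGITS];	/* scratch space living in this function's frame */
	u64 carry = 0;

	for (int i = 0; i < NDIGITS; i++) {
		u64 s = a[i] + b[i] + carry;

		/* carry out if the limb wrapped around */
		carry = (s < a[i]) || (carry && s == a[i]);
		tmp[i] = s;
	}
	memcpy(r, tmp, sizeof(tmp));
}

/*
 * Hypothetical caller in the shape of ecc_point_mult(): few locals of its
 * own, several distinct helper calls. If all three calls are inlined, the
 * helpers' tmp[] copies may end up in this single frame; KASAN_STACK adds
 * redzones around each of them on top of that.
 */
static void caller(u64 *out, const u64 *x, const u64 *y)
{
	u64 t1[NDIGITS], t2[NDIGITS];

	helper_add(t1, x, y);		/* t1 = x + y   */
	helper_add(t2, t1, y);		/* t2 = x + 2y  */
	helper_add(out, t2, t1);	/* out = 2x + 3y */
}
```

`caller(out, x, y)` computes 2x + 3y over 256-bit operands. The arithmetic itself is incidental; the call structure is the point.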

And gcc isn't smart enough to stop inlining when it reaches the maximum
stack frame size allowed by CONFIG_FRAME_WARN.

It's apparently a compiler bug. Why should we work around compiler bugs
by refactoring the code? The proposed patch instructs gcc to limit
inlining and we can easily remove that once the bug is fixed.

As Arnd explains in the above-linked message, stack frame exhaustion
in crypto/ tends to be caused by compiler bugs. There are already two
other workarounds for compiler bugs in crypto/Makefile, one for wp512.o
and another for serpent_generic.o. Amending CFLAGS is how we've dealt
with these issues in the past, not by refactoring code.

Thanks,

Lukas

Andy Shevchenko

Apr 8, 2026, 10:32:54 AM
to Lukas Wunner, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
Ah, that makes the difference, thanks for elaborating!

> And gcc isn't smart enough to stop inlining when it reaches the maximum
> stack frame size allowed by CONFIG_FRAME_WARN.
>
> It's apparently a compiler bug. Why should we work around compiler bugs
> by refactoring the code? The proposed patch instructs gcc to limit
> inlining and we can easily remove that once the bug is fixed.
>
> As Arnd explains in the above-linked message, stack frame exhaustion
> in crypto/ tends to be caused by compiler bugs. There are already two
> other workarounds for compiler bugs in crypto/Makefile, one for wp512.o
> and another for serpent_generic.o. Amending CFLAGS is how we've dealt
> with these issues in the past, not by refactoring code.

Yeah, that's the way we may deal with the issue.

Acked-by: Andy Shevchenko <andriy.s...@linux.intel.com>

Nathan Chancellor

Apr 8, 2026, 4:57:54 PM
to Lukas Wunner, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko
On Wed, Apr 08, 2026 at 08:15:49AM +0200, Lukas Wunner wrote:
Please use proper Kconfig variables here.

ifeq ($(CONFIG_ARM)$(CONFIG_KASAN_STACK)$(CONFIG_CC_IS_GCC),yyy)

Which is both more robust, as $(LLVM) may not be set but CC=clang could
be, and it is clearer (in my opinion). If all supported versions of GCC
support this flag, you could drop the cc-option at that point.
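Nathan's robustness point can be modelled in a few lines of C. This is a simplified sketch of GNU Make's behaviour, not Make itself. An unset variable (represented here by `NULL`) expands to the empty string, the expansions are pasted together, and `ifeq` compares the result as one string. The helper name `ifeq_concat()` is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Model of the "ifeq ($(A)$(B)$(C),want)" idiom: each unset variable
 * (NULL here) expands to "", the three expansions are concatenated, and
 * the result is compared with "want" as a single string.
 */
static bool ifeq_concat(const char *a, const char *b, const char *c,
			const char *want)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "%s%s%s",
		 a ? a : "", b ? b : "", c ? c : "");
	return strcmp(buf, want) == 0;
}
```

Suppose the build uses `CC=clang` without `LLVM=1`. The v1 test still matches `army`, so it hands a gcc-only option to clang and relies on `cc-option` to filter it. The Kconfig-based test does not match, because `CONFIG_CC_IS_GCC` is unset.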

Arnd Bergmann

Apr 13, 2026, 11:43:04 AM
to Lukas Wunner, Andy Shevchenko, Herbert Xu, David S . Miller, Andrew Morton, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
On Wed, Apr 8, 2026, at 15:36, Lukas Wunner wrote:
> On Wed, Apr 08, 2026 at 02:31:21PM +0300, Andy Shevchenko wrote:
>> On Wed, Apr 08, 2026 at 08:15:49AM +0200, Lukas Wunner wrote:
>> > Prevent gcc from going overboard with inlining to unbreak the build.
>> > The maximum inline limit to avoid the error is 101. Use 100 to get a
>> > nice round number per Andrew's preference.

Have you checked if the total call chain gets a lower stack usage this
way? Usually the high stack usage is a sign of absolutely awful
code generation when the compiler runs into a corner case that
spills variables onto the stack instead of keeping them in registers.

The question is whether the lower inline limit causes the compiler
to not get into this state at all and produce the expected object
code, or if it just ends up producing multiple functions that
stay under the limit individually but have the same problems with
stack usage and performance as before.

I think your patch can be merged either way, but it would be
good to describe what type of problem we are hitting here.

>> I think this is not the best solution. We can still refactor the code
>> and avoid being dependent on the (useful) kernel options.
>
> Refactor how? Mark functions "noinline"? That may negatively impact
> performance for everyone.

I ran into the same issue last year and worked around it by
turning off kasan for this file, which of course is problematic
for other reasons, and I never submitted my hack for inclusion:

--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -176,6 +176,7 @@ obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
obj-$(CONFIG_CRYPTO_ECC) += ecc.o
+KASAN_SANITIZE_ecc.o = n
obj-$(CONFIG_CRYPTO_ESSIV) += essiv.o

ecdh_generic-y += ecdh.o

In principle this could be done on a per-function basis.
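For illustration, here is a minimal sketch of the per-function versions of these knobs in C. It uses the plain GCC/clang attributes `noinline` and `no_sanitize_address` so that it builds outside the kernel. In-tree code would normally use the kernel's `noinline` and `__no_sanitize_address` macros instead. Both function names are hypothetical. Without a `-fsanitize=` option, the second attribute simply has no effect.

```c
#include <stdint.h>

typedef uint64_t u64;

/*
 * Kept out of line: its locals stay in its own frame instead of being
 * merged into the caller's. Costs a call/return on every invocation.
 */
__attribute__((noinline))
static u64 helper_out_of_line(const u64 *v, int n)
{
	u64 acc = 0;

	for (int i = 0; i < n; i++)
		acc ^= v[i];
	return acc;
}

/*
 * Excluded from address-sanitizer instrumentation individually, rather
 * than turning KASAN off for the whole object file.
 */
__attribute__((no_sanitize_address))
static u64 uninstrumented(const u64 *v, int n)
{
	return helper_out_of_line(v, n) + 1;
}
```

The trade-off raised earlier in the thread applies here as well. `noinline` costs a call per invocation in every configuration, not just in KASAN builds.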

Arnd

Lukas Wunner

Apr 13, 2026, 3:46:41 PM
to Arnd Bergmann, Andy Shevchenko, Herbert Xu, David S . Miller, Andrew Morton, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
On Mon, Apr 13, 2026 at 05:42:39PM +0200, Arnd Bergmann wrote:
> On Wed, Apr 8, 2026, at 15:36, Lukas Wunner wrote:
> > On Wed, Apr 08, 2026 at 02:31:21PM +0300, Andy Shevchenko wrote:
> > > On Wed, Apr 08, 2026 at 08:15:49AM +0200, Lukas Wunner wrote:
> > > > Prevent gcc from going overboard with inlining to unbreak the build.
> > > > The maximum inline limit to avoid the error is 101. Use 100 to get a
> > > > nice round number per Andrew's preference.
>
> Have you checked if the total call chain gets a lower stack usage this
> way? Usually the high stack usage is a sign of absolutely awful
> code generation when the compiler runs into a corner case that
> spills variables onto the stack instead of keeping them in registers.
>
> The question is whether the lower inline limit causes the compiler
> to not get into this state at all and produce the expected object
> code, or if it just ends up producing multiple functions that
> stay under the limit individually but have the same problems with
> stack usage and performance as before.

Attached please find the Assembler output created by gcc -save-temps,
both the original version and the one with limited inlining.

The former requires a 1360 bytes stack frame, the latter 1232 bytes.
E.g. xycz_initial_double() is not inlined into ecc_point_mult(),
together with all its recursive baggage, so the latter version
contains two branch instructions to that function which the former
(original) version does not contain.

At the beginning of the function, it looks like the same register values
are stored to multiple locations on the stack. I assume that's what you
mean by awful code generation? This odd behavior seems more subdued in
the version with limited inlining.

> I think your patch can be merged either way, but it would be
> good to describe what type of problem we are hitting here.

I will respin and I will also take Nathan's suggestion into account.

Thanks,

Lukas
ecc_point_mult_orig.s
ecc_point_mult_limited_inlining.s

Arnd Bergmann

Apr 13, 2026, 4:32:47 PM
to Lukas Wunner, Andy Shevchenko, Herbert Xu, David S . Miller, Andrew Morton, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
On Mon, Apr 13, 2026, at 21:46, Lukas Wunner wrote:
> On Mon, Apr 13, 2026 at 05:42:39PM +0200, Arnd Bergmann wrote:
>> On Wed, Apr 8, 2026, at 15:36, Lukas Wunner wrote:
>
> Attached please find the Assembler output created by gcc -save-temps,
> both the original version and the one with limited inlining.
>
> The former requires a 1360 bytes stack frame, the latter 1232 bytes.
> E.g. xycz_initial_double() is not inlined into ecc_point_mult(),
> together with all its recursive baggage, so the latter version
> contains two branch instructions to that function which the former
> (original) version does not contain.

Thanks!

So it indeed appears that the problem does not go away but only
stays below the arbitrary threshold of 1280 bytes (which was
recently raised). I would not trust that to actually be the
case across all architectures then, as there are some targets
like mips or parisc that tend to use even more stack space than
arm. With your current patch, that means there is a good chance
the problem will come back later.

> At the beginning of the function, it looks like the same register values
> are stored to multiple locations on the stack. I assume that's what you
> mean by awful code generation? This odd behavior seems more subdued in
> the version with limited inlining.

Right. As far as I can tell, the source code is heavily optimized
for performance, but with the sanitizer active this would likely
be several times slower, both from the actual sanitizing and
from the register spilling. I can see how the use of 'u64'
arrays makes this harder for a 32-bit target with limited
available registers.

Arnd

Lukas Wunner

Apr 14, 2026, 12:57:16 AM
to Arnd Bergmann, Andy Shevchenko, Herbert Xu, David S . Miller, Andrew Morton, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
On Mon, Apr 13, 2026 at 10:32:24PM +0200, Arnd Bergmann wrote:
> On Mon, Apr 13, 2026, at 21:46, Lukas Wunner wrote:
> > On Mon, Apr 13, 2026 at 05:42:39PM +0200, Arnd Bergmann wrote:
> > > On Wed, Apr 8, 2026, at 15:36, Lukas Wunner wrote:
> > Attached please find the Assembler output created by gcc -save-temps,
> > both the original version and the one with limited inlining.
> >
> > The former requires a 1360 bytes stack frame, the latter 1232 bytes.
> > E.g. xycz_initial_double() is not inlined into ecc_point_mult(),
> > together with all its recursive baggage, so the latter version
> > contains two branch instructions to that function which the former
> > (original) version does not contain.
>
> So it indeed appears that the problem does not go away but only
> stays below the arbitrary threshold of 1280 bytes (which was
> recently raised). I would not trust that to actually be the
> case across all architectures then, as there are some targets
> like mips or parisc that tend to use even more stack space than
> arm. With your current patch, that means there is a good chance
> the problem will come back later.

The only 32-bit architectures with HAVE_ARCH_KASAN are:
arm powerpc xtensa

I've cross-compiled ecc.o successfully in an allmodconfig build for
powerpc and xtensa, so arm seems to be the only architecture affected
by the large stack frame issue.

Maybe mips and parisc will see the issue as well but they'd have to
support KASAN first.

The problem is that gcc *knows* that it should warn when the stack
goes above CONFIG_FRAME_WARN and that warning is even promoted to
an error, but gcc happily keeps inlining stuff and goes beyond that
limit. My expectation is it should stop inlining before that happens.
clang doesn't have the same problem.

Completely disabling KASAN for this file doesn't seem like a good option
as this is security-relevant code. On the other hand, disabling inlining
for this file isn't great either: I recall that Google is dogfooding
KASAN on internally used phones, and I imagine it would ruin performance
for such use cases (granted, those are likely arm64 devices).

*Limiting* inlining strikes a middle ground between those two extremes.

And I don't want to annotate individual functions as noinline only
because gcc does stupid things on a single architecture.

Thanks,

Lukas

David Laight

Apr 14, 2026, 6:26:06 AM
to Arnd Bergmann, Lukas Wunner, Andy Shevchenko, Herbert Xu, David S . Miller, Andrew Morton, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino
On Mon, 13 Apr 2026 22:32:24 +0200
"Arnd Bergmann" <ar...@arndb.de> wrote:

> On Mon, Apr 13, 2026, at 21:46, Lukas Wunner wrote:
> > On Mon, Apr 13, 2026 at 05:42:39PM +0200, Arnd Bergmann wrote:
> >> On Wed, Apr 8, 2026, at 15:36, Lukas Wunner wrote:
> >
> > Attached please find the Assembler output created by gcc -save-temps,
> > both the original version and the one with limited inlining.
> >
> > The former requires a 1360 bytes stack frame, the latter 1232 bytes.
> > E.g. xycz_initial_double() is not inlined into ecc_point_mult(),
> > together with all its recursive baggage, so the latter version
> > contains two branch instructions to that function which the former
> > (original) version does not contain.
>
> Thanks!
>
> So it indeed appears that the problem does not go away but only
> stays below the arbitrary threshold of 1280 bytes (which was
> recently raised). I would not trust that to actually be the
> case across all architectures then, as there are some targets
> like mips or parisc that tend to use even more stack space than
> arm. With your current patch, that means there is a good chance
> the problem will come back later.

Not only that, the 'stack frame size' is just a proxy for total
stack use, which is a lot harder to calculate.
I've a cunning plan to use clang's function prototype hashing
to do a static stack calculation that includes indirect calls.
(I did one many years ago for some embedded code that had none.)
I suspect it will find all sorts of code paths that 'blow' the
kernel stack out of the water.
A good bet will be snprintf() calls in unusual error paths
(even after ignoring recursive snprintf() calls and all the %px
modifiers).
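The static analysis David describes boils down to a longest-path walk over the call graph. This minimal sketch makes three assumptions. The graph is acyclic, it contains only direct calls, and per-function frame sizes come from elsewhere (for example, gcc's `-fstack-usage` output). It does not attempt to resolve indirect calls, which is the hard part that the prototype-hashing plan targets. `struct func` and `worst_case_stack()` are hypothetical names.

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical call-graph node: own frame size plus direct callees. */
struct func {
	const char *name;
	unsigned int frame;			/* bytes used by this function's own frame */
	const struct func *callees[MAX_CALLEES];	/* NULL-terminated if fewer */
};

/*
 * Worst-case stack depth starting at f: f's own frame plus the deepest
 * chain among its callees. Recursion would make this loop forever, so
 * the graph must be acyclic.
 */
static unsigned int worst_case_stack(const struct func *f)
{
	unsigned int deepest = 0;

	for (size_t i = 0; i < MAX_CALLEES && f->callees[i]; i++) {
		unsigned int d = worst_case_stack(f->callees[i]);

		if (d > deepest)
			deepest = d;
	}
	return f->frame + deepest;
}
```

Note that a function with a small frame can still sit on top of a deep chain. That is why the per-function `-Wframe-larger-than=` check is only a proxy.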

> > At the beginning of the function, it looks like the same register values
> > are stored to multiple locations on the stack. I assume that's what you
> > mean by awful code generation? This odd behavior seems more subdued in
> > the version with limited inlining.
>
> Right. As far as I can tell, the source code is heavily optimized
> for performance, but with the sanitizer active this would likely
> be several times slower, both from the actual sanitizing and
> from the register spilling. I can see how the use of 'u64'
> arrays makes this harder for a 32-bit target with limited
> available registers.

gcc makes a right 'pig's breakfast' of handling u64 items on 32-bit.
It gets really horrid on x86 (which has 8 registers including %sp
and %bp).
I got the impression it sometimes treats a u64 as being two 32-bit
values, and other times as a 64-bit value held in two registers.
The former tends to generate better code, but the latter happens
if an asm() block (or probably anything else) ends up with an 'A'
constraint for a value in %edx:%eax.
It will spill constant zero words to the stack, and do multiplies by
values that are constant zero.
(I think the code generated for a single call to mul_64_64()
will show it all.)
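For readers unfamiliar with the problem, here is a sketch of a portable 64 x 64 -> 128-bit multiply built from 32-bit halves. It follows the spirit of ecc.c's `mul_64_64()` but is not copied from it. `struct u128` and `mul_64x64()` are hypothetical names. On a 32-bit target, every 64-bit intermediate below occupies a register pair, which gives some idea of how quickly registers run out.

```c
#include <stdint.h>

/* 128-bit product as two 64-bit halves. */
struct u128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * 64 x 64 -> 128-bit multiply from four 32 x 32 -> 64-bit partial
 * products, as needed when no native wide multiply is available.
 */
static struct u128 mul_64x64(uint64_t a, uint64_t b)
{
	uint64_t a0 = (uint32_t)a, a1 = a >> 32;
	uint64_t b0 = (uint32_t)b, b1 = b >> 32;

	uint64_t p00 = a0 * b0;
	uint64_t p01 = a0 * b1;
	uint64_t p10 = a1 * b0;
	uint64_t p11 = a1 * b1;

	/* Middle column: three terms of at most 32 bits each, so no overflow. */
	uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

	struct u128 r;

	r.lo = (mid << 32) | (uint32_t)p00;
	r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
	return r;
}
```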

I've just looked at that source.
It seems to be doing 'very wide' arithmetic using u64[].
That will be really horrid on 32-bit; it needs to use u32[].
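As a sketch of the u32[] suggestion, here is multi-precision addition over `uint32_t` limbs, least significant limb first. Each step fits in a 64-bit accumulator, so the carry is simply the upper half. This is an illustration, not a proposed patch. `add_u32_limbs()` is a hypothetical name, and no particular compiler is guaranteed to turn it into add/add-with-carry sequences.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * r = a + b over nlimbs 32-bit limbs (little-endian limb order).
 * acc never exceeds 1 + 2 * (2^32 - 1), so it cannot overflow.
 * Returns the final carry (0 or 1).
 */
static uint32_t add_u32_limbs(uint32_t *r, const uint32_t *a,
			      const uint32_t *b, size_t nlimbs)
{
	uint64_t acc = 0;

	for (size_t i = 0; i < nlimbs; i++) {
		acc += (uint64_t)a[i] + b[i];
		r[i] = (uint32_t)acc;	/* low 32 bits are the result limb */
		acc >>= 32;		/* high bits become the next carry */
	}
	return (uint32_t)acc;
}
```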

Stopping some of those functions from being inlined will help.
Even on 64-bit I doubt it'll make that much difference to
overall performance.

David

>
> Arnd
>

Herbert Xu

May 5, 2026, 4:40:50 AM
to Lukas Wunner, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko
On Wed, Apr 08, 2026 at 08:15:49AM +0200, Lukas Wunner wrote:
> Andrew reports the following build breakage of arm allmodconfig,
> reproducible with gcc 14.2.0 and 15.2.0:
>
> crypto/ecc.c: In function 'ecc_point_mult':
> crypto/ecc.c:1380:1: error: the frame size of 1360 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
>
> gcc excessively inlines functions called by ecc_point_mult() (without
> there being any explicit inline declarations) and doesn't seem smart
> enough to stay below CONFIG_FRAME_WARN.
>
> clang does not exhibit the issue.
>
> The issue only occurs with CONFIG_KASAN_STACK=y because it enlarges the
> frame size. This has been a controversial topic a couple of times:
>
> https://lore.kernel.org/r/CAK8P3a3_Tdc-XVPXrJ69j3S9...@mail.gmail.com/
>
> Prevent gcc from going overboard with inlining to unbreak the build.
> The maximum inline limit to avoid the error is 101. Use 100 to get a
> nice round number per Andrew's preference.
>
> Reported-by: Andrew Morton <ak...@linux-foundation.org> # off-list
> Signed-off-by: Lukas Wunner <lu...@wunner.de>
> ---
> crypto/Makefile | 5 +++++
> 1 file changed, 5 insertions(+)

Patch applied. Thanks.
--
Email: Herbert Xu <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Lukas Wunner

May 6, 2026, 9:27:46 AM
to Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
Andrew reports build breakage of arm allmodconfig, reproducible with gcc
14.2.0 and 15.2.0:

crypto/ecc.c: In function 'ecc_point_mult':
crypto/ecc.c:1380:1: error: the frame size of 1360 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

gcc aggressively inlines functions called by ecc_point_mult() (without
there being any explicit inline declarations), which pushes stack usage
close to the limit imposed by CONFIG_FRAME_WARN. allmodconfig implies
CONFIG_KASAN_STACK=y, whose instrumentation pushes stack usage above that
limit.

In the bugzilla entry linked below, gcc maintainers explain that gcc
estimates extra stack usage caused by inlining, but ASAN instrumentation
is added in post-IPA passes and thus the inlining heuristics cannot
account for it.
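Here is a toy model of that ordering problem. It is not gcc's actual cost model. The inliner sees frame sizes before instrumentation, and the ASAN pass adds padding afterwards. The padding rule is a simplifying assumption: each local is rounded up to 32 bytes and followed by a 32-byte redzone, plus one leading redzone. All names are hypothetical.

```c
#include <stddef.h>

#define GRANULE 32u	/* assumed redzone/alignment granule, in bytes */

static unsigned int round_up(unsigned int x, unsigned int g)
{
	return (x + g - 1) / g * g;
}

/* Frame as seen before instrumentation: just the sum of the locals. */
static unsigned int frame_before_asan(const unsigned int *locals, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += locals[i];
	return sum;
}

/* Frame after instrumentation under the assumed padding rule. */
static unsigned int frame_after_asan(const unsigned int *locals, size_t n)
{
	unsigned int sum = GRANULE;	/* leading redzone */

	for (size_t i = 0; i < n; i++)
		sum += round_up(locals[i], GRANULE) + GRANULE;
	return sum;
}
```

Take four 32-byte locals, one `u64[4]` operand each. The estimate is 128 bytes, but under this model the instrumented frame is 288 bytes. A budget checked against the first number therefore says little about the second.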

It could be argued that -Werror=frame-larger-than=1280 instructs the
compiler to avoid inlining beyond that limit lest the build break,
which would imply gcc behaves incorrectly. But gcc maintainers reject
this notion and believe that a warning switch should never affect code
generation, even if it is promoted to an error.

One way to unbreak the build is to limit inlining via -finline-limit=100
or by explicitly declaring some functions noinline. However, while that
keeps the stack usage of individual functions below the limit, *total*
stack usage increases.
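The trade-off can be shown with a toy model using hypothetical sizes: a caller with some locals, and a helper with its own. When the helper is inlined, its locals join the caller's frame (no slot reuse assumed). When it is out of line, the largest single frame shrinks, and that is all `-Wframe-larger-than=` checks. However, the caller -> helper chain also pays for the helper's own frame overhead (saved registers, return address, alignment). The model represents that overhead as a single `call_overhead` value.

```c
/* Per-function maximum versus deepest total, for one caller/helper pair. */
struct stack_cost {
	unsigned int max_frame;	/* what -Wframe-larger-than= looks at */
	unsigned int total;	/* stack actually used along the chain */
};

/* Helper inlined: its locals become part of the caller's frame. */
static struct stack_cost inlined(unsigned int caller_locals,
				 unsigned int helper_locals)
{
	unsigned int f = caller_locals + helper_locals;
	struct stack_cost c = { f, f };

	return c;
}

/* Helper out of line: two smaller frames, plus per-call overhead. */
static struct stack_cost out_of_line(unsigned int caller_locals,
				     unsigned int helper_locals,
				     unsigned int call_overhead)
{
	unsigned int helper_frame = helper_locals + call_overhead;
	struct stack_cost c;

	c.max_frame = caller_locals > helper_frame ? caller_locals : helper_frame;
	c.total = caller_locals + helper_frame;
	return c;
}
```

With a caller holding 800 bytes, a helper holding 400 bytes and 48 bytes of call overhead, the inlined case gives 1200/1200. The out-of-line case gives 800/1248: a smaller maximum frame, but a larger total.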

A long-term solution is to refactor ecc.c for reduced stack usage. It
currently performs ECC point multiplication with a Montgomery ladder
which uses co-Z (conjugate) addition to trade off memory for speed.
The algorithm is susceptible to timing attacks and needs to be replaced
with a constant time Montgomery ladder, which should consume less memory
and thus resolve the stack usage issue as a side effect.
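To illustrate what "constant time" means for a ladder, here is a minimal sketch of a Montgomery ladder with a branch-free conditional swap. As a stand-in for elliptic-curve point multiplication, it computes x^k mod m with integers. In the EC version, the multiplication would become point addition and the squaring would become point doubling. The sketch assumes m < 2^32 so that products fit in 64 bits. It is not hardened; for instance, it does not address data-dependent timing of the hardware divide behind `%`. `cswap()` and `ladder_pow()` are hypothetical names.

```c
#include <stdint.h>

/*
 * Branch-free conditional swap: swaps *a and *b when bit == 1 and leaves
 * them alone when bit == 0, without a data-dependent branch.
 */
static void cswap(uint64_t *a, uint64_t *b, uint64_t bit)
{
	uint64_t mask = 0 - bit;	/* all ones if bit == 1, else 0 */
	uint64_t t = mask & (*a ^ *b);

	*a ^= t;
	*b ^= t;
}

/*
 * Montgomery ladder computing x^k mod m. Invariant: r1 == r0 * x (mod m).
 * Every iteration does one multiply and one square regardless of the key
 * bit; only the cswap() masks depend on it.
 */
static uint64_t ladder_pow(uint64_t x, uint32_t k, uint64_t m)
{
	uint64_t r0 = 1 % m, r1 = x % m;

	for (int i = 31; i >= 0; i--) {
		uint64_t bit = (k >> i) & 1;

		cswap(&r0, &r1, bit);
		r1 = (r0 * r1) % m;	/* "add" step */
		r0 = (r0 * r0) % m;	/* "double" step */
		cswap(&r0, &r1, bit);
	}
	return r0;
}
```

All 32 iterations run whatever the value of `k` is, so the loop length does not leak the key's bit length either.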

In the interim, raise the -Wframe-larger-than limit for ecc.c to 1536
bytes, as is already done for several other files in the source tree.

Constrain to gcc because clang 19.1.7 does not exhibit the issue. It
makes do with a 724-byte stack frame even though it inlines almost the
same functions as gcc.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124949
Reported-by: Andrew Morton <ak...@linux-foundation.org> # off-list
Signed-off-by: Lukas Wunner <lu...@wunner.de>
Acked-by: Andy Shevchenko <andriy.s...@linux.intel.com>
---
Changes v1 -> v2:
* s/ARCH/CONFIG_ARM/, s/LLVM/CONFIG_CC_IS_GCC/ (Nathan)
* Add link to gcc bugzilla entry
* Rewrite commit message to include feedback provided by gcc maintainers
and explain high stack usage with algorithm choice

Link to v1:
https://lore.kernel.org/r/abfaede9ab2e963d784fb70598ed74...@wunner.de/

crypto/Makefile | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/crypto/Makefile b/crypto/Makefile
index 1622425..c73f4d5 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -178,6 +178,11 @@ obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
obj-$(CONFIG_CRYPTO_ECC) += ecc.o
obj-$(CONFIG_CRYPTO_ESSIV) += essiv.o

+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124949
+ifeq ($(CONFIG_ARM)$(CONFIG_KASAN_STACK)$(CONFIG_CC_IS_GCC),yyy)
+CFLAGS_ecc.o += $(call cc-option,-Wframe-larger-than=1536)
+endif

Lukas Wunner

May 6, 2026, 9:41:26 AM
to Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko
On Tue, May 05, 2026 at 04:40:25PM +0800, Herbert Xu wrote:
> On Wed, Apr 08, 2026 at 08:15:49AM +0200, Lukas Wunner wrote:
> > Andrew reports the following build breakage of arm allmodconfig,
> > reproducible with gcc 14.2.0 and 15.2.0:
> >
> > crypto/ecc.c: In function 'ecc_point_mult':
> > crypto/ecc.c:1380:1: error: the frame size of 1360 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
> >
> > gcc excessively inlines functions called by ecc_point_mult() (without
> > there being any explicit inline declarations) and doesn't seem smart
> > enough to stay below CONFIG_FRAME_WARN.
> >
> > clang does not exhibit the issue.
> >
> > The issue only occurs with CONFIG_KASAN_STACK=y because it enlarges the
> > frame size. This has been a controversial topic a couple of times:
> >
> > https://lore.kernel.org/r/CAK8P3a3_Tdc-XVPXrJ69j3S9...@mail.gmail.com/
> >
> > Prevent gcc from going overboard with inlining to unbreak the build.
> > The maximum inline limit to avoid the error is 101. Use 100 to get a
> > nice round number per Andrew's preference.
> >
> > Reported-by: Andrew Morton <ak...@linux-foundation.org> # off-list
> > Signed-off-by: Lukas Wunner <lu...@wunner.de>
> > ---
> > crypto/Makefile | 5 +++++
> > 1 file changed, 5 insertions(+)
>
> Patch applied. Thanks.

My apologies Herbert, I was working on a v2 for this patch
but unfortunately didn't finish it until today:

https://lore.kernel.org/r/7e3d64a53efb28740b32d1f934e78c...@wunner.de/

Would it be possible for you to replace the patch you've already applied
with the new one? I am very sorry for the hassle.

Since submitting v1, I've opened a gcc bug to get feedback from gcc
maintainers. They're acknowledging a missing optimization here but
believe that a warning switch such as -Werror=frame-larger-than=1280
should never affect code generation, even if it is promoted to an
error. Basically if the user is asking to be warned but gcc inlines
beyond the limit, the user gets to keep the pieces:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124949

I took a closer look at the ECC point multiplication algorithm used.
It's relatively memory-intensive because it uses co-Z addition as a
speedup:

https://eprint.iacr.org/2011/338.pdf

The algorithm is susceptible to timing attacks. Newer constant-time
Montgomery ladder algorithms are not; they use less memory (thus likely
avoiding the high stack usage warning) but are not as fast. I think
that's the proper long-term solution for this problem:

https://eprint.iacr.org/2020/956.pdf

In v2, I've amended the commit message with all that extra information
and I've also taken Nathan's review comments into account.

Thanks,

Lukas

Andy Shevchenko

May 6, 2026, 9:42:37 AM
to Lukas Wunner, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
Reviewed-by: Andy Shevchenko <andriy.s...@linux.intel.com>

...

> +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124949

Perhaps also mention the algo change as that one sounds to me even more
critical than this issue per se.

> +ifeq ($(CONFIG_ARM)$(CONFIG_KASAN_STACK)$(CONFIG_CC_IS_GCC),yyy)
> +CFLAGS_ecc.o += $(call cc-option,-Wframe-larger-than=1536)
> +endif

Lukas Wunner

May 6, 2026, 9:56:25 AM
to Andy Shevchenko, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
On Wed, May 06, 2026 at 04:42:25PM +0300, Andy Shevchenko wrote:
> On Wed, May 06, 2026 at 03:27:49PM +0200, Lukas Wunner wrote:
> > A longterm solution is to refactor ecc.c for reduced stack usage. It
> > currently performs ECC point multiplication with a Montgomery ladder
> > which uses co-Z (conjugate) addition to trade off memory for speed.
> > The algorithm is susceptible to timing attacks and needs to be replaced
> > with a constant time Montgomery ladder, which should consume less memory
> > and thus resolve the stack usage issue as a side effect.
[...]
> > +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124949
>
> Perhaps also mention the algo change as that one sounds to me even more
> critical than this issue per se.

Hm, but it's already mentioned above in the commit message?

Andy Shevchenko

May 6, 2026, 10:03:51 AM
to Lukas Wunner, Herbert Xu, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
Commit message != Makefile (or any other in-tree file).

But if you think that this is enough, I am not going to object; it would just
require a few steps to get that from the line in the file.

Herbert Xu

May 7, 2026, 12:26:41 AM
to Lukas Wunner, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
On Wed, May 06, 2026 at 03:27:49PM +0200, Lukas Wunner wrote:
>
> Changes v1 -> v2:
> * s/ARCH/CONFIG_ARM/, s/LLVM/CONFIG_CC_IS_GCC/ (Nathan)
> * Add link to gcc bugzilla entry
> * Rewrite commit message to include feedback provided by gcc maintainers
> and explain high stack usage with algorithm choice
>
> Link to v1:
> https://lore.kernel.org/r/abfaede9ab2e963d784fb70598ed74...@wunner.de/
>
> crypto/Makefile | 5 +++++
> 1 file changed, 5 insertions(+)

Sorry but v1 has already been applied.

Cheers,

Herbert Xu

May 7, 2026, 4:11:33 AM
to Lukas Wunner, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko
On Wed, May 06, 2026 at 03:41:21PM +0200, Lukas Wunner wrote:
>
> My apologies Herbert, I was working on a v2 for this patch
> but unfortunately didn't finish it until today:
>
> https://lore.kernel.org/r/7e3d64a53efb28740b32d1f934e78c...@wunner.de/
>
> Would it be possible for you to replace the patch you've already applied
> with the new one? I am very sorry for the hassle.

OK I've backed it out for now.

Thanks,

Andy Shevchenko

May 7, 2026, 5:27:08 AM
to Herbert Xu, Lukas Wunner, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
On Thu, May 07, 2026 at 12:26:10PM +0800, Herbert Xu wrote:
> On Wed, May 06, 2026 at 03:27:49PM +0200, Lukas Wunner wrote:
> >
> > Changes v1 -> v2:
> > * s/ARCH/CONFIG_ARM/, s/LLVM/CONFIG_CC_IS_GCC/ (Nathan)
> > * Add link to gcc bugzilla entry
> > * Rewrite commit message to include feedback provided by gcc maintainers
> > and explain high stack usage with algorithm choice
> >
> > Link to v1:
> > https://lore.kernel.org/r/abfaede9ab2e963d784fb70598ed74...@wunner.de/
> >
> > crypto/Makefile | 5 +++++
> > 1 file changed, 5 insertions(+)
>
> Sorry but v1 has already been applied.

Does it make sense to revert and apply v2?

Herbert Xu

May 15, 2026, 6:22:04 AM
to Lukas Wunner, David S. Miller, Andrew Morton, Arnd Bergmann, Andrey Ryabinin, Ignat Korchagin, Stefan Berger, linux-...@vger.kernel.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andy Shevchenko, Eric Biggers, Nathan Chancellor, David Laight, Jason A. Donenfeld, Ard Biesheuvel
On Wed, May 06, 2026 at 03:27:49PM +0200, Lukas Wunner wrote:
Patch applied. Thanks.