[PATCH v4 0/8] Generic IRQ entry/exit support for powerpc


Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:39:54 AM

Adding support for the generic irq entry/exit handling for PowerPC. The
goal is to bring PowerPC in line with other architectures that already
use the common irq entry infrastructure, reducing duplicated code and
making it easier to share future changes in entry/exit paths.

This is lightly tested on ppc64le and ppc32.

The performance benchmarks are below:

perf bench syscall usec/op (-ve is improvement)

| Syscall | Base | New | change % |
| ------- | ----------- | ----------- | -------- |
| basic | 0.093543 | 0.093023 | -0.56 |
| execve | 446.557781 | 450.107172 | +0.79 |
| fork | 1142.204391 | 1156.377214 | +1.24 |
| getpgid | 0.097666 | 0.092677 | -5.11 |

perf bench syscall ops/sec (+ve is improvement)

| Syscall | Base | New | change % |
| ------- | -------- | -------- | -------- |
| basic | 10690548 | 10750140 | +0.56 |
| execve | 2239 | 2221 | -0.80 |
| fork | 875 | 864 | -1.26 |
| getpgid | 10239026 | 10790324 | +5.38 |


IPI latency benchmark (-ve is improvement)

| Metric | Base (ns) | New (ns) | % Change |
| -------------- | ------------- | ------------- | -------- |
| Dry run | 583136.56 | 584136.35 | +0.17% |
| Self IPI | 4167393.42 | 4149093.90 | -0.44% |
| Normal IPI | 61769347.82 | 61753728.39 | -0.03% |
| Broadcast IPI | 2235584825.02 | 2227521401.45 | -0.36% |
| Broadcast lock | 2164964433.31 | 2125658641.76 | -1.82% |


That's very close to the earlier performance with arch-specific handling.

Tests done:
- Build and boot on ppc64le pseries.
- Build and boot on ppc64le powernv8, powernv9 and powernv10.
- Build and boot on ppc32.
- Performance benchmark done with perf bench syscall basic on pseries.

Changelog:
V3 -> V4
- Fixed an issue with older gcc versions where the linker couldn't find
the mem functions
- Merged IRQ enable and syscall enable into a single patch
- Cleanup of unused functions done in a separate patch.
- Some other cosmetic changes
V3: https://lore.kernel.org/all/20251229045416.31...@linux.ibm.com/

V2 -> V3
- #ifdef CONFIG_GENERIC_IRQ_ENTRY removed from unnecessary places
- Some functions made __always_inline
- pt_regs padding changed to match 16-byte interrupt stack alignment
- Some other cosmetic changes from reviews of the earlier patch
V2: https://lore.kernel.org/all/20251214130245.4...@linux.ibm.com/

V1 -> V2
- Fix an issue where context tracking was showing warnings for
incorrect context
V1: https://lore.kernel.org/all/20251102115358.17...@linux.ibm.com/

RFC -> PATCH V1
- Fix for ppc32 spitting out kuap lock warnings.
- ppc64le powernv8 crash fix.
- Review comments incorporated from previous RFC.
RFC: https://lore.kernel.org/all/20250908210235.1...@linux.ibm.com/

Mukesh Kumar Chaurasiya (8):
powerpc: rename arch_irq_disabled_regs
powerpc: Prepare to build with generic entry/exit framework
powerpc: introduce arch_enter_from_user_mode
powerpc: Introduce syscall exit arch functions
powerpc: add exit_flags field in pt_regs
powerpc: Prepare for IRQ entry exit
powerpc: Enable GENERIC_ENTRY feature
powerpc: Remove unused functions

arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/entry-common.h | 533 ++++++++++++++++++++++++
arch/powerpc/include/asm/hw_irq.h | 4 +-
arch/powerpc/include/asm/interrupt.h | 386 +++--------------
arch/powerpc/include/asm/kasan.h | 15 +-
arch/powerpc/include/asm/ptrace.h | 6 +-
arch/powerpc/include/asm/signal.h | 1 -
arch/powerpc/include/asm/stacktrace.h | 6 +
arch/powerpc/include/asm/syscall.h | 5 +
arch/powerpc/include/asm/thread_info.h | 1 +
arch/powerpc/include/uapi/asm/ptrace.h | 14 +-
arch/powerpc/kernel/interrupt.c | 254 ++---------
arch/powerpc/kernel/ptrace/ptrace.c | 142 +------
arch/powerpc/kernel/signal.c | 25 +-
arch/powerpc/kernel/syscall.c | 119 +-----
arch/powerpc/kernel/traps.c | 2 +-
arch/powerpc/kernel/watchdog.c | 2 +-
arch/powerpc/perf/core-book3s.c | 2 +-
18 files changed, 690 insertions(+), 828 deletions(-)
create mode 100644 arch/powerpc/include/asm/entry-common.h

--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:06 AM

From: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>

Rename arch_irq_disabled_regs() to regs_irqs_disabled() to align with the
naming used in the generic irqentry framework. This makes the function
available for use both in the PowerPC architecture code and in the
common entry/exit paths shared with other architectures.

This is a preparatory change for enabling the generic irqentry framework
on PowerPC.
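
For context, the generic entry code keys several decisions off a helper
with exactly this name. A simplified, illustrative sketch of the consumer
(not the exact upstream kernel/entry/common.c) looks roughly like:

noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
{
	lockdep_assert_irqs_disabled();

	if (user_mode(regs)) {
		irqentry_exit_to_user_mode(regs);
	} else if (!regs_irqs_disabled(regs)) {
		/*
		 * Interrupted kernel code with IRQs on: full exit work,
		 * including irqentry_exit_cond_resched() for preemption.
		 */
	}
	/* else: interrupted context had IRQs off, minimal exit */
}

With the old arch_irq_disabled_regs() name this common code could not be
shared as-is, hence the rename before enabling GENERIC_ENTRY.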

Signed-off-by: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <ssh...@linux.ibm.com>
Reviewed-by: Jinjie Ruan <ruanj...@huawei.com>
---
arch/powerpc/include/asm/hw_irq.h | 4 ++--
arch/powerpc/include/asm/interrupt.h | 16 ++++++++--------
arch/powerpc/kernel/interrupt.c | 4 ++--
arch/powerpc/kernel/syscall.c | 2 +-
arch/powerpc/kernel/traps.c | 2 +-
arch/powerpc/kernel/watchdog.c | 2 +-
arch/powerpc/perf/core-book3s.c | 2 +-
7 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 9cd945f2acaf..b7eee6385ae5 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -393,7 +393,7 @@ static inline void do_hard_irq_enable(void)
__hard_irq_enable();
}

-static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
+static inline bool regs_irqs_disabled(struct pt_regs *regs)
{
return (regs->softe & IRQS_DISABLED);
}
@@ -466,7 +466,7 @@ static inline bool arch_irqs_disabled(void)

#define hard_irq_disable() arch_local_irq_disable()

-static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
+static inline bool regs_irqs_disabled(struct pt_regs *regs)
{
return !(regs->msr & MSR_EE);
}
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index eb0e4a20b818..0e2cddf8bd21 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -172,7 +172,7 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs)
/* Enable MSR[RI] early, to support kernel SLB and hash faults */
#endif

- if (!arch_irq_disabled_regs(regs))
+ if (!regs_irqs_disabled(regs))
trace_hardirqs_off();

if (user_mode(regs)) {
@@ -192,11 +192,11 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs)
CT_WARN_ON(ct_state() != CT_STATE_KERNEL &&
ct_state() != CT_STATE_IDLE);
INT_SOFT_MASK_BUG_ON(regs, is_implicit_soft_masked(regs));
- INT_SOFT_MASK_BUG_ON(regs, arch_irq_disabled_regs(regs) &&
- search_kernel_restart_table(regs->nip));
+ INT_SOFT_MASK_BUG_ON(regs, regs_irqs_disabled(regs) &&
+ search_kernel_restart_table(regs->nip));
}
- INT_SOFT_MASK_BUG_ON(regs, !arch_irq_disabled_regs(regs) &&
- !(regs->msr & MSR_EE));
+ INT_SOFT_MASK_BUG_ON(regs, !regs_irqs_disabled(regs) &&
+ !(regs->msr & MSR_EE));

booke_restore_dbcr0();
}
@@ -298,7 +298,7 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
* Adjust regs->softe to be soft-masked if it had not been
* reconcied (e.g., interrupt entry with MSR[EE]=0 but softe
* not yet set disabled), or if it was in an implicit soft
- * masked state. This makes arch_irq_disabled_regs(regs)
+ * masked state. This makes regs_irqs_disabled(regs)
* behave as expected.
*/
regs->softe = IRQS_ALL_DISABLED;
@@ -372,7 +372,7 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter

#ifdef CONFIG_PPC64
#ifdef CONFIG_PPC_BOOK3S
- if (arch_irq_disabled_regs(regs)) {
+ if (regs_irqs_disabled(regs)) {
unsigned long rst = search_kernel_restart_table(regs->nip);
if (rst)
regs_set_return_ip(regs, rst);
@@ -661,7 +661,7 @@ void replay_soft_interrupts(void);

static inline void interrupt_cond_local_irq_enable(struct pt_regs *regs)
{
- if (!arch_irq_disabled_regs(regs))
+ if (!regs_irqs_disabled(regs))
local_irq_enable();
}

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index e63bfde13e03..666eadb589a5 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -347,7 +347,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)
unsigned long ret;

BUG_ON(regs_is_unrecoverable(regs));
- BUG_ON(arch_irq_disabled_regs(regs));
+ BUG_ON(regs_irqs_disabled(regs));
CT_WARN_ON(ct_state() == CT_STATE_USER);

/*
@@ -396,7 +396,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)

local_irq_disable();

- if (!arch_irq_disabled_regs(regs)) {
+ if (!regs_irqs_disabled(regs)) {
/* Returning to a kernel context with local irqs enabled. */
WARN_ON_ONCE(!(regs->msr & MSR_EE));
again:
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index be159ad4b77b..9f03a6263fb4 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -32,7 +32,7 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)

BUG_ON(regs_is_unrecoverable(regs));
BUG_ON(!user_mode(regs));
- BUG_ON(arch_irq_disabled_regs(regs));
+ BUG_ON(regs_irqs_disabled(regs));

#ifdef CONFIG_PPC_PKEY
if (mmu_has_feature(MMU_FTR_PKEY)) {
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index cb8e9357383e..629f2a2d4780 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1956,7 +1956,7 @@ DEFINE_INTERRUPT_HANDLER_RAW(performance_monitor_exception)
* prevent hash faults on user addresses when reading callchains (and
* looks better from an irq tracing perspective).
*/
- if (IS_ENABLED(CONFIG_PPC64) && unlikely(arch_irq_disabled_regs(regs)))
+ if (IS_ENABLED(CONFIG_PPC64) && unlikely(regs_irqs_disabled(regs)))
performance_monitor_exception_nmi(regs);
else
performance_monitor_exception_async(regs);
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 2429cb1c7baa..6111cbbde069 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -373,7 +373,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
u64 tb;

/* should only arrive from kernel, with irqs disabled */
- WARN_ON_ONCE(!arch_irq_disabled_regs(regs));
+ WARN_ON_ONCE(!regs_irqs_disabled(regs));

if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
return 0;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 8b0081441f85..f7518b7e3055 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2482,7 +2482,7 @@ static void __perf_event_interrupt(struct pt_regs *regs)
* will trigger a PMI after waking up from idle. Since counter values are _not_
* saved/restored in idle path, can lead to below "Can't find PMC" message.
*/
- if (unlikely(!found) && !arch_irq_disabled_regs(regs))
+ if (unlikely(!found) && !regs_irqs_disabled(regs))
printk_ratelimited(KERN_WARNING "Can't find PMC that caused IRQ\n");

/*
--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:11 AM

From: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>

This patch introduces preparatory changes needed to support building
PowerPC with the generic entry/exit (irqentry) framework.

The following infrastructure updates are added:
- Add a syscall_work field to struct thread_info to hold SYSCALL_WORK_* flags.
- Provide a stub implementation of arch_syscall_is_vdso_sigreturn(),
returning false for now.
- Introduce on_thread_stack() helper to detect if the current stack pointer
lies within the task’s kernel stack.

These additions enable later integration with the generic entry/exit
infrastructure while keeping existing PowerPC behavior unchanged.

No functional change is intended in this patch.
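
As a quick illustration of the on_thread_stack() check added below: two
addresses lie in the same THREAD_SIZE-aligned region exactly when their
XOR has no bits set at or above the region size, which is what the mask
with ~(THREAD_SIZE - 1) tests. A standalone user-space sketch (made-up
addresses, THREAD_SIZE assumed 16 KiB for the example, not kernel code):

#include <stdbool.h>
#include <stdio.h>

#define THREAD_SIZE (16UL * 1024)	/* assumption for this example only */

/* Same expression as the new helper, with explicit arguments. */
static bool on_stack(unsigned long stack_base, unsigned long sp)
{
	return !((stack_base ^ sp) & ~(THREAD_SIZE - 1));
}

int main(void)
{
	unsigned long base = 0xc000000012340000UL;	/* hypothetical, aligned */

	printf("%d\n", on_stack(base, base + 0x3f80));	/* 1: within the stack */
	printf("%d\n", on_stack(base, base + 0x4010));	/* 0: past the top */
	return 0;
}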

Signed-off-by: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>
---
arch/powerpc/include/asm/entry-common.h | 8 ++++++++
arch/powerpc/include/asm/stacktrace.h | 6 ++++++
arch/powerpc/include/asm/syscall.h | 5 +++++
arch/powerpc/include/asm/thread_info.h | 1 +
4 files changed, 20 insertions(+)
create mode 100644 arch/powerpc/include/asm/entry-common.h

diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
new file mode 100644
index 000000000000..05ce0583b600
--- /dev/null
+++ b/arch/powerpc/include/asm/entry-common.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_PPC_ENTRY_COMMON_H
+#define _ASM_PPC_ENTRY_COMMON_H
+
+#include <asm/stacktrace.h>
+
+#endif /* _ASM_PPC_ENTRY_COMMON_H */
diff --git a/arch/powerpc/include/asm/stacktrace.h b/arch/powerpc/include/asm/stacktrace.h
index 6149b53b3bc8..987f2e996262 100644
--- a/arch/powerpc/include/asm/stacktrace.h
+++ b/arch/powerpc/include/asm/stacktrace.h
@@ -10,4 +10,10 @@

void show_user_instructions(struct pt_regs *regs);

+static __always_inline bool on_thread_stack(void)
+{
+ return !(((unsigned long)(current->stack) ^ current_stack_pointer)
+ & ~(THREAD_SIZE - 1));
+}
+
#endif /* _ASM_POWERPC_STACKTRACE_H */
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 4b3c52ed6e9d..834fcc4f7b54 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -139,4 +139,9 @@ static inline int syscall_get_arch(struct task_struct *task)
else
return AUDIT_ARCH_PPC64;
}
+
+static inline bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
+{
+ return false;
+}
#endif /* _ASM_SYSCALL_H */
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index b0f200aba2b3..9c8270354f0b 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -57,6 +57,7 @@ struct thread_info {
#ifdef CONFIG_SMP
unsigned int cpu;
#endif
+ unsigned long syscall_work; /* SYSCALL_WORK_ flags */
unsigned long local_flags; /* private flags for thread */
#ifdef CONFIG_LIVEPATCH_64
unsigned long *livepatch_sp;
--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:17 AM

From: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>

Implement the arch_enter_from_user_mode() hook required by the generic
entry/exit framework. This helper prepares the CPU state when entering
the kernel from userspace, ensuring correct handling of KUAP/KUEP,
transactional memory, and debug register state.

This patch contains no functional changes; it is purely preparatory for
enabling the generic syscall and interrupt entry paths on PowerPC.
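
For reference, the generic framework invokes this hook very early on each
entry from user space, before any instrumentable work runs. A simplified
sketch of the common caller (illustrative, not the exact upstream code):

static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
{
	arch_enter_from_user_mode(regs);	/* the hook added here */
	lockdep_hardirqs_off(CALLER_ADDR0);
	CT_WARN_ON(ct_state() != CT_STATE_USER);
	user_exit_irqoff();

	instrumentation_begin();
	trace_hardirqs_off_finish();
	instrumentation_end();
}

So the KUAP locking, PKEY save/blocking and TM handling in the new helper
run before context tracking leaves user context and before any tracing or
instrumentation is allowed.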

Signed-off-by: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>
---
arch/powerpc/include/asm/entry-common.h | 118 ++++++++++++++++++++++++
1 file changed, 118 insertions(+)

diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
index 05ce0583b600..837a7e020e82 100644
--- a/arch/powerpc/include/asm/entry-common.h
+++ b/arch/powerpc/include/asm/entry-common.h
@@ -3,6 +3,124 @@
#ifndef _ASM_PPC_ENTRY_COMMON_H
#define _ASM_PPC_ENTRY_COMMON_H

+#include <asm/cputime.h>
+#include <asm/interrupt.h>
#include <asm/stacktrace.h>
+#include <asm/tm.h>
+
+static __always_inline void booke_load_dbcr0(void)
+{
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+ unsigned long dbcr0 = current->thread.debug.dbcr0;
+
+ if (likely(!(dbcr0 & DBCR0_IDM)))
+ return;
+
+ /*
+ * Check to see if the dbcr0 register is set up to debug.
+ * Use the internal debug mode bit to do this.
+ */
+ mtmsr(mfmsr() & ~MSR_DE);
+ if (IS_ENABLED(CONFIG_PPC32)) {
+ isync();
+ global_dbcr0[smp_processor_id()] = mfspr(SPRN_DBCR0);
+ }
+ mtspr(SPRN_DBCR0, dbcr0);
+ mtspr(SPRN_DBSR, -1);
+#endif
+}
+
+static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)
+{
+ kuap_lock();
+
+ if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+ BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
+
+ BUG_ON(regs_is_unrecoverable(regs));
+ BUG_ON(!user_mode(regs));
+ BUG_ON(regs_irqs_disabled(regs));
+
+#ifdef CONFIG_PPC_PKEY
+ if (mmu_has_feature(MMU_FTR_PKEY) && trap_is_syscall(regs)) {
+ unsigned long amr, iamr;
+ bool flush_needed = false;
+ /*
+ * When entering from userspace we mostly have the AMR/IAMR
+ * different from kernel default values. Hence don't compare.
+ */
+ amr = mfspr(SPRN_AMR);
+ iamr = mfspr(SPRN_IAMR);
+ regs->amr = amr;
+ regs->iamr = iamr;
+ if (mmu_has_feature(MMU_FTR_KUAP)) {
+ mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
+ flush_needed = true;
+ }
+ if (mmu_has_feature(MMU_FTR_BOOK3S_KUEP)) {
+ mtspr(SPRN_IAMR, AMR_KUEP_BLOCKED);
+ flush_needed = true;
+ }
+ if (flush_needed)
+ isync();
+ }
+#endif
+ kuap_assert_locked();
+ booke_restore_dbcr0();
+ account_cpu_user_entry();
+ account_stolen_time();
+
+ /*
+ * This is not required for the syscall exit path, but makes the
+ * stack frame look nicer. If this was initialised in the first stack
+ * frame, or if the unwinder was taught the first stack frame always
+ * returns to user with IRQS_ENABLED, this store could be avoided!
+ */
+ irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
+
+ /*
+ * If system call is called with TM active, set _TIF_RESTOREALL to
+ * prevent RFSCV being used to return to userspace, because POWER9
+ * TM implementation has problems with this instruction returning to
+ * transactional state. Final register values are not relevant because
+ * the transaction will be aborted upon return anyway. Or in the case
+ * of unsupported_scv SIGILL fault, the return state does not much
+ * matter because it's an edge case.
+ */
+ if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
+ unlikely(MSR_TM_TRANSACTIONAL(regs->msr)))
+ set_bits(_TIF_RESTOREALL, &current_thread_info()->flags);
+
+ /*
+ * If the system call was made with a transaction active, doom it and
+ * return without performing the system call. Unless it was an
+ * unsupported scv vector, in which case it's treated like an illegal
+ * instruction.
+ */
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ if (unlikely(MSR_TM_TRANSACTIONAL(regs->msr)) &&
+ !trap_is_unsupported_scv(regs)) {
+ /* Enable TM in the kernel, and disable EE (for scv) */
+ hard_irq_disable();
+ mtmsr(mfmsr() | MSR_TM);
+
+ /* tabort, this dooms the transaction, nothing else */
+ asm volatile(".long 0x7c00071d | ((%0) << 16)"
+ :: "r"(TM_CAUSE_SYSCALL | TM_CAUSE_PERSISTENT));
+
+ /*
+ * Userspace will never see the return value. Execution will
+ * resume after the tbegin. of the aborted transaction with the
+ * checkpointed register state. A context switch could occur
+ * or signal delivered to the process before resuming the
+ * doomed transaction context, but that should all be handled
+ * as expected.
+ */
+ return;
+ }
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+}
+
+#define arch_enter_from_user_mode arch_enter_from_user_mode

#endif /* _ASM_PPC_ENTRY_COMMON_H */
--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:30 AM

From: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>

Add PowerPC-specific implementations of the generic syscall exit hooks
used by the generic entry/exit framework:

- arch_exit_to_user_mode_prepare()
- arch_exit_to_user_mode()

These helpers handle user state restoration when returning from the
kernel to userspace, including FPU/VMX/VSX state, transactional memory,
KUAP restore, and per-CPU accounting.

Additionally, move check_return_regs_valid() from interrupt.c to
interrupt.h so it can be shared by the new entry/exit logic.

No functional change is intended with this patch.
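
For context, this is roughly where the generic exit path ends up calling
the two new hooks (a simplified sketch of the common code, not the exact
upstream implementation):

static void exit_to_user_mode_prepare(struct pt_regs *regs)
{
	unsigned long ti_work = read_thread_flags();

	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
		ti_work = exit_to_user_mode_loop(regs, ti_work);

	/* FP/VMX/VSX and TM restore, SRR sanity check, KUAP restore */
	arch_exit_to_user_mode_prepare(regs, ti_work);
}

static __always_inline void __exit_to_user_mode(void)
{
	trace_hardirqs_on_prepare();
	lockdep_hardirqs_on_prepare();
	user_enter_irqoff();
	arch_exit_to_user_mode();	/* dbcr0 load, user time accounting */
	lockdep_hardirqs_on(CALLER_ADDR0);
}

The prepare hook runs with IRQs disabled after all pending exit work has
been processed; the second hook is the last architecture callout before
the actual return to user space.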

Signed-off-by: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>
---
arch/powerpc/include/asm/entry-common.h | 49 +++++++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
index 837a7e020e82..ff0625e04778 100644
--- a/arch/powerpc/include/asm/entry-common.h
+++ b/arch/powerpc/include/asm/entry-common.h
@@ -6,6 +6,7 @@
#include <asm/cputime.h>
#include <asm/interrupt.h>
#include <asm/stacktrace.h>
+#include <asm/switch_to.h>
#include <asm/tm.h>

static __always_inline void booke_load_dbcr0(void)
@@ -123,4 +124,52 @@ static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)

#define arch_enter_from_user_mode arch_enter_from_user_mode

+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+ unsigned long ti_work)
+{
+ unsigned long mathflags;
+
+ if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && IS_ENABLED(CONFIG_PPC_FPU)) {
+ if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
+ unlikely((ti_work & _TIF_RESTORE_TM))) {
+ restore_tm_state(regs);
+ } else {
+ mathflags = MSR_FP;
+
+ if (cpu_has_feature(CPU_FTR_VSX))
+ mathflags |= MSR_VEC | MSR_VSX;
+ else if (cpu_has_feature(CPU_FTR_ALTIVEC))
+ mathflags |= MSR_VEC;
+
+ /*
+ * If userspace MSR has all available FP bits set,
+ * then they are live and no need to restore. If not,
+ * it means the regs were given up and restore_math
+ * may decide to restore them (to avoid taking an FP
+ * fault).
+ */
+ if ((regs->msr & mathflags) != mathflags)
+ restore_math(regs);
+ }
+ }
+
+ check_return_regs_valid(regs);
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ local_paca->tm_scratch = regs->msr;
+#endif
+ /* Restore user access locks last */
+ kuap_user_restore(regs);
+}
+
+#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
+
+static __always_inline void arch_exit_to_user_mode(void)
+{
+ booke_load_dbcr0();
+
+ account_cpu_user_exit();
+}
+
+#define arch_exit_to_user_mode arch_exit_to_user_mode
+

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:31 AM

From: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>

Add a new field `exit_flags` to the pt_regs structure. This field will hold
the flags set during interrupt or syscall execution that are required during
exit to user mode.

Specifically, the `TIF_RESTOREALL` flag, stored in this field, helps the
exit routine determine if any NVGPRs were modified and need to be restored
before returning to userspace.

This addition ensures a clean and architecture-specific mechanism to track
per-syscall or per-interrupt state transitions related to register restore.

Changes:
- Add `exit_flags` and `__pt_regs_pad` to maintain 16-byte stack alignment
(see the sizing note below)
- Update asm-offsets.c and ptrace.c for offset and validation
- Update PT_* constants in uapi header to reflect the new layout
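
Sizing note (assuming 64-bit, where sizeof(unsigned long) == 8): the
user-visible part of pt_regs grows from 44 to 48 longs, i.e. from 352 to
384 bytes. Adding exit_flags alone would give 45 longs (360 bytes) and
break the 16-byte interrupt stack alignment, so three pad words round the
structure back up to a multiple of 16 bytes. This is also why PT_DSCR and
PT_REGS_COUNT move from 44 to 48 in the uapi header below. On 32-bit the
four extra words add 16 bytes, which likewise preserves the alignment.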

Signed-off-by: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>
---
arch/powerpc/include/asm/ptrace.h | 3 +++
arch/powerpc/include/uapi/asm/ptrace.h | 14 +++++++++-----
arch/powerpc/kernel/ptrace/ptrace.c | 1 +
3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 94aa1de2b06e..2e741ea57b80 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -53,6 +53,9 @@ struct pt_regs
unsigned long esr;
};
unsigned long result;
+ unsigned long exit_flags;
+ /* Maintain 16 byte interrupt stack alignment */
+ unsigned long __pt_regs_pad[3];
};
};
#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_KUAP)
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h
index 01e630149d48..a393b7f2760a 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -55,6 +55,8 @@ struct pt_regs
unsigned long dar; /* Fault registers */
unsigned long dsisr; /* on 4xx/Book-E used for ESR */
unsigned long result; /* Result of a system call */
+ unsigned long exit_flags; /* System call exit flags */
+ unsigned long __pt_regs_pad[3]; /* Maintain 16 byte interrupt stack alignment */
};

#endif /* __ASSEMBLER__ */
@@ -114,10 +116,12 @@ struct pt_regs
#define PT_DAR 41
#define PT_DSISR 42
#define PT_RESULT 43
-#define PT_DSCR 44
-#define PT_REGS_COUNT 44
+#define PT_EXIT_FLAGS 44
+#define PT_PAD 47 /* 3 times */
+#define PT_DSCR 48
+#define PT_REGS_COUNT 48

-#define PT_FPR0 48 /* each FP reg occupies 2 slots in this space */
+#define PT_FPR0 (PT_REGS_COUNT + 4) /* each FP reg occupies 2 slots in this space */

#ifndef __powerpc64__

@@ -129,7 +133,7 @@ struct pt_regs
#define PT_FPSCR (PT_FPR0 + 32) /* each FP reg occupies 1 slot in 64-bit space */


-#define PT_VR0 82 /* each Vector reg occupies 2 slots in 64-bit */
+#define PT_VR0 (PT_FPSCR + 2) /* <82> each Vector reg occupies 2 slots in 64-bit */
#define PT_VSCR (PT_VR0 + 32*2 + 1)
#define PT_VRSAVE (PT_VR0 + 33*2)

@@ -137,7 +141,7 @@ struct pt_regs
/*
* Only store first 32 VSRs here. The second 32 VSRs in VR0-31
*/
-#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR0 (PT_VRSAVE + 2) /* each VSR reg occupies 2 slots in 64-bit */
#define PT_VSR31 (PT_VSR0 + 2*31)
#endif /* __powerpc64__ */

diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
index c6997df63287..2134b6d155ff 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -432,6 +432,7 @@ void __init pt_regs_check(void)
CHECK_REG(PT_DAR, dar);
CHECK_REG(PT_DSISR, dsisr);
CHECK_REG(PT_RESULT, result);
+ CHECK_REG(PT_EXIT_FLAGS, exit_flags);
#undef CHECK_REG

BUILD_BUG_ON(PT_REGS_COUNT != sizeof(struct user_pt_regs) / sizeof(unsigned long));
--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:36 AM

From: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>

Move interrupt entry and exit helper routines from interrupt.h into the
PowerPC-specific entry-common.h header as a preparatory step for enabling
the generic entry/exit framework.

This consolidation places all PowerPC interrupt entry/exit handling in a
single common header, aligning with the generic entry infrastructure.
The helpers provide architecture-specific handling for interrupt and NMI
entry/exit sequences, including:

- arch_interrupt_enter/exit_prepare()
- arch_interrupt_async_enter/exit_prepare()
- arch_interrupt_nmi_enter/exit_prepare()
- Supporting helpers such as nap_adjust_return(), check_return_regs_valid(),
debug register maintenance, and soft mask handling.

The functions are copied verbatim from interrupt.h. Subsequent patches will
integrate these routines into the generic entry/exit flow.

No functional change intended.
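
To show where these helpers end up, this is (condensed) the wrapper shape
the final patch in this series generates for a plain interrupt handler,
with the moved routines bracketing the generic irqentry calls:

interrupt_handler void func(struct pt_regs *regs)
{
	irqentry_state_t state;

	arch_interrupt_enter_prepare(regs);	/* moved in this patch */
	state = irqentry_enter(regs);
	instrumentation_begin();
	____func(regs);				/* the real handler body */
	instrumentation_end();
	arch_interrupt_exit_prepare(regs);
	irqentry_exit(regs, state);
}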

Signed-off-by: Mukesh Kumar Chaurasiya <mcha...@linux.ibm.com>
---
arch/powerpc/include/asm/entry-common.h | 358 ++++++++++++++++++++++++
1 file changed, 358 insertions(+)

diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
index ff0625e04778..de5601282755 100644
--- a/arch/powerpc/include/asm/entry-common.h
+++ b/arch/powerpc/include/asm/entry-common.h
@@ -5,10 +5,75 @@

#include <asm/cputime.h>
#include <asm/interrupt.h>
+#include <asm/runlatch.h>
#include <asm/stacktrace.h>
#include <asm/switch_to.h>
#include <asm/tm.h>

+#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
+/*
+ * WARN/BUG is handled with a program interrupt so minimise checks here to
+ * avoid recursion and maximise the chance of getting the first oops handled.
+ */
+#define INT_SOFT_MASK_BUG_ON(regs, cond) \
+do { \
+ if ((user_mode(regs) || (TRAP(regs) != INTERRUPT_PROGRAM))) \
+ BUG_ON(cond); \
+} while (0)
+#else
+#define INT_SOFT_MASK_BUG_ON(regs, cond)
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+extern char __end_soft_masked[];
+bool search_kernel_soft_mask_table(unsigned long addr);
+unsigned long search_kernel_restart_table(unsigned long addr);
+
+DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
+
+static inline bool is_implicit_soft_masked(struct pt_regs *regs)
+{
+ if (user_mode(regs))
+ return false;
+
+ if (regs->nip >= (unsigned long)__end_soft_masked)
+ return false;
+
+ return search_kernel_soft_mask_table(regs->nip);
+}
+
+static inline void srr_regs_clobbered(void)
+{
+ local_paca->srr_valid = 0;
+ local_paca->hsrr_valid = 0;
+}
+#else
+static inline unsigned long search_kernel_restart_table(unsigned long addr)
+{
+ return 0;
+}
+
+static inline bool is_implicit_soft_masked(struct pt_regs *regs)
+{
+ return false;
+}
+
+static inline void srr_regs_clobbered(void)
+{
+}
+#endif
+
+static inline void nap_adjust_return(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC_970_NAP
+ if (unlikely(test_thread_local_flags(_TLF_NAPPING))) {
+ /* Can avoid a test-and-clear because NMIs do not call this */
+ clear_thread_local_flags(_TLF_NAPPING);
+ regs_set_return_ip(regs, (unsigned long)power4_idle_nap_return);
+ }
+#endif
+}
+
static __always_inline void booke_load_dbcr0(void)
{
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
@@ -31,6 +96,299 @@ static __always_inline void booke_load_dbcr0(void)
#endif
}

+static inline void booke_restore_dbcr0(void)
+{
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+ unsigned long dbcr0 = current->thread.debug.dbcr0;
+
+ if (IS_ENABLED(CONFIG_PPC32) && unlikely(dbcr0 & DBCR0_IDM)) {
+ mtspr(SPRN_DBSR, -1);
+ mtspr(SPRN_DBCR0, global_dbcr0[smp_processor_id()]);
+ }
+#endif
+}
+
+static inline void check_return_regs_valid(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+ unsigned long trap, srr0, srr1;
+ static bool warned;
+ u8 *validp;
+ char *h;
+
+ if (trap_is_scv(regs))
+ return;
+
+ trap = TRAP(regs);
+ // EE in HV mode sets HSRRs like 0xea0
+ if (cpu_has_feature(CPU_FTR_HVMODE) && trap == INTERRUPT_EXTERNAL)
+ trap = 0xea0;
+
+ switch (trap) {
+ case 0x980:
+ case INTERRUPT_H_DATA_STORAGE:
+ case 0xe20:
+ case 0xe40:
+ case INTERRUPT_HMI:
+ case 0xe80:
+ case 0xea0:
+ case INTERRUPT_H_FAC_UNAVAIL:
+ case 0x1200:
+ case 0x1500:
+ case 0x1600:
+ case 0x1800:
+ validp = &local_paca->hsrr_valid;
+ if (!READ_ONCE(*validp))
+ return;
+
+ srr0 = mfspr(SPRN_HSRR0);
+ srr1 = mfspr(SPRN_HSRR1);
+ h = "H";
+
+ break;
+ default:
+ validp = &local_paca->srr_valid;
+ if (!READ_ONCE(*validp))
+ return;
+
+ srr0 = mfspr(SPRN_SRR0);
+ srr1 = mfspr(SPRN_SRR1);
+ h = "";
+ break;
+ }
+
+ if (srr0 == regs->nip && srr1 == regs->msr)
+ return;
+
+ /*
+ * A NMI / soft-NMI interrupt may have come in after we found
+ * srr_valid and before the SRRs are loaded. The interrupt then
+ * comes in and clobbers SRRs and clears srr_valid. Then we load
+ * the SRRs here and test them above and find they don't match.
+ *
+ * Test validity again after that, to catch such false positives.
+ *
+ * This test in general will have some window for false negatives
+ * and may not catch and fix all such cases if an NMI comes in
+ * later and clobbers SRRs without clearing srr_valid, but hopefully
+ * such things will get caught most of the time, statistically
+ * enough to be able to get a warning out.
+ */
+ if (!READ_ONCE(*validp))
+ return;
+
+ if (!data_race(warned)) {
+ data_race(warned = true);
+ pr_warn("%sSRR0 was: %lx should be: %lx\n", h, srr0, regs->nip);
+ pr_warn("%sSRR1 was: %lx should be: %lx\n", h, srr1, regs->msr);
+ show_regs(regs);
+ }
+
+ WRITE_ONCE(*validp, 0); /* fixup */
+#endif
+}
+
+static inline void arch_interrupt_enter_prepare(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC64
+ irq_soft_mask_set(IRQS_ALL_DISABLED);
+
+ /*
+ * If the interrupt was taken with HARD_DIS clear, then enable MSR[EE].
+ * Asynchronous interrupts get here with HARD_DIS set (see below), so
+ * this enables MSR[EE] for synchronous interrupts. IRQs remain
+ * soft-masked. The interrupt handler may later call
+ * interrupt_cond_local_irq_enable() to achieve a regular process
+ * context.
+ */
+ if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) {
+ INT_SOFT_MASK_BUG_ON(regs, !(regs->msr & MSR_EE));
+ __hard_irq_enable();
+ } else {
+ __hard_RI_enable();
+ }
+ /* Enable MSR[RI] early, to support kernel SLB and hash faults */
+#endif
+
+ if (!regs_irqs_disabled(regs))
+ trace_hardirqs_off();
+
+ if (user_mode(regs)) {
+ kuap_lock();
+ account_cpu_user_entry();
+ account_stolen_time();
+ } else {
+ kuap_save_and_lock(regs);
+ /*
+ * CT_WARN_ON comes here via program_check_exception,
+ * so avoid recursion.
+ */
+ if (TRAP(regs) != INTERRUPT_PROGRAM)
+ CT_WARN_ON(ct_state() != CT_STATE_KERNEL &&
+ ct_state() != CT_STATE_IDLE);
+ INT_SOFT_MASK_BUG_ON(regs, is_implicit_soft_masked(regs));
+ INT_SOFT_MASK_BUG_ON(regs, regs_irqs_disabled(regs) &&
+ search_kernel_restart_table(regs->nip));
+ }
+ INT_SOFT_MASK_BUG_ON(regs, !regs_irqs_disabled(regs) &&
+ !(regs->msr & MSR_EE));
+
+ booke_restore_dbcr0();
+}
+
+/*
+ * Care should be taken to note that arch_interrupt_exit_prepare and
+ * arch_interrupt_async_exit_prepare do not necessarily return immediately to
+ * regs context (e.g., if regs is usermode, we don't necessarily return to
+ * user mode). Other interrupts might be taken between here and return,
+ * context switch / preemption may occur in the exit path after this, or a
+ * signal may be delivered, etc.
+ *
+ * The real interrupt exit code is platform specific, e.g.,
+ * interrupt_exit_user_prepare / interrupt_exit_kernel_prepare for 64s.
+ *
+ * However arch_interrupt_nmi_exit_prepare does return directly to regs, because
+ * NMIs do not do "exit work" or replay soft-masked interrupts.
+ */
+static inline void arch_interrupt_exit_prepare(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ BUG_ON(regs_is_unrecoverable(regs));
+ BUG_ON(regs_irqs_disabled(regs));
+ /*
+ * We don't need to restore AMR on the way back to userspace for KUAP.
+ * AMR can only have been unlocked if we interrupted the kernel.
+ */
+ kuap_assert_locked();
+
+ local_irq_disable();
+ }
+}
+
+static inline void arch_interrupt_async_enter_prepare(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC64
+ /* Ensure arch_interrupt_enter_prepare does not enable MSR[EE] */
+ local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+#endif
+ arch_interrupt_enter_prepare(regs);
+#ifdef CONFIG_PPC_BOOK3S_64
+ /*
+ * RI=1 is set by arch_interrupt_enter_prepare, so this thread flags access
+ * has to come afterward (it can cause SLB faults).
+ */
+ if (cpu_has_feature(CPU_FTR_CTRL) &&
+ !test_thread_local_flags(_TLF_RUNLATCH))
+ __ppc64_runlatch_on();
+#endif
+}
+
+static inline void arch_interrupt_async_exit_prepare(struct pt_regs *regs)
+{
+ /*
+ * Adjust at exit so the main handler sees the true NIA. This must
+ * come before irq_exit() because irq_exit can enable interrupts, and
+ * if another interrupt is taken before nap_adjust_return has run
+ * here, then that interrupt would return directly to idle nap return.
+ */
+ nap_adjust_return(regs);
+
+ arch_interrupt_exit_prepare(regs);
+}
+
+struct interrupt_nmi_state {
+#ifdef CONFIG_PPC64
+ u8 irq_soft_mask;
+ u8 irq_happened;
+ u8 ftrace_enabled;
+ u64 softe;
+#endif
+};
+
+static inline bool nmi_disables_ftrace(struct pt_regs *regs)
+{
+ /* Allow DEC and PMI to be traced when they are soft-NMI */
+ if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
+ if (TRAP(regs) == INTERRUPT_DECREMENTER)
+ return false;
+ if (TRAP(regs) == INTERRUPT_PERFMON)
+ return false;
+ }
+ if (IS_ENABLED(CONFIG_PPC_BOOK3E_64)) {
+ if (TRAP(regs) == INTERRUPT_PERFMON)
+ return false;
+ }
+
+ return true;
+}
+
+static inline void arch_interrupt_nmi_enter_prepare(struct pt_regs *regs,
+ struct interrupt_nmi_state *state)
+{
+#ifdef CONFIG_PPC64
+ state->irq_soft_mask = local_paca->irq_soft_mask;
+ state->irq_happened = local_paca->irq_happened;
+ state->softe = regs->softe;
+
+ /*
+ * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
+ * the right thing, and set IRQ_HARD_DIS. We do not want to reconcile
+ * because that goes through irq tracing which we don't want in NMI.
+ */
+ local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
+ local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+
+ if (!(regs->msr & MSR_EE) || is_implicit_soft_masked(regs)) {
+ /*
+ * Adjust regs->softe to be soft-masked if it had not been
+ * reconcied (e.g., interrupt entry with MSR[EE]=0 but softe
+ * not yet set disabled), or if it was in an implicit soft
+ * masked state. This makes regs_irqs_disabled(regs)
+ * behave as expected.
+ */
+ regs->softe = IRQS_ALL_DISABLED;
+ }
+
+ __hard_RI_enable();
+
+ /* Don't do any per-CPU operations until interrupt state is fixed */
+
+ if (nmi_disables_ftrace(regs)) {
+ state->ftrace_enabled = this_cpu_get_ftrace_enabled();
+ this_cpu_set_ftrace_enabled(0);
+ }
+#endif
+}
+
+static inline void arch_interrupt_nmi_exit_prepare(struct pt_regs *regs,
+ struct interrupt_nmi_state *state)
+{
+ /*
+ * nmi does not call nap_adjust_return because nmi should not create
+ * new work to do (must use irq_work for that).
+ */
+
+#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
+ if (regs_irqs_disabled(regs)) {
+ unsigned long rst = search_kernel_restart_table(regs->nip);
+
+ if (rst)
+ regs_set_return_ip(regs, rst);
+ }
+#endif
+
+ if (nmi_disables_ftrace(regs))
+ this_cpu_set_ftrace_enabled(state->ftrace_enabled);
+
+ /* Check we didn't change the pending interrupt mask. */
+ WARN_ON_ONCE((state->irq_happened | PACA_IRQ_HARD_DIS) != local_paca->irq_happened);
+ regs->softe = state->softe;
+ local_paca->irq_happened = state->irq_happened;
+ local_paca->irq_soft_mask = state->irq_soft_mask;
+#endif
+}
+
static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)
{
kuap_lock();
--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:44 AM

Enable the generic IRQ entry/exit infrastructure on PowerPC by selecting
GENERIC_ENTRY and integrating the architecture-specific interrupt and
syscall handlers with the generic entry/exit APIs.

This change replaces PowerPC’s local interrupt entry/exit handling with
calls to the generic irqentry_* helpers, aligning the architecture with
the common kernel entry model. The macros that define interrupt, async,
and NMI handlers are updated to use irqentry_enter()/irqentry_exit()
and irqentry_nmi_enter()/irqentry_nmi_exit() where applicable. The
PowerPC syscall entry and exit paths are also converted to the generic
entry/exit framework, integrating with the common syscall handling
routines.

Key updates include:
- The architecture now selects GENERIC_ENTRY in Kconfig.
- Replace interrupt_enter/exit_prepare() with arch_interrupt_* helpers.
- Integrate irqentry_enter()/exit() in standard and async interrupt paths.
- Integrate irqentry_nmi_enter()/exit() in NMI handlers.
- Remove redundant irq_enter()/irq_exit() calls now handled generically.
- Use irqentry_exit_cond_resched() for preemption checks.
- interrupt.c and syscall.c are simplified to delegate context
management and user exit handling to the generic entry path.
- The new pt_regs field `exit_flags` introduced earlier is now used
to carry per-syscall exit state flags (e.g. _TIF_RESTOREALL).
- Remove unused code.

This change establishes the necessary wiring for PowerPC to use the
generic IRQ entry/exit framework while maintaining existing semantics.
This aligns PowerPC with the common entry code used by other
architectures and reduces duplicated logic around syscall tracing,
context tracking, and signal handling.

The performance benchmarks from perf bench syscall are below:

perf bench syscall usec/op (-ve is improvement)

| Syscall | Base | New | change % |
| ------- | ----------- | ----------- | -------- |
| basic | 0.093543 | 0.093023 | -0.56 |
| execve | 446.557781 | 450.107172 | +0.79 |
| fork | 1142.204391 | 1156.377214 | +1.24 |
| getpgid | 0.097666 | 0.092677 | -5.11 |

perf bench syscall ops/sec (+ve is improvement)

| Syscall | Base | New | change % |
| ------- | -------- | -------- | -------- |
| basic | 10690548 | 10750140 | +0.56 |
| execve | 2239 | 2221 | -0.80 |
| fork | 875 | 864 | -1.26 |
| getpgid | 10239026 | 10790324 | +5.38 |

IPI latency benchmark (-ve is improvement)

| Metric | Base (ns) | New (ns) | % Change |
| -------------- | ------------- | ------------- | -------- |
| Dry run | 583136.56 | 584136.35 | +0.17% |
| Self IPI | 4167393.42 | 4149093.90 | -0.44% |
| Normal IPI | 61769347.82 | 61753728.39 | -0.03% |
| Broadcast IPI | 2235584825.02 | 2227521401.45 | -0.36% |
| Broadcast lock | 2164964433.31 | 2125658641.76 | -1.82% |

That's very close to the earlier performance with arch-specific handling.

Signed-off-by: Mukesh Kumar Chaurasiya <mkch...@linux.ibm.com>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/interrupt.h | 384 +++++----------------------
arch/powerpc/include/asm/kasan.h | 15 +-
arch/powerpc/kernel/interrupt.c | 250 +++--------------
arch/powerpc/kernel/ptrace/ptrace.c | 3 -
arch/powerpc/kernel/signal.c | 8 +
arch/powerpc/kernel/syscall.c | 119 +--------
7 files changed, 124 insertions(+), 656 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9537a61ebae0..455dcc025eb9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -204,6 +204,7 @@ config PPC
select GENERIC_CPU_AUTOPROBE
select GENERIC_CPU_VULNERABILITIES if PPC_BARRIER_NOSPEC
select GENERIC_EARLY_IOREMAP
+ select GENERIC_ENTRY
select GENERIC_GETTIMEOFDAY
select GENERIC_IDLE_POLL_SETUP
select GENERIC_IOREMAP
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 0e2cddf8bd21..fb42a664ae54 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -66,11 +66,9 @@

#ifndef __ASSEMBLER__

-#include <linux/context_tracking.h>
-#include <linux/hardirq.h>
-#include <asm/cputime.h>
-#include <asm/firmware.h>
-#include <asm/ftrace.h>
+#include <linux/sched/debug.h> /* for show_regs */
+#include <linux/irq-entry-common.h>
+
#include <asm/kprobes.h>
#include <asm/runlatch.h>

@@ -88,308 +86,6 @@ do { \
#define INT_SOFT_MASK_BUG_ON(regs, cond)
#endif

-#ifdef CONFIG_PPC_BOOK3S_64
-extern char __end_soft_masked[];
-bool search_kernel_soft_mask_table(unsigned long addr);
-unsigned long search_kernel_restart_table(unsigned long addr);
-
-DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
-
-static inline bool is_implicit_soft_masked(struct pt_regs *regs)
-{
- if (user_mode(regs))
- return false;
-
- if (regs->nip >= (unsigned long)__end_soft_masked)
- return false;
-
- return search_kernel_soft_mask_table(regs->nip);
-}
-
-static inline void srr_regs_clobbered(void)
-{
- local_paca->srr_valid = 0;
- local_paca->hsrr_valid = 0;
-}
-#else
-static inline unsigned long search_kernel_restart_table(unsigned long addr)
-{
- return 0;
-}
-
-static inline bool is_implicit_soft_masked(struct pt_regs *regs)
-{
- return false;
-}
-
-static inline void srr_regs_clobbered(void)
-{
-}
-#endif
-
-static inline void nap_adjust_return(struct pt_regs *regs)
-{
-#ifdef CONFIG_PPC_970_NAP
- if (unlikely(test_thread_local_flags(_TLF_NAPPING))) {
- /* Can avoid a test-and-clear because NMIs do not call this */
- clear_thread_local_flags(_TLF_NAPPING);
- regs_set_return_ip(regs, (unsigned long)power4_idle_nap_return);
- }
-#endif
-}
-
-static inline void booke_restore_dbcr0(void)
-{
-#ifdef CONFIG_PPC_ADV_DEBUG_REGS
- unsigned long dbcr0 = current->thread.debug.dbcr0;
-
- if (IS_ENABLED(CONFIG_PPC32) && unlikely(dbcr0 & DBCR0_IDM)) {
- mtspr(SPRN_DBSR, -1);
- mtspr(SPRN_DBCR0, global_dbcr0[smp_processor_id()]);
- }
-#endif
-}
-
-static inline void interrupt_enter_prepare(struct pt_regs *regs)
-{
-#ifdef CONFIG_PPC64
- irq_soft_mask_set(IRQS_ALL_DISABLED);
-
- /*
- * If the interrupt was taken with HARD_DIS clear, then enable MSR[EE].
- * Asynchronous interrupts get here with HARD_DIS set (see below), so
- * this enables MSR[EE] for synchronous interrupts. IRQs remain
- * soft-masked. The interrupt handler may later call
- * interrupt_cond_local_irq_enable() to achieve a regular process
- * context.
- */
- if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) {
- INT_SOFT_MASK_BUG_ON(regs, !(regs->msr & MSR_EE));
- __hard_irq_enable();
- } else {
- __hard_RI_enable();
- }
- /* Enable MSR[RI] early, to support kernel SLB and hash faults */
-#endif
-
- if (!regs_irqs_disabled(regs))
- trace_hardirqs_off();
-
- if (user_mode(regs)) {
- kuap_lock();
- CT_WARN_ON(ct_state() != CT_STATE_USER);
- user_exit_irqoff();
-
- account_cpu_user_entry();
- account_stolen_time();
- } else {
- kuap_save_and_lock(regs);
- /*
- * CT_WARN_ON comes here via program_check_exception,
- * so avoid recursion.
- */
- if (TRAP(regs) != INTERRUPT_PROGRAM)
- CT_WARN_ON(ct_state() != CT_STATE_KERNEL &&
- ct_state() != CT_STATE_IDLE);
- INT_SOFT_MASK_BUG_ON(regs, is_implicit_soft_masked(regs));
- INT_SOFT_MASK_BUG_ON(regs, regs_irqs_disabled(regs) &&
- search_kernel_restart_table(regs->nip));
- }
- INT_SOFT_MASK_BUG_ON(regs, !regs_irqs_disabled(regs) &&
- !(regs->msr & MSR_EE));
-
- booke_restore_dbcr0();
-}
-
-/*
- * Care should be taken to note that interrupt_exit_prepare and
- * interrupt_async_exit_prepare do not necessarily return immediately to
- * regs context (e.g., if regs is usermode, we don't necessarily return to
- * user mode). Other interrupts might be taken between here and return,
- * context switch / preemption may occur in the exit path after this, or a
- * signal may be delivered, etc.
- *
- * The real interrupt exit code is platform specific, e.g.,
- * interrupt_exit_user_prepare / interrupt_exit_kernel_prepare for 64s.
- *
- * However interrupt_nmi_exit_prepare does return directly to regs, because
- * NMIs do not do "exit work" or replay soft-masked interrupts.
- */
-static inline void interrupt_exit_prepare(struct pt_regs *regs)
-{
-}
-
-static inline void interrupt_async_enter_prepare(struct pt_regs *regs)
-{
-#ifdef CONFIG_PPC64
- /* Ensure interrupt_enter_prepare does not enable MSR[EE] */
- local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
-#endif
- interrupt_enter_prepare(regs);
-#ifdef CONFIG_PPC_BOOK3S_64
- /*
- * RI=1 is set by interrupt_enter_prepare, so this thread flags access
- * has to come afterward (it can cause SLB faults).
- */
- if (cpu_has_feature(CPU_FTR_CTRL) &&
- !test_thread_local_flags(_TLF_RUNLATCH))
- __ppc64_runlatch_on();
-#endif
- irq_enter();
-}
-
-static inline void interrupt_async_exit_prepare(struct pt_regs *regs)
-{
- /*
- * Adjust at exit so the main handler sees the true NIA. This must
- * come before irq_exit() because irq_exit can enable interrupts, and
- * if another interrupt is taken before nap_adjust_return has run
- * here, then that interrupt would return directly to idle nap return.
- */
- nap_adjust_return(regs);
-
- irq_exit();
- interrupt_exit_prepare(regs);
-}
-
-struct interrupt_nmi_state {
-#ifdef CONFIG_PPC64
- u8 irq_soft_mask;
- u8 irq_happened;
- u8 ftrace_enabled;
- u64 softe;
-#endif
-};
-
-static inline bool nmi_disables_ftrace(struct pt_regs *regs)
-{
- /* Allow DEC and PMI to be traced when they are soft-NMI */
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
- if (TRAP(regs) == INTERRUPT_DECREMENTER)
- return false;
- if (TRAP(regs) == INTERRUPT_PERFMON)
- return false;
- }
- if (IS_ENABLED(CONFIG_PPC_BOOK3E_64)) {
- if (TRAP(regs) == INTERRUPT_PERFMON)
- return false;
- }
-
- return true;
-}
-
-static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
-{
-#ifdef CONFIG_PPC64
- state->irq_soft_mask = local_paca->irq_soft_mask;
- state->irq_happened = local_paca->irq_happened;
- state->softe = regs->softe;
-
- /*
- * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
- * the right thing, and set IRQ_HARD_DIS. We do not want to reconcile
- * because that goes through irq tracing which we don't want in NMI.
- */
- local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
- local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
-
- if (!(regs->msr & MSR_EE) || is_implicit_soft_masked(regs)) {
- /*
- * Adjust regs->softe to be soft-masked if it had not been
- * reconcied (e.g., interrupt entry with MSR[EE]=0 but softe
- * not yet set disabled), or if it was in an implicit soft
- * masked state. This makes regs_irqs_disabled(regs)
- * behave as expected.
- */
- regs->softe = IRQS_ALL_DISABLED;
- }
-
- __hard_RI_enable();
-
- /* Don't do any per-CPU operations until interrupt state is fixed */
-
- if (nmi_disables_ftrace(regs)) {
- state->ftrace_enabled = this_cpu_get_ftrace_enabled();
- this_cpu_set_ftrace_enabled(0);
- }
-#endif
-
- /* If data relocations are enabled, it's safe to use nmi_enter() */
- if (mfmsr() & MSR_DR) {
- nmi_enter();
- return;
- }
-
- /*
- * But do not use nmi_enter() for pseries hash guest taking a real-mode
- * NMI because not everything it touches is within the RMA limit.
- */
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
- firmware_has_feature(FW_FEATURE_LPAR) &&
- !radix_enabled())
- return;
-
- /*
- * Likewise, don't use it if we have some form of instrumentation (like
- * KASAN shadow) that is not safe to access in real mode (even on radix)
- */
- if (IS_ENABLED(CONFIG_KASAN))
- return;
-
- /*
- * Likewise, do not use it in real mode if percpu first chunk is not
- * embedded. With CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there
- * are chances where percpu allocation can come from vmalloc area.
- */
- if (percpu_first_chunk_is_paged)
- return;
-
- /* Otherwise, it should be safe to call it */
- nmi_enter();
-}
-
-static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
-{
- if (mfmsr() & MSR_DR) {
- // nmi_exit if relocations are on
- nmi_exit();
- } else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
- firmware_has_feature(FW_FEATURE_LPAR) &&
- !radix_enabled()) {
- // no nmi_exit for a pseries hash guest taking a real mode exception
- } else if (IS_ENABLED(CONFIG_KASAN)) {
- // no nmi_exit for KASAN in real mode
- } else if (percpu_first_chunk_is_paged) {
- // no nmi_exit if percpu first chunk is not embedded
- } else {
- nmi_exit();
- }
-
- /*
- * nmi does not call nap_adjust_return because nmi should not create
- * new work to do (must use irq_work for that).
- */
-
-#ifdef CONFIG_PPC64
-#ifdef CONFIG_PPC_BOOK3S
- if (regs_irqs_disabled(regs)) {
- unsigned long rst = search_kernel_restart_table(regs->nip);
- if (rst)
- regs_set_return_ip(regs, rst);
- }
-#endif
-
- if (nmi_disables_ftrace(regs))
- this_cpu_set_ftrace_enabled(state->ftrace_enabled);
-
- /* Check we didn't change the pending interrupt mask. */
- WARN_ON_ONCE((state->irq_happened | PACA_IRQ_HARD_DIS) != local_paca->irq_happened);
- regs->softe = state->softe;
- local_paca->irq_happened = state->irq_happened;
- local_paca->irq_soft_mask = state->irq_soft_mask;
-#endif
-}
-
/*
* Don't use noinstr here like x86, but rather add NOKPROBE_SYMBOL to each
* function definition. The reason for this is the noinstr section is placed
@@ -470,11 +166,14 @@ static __always_inline void ____##func(struct pt_regs *regs); \
\
interrupt_handler void func(struct pt_regs *regs) \
{ \
- interrupt_enter_prepare(regs); \
- \
+ irqentry_state_t state; \
+ arch_interrupt_enter_prepare(regs); \
+ state = irqentry_enter(regs); \
+ instrumentation_begin(); \
____##func (regs); \
- \
- interrupt_exit_prepare(regs); \
+ instrumentation_end(); \
+ arch_interrupt_exit_prepare(regs); \
+ irqentry_exit(regs, state); \
} \
NOKPROBE_SYMBOL(func); \
\
@@ -504,12 +203,15 @@ static __always_inline long ____##func(struct pt_regs *regs); \
interrupt_handler long func(struct pt_regs *regs) \
{ \
long ret; \
+ irqentry_state_t state; \
\
- interrupt_enter_prepare(regs); \
- \
+ arch_interrupt_enter_prepare(regs); \
+ state = irqentry_enter(regs); \
+ instrumentation_begin(); \
ret = ____##func (regs); \
- \
- interrupt_exit_prepare(regs); \
+ instrumentation_end(); \
+ arch_interrupt_exit_prepare(regs); \
+ irqentry_exit(regs, state); \
\
return ret; \
} \
@@ -538,11 +240,16 @@ static __always_inline void ____##func(struct pt_regs *regs); \
\
interrupt_handler void func(struct pt_regs *regs) \
{ \
- interrupt_async_enter_prepare(regs); \
- \
+ irqentry_state_t state; \
+ arch_interrupt_async_enter_prepare(regs); \
+ state = irqentry_enter(regs); \
+ instrumentation_begin(); \
+ irq_enter_rcu(); \
____##func (regs); \
- \
- interrupt_async_exit_prepare(regs); \
+ irq_exit_rcu(); \
+ instrumentation_end(); \
+ arch_interrupt_async_exit_prepare(regs); \
+ irqentry_exit(regs, state); \
} \
NOKPROBE_SYMBOL(func); \
\
@@ -572,14 +279,43 @@ ____##func(struct pt_regs *regs); \
\
interrupt_handler long func(struct pt_regs *regs) \
{ \
- struct interrupt_nmi_state state; \
+ irqentry_state_t state; \
+ struct interrupt_nmi_state nmi_state; \
long ret; \
\
- interrupt_nmi_enter_prepare(regs, &state); \
- \
+ arch_interrupt_nmi_enter_prepare(regs, &nmi_state); \
+ if (mfmsr() & MSR_DR) { \
+ /* nmi_entry if relocations are on */ \
+ state = irqentry_nmi_enter(regs); \
+ } else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && \
+ firmware_has_feature(FW_FEATURE_LPAR) && \
+ !radix_enabled()) { \
+ /* no nmi_entry for a pseries hash guest \
+ * taking a real mode exception */ \
+ } else if (IS_ENABLED(CONFIG_KASAN)) { \
+ /* no nmi_entry for KASAN in real mode */ \
+ } else if (percpu_first_chunk_is_paged) { \
+ /* no nmi_entry if percpu first chunk is not embedded */\
+ } else { \
+ state = irqentry_nmi_enter(regs); \
+ } \
ret = ____##func (regs); \
- \
- interrupt_nmi_exit_prepare(regs, &state); \
+ arch_interrupt_nmi_exit_prepare(regs, &nmi_state); \
+ if (mfmsr() & MSR_DR) { \
+ /* nmi_exit if relocations are on */ \
+ irqentry_nmi_exit(regs, state); \
+ } else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && \
+ firmware_has_feature(FW_FEATURE_LPAR) && \
+ !radix_enabled()) { \
+ /* no nmi_exit for a pseries hash guest \
+ * taking a real mode exception */ \
+ } else if (IS_ENABLED(CONFIG_KASAN)) { \
+ /* no nmi_exit for KASAN in real mode */ \
+ } else if (percpu_first_chunk_is_paged) { \
+ /* no nmi_exit if percpu first chunk is not embedded */ \
+ } else { \
+ irqentry_nmi_exit(regs, state); \
+ } \
\
return ret; \
} \
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 045804a86f98..a690e7da53c2 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -3,14 +3,19 @@
#define __ASM_KASAN_H

#if defined(CONFIG_KASAN) && !defined(CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX)
-#define _GLOBAL_KASAN(fn) _GLOBAL(__##fn)
-#define _GLOBAL_TOC_KASAN(fn) _GLOBAL_TOC(__##fn)
-#define EXPORT_SYMBOL_KASAN(fn) EXPORT_SYMBOL(__##fn)
-#else
+#define _GLOBAL_KASAN(fn) \
+ _GLOBAL(fn); \
+ _GLOBAL(__##fn)
+#define _GLOBAL_TOC_KASAN(fn) \
+ _GLOBAL_TOC(fn); \
+ _GLOBAL_TOC(__##fn)
+#define EXPORT_SYMBOL_KASAN(fn) \
+ EXPORT_SYMBOL(__##fn)
+#else /* CONFIG_KASAN && !CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX */
#define _GLOBAL_KASAN(fn) _GLOBAL(fn)
#define _GLOBAL_TOC_KASAN(fn) _GLOBAL_TOC(fn)
#define EXPORT_SYMBOL_KASAN(fn)
-#endif
+#endif /* CONFIG_KASAN && !CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX */

#ifndef __ASSEMBLER__

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 666eadb589a5..89a999be1352 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-or-later

#include <linux/context_tracking.h>
+#include <linux/entry-common.h>
#include <linux/err.h>
#include <linux/compat.h>
#include <linux/rseq.h>
@@ -25,10 +26,6 @@
unsigned long global_dbcr0[NR_CPUS];
#endif

-#if defined(CONFIG_PREEMPT_DYNAMIC)
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#endif
-
#ifdef CONFIG_PPC_BOOK3S_64
DEFINE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
static inline bool exit_must_hard_disable(void)
@@ -78,181 +75,6 @@ static notrace __always_inline bool prep_irq_for_enabled_exit(bool restartable)
return true;
}

-static notrace void booke_load_dbcr0(void)
-{
-#ifdef CONFIG_PPC_ADV_DEBUG_REGS
- unsigned long dbcr0 = current->thread.debug.dbcr0;
-
- if (likely(!(dbcr0 & DBCR0_IDM)))
- return;
-
- /*
- * Check to see if the dbcr0 register is set up to debug.
- * Use the internal debug mode bit to do this.
- */
- mtmsr(mfmsr() & ~MSR_DE);
- if (IS_ENABLED(CONFIG_PPC32)) {
- isync();
- global_dbcr0[smp_processor_id()] = mfspr(SPRN_DBCR0);
- }
- mtspr(SPRN_DBCR0, dbcr0);
- mtspr(SPRN_DBSR, -1);
-#endif
-}
-
-static notrace void check_return_regs_valid(struct pt_regs *regs)
-{
-#ifdef CONFIG_PPC_BOOK3S_64
- unsigned long trap, srr0, srr1;
- static bool warned;
- u8 *validp;
- char *h;
-
- if (trap_is_scv(regs))
- return;
-
- trap = TRAP(regs);
- // EE in HV mode sets HSRRs like 0xea0
- if (cpu_has_feature(CPU_FTR_HVMODE) && trap == INTERRUPT_EXTERNAL)
- trap = 0xea0;
-
- switch (trap) {
- case 0x980:
- case INTERRUPT_H_DATA_STORAGE:
- case 0xe20:
- case 0xe40:
- case INTERRUPT_HMI:
- case 0xe80:
- case 0xea0:
- case INTERRUPT_H_FAC_UNAVAIL:
- case 0x1200:
- case 0x1500:
- case 0x1600:
- case 0x1800:
- validp = &local_paca->hsrr_valid;
- if (!READ_ONCE(*validp))
- return;
-
- srr0 = mfspr(SPRN_HSRR0);
- srr1 = mfspr(SPRN_HSRR1);
- h = "H";
-
- break;
- default:
- validp = &local_paca->srr_valid;
- if (!READ_ONCE(*validp))
- return;
-
- srr0 = mfspr(SPRN_SRR0);
- srr1 = mfspr(SPRN_SRR1);
- h = "";
- break;
- }
-
- if (srr0 == regs->nip && srr1 == regs->msr)
- return;
-
- /*
- * A NMI / soft-NMI interrupt may have come in after we found
- * srr_valid and before the SRRs are loaded. The interrupt then
- * comes in and clobbers SRRs and clears srr_valid. Then we load
- * the SRRs here and test them above and find they don't match.
- *
- * Test validity again after that, to catch such false positives.
- *
- * This test in general will have some window for false negatives
- * and may not catch and fix all such cases if an NMI comes in
- * later and clobbers SRRs without clearing srr_valid, but hopefully
- * such things will get caught most of the time, statistically
- * enough to be able to get a warning out.
- */
- if (!READ_ONCE(*validp))
- return;
-
- if (!data_race(warned)) {
- data_race(warned = true);
- printk("%sSRR0 was: %lx should be: %lx\n", h, srr0, regs->nip);
- printk("%sSRR1 was: %lx should be: %lx\n", h, srr1, regs->msr);
- show_regs(regs);
- }
-
- WRITE_ONCE(*validp, 0); /* fixup */
-#endif
-}
-
-static notrace unsigned long
-interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
-{
- unsigned long ti_flags;
-
-again:
- ti_flags = read_thread_flags();
- while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
- local_irq_enable();
- if (ti_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
- schedule();
- } else {
- /*
- * SIGPENDING must restore signal handler function
- * argument GPRs, and some non-volatiles (e.g., r1).
- * Restore all for now. This could be made lighter.
- */
- if (ti_flags & _TIF_SIGPENDING)
- ret |= _TIF_RESTOREALL;
- do_notify_resume(regs, ti_flags);
- }
- local_irq_disable();
- ti_flags = read_thread_flags();
- }
-
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && IS_ENABLED(CONFIG_PPC_FPU)) {
- if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
- unlikely((ti_flags & _TIF_RESTORE_TM))) {
- restore_tm_state(regs);
- } else {
- unsigned long mathflags = MSR_FP;
-
- if (cpu_has_feature(CPU_FTR_VSX))
- mathflags |= MSR_VEC | MSR_VSX;
- else if (cpu_has_feature(CPU_FTR_ALTIVEC))
- mathflags |= MSR_VEC;
-
- /*
- * If userspace MSR has all available FP bits set,
- * then they are live and no need to restore. If not,
- * it means the regs were given up and restore_math
- * may decide to restore them (to avoid taking an FP
- * fault).
- */
- if ((regs->msr & mathflags) != mathflags)
- restore_math(regs);
- }
- }
-
- check_return_regs_valid(regs);
-
- user_enter_irqoff();
- if (!prep_irq_for_enabled_exit(true)) {
- user_exit_irqoff();
- local_irq_enable();
- local_irq_disable();
- goto again;
- }
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
- local_paca->tm_scratch = regs->msr;
-#endif
-
- booke_load_dbcr0();
-
- account_cpu_user_exit();
-
- /* Restore user access locks last */
- kuap_user_restore(regs);
-
- return ret;
-}
-
/*
* This should be called after a syscall returns, with r3 the return value
* from the syscall. If this function returns non-zero, the system call
@@ -267,17 +89,12 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
long scv)
{
unsigned long ti_flags;
- unsigned long ret = 0;
bool is_not_scv = !IS_ENABLED(CONFIG_PPC_BOOK3S_64) || !scv;

- CT_WARN_ON(ct_state() == CT_STATE_USER);
-
kuap_assert_locked();

regs->result = r3;
-
- /* Check whether the syscall is issued inside a restartable sequence */
- rseq_syscall(regs);
+ regs->exit_flags = 0;

ti_flags = read_thread_flags();

@@ -290,7 +107,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,

if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
if (ti_flags & _TIF_RESTOREALL)
- ret = _TIF_RESTOREALL;
+ regs->exit_flags = _TIF_RESTOREALL;
else
regs->gpr[3] = r3;
clear_bits(_TIF_PERSYSCALL_MASK, &current_thread_info()->flags);
@@ -299,18 +116,28 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
}

if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
- do_syscall_trace_leave(regs);
- ret |= _TIF_RESTOREALL;
+ regs->exit_flags |= _TIF_RESTOREALL;
}

- local_irq_disable();
- ret = interrupt_exit_user_prepare_main(ret, regs);
+ syscall_exit_to_user_mode(regs);
+
+again:
+ user_enter_irqoff();
+ if (!prep_irq_for_enabled_exit(true)) {
+ user_exit_irqoff();
+ local_irq_enable();
+ local_irq_disable();
+ goto again;
+ }
+
+ /* Restore user access locks last */
+ kuap_user_restore(regs);

#ifdef CONFIG_PPC64
- regs->exit_result = ret;
+ regs->exit_result = regs->exit_flags;
#endif

- return ret;
+ return regs->exit_flags;
}

#ifdef CONFIG_PPC64
@@ -330,13 +157,16 @@ notrace unsigned long syscall_exit_restart(unsigned long r3, struct pt_regs *reg
set_kuap(AMR_KUAP_BLOCKED);
#endif

- trace_hardirqs_off();
- user_exit_irqoff();
- account_cpu_user_entry();
-
- BUG_ON(!user_mode(regs));
+again:
+ user_enter_irqoff();
+ if (!prep_irq_for_enabled_exit(true)) {
+ user_exit_irqoff();
+ local_irq_enable();
+ local_irq_disable();
+ goto again;
+ }

- regs->exit_result = interrupt_exit_user_prepare_main(regs->exit_result, regs);
+ regs->exit_result |= regs->exit_flags;

return regs->exit_result;
}
@@ -348,7 +178,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)

BUG_ON(regs_is_unrecoverable(regs));
BUG_ON(regs_irqs_disabled(regs));
- CT_WARN_ON(ct_state() == CT_STATE_USER);

/*
* We don't need to restore AMR on the way back to userspace for KUAP.
@@ -357,8 +186,21 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)
kuap_assert_locked();

local_irq_disable();
+ regs->exit_flags = 0;
+again:
+ check_return_regs_valid(regs);
+ user_enter_irqoff();
+ if (!prep_irq_for_enabled_exit(true)) {
+ user_exit_irqoff();
+ local_irq_enable();
+ local_irq_disable();
+ goto again;
+ }
+
+ /* Restore user access locks last */
+ kuap_user_restore(regs);

- ret = interrupt_exit_user_prepare_main(0, regs);
+ ret = regs->exit_flags;

#ifdef CONFIG_PPC64
regs->exit_result = ret;
@@ -400,13 +242,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
/* Returning to a kernel context with local irqs enabled. */
WARN_ON_ONCE(!(regs->msr & MSR_EE));
again:
- if (need_irq_preemption()) {
- /* Return to preemptible kernel context */
- if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
- if (preempt_count() == 0)
- preempt_schedule_irq();
- }
- }

check_return_regs_valid(regs);

@@ -479,7 +314,6 @@ notrace unsigned long interrupt_exit_user_restart(struct pt_regs *regs)
#endif

trace_hardirqs_off();
- user_exit_irqoff();
account_cpu_user_entry();

BUG_ON(!user_mode(regs));
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
index 2134b6d155ff..f006a03a0211 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -21,9 +21,6 @@
#include <asm/switch_to.h>
#include <asm/debug.h>

-#define CREATE_TRACE_POINTS
-#include <trace/events/syscalls.h>
-
#include "ptrace-decl.h"

/*
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index aa17e62f3754..9f1847b4742e 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -6,6 +6,7 @@
* Extracted from signal_32.c and signal_64.c
*/

+#include <linux/entry-common.h>
#include <linux/resume_user_mode.h>
#include <linux/signal.h>
#include <linux/uprobes.h>
@@ -368,3 +369,10 @@ void signal_fault(struct task_struct *tsk, struct pt_regs *regs,
printk_ratelimited(regs->msr & MSR_64BIT ? fm64 : fm32, tsk->comm,
task_pid_nr(tsk), where, ptr, regs->nip, regs->link);
}
+
+void arch_do_signal_or_restart(struct pt_regs *regs)
+{
+ BUG_ON(regs != current->thread.regs);
+ regs->exit_flags |= _TIF_RESTOREALL;
+ do_signal(current);
+}
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 9f03a6263fb4..df1c9a8d62bc 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -3,6 +3,7 @@
#include <linux/compat.h>
#include <linux/context_tracking.h>
#include <linux/randomize_kstack.h>
+#include <linux/entry-common.h>

#include <asm/interrupt.h>
#include <asm/kup.h>
@@ -18,124 +19,10 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
long ret;
syscall_fn f;

- kuap_lock();
-
add_random_kstack_offset();
+ r0 = syscall_enter_from_user_mode(regs, r0);

- if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
- BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
-
- trace_hardirqs_off(); /* finish reconciling */
-
- CT_WARN_ON(ct_state() == CT_STATE_KERNEL);
- user_exit_irqoff();
-
- BUG_ON(regs_is_unrecoverable(regs));
- BUG_ON(!user_mode(regs));
- BUG_ON(regs_irqs_disabled(regs));
-
-#ifdef CONFIG_PPC_PKEY
- if (mmu_has_feature(MMU_FTR_PKEY)) {
- unsigned long amr, iamr;
- bool flush_needed = false;
- /*
- * When entering from userspace we mostly have the AMR/IAMR
- * different from kernel default values. Hence don't compare.
- */
- amr = mfspr(SPRN_AMR);
- iamr = mfspr(SPRN_IAMR);
- regs->amr = amr;
- regs->iamr = iamr;
- if (mmu_has_feature(MMU_FTR_KUAP)) {
- mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
- flush_needed = true;
- }
- if (mmu_has_feature(MMU_FTR_BOOK3S_KUEP)) {
- mtspr(SPRN_IAMR, AMR_KUEP_BLOCKED);
- flush_needed = true;
- }
- if (flush_needed)
- isync();
- } else
-#endif
- kuap_assert_locked();
-
- booke_restore_dbcr0();
-
- account_cpu_user_entry();
-
- account_stolen_time();
-
- /*
- * This is not required for the syscall exit path, but makes the
- * stack frame look nicer. If this was initialised in the first stack
- * frame, or if the unwinder was taught the first stack frame always
- * returns to user with IRQS_ENABLED, this store could be avoided!
- */
- irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
-
- /*
- * If system call is called with TM active, set _TIF_RESTOREALL to
- * prevent RFSCV being used to return to userspace, because POWER9
- * TM implementation has problems with this instruction returning to
- * transactional state. Final register values are not relevant because
- * the transaction will be aborted upon return anyway. Or in the case
- * of unsupported_scv SIGILL fault, the return state does not much
- * matter because it's an edge case.
- */
- if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
- unlikely(MSR_TM_TRANSACTIONAL(regs->msr)))
- set_bits(_TIF_RESTOREALL, &current_thread_info()->flags);
-
- /*
- * If the system call was made with a transaction active, doom it and
- * return without performing the system call. Unless it was an
- * unsupported scv vector, in which case it's treated like an illegal
- * instruction.
- */
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
- if (unlikely(MSR_TM_TRANSACTIONAL(regs->msr)) &&
- !trap_is_unsupported_scv(regs)) {
- /* Enable TM in the kernel, and disable EE (for scv) */
- hard_irq_disable();
- mtmsr(mfmsr() | MSR_TM);
-
- /* tabort, this dooms the transaction, nothing else */
- asm volatile(".long 0x7c00071d | ((%0) << 16)"
- :: "r"(TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT));
-
- /*
- * Userspace will never see the return value. Execution will
- * resume after the tbegin. of the aborted transaction with the
- * checkpointed register state. A context switch could occur
- * or signal delivered to the process before resuming the
- * doomed transaction context, but that should all be handled
- * as expected.
- */
- return -ENOSYS;
- }
-#endif // CONFIG_PPC_TRANSACTIONAL_MEM
-
- local_irq_enable();
-
- if (unlikely(read_thread_flags() & _TIF_SYSCALL_DOTRACE)) {
- if (unlikely(trap_is_unsupported_scv(regs))) {
- /* Unsupported scv vector */
- _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
- return regs->gpr[3];
- }
- /*
- * We use the return value of do_syscall_trace_enter() as the
- * syscall number. If the syscall was rejected for any reason
- * do_syscall_trace_enter() returns an invalid syscall number
- * and the test against NR_syscalls will fail and the return
- * value to be used is in regs->gpr[3].
- */
- r0 = do_syscall_trace_enter(regs);
- if (unlikely(r0 >= NR_syscalls))
- return regs->gpr[3];
-
- } else if (unlikely(r0 >= NR_syscalls)) {
+ if (unlikely(r0 >= NR_syscalls)) {
if (unlikely(trap_is_unsupported_scv(regs))) {
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
--
2.52.0

Mukesh Kumar Chaurasiya

Jan 23, 2026, 2:40:52 AM (8 days ago)
After enabling GENERIC_ENTRY, some functions are left unused.
Clean up those functions, which include:
- do_syscall_trace_enter
- do_syscall_trace_leave
- do_notify_resume
- do_seccomp

Signed-off-by: Mukesh Kumar Chaurasiya <mkch...@linux.ibm.com>
---
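
A minimal sketch, for context only and not part of this patch, of where the
generic entry code is assumed to pick up the work these helpers used to do.
The <linux/entry-common.h> calls are the generic API; the handler name and
the dispatch stub are made-up placeholders:

/*
 * Illustrative only: a rough syscall path using the generic entry helpers,
 * assuming the common code is used as on other architectures.
 */
#include <linux/entry-common.h>
#include <linux/errno.h>
#include <asm/unistd.h>

static long example_syscall_path(struct pt_regs *regs, long nr)
{
	long ret = -ENOSYS;

	/*
	 * Covers what do_syscall_trace_enter()/do_seccomp() used to do:
	 * seccomp, ptrace syscall-entry reporting, syscall tracepoints and
	 * audit run in the generic code, which returns the (possibly
	 * rewritten) syscall number, or -1 if the syscall was rejected.
	 */
	nr = syscall_enter_from_user_mode(regs, nr);

	if (nr >= 0 && nr < NR_syscalls)
		ret = 0;	/* dispatch via the syscall table here */

	/*
	 * Covers what do_syscall_trace_leave()/do_notify_resume() used to
	 * do: syscall-exit tracing, signal delivery and the rest of the
	 * exit-to-user work are driven from the generic exit path.
	 */
	syscall_exit_to_user_mode(regs);

	return ret;
}
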
arch/powerpc/include/asm/ptrace.h | 3 -
arch/powerpc/include/asm/signal.h | 1 -
arch/powerpc/kernel/ptrace/ptrace.c | 138 ----------------------------
arch/powerpc/kernel/signal.c | 17 ----
4 files changed, 159 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 2e741ea57b80..fdeb97421785 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -177,9 +177,6 @@ extern unsigned long profile_pc(struct pt_regs *regs);
#define profile_pc(regs) instruction_pointer(regs)
#endif

-long do_syscall_trace_enter(struct pt_regs *regs);
-void do_syscall_trace_leave(struct pt_regs *regs);
-
static inline void set_return_regs_changed(void)
{
#ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index 922d43700fb4..21af92cdb237 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -7,7 +7,6 @@
#include <uapi/asm/ptrace.h>

struct pt_regs;
-void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);

unsigned long get_min_sigframe_size_32(void);
unsigned long get_min_sigframe_size_64(void);
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
index f006a03a0211..316d4f5ead8e 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -192,144 +192,6 @@ long arch_ptrace(struct task_struct *child, long request,
return ret;
}

-#ifdef CONFIG_SECCOMP
-static int do_seccomp(struct pt_regs *regs)
-{
- if (!test_thread_flag(TIF_SECCOMP))
- return 0;
-
- /*
- * The ABI we present to seccomp tracers is that r3 contains
- * the syscall return value and orig_gpr3 contains the first
- * syscall parameter. This is different to the ptrace ABI where
- * both r3 and orig_gpr3 contain the first syscall parameter.
- */
- regs->gpr[3] = -ENOSYS;
-
- /*
- * We use the __ version here because we have already checked
- * TIF_SECCOMP. If this fails, there is nothing left to do, we
- * have already loaded -ENOSYS into r3, or seccomp has put
- * something else in r3 (via SECCOMP_RET_ERRNO/TRACE).
- */
- if (__secure_computing())
- return -1;
-
- /*
- * The syscall was allowed by seccomp, restore the register
- * state to what audit expects.
- * Note that we use orig_gpr3, which means a seccomp tracer can
- * modify the first syscall parameter (in orig_gpr3) and also
- * allow the syscall to proceed.
- */
- regs->gpr[3] = regs->orig_gpr3;
-
- return 0;
-}
-#else
-static inline int do_seccomp(struct pt_regs *regs) { return 0; }
-#endif /* CONFIG_SECCOMP */
-
-/**
- * do_syscall_trace_enter() - Do syscall tracing on kernel entry.
- * @regs: the pt_regs of the task to trace (current)
- *
- * Performs various types of tracing on syscall entry. This includes seccomp,
- * ptrace, syscall tracepoints and audit.
- *
- * The pt_regs are potentially visible to userspace via ptrace, so their
- * contents is ABI.
- *
- * One or more of the tracers may modify the contents of pt_regs, in particular
- * to modify arguments or even the syscall number itself.
- *
- * It's also possible that a tracer can choose to reject the system call. In
- * that case this function will return an illegal syscall number, and will put
- * an appropriate return value in regs->r3.
- *
- * Return: the (possibly changed) syscall number.
- */
-long do_syscall_trace_enter(struct pt_regs *regs)
-{
- u32 flags;
-
- flags = read_thread_flags() & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE);
-
- if (flags) {
- int rc = ptrace_report_syscall_entry(regs);
-
- if (unlikely(flags & _TIF_SYSCALL_EMU)) {
- /*
- * A nonzero return code from
- * ptrace_report_syscall_entry() tells us to prevent
- * the syscall execution, but we are not going to
- * execute it anyway.
- *
- * Returning -1 will skip the syscall execution. We want
- * to avoid clobbering any registers, so we don't goto
- * the skip label below.
- */
- return -1;
- }
-
- if (rc) {
- /*
- * The tracer decided to abort the syscall. Note that
- * the tracer may also just change regs->gpr[0] to an
- * invalid syscall number, that is handled below on the
- * exit path.
- */
- goto skip;
- }
- }
-
- /* Run seccomp after ptrace; allow it to set gpr[3]. */
- if (do_seccomp(regs))
- return -1;
-
- /* Avoid trace and audit when syscall is invalid. */
- if (regs->gpr[0] >= NR_syscalls)
- goto skip;
-
- if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
- trace_sys_enter(regs, regs->gpr[0]);
-
- if (!is_32bit_task())
- audit_syscall_entry(regs->gpr[0], regs->gpr[3], regs->gpr[4],
- regs->gpr[5], regs->gpr[6]);
- else
- audit_syscall_entry(regs->gpr[0],
- regs->gpr[3] & 0xffffffff,
- regs->gpr[4] & 0xffffffff,
- regs->gpr[5] & 0xffffffff,
- regs->gpr[6] & 0xffffffff);
-
- /* Return the possibly modified but valid syscall number */
- return regs->gpr[0];
-
-skip:
- /*
- * If we are aborting explicitly, or if the syscall number is
- * now invalid, set the return value to -ENOSYS.
- */
- regs->gpr[3] = -ENOSYS;
- return -1;
-}
-
-void do_syscall_trace_leave(struct pt_regs *regs)
-{
- int step;
-
- audit_syscall_exit(regs);
-
- if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
- trace_sys_exit(regs, regs->result);
-
- step = test_thread_flag(TIF_SINGLESTEP);
- if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- ptrace_report_syscall_exit(regs, step);
-}
-
void __init pt_regs_check(void);

/*
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 9f1847b4742e..bb42a8b6c642 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -293,23 +293,6 @@ static void do_signal(struct task_struct *tsk)
signal_setup_done(ret, &ksig, test_thread_flag(TIF_SINGLESTEP));
}

-void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
-{
- if (thread_info_flags & _TIF_UPROBE)
- uprobe_notify_resume(regs);
-
- if (thread_info_flags & _TIF_PATCH_PENDING)
- klp_update_patch_state(current);
-
- if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
- BUG_ON(regs != current->thread.regs);
- do_signal(current);
- }
-
- if (thread_info_flags & _TIF_NOTIFY_RESUME)
- resume_user_mode_work(regs);
-}
-
static unsigned long get_tm_stackpointer(struct task_struct *tsk)
{
/* When in an active transaction that takes a signal, we need to be
--
2.52.0

Shrikanth Hegde

Jan 23, 2026, 12:54:41 PM (7 days ago)
Hi Mukesh.
Ran it a bit on powernv (power9) too. No warnings, and similar
micro-benchmark numbers.

I think this is in better shape now. With that, for the series:

Reviewed-by: Shrikanth Hegde <ssh...@linux.ibm.com>

Venkat

2:25 AM (16 hours ago)
Tested this patch set and it builds successfully. Also ran LTP, ptrace, ftrace and perf related tests; no crashes or warnings were observed. Please add the tag below.

Tested-by: Venkat Rao Bagalkote <venk...@linux.ibm.com>

Regards,
Venkat.

David Gow

3:41 AM (15 hours ago)
Passes the irq_test_cases KUnit suite on (qemu) powerpc(64),
powerpcle, and powerpc32 targets.

./tools/testing/kunit/kunit.py run --arch powerpc irq_test_cases
./tools/testing/kunit/kunit.py run --arch powerpcle irq_test_cases
./tools/testing/kunit/kunit.py run --arch powerpc32 irq_test_cases

Tested-by: David Gow <davi...@google.com>

Cheers,
-- David

Samir M

1:14 PM (5 hours ago)
Hi Mukesh,

I verified this patch with the following configuration and test coverage.

Test configuration:

* Kernel version: 6.19.0-rc6
* Number of CPUs: 80

Tests performed:
1. Kernel selftests
2. LTP
3. will-it-scale
4. stress-ng (IRQ and syscall focused)
5. DLPAR with SMT stress testing
6. DLPAR with CPU folding scenarios
7. ptrace, ftrace and perf related tests.
8. Build and boot.

No functional issues were observed during testing.


Performance Tests:
perf bench syscall usec/op (+ve is regression)

| syscall | without_patch | with_patch | %change |
| ------- | ------------- | ---------- | ------- |
| getppid | 0.100         | 0.102      | +2.0%   |
| fork    | 363.281       | 369.995    | +1.85%  |
| execve  | 360.610       | 360.826    | +0.06%  |


perf bench syscall ops/sec (-ve is regression)

| syscall | without_patch | with_patch | %change |
| ------- | ------------- | ---------- | ------- |
| getppid | 10,048,674    | 9,851,574  | -1.96%  |
| fork    | 2,752         | 2,703      | -1.78%  |
| execve  | 2,772         | 2,771      | -0.04%  |


IPI latency benchmark (-ve is improvement)

| Metric         | without_patch (ns) | with_patch (ns) | % Change |
| -------------- | ------------------ | --------------- | -------- |
| Dry run        | 202259.20          | 201962.38       | -0.15%   |
| Self IPI       | 3565899.21         | 3271122.04      | -8.27%   |
| Normal IPI     | 47146345.28        | 42920014.89     | -8.97%   |
| Broadcast IPI  | 3920749623.87      | 3838799420.04   | -2.09%   |
| Broadcast lock | 3877260906.55      | 3803805814.03   | -1.89%   |

Please add the tag below:

Tested-by: Samir M <sa...@linux.ibm.com>


Regards,
Samir.