[PATCH] drmgr: Use 30 secs timeout for each LMB removal kernel interface


Haren Myneni

<haren@linux.ibm.com>
May 29, 2026, 6:36:05 PM
to powerpc-utils-devel@googlegroups.com, tyreld@linux.ibm.com, mmc@linux.ibm.com, davemarq@linux.ibm.com, hbabu@us.ibm.com, haren@linux.ibm.com
drmgr selects the LMBs to remove based on the NUMA node ratio and
calls the kernel interface to remove each selected LMB. The kernel
interface removes an LMB only after all of its pages are isolated,
but page isolation can take a long time, which can stall the whole
memory removal process. The kernel interface returns to user space
if any signals are pending.

So do not allow the kernel interface to execute for more than 30 secs
for each LMB removal. Set up a 30 secs timer that generates SIGUSR1
for each kernel request.

Signed-off-by: Haren Myneni <ha...@linux.ibm.com>
---
src/drmgr/common.c | 1 +
src/drmgr/drslot_chrp_mem.c | 68 +++++++++++++++++++++++++++++++++----
2 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/src/drmgr/common.c b/src/drmgr/common.c
index 68041a9..70f5d3b 100644
--- a/src/drmgr/common.c
+++ b/src/drmgr/common.c
@@ -937,6 +937,7 @@ sig_setup(void)
sigdelset(&sigset, SIGALRM);
sigdelset(&sigset, SIGQUIT);
sigdelset(&sigset, SIGABRT);
+ sigdelset(&sigset, SIGUSR1);

/* Now block all remaining signals */
rc = sigprocmask(SIG_SETMASK, &sigset, NULL);
diff --git a/src/drmgr/drslot_chrp_mem.c b/src/drmgr/drslot_chrp_mem.c
index 4a3fe3a..ea8e8ce 100644
--- a/src/drmgr/drslot_chrp_mem.c
+++ b/src/drmgr/drslot_chrp_mem.c
@@ -39,6 +39,10 @@
uint64_t block_sz_bytes = 0;
static char *state_strs[] = {"offline", "online"};
sig_atomic_t numa_mem_timeout = 0;
+sig_atomic_t lmb_rm_timeout = 0;
+struct itimerspec lmb_tval;
+struct sigevent lmb_sevent;
+timer_t lmb_timer;

static char *usagestr = "-c mem {-a | -r} {-q <quantity> -p {variable_weight | ent_capacity} | {-q <quantity> | -s [<drc_name> | <drc_index>]}}";

@@ -62,6 +66,15 @@ void mem_timeout_handler(int sig)
numa_mem_timeout = 1;
}

+/*
+ * SIGUSR1 handler for LMB removal timeout and used
+ * only for NUMA based memory removal
+ */
+void lmb_rm_timeout_handler(int sig)
+{
+ if (sig == SIGUSR1) lmb_rm_timeout = 1;
+}
+
/**
* report_resource_count
* @brief Report the number of LMBs that were added or removed.
@@ -1461,12 +1474,36 @@ int valid_mem_options(void)
static int remove_lmb_by_index(uint32_t drc_index)
{
char cmdbuf[128];
- int offset;
+ int offset, rc;

offset = sprintf(cmdbuf, "memory remove index 0x%x", drc_index);

- return do_kernel_dlpar_common(cmdbuf, offset,
- 1 /* Don't report error */);
+ /*
+ * The kernel interface removes LMB only after all pages are
+ * isolated. So sometimes the kernel waits forever to isolate
+ * pages and the drmgr can not make any progress. The kernel
+ * returns to the user space for any pending signals.
+ *
+ * Setup 30 secs timer and generate SIGUSR1 signal in case
+ * the kernel request takes longer than 30 secs.
+ */
+ lmb_rm_timeout = 0;
+ lmb_tval.it_value.tv_sec = 30;
+ timer_settime(lmb_timer, 0, &lmb_tval, NULL);
+
+ rc = do_kernel_dlpar_common(cmdbuf, offset,
+ 1 /* Don't report error */);
+
+ if (!lmb_rm_timeout) {
+ /*
+ * Disable the timer if the kernel request returned before
+ * 30 secs interval.
+ */
+ lmb_tval.it_value.tv_sec = 0;
+ timer_settime(lmb_timer, 0, &lmb_tval, NULL);
+ }
+
+ return rc;
}

static int remove_lmb_from_node(struct ppcnuma_node *node, uint32_t count)
@@ -1707,12 +1744,30 @@ static void clear_numa_lmb_links(void)
* (with -w option). In the case of LMB removal, the kernel
* interface can run longer until all pages in LMB are isolated
* and can return to the user space if any pending signals.
- * This SIGALRM signal can exit the kernel in case LMB removal
- * is taking longer than timeout.
+ * It may cause drmgr waiting forever on 1 LMB removal and can not
+ * make progress further. So setup SIGUSR1 30 secs timer for each
+ * LMB kernel removal request.
+ *
+ * This SIGALRM signal is used to exit drmgr in case if the complete
+ * memory removal process takes longer than the timeout value.
*/
static int drmem_timer_setup(void)
{
- struct sigaction sigact;
+ struct sigaction sigact, lmb_sigact;
+
+ lmb_sigact.sa_handler = lmb_rm_timeout_handler;
+ sigemptyset(&lmb_sigact.sa_mask);
+ lmb_sigact.sa_flags = 0;
+ sigaction(SIGUSR1, &lmb_sigact, NULL);
+
+ lmb_sevent.sigev_notify = SIGEV_SIGNAL;
+ lmb_sevent.sigev_signo = SIGUSR1;
+ lmb_sevent.sigev_value.sival_ptr = &lmb_timer;
+ timer_create(CLOCK_MONOTONIC, &lmb_sevent, &lmb_timer);
+ lmb_tval.it_value.tv_sec = 30;
+ lmb_tval.it_value.tv_nsec = 0;
+ lmb_tval.it_interval.tv_sec = 0;
+ lmb_tval.it_interval.tv_nsec = 0;

if (!usr_timeout)
return 0;
@@ -1770,6 +1825,7 @@ static int numa_based_remove(uint32_t count)
out_free:
free_lmbs(lmb_list);
out_clear:
+ timer_delete(lmb_timer);
clear_numa_lmb_links();
report_resource_count(done);
return rc;
--
2.54.0
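The mechanism the patch relies on can be shown in isolation: a one-shot POSIX timer raises SIGUSR1, and because the handler is installed without SA_RESTART, a blocked system call returns -1 with errno set to EINTR instead of waiting indefinitely. This is also why the patch removes SIGUSR1 from the set that sig_setup() blocks: a blocked signal stays pending and cannot interrupt the call. The sketch below is a minimal illustration, not drmgr code. It assumes a blocking read() on an empty pipe as a stand-in for the write to the kernel DLPAR interface, uses milliseconds so it can run quickly, and the names timed_blocking_read() and per_call_timeout are hypothetical. On glibc older than 2.34, timer_create() needs -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Set by the SIGUSR1 handler when the per-call timer expires. */
static volatile sig_atomic_t per_call_timeout = 0;

static void per_call_timeout_handler(int sig)
{
	if (sig == SIGUSR1)
		per_call_timeout = 1;
}

/*
 * Hypothetical stand-in for one kernel request: block in read(), but
 * let a one-shot SIGUSR1 timer interrupt it after timeout_ms.
 * Returns read()'s result; errno is EINTR if the timer fired first.
 */
ssize_t timed_blocking_read(int fd, void *buf, size_t len, long timeout_ms)
{
	struct sigaction sa;
	struct sigevent sev;
	struct itimerspec its;
	timer_t timer;
	ssize_t rc;
	int saved_errno;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = per_call_timeout_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;		/* no SA_RESTART: read() fails with EINTR */
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGUSR1;
	if (timer_create(CLOCK_MONOTONIC, &sev, &timer) == -1)
		return -1;

	memset(&its, 0, sizeof(its));	/* it_interval stays 0: one-shot */
	its.it_value.tv_sec = timeout_ms / 1000;
	its.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;

	per_call_timeout = 0;
	if (timer_settime(timer, 0, &its, NULL) == -1) {
		saved_errno = errno;
		timer_delete(timer);
		errno = saved_errno;
		return -1;
	}

	rc = read(fd, buf, len);
	saved_errno = errno;

	timer_delete(timer);		/* deleting also disarms an unexpired timer */
	errno = saved_errno;
	return rc;
}
```

Unlike the patch, the sketch creates and deletes the timer per call to stay self-contained, and it declares the flag volatile sig_atomic_t, which is the type the C standard guarantees for objects written from a signal handler.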

Dave Marquardt

<davemarq@linux.ibm.com>
Jun 1, 2026, 2:35:03 PM
to Haren Myneni, powerpc-utils-devel@googlegroups.com, tyreld@linux.ibm.com, mmc@linux.ibm.com, hbabu@us.ibm.com
Haren Myneni <ha...@linux.ibm.com> writes:

> drmgr selects the LMBs to remove based on the NUMA node ratio and
> calls the kernel interface to remove each selected LMB. The kernel
> interface removes an LMB only after all of its pages are isolated,
> but page isolation can take a long time, which can stall the whole
> memory removal process. The kernel interface returns to user space
> if any signals are pending.
>
> So do not allow the kernel interface to execute for more than 30 secs
> for each LMB removal. Set up a 30 secs timer that generates SIGUSR1
> for each kernel request.
>
> @@ -1707,12 +1744,30 @@ static void clear_numa_lmb_links(void)
> * (with -w option). In the case of LMB removal, the kernel
> * interface can run longer until all pages in LMB are isolated
> * and can return to the user space if any pending signals.
> - * This SIGALRM signal can exit the kernel in case LMB removal
> - * is taking longer than timeout.
> + * It may cause drmgr waiting forever on 1 LMB removal and can not
> + * make progress further. So setup SIGUSR1 30 secs timer for each
> + * LMB kernel removal request.
> + *
> + * This SIGALRM signal is used to exit drmgr in case if the complete
> + * memory removal process takes longer than the timeout value.

This should be changed to say "This SIGUSR1 signal...."

Haren Myneni

<haren@linux.ibm.com>
Jun 2, 2026, 10:31:18 AM
to Dave Marquardt, powerpc-utils-devel@googlegroups.com, tyreld@linux.ibm.com, mmc@linux.ibm.com
Two signals are used: SIGALRM for the overall drmgr timeout handling, and
SIGUSR1 to time out the kernel interface when it is stuck isolating pages.

I will update the comments above to make this clearer.

Thanks for the review
Haren
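The division of labour described above can be sketched with two independent timers: one that bounds a single request and only makes that request give up (SIGUSR1), and one that bounds the whole run and stops the loop (SIGALRM). This is an illustrative sketch only: read() on a pipe stands in for each LMB removal request, both deadlines are modelled with POSIX timers in milliseconds (drmgr's actual -w handling may use a different mechanism), and run_requests() and its helpers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t overall_timeout = 0;	/* SIGALRM: stop the whole run */
static volatile sig_atomic_t request_timeout = 0;	/* SIGUSR1: abandon one request */
static int requests_timed_out = 0;

static void on_timeout(int sig)
{
	if (sig == SIGALRM)
		overall_timeout = 1;
	else if (sig == SIGUSR1)
		request_timeout = 1;
}

/* Install on_timeout for sig (no SA_RESTART) and create a timer raising sig. */
static int make_timer(int sig, timer_t *timer)
{
	struct sigaction sa;
	struct sigevent sev;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_timeout;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = sig;
	return timer_create(CLOCK_MONOTONIC, &sev, timer);
}

static struct itimerspec one_shot_ms(long ms)
{
	struct itimerspec its;

	memset(&its, 0, sizeof(its));
	its.it_value.tv_sec = ms / 1000;
	its.it_value.tv_nsec = (ms % 1000) * 1000000L;
	return its;
}

/*
 * Issue up to max_reqs blocking "requests" (read() on fd), each limited
 * to req_ms by SIGUSR1, with the whole loop limited to total_ms by
 * SIGALRM. Returns the number of requests started, or -1 on setup failure.
 */
int run_requests(int fd, int max_reqs, long req_ms, long total_ms)
{
	struct itimerspec arm = one_shot_ms(req_ms);
	struct itimerspec total = one_shot_ms(total_ms);
	struct itimerspec off = one_shot_ms(0);	/* zero it_value disarms */
	timer_t req_timer, total_timer;
	int started = 0;
	char c;

	if (make_timer(SIGUSR1, &req_timer) == -1)
		return -1;
	if (make_timer(SIGALRM, &total_timer) == -1) {
		timer_delete(req_timer);
		return -1;
	}

	overall_timeout = 0;
	requests_timed_out = 0;
	if (timer_settime(total_timer, 0, &total, NULL) == -1) {
		started = -1;
		goto out;
	}

	while (started < max_reqs && !overall_timeout) {
		request_timeout = 0;
		if (timer_settime(req_timer, 0, &arm, NULL) == -1)
			break;
		started++;
		if (read(fd, &c, 1) == -1 && errno == EINTR && request_timeout)
			requests_timed_out++;	/* only this request gives up */
		timer_settime(req_timer, 0, &off, NULL);
	}
out:
	timer_delete(total_timer);	/* deleting a timer also disarms it */
	timer_delete(req_timer);
	return started;
}
```

With an empty pipe, a 50 ms per-request limit and a 130 ms overall limit, a few requests time out individually before SIGALRM ends the loop; with data already in the pipe, every request completes and neither flag is set.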

Tyrel Datwyler

<tyreld@linux.ibm.com>
Jun 3, 2026, 9:02:08 PM
to Haren Myneni, powerpc-utils-devel@googlegroups.com, mmc@linux.ibm.com, davemarq@linux.ibm.com, hbabu@us.ibm.com
I think we should define a constant for the timer so we can change it in one
place if we want to change the amount of time in the future.

Something like LMB_REMOVAL_TIMER_SECS.

> + timer_settime(lmb_timer, 0, &lmb_tval, NULL);

Probably should check if timer_settime fails.

> +
> + rc = do_kernel_dlpar_common(cmdbuf, offset,
> + 1 /* Don't report error */);
> +

There is a small window where the timer might fire after return from the kernel
and before the timeout check resulting in the timer not being disabled even
though the kernel command completed. Should disable the timer unconditionally.

Need to check for sigaction failure.
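One way to close the window described above is to disarm the timer unconditionally as soon as the kernel call returns and only then decide what happened, letting a successful return take priority over the flag. In this sketch, read() on a pipe stands in for do_kernel_dlpar_common(), the timeout is in milliseconds for testing, and lmb_timer_init() and remove_one_lmb() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t lmb_rm_timeout = 0;
static timer_t lmb_timer;

static void lmb_rm_timeout_handler(int sig)
{
	(void)sig;
	lmb_rm_timeout = 1;
}

/* One-time setup: SIGUSR1 handler (no SA_RESTART) and a one-shot timer. */
int lmb_timer_init(void)
{
	struct sigaction sa;
	struct sigevent sev;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = lmb_rm_timeout_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGUSR1;
	return timer_create(CLOCK_MONOTONIC, &sev, &lmb_timer);
}

/*
 * Returns 0 if the request completed, 1 if the per-LMB timer
 * interrupted it, -1 on any other failure.
 */
int remove_one_lmb(int fd, long timeout_ms)
{
	struct itimerspec tval;
	ssize_t n;
	int saved_errno;
	char c;

	memset(&tval, 0, sizeof(tval));
	tval.it_value.tv_sec = timeout_ms / 1000;
	tval.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;

	lmb_rm_timeout = 0;
	if (timer_settime(lmb_timer, 0, &tval, NULL) == -1)
		return -1;

	n = read(fd, &c, 1);		/* stand-in for the kernel request */
	saved_errno = errno;

	/* Always disarm first; the flag is only consulted afterwards. */
	memset(&tval, 0, sizeof(tval));
	if (timer_settime(lmb_timer, 0, &tval, NULL) == -1)
		return -1;

	if (n == 1)
		return 0;		/* success wins even if the timer fired late */
	return (n == -1 && saved_errno == EINTR && lmb_rm_timeout) ? 1 : -1;
}
```

Because the flag is reset before each request and the timer is always disarmed afterwards, a late expiry can at most mark the request it belongs to; it cannot leave an armed timer behind for the next LMB.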

> +
> + lmb_sevent.sigev_notify = SIGEV_SIGNAL;
> + lmb_sevent.sigev_signo = SIGUSR1;
> + lmb_sevent.sigev_value.sival_ptr = &lmb_timer;
> + timer_create(CLOCK_MONOTONIC, &lmb_sevent, &lmb_timer);

Need to check for timer_create failure.
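The setup-side checks can be sketched as a setup/teardown pair: both sigaction() and timer_create() are checked, a failed timer_create() undoes the handler installation, and the teardown (the counterpart of the timer_delete() the patch adds on the out_clear path) also restores the previous SIGUSR1 disposition. lmb_timer_setup() and lmb_timer_teardown() are hypothetical names, and restoring the old disposition is an extra assumption, not something requested in the review.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t lmb_rm_timeout = 0;
static struct sigaction old_usr1;	/* restored on teardown */
static timer_t lmb_timer;

static void lmb_rm_timeout_handler(int sig)
{
	(void)sig;
	lmb_rm_timeout = 1;
}

/* Returns 0 on success, -1 if either sigaction() or timer_create() fails. */
int lmb_timer_setup(void)
{
	struct sigaction sa;
	struct sigevent sev;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = lmb_rm_timeout_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, &old_usr1) == -1) {
		perror("sigaction(SIGUSR1)");
		return -1;
	}

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGUSR1;
	sev.sigev_value.sival_ptr = &lmb_timer;
	if (timer_create(CLOCK_MONOTONIC, &sev, &lmb_timer) == -1) {
		perror("timer_create");
		sigaction(SIGUSR1, &old_usr1, NULL);	/* undo the handler */
		return -1;
	}
	return 0;
}

/* Delete the timer and put back whatever SIGUSR1 disposition was there. */
int lmb_timer_teardown(void)
{
	int rc = 0;

	if (timer_delete(lmb_timer) == -1) {
		perror("timer_delete");
		rc = -1;
	}
	if (sigaction(SIGUSR1, &old_usr1, NULL) == -1) {
		perror("sigaction(SIGUSR1) restore");
		rc = -1;
	}
	return rc;
}
```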

-Tyrel

Haren Myneni

<hmyneni@gmail.com>
Jun 5, 2026, 8:38:52 AM
to Tyrel Datwyler, Haren Myneni, powerpc-utils-devel@googlegroups.com, mmc@linux.ibm.com, davemarq@linux.ibm.com, hbabu@us.ibm.com
The 30 secs value is used in only one place, during setup. Sure, I will add this macro.

> +     timer_settime(lmb_timer, 0, &lmb_tval, NULL);

> Probably should check if timer_settime fails.

We have to be careful about returning an error code here if it fails in the middle of removing LMBs, since partial memory removal is supported and drmgr reports the "Number of LMBs removed" to the HMC as a success. I noticed that some other places in the code do not check for some command failures either. But I will look into this and make the necessary changes.
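One possible policy for the partial-removal concern, sketched under assumptions: a per-LMB failure or timeout skips that LMB and moves on, a timer error stops the loop, and in both cases the caller still gets the number of LMBs actually removed so it can report partial progress. The enum, remove_lmbs() and the scripted() test callback are hypothetical; whether a timer error should end the operation or be reported differently is a design decision for the V2 patch, not something this sketch settles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one LMB removal attempt. */
enum lmb_result {
	LMB_REMOVED,		/* kernel request succeeded */
	LMB_FAILED,		/* kernel request failed or timed out */
	LMB_TIMER_ERROR		/* timer_settime() failed: stop trying */
};

typedef enum lmb_result (*remove_fn)(size_t idx, void *ctx);

/*
 * Try to remove up to `want` LMBs out of `nr_lmbs` candidates.
 * Returns the number actually removed; *hard_error is set to 1 if
 * the loop stopped early because of a timer error.
 */
size_t remove_lmbs(size_t nr_lmbs, size_t want, remove_fn fn, void *ctx,
		   int *hard_error)
{
	size_t removed = 0;

	*hard_error = 0;
	for (size_t i = 0; i < nr_lmbs && removed < want; i++) {
		switch (fn(i, ctx)) {
		case LMB_REMOVED:
			removed++;
			break;
		case LMB_FAILED:
			break;		/* skip this LMB, try the next one */
		case LMB_TIMER_ERROR:
			*hard_error = 1;
			return removed;	/* partial count is still reported */
		}
	}
	return removed;
}

/* Test helper: replay a scripted sequence of outcomes. */
enum lmb_result scripted(size_t idx, void *ctx)
{
	const enum lmb_result *script = ctx;

	return script[idx];
}
```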
 

> +
> +     rc = do_kernel_dlpar_common(cmdbuf, offset,
> +                                     1 /* Don't report error */);
> +

> There is a small window where the timer might fire after return from the kernel
> and before the timeout check resulting in the timer not being disabled even
> though the kernel command completed. Should disable the timer unconditionally.

I thought only one sigaction is active at a time for the same signal. I did not find any issue in my testing, but it may be possible in that corner case.
I think we should not need any handler: simply arm the timer before the kernel interface call and disable it immediately after the call returns. The signal alone makes the kernel interface return anyway.
It should not be a problem to check the error conditions in the setup code. I will make these changes and submit a V2 patch.
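One caveat when dropping the flag as proposed above: some handler must stay installed. SIGUSR1's default action terminates the process, and a signal set to SIG_IGN is discarded without interrupting the blocked call, so a do-nothing handler installed without SA_RESTART is the minimal option; the outcome can then be read from the return value and errno alone. As in the earlier sketches, read() on a pipe stands in for the kernel request, and req_timer_init() and timed_request() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Does nothing: its only job is to make the blocked call return EINTR. */
static void lmb_timer_noop(int sig)
{
	(void)sig;
}

static timer_t req_timer;

int req_timer_init(void)
{
	struct sigaction sa;
	struct sigevent sev;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = lmb_timer_noop;	/* neither SIG_IGN nor SIG_DFL */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGUSR1;
	return timer_create(CLOCK_MONOTONIC, &sev, &req_timer);
}

/* Returns read()'s result with errno preserved (EINTR on timeout). */
ssize_t timed_request(int fd, char *c, long timeout_ms)
{
	struct itimerspec on, off;
	ssize_t n;
	int saved_errno;

	memset(&on, 0, sizeof(on));
	memset(&off, 0, sizeof(off));	/* zero it_value disarms */
	on.it_value.tv_sec = timeout_ms / 1000;
	on.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;

	if (timer_settime(req_timer, 0, &on, NULL) == -1)
		return -1;
	n = read(fd, c, 1);
	saved_errno = errno;
	if (timer_settime(req_timer, 0, &off, NULL) == -1)
		return -1;		/* disarm right after the call */
	errno = saved_errno;
	return n;
}
```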

Thanks for the review
Haren

> -Tyrel

--
You received this message because you are subscribed to the Google Groups "Powerpc-utils development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to powerpc-utils-d...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/powerpc-utils-devel/019e4810-a8e5-46d4-92c1-f6b872232995%40linux.ibm.com.