[PATCH RESEND next] drmgr: Enhanced debug message add memory

7 views
Skip to first unread message

ryancwmd

<ryancwmd@linux.ibm.com>
unread,
Jul 21, 2025, 3:55:45 PMJul 21
to powerpc-utils-devel@googlegroups.com, tyreld@linux.ibm.com, ryancwmd, Mingming Cao, Haren Myneni
Debug info on LMB breakdown before add memory operation.

Previously, drmgr gave no report of LMB count for each node during an
add operation, only for remove. Sample below:

drmgr: -a -c mem -q 2 -d 5
Validating Memory DLPAR capability...yes.
Found 254 LMBs currently allocated
node 0 48 CPUs 245 LMBs
node 2 0 CPUs 0 LMBs
Initiating kernel DLPAR "memory add count 2"
Success
DR_TOTAL_RESOURCES=2

Patch additionally addresses the following:

Unifies processing logic between add & remove operations. Code now utilizes the
same numa tunnel for both add and remove operations given necessary conditions.

Ensures proper reset of node lmb count upon cleanup call.

Signed-off-by: Ryan Whittaker <ryan...@linux.ibm.com>
Reviewed-by: Mingming Cao <m...@linux.ibm.com>
Reviewed-by: Haren Myneni <ha...@linux.ibm.com>
---
src/drmgr/drslot_chrp_mem.c | 118 ++++++++++++++++++++++++------------
1 file changed, 80 insertions(+), 38 deletions(-)

diff --git a/src/drmgr/drslot_chrp_mem.c b/src/drmgr/drslot_chrp_mem.c
index d37ee80..6206492 100644
--- a/src/drmgr/drslot_chrp_mem.c
+++ b/src/drmgr/drslot_chrp_mem.c
@@ -903,7 +903,7 @@ add_device_tree_lmb(struct dr_node *lmb, struct lmb_list_head *lmb_list)
release_drc(lmb->drc_index, MEM_DEV);
return -1;
}
-
+
if (lmb_list->drconf_buf) {
errno = 0;
rc = update_drconf_node(lmb, lmb_list, ADD);
@@ -957,7 +957,7 @@ add_device_tree_lmb(struct dr_node *lmb, struct lmb_list_head *lmb_list)
}

rc = get_mem_scns(lmb);
- if (rc)
+ if (rc)
remove_device_tree_lmb(lmb, lmb_list);

return rc;
@@ -1698,38 +1698,31 @@ static void clear_numa_lmb_links(void)
int nid;
struct ppcnuma_node *node;

- ppcnuma_foreach_node(&numa, nid, node)
+ ppcnuma_foreach_node(&numa, nid, node) {
node->lmbs = NULL;
+ node->n_lmbs = 0;
+ }
}

-static int numa_based_remove(uint32_t count)
+/*
+* This function prints out numa node LMB and CPU info to debug.
+*/
+static int dump_lmb_info()
{
- struct lmb_list_head *lmb_list;
struct ppcnuma_node *node;
int nid;
- uint32_t done = 0;

- /*
- * Read the LMBs
- * Link the LMBs to their node
- * Update global counter
- */
- lmb_list = get_lmbs(LMB_NORMAL_SORT);
- if (lmb_list == NULL) {
- clear_numa_lmb_links();
- return -1;
+ ppcnuma_foreach_node(&numa, nid, node) {
+ say(DEBUG, "node %4d %4d CPUs %8d LMBs\n",
+ nid, node->n_cpus, node->n_lmbs);
}

- if (!numa.node_count) {
- clear_numa_lmb_links();
- free_lmbs(lmb_list);
- return -EINVAL;
- }
+ return 0;
+}

- ppcnuma_foreach_node(&numa, nid, node) {
- say(INFO, "node %4d %4d CPUs %8d LMBs\n",
- nid, node->n_cpus, node->n_lmbs);
- }
+static int numa_based_remove(uint32_t count)
+{
+ uint32_t done = 0;

done += remove_cpuless_lmbs(count);
count -= done;
@@ -1738,29 +1731,77 @@ static int numa_based_remove(uint32_t count)

report_resource_count(done);

- clear_numa_lmb_links();
- free_lmbs(lmb_list);
return 0;
}

+/*
+* This function wraps numa debug output and the numa based remove operation.
+* Refactors flow such that both add and remove operations call this function
+* given the right debug conditions.
+*/
+static int numa_mem_dlpar(uint32_t count)
+{
+ struct lmb_list_head *lmb_list;
+ int rc = 0;
+
+ build_numa_topology();
+ if (numa_enabled) {
+ /*
+ * Read the LMBs
+ * Link the LMBs to their node
+ * Update global counter
+ */
+ lmb_list = get_lmbs(LMB_NORMAL_SORT);
+ if (lmb_list == NULL) {
+ clear_numa_lmb_links();
+ return -1;
+ }
+
+ if (!numa.node_count) {
+ clear_numa_lmb_links();
+ free_lmbs(lmb_list);
+ return -EINVAL;
+ }
+
+ dump_lmb_info();
+
+ if (usr_action == REMOVE)
+ rc = numa_based_remove(count);
+
+ clear_numa_lmb_links();
+ free_lmbs(lmb_list);
+ numa_enabled = 0;
+ }
+ return rc;
+}
+
int do_mem_kernel_dlpar(void)
{
char cmdbuf[128];
int rc, offset;

-
- if (usr_action == REMOVE && usr_drc_count && !usr_drc_index) {
- build_numa_topology();
- if (numa_enabled) {
- if (!numa_based_remove(usr_drc_count))
- return 0;
-
+ /* If ADD, don't perform numa build unless debug statements required */
+ if ((usr_action == REMOVE && usr_drc_count && !usr_drc_index) ||
+ (usr_action == ADD && output_level >= DEBUG)) {
+ /*
+ * This function call builds numa topology and prints debug statements
+ * but also performs the remove operation. It iterates over DRC indices
+ * rather than letting the kernel decide which LMBs to remove. The add
+ * operation is not performed in this function call.
+ */
+
+ if (!numa_mem_dlpar(usr_drc_count) && usr_action == REMOVE)
/*
- * If the NUMA based removal failed, lets try the legacy
- * way.
- */
+ * If numa operation was a success and we are removing then
+ * processing is finished.
+ */
+ return 0;
+ else if (usr_action == REMOVE)
say(WARN, "Can't do NUMA based removal operation.\n");
- }
+ /*
+ * If the NUMA based removal failed, lets try the legacy
+ * way.
+ */
}

offset = sprintf(cmdbuf, "%s ", "memory");
@@ -1791,8 +1832,9 @@ int do_mem_kernel_dlpar(void)
*/
if (rc)
report_resource_count(0);
- else
+ else {
report_resource_count(usr_drc_count);
+ }

return rc;
}
--
2.47.1

Tyrel Datwyler

<tyreld@linux.ibm.com>
unread,
Aug 15, 2025, 4:49:37 PMAug 15
to ryancwmd, powerpc-utils-devel@googlegroups.com, Mingming Cao, Haren Myneni
On 7/21/25 12:55 PM, ryancwmd wrote:
> Debug info on LMB breakdown before add memory operation.
>
> Previously, drmgr gave no report of LMB count for each node during an
> add operation, only for remove. Sample below:

Generally, when providing example output you would include unpatched example,
and then the patched example to differentiate the change in behavior.
>
> drmgr: -a -c mem -q 2 -d 5
> Validating Memory DLPAR capability...yes.
> Found 254 LMBs currently allocated
> node 0 48 CPUs 245 LMBs
> node 2 0 CPUs 0 LMBs
> Initiating kernel DLPAR "memory add count 2"
> Success
> DR_TOTAL_RESOURCES=2
>
> Patch additionally addresses the following:

As I addressed in your other patch if it feels like there are multiple logical
changes its best to break those out into individual patches. This is true even
if each patch is dependent on the previous in building to the final result.
These patches as a group would then be submitted as a logical patchset.

For example you would probably start be creating a helper that is called just to
dump NODE LMB debug info, followed by the next that moves the LMB NODE list
gathering to its own function, etc.

>
> Unifies processing logic between add & remove operations. Code now utilizes the
> same numa tunnel for both add and remove operations given necessary conditions.
>
> Ensures proper reset of node lmb count upon cleanup call.
>
> Signed-off-by: Ryan Whittaker <ryan...@linux.ibm.com>
> Reviewed-by: Mingming Cao <m...@linux.ibm.com>
> Reviewed-by: Haren Myneni <ha...@linux.ibm.com>
> ---
> src/drmgr/drslot_chrp_mem.c | 118 ++++++++++++++++++++++++------------
> 1 file changed, 80 insertions(+), 38 deletions(-)
>
> diff --git a/src/drmgr/drslot_chrp_mem.c b/src/drmgr/drslot_chrp_mem.c
> index d37ee80..6206492 100644
> --- a/src/drmgr/drslot_chrp_mem.c
> +++ b/src/drmgr/drslot_chrp_mem.c
> @@ -903,7 +903,7 @@ add_device_tree_lmb(struct dr_node *lmb, struct lmb_list_head *lmb_list)
> release_drc(lmb->drc_index, MEM_DEV);
> return -1;
> }
> -
> +

Not sure if you were intending to include spurious white space cleanup or not,
but it is generally frowned upon unless its in the area of context that the
patch is already making changes.

1.) It's unnecessary churn
2.) It can create contextual merge conflicts when applying fixes to the same
file from multiple submitters.

This applies to coding style, formatting, and other trivial changes that are
outside of the direct functional change your patch is addressing.


> if (lmb_list->drconf_buf) {
> errno = 0;
> rc = update_drconf_node(lmb, lmb_list, ADD);
> @@ -957,7 +957,7 @@ add_device_tree_lmb(struct dr_node *lmb, struct lmb_list_head *lmb_list)
> }
>
> rc = get_mem_scns(lmb);
> - if (rc)
> + if (rc)

ditto
The more I look at this code as it stands today, and consider the change you are
proposing I have some thoughts and how I think this should evolve.

We currently aren't doing NUMA enabled add, but it seems like there was probably
a desire to eventually add that feature. So, I'm not really sure we need such
complicated gating logic. I think I would like to see the following order of
execution logic.

Try and build the topology regardless
Check for numa_enabled
Build NODE LMB lists
Check for DEBUG
Dump NODE LMB info
Check for usr_drc_count
Do NUMA dlpar (just remove for now)
Cleanup NODE LMB lists
Fall back to legacy
This is inline with the trivial change comment I made above. I'll even go
further that the change of coding style isn't even consistent here. If you were
going to bother to brace the else clause why not also the initial if? Either way
this is irrelevant to the fix at hand and should be dropped.

-Tyrel

>
> return rc;
> }
> --
> 2.47.1
>

Reply all
Reply to author
Forward
0 new messages