Previously, when a cpuless node's lmb ratio was too small compared to the other
cpuless nodes, drmgr would attempt to remove more lmbs than were available
in the node. This upset the ratio of lmbs removed and broke the counters
tracking how many lmbs to remove, causing more lmbs than requested to be
removed. Removing more lmbs than requested caused a decrement of an unsigned
int to wrap around below zero (integer overflow), eventually leading drmgr to
remove all available memory in the system.
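
The wraparound described above can be illustrated in isolation. This is a minimal sketch, not drmgr code: `unguarded_sub` and `guarded_sub` are hypothetical helper names, and only the guarded form mirrors the check this patch adds around `count -= this_loop`.

```c
#include <stdint.h>

/* Hypothetical helper: plain unsigned subtraction, as done before the
 * patch. If removed > total, the result wraps modulo 2^32. */
static uint32_t unguarded_sub(uint32_t total, uint32_t removed)
{
	return total - removed;
}

/* Hypothetical helper: the guarded form, which clamps to 0 instead of
 * wrapping when more was removed than was still requested. */
static uint32_t guarded_sub(uint32_t total, uint32_t removed)
{
	if (removed < total)
		return total - removed;
	return 0;
}
```

With 3 lmbs still requested and 5 actually removed, the unguarded form yields 4294967294, which the removal loop would treat as a request for nearly 2^32 more lmbs, while the guarded form yields 0.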
Existing testing exposed the out-of-memory issue triggered by the counter
overflow. The bug was recreated, and testing confirmed that this patch
resolves the issue.

This patch contains the following:

- Disallows requesting more lmbs than are available in a node.
- Adjusts the removal todo per node dynamically based on the count value (only
  if the amount intended to remove has changed in a particular iteration).
- Prevents decrementing the total counter by more than its current value, to
  prevent integer overflow.
- Changes which counter is decremented after unlinking nodes. The code
  previously decremented the NUMA count of cpuless nodes instead of the count
  of cpuless lmbs, causing integer overflow.
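
The per-node calculation from the list above can be sketched as a standalone function. This is an illustration under assumptions, not the drmgr implementation: `node_todo` is a hypothetical name, its parameters stand in for `count`, `this_loop`, `node->ratio` and `node->n_lmbs`, and the early return for `this_loop >= count` is an added safety assumption that is not part of the patch.

```c
#include <stdint.h>

/* Hypothetical standalone version of the per-node todo calculation. */
static uint32_t node_todo(uint32_t count, uint32_t this_loop,
			  uint32_t ratio, uint32_t n_lmbs)
{
	uint32_t remaining, todo;

	/* Assumption for this sketch: nothing is left to remove */
	if (this_loop >= count)
		return 0;

	remaining = count - this_loop;
	todo = (count * ratio) / 100;

	/* Share rounded down to 0, or share exceeds what is still owed */
	if ((!todo && n_lmbs) || remaining < todo)
		todo = remaining;

	/* Never request more than the node has available */
	if (todo > n_lmbs)
		todo = n_lmbs;

	return todo;
}
```

Clamping to `n_lmbs` last matters: the fallback to `remaining` can exceed what the node holds, which is the case the old ordering missed.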

Signed-off-by: Ryan Whittaker <ryan...@linux.ibm.com>
Reviewed-by: Mingming Cao <m...@linux.ibm.com>
---
src/drmgr/drslot_chrp_mem.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/src/drmgr/drslot_chrp_mem.c b/src/drmgr/drslot_chrp_mem.c
index 6206492..a84d91b 100644
--- a/src/drmgr/drslot_chrp_mem.c
+++ b/src/drmgr/drslot_chrp_mem.c
@@ -1502,7 +1502,7 @@ static int remove_lmb_from_node(struct ppcnuma_node *node, uint32_t count)
if (node->n_cpus)
numa.lmb_count -= unlinked;
else
- numa.cpuless_node_count -= unlinked;
+ numa.cpuless_lmb_count -= unlinked;
if (!node->n_lmbs) {
node->ratio = 0; /* for sanity only */
@@ -1565,11 +1565,13 @@ static int remove_cpuless_lmbs(uint32_t count)
continue;
todo = (count * node->ratio) / 100;
- todo = min(todo, node->n_lmbs);
- /* Fix rounded value to 0 */
- if (!todo && node->n_lmbs)
+ /* Fix rounded value to 0 and fix if a 0 ratio has been processed */
+ if ((!todo && node->n_lmbs) || count - this_loop < todo)
todo = (count - this_loop);
+ /* Never request more than available */
+ todo = min(todo, node->n_lmbs);
+
if (todo)
todo = remove_lmb_from_node(node, todo);
@@ -1583,7 +1585,10 @@ static int remove_cpuless_lmbs(uint32_t count)
if (!this_loop)
break;
- count -= this_loop;
+ if (this_loop < count)
+ count -= this_loop;
+ else
+ count = 0;
}
say(DEBUG, "%d / %d LMBs removed from the CPU less nodes\n",
@@ -1751,6 +1756,7 @@ static int numa_mem_dlpar(uint32_t count)
* Link the LMBs to their node
* Update global counter
*/
+
lmb_list = get_lmbs(LMB_NORMAL_SORT);
if (lmb_list == NULL) {
clear_numa_lmb_links();
--
2.47.1