On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
>
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
>
> In the middle of a page request, this arose because the order was too large
> to handle (mm/page_alloc.c:3119). It comes from the fact that the order is
> controllable by user input via raw_cmd_ioctl without any sanity check,
> thereby causing a memory problem.
> To stop it, we can check the order against MAX_ORDER before using it.
Yes, we do only check the max order in the slow path. We have already
discussed something similar with Konstantin [1][2]. Basically kvmalloc
for a large size might get to the page allocator with an out-of-bounds
order and warn during direct reclaim.

I am wondering whether we really want to check for the order in the fast
path instead. I have a hard time imagining this could cause a measurable
impact.
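To illustrate the scenario (a minimal userspace sketch, not kernel code;
this get_order() is only a simplified stand-in for the kernel helper, and
the 4K page size and MAX_ORDER of 11 are assumptions):

#include <stdio.h>

#define PAGE_SHIFT	12
#define MAX_ORDER	11

/* simplified stand-in for the kernel's get_order() */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	/* e.g. a ~64MB size coming straight from a user-controlled ioctl/setsockopt */
	unsigned long user_size = 64UL << 20;

	/* prints order 14, i.e. already past MAX_ORDER before any allocator check runs */
	printf("order %d, MAX_ORDER %d\n", get_order(user_size), MAX_ORDER);
	return 0;
}

The only question is whether such a request gets rejected before or after
the fast path has already started reclaim/compaction work.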
The full patch is below.

[1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
[2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz
From 7110220512be16054f2c8ee16bdd076c77c2456c Mon Sep 17 00:00:00 2001
From: Michal Hocko <mho...@suse.com>
Date: Fri, 9 Nov 2018 09:35:29 +0100
Subject: [PATCH] mm, page_alloc: check for max order in hot path
Konstantin has noticed that kvmalloc might trigger the following warning:
[Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
[...]
[Thu Nov 1 08:43:56 2018] Call Trace:
[Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90
[Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0
[Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310
[Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250
[Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0
[Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0
[Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90
[Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
[Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0
[Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90
[Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0
[Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70
[Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables]
[Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
[Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60
[Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0
[Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120
[Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The problem is that we only check for an out-of-bounds order in the slow
path and the node reclaim might happen from the fast path already. This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE, but this also shows that
the code is rather fragile. A recent UBSAN report just underlines that.

Note that the UBSAN report is not from a kvmalloc path. It is just that
the fast path really depends on having a sanitized order as well.
Therefore move the order check to the fast path.
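For reference (an illustrative aside, not part of the patch): with 4K pages,
MAX_ORDER of 11 and the usual SLUB limit, KMALLOC_MAX_SIZE sits exactly at the
largest buddy allocation, so capping kvmalloc's kmalloc attempt at it would
keep that particular caller in bounds:

#include <stdio.h>

#define PAGE_SHIFT		12
#define MAX_ORDER		11
/* assumed to mirror the usual SLUB definition: largest buddy allocation */
#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
#define KMALLOC_MAX_SIZE	(1UL << KMALLOC_SHIFT_MAX)

int main(void)
{
	/* KMALLOC_MAX_SIZE corresponds to order MAX_ORDER - 1 */
	printf("KMALLOC_MAX_SIZE = %lu bytes (order %d)\n",
	       KMALLOC_MAX_SIZE, KMALLOC_SHIFT_MAX - PAGE_SHIFT);
	return 0;
}

Every other path into the allocator would still need the order check though,
which is why it moves to the common entry point below.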
Reported-by: Konstantin Khlebnikov <khleb...@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0...@gmail.com>
Signed-off-by: Michal Hocko <mho...@suse.com>
---
mm/page_alloc.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..9fc10a1029cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4060,17 +4060,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	int reserve_flags;
 
-	/*
-	 * In the slowpath, we sanity check order to avoid ever trying to
-	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
-	 * be using allocators in order of preference for an area that is
-	 * too large.
-	 */
-	if (order >= MAX_ORDER) {
-		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
-		return NULL;
-	}
-
 	/*
 	 * We also sanity check to catch abuse of atomic reserves being used by
 	 * callers that are not in atomic context.
@@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
+	/*
+	 * We sanity check order up front to avoid ever trying to
+	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
+	 * be using allocators in order of preference for an area that is
+	 * too large.
+	 */
+	if (order >= MAX_ORDER) {
+		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
+		return NULL;
+	}
+
 	gfp_mask &= gfp_allowed_mask;
 	alloc_mask = gfp_mask;
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
--
2.19.1
--
Michal Hocko
SUSE Labs