lost connection to test machine (4)

49 views
Skip to first unread message

syzbot

unread,
Feb 12, 2018, 11:00:05 AM2/12/18
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot hit the following crash on bpf-next commit
617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +0000)
Merge tag 'usercopy-v4.16-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

So far this crash happened 898 times on bpf-next, net-next, upstream.
C reproducer is attached.
syzkaller reproducer is attached.
Raw console output is attached.
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+adb03f...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.



---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
raw.log.txt
repro.syz.txt
repro.c.txt
config.txt

Dmitry Vyukov

unread,
Feb 12, 2018, 11:03:29 AM2/12/18
to syzbot, Alexei Starovoitov, Daniel Borkmann, netdev, LKML, syzkall...@googlegroups.com
On Mon, Feb 12, 2018 at 5:00 PM, syzbot
<syzbot+adb03f...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on bpf-next commit
> 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +0000)
> Merge tag 'usercopy-v4.16-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
>
> So far this crash happened 898 times on bpf-next, net-next, upstream.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.


The reproducer first causes several tasks spending minutes at this stack:

[ 110.762189] NMI backtrace for cpu 2
[ 110.762206] CPU: 2 PID: 3760 Comm: syz-executor Not tainted 4.15.0+ #96
[ 110.762210] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 110.762224] RIP: 0010:mutex_spin_on_owner+0x303/0x420
[ 110.762232] INFO: NMI handler (nmi_cpu_backtrace_handler) took too
long to run: 1.103 msecs
[ 110.762237] RSP: 0018:ffff88005be470e8 EFLAGS: 00000246
[ 110.762268] RAX: ffff88006ca00000 RBX: 0000000000000000 RCX: ffffffff81554165
[ 110.762275] RDX: 0000000000000001 RSI: 1ffffffff0d97884 RDI: 0000000000000000
[ 110.762281] RBP: ffff88005be47210 R08: dffffc0000000001 R09: fffffbfff0db2b75
[ 110.762286] R10: fffffbfff0db2b74 R11: ffffffff86d95ba7 R12: ffffffff86d95ba0
[ 110.762292] R13: ffffed000b7c8e25 R14: dffffc0000000000 R15: ffff880064691040
[ 110.762300] FS: 00007f84ed029700(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
[ 110.762305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 110.762311] CR2: 00007fd565f7b1b0 CR3: 000000005bddf002 CR4: 00000000001606e0
[ 110.762316] Call Trace:
[ 110.762383] __mutex_lock.isra.1+0x97d/0x1440
[ 110.762659] __mutex_lock_slowpath+0xe/0x10
[ 110.762668] mutex_lock+0x3e/0x50
[ 110.762677] pcpu_alloc+0x846/0xfe0
[ 110.762778] __alloc_percpu_gfp+0x27/0x30
[ 110.762801] array_map_alloc+0x484/0x690
[ 110.762832] SyS_bpf+0xa27/0x4770
[ 110.763190] do_syscall_64+0x297/0x760
[ 110.763260] entry_SYSCALL_64_after_hwframe+0x21/0x86

and later machine dies with:

[ 191.484308] Kernel panic - not syncing: Out of memory and no
killable processes...
[ 191.484308]
[ 191.485740] CPU: 3 PID: 746 Comm: kworker/3:1 Not tainted 4.15.0+ #96
[ 191.486761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 191.488071] Workqueue: events pcpu_balance_workfn
[ 191.488821] Call Trace:
[ 191.489299] dump_stack+0x175/0x225
[ 191.490590] panic+0x22a/0x4be
[ 191.493061] out_of_memory.cold.31+0x20/0x21
[ 191.496380] __alloc_pages_slowpath+0x1d98/0x28a0
[ 191.503616] __alloc_pages_nodemask+0x89c/0xc60
[ 191.507876] pcpu_populate_chunk+0x1fd/0x9b0
[ 191.510114] pcpu_balance_workfn+0x1019/0x1450
[ 191.517804] process_one_work+0x9d5/0x1460
[ 191.522714] worker_thread+0x1cc/0x1410
[ 191.529319] kthread+0x304/0x3c0


The original message with attachments is here:
https://groups.google.com/d/msg/syzkaller-bugs/Km3xEZu9zzU/rO-7XuwZAgAJ


> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+adb03f...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzk...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a113f8734783e94056505f8fd%40google.com.
> For more options, visit https://groups.google.com/d/optout.

Daniel Borkmann

unread,
Feb 12, 2018, 12:00:21 PM2/12/18
to denni...@gmail.com, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com, t...@kernel.org
[ +Dennis, +Tejun ]

Looks like we're stuck in percpu allocator with key/value size of 4 bytes
each and large number of entries (max_entries) in the reproducer in above
link.

Could we have some __GFP_NORETRY semantics and let allocations fail instead
of triggering OOM killer?

Tejun Heo

unread,
Feb 12, 2018, 12:03:28 PM2/12/18
to Daniel Borkmann, denni...@gmail.com, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com
Hello, Daniel.

On Mon, Feb 12, 2018 at 06:00:13PM +0100, Daniel Borkmann wrote:
> [ +Dennis, +Tejun ]
>
> Looks like we're stuck in percpu allocator with key/value size of 4 bytes
> each and large number of entries (max_entries) in the reproducer in above
> link.
>
> Could we have some __GFP_NORETRY semantics and let allocations fail instead
> of triggering OOM killer?

For some part, maybe, but not generally. The virt area allocation
goes down to page table allocation which is hard coded to use
GFP_KERNEL in arch mm code.

Thanks.

--
tejun

Tejun Heo

unread,
Feb 12, 2018, 3:05:52 PM2/12/18
to Daniel Borkmann, denni...@gmail.com, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com
So, the following should convert majority of allocations to use
__GFP_NORETRY. It doesn't catch everything but should significantly
lower the probability of hitting this and put this on the same footing
as vmalloc. Can you see whether this is enough?

Note that this patch isn't upstreamable. We definitely want to
restrict this to the rebalance path, but it should be good enough for
testing.

Thanks.

diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9158e5a..0b4739f 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -81,7 +81,7 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
struct page **pages, int page_start, int page_end)
{
- const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM;
+ const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_NORETRY;
unsigned int cpu, tcpu;
int i;

Eric Dumazet

unread,
Feb 13, 2018, 8:35:30 AM2/13/18
to Tejun Heo, Daniel Borkmann, denni...@gmail.com, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com
Also I would consider using this fix as I had warnings of cpus being
stuck there for more than 50 ms :


diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9158e5a81391ced4e268e3d5dd9879c2bc7280ce..6309b01ceb357be01e857e5f899429403836f41f 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -92,6 +92,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
*pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
if (!*pagep)
goto err;
+ cond_resched();
}
}
return 0;


Dennis Zhou

unread,
Feb 13, 2018, 12:34:55 PM2/13/18
to Eric Dumazet, Tejun Heo, Daniel Borkmann, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com
Hi Eric,

On Tue, Feb 13, 2018 at 05:35:26AM -0800, Eric Dumazet wrote:
>
> Also I would consider using this fix as I had warnings of cpus being
> stuck there for more than 50 ms :
>
>
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 9158e5a81391ced4e268e3d5dd9879c2bc7280ce..6309b01ceb357be01e857e5f899429403836f41f 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -92,6 +92,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
> *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
> if (!*pagep)
> goto err;
> + cond_resched();
> }
> }
> return 0;
>
>

This function gets called from pcpu_populate_chunk while holding the
pcpu_alloc_mutex and is called from two scenarios. First, when an
allocation occurs to a place without backing pages, and second when the
workqueue item is scheduled to replenish the number of empty pages. So,
I don't think this is a good idea.

My understanding is if we're seeing warnings here, that means we're
struggling to find backing pages. I believe adding __GFP_NORETRY on the
workqueue path as Tejun mentioned above would help with warnings as
well, but not if they are caused by the allocation path.

Thanks,
Dennis

Eric Dumazet

unread,
Feb 13, 2018, 12:49:30 PM2/13/18
to Dennis Zhou, Tejun Heo, Daniel Borkmann, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com
On Tue, 2018-02-13 at 11:34 -0600, Dennis Zhou wrote:
> Hi Eric,
>
> On Tue, Feb 13, 2018 at 05:35:26AM -0800, Eric Dumazet wrote:
> >
> > Also I would consider using this fix as I had warnings of cpus being
> > stuck there for more than 50 ms :
> >
> >
> > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> > index 9158e5a81391ced4e268e3d5dd9879c2bc7280ce..6309b01ceb357be01e857e5f899429403836f41f 100644
> > --- a/mm/percpu-vm.c
> > +++ b/mm/percpu-vm.c
> > @@ -92,6 +92,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
> > *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
> > if (!*pagep)
> > goto err;
> > + cond_resched();
> > }
> > }
> > return 0;
> >
> >
>
> This function gets called from pcpu_populate_chunk while holding the
> pcpu_alloc_mutex and is called from two scenarios. First, when an
> allocation occurs to a place without backing pages, and second when the
> workqueue item is scheduled to replenish the number of empty pages. So,
> I don't think this is a good idea.
>

That _is_ a good idea, we do this already in vmalloc(), and vmalloc()
can absolutely be called while some mutex(es) are held.


> My understanding is if we're seeing warnings here, that means we're
> struggling to find backing pages. I believe adding __GFP_NORETRY on the
> workqueue path as Tejun mentioned above would help with warnings as
> well, but not if they are caused by the allocation path.
>

That is a separate concern.

My patch simply avoids latency spikes when huge percpu allocations are
happening, on systems with say 1024 cpus.


Dennis Zhou

unread,
Feb 13, 2018, 1:13:57 PM2/13/18
to Eric Dumazet, Tejun Heo, Daniel Borkmann, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com
I see. I misunderstood thinking this was for the same concern.

Thanks,
Dennis

Dennis Zhou

unread,
Feb 14, 2018, 12:15:30 PM2/14/18
to Daniel Borkmann, Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkall...@googlegroups.com, t...@kernel.org
On Mon, Feb 12, 2018 at 06:00:13PM +0100, Daniel Borkmann wrote:
>
> [ +Dennis, +Tejun ]
>
> Looks like we're stuck in percpu allocator with key/value size of 4 bytes
> each and large number of entries (max_entries) in the reproducer in above
> link.
>
> Could we have some __GFP_NORETRY semantics and let allocations fail instead
> of triggering OOM killer?

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git master

As I don't have a great idea how to best test this, I'm going to just
run it against syzbot. Locally simple allocation tests seem fine. Though
it may require the second patch as well to enable pass through of the
following flags.

I will send a patchset if the syzbot results look good. This changes the
balance path to use __GFP_NORETRY and __GFP_NOWARN.

Thanks,
Dennis

---
mm/percpu-km.c | 8 ++++----
mm/percpu-vm.c | 18 +++++++++++-------
mm/percpu.c | 45 ++++++++++++++++++++++++++++-----------------
3 files changed, 43 insertions(+), 28 deletions(-)

diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index d2a7664..0d88d7b 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -34,7 +34,7 @@
#include <linux/log2.h>

static int pcpu_populate_chunk(struct pcpu_chunk *chunk,
- int page_start, int page_end)
+ int page_start, int page_end, gfp_t gfp)
{
return 0;
}
@@ -45,18 +45,18 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk,
/* nada */
}

-static struct pcpu_chunk *pcpu_create_chunk(void)
+static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
{
const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT;
struct pcpu_chunk *chunk;
struct page *pages;
int i;

- chunk = pcpu_alloc_chunk();
+ chunk = pcpu_alloc_chunk(gfp);
if (!chunk)
return NULL;

- pages = alloc_pages(GFP_KERNEL, order_base_2(nr_pages));
+ pages = alloc_pages(gfp | GFP_KERNEL, order_base_2(nr_pages));
if (!pages) {
pcpu_free_chunk(chunk);
return NULL;
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9158e5a..ea9906a 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -37,7 +37,7 @@ static struct page **pcpu_get_pages(void)
lockdep_assert_held(&pcpu_alloc_mutex);

if (!pages)
- pages = pcpu_mem_zalloc(pages_size);
+ pages = pcpu_mem_zalloc(pages_size, 0);
return pages;
}

@@ -73,18 +73,21 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
* @pages: array to put the allocated pages into, indexed by pcpu_page_idx()
* @page_start: page index of the first page to be allocated
* @page_end: page index of the last page to be allocated + 1
+ * @gfp: allocation flags passed to the underlying allocator
*
* Allocate pages [@page_start,@page_end) into @pages for all units.
* The allocation is for @chunk. Percpu core doesn't care about the
* content of @pages and will pass it verbatim to pcpu_map_pages().
*/
static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
- struct page **pages, int page_start, int page_end)
+ struct page **pages, int page_start, int page_end,
+ gfp_t gfp)
{
- const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM;
unsigned int cpu, tcpu;
int i;

+ gfp |= GFP_KERNEL | __GFP_HIGHMEM;
+
for_each_possible_cpu(cpu) {
for (i = page_start; i < page_end; i++) {
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
@@ -262,6 +265,7 @@ static void pcpu_post_map_flush(struct pcpu_chunk *chunk,
* @chunk: chunk of interest
* @page_start: the start page
* @page_end: the end page
+ * @gfp: allocation flags passed to the underlying memory allocator
*
* For each cpu, populate and map pages [@page_start,@page_end) into
* @chunk.
@@ -270,7 +274,7 @@ static void pcpu_post_map_flush(struct pcpu_chunk *chunk,
* pcpu_alloc_mutex, does GFP_KERNEL allocation.
*/
static int pcpu_populate_chunk(struct pcpu_chunk *chunk,
- int page_start, int page_end)
+ int page_start, int page_end, gfp_t gfp)
{
struct page **pages;

@@ -278,7 +282,7 @@ static int pcpu_populate_chunk(struct pcpu_chunk *chunk,
if (!pages)
return -ENOMEM;

- if (pcpu_alloc_pages(chunk, pages, page_start, page_end))
+ if (pcpu_alloc_pages(chunk, pages, page_start, page_end, gfp))
return -ENOMEM;

if (pcpu_map_pages(chunk, pages, page_start, page_end)) {
@@ -325,12 +329,12 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk,
pcpu_free_pages(chunk, pages, page_start, page_end);
}

-static struct pcpu_chunk *pcpu_create_chunk(void)
+static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
{
struct pcpu_chunk *chunk;
struct vm_struct **vms;

- chunk = pcpu_alloc_chunk();
+ chunk = pcpu_alloc_chunk(gfp);
if (!chunk)
return NULL;

diff --git a/mm/percpu.c b/mm/percpu.c
index 50e7fdf..ecb9193 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -447,10 +447,12 @@ static void pcpu_next_fit_region(struct pcpu_chunk *chunk, int alloc_bits,
/**
* pcpu_mem_zalloc - allocate memory
* @size: bytes to allocate
+ * @gfp: allocation flags
*
* Allocate @size bytes. If @size is smaller than PAGE_SIZE,
- * kzalloc() is used; otherwise, vzalloc() is used. The returned
- * memory is always zeroed.
+ * kzalloc() is used; otherwise, the equivalent of vzalloc() is used.
+ * This is to facilitate passing through flags such as __GFP_NOREPLY.
+ * The returned memory is always zeroed.
*
* CONTEXT:
* Does GFP_KERNEL allocation.
@@ -458,15 +460,16 @@ static void pcpu_next_fit_region(struct pcpu_chunk *chunk, int alloc_bits,
* RETURNS:
* Pointer to the allocated area on success, NULL on failure.
*/
-static void *pcpu_mem_zalloc(size_t size)
+static void *pcpu_mem_zalloc(size_t size, gfp_t gfp)
{
if (WARN_ON_ONCE(!slab_is_available()))
return NULL;

if (size <= PAGE_SIZE)
- return kzalloc(size, GFP_KERNEL);
+ return kzalloc(size, gfp | GFP_KERNEL);
else
- return vzalloc(size);
+ return __vmalloc(size, gfp | GFP_KERNEL | __GFP_ZERO,
+ PAGE_KERNEL);
}

/**
@@ -1154,12 +1157,12 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
return chunk;
}

-static struct pcpu_chunk *pcpu_alloc_chunk(void)
+static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
{
struct pcpu_chunk *chunk;
int region_bits;

- chunk = pcpu_mem_zalloc(pcpu_chunk_struct_size);
+ chunk = pcpu_mem_zalloc(pcpu_chunk_struct_size, gfp);
if (!chunk)
return NULL;

@@ -1168,17 +1171,17 @@ static struct pcpu_chunk *pcpu_alloc_chunk(void)
region_bits = pcpu_chunk_map_bits(chunk);

chunk->alloc_map = pcpu_mem_zalloc(BITS_TO_LONGS(region_bits) *
- sizeof(chunk->alloc_map[0]));
+ sizeof(chunk->alloc_map[0]), gfp);
if (!chunk->alloc_map)
goto alloc_map_fail;

chunk->bound_map = pcpu_mem_zalloc(BITS_TO_LONGS(region_bits + 1) *
- sizeof(chunk->bound_map[0]));
+ sizeof(chunk->bound_map[0]), gfp);
if (!chunk->bound_map)
goto bound_map_fail;

chunk->md_blocks = pcpu_mem_zalloc(pcpu_chunk_nr_blocks(chunk) *
- sizeof(chunk->md_blocks[0]));
+ sizeof(chunk->md_blocks[0]), gfp);
if (!chunk->md_blocks)
goto md_blocks_fail;

@@ -1277,9 +1280,10 @@ static void pcpu_chunk_depopulated(struct pcpu_chunk *chunk,
* pcpu_addr_to_page - translate address to physical address
* pcpu_verify_alloc_info - check alloc_info is acceptable during init
*/
-static int pcpu_populate_chunk(struct pcpu_chunk *chunk, int off, int size);
+static int pcpu_populate_chunk(struct pcpu_chunk *chunk, int off, int size,
+ gfp_t gfp);
static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, int off, int size);
-static struct pcpu_chunk *pcpu_create_chunk(void);
+static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp);
static void pcpu_destroy_chunk(struct pcpu_chunk *chunk);
static struct page *pcpu_addr_to_page(void *addr);
static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai);
@@ -1421,7 +1425,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
}

if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
- chunk = pcpu_create_chunk();
+ chunk = pcpu_create_chunk(0);
if (!chunk) {
err = "failed to allocate new chunk";
goto fail;
@@ -1450,7 +1454,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
page_start, page_end) {
WARN_ON(chunk->immutable);

- ret = pcpu_populate_chunk(chunk, rs, re);
+ ret = pcpu_populate_chunk(chunk, rs, re, 0);

spin_lock_irqsave(&pcpu_lock, flags);
if (ret) {
@@ -1561,10 +1565,17 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
* pcpu_balance_workfn - manage the amount of free chunks and populated pages
* @work: unused
*
- * Reclaim all fully free chunks except for the first one.
+ * Reclaim all fully free chunks except for the first one. This is also
+ * responsible for maintaining the pool of empty populated pages. However,
+ * it is possible that this is called when physical memory is scarce causing
+ * OOM killer to be triggered. We should avoid doing so until an actual
+ * allocation causes the failure as it is possible that requests can be
+ * serviced from already backed regions.
*/
static void pcpu_balance_workfn(struct work_struct *work)
{
+ /* gfp flags passed to underlying allocators */
+ gfp_t gfp = __GFP_NOWARN | __GFP_NORETRY;
LIST_HEAD(to_free);
struct list_head *free_head = &pcpu_slot[pcpu_nr_slots - 1];
struct pcpu_chunk *chunk, *next;
@@ -1645,7 +1656,7 @@ static void pcpu_balance_workfn(struct work_struct *work)
chunk->nr_pages) {
int nr = min(re - rs, nr_to_pop);

- ret = pcpu_populate_chunk(chunk, rs, rs + nr);
+ ret = pcpu_populate_chunk(chunk, rs, rs + nr, gfp);
if (!ret) {
nr_to_pop -= nr;
spin_lock_irq(&pcpu_lock);
@@ -1662,7 +1673,7 @@ static void pcpu_balance_workfn(struct work_struct *work)

if (nr_to_pop) {
/* ran out of chunks to populate, create a new one and retry */
- chunk = pcpu_create_chunk();
+ chunk = pcpu_create_chunk(gfp);
if (chunk) {
spin_lock_irq(&pcpu_lock);
pcpu_chunk_relocate(chunk, -1);
--
1.8.3.1

syzbot

unread,
Feb 14, 2018, 12:28:02 PM2/14/18
to a...@kernel.org, dan...@iogearbox.net, denni...@gmail.com, dvy...@google.com, eric.d...@gmail.com, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, t...@kernel.org
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git/master commit
7928b2cbe55b2a410a0f5c1f154610059c57b1b2 (Sun Feb 11 23:04:29 2018 +0000)
Linux 4.16-rc1

compiler: gcc (GCC) 7.1.1 20170620
Patch is attached.
Kernel config is attached.
patch.diff
raw.log.txt
config.txt
Reply all
Reply to author
Forward
0 new messages