GPF in rt6_uncached_list_flush_dev

Dmitry Vyukov

Oct 12, 2015, 5:34:43 AM

to David Miller, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, Patrick McHardy, net...@vger.kernel.org, LKML, syzk...@googlegroups.com, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Eric Dumazet, Maciej Żenczykowski

Hello,

The following program causes episodic crashes:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <sched.h>
#define CLONE_NEWNET 0x40000000
int main(void)
{
unshare(CLONE_NEWNET);
}

On commit dd36d7393d6310b0c1adefb22fba79c3cf8a577c
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)

general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 1058 Comm: kworker/u8:1 Not tainted 4.3.0-rc2+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: netns cleanup_net
task: ffff880051c71a00 ti: ffff8800514f8000 task.ti: ffff8800514f8000
RIP: 0010:[<ffffffff82a6dad1>] [<ffffffff82a6dad1>] rt6_ifdown+0x481/0x740
RSP: 0018:ffff8800514ffaa0 EFLAGS: 00010246
RAX: dffffc0000000059 RBX: ffff88005107c580 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 000000000000000f RDI: ffff880052a1f340
RBP: ffff8800514ffb78 R08: 0000000000000000 R09: ffff8800514ffb10
R10: ffff88002d5b7dc0 R11: ffff88002ec07600 R12: ffff880051c11140
R13: ffff88005144af40 R14: 0000000000000000 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff88002f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000648056 CR3: 0000000003610000 CR4: 00000000000006f0
Stack:
00000000000002c8 1ffff1000a29ff5e dffffc0000000059 000000022d5b61c0
ffff880052a1f340 ffff880051c11140 ffff880052a1f348 ffff88005107c6d8
ffff88005107c598 0000000000000000 0000000041b58ab3 ffffffff83471ca6
Call Trace:
[<ffffffff82a6f830>] fib6_net_exit+0x20/0x100 net/ipv6/ip6_fib.c:1847
[<ffffffff8271fd9e>] ops_exit_list.isra.6+0xae/0x150
net/core/net_namespace.c:134
[<ffffffff82722c5d>] cleanup_net+0x3cd/0x730
net/core/net_namespace.c:431 (discriminator 3)
[<ffffffff81142161>] process_one_work+0x6d1/0x1370 kernel/workqueue.c:2030
[<ffffffff81142ee3>] worker_thread+0xe3/0x1300 kernel/workqueue.c:2162
[<ffffffff811552e7>] kthread+0x1e7/0x260 kernel/kthread.c:209
[<ffffffff82e4283f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:475
Code: 89 95 50 ff ff ff e8 6f 41 9f fe 48 8b 95 50 ff ff ff 48 39 95
70 ff ff ff 0f 84 d5 fe ff ff e8 56 41 9f fe 48 8b 85 38 ff ff ff <80>
38 00 0f 85 9b 01 00 00 48 8b 85 70 ff ff ff 48 8b 90 c8 02
RIP [< inline >] __read_once_size include/linux/compiler.h:207
RIP [< inline >] in6_dev_get include/net/addrconf.h:281
RIP [< inline >] rt6_uncached_list_flush_dev net/ipv6/route.c:156
RIP [<ffffffff82a6dad1>] rt6_ifdown+0x481/0x740 net/ipv6/route.c:2621
RSP <ffff8800514ffaa0>
---[ end trace 113e678e9b762d96 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

The crash happens because loopback_dev is NULL in
rt6_uncached_list_flush_dev(). It happens only if there is an
uncached route when the interface is destroyed.

I've tried to run the program with the following patch applied:

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index dc7d970..fd7e88d 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -144,6 +144,8 @@ static int loopback_dev_init(struct net_device *dev)

static void loopback_dev_free(struct net_device *dev)
{
+ pr_err("loopback_dev_free %p = %p", &dev_net(dev)->loopback_dev, dev_net(dev)->loopback_dev);
+ WARN_ON(1);
dev_net(dev)->loopback_dev = NULL;
free_percpu(dev->lstats);
free_netdev(dev);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f204089..fd558a4 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -142,6 +142,8 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
struct net_device *loopback_dev = net->loopback_dev;
int cpu;

+ pr_err("rt6_uncached_list_flush_dev %p = %p", &net->loopback_dev, net->loopback_dev);
+ WARN_ON(1);
for_each_possible_cpu(cpu) {
struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
struct rt6_info *rt;

And it shows that the loopback device is destroyed before
rt6_uncached_list_flush_dev is executed, while
rt6_uncached_list_flush_dev seems to assume that loopback_dev is alive
when it is called:

[ 197.812174] loopback_dev_free ffff88003d288150 = ffff88003e1d67c0
[ 197.812890] ------------[ cut here ]------------
[ 197.813389] WARNING: CPU: 2 PID: 1044 at drivers/net/loopback.c:148
loopback_dev_free+0x3c/0x70()
[ 197.814186] Modules linked in:
[ 197.814478] CPU: 2 PID: 1044 Comm: kworker/u8:1 Tainted: G W
4.3.0-rc3+ #45
[ 197.815186] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 197.815886] Workqueue: netns cleanup_net
[ 197.816256] ffffffff81c27c67 ffff88003d923c50 ffffffff812fe8d6
0000000000000000
[ 197.816949] ffff88003d923c88 ffffffff81051ff1 ffff88003e1d67c0
ffff88003e1d6bd0
[ 197.817662] 00000000fffe70d4 00000000fffe70d4 00000000000003e8
ffff88003d923c98
[ 197.818367] Call Trace:
[ 197.818589] [<ffffffff812fe8d6>] dump_stack+0x44/0x5e
[ 197.819048] [<ffffffff81051ff1>] warn_slowpath_common+0x81/0xc0
[ 197.819573] [<ffffffff810520e5>] warn_slowpath_null+0x15/0x20
[ 197.820088] [<ffffffff8151e36c>] loopback_dev_free+0x3c/0x70
[ 197.820588] [<ffffffff81698c71>] netdev_run_todo+0x211/0x300
[ 197.821096] [<ffffffff816915b2>] ? rollback_registered_many+0x222/0x2b0
[ 197.823461] [<ffffffff816a2dc9>] rtnl_unlock+0x9/0x10
[ 197.824109] [<ffffffff81692683>] default_device_exit_batch+0x133/0x150
[ 197.824924] [<ffffffff81087f10>] ? __wake_up_sync+0x10/0x10
[ 197.825608] [<ffffffff8168b97d>] ops_exit_list.isra.6+0x4d/0x60
[ 197.826335] [<ffffffff8168c87c>] cleanup_net+0x17c/0x230
[ 197.826963] [<ffffffff81067c7e>] process_one_work+0x13e/0x3c0
[ 197.827645] [<ffffffff81068015>] worker_thread+0x115/0x450
[ 197.828305] [<ffffffff81856241>] ? __schedule+0x311/0x870
[ 197.828935] [<ffffffff81067f00>] ? process_one_work+0x3c0/0x3c0
[ 197.829642] [<ffffffff8106d044>] kthread+0xc4/0xe0
[ 197.830220] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
[ 197.830853] [<ffffffff81859e6f>] ret_from_fork+0x3f/0x70
[ 197.831486] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
[ 197.832129] ---[ end trace 54eee6f54dedacca ]---

[ 197.835015] IPv6: rt6_uncached_list_flush_dev ffff88003d288150 =
(null)
[ 197.835641] ------------[ cut here ]------------
[ 197.836083] WARNING: CPU: 2 PID: 1044 at net/ipv6/route.c:146
rt6_ifdown+0xc7/0x220()
[ 197.836738] Modules linked in:
[ 197.837022] CPU: 2 PID: 1044 Comm: kworker/u8:1 Tainted: G W
4.3.0-rc3+ #45
[ 197.837714] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 197.838395] Workqueue: netns cleanup_net
[ 197.838738] ffffffff81c3ac07 ffff88003d923cc8 ffffffff812fe8d6
0000000000000000
[ 197.839410] ffff88003d923d00 ffffffff81051ff1 0000000000000000
ffff88003d288000
[ 197.840079] ffffffff82119b98 0000000000000000 0000000000000000
ffff88003d923d10
[ 197.840740] Call Trace:
[ 197.840952] [<ffffffff812fe8d6>] dump_stack+0x44/0x5e
[ 197.841391] [<ffffffff81051ff1>] warn_slowpath_common+0x81/0xc0
[ 197.841848] [<ffffffff810520e5>] warn_slowpath_null+0x15/0x20
[ 197.842297] [<ffffffff81761407>] rt6_ifdown+0xc7/0x220
[ 197.842701] [<ffffffff8177d020>] ? xfrm6_net_exit+0x30/0x40
[ 197.843140] [<ffffffff81761c0f>] fib6_net_exit+0xf/0x60
[ 197.843545] [<ffffffff8168b963>] ops_exit_list.isra.6+0x33/0x60
[ 197.843999] [<ffffffff8168c87c>] cleanup_net+0x17c/0x230
[ 197.844420] [<ffffffff81067c7e>] process_one_work+0x13e/0x3c0
[ 197.844867] [<ffffffff81068015>] worker_thread+0x115/0x450
[ 197.845324] [<ffffffff81856241>] ? __schedule+0x311/0x870
[ 197.845761] [<ffffffff81067f00>] ? process_one_work+0x3c0/0x3c0
[ 197.846288] [<ffffffff8106d044>] kthread+0xc4/0xe0
[ 197.846698] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
[ 197.847161] [<ffffffff81859e6f>] ret_from_fork+0x3f/0x70
[ 197.847612] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
[ 197.848074] ---[ end trace 54eee6f54dedaccb ]---

I use plain defconfig/kvmconfig.

Found with syzkaller fuzzer.

Eric Dumazet

Oct 12, 2015, 8:24:15 AM

to Dmitry Vyukov, Eric W. Biederman, Martin KaFai Lau, David Miller, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, Patrick McHardy, net...@vger.kernel.org, LKML, syzk...@googlegroups.com, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Eric Dumazet, Maciej Żenczykowski

CC Eric W. Biederman <ebie...@xmission.com>, who is the expert in this
area.

Thanks.

Bug was added in 8d0b94afdca84
("ipv6: Keep track of DST_NOCACHE routes in case of iface
down/unregister")

CC Martin KaFai Lau <ka...@fb.com>

Eric W. Biederman

Oct 12, 2015, 12:03:20 PM

to Eric Dumazet, Dmitry Vyukov, Martin KaFai Lau, David Miller, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, Patrick McHardy, net...@vger.kernel.org, LKML, syzk...@googlegroups.com, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Eric Dumazet, Maciej Żenczykowski

So I don't quite know what rt6_uncached_list_flush_dev was intended
to be doing, but that code is not correct by a country mile.

What the code attempts to do is to flush every uncached entry when any
network namespace exits. Which makes no sense whatsoever.

Further we are past the point of network devices even existing in a
network namespace so it does not even make sense to attempt to do
anything with network devices.

So given that there is nothing for rt6_uncached_list_flush_dev to
do in this case, and there is no sensible thing for it to do when
dev == NULL, I recommend removing the dev == NULL support
and just not calling rt6_uncached_list_flush_dev when dev == NULL.

Eric

Eric W. Biederman

Oct 12, 2015, 12:10:30 PM

to David Miller, Dmitry Vyukov, Martin KaFai Lau, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, Patrick McHardy, net...@vger.kernel.org, LKML, syzk...@googlegroups.com, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Eric Dumazet, Maciej Żenczykowski, Eric Dumazet

As originally written rt6_uncached_list_flush_dev makes no sense when
called with dev == NULL as it attempts to flush all uncached routes
regardless of network namespace when dev == NULL. Which is simply
incorrect behavior.

Furthermore at the point rt6_ifdown is called with dev == NULL no more
network devices exist in the network namespace so even if the code in
rt6_uncached_list_flush_dev were to attempt something sensible it
would be meaningless.

Therefore remove support in rt6_uncached_list_flush_dev for handling
network devices where dev == NULL, and only call rt6_uncached_list_flush_dev
when rt6_ifdown is called with a network device.

Fixes: 8d0b94afdca8 ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
Signed-off-by: "Eric W. Biederman" <ebie...@xmission.com>
---
net/ipv6/route.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b8f85f143b69..1c45d7d90718 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -143,6 +143,9 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)

struct net_device *loopback_dev = net->loopback_dev;
int cpu;

+ if (dev == loopback_dev)
+ return;
+

for_each_possible_cpu(cpu) {
struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
struct rt6_info *rt;

@@ -152,14 +155,12 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
struct inet6_dev *rt_idev = rt->rt6i_idev;
struct net_device *rt_dev = rt->dst.dev;

- if (rt_idev && (rt_idev->dev == dev || !dev) &&
- rt_idev->dev != loopback_dev) {
+ if (rt_idev->dev == dev) {
rt->rt6i_idev = in6_dev_get(loopback_dev);
in6_dev_put(rt_idev);
}

- if (rt_dev && (rt_dev == dev || !dev) &&
- rt_dev != loopback_dev) {
+ if (rt_dev == dev) {
rt->dst.dev = loopback_dev;
dev_hold(rt->dst.dev);
dev_put(rt_dev);
@@ -2600,7 +2601,8 @@ void rt6_ifdown(struct net *net, struct net_device *dev)

fib6_clean_all(net, fib6_ifdown, &adn);
icmp6_clean_all(fib6_ifdown, &adn);
- rt6_uncached_list_flush_dev(net, dev);
+ if (dev)
+ rt6_uncached_list_flush_dev(net, dev);
}

struct rt6_mtu_change_arg {
--
2.2.1

Martin KaFai Lau

Oct 12, 2015, 8:02:29 PM

to Eric W. Biederman, David Miller, Dmitry Vyukov, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, Patrick McHardy, net...@vger.kernel.org, LKML, syzk...@googlegroups.com, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Eric Dumazet, Maciej Żenczykowski, Eric Dumazet

On Mon, Oct 12, 2015 at 11:02:08AM -0500, Eric W. Biederman wrote:
>
> As originally written rt6_uncached_list_flush_dev makes no sense when
> called with dev == NULL as it attempts to flush all uncached routes
> regardless of network namespace when dev == NULL. Which is simply
> incorrect behavior.

Thanks for fixing it.

Reviewed-by: Martin KaFai Lau <ka...@fb.com>

I also tested the following cases with the presence of DST_NOCACHE entries:
1. rmmod e1000.ko while running netperf
2. unshare(CLONE_NEWNET) as reported by Dmitry

Tested-by: Martin KaFai Lau <ka...@fb.com>

David Miller

Oct 13, 2015, 7:37:25 AM

to ebie...@xmission.com, dvy...@google.com, ka...@fb.com, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, ka...@trash.net, net...@vger.kernel.org, linux-...@vger.kernel.org, syzk...@googlegroups.com, k...@google.com, gli...@google.com, andre...@google.com, sasha...@oracle.com, edum...@google.com, ma...@google.com, eric.d...@gmail.com

From: ebie...@xmission.com (Eric W. Biederman)
Date: Mon, 12 Oct 2015 11:02:08 -0500

>
> As originally written rt6_uncached_list_flush_dev makes no sense when
> called with dev == NULL as it attempts to flush all uncached routes
> regardless of network namespace when dev == NULL. Which is simply
> incorrect behavior.
>
> Furthermore at the point rt6_ifdown is called with dev == NULL no more
> network devices exist in the network namespace so even if the code in
> rt6_uncached_list_flush_dev were to attempt something sensible it
> would be meaningless.
>
> Therefore remove support in rt6_uncached_list_flush_dev for handling
> network devices where dev == NULL, and only call rt6_uncached_list_flush_dev
> when rt6_ifdown is called with a network device.
>
> Fixes: 8d0b94afdca8 ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
> Signed-off-by: "Eric W. Biederman" <ebie...@xmission.com>

Applied and queued up for -stable, thanks.
