net: deadlock between ip_expire/sch_direct_xmit

58 views
Skip to first unread message

Dmitry Vyukov

unread,
Mar 14, 2017, 9:31:36 AM3/14/17
to David Miller, Eric Dumazet, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, j...@mojatatu.com, syzkaller
Hello,

I've got the following deadlock report while running syzkaller fuzzer
on net-next/92cd12c5ed432c5eebd2462d666772a8d8442c3b:


[ INFO: possible circular locking dependency detected ]
4.10.0+ #29 Not tainted
-------------------------------------------------------
modprobe/12392 is trying to acquire lock:
(_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock
include/linux/spinlock.h:299 [inline]
(_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock
include/linux/netdevice.h:3486 [inline]
(_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>]
sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180

but task is already holding lock:
(&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
include/linux/spinlock.h:299 [inline]
(&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&(&q->lock)->rlock){+.-...}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669
ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713
packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459
deliver_skb net/core/dev.c:1834 [inline]
dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890
xmit_one net/core/dev.c:2903 [inline]
dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923
sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182
__dev_xmit_skb net/core/dev.c:3092 [inline]
__dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308
neigh_output include/net/neighbour.h:478 [inline]
ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228
ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672
ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545
ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
__sys_sendmmsg+0x25c/0x750 net/socket.c:2075
SYSC_sendmmsg net/socket.c:2106 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2101
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

-> #0 (_xmit_ETHER#2){+.-...}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
__netif_tx_lock include/linux/netdevice.h:3486 [inline]
sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
__dev_xmit_skb net/core/dev.c:3092 [inline]
__dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
neigh_hh_output include/net/neighbour.h:468 [inline]
neigh_output include/net/neighbour.h:476 [inline]
ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307 [inline]
__run_timers+0x960/0xcf0 kernel/time/timer.c:1601
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
__do_softirq+0x31f/0xbe7 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:657 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
__read_once_size include/linux/compiler.h:254 [inline]
atomic_read arch/x86/include/asm/atomic.h:26 [inline]
rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
do_fault_around mm/memory.c:3231 [inline]
do_read_fault mm/memory.c:3265 [inline]
do_fault+0xbd5/0x2080 mm/memory.c:3370
handle_pte_fault mm/memory.c:3600 [inline]
__handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
__do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&q->lock)->rlock);
lock(_xmit_ETHER#2);
lock(&(&q->lock)->rlock);
lock(_xmit_ETHER#2);

*** DEADLOCK ***

10 locks held by modprobe/12392:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>]
__do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336
#1: (rcu_read_lock){......}, at: [<ffffffff8188cab6>]
filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324
#2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
spin_lock include/linux/spinlock.h:299 [inline]
#2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
pte_alloc_one_map mm/memory.c:2944 [inline]
#2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072
#3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
lockdep_copy_map include/linux/lockdep.h:175 [inline]
#3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258
#4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
include/linux/spinlock.h:299 [inline]
#4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
#5: (rcu_read_lock){......}, at: [<ffffffff8389a633>]
ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216
#6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock
include/linux/spinlock.h:309 [inline]
#6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock
net/ipv4/icmp.c:219 [inline]
#6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>]
icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681
#7: (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>]
ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198
#8: (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>]
__dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324
#9: (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at:
[<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423

stack backtrace:
CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
__netif_tx_lock include/linux/netdevice.h:3486 [inline]
sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
__dev_xmit_skb net/core/dev.c:3092 [inline]
__dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
neigh_hh_output include/net/neighbour.h:468 [inline]
neigh_output include/net/neighbour.h:476 [inline]
ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307 [inline]
__run_timers+0x960/0xcf0 kernel/time/timer.c:1601
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
__do_softirq+0x31f/0xbe7 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:657 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10
RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c
RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000
R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25
R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000
</IRQ>
rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
do_fault_around mm/memory.c:3231 [inline]
do_read_fault mm/memory.c:3265 [inline]
do_fault+0xbd5/0x2080 mm/memory.c:3370
handle_pte_fault mm/memory.c:3600 [inline]
__handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
__do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
RIP: 0033:0x7f83172f2786
RSP: 002b:00007fffe859ae80 EFLAGS: 00010293
RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970
RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000
R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040
R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000

Eric Dumazet

unread,
Mar 14, 2017, 10:41:46 AM3/14/17
to Dmitry Vyukov, David Miller, Eric Dumazet, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, j...@mojatatu.com, syzkaller
This looks a false positive.
An ICMP message can not be possibly fragmented.

Following patch should fix the lockdep issue.

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index bbe7f72db9c157ba2d6c5292637c2f58ad39a123..b3cdeec85f1f2c612c362590e828f50596a5c247 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -198,6 +198,7 @@ static void ip_expire(unsigned long arg)
qp = container_of((struct inet_frag_queue *) arg, struct ipq, q);
net = container_of(qp->q.net, struct net, ipv4.frags);

+ rcu_read_lock();
spin_lock(&qp->q.lock);

if (qp->q.flags & INET_FRAG_COMPLETE)
@@ -207,7 +208,7 @@ static void ip_expire(unsigned long arg)
__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);

if (!inet_frag_evicting(&qp->q)) {
- struct sk_buff *head = qp->q.fragments;
+ struct sk_buff *clone, *head = qp->q.fragments;
const struct iphdr *iph;
int err;

@@ -216,32 +217,40 @@ static void ip_expire(unsigned long arg)
if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !qp->q.fragments)
goto out;

- rcu_read_lock();
head->dev = dev_get_by_index_rcu(net, qp->iif);
if (!head->dev)
- goto out_rcu_unlock;
+ goto out;
+

/* skb has no dst, perform route lookup again */
iph = ip_hdr(head);
err = ip_route_input_noref(head, iph->daddr, iph->saddr,
iph->tos, head->dev);
if (err)
- goto out_rcu_unlock;
+ goto out;

/* Only an end host needs to send an ICMP
* "Fragment Reassembly Timeout" message, per RFC792.
*/
if (frag_expire_skip_icmp(qp->user) &&
(skb_rtable(head)->rt_type != RTN_LOCAL))
- goto out_rcu_unlock;
+ goto out;
+
+ clone = skb_clone(head, GFP_ATOMIC);

/* Send an ICMP "Fragment Reassembly Timeout" message. */
- icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
-out_rcu_unlock:
- rcu_read_unlock();
+ if (clone) {
+ spin_unlock(&qp->q.lock);
+ icmp_send(clone, ICMP_TIME_EXCEEDED,
+ ICMP_EXC_FRAGTIME, 0);
+ consume_skb(clone);
+ goto out_rcu_unlock;
+ }
}
out:
spin_unlock(&qp->q.lock);
+out_rcu_unlock:
+ rcu_read_unlock();
ipq_put(qp);
}




Dmitry Vyukov

unread,
Mar 14, 2017, 10:46:29 AM3/14/17
to Eric Dumazet, David Miller, Eric Dumazet, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, j...@mojatatu.com, syzkaller
I am confused. Lockdep has observed both of these stacks:

CPU0 CPU1
---- ----
lock(&(&q->lock)->rlock);
lock(_xmit_ETHER#2);
lock(&(&q->lock)->rlock);
lock(_xmit_ETHER#2);


So it somehow happened. Or what do you mean?

Eric Dumazet

unread,
Mar 14, 2017, 10:56:57 AM3/14/17
to Dmitry Vyukov, Eric Dumazet, David Miller, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
On Tue, Mar 14, 2017 at 7:46 AM, Dmitry Vyukov <dvy...@google.com> wrote:

> I am confused. Lockdep has observed both of these stacks:
>
> CPU0 CPU1
> ---- ----
> lock(&(&q->lock)->rlock);
> lock(_xmit_ETHER#2);
> lock(&(&q->lock)->rlock);
> lock(_xmit_ETHER#2);
>
>
> So it somehow happened. Or what do you mean?
>

Lockdep said " possible circular locking dependency detected " .
It is not an actual deadlock, but lockdep machinery firing.

For a dead lock to happen, this would require that he ICMP message
sent by ip_expire() is itself fragmented and reassembled.
This cannot be, because ICMP messages are not candidates for
fragmentation, but lockdep can not know that of course...

Dmitry Vyukov

unread,
Mar 14, 2017, 10:58:53 AM3/14/17
to Eric Dumazet, Eric Dumazet, David Miller, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
Ah, I see. Thanks.

Paolo Abeni

unread,
Mar 14, 2017, 11:03:12 AM3/14/17
to Eric Dumazet, Dmitry Vyukov, David Miller, Eric Dumazet, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, j...@mojatatu.com, syzkaller
I'm wondering if we really need to keep the fragment queue lock held
while sending the icmp packet ? we hold a reference to the struct, so
it can't be deleted, and AFAICS after ipq_kill() nobody else could
access/modify that queue.

That lock is there pretty much forever, but perhaps is only a leftover
and we can release it just after ipq_kill() ?

Paolo

Eric Dumazet

unread,
Mar 14, 2017, 11:09:41 AM3/14/17
to Paolo Abeni, Eric Dumazet, Dmitry Vyukov, David Miller, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
On Tue, Mar 14, 2017 at 8:03 AM, Paolo Abeni <pab...@redhat.com> wrote:

> I'm wondering if we really need to keep the fragment queue lock held
> while sending the icmp packet ? we hold a reference to the struct, so
> it can't be deleted, and AFAICS after ipq_kill() nobody else could
> access/modify that queue.
>
> That lock is there pretty much forever, but perhaps is only a leftover
> and we can release it just after ipq_kill() ?

Maybe, but for peace of mind I would make sure this code path owns the
skb (head) before releasing the lock.

Seems something to try for net-next ?

Paolo Abeni

unread,
Mar 14, 2017, 11:34:22 AM3/14/17
to Eric Dumazet, Eric Dumazet, Dmitry Vyukov, David Miller, Cong Wang, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
Agreed.

I asked because I was in doubt I missed something obvious.

Thank you,

Paolo


Cong Wang

unread,
Mar 14, 2017, 12:41:43 PM3/14/17
to Eric Dumazet, Dmitry Vyukov, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
It doesn't have to be ICMP, as long as get the same hash for
the inet_frag_queue, we will need to take the same lock and
deadlock will happen.

hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);

So it is really up to this hash function.

Dmitry Vyukov

unread,
Mar 20, 2017, 6:00:16 AM3/20/17
to Cong Wang, Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
Is the following the same issue?
It mentions dev->qdisc_tx_busylock, but I am not sure if it's relevant
if there already a cycle between _xmit_ETHER#2 -->
&(&q->lock)->rlock#2.


[ INFO: possible circular locking dependency detected ]
4.11.0-rc2+ #340 Not tainted
-------------------------------------------------------
swapper/1/0 is trying to acquire lock:
(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at:
[<ffffffff834cff12>] spin_lock include/linux/spinlock.h:299 [inline]
(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at:
[<ffffffff834cff12>] __dev_xmit_skb net/core/dev.c:3077 [inline]
(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at:
[<ffffffff834cff12>] __dev_queue_xmit+0x11b2/0x1ed0
net/core/dev.c:3359

but task is already holding lock:
(&(&q->lock)->rlock#2){+.-...}, at: [<ffffffff8368dc21>] spin_lock
include/linux/spinlock.h:299 [inline]
(&(&q->lock)->rlock#2){+.-...}, at: [<ffffffff8368dc21>]
ip_expire+0x51/0x6d0 net/ipv4/ip_fragment.c:201

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&(&q->lock)->rlock#2){+.-...}:
lock_acquire+0x217/0x5e0 kernel/locking/lockdep.c:3762
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
ip_defrag+0x395/0x4070 net/ipv4/ip_fragment.c:669
ip_check_defrag+0x4e7/0x890 net/ipv4/ip_fragment.c:713
packet_rcv_fanout+0x26b/0x7f0 net/packet/af_packet.c:1459
deliver_skb net/core/dev.c:1835 [inline]
dev_queue_xmit_nit+0x2a6/0xaa0 net/core/dev.c:1891
xmit_one net/core/dev.c:2904 [inline]
dev_hard_start_xmit+0x16b/0xb20 net/core/dev.c:2924
sch_direct_xmit+0x31d/0x6d0 net/sched/sch_generic.c:182
__dev_xmit_skb net/core/dev.c:3093 [inline]
__dev_queue_xmit+0x166a/0x1ed0 net/core/dev.c:3359
dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
neigh_hh_output include/net/neighbour.h:468 [inline]
neigh_output include/net/neighbour.h:476 [inline]
ip_finish_output2+0xe91/0x13f0 net/ipv4/ip_output.c:228
ip_do_fragment+0x1d17/0x25f0 net/ipv4/ip_output.c:672
ip_fragment.constprop.53+0x145/0x200 net/ipv4/ip_output.c:545
ip_finish_output+0x5c1/0xb40 net/ipv4/ip_output.c:314
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1d0/0x870 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
raw_sendmsg+0x25e9/0x3870 net/ipv4/raw.c:655
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x35e/0x860 net/socket.c:1986
__sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2076
SYSC_sendmmsg net/socket.c:2107 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2102
do_syscall_64+0x26c/0x810 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

-> #1 (_xmit_ETHER#2){+.-...}:
lock_acquire+0x217/0x5e0 kernel/locking/lockdep.c:3762
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
__netif_tx_lock include/linux/netdevice.h:3486 [inline]
sch_direct_xmit+0x280/0x6d0 net/sched/sch_generic.c:180
__dev_xmit_skb net/core/dev.c:3093 [inline]
__dev_queue_xmit+0x166a/0x1ed0 net/core/dev.c:3359
dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
neigh_hh_output include/net/neighbour.h:468 [inline]
neigh_output include/net/neighbour.h:476 [inline]
ip6_finish_output2+0x1421/0x2270 net/ipv6/ip6_output.c:123
ip6_fragment+0x2c6c/0x3920 net/ipv6/ip6_output.c:724
ip6_finish_output+0x319/0x950 net/ipv6/ip6_output.c:147
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip6_output+0x1b6/0x890 net/ipv6/ip6_output.c:163
dst_output include/net/dst.h:486 [inline]
ip6_local_out+0x95/0x170 net/ipv6/output_core.c:172
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1739
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1759
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d7c/0x3ee0 net/ipv6/raw.c:930
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x738/0x860 net/socket.c:1986
__sys_sendmsg+0xe5/0x210 net/socket.c:2020
SYSC_sendmsg net/socket.c:2031 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2027
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add kernel/locking/lockdep.c:1940 [inline]
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2f07/0x3a80 kernel/locking/lockdep.c:3347
lock_acquire+0x217/0x5e0 kernel/locking/lockdep.c:3762
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
__dev_xmit_skb net/core/dev.c:3077 [inline]
__dev_queue_xmit+0x11b2/0x1ed0 net/core/dev.c:3359
dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
neigh_hh_output include/net/neighbour.h:468 [inline]
neigh_output include/net/neighbour.h:476 [inline]
ip_finish_output2+0xe91/0x13f0 net/ipv4/ip_output.c:228
ip_finish_output+0x7a3/0xb40 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1d0/0x870 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
icmp_push_reply+0x372/0x4c0 net/ipv4/icmp.c:394
icmp_send+0x12f5/0x1920 net/ipv4/icmp.c:754
ip_expire+0x420/0x6d0 net/ipv4/ip_fragment.c:239
call_timer_fn+0x254/0x940 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307 [inline]
__run_timers+0x716/0xbb0 kernel/time/timer.c:1601
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
__do_softirq+0x2fc/0xbf6 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:657 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:487
native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
default_idle+0xba/0x480 arch/x86/kernel/process.c:275
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:266
default_idle_call+0x44/0xa0 kernel/sched/idle.c:97
cpuidle_idle_call kernel/sched/idle.c:155 [inline]
do_idle+0x296/0x410 kernel/sched/idle.c:244
cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:346
start_secondary+0x2a7/0x340 arch/x86/kernel/smpboot.c:275
verify_cpu+0x0/0xfc

other info that might help us debug this:

Chain exists of:
dev->qdisc_tx_busylock ?: &qdisc_tx_busylock --> _xmit_ETHER#2 -->
&(&q->lock)->rlock#2

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&q->lock)->rlock#2);
lock(_xmit_ETHER#2);
lock(&(&q->lock)->rlock#2);
lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);

*** DEADLOCK ***

6 locks held by swapper/1/0:
#0: (((&q->timer))){+.-...}, at: [<ffffffff815d9716>]
lockdep_copy_map include/linux/lockdep.h:175 [inline]
#0: (((&q->timer))){+.-...}, at: [<ffffffff815d9716>]
call_timer_fn+0x1c6/0x940 kernel/time/timer.c:1258
#1: (&(&q->lock)->rlock#2){+.-...}, at: [<ffffffff8368dc21>]
spin_lock include/linux/spinlock.h:299 [inline]
#1: (&(&q->lock)->rlock#2){+.-...}, at: [<ffffffff8368dc21>]
ip_expire+0x51/0x6d0 net/ipv4/ip_fragment.c:201
#2: (rcu_read_lock){......}, at: [<ffffffff8368dd95>]
ip_expire+0x1c5/0x6d0 net/ipv4/ip_fragment.c:216
#3: (k-slock-AF_INET){+.-...}, at: [<ffffffff83797f54>] spin_trylock
include/linux/spinlock.h:309 [inline]
#3: (k-slock-AF_INET){+.-...}, at: [<ffffffff83797f54>]
icmp_xmit_lock net/ipv4/icmp.c:219 [inline]
#3: (k-slock-AF_INET){+.-...}, at: [<ffffffff83797f54>]
icmp_send+0x6e4/0x1920 net/ipv4/icmp.c:681
#4: (rcu_read_lock_bh){......}, at: [<ffffffff8369e5ed>]
ip_finish_output2+0x22d/0x13f0 net/ipv4/ip_output.c:198
#5: (rcu_read_lock_bh){......}, at: [<ffffffff834cefb4>]
__dev_queue_xmit+0x254/0x1ed0 net/core/dev.c:3325

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc2+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x1b8/0x28d lib/dump_stack.c:52
print_circular_bug+0x358/0x400 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add kernel/locking/lockdep.c:1940 [inline]
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2f07/0x3a80 kernel/locking/lockdep.c:3347
lock_acquire+0x217/0x5e0 kernel/locking/lockdep.c:3762
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
__dev_xmit_skb net/core/dev.c:3077 [inline]
__dev_queue_xmit+0x11b2/0x1ed0 net/core/dev.c:3359
dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
neigh_hh_output include/net/neighbour.h:468 [inline]
neigh_output include/net/neighbour.h:476 [inline]
ip_finish_output2+0xe91/0x13f0 net/ipv4/ip_output.c:228
ip_finish_output+0x7a3/0xb40 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1d0/0x870 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
icmp_push_reply+0x372/0x4c0 net/ipv4/icmp.c:394
icmp_send+0x12f5/0x1920 net/ipv4/icmp.c:754
ip_expire+0x420/0x6d0 net/ipv4/ip_fragment.c:239
call_timer_fn+0x254/0x940 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307 [inline]
__run_timers+0x716/0xbb0 kernel/time/timer.c:1601
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
__do_softirq+0x2fc/0xbf6 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:657 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:487
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
RSP: 0018:ffff88003e027db0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
RAX: dffffc0000000000 RBX: 1ffff10007c04fb9 RCX: 0000000000000000
RDX: 1ffffffff09d8ed4 RSI: 0000000000000001 RDI: ffffffff84ec76a0
RBP: ffff88003e027db0 R08: ffffed0007c03461 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff84ec7690
R13: ffff88003e027e68 R14: 0000000000000000 R15: 0000000000000001
</IRQ>
arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
default_idle+0xba/0x480 arch/x86/kernel/process.c:275
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:266
default_idle_call+0x44/0xa0 kernel/sched/idle.c:97
cpuidle_idle_call kernel/sched/idle.c:155 [inline]
do_idle+0x296/0x410 kernel/sched/idle.c:244
cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:346
start_secondary+0x2a7/0x340 arch/x86/kernel/smpboot.c:275
start_cpu+0x14/0x14 arch/x86/kernel/head_64.S:306

Eric Dumazet

unread,
Mar 20, 2017, 8:43:22 AM3/20/17
to Dmitry Vyukov, Cong Wang, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Jamal Hadi Salim, syzkaller
False positive again.

veth needs to use netdev_lockdep_set_classes(), assuming you use veth ?

I will provide a patch, thanks.

cf515802043cccecfe9ab75065f8fc71e6ec9bab missed a few drivers.




Reply all
Reply to author
Forward
0 new messages