A few minutes after upgrading from -rc5 to -rc6, I got:
[ 1310.670986] =============================================
[ 1310.671690] [ INFO: possible recursive locking detected ]
[ 1310.672097] 2.6.24-rc6 #1
[ 1310.672421] ---------------------------------------------
[ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
[ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
[ 1310.673869]
[ 1310.673870] but task is already holding lock:
[ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
[ 1310.675267]
[ 1310.675268] other info that might help us debug this:
[ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
[ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
[ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
[ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
[ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
[ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
[ 1310.680151]
[ 1310.680152] stack backtrace:
[ 1310.680772] Pid: 3692, comm: FahCore_a0.exe Not tainted 2.6.24-rc6 #1
[ 1310.681209] [<c01038aa>] show_trace_log_lvl+0x1a/0x30
[ 1310.681659] [<c0104322>] show_trace+0x12/0x20
[ 1310.682085] [<c0104cba>] dump_stack+0x6a/0x70
[ 1310.682512] [<c0138ec1>] __lock_acquire+0x971/0x10c0
[ 1310.682961] [<c013966e>] lock_acquire+0x5e/0x80
[ 1310.683392] [<c0419b78>] _spin_lock_irqsave+0x38/0x50
[ 1310.683914] [<c011544b>] __wake_up+0x1b/0x50
[ 1310.684337] [<c018e2ba>] ep_poll_safewake+0x9a/0xc0
[ 1310.684822] [<c018f11b>] ep_poll_callback+0x8b/0xe0
[ 1310.685265] [<c0114418>] __wake_up_common+0x48/0x70
[ 1310.685712] [<c0115467>] __wake_up+0x37/0x50
[ 1310.686136] [<c03816aa>] sock_def_readable+0x7a/0x80
[ 1310.686579] [<c0381c2b>] sock_queue_rcv_skb+0xeb/0x150
[ 1310.687028] [<c03c7d99>] udp_queue_rcv_skb+0x139/0x2a0
[ 1310.687554] [<c03c81f1>] __udp4_lib_rcv+0x2f1/0x7e0
[ 1310.687996] [<c03c86f2>] udp_rcv+0x12/0x20
[ 1310.688415] [<c03a80a5>] ip_local_deliver_finish+0x125/0x210
[ 1310.688881] [<c03a84ed>] ip_local_deliver+0x2d/0x90
[ 1310.689323] [<c03a7d6b>] ip_rcv_finish+0xeb/0x300
[ 1310.689760] [<c03a8425>] ip_rcv+0x195/0x230
[ 1310.690182] [<c0388fdc>] netif_receive_skb+0x37c/0x470
[ 1310.690632] [<c038ba39>] process_backlog+0x69/0xc0
[ 1310.691175] [<c038b6f7>] net_rx_action+0x137/0x1b0
[ 1310.691681] [<c011e5c2>] __do_softirq+0x52/0xb0
[ 1310.692006] [<c0104e94>] do_softirq+0x94/0xe0
[ 1310.692301] =======================
This is a single-CPU machine, and the box was quite busy due to disk I/O
(load 6-8). The machine continues to run and all is well now. Even the
application mentioned above (FahCore_a0.exe) is running fine
("Folding@Home", CPU bound). The binary is located on a JFS filesystem,
which was also under heavy I/O. Can someone tell me why the backtrace
shows so much net* stuff? There was not much net I/O...
More details and .config: http://nerdbynature.de/bits/2.6.24-rc6
Thanks,
Christian.
[ Added even more CCs :-) Seems eventpoll & net related. ]
The net part might just be a red herring, since the problem is that
__wake_up is somehow reentering itself.
Cheers,
--
Herbert Xu <her...@gondor.apana.org.au>
The comment above ep_poll_safewake() in fs/eventpoll.c reads:

/*
* Perform a safe wake up of the poll wait list. The problem is that
* with the new callback'd wake up system, it is possible that the
* poll callback is reentered from inside the call to wake_up() done
* on the poll wait queue head. The rule is that we cannot reenter the
* wake up code from the same task more than EP_MAX_POLLWAKE_NESTS times,
* and we cannot reenter the same wait queue head at all. This will
* enable to have a hierarchy of epoll file descriptor of no more than
* EP_MAX_POLLWAKE_NESTS deep. We need the irq version of the spin lock
* because this one gets called by the poll callback, that in turn is called
* from inside a wake_up(), that might be called from irq context.
*/
Seems to suggest that the epoll code can indeed recurse into wakeup.
Davide, Johannes, any ideas?
Since EP_MAX_POLLWAKE_NESTS < MAX_LOCKDEP_SUBCLASSES, we could perhaps do
something like:

	wake_up_nested(..., wake_nests);
although I'm not quite sure that is correct, my understanding of this
code is still fragile at best.
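For reference, one possible shape for such a helper (a sketch only, not an
actual implementation; it assumes the existing wake_up_locked() variant,
which performs the wakeup without taking the queue lock itself):

	/*
	 * Wake up a waitqueue while telling lockdep that its lock is
	 * taken with subclass 's', so nesting inside another waitqueue
	 * lock of the same class is not reported as recursion.
	 */
	#define wake_up_nested(x, s)					\
	do {								\
		unsigned long flags;					\
									\
		spin_lock_irqsave_nested(&(x)->lock, flags, (s));	\
		wake_up_locked(x);					\
		spin_unlock_irqrestore(&(x)->lock, flags);		\
	} while (0)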
I remember I talked with Arjan about this some time ago. Basically, since 1)
you can drop an epoll fd inside another epoll fd and 2) callback-based wakeups
are used, you can see a wake_up() from inside another wake_up(), but they
will never refer to the same lock instance.
Think about:

	dfd = socket(...);
	efd1 = epoll_create();
	efd2 = epoll_create();

	epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
	epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
When a packet arrives at the device underneath "dfd", the net code will
issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
callback wakeup entry on that queue, and the wake_up() performed by the
"dfd" net code will end up in ep_poll_callback(). At this point epoll
(efd1) notices that it may have some event ready, so it needs to wake up
the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
that ends up in another wake_up(), after having checked the recursion
constraints. Those are: go no deeper than EP_MAX_POLLWAKE_NESTS, to avoid
stack blasting, and never hit the same queue, to avoid loops like:
	epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
	epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
	epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
	epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
The code "if (tncur->wq == wq || ..." prevents re-entering the same
queue/lock.
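To make the scenario concrete, here is a minimal userspace sketch of that
two-level nesting (illustration only; error handling omitted):

	#include <sys/epoll.h>
	#include <sys/socket.h>

	int main(void)
	{
		struct epoll_event ev = { .events = EPOLLIN };
		int dfd  = socket(AF_INET, SOCK_DGRAM, 0);	/* data source */
		int efd1 = epoll_create(1);
		int efd2 = epoll_create(1);

		/* efd1 watches the socket, efd2 watches efd1 */
		ev.data.fd = dfd;
		epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, &ev);
		ev.data.fd = efd1;
		epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, &ev);

		/*
		 * A packet on dfd now wakes dfd's queue, which runs
		 * ep_poll_callback() for efd1, which in turn wakes efd1's
		 * queue: a nested __wake_up() on a different q->lock
		 * instance of the same lock class -- exactly what lockdep
		 * reports above.
		 */
		return 0;
	}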
I don't know how the lockdep code works, so I can't say about
wake_up_nested(), although I have a feeling it's not enough in this case.
A solution may be to move the call to ep_poll_safewake() (that'd become a
simple wake_up()) inside a tasklet or whatever is today trendy for delayed
work. But this kinda scares me, to be honest, since epoll already has a
bunch of places where it could be asynchronously hit (plus possible
performance regressions would need to be verified).
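For illustration, the deferred version could look roughly like this (a
sketch only; schedule_deferred_wake() is a hypothetical name, and as
written it can lose wakeups if rescheduled before the tasklet runs -- part
of why this is scary):

	#include <linux/interrupt.h>
	#include <linux/wait.h>

	/* Run the nested wakeup from softirq context instead of from
	 * inside the outer __wake_up(). */
	static void deferred_wake_fn(unsigned long data)
	{
		wake_up((wait_queue_head_t *)data);
	}

	static DECLARE_TASKLET(deferred_wake, deferred_wake_fn, 0);

	static void schedule_deferred_wake(wait_queue_head_t *wq)
	{
		deferred_wake.data = (unsigned long)wq;
		tasklet_schedule(&deferred_wake);
	}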
- Davide
Although I'm not able to reproduce this one right now, I'm happy to test
any patches you guys come up with.
Thanks for your time,
Christian.
It's quite possible that I'm wrong, but just interested... why in
ep_poll_safewake() is the assignment

	struct list_head *lsthead = &psw->wake_task_list;

not protected by a spinlock?
- Cyrill -
It was a completely stupid question... please drop ;)
I think this would be the most correct change. But I wonder if it is possible
to do something more generic (but OTOH more stupid/hackish and less safe).
Consider this "just for illustration" patch:
--- t/kernel/lockdep.c	2007-11-09 12:57:31.000000000 +0300
+++ t/kernel/lockdep.c	2008-01-07 19:43:50.000000000 +0300
@@ -1266,10 +1266,13 @@ check_deadlock(struct task_struct *curr,
 	struct held_lock *prev;
 	int i;

-	for (i = 0; i < curr->lockdep_depth; i++) {
+	for (i = curr->lockdep_depth; --i >= 0; ) {
 		prev = curr->held_locks + i;
 		if (prev->class != next->class)
 			continue;
+
+		if (prev->trylock == -1)
+			return 2;
 		/*
 		 * Allow read-after-read recursion of the same
 		 * lock class (i.e. read_lock(lock)+read_lock(lock)):
-------------------------------------------------------------------------
Now,

	// trylock == -1
	#define spin_mark_nested(l) \
		lock_acquire(&(l)->dep_map, 0, -1, 0, 2, _THIS_IP_)
	#define spin_unmark_nested(l) \
		lock_release(&(l)->dep_map, 1, _THIS_IP_)
and ep_poll_safewake() can do:

	/* Do really wake up now */
	spin_mark_nested(&wq->lock);
	wake_up(wq);
	spin_unmark_nested(&wq->lock);
Possible?
Oleg.
I tested the patch above with the following code:

	wait_queue_head_t w1, w2, w3;

	init_waitqueue_head(&w1);
	init_waitqueue_head(&w2);
	init_waitqueue_head(&w3);

	local_irq_disable();
	spin_lock(&w1.lock);

	spin_mark_nested(&w2.lock);
	spin_lock(&w2.lock);

	spin_mark_nested(&w3.lock);
	wake_up(&w3);
	spin_unmark_nested(&w3.lock);

	spin_unlock(&w2.lock);
	spin_unmark_nested(&w2.lock);

	spin_unlock(&w1.lock);
	local_irq_enable();

Seems to work. What do you think?
> On Sat, 5 Jan 2008, Davide Libenzi wrote:
> > A solution may be to move the call to ep_poll_safewake() (that'd become a
> > simple wake_up()) inside a tasklet or whatever is today trendy for delayed
> > work. But this kinda scares me, to be honest, since epoll already has a
> > bunch of places where it could be asynchronously hit (plus possible
> > performance regressions would need to be verified).
>
> Although I'm not able to reproduce this one right now, I'm happy to test any
> patches you guys come up with.
There's no need to reproduce it. It's there, it's among us ;)
- Davide
I've been pondering this for a while, and some days I really like it,
some days I don't.
The problem I have with it is that it becomes very easy to falsely
annotate problems away - it's a very powerful annotation. That said, it's
almost powerful enough to annotate the device semaphore/mutex problem.
I think I'll do wake_up_nested() for now and keep this around.
Thanks for this very nice idea though.
Also, I don't like the overloading of ->trylock, this is really hackish.
> I think I'll do wake_up_nested() for now and keep this around.
Agreed.
Perhaps it is a bit easier to use spin_lock_nested() + __wake_up_common()
directly (we have a lot of wake_up_xxx helpers), but this is up to you.
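Roughly, a hypothetical __wake_up_nested() could be (a sketch; note that
__wake_up_common() is static to kernel/sched.c, so this would have to live
there as yet another helper):

	void __wake_up_nested(wait_queue_head_t *q, unsigned int mode,
			      int nr_exclusive, int subclass)
	{
		unsigned long flags;

		spin_lock_irqsave_nested(&q->lock, flags, subclass);
		__wake_up_common(q, mode, nr_exclusive, 0, NULL);
		spin_unlock_irqrestore(&q->lock, flags);
	}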
Offtopic question: why do we have so much lockdep stuff in timer.c and hrtimer.c?
We never lock two bases at the same time, except in migrate_timers(). We can kill
double_spin_lock() and base_lock_keys[] and just use spin_lock_nested() in
migrate_timers(), no?
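For illustration, the locking in migrate_timers() could then look something
like this (a rough sketch; the struct and function names here are only
placeholders for the real timer.c types):

	/* The only place two timer bases are held at once, so an
	 * explicit subclass replaces the per-CPU lock keys. */
	static void migrate_timers_sketch(struct tvec_base *old_base,
					  struct tvec_base *new_base)
	{
		spin_lock_irq(&new_base->lock);
		spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);

		/* ... move pending timers from old_base to new_base ... */

		spin_unlock(&old_base->lock);
		spin_unlock_irq(&new_base->lock);
	}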
Oleg.
Sorry for the delay, this message seems to have gotten lost in my
inbox :-/
On Tue, 2008-01-15 at 00:27 +0300, Oleg Nesterov wrote:
> On 01/13, Peter Zijlstra wrote:
> Offtopic question: why do we have so much lockdep stuff in timer.c and hrtimer.c?
> We never lock two bases at the same time, except in migrate_timers(). We can kill
> double_spin_lock() and base_lock_keys[] and just use spin_lock_nested() in
> migrate_timers(), no?
Let's ask Thomas.. :-)
> Let's ask Thomas.. :-)
No objections as long as it all works :)
tglx