Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

hang in call_usermodehelper_exec()

64 views
Skip to first unread message

Alexey Dobriyan

unread,
Feb 17, 2010, 3:30:02 PM2/17/10
to
I was stress-testing netns stuff in presence of removing modules.

Kernel is 2.6.33-rc8-next-20100217+:

At some point setkey(8) and modprobe hanged:

3541 pts/1 D 0:00 /usr/sbin/setkey -f /root/ipcomp-tunnel.setkey
3737 pts/2 D 0:00 modprobe -r des_generic sha1_generic md5 hmac xfrm_user crypto_null af_packet xfrm4_mode_beet ipcomp xfrm_ipcomp xfrm4_tunnel deflate zlib_deflate zlib_inflate esp4 xfrm4_mode_tran

SysRq : Show Blocked State
task PC stack pid father
ipcomp-tunnel D 0000000000000000 4944 3541 3532 0x00000000
ffff8800b98636f8 0000000000000046 0000000000000082 ffff8800b9862000
ffff8800b9863fd8 ffff8800b9862000 0000000000004000 ffff8800b9863fd8
ffff8800b9863fd8 ffff8800bd40eba0 ffffffff81625020 0000000000000046
Call Trace:
[<ffffffff813f218d>] schedule_timeout+0x2ad/0x320
[<ffffffff813f1239>] ? wait_for_common+0x49/0x180
[<ffffffff813f134b>] wait_for_common+0x15b/0x180
[<ffffffff81038960>] ? default_wake_function+0x0/0x20
[<ffffffff813f142d>] wait_for_completion+0x1d/0x20
[<ffffffff8105c841>] call_usermodehelper_exec+0xe1/0x110
[<ffffffff813ddaf0>] ? xfrm_init_state+0x120/0x3a0
[<ffffffff8105cb24>] __request_module+0x194/0x200
[<ffffffff81060568>] ? __kernel_text_address+0x58/0x80
[<ffffffff813dd68c>] ? xfrm_state_get_afinfo+0x2c/0x60
[<ffffffff813ddaf0>] xfrm_init_state+0x120/0x3a0
[<ffffffff810e7bbe>] ? kmem_cache_alloc_notrace+0xbe/0x130
[<ffffffffa05750e8>] pfkey_add+0x738/0x980 [af_key]
[<ffffffffa0570b86>] ? pfkey_broadcast+0x1e6/0x210 [af_key]
[<ffffffffa05709a0>] ? pfkey_broadcast+0x0/0x210 [af_key]
[<ffffffff8136b90e>] ? __skb_clone+0x2e/0x110
[<ffffffffa0573655>] pfkey_sendmsg+0x375/0x3b0 [af_key]
[<ffffffff81364a60>] sock_sendmsg+0xc0/0xf0
[<ffffffff81364b1a>] ? sockfd_lookup_light+0x2a/0x80
[<ffffffff8136532f>] sys_sendto+0xef/0x120
[<ffffffff813681c4>] ? release_sock+0x34/0x110
[<ffffffff8136828b>] ? release_sock+0xfb/0x110
[<ffffffff8136a30b>] ? sock_setsockopt+0xcb/0x850
[<ffffffff810d4176>] ? remove_vma+0x66/0x80
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
modprobe D ffff8800bd072ba0 5280 3737 3734 0x00000000
ffff8800b9ae1ed8 0000000000000046 0000000000000000 ffff8800b9ae0000
ffff8800b9ae1fd8 ffff8800b9ae0000 0000000000004000 ffff8800b9ae1fd8
ffff8800b9ae1fd8 ffff8800b98bae20 ffff8800bad4dc40 ffffffff81630f68
Call Trace:
[<ffffffff810761dd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81083306>] ? module_refcount+0x46/0xb0
[<ffffffff81084150>] sys_delete_module+0x230/0x250
[<ffffffff813f4522>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: task ipcomp-tunnel.s:3541 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ipcomp-tunnel D 0000000000000000 4944 3541 3532 0x00000000
ffff8800b98636f8 0000000000000046 0000000000000082 ffff8800b9862000
ffff8800b9863fd8 ffff8800b9862000 0000000000004000 ffff8800b9863fd8
ffff8800b9863fd8 ffff8800bd40eba0 ffffffff81625020 0000000000000046
Call Trace:
[<ffffffff813f218d>] schedule_timeout+0x2ad/0x320
[<ffffffff813f1239>] ? wait_for_common+0x49/0x180
[<ffffffff813f134b>] wait_for_common+0x15b/0x180
[<ffffffff81038960>] ? default_wake_function+0x0/0x20
[<ffffffff813f142d>] wait_for_completion+0x1d/0x20
[<ffffffff8105c841>] call_usermodehelper_exec+0xe1/0x110
[<ffffffff813ddaf0>] ? xfrm_init_state+0x120/0x3a0
[<ffffffff8105cb24>] __request_module+0x194/0x200
[<ffffffff81060568>] ? __kernel_text_address+0x58/0x80
[<ffffffff813dd68c>] ? xfrm_state_get_afinfo+0x2c/0x60
[<ffffffff813ddaf0>] xfrm_init_state+0x120/0x3a0
[<ffffffff810e7bbe>] ? kmem_cache_alloc_notrace+0xbe/0x130
[<ffffffffa05750e8>] pfkey_add+0x738/0x980 [af_key]
[<ffffffffa0570b86>] ? pfkey_broadcast+0x1e6/0x210 [af_key]
[<ffffffffa05709a0>] ? pfkey_broadcast+0x0/0x210 [af_key]
[<ffffffff8136b90e>] ? __skb_clone+0x2e/0x110
[<ffffffffa0573655>] pfkey_sendmsg+0x375/0x3b0 [af_key]
[<ffffffff81364a60>] sock_sendmsg+0xc0/0xf0
[<ffffffff81364b1a>] ? sockfd_lookup_light+0x2a/0x80
[<ffffffff8136532f>] sys_sendto+0xef/0x120
[<ffffffff813681c4>] ? release_sock+0x34/0x110
[<ffffffff8136828b>] ? release_sock+0xfb/0x110
[<ffffffff8136a30b>] ? sock_setsockopt+0xcb/0x850
[<ffffffff810d4176>] ? remove_vma+0x66/0x80
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task modprobe:3737 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff8800bd072ba0 5280 3737 3734 0x00000000
ffff8800b9ae1ed8 0000000000000046 0000000000000000 ffff8800b9ae0000
ffff8800b9ae1fd8 ffff8800b9ae0000 0000000000004000 ffff8800b9ae1fd8
ffff8800b9ae1fd8 ffff8800b98bae20 ffff8800bad4dc40 ffffffff81630f68
Call Trace:
[<ffffffff810761dd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81083306>] ? module_refcount+0x46/0xb0
[<ffffffff81084150>] sys_delete_module+0x230/0x250
[<ffffffff813f4522>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task ipcomp-tunnel.s:3541 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ipcomp-tunnel D 0000000000000000 4944 3541 3532 0x00000000
ffff8800b98636f8 0000000000000046 0000000000000082 ffff8800b9862000
ffff8800b9863fd8 ffff8800b9862000 0000000000004000 ffff8800b9863fd8
ffff8800b9863fd8 ffff8800bd40eba0 ffffffff81625020 0000000000000046
Call Trace:
[<ffffffff813f218d>] schedule_timeout+0x2ad/0x320
[<ffffffff813f1239>] ? wait_for_common+0x49/0x180
[<ffffffff813f134b>] wait_for_common+0x15b/0x180
[<ffffffff81038960>] ? default_wake_function+0x0/0x20
[<ffffffff813f142d>] wait_for_completion+0x1d/0x20
[<ffffffff8105c841>] call_usermodehelper_exec+0xe1/0x110
[<ffffffff813ddaf0>] ? xfrm_init_state+0x120/0x3a0
[<ffffffff8105cb24>] __request_module+0x194/0x200
[<ffffffff81060568>] ? __kernel_text_address+0x58/0x80
[<ffffffff813dd68c>] ? xfrm_state_get_afinfo+0x2c/0x60
[<ffffffff813ddaf0>] xfrm_init_state+0x120/0x3a0
[<ffffffff810e7bbe>] ? kmem_cache_alloc_notrace+0xbe/0x130
[<ffffffffa05750e8>] pfkey_add+0x738/0x980 [af_key]
[<ffffffffa0570b86>] ? pfkey_broadcast+0x1e6/0x210 [af_key]
[<ffffffffa05709a0>] ? pfkey_broadcast+0x0/0x210 [af_key]
[<ffffffff8136b90e>] ? __skb_clone+0x2e/0x110
[<ffffffffa0573655>] pfkey_sendmsg+0x375/0x3b0 [af_key]
[<ffffffff81364a60>] sock_sendmsg+0xc0/0xf0
[<ffffffff81364b1a>] ? sockfd_lookup_light+0x2a/0x80
[<ffffffff8136532f>] sys_sendto+0xef/0x120
[<ffffffff813681c4>] ? release_sock+0x34/0x110
[<ffffffff8136828b>] ? release_sock+0xfb/0x110
[<ffffffff8136a30b>] ? sock_setsockopt+0xcb/0x850
[<ffffffff810d4176>] ? remove_vma+0x66/0x80
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task modprobe:3737 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff8800bd072ba0 5280 3737 3734 0x00000000
ffff8800b9ae1ed8 0000000000000046 0000000000000000 ffff8800b9ae0000
ffff8800b9ae1fd8 ffff8800b9ae0000 0000000000004000 ffff8800b9ae1fd8
ffff8800b9ae1fd8 ffff8800b98bae20 ffff8800bad4dc40 ffffffff81630f68
Call Trace:
[<ffffffff810761dd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81083306>] ? module_refcount+0x46/0xb0
[<ffffffff81084150>] sys_delete_module+0x230/0x250
[<ffffffff813f4522>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task ipcomp-tunnel.s:3541 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ipcomp-tunnel D 0000000000000000 4944 3541 3532 0x00000000
ffff8800b98636f8 0000000000000046 0000000000000082 ffff8800b9862000
ffff8800b9863fd8 ffff8800b9862000 0000000000004000 ffff8800b9863fd8
ffff8800b9863fd8 ffff8800bd40eba0 ffffffff81625020 0000000000000046
Call Trace:
[<ffffffff813f218d>] schedule_timeout+0x2ad/0x320
[<ffffffff813f1239>] ? wait_for_common+0x49/0x180
[<ffffffff813f134b>] wait_for_common+0x15b/0x180
[<ffffffff81038960>] ? default_wake_function+0x0/0x20
[<ffffffff813f142d>] wait_for_completion+0x1d/0x20
[<ffffffff8105c841>] call_usermodehelper_exec+0xe1/0x110
[<ffffffff813ddaf0>] ? xfrm_init_state+0x120/0x3a0
[<ffffffff8105cb24>] __request_module+0x194/0x200
[<ffffffff81060568>] ? __kernel_text_address+0x58/0x80
[<ffffffff813dd68c>] ? xfrm_state_get_afinfo+0x2c/0x60
[<ffffffff813ddaf0>] xfrm_init_state+0x120/0x3a0
[<ffffffff810e7bbe>] ? kmem_cache_alloc_notrace+0xbe/0x130
[<ffffffffa05750e8>] pfkey_add+0x738/0x980 [af_key]
[<ffffffffa0570b86>] ? pfkey_broadcast+0x1e6/0x210 [af_key]
[<ffffffffa05709a0>] ? pfkey_broadcast+0x0/0x210 [af_key]
[<ffffffff8136b90e>] ? __skb_clone+0x2e/0x110
[<ffffffffa0573655>] pfkey_sendmsg+0x375/0x3b0 [af_key]
[<ffffffff81364a60>] sock_sendmsg+0xc0/0xf0
[<ffffffff81364b1a>] ? sockfd_lookup_light+0x2a/0x80
[<ffffffff8136532f>] sys_sendto+0xef/0x120
[<ffffffff813681c4>] ? release_sock+0x34/0x110
[<ffffffff8136828b>] ? release_sock+0xfb/0x110
[<ffffffff8136a30b>] ? sock_setsockopt+0xcb/0x850
[<ffffffff810d4176>] ? remove_vma+0x66/0x80
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task modprobe:3737 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff8800bd072ba0 5280 3737 3734 0x00000000
ffff8800b9ae1ed8 0000000000000046 0000000000000000 ffff8800b9ae0000
ffff8800b9ae1fd8 ffff8800b9ae0000 0000000000004000 ffff8800b9ae1fd8
ffff8800b9ae1fd8 ffff8800b98bae20 ffff8800bad4dc40 ffffffff81630f68
Call Trace:
[<ffffffff810761dd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81083306>] ? module_refcount+0x46/0xb0
[<ffffffff81084150>] sys_delete_module+0x230/0x250
[<ffffffff813f4522>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task ipcomp-tunnel.s:3541 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ipcomp-tunnel D 0000000000000000 4944 3541 3532 0x00000000
ffff8800b98636f8 0000000000000046 0000000000000082 ffff8800b9862000
ffff8800b9863fd8 ffff8800b9862000 0000000000004000 ffff8800b9863fd8
ffff8800b9863fd8 ffff8800bd40eba0 ffffffff81625020 0000000000000046
Call Trace:
[<ffffffff813f218d>] schedule_timeout+0x2ad/0x320
[<ffffffff813f1239>] ? wait_for_common+0x49/0x180
[<ffffffff813f134b>] wait_for_common+0x15b/0x180
[<ffffffff81038960>] ? default_wake_function+0x0/0x20
[<ffffffff813f142d>] wait_for_completion+0x1d/0x20
[<ffffffff8105c841>] call_usermodehelper_exec+0xe1/0x110
[<ffffffff813ddaf0>] ? xfrm_init_state+0x120/0x3a0
[<ffffffff8105cb24>] __request_module+0x194/0x200
[<ffffffff81060568>] ? __kernel_text_address+0x58/0x80
[<ffffffff813dd68c>] ? xfrm_state_get_afinfo+0x2c/0x60
[<ffffffff813ddaf0>] xfrm_init_state+0x120/0x3a0
[<ffffffff810e7bbe>] ? kmem_cache_alloc_notrace+0xbe/0x130
[<ffffffffa05750e8>] pfkey_add+0x738/0x980 [af_key]
[<ffffffffa0570b86>] ? pfkey_broadcast+0x1e6/0x210 [af_key]
[<ffffffffa05709a0>] ? pfkey_broadcast+0x0/0x210 [af_key]
[<ffffffff8136b90e>] ? __skb_clone+0x2e/0x110
[<ffffffffa0573655>] pfkey_sendmsg+0x375/0x3b0 [af_key]
[<ffffffff81364a60>] sock_sendmsg+0xc0/0xf0
[<ffffffff81364b1a>] ? sockfd_lookup_light+0x2a/0x80
[<ffffffff8136532f>] sys_sendto+0xef/0x120
[<ffffffff813681c4>] ? release_sock+0x34/0x110
[<ffffffff8136828b>] ? release_sock+0xfb/0x110
[<ffffffff8136a30b>] ? sock_setsockopt+0xcb/0x850
[<ffffffff810d4176>] ? remove_vma+0x66/0x80
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task modprobe:3737 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff8800bd072ba0 5280 3737 3734 0x00000000
ffff8800b9ae1ed8 0000000000000046 0000000000000000 ffff8800b9ae0000
ffff8800b9ae1fd8 ffff8800b9ae0000 0000000000004000 ffff8800b9ae1fd8
ffff8800b9ae1fd8 ffff8800b98bae20 ffff8800bad4dc40 ffffffff81630f68
Call Trace:
[<ffffffff810761dd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81083306>] ? module_refcount+0x46/0xb0
[<ffffffff81084150>] sys_delete_module+0x230/0x250
[<ffffffff813f4522>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100279b>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Rusty Russell

unread,
Feb 21, 2010, 7:30:02 PM2/21/10
to
On Thu, 18 Feb 2010 06:52:34 am Alexey Dobriyan wrote:
> I was stress-testing netns stuff in presence of removing modules.
>
> Kernel is 2.6.33-rc8-next-20100217+:
>
> At some point setkey(8) and modprobe hanged:
>
> 3541 pts/1 D 0:00 /usr/sbin/setkey -f /root/ipcomp-tunnel.setkey
> 3737 pts/2 D 0:00 modprobe -r des_generic sha1_generic md5 hmac xfrm_user crypto_null af_packet xfrm4_mode_beet ipcomp xfrm_ipcomp xfrm4_tunnel deflate zlib_deflate zlib_inflate esp4 xfrm4_mode_tran

Hmm, looks like sys_delete_module() is stuck trying to re-grab the
module_mutex. But who's holding it? We'd need a full task dump to see.

We do a stop_machine with the module mutex held, but not much else. This
doesn't seem like that. Do you have tracepoints, or dynamic debug enabled?

Puzzled,
Rusty.
--
Away travelling 25Feb-26Mar (6 .de + 1 .pl + 17 .lt + 2 .sg)

Américo Wang

unread,
Feb 22, 2010, 1:20:02 AM2/22/10
to
On Thu, Feb 18, 2010 at 4:22 AM, Alexey Dobriyan <adob...@gmail.com> wrote:
> I was stress-testing netns stuff in presence of removing modules.
>
> Kernel is 2.6.33-rc8-next-20100217+:
>

I don't know if this kernel has Andi's patchset, which is:

http://lkml.org/lkml/2010/1/4/506

If yes, then please try Neil's patch:

http://lkml.org/lkml/2010/2/12/181

Thanks.

Neil Horman

unread,
Feb 22, 2010, 7:10:01 AM2/22/10
to
On Mon, Feb 22, 2010 at 02:19:28PM +0800, Am�rico Wang wrote:
> On Thu, Feb 18, 2010 at 4:22 AM, Alexey Dobriyan <adob...@gmail.com> wrote:
> > I was stress-testing netns stuff in presence of removing modules.
> >
> > Kernel is 2.6.33-rc8-next-20100217+:
> >
>
> I don't know if this kernel has Andi's patchset, which is:
>
> http://lkml.org/lkml/2010/1/4/506
>
> If yes, then please try Neil's patch:
>
> http://lkml.org/lkml/2010/2/12/181
>
> Thanks.
No, both Andi's and my usermodehelper changes aren't going in until .34 IIRC,
but that shouldn't matter, they weren't meant to fix hangs (not that they dont
:) ).

Looking at this, my guess is that your hang is in userspace, and that you have
an old version of module-utils. I fixed this with iptables awhile back. What
happens is that process 3737, the modprobe is removing modules, while the setkey
utility is requesting kernel services that require those modules, so you are in
fact trying to load and unload the same module(s) in parallel. While thats not
a huge deal, it results in the setkey utility forking another copy of modprobe
(using callusermode_helper). The two copies of modprobe actually take file
locks on the module they are workign with, serializing them in user space. If
the setkey path and the modprobe path serialize at all in kernel space (by
taking the module_mutex for instance), you get deadlock.

I fixed this by making modprobe be non-blocking in user space (which is the
default for insmod as well), and it fixed the problem. The gory details are
here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=16fcec35e7d7c4faaa4709f6434a4a25b06d25e3

My guess is that if you get the latest module-utils you can make this go away.

Neil

Américo Wang

unread,
Feb 23, 2010, 2:00:02 AM2/23/10
to
On Mon, Feb 22, 2010 at 8:04 PM, Neil Horman <nho...@tuxdriver.com> wrote:

I see, excellent! Thank you!

0 new messages