I wrote a small kernel module to print the kernel stack traces of two unkillable processes on freeshell. They look quite interesting, so let's analyze them together.
Kernel: 2.6.32-5-openvz-amd64 (in Debian package linux-image-2.6.32-5-openvz-amd64, 2.6.32-48squeeze4)
Pid 21299 on freeshell 378 on node 7:
User-mode process: /usr/sbin/apache2
[<ffffffff81051b55>] ? do_exit+0xdf/0x75b
[<ffffffff8104e7ee>] ? release_console_sem+0x192/0x1c4
[<ffffffff812ec5bc>] ? oops_end+0xaf/0xb4
[<ffffffff81032353>] ? no_context+0x1e9/0x1f8
[<ffffffff810d24dc>] ? free_pgtables+0x55/0xbd
[<ffffffff81032506>] ? __bad_area_nosemaphore+0x1a4/0x1c8
[<ffffffff810bf313>] ? release_pages+0x17b/0x18d
[<ffffffff8101172e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff812eba85>] ? page_fault+0x25/0x30
[<ffffffff810d24dc>] ? free_pgtables+0x55/0xbd
[<ffffffff812eb619>] ? _spin_lock_irqsave+0x1a/0x34
[<ffffffff8107365f>] ? uncharge_beancounter+0x1f/0x50
[<ffffffff810741d4>] ? ub_slab_uncharge+0x29/0x42
[<ffffffff810e9314>] ? kmem_cache_free+0xc3/0xd1
[<ffffffff8117d290>] ? prio_tree_remove+0xbd/0xc5
[<ffffffff810d24dc>] ? free_pgtables+0x55/0xbd
[<ffffffff810d3df2>] ? exit_mmap+0xf4/0x14d
[<ffffffff8104bbca>] ? mmput+0x2b/0xf4
[<ffffffff810502dc>] ? exit_mm+0x115/0x120
[<ffffffff81051c7c>] ? do_exit+0x206/0x75b
[<ffffffff81052247>] ? do_group_exit+0x76/0x9d
[<ffffffff81052280>] ? sys_exit_group+0x12/0x16
[<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
==========================================
Pid 8772 on freeshell 397 on node 5:
Kernel thread: apache2
Additional info: Process (e.g. ps aux) becomes uninterruptible while reading /proc/8772/cmdline
[<ffffffff8107fe51>] ? __module_text_address+0x9/0x56
[<ffffffff812eb527>] ? rwsem_down_failed_common+0x8c/0xa8
[<ffffffff812eb58a>] ? rwsem_down_read_failed+0x22/0x2b
[<ffffffff81182514>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff811a48b9>] ? vgacon_cursor+0x0/0x140
[<ffffffff812eaf3d>] ? down_read+0x17/0x19
[<ffffffff81088b46>] ? acct_collect+0x3e/0x16c
[<ffffffff81051c47>] ? do_exit+0x1d1/0x75b
[<ffffffff8104e7ee>] ? release_console_sem+0x192/0x1c4
[<ffffffff812ec5bc>] ? oops_end+0xaf/0xb4
[<ffffffff81032353>] ? no_context+0x1e9/0x1f8
[<ffffffff810d24dc>] ? free_pgtables+0x55/0xbd
[<ffffffff81032506>] ? __bad_area_nosemaphore+0x1a4/0x1c8
[<ffffffff8108036a>] ? search_module_extables+0x3c/0x66
[<ffffffff812eda9a>] ? do_page_fault+0x185/0x2fc
[<ffffffff812eba85>] ? page_fault+0x25/0x30
[<ffffffff810d24dc>] ? free_pgtables+0x55/0xbd
[<ffffffff812eb619>] ? _spin_lock_irqsave+0x1a/0x34
[<ffffffff8107365f>] ? uncharge_beancounter+0x1f/0x50
[<ffffffff810741d4>] ? ub_slab_uncharge+0x29/0x42
[<ffffffff810e9314>] ? kmem_cache_free+0xc3/0xd1
[<ffffffff8117d290>] ? prio_tree_remove+0xbd/0xc5
[<ffffffff810d24dc>] ? free_pgtables+0x55/0xbd
[<ffffffff810d3c30>] ? unmap_region+0xf1/0x132
[<ffffffff810d4dfe>] ? do_munmap+0x273/0x2e1
[<ffffffff810d4eac>] ? sys_munmap+0x40/0x59
[<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
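In addition, kill -9 still fails after setting the process state to TASK_INTERRUPTIBLE, TASK_RUNNING, or TASK_STOPPED. Changing the process state was the first thing I tried: the state did change, but the processes still could not be killed. The stack traces show why this was futile: both processes are stuck deep inside do_exit, the place where a process buries its own corpse, so they will never "float" back up to respond to signals.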
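The check described above (look for do_exit in the trace) can be sketched as a small script. This is only an illustration in Python, not the kernel module used to produce the dumps; the names FRAME_RE, parse_frames, and is_exiting are hypothetical, and the sample is a few lines copied from the second trace. Note that every frame here carries the "?" marker, which the kernel uses for addresses it found on the stack but could not verify, so a match is a strong hint rather than proof.

```python
import re

# Matches lines such as "[<ffffffff81051b55>] ? do_exit+0xdf/0x75b".
# Groups: address, optional "?" marker, symbol, offset, function size.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(trace):
    """Return (symbol, offset) pairs in the order printed, skipping other lines."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(3), int(match.group(4), 16)))
    return frames


def is_exiting(trace):
    """True if any frame in the trace belongs to do_exit."""
    return any(symbol == "do_exit" for symbol, _ in parse_frames(trace))


# A few lines copied from the trace of pid 8772 above.
SAMPLE = """\
[<ffffffff812eaf3d>] ? down_read+0x17/0x19
[<ffffffff81088b46>] ? acct_collect+0x3e/0x16c
[<ffffffff81051c47>] ? do_exit+0x1d1/0x75b
[<ffffffff810d4eac>] ? sys_munmap+0x40/0x59
"""

if __name__ == "__main__":
    for symbol, offset in parse_frames(SAMPLE):
        print(f"{symbol}+{offset:#x}")
    print("in do_exit:", is_exiting(SAMPLE))
```

Feeding either full trace to is_exiting would flag the process as already being in its exit path, which is the situation where further signals or state changes are not expected to help.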