On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>
>> Hi all,
>>
>> I use the tip/sched/core branch.
>>
>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>
>> * It boots normally
>> * It sends info to the console
>> * The graphics does not work
>> * The terminals show the prompt, I can enter the username but after
>> pressing enter, it does not give the password prompt
>> * sysrq works more or less, I can't get the process stack but it receives
>> the command
>>
>> It is like no new process can be created.
>>
>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>
>> After git bisecting, the following patch seems to introduce the bug.
>>
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.
>
> Would you happen to have a .config for me to try?

I was able to reduce the scope and reproduce the issue.

AFAICT, it happens with rsyslogd. When logging in on a tty, the login
command sends a message through /dev/log, but rsyslogd is never woken up
and stays blocked in poll_schedule_timeout. The login process is in turn
blocked in unix_wait_for_peer.

I can strace rsyslogd at startup. The last two sched_setscheduler calls
fail:

$ grep sched trace.out
3570 sched_getparam(3570, { 0 }) = 0
3570 sched_getscheduler(3570) = 0 (SCHED_OTHER)
3570 sched_get_priority_min(SCHED_OTHER) = 0
3570 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_get_priority_min(SCHED_OTHER) = 0
3571 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_get_priority_min(SCHED_OTHER) = 0
3571 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = 0
3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_min resumed> ) = 0
3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_max resumed> ) = 0
3571 sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)
3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_min resumed> ) = 0
3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_max resumed> ) = 0
3571 sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)

Here is the same strace on a kernel which does not hang. The calls to
sched_setscheduler do not fail:

3292 sched_getparam(3292, { 0 }) = 0
3292 sched_getscheduler(3292) = 0 (SCHED_OTHER)
3292 sched_get_priority_min(SCHED_OTHER) = 0
3292 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_get_priority_min(SCHED_OTHER) = 0
3293 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_get_priority_min(SCHED_OTHER) = 0
3293 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_min resumed> ) = 0
3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_max resumed> ) = 0
3293 sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_min resumed> ) = 0
3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_max resumed> ) = 0
3293 sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
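
To check the failing call outside of rsyslogd, a small helper along these lines could be used. It is only a sketch: the name try_set_sched_other() is made up for illustration, and I have not confirmed that calling it on the current thread is enough to trigger the EPERM seen above, where the call targets a different thread.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/*
 * Issue the same request as in the trace: SCHED_OTHER with a
 * sched_priority of 0. A pid of 0 means the calling thread.
 * Returns 0 on success, or the errno value on failure.
 * (Hypothetical helper, not taken from rsyslogd.)
 */
static int try_set_sched_other(pid_t pid)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = 0;

	if (sched_setscheduler(pid, SCHED_OTHER, &param) == -1)
		return errno;

	return 0;
}
```

Calling try_set_sched_other(0) from main() and printing strerror() of the result should show whether a given kernel refuses the request. Passing a thread id instead of 0 would be closer to what the trace shows.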

The EPERM error comes from kernel/sched/core.c:3303:

	...
	if (fair_policy(policy)) {
		if (!can_nice(p, attr->sched_nice))
			return -EPERM;
	}
	...

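
For reference, as far as I can tell can_nice() converts the nice value to the rlimit scale (20 - nice) and compares it against RLIMIT_NICE, unless the caller has CAP_SYS_NICE. The sketch below approximates that test from userspace. The function names are made up, the capability check is left out, and I have not verified which nice value actually ends up in attr->sched_nice on this path.

```c
#include <assert.h>
#include <sys/resource.h>

/* Convert a nice value [19,-20] to the rlimit style value [1,40]. */
static int nice_to_rlimit(int nice)
{
	return 20 - nice;
}

/*
 * Userspace approximation of the can_nice() test, ignoring
 * CAP_SYS_NICE: compare the converted nice value against the
 * soft RLIMIT_NICE of the calling process.
 * Returns 1 if allowed, 0 if not, -1 if getrlimit() fails.
 */
static int nice_allowed_by_rlimit(int nice)
{
	struct rlimit rl;

	assert(nice >= -20 && nice <= 19);

	if (getrlimit(RLIMIT_NICE, &rl) == -1)
		return -1;

	return (rlim_t)nice_to_rlimit(nice) <= rl.rlim_cur;
}
```

If the nice value passed in is 0, the converted value is 20, so a process without CAP_SYS_NICE and with an RLIMIT_NICE below 20 would not pass this test.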
But I don't know why this leads to a process being blocked, or to
rsyslogd not being woken up by a packet arriving on the af_unix socket.

I hope that helps.

-- Daniel