spin locks debugging

Mark

Dec 17, 2009, 12:02:01 AM

Hello

I have an old product to maintain, based on ucLinux-2.4.20 on an MMU-less
ARM processor. Periodically the system just reboots without any diagnostic
messages. I suspect spinlocks as one of the sources of the problem, so
I've enabled DEBUG_SPINLOCKS=2 in include/linux/spinlock.h.

Now the kernel reboots at boot-up time; investigation has shown that it
happens somewhere in 'rest_init()' in init/main.c. The code of the function
is as follows:

static void rest_init(void)
{
    kernel_thread(init, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
    unlock_kernel();
    current->need_resched = 1;
    cpu_idle();
}

The first three statements are executed, and the system reboots either in
'cpu_idle' or after it. Am I doing something wrong? Isn't this a suitable
mechanism for debugging?

PS. Setting DEBUG_SPINLOCKS=1 works fine, but doesn't provide full debugging
capabilities.

--
Mark

Rainer Weikusat

Dec 17, 2009, 4:50:13 AM

"Mark" <mark_cruz...@hotmail.com> writes:
> I have an old product to maintain, based on ucLinux-2.4.20, MMU-less
> ARM processor. Periodically the system just reboots without any
> diagnostic messages. I suspect spinlocks as one of sources of the
> problem, therefore I've enabled DEBUG_SPINLOCKS=2 in
> include/linux/spinlock.h.

Unless your kernel is compiled for multi-processor support, no
spinlock-related code will be in it.

> Now the kernel reboots at boot-up time, investigations have shown it
> happens somewhere in 'rest_init()' in init/main.c. The code of the
> function as follows:
>
> static void rest_init(void)
> {
> kernel_thread(init, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
> unlock_kernel();
> current->need_resched = 1;
> cpu_idle();
> }
>
> First three statements are invoked, and either in 'cpu_idle' or after
> the system reboots.

Nothing is ever executed after cpu_idle. This is the so-called 'idle
thread', which is supposed to do nothing in a loop (literally) when no
other kernel-scheduled entity (thread, kernel_thread, 'process') is
runnable. It is not entirely inconceivable that your hardware has a
problem with the architecture-specific idling code executed from
cpu_idle. Otherwise, the problem is possibly external. The usual cause
(in my experience) of spontaneous reboots is an unstable power
supply: if power is gone for a short time, the system will
boot afterwards.

Mark

Dec 17, 2009, 7:55:07 PM

"Rainer Weikusat" <rwei...@mssgmbh.com> wrote in message
news:87pr6e7...@fever.mssgmbh.com...

> Unless your kernel is compiled for multi-processor support, no
> spinlock-related code will be in it.

So it is pointless to use 'spinlocks' on a uniprocessor system?

> It is not entirely unconceivable that your hardware has a
> problem with the architecture-specific idling-code executed from
> cpu_idle.

void cpu_idle(void)
{
    ...
    while (1) {
        void (*idle)(void) = pm_idle;
        if (!idle)
            idle = arch_idle;
        ...
    }

I have searched through the entire kernel tree in 'arch/armnommu' and have
not found any platform-specific 'arch_idle' definitions, even for platforms
already ported and in the main tree.

> Otherwise, the problem is possibly external. The usual cause
> (according to my experience) of spontaneous reboots are unstable power
> supplies: If power is gone for a small amount of time, the system will
> boot afterwards.

The problem is that the board reboots only when one specific protocol is
activated (it's a network device), which is why I suspect the code rather
than the hardware.

--
Mark

Mark

Dec 17, 2009, 8:16:18 PM

"Mark" <mark_cruz...@hotmail.com> wrote in message
news:hgejru$p3$1...@aioe.org...

> I have searched through the entire kernel tree in 'arch/armnommu' and have
> not found any platform-specific 'arch_idle' definitions, even for platforms
> already ported and in the main tree.

Sorry, I was careless. The function is hidden in
'include/asm-armnommu/arch-myarch/system.h':

static inline void arch_idle(void)
{
    /* spin until a reschedule is requested or hlt_counter becomes nonzero */
    while (!current->need_resched && !hlt_counter)
        ;
}

--
Mark

Mark

Dec 17, 2009, 9:31:17 PM

"Rainer Weikusat" <rwei...@mssgmbh.com> wrote in message
news:87pr6e7...@fever.mssgmbh.com...

> Unless your kernel is compiled for multi-processor support, no
> spinlock-related code will be in it.

Macro 'spin_lock_irqsave(lock, flags)' gets down to:

do {
    /* local_irq_save(flags), which expands to: */
    ({
        unsigned long temp;
        /* save CPSR in flags, then set bit 7 (the I bit) to mask IRQs */
        __asm__ __volatile__(
        "mrs %0, cpsr @ save_flags_cli\n"
        " orr %1, %0, #128\n"
        " msr cpsr_c, %1"
        : "=r" (flags), "=r" (temp)
        :
        : "memory");
    });

    (void) (lock);
} while (0)

while spin_unlock_irqrestore() simply restores the flags.

So whenever I use spin locks on a uniprocessor system, they only
disable and re-enable interrupts.
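
To make that observation concrete, here is a minimal user-space sketch,
assuming a single CPU and modelling the interrupt mask as a plain boolean.
Every model_* name and the irqs_masked variable are hypothetical stand-ins,
not kernel API; the real macros manipulate the CPSR as in the expansion
quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPSR I bit: true means IRQs are masked. */
static bool irqs_masked = false;

/* On a uniprocessor build the lock carries no state worth spinning on. */
typedef struct { int unused; } model_spinlock_t;

/* Save the previous mask state and mask interrupts. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_masked;
	irqs_masked = true;
	return flags;
}

/* Put the mask back to whatever it was before the matching save. */
static void model_irq_restore(unsigned long flags)
{
	assert(irqs_masked);	/* a restore must follow a save */
	irqs_masked = (flags != 0);
}

/* The lock argument is evaluated and discarded, as in the expansion. */
#define model_spin_lock_irqsave(lock, flags) \
	do { (flags) = model_irq_save(); (void)(lock); } while (0)

#define model_spin_unlock_irqrestore(lock, flags) \
	do { model_irq_restore(flags); (void)(lock); } while (0)

/* Returns the mask state observed inside the critical section. */
static bool model_critical_section(model_spinlock_t *lock)
{
	unsigned long flags;
	bool inside;

	model_spin_lock_irqsave(lock, flags);
	inside = irqs_masked;
	model_spin_unlock_irqrestore(lock, flags);
	return inside;
}
```

Because the previous state is saved and restored rather than blindly
re-enabled, nested lock/unlock pairs leave interrupts masked until the
outermost unlock, which is the reason for the 'flags' argument.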

--
Mark

Kaz Kylheku

Dec 17, 2009, 9:39:14 PM

On 2009-12-17, Mark <mark_cruz...@hotmail.com> wrote:
> static void rest_init(void)
> {
> kernel_thread(init, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
> unlock_kernel();
> current->need_resched = 1;
> cpu_idle();
> }
>
> First three statements are invoked, and either in 'cpu_idle' or after the
> system reboots.

Yes, but note that rest_init is launching a thread, which calls
the init() function to complete the initialization. The system may be
dying in that thread. Put some printk calls into that function.
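
A rough user-space sketch of that kind of checkpoint tracing follows. It is
only an illustration: CHECKPOINT, last_checkpoint, traced_init, step_ok and
step_fail are all made-up names, and fprintf stands in for printk, which is
what the markers would call inside the kernel's init().

```c
#include <stdio.h>

/* Number of the most recent marker that was reached. */
static int last_checkpoint;

/* Record and print a numbered marker; in the kernel this would use printk. */
#define CHECKPOINT(n) \
	do { \
		last_checkpoint = (n); \
		fprintf(stderr, "init: checkpoint %d (%s:%d)\n", \
			(n), __func__, __LINE__); \
	} while (0)

/* Hypothetical stand-ins for the steps that init() performs. */
static int step_ok(void)   { return 0; }
static int step_fail(void) { return -1; }

/* Returns the last checkpoint reached before a step failed. */
static int traced_init(void)
{
	CHECKPOINT(1);
	if (step_ok() != 0)
		return last_checkpoint;
	CHECKPOINT(2);
	if (step_fail() != 0)
		return last_checkpoint;
	CHECKPOINT(3);	/* not reached in this sketch */
	return last_checkpoint;
}
```

The last marker that appears on the console brackets the failing step; the
markers can then be moved closer together to narrow it down.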

David Schwartz

Dec 18, 2009, 1:05:30 AM

On Dec 17, 4:55 pm, "Mark" <mark_cruzNOTFORS...@hotmail.com> wrote:

> So it is pointless to use 'spinlocks' on a uniprocessor system?

No, but they won't actually spin.

DS

Rainer Weikusat

Dec 18, 2009, 1:06:27 PM

"Mark" <mark_cruz...@hotmail.com> writes:

> "Rainer Weikusat" <rwei...@mssgmbh.com> wrote:
>> Unless your kernel is compiled for multi-processor support, no
>> spinlock-related code will be in it.
>
> Macro 'spin_lock_irqsave(lock, flags)' gets down to:
>
> do {
>     /* local_irq_save(flags), which expands to: */
>     ({
>         unsigned long temp;
>         __asm__ __volatile__(
>         "mrs %0, cpsr @ save_flags_cli\n"
>         " orr %1, %0, #128\n"
>         " msr cpsr_c, %1"
>         : "=r" (flags), "=r" (temp)
>         :
>         : "memory");
>     });
>
>     (void) (lock);
> } while (0)
>
> while spin_unlock_irqrestore() simply restores the flags.
>
> So it means whenever I use spin locks on a uniprocessor system, it
> only disables/enables interrupts.

Only for the *_irq* variants. The others just do nothing.

The purpose of a spin lock is to provide mutex semantics for code which
must not sleep (link itself onto a waitqueue and call the scheduler to
cause a different task to be scheduled), that is, code running in
so-called 'interrupt context', which is executed autonomously by the
kernel in response to external events (interrupts), as opposed to code
running in 'process context', which is executed by some process/thread
which has made a system call (and may sleep), and for code whose
execution must be serialized wrt other code running in interrupt
context.

On a uniprocessor, it is sufficient to disable interrupts to achieve
this mutual exclusion because this guarantees that no other kernel code
suddenly starts to be executed. On a multiprocessor, the possibility
that some other CPU/core/hyperthread/$whatever executes conflicting
kernel code exists. It would be possible to achieve mutual exclusion on
a multiprocessor by disabling interrupts 'globally', meaning for all
CPUs/... which exist in the system, but this is a really expensive
operation because it basically halts everything 'just in case', and
even 'just in an improbable case', because it is usually desirable that
lock contention is low; IOW, the chances that another $whatever will be
executing conflicting code should be slim except in pathological cases.

That's where the 'spin' part comes into play: it refers to a
busy-waiting loop which another processor will execute until the 'spin
lock' is released by the code which presently holds it. This still
affects all processors in the system because of the atomic memory
access operations necessary to implement the lock, and a spin lock
which more than one processor wants to acquire at the same time will
cause the corresponding cache line to bounce back and forth among the
processors which desire to own the lock (if there is only one processor
waiting for it, this processor can happily spin along for as long as
the cache line belongs to it exclusively). But at least this happens
only if there is actual contention for the lock.

For obvious reasons, an interrupt handler executing on the 'local' CPU
cannot 'spin' until the code it interrupted has released a spin lock.
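
The busy-waiting loop described above, and the reason a handler on the
local CPU must never spin on a lock held by the code it interrupted, can be
sketched in user space with C11 atomics. This is not the kernel's
implementation: all sketch_* names are hypothetical, and the 'held' field
is just a simple debug shadow of the lock state added for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_flag locked;	/* the actual lock word */
	bool held;		/* debug-only shadow of the lock state */
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT, false }

/* One acquisition attempt; returns true if the lock was obtained. */
static bool sketch_spin_trylock(sketch_spinlock_t *l)
{
	/* test-and-set returns the previous value: false means it was free */
	if (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		return false;
	l->held = true;
	return true;
}

/* The 'spin': retry the atomic operation until the holder releases. */
static void sketch_spin_lock(sketch_spinlock_t *l)
{
	while (!sketch_spin_trylock(l))
		;	/* busy-wait */
}

static void sketch_spin_unlock(sketch_spinlock_t *l)
{
	assert(l->held);	/* catch unlocking a lock that is not held */
	l->held = false;
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Models a handler arriving on the CPU that may already hold the lock.
 * If the lock is taken, calling sketch_spin_lock() here would never
 * return, because the interrupted holder cannot run to release it.
 */
static bool sketch_handler_would_spin_forever(sketch_spinlock_t *l)
{
	if (sketch_spin_trylock(l)) {
		sketch_spin_unlock(l);
		return false;
	}
	return true;
}
```

On a real multiprocessor the same loop terminates as soon as the other CPU
calls the unlock routine, which is why spinning is acceptable there and
fatal on the local CPU.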

Kaz Kylheku

Dec 19, 2009, 3:39:52 AM

On 2009-12-18, Rainer Weikusat <rwei...@mssgmbh.com> wrote:
> "Mark" <mark_cruz...@hotmail.com> writes:
>> "Rainer Weikusat" <rwei...@mssgmbh.com> wrote:
>> So it means whenever I use spin locks on a uniprocessor system, it
>> only disables/enables interrupts.
>
> Only for the *_irq*-variants. The others just do nothing.

That is the case on a non-preemptible kernel. But on a preemptible
kernel, non-irq spinlocks have to do something: namely disable
preemption.

> The purpose
> of a spin lock is to provide mutex-semantics for code which must not
> sleep (link itself onto a waitqueue and call the scheduler to cause a
> different task to be scheduled), that is, code running in so-called
> 'interrupt context'

This is completely wrong.

A spinlock is simply a fast mutual exclusion primitive. It is not
a primitive that is dedicated to interrupts (but, obviously, the
augmented interrupt spinlock extends spinlocks to interrupt context).

Processes that hold a spinlock must not sleep or be preempted for the
simple reason that this would cause the waiting processes (which are
spinning to acquire the lock) to /massively/ bleed CPU time, in a way
only rivaled by Windows operating systems.

This has little to do with the reasons why interrupt context can't
sleep. I.e. yes, interrupt context can hold a spinlock, and interrupt
context cannot sleep. But this is not where the rule comes from that a
process can't sleep while holding a spinlock. Sleeping is forbidden
even when holding a /non/-interrupt spinlock!

> which is executed autonomously by the kernel in
> response to external events (interrupts), as opposed to code running
> in 'process context' which is executed by some process/ thread which
> has made a system call (and may sleep)

Process context may not sleep when holding a spinlock (irq or regular).

> and for code whose execution
> must be serialized wrt other code running in interrupt context.

Non-irq spinlocks are used to efficiently serialize among
processors (in a non-preemptive SMP kernel).

The irq part extends the usefulness of spinlocks to interrupt context;
it basically combines two independent locks into one.

IRQ disabling provides exclusion between a processor and its interrupts,
but not against other processors. A spinlock provides exclusion against
other processors (efficiently, if combined with forbidden sleeping and
disabled preemption), but not against being interrupted. So: they are
combined together in the irq spinlock.

> kernel code exists. It would be possible to achieve mutual exclusion
> on a multiprocessor by disabling interrupts 'globally', meaning, for

No, it wouldn't. Disabling interrupts on other processors would
not stop them from running non-interrupt-context code which could
be a critical region. Doh?

The disabling interrupts model does not extend to processors, because
they are truly concurrent, and interrupts are not. You cannot model
the behavior of the other processors, running concurrently with this
one, as if they were interrupts.

> For obvious reasons, an interrupt handler executing on the 'local' CPU
> cannot 'spin' until the code it interrupted has released a spin lock.

The interrupt cannot spin, because it isn't happening. There is no
interrupt. If the local CPU holds an irq spin lock, then interrupts are
disabled, remember?

An interrupt can, of course, spin on the lock---if another CPU has it.

Rainer Weikusat

Dec 20, 2009, 3:42:49 PM

Kaz Kylheku <kkyl...@gmail.com> writes:
> On 2009-12-18, Rainer Weikusat <rwei...@mssgmbh.com> wrote:
>> "Mark" <mark_cruz...@hotmail.com> writes:
>>> "Rainer Weikusat" <rwei...@mssgmbh.com> wrote:
>>> So it means whenever I use spin locks on a uniprocessor system, it
>>> only disables/enables interrupts.
>>
>> Only for the *_irq*-variants. The others just do nothing.
>
> That is the case on a non-preemptible kernel. But on a preemptible
> kernel, non-irq spinlocks have to do something: namely disable
> preemption.

Strictly speaking, no. If kernel preemption was compiled into the
kernel (doesn't exist for 2.4), in addition to disabling interrupts,
preemption also needs to be disabled to ensure exclusive
execution. Since this is always necessary when using a spin lock, the
code to do so was added to the corresponding routines. But this
doesn't affect the OP since his kernel doesn't support preemption (at
least not if it wasn't included explicitly) and still doesn't mean
that actual 'spin lock operations' would be performed.

>> The purpose of a spin lock is to provide mutex-semantics for code
>> which must not sleep (link itself onto a waitqueue and call the
>> scheduler to cause a different task to be scheduled), that is, code
>> running in so-called 'interrupt context'
>
> This is completely wrong.
>
> A spinlock is simply a fast mutual exclusion primitive. It is not
> a primitive that is dedicated to interrupts (but, obviously, the
> augmented interrupt spinlock extends spinlocks to interrupt
> context).

I didn't write that it was dedicated to interrupts. I wrote that its
purpose would be to provide mutex semantics for code which must not
sleep and hence, cannot use a semaphore or ordinary mutex. This is
usually code running in interrupt context.

> Processes that hold a spinlock must not sleep or be preempted for the
> simple reason that this would cause the waiting processes (which are
> spinning to acquire the lock) to /massively/ bleed CPU time, in a way
> only rivaled by Windows operating systems.

That's a pretty obvious consequence of the busy-waiting.

[...]

>> which is executed autonomously by the kernel in
>> response to external events (interrupts), as opposed to code running
>> in 'process context' which is executed by some process/ thread which
>> has made a system call (and may sleep)
>
> Process context may not sleep when holding a spinlock (irq or
> regular).

I didn't write anything about this since I (see above) considered this
to be obvious. Code running in process context, such as different
processes using the same driver, is allowed to sleep and hence, can
and usually does, use locks where 'sleeping' is an option.

>> and for code whose execution must be serialized wrt other code
>> running in interrupt context.
>
> Non-irq spinlocks are used to efficiently serialize among
> processors (in a non-preemptive SMP kernel).
>
> The irq part extends the usefulness of spinlocks to interrupt context;
> it basically combines two independent locks into one.
>
> IRQ disabling provides exclusion between a processor and its interrupts,
> but not against other processors. A spinlock provides exclusion against
> other processors (efficiently, if combined with forbidden sleeping and
> disabled preemption), but not against being interrupted. So: they are
> combined together in the irq spinlock.

There is no such thing as 'an irq spinlock'. For convenience, spin
lock locking and unlocking calls exist which also disable interrupts
on the local CPU because this is necessary to 'lock out' code running
in interrupt context.

>> kernel code exists. It would be possible to achieve mutual exclusion
>> on a multiprocessor by disabling interrupts 'globally', meaning, for
>
> No, it wouldn't. Disabling interrupts on other processors would
> not stop them from running non-interrupt-context code which could
> be a critical region. Doh?

Indeed.

[...]

>> For obvious reasons, an interrupt handler executing on the 'local' CPU
>> cannot 'spin' until the code it interrupted has released a spin lock.
>
> The interrupt cannot spin, because it isn't happening. There is no
> interrupt. If the local CPU holds an irq spin lock, then interrupts are
> disabled, remember?

You really seem to suffer from some kind of strange 'inverted cause
and effect syndrome': As I wrote: Interrupts need to be disabled on
the local CPU despite using a spin lock because otherwise, an
interrupt handler could try to acquire the same spin lock and this
would take a loooong time.

Rainer Weikusat

Dec 20, 2009, 4:24:06 PM

Rainer Weikusat <rwei...@mssgmbh.com> writes:
> Kaz Kylheku <kkyl...@gmail.com> writes:

[...]

>> A spinlock is simply a fast mutual exclusion primitive. It is not
>> a primitive that is dedicated to interrupts (but, obviously, the
>> augmented interrupt spinlock extends spinlocks to interrupt
>> context).
>
> I didn't write that it was dedicated to interrupts. I wrote that its
> purpose would be to provide mutex semantics for code which must not
> sleep and hence, cannot use a semaphore or ordinary mutex. This is
> usually code running in interrupt context.

Before someone trips over this: Since interrupts must be disabled on
the local CPU if locking out other interrupt handlers locally is
necessary, this is, of course, meant to refer to 'mutual exclusion wrt
code running on other CPUs, be it interrupt handlers or anything
else'.

Rainer Weikusat

Dec 21, 2009, 7:03:02 AM

"Mark" <mark_cruz...@hotmail.com> writes:
> "Rainer Weikusat" <rwei...@mssgmbh.com> wrote in message

[...]

>> Otherwise, the problem is possibly external. The usual cause
>> (according to my experience) of spontaneous reboots are unstable power
>> supplies: If power is gone for a small amount of time, the system will
>> boot afterwards.
>
> The problem is that the board reboots only when some specific protocol
> is activated (it's a network device), therefore my suspect on the
> code, rather hardware.

If this results in a hitherto dormant piece of hardware becoming
active (e.g., a network interface), it is entirely possible that the
power consumption suddenly increases. I had a very similar situation
here with an 802.11 interface: whenever it was started, the board
would reboot (so I thought) after a while. Fortunately, the defective
power supply died completely a short time afterwards, and after
replacing it the problem disappeared.