Is SMP in NuttX implemented with a big kernel lock or with multi-level fine-grained locks?

tugouxp tugouxp

Dec 10, 2019, 8:13:32 PM
to NuttX
A big kernel lock is a direct and effective way to protect resources shared between CPUs, but it costs performance.
Multi-level fine-grained locking is more efficient, but it is harder to maintain and needs more skill and experience.
So which approach does NuttX currently use?

Thanks for your kind support!

Gregory Nutt

Dec 11, 2019, 8:35:48 AM
to nu...@googlegroups.com
I am not so sure that this terminology makes sense within NuttX.
Inter-CPU locks are implemented as simple spinlocks.  There are many,
many spinlocks so I think this makes them fine grained.

There is an internal OS interface to enter a critical section.  On a
single CPU, it just disables interrupts.  In the SMP case, it disables
local interrupts and takes a spinlock.
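
As a rough illustration of the two cases above, here is a minimal sketch in C.  It is not the NuttX implementation: the sketch_ names, the simulated interrupt flag, and the SKETCH_SMP switch are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_SMP 1  /* Assumption: set to 0 to model a single CPU */

/* Simulated per-CPU interrupt enable flag (hypothetical stand-in for
 * the real architecture-specific interrupt mask).
 */

static bool g_irq_enabled = true;

/* One simple test-and-set spinlock shared by all CPUs */

static atomic_bool g_cs_spinlock;

static bool sketch_irq_save(void)
{
  bool was_enabled = g_irq_enabled;
  g_irq_enabled = false;
  return was_enabled;
}

static void sketch_irq_restore(bool was_enabled)
{
  g_irq_enabled = was_enabled;
}

static bool sketch_enter_critical(void)
{
  bool flags = sketch_irq_save();  /* Both cases: mask local interrupts */

#if SKETCH_SMP
  /* SMP only: spin until this CPU owns the inter-CPU lock */

  while (atomic_exchange_explicit(&g_cs_spinlock, true,
                                  memory_order_acquire))
    {
    }
#endif

  return flags;
}

static void sketch_leave_critical(bool flags)
{
  assert(!g_irq_enabled);  /* Still inside the critical section */

#if SKETCH_SMP
  atomic_store_explicit(&g_cs_spinlock, false, memory_order_release);
#endif

  sketch_irq_restore(flags);
}
```

The caller keeps the value returned by sketch_enter_critical() and hands it back to sketch_leave_critical(), so the previous interrupt state is restored rather than unconditionally re-enabled.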

Each semaphore also has a spinlock.  In order to take the semaphore, you
need to hold a spinlock.  There are, of course, many many instances of
semaphores, hence, I would think they classify as "fine-grained".  All 
higher level locking is performed on top of semaphores so the
spin-locking comes with the semaphores.
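
The "one spinlock per semaphore" idea can be sketched as below.  This is only an assumption-laden illustration, not NuttX code: struct sketch_sem and its functions are hypothetical, and the sketch offers only a non-blocking trywait because blocking would need a scheduler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_sem
{
  atomic_bool lock;  /* Per-semaphore spinlock guarding 'count' */
  int count;         /* Number of available units */
};

static void sketch_sem_init(struct sketch_sem *s, int count)
{
  assert(count >= 0);
  atomic_init(&s->lock, false);
  s->count = count;
}

static void sketch_sem_spin_lock(struct sketch_sem *s)
{
  while (atomic_exchange_explicit(&s->lock, true, memory_order_acquire))
    {
    }
}

static void sketch_sem_spin_unlock(struct sketch_sem *s)
{
  atomic_store_explicit(&s->lock, false, memory_order_release);
}

/* Take one unit if available; never blocks in this sketch */

static bool sketch_sem_trywait(struct sketch_sem *s)
{
  bool taken = false;

  sketch_sem_spin_lock(s);
  if (s->count > 0)
    {
      s->count--;
      taken = true;
    }

  sketch_sem_spin_unlock(s);
  return taken;
}

static void sketch_sem_post(struct sketch_sem *s)
{
  sketch_sem_spin_lock(s);
  s->count++;
  sketch_sem_spin_unlock(s);
}
```

Because each semaphore carries its own lock, CPUs working on different semaphores never spin on each other.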

There are some miscellaneous uses of spinlocks to control some inter-CPU
communications too.

That is all of the use of inter-CPU locking that I can think of.


Gregory Nutt

Dec 11, 2019, 8:39:06 AM
to nu...@googlegroups.com
The only down-side of this is that programming logic errors can often
result in deadlock conditions.  There is quite a bit of additional logic
in the OS to monitor spinlock usage and to attempt to trace the events
leading up to the deadlock.



Johnny Billquist

Dec 11, 2019, 3:13:34 PM
to nu...@googlegroups.com, Gregory Nutt
On 2019-12-11 14:35, Gregory Nutt wrote:
>
>> A big kernel lock is a direct and effective way to protect resources
>> shared between CPUs, but it costs performance.
>> Multi-level fine-grained locking is more efficient, but it is harder to
>> maintain and needs more skill and experience.
>> So which approach does NuttX currently use?
>
> I am not so sure that this terminology makes sense within NuttX.
> Inter-CPU locks are implemented as simple spinlocks.  There are many,
> many spinlocks so I think this makes them fine grained.

Inter-cpu locks are pretty much always some kind of spin-lock. There are
not really that many other ways of doing it...

> There is an internal OS interface to enter a critical section.  On a
> single CPU, it just disables interrupts.  In the SMP case, it disables
> local interrupts and takes a spinlock.

I think the central question here then is if there are individual
spinlocks for different critical sections, or just one. The fact that
there is just a disabling of interrupts in the single CPU case might
suggest that there is just one spinlock, but it's not necessarily so...
And since each semaphore (below) seems to have its own spin-lock, that
would suggest that maybe the locking is fine-grained.

> Each semaphore also has a spinlock.  In order to take the semaphore, you
> need to hold a spinlock.  There are, of course, many many instances of
> semaphores, hence, I would think they classify as "fine-grained".  All
> higher level locking is performed on top of semaphores so the
> spin-locking comes with the semaphores.

Yes, that I would definitely then call fine-grained.
But the terminology is mostly just applied to critical sections in the
kernel, where various data structures are modified that all CPUs might
be interested in.

Johnny

--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: b...@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol

Alan Carvalho de Assis

Dec 11, 2019, 5:28:59 PM
to nu...@googlegroups.com, Gregory Nutt
A few months ago Mr. Dave Marples instrumented NuttX and, after some improvements, found that it had low jitter (a few microseconds), even under a heavy stress test.

I think we need to keep our eyes open to avoid code that could degrade NuttX real-time performance.

BR,

Alan

patacongo

Dec 11, 2019, 5:30:09 PM
to NuttX
The critical sections in NuttX have to use a "big lock" because they have some special properties:

1) They are nestable.  If an OS function creates a critical section and then calls a function that creates a critical section, which calls a function... etc., those all nest nicely and un-nest cleanly so that only the outermost function can leave the critical section.

2) They are automatically released when a task suspends.  It is common practice in the OS to take a critical section then suspend.  While suspended, the critical section is released, but it is restored when the task is resumed.

I think that requires a big lock.
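
To make the two properties concrete, here is a minimal sketch in C of a nestable big lock that is dropped on suspend and re-taken on resume.  It is an illustration under assumptions, not the NuttX code: struct sketch_task, the per-task nesting counter, and the sketch_ functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_task
{
  int nesting;  /* How many times this task has entered the section */
};

/* The single "big" lock behind every critical section */

static atomic_bool g_big_lock;

static void big_lock_take(void)
{
  while (atomic_exchange_explicit(&g_big_lock, true, memory_order_acquire))
    {
    }
}

static void big_lock_give(void)
{
  atomic_store_explicit(&g_big_lock, false, memory_order_release);
}

/* Property 1: only the outermost enter/leave touches the lock */

static void sketch_enter(struct sketch_task *t)
{
  if (t->nesting++ == 0)
    {
      big_lock_take();
    }
}

static void sketch_leave(struct sketch_task *t)
{
  assert(t->nesting > 0);
  if (--t->nesting == 0)
    {
      big_lock_give();
    }
}

/* Property 2: the lock is dropped while the task is suspended, but the
 * nesting count is kept so the lock can be re-taken on resume.
 */

static void sketch_suspend(struct sketch_task *t)
{
  if (t->nesting > 0)
    {
      big_lock_give();
    }
}

static void sketch_resume(struct sketch_task *t)
{
  if (t->nesting > 0)
    {
      big_lock_take();
    }
}
```

The nesting count belongs to the task, not to any one critical section, which is why a single shared lock is the natural fit here.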

Some mitigating things:

In NuttX critical sections should only be used to control access to internal OS resources.  The sections must be very brief or they can harm real time performance.  There are places I am sure where a critical section is used because people are lazy (including me) and don't create a proper semaphore/mutex lock.

But we do know, whichever is the case, that the critical sections are very brief.  The OS includes a critical section monitor that will monitor how long critical sections, spinlocks, and semaphores are held.  The data is made available via the procfs file system and there is an application that periodically samples that monitored data and prints the times that each lock is held.

It is not important how often the critical section is held, but only how long it is held.

With that application, we can pretty much assure that we will meet our real-time deadlines.
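
The kind of bookkeeping such a monitor does can be sketched as follows.  This is a hypothetical illustration only (struct hold_monitor and its functions are invented names, and time is passed in as plain ticks); it is not the actual critical section monitor.

```c
#include <assert.h>
#include <stdint.h>

struct hold_monitor
{
  uint64_t start;     /* Time the lock was last taken */
  uint64_t max_held;  /* Longest hold time seen so far */
  uint32_t count;     /* Number of times the lock was taken */
};

static void monitor_taken(struct hold_monitor *m, uint64_t now)
{
  m->start = now;
  m->count++;
}

static void monitor_released(struct hold_monitor *m, uint64_t now)
{
  uint64_t held;

  assert(now >= m->start);
  held = now - m->start;

  /* Only the worst case is kept; frequency does not change it */

  if (held > m->max_held)
    {
      m->max_held = held;
    }
}
```

A lock taken a hundred times for two ticks each reports max_held of 2, while a lock taken once for fifty ticks reports 50; the second one is the one that threatens a deadline.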

Gregory Nutt

Dec 11, 2019, 5:36:21 PM
to Alan Carvalho de Assis, nu...@googlegroups.com

> A few months ago Mr. Dave Marples instrumented NuttX and, after some
> improvements, found that it had low jitter (a few microseconds), even
> under a heavy stress test.

I did the instrumentation, but DaveM did the hard work of verifying
response stability.  He did find some cases where locks were being held
too long.  The critical section monitor is a very useful tool and anyone
doing time critical control systems should certainly check it out.

I forget what jitter he said he saw.  I thought it was less than that.  We
could search the group to find out.


Gregory Nutt

Dec 11, 2019, 5:39:32 PM
to Alan Carvalho de Assis, nu...@googlegroups.com

> I forget what jitter he said he saw.  I thought it was less than that.
> We could search the group to find out.

And the high priority, zero latency interrupts are also available when
needed.  They are a little more work to set up, but with those
interrupts you can get nanosecond-order-of-magnitude stability and they
are completely unaffected by locks and critical sections.

That is a longer story, but there is a wiki page if anyone is interested.


Alan Carvalho de Assis

Dec 11, 2019, 5:39:51 PM
to nu...@googlegroups.com
Greg, just for reference, to confirm whether what you put between quotes is a big lock or not:

https://en.m.wikipedia.org/wiki/Giant_lock

What do you think? Is it big-lock or fine-grained?

BR,

Alan

Gregory Nutt

Dec 11, 2019, 5:55:05 PM
to nu...@googlegroups.com

> Greg, just for reference, to confirm whether what you put between quotes
> is a big lock or not:
>
> https://en.m.wikipedia.org/wiki/Giant_lock
>
> What do you think? Is it big-lock or fine-grained?

I would say that overall NuttX is fine grained but it is a mixture
depending on what you are considering.  Semaphore locks are clearly
fine-grained since there is one lock per semaphore. But if you consider
the critical sections only, they would be big lock.

But NuttX does NOT have one big lock:  It uses one lock per instance of
any locking mechanism.  So I think that in general it is fine-grained.
But when this thread began, I said that I do not believe that the
concept maps well to the way that locking is managed in NuttX.  It does
not have one big lock, it has many locks of, let's say, varying "sizes".

"Size" in this case, is mostly a matter of  how many potential "clients"
of the lock there are.  Critical sections use a biggish lock because
critical sections are pervasive.  But the critical section lock is
probably not any "bigger" than a semaphore that also has pervasive usage
(like the semaphore that manages mutual access to the console device or
to the syslogging device).  All have basically the same effect for SMP
behavior.

And, again, the critical thing is NOT how often a resource is locked,
but rather for how long a resource is locked.  It is the duration of the
longest lock only that affects response time.



patacongo

Dec 11, 2019, 7:30:05 PM
to NuttX
I suspect that a much bigger effect on SMP performance will be due to how well you design your memory architecture.  If you do not have a data cache, then the memory access collisions will ruin your performance. Masayuki Ishikawa showed this effect in his NuttX2019 presentation: https://nuttx.events/wp-content/uploads/2019/11/MIshikawa_nx2019.pdf

And, if you do have a data cache, you will need some mechanism to maintain cache coherency between the CPUs.  So you are stuck between bad performance and complex cache designs.  It might be best to use a CPU whose caches are designed to support SMP (like Cortex-A).

It is easy to see the effect of the memory conflict in SMP mode, but it could also be a significant effect in AMP too if there is shared RAM with no memory subsystem optimizations to support independent access to different RAM banks.

Nathan Hartman

Dec 11, 2019, 7:57:44 PM
to NuttX
I am using the zero latency interrupts. They execute "outside" the OS and at a higher priority. Even when the OS disables "all" interrupts, the zero latency interrupt remains enabled. I can confirm that the jitter is on the order of nanoseconds -- how many nanoseconds I don't remember, but when I last worked on this, I verified jitter using an oscilloscope with the zero latency interrupt triggered by a GPIO pin and instrumented to toggle a separate testpoint pin. The oscilloscope was set up to trigger on the 1st pin and show the signals of both pins, so we could see latency and jitter. This test was performed while under quite heavy load with NSH, web server (with a web browser repeatedly reloading pages over the network), and other programs running on NuttX. It was rock solid, with the only jitter being due, I think, to variance in the time length of uninterruptible CPU instructions.

Greg, Dave Marples, David Sidrane, and I tracked down and fixed one mistake that broke zero latency interrupts. See the thread "Zero latency interrupt isn't. Until..." -- https://groups.google.com/forum/#!searchin/nuttx/zero$20latency$20interrupt%7Csort:date/nuttx/AY4gxGDfm0Y/B_7XVq5_BQAJ

To Alan's point, we definitely need to keep a close watch on any changes to critical parts of the system where performance could be degraded.

Nathan

patacongo

Dec 11, 2019, 7:58:35 PM
to NuttX

>> I am not so sure that this terminology makes sense within NuttX.
>> Inter-CPU locks are implemented as simple spinlocks.  There are many,
>> many spinlocks so I think this makes them fine grained.
>
> Inter-cpu locks are pretty much always some kind of spin-lock. There are
> not really that many other ways of doing it...


Yes, but they are not all "simple spinlocks".  There are "fair" spinlocks that are more complex, like "Ticket Spinlocks."

I suppose I should consider using Ticket Spinlocks.  That can be a project for another day.

Gregory Nutt

Dec 11, 2019, 8:24:16 PM
to nu...@googlegroups.com

> ... I can confirm that the jitter is on the order of nanoseconds --
> how many nanoseconds I don't remember, ...

There is normal jitter in interrupt execution due to the way interrupts
are implemented by ARM.  Interrupts can only be taken at instruction
boundaries and instructions execute with different times.  Therefore,
interrupt execution can be delayed by anything from zero up to the
execution time of the longest instruction.  This built-in hardware
jitter is on the order of a few nanoseconds (of course, depending on the
ARM system clock and upon keeping the pipeline full and not stalled).
The built-in hardware jitter is probably enough to account for all of the
response jitter that you see with the high priority, zero latency
interrupts.

That is too much to type.  I should call them HPZL interrupts. The
Nucleus OS has a similar concept and I much prefer their terminology. 
They call them managed interrupts vs. raw interrupts.  That is much clearer.


Gregory Nutt

Dec 11, 2019, 8:30:11 PM
to nu...@googlegroups.com

> Yes, but they are not all "simple spinlocks".  There are "fair"
> spinlocks that are more complex, like "Ticket Spinlocks."
>
> I suppose I should consider using Ticket Spinlocks.  That can be a
> project for another day.

Simple spinlocks can affect deterministic behavior.  You can imagine a
scenario where one CPU is delayed because other CPUs take and re-take
the spinlock without that one CPU ever getting it.  That adds some
randomness to the SMP behavior that could be fixed with Ticket
spinlocks.  Ticket spinlocks force FIFO waiting for spinlocks and,
hence, are "fair".

Not a tough job.  Anyone up to the challenge?
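
For anyone picking this up, a ticket spinlock could look roughly like the sketch below, written with C11 atomics.  The struct and function names are hypothetical and this is only a generic illustration of the technique, not a proposed NuttX patch.

```c
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock
{
  atomic_uint next;   /* Next ticket number to hand out */
  atomic_uint owner;  /* Ticket number currently allowed in */
};

static void ticket_lock_init(struct ticket_lock *l)
{
  atomic_init(&l->next, 0);
  atomic_init(&l->owner, 0);
}

/* Returns the ticket that was drawn, purely for illustration */

static unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
  /* Draw a ticket; arrival order is fixed at this point */

  unsigned int mine = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

  /* Spin until our number is called */

  while (atomic_load_explicit(&l->owner, memory_order_acquire) != mine)
    {
    }

  return mine;
}

static void ticket_lock_release(struct ticket_lock *l)
{
  /* The lock must be held: at least one ticket is outstanding */

  assert(atomic_load(&l->owner) != atomic_load(&l->next));

  /* Call the next number, handing the lock to the oldest waiter */

  atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

Waiters are served strictly in the order they drew tickets, so no CPU can be passed over repeatedly the way it can with a plain test-and-set lock.  Unsigned wrap-around of the counters is harmless because only equality is compared.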



Gregory Nutt

Dec 12, 2019, 11:20:22 AM
to nu...@googlegroups.com

> I suspect that a much bigger effect on SMP performance will be due to
> how well you design your memory architecture.  If you do not have a
> data cache, then the memory access collisions will ruin your
> performance. Masayuki Ishikawa showed this effect in his NuttX2019
> presentation:
> https://nuttx.events/wp-content/uploads/2019/11/MIshikawa_nx2019.pdf
>
> And, if you do have a data cache, you will need some mechanism to
> maintain cache coherency between the CPUs.  So you are stuck between
> bad performance and complex cache designs.  It might be best to use a
> CPU whose caches are designed to support SMP (like Cortex-A).

Another cool option would be the ARMv7-R which supports the MPCore
options as well:  See
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0458c/BGBIIFJJ.html
and especially the SCU discussion under "Multiprocessing."

AFAIK ARMv7-M does not support the MPCore option.


tugouxp tugouxp

Jan 13, 2020, 4:48:49 AM
to nu...@googlegroups.com
Could you show me how to use the zero latency interrupts in NuttX?

Maybe it depends on how the interrupt is registered and on the CPU architecture? Can you show me how?

Thank you!

Alan Carvalho de Assis

Jan 13, 2020, 5:10:21 AM
to nu...@googlegroups.com
The reference page about it is here:

http://www.nuttx.org/doku.php?id=wiki:nxinternal:highperfints

There are some boards with highpri support that you can use as a reference:

boards/arm/stm32/nucleo-f334r8
boards/arm/stm32/viewtool-stm32f107

BR,

Alan