Has the SMP mechanism been widely tested and proven stable yet?

tugouxp tugouxp

Dec 25, 2019, 1:55:58 AM
to NuttX
Has the SMP mechanism been widely tested and proven stable yet? I enabled SMP mode on a Cortex-A7 platform and hit crashes that may be rooted in issues with the SMP implementation.
For example:
1. According to the ARM spec, SGI interrupts cannot be masked on the Cortex-A7 architecture, which would break critical sections on another CPU, but I did not see any protection against this.
2. About the tick handler: only CPU0 has a tick handler, and the global tick is maintained by CPU0. The other CPUs are never notified by the tick handler, so their RR scheduling cannot work, and preemption from the tick interrupt cannot work.
3. ...

Xiang Xiao

Dec 25, 2019, 4:33:20 AM
to NuttX


On Wednesday, December 25, 2019 at 2:55:58 PM UTC+8, tugouxp tugouxp wrote:
> Has the SMP mechanism been widely tested and proven stable yet? I enabled SMP mode on a Cortex-A7 platform and hit crashes that may be rooted in issues with the SMP implementation.
> For example:
> 1. According to the ARM spec, SGI interrupts cannot be masked on the Cortex-A7 architecture, which would break critical sections on another CPU, but I did not see any protection against this.

Yes, that is why up_cpu_pausereq exists: it handles the case where SGIs are disabled. The arch code needs to correctly report that an SGI request is pending, so that the scheduler can handle the request in enter_critical_section.
 
> 2. About the tick handler: only CPU0 has a tick handler, and the global tick is maintained by CPU0. The other CPUs are never notified by the tick handler, so their RR scheduling cannot work, and preemption from the tick interrupt cannot work.

Yes, the NuttX scheduler runs only on the main CPU, which is different from Linux. But in this timer interrupt handler, the scheduler does the scheduling for all CPUs.
You can see that nxsched_process_scheduler calls nxsched_cpu_scheduler for each CPU, so RR works well for the other CPUs too.
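
The idea of one tick handler servicing every CPU can be pictured with a toy model. This is only a sketch, not the NuttX source: the struct, the function names, the core count, and the time slice below are all hypothetical, and the real nxsched_process_scheduler / nxsched_cpu_scheduler do considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS          2   /* Hypothetical core count for this sketch */
#define RR_SLICE_TICKS 3   /* Hypothetical round-robin time slice */

/* Toy stand-in for the task currently running on one CPU. */

struct cpu_state
{
  int  task_id;       /* ID of the running task */
  int  ticks_left;    /* Remaining ticks in its time slice */
  bool need_resched;  /* Set when the time slice expires */
};

/* Charge one tick to the task running on a single CPU. */

static void cpu_scheduler_tick(struct cpu_state *cpu)
{
  assert(cpu != NULL);

  if (cpu->ticks_left > 0 && --cpu->ticks_left == 0)
    {
      cpu->need_resched = true;            /* Slice used up */
      cpu->ticks_left   = RR_SLICE_TICKS;  /* Reload for the next task */
    }
}

/* Runs on whichever CPU takes the timer interrupt, but walks every CPU,
 * so tasks on the other cores are time-sliced as well.
 */

static void process_scheduler_tick(struct cpu_state cpus[], int ncpus)
{
  for (int i = 0; i < ncpus; i++)
    {
      cpu_scheduler_tick(&cpus[i]);
    }
}
```

Calling process_scheduler_tick() once per timer interrupt ages the time slice of every core, even though only one core receives the interrupt.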
 
> 3. ...

tugouxp tugouxp

Dec 25, 2019, 7:11:15 AM
to nu...@googlegroups.com
I see.
Thank you!

--
You received this message because you are subscribed to the Google Groups "NuttX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nuttx+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nuttx/bce10b0f-7421-4a61-b5c1-30588042536a%40googlegroups.com.

Masayuki Ishikawa

Dec 25, 2019, 8:24:12 AM
to nu...@googlegroups.com
Hello, 

I have only tested NuttX SMP on the LC823450 (dual Cortex-M3) and Spresense (6-core Cortex-M4F).
In my experience, NuttX SMP is stable for basic use cases when running on two cores.

However, I know that the NuttX SMP implementation for Cortex-A has a problem with IPIs (inter-processor interrupts).
Please see the TODO file in the nuttx repository.

===
  Title:       CORTEX-A GIC SGI INTERRUPT MASKING                                                                            
  Description: In the ARMv7-A GICv2 architecture, the inter-processor                                                        
               interrupts (SGIs) are non maskable and will occur even if                                                    
               interrupts are disabled.  This adds a lot of complexity                                                      
               to the ARMV7-A critical section design.                                                                      
                                                                                                                             
               Masayuki Ishikawa has suggested the use of the GICv2 ICCMPR                                                  
               register to control SGI interrupts.  This register (much like                                                
               the ARMv7-M BASEPRI register) can be used to mask interrupts                                                  
               by interrupt priority.  Since SGIs may be assigned priorities                                                
               the ICCMPR should be able to block execution of SGIs as well.     
===
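
The priority-mask idea in the TODO entry can be illustrated with a small model. This is a sketch under assumptions, not GIC driver code: it only models the rule that an interrupt is signaled when its priority value is numerically lower than the mask value (lower number means higher priority). The priority values and macro names are hypothetical, and the register that the TODO calls ICCMPR appears, I believe, as ICCPMR or GICC_PMR in the ARM documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical priority values, chosen only for this sketch.
 * In the GIC, a LOWER number means a HIGHER priority.
 */

#define PRIO_SGI       0x80  /* Priority assigned to the SGIs */
#define PRIO_CRITICAL  0x40  /* Some interrupt that must stay enabled */
#define PMR_OPEN       0xff  /* Mask value that lets both through */
#define PMR_BLOCK_SGI  0x80  /* Mask value that holds the SGIs back */

/* Model of the masking rule: an interrupt reaches the CPU only if its
 * priority value is strictly lower than the priority mask value.
 */

static bool gic_would_signal(uint8_t irq_priority, uint8_t priority_mask)
{
  return irq_priority < priority_mask;
}

/* Usage: raising the mask blocks the SGIs but not the critical IRQ. */

static void gic_mask_sketch_check(void)
{
  assert(gic_would_signal(PRIO_SGI, PMR_OPEN));
  assert(!gic_would_signal(PRIO_SGI, PMR_BLOCK_SGI));
  assert(gic_would_signal(PRIO_CRITICAL, PMR_BLOCK_SGI));
}
```

Whether this actually holds off SGIs on real Cortex-A hardware is exactly what the TODO entry leaves open; the model only shows why the suggestion is plausible.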

Thanks,
Masayuki

On Wed, Dec 25, 2019 at 3:56 PM tugouxp tugouxp <tug...@gmail.com> wrote:

Xiang Xiao

Dec 25, 2019, 8:43:18 AM
to NuttX
Masayuki's answer is right; I misunderstood the first question, sorry.

patacongo

Dec 25, 2019, 8:49:52 AM
to NuttX
I think I am the only person who has tested SMP on Cortex-A.  I cannot say that it is 100% stable, mostly because of the SGI issues that Masayuki Ishikawa mentions.  Otherwise, there is no basic functional difference between the Cortex-A and the Cortex-M3/4 and ESP32 SMP implementations (other than in the number of cores).

The original SMP implementation was done on a quad-core i.MX6.  But since this was only a development platform, it has not gotten the scrutiny that other platforms have.  It has not had the kind of testing that Ishikawa-san has done with the Cortex-M3/4.  My experience is that if a long time passes and I reverify on the i.MX6, there will be new hangs.  These are almost always associated with inter-CPU communications and, hence, most likely with the SGI issues.

With fine-grained spinlocks, deadlocks are a common symptom.  There is a good discussion of deadlocks and how to debug them in this thread:  https://groups.google.com/forum/#!topic/nuttx/KVcRTEK3BYw  But since you started that thread, I am sure that you are aware of it.


patacongo

Dec 25, 2019, 9:07:42 AM
to NuttX

> 1. According to the ARM spec, SGI interrupts cannot be masked on the Cortex-A7 architecture, which would break critical sections on another CPU, but I did not see any protection against this.

There is special, very complex logic to handle the non-maskable  interrupt case.  See, for example:
  • There is a loop around line 190 in sched/irq/irq_csection.c that will try repeatedly to get a spinlock in the
  • There is logic in arch/arm/src/armv7-a/arm_vectors.S to handle cases where interrupt handling is re-entered by a non-maskable SGI.
That is the kind of complexity that makes life more difficult on Cortex-A.  That could all go away if Ishikawa-san's suggestion to disable SGIs works.
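
The retry-loop idea mentioned above can be sketched as a single-threaded toy model. It is not the code in sched/irq/irq_csection.c: every name here is hypothetical, and the "other CPU" is simulated by a flag. The point is only that a CPU spinning for the lock must also service a pending pause request, or the two CPUs can wait on each other forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-threaded model; all names here are hypothetical. */

struct model
{
  bool lock_held;       /* Critical-section lock owned by the other CPU */
  bool pause_pending;   /* The other CPU has asked this CPU to pause */
  int  pauses_handled;  /* Pause requests serviced by this CPU */
};

/* Try once to take the lock; returns true on success. */

static bool try_lock(struct model *m)
{
  if (m->lock_held)
    {
      return false;
    }

  m->lock_held = true;
  return true;
}

/* Service the pause request.  In this model the other CPU finishes its
 * work while we are paused and releases the lock before resuming us.
 */

static void handle_pause(struct model *m)
{
  m->pause_pending = false;
  m->pauses_handled++;
  m->lock_held = false;
}

/* Spin for the lock, servicing pause requests between attempts.
 * Returns false if the lock could not be taken within max_spins tries.
 */

static bool enter_critical_sketch(struct model *m, int max_spins)
{
  assert(m != NULL);

  for (int i = 0; i < max_spins; i++)
    {
      if (try_lock(m))
        {
          return true;
        }

      if (m->pause_pending)
        {
          handle_pause(m);
        }
    }

  return false;
}
```

In the model, a spinner that ignores pause_pending never gets the lock, which mirrors the kind of inter-CPU hang described in this thread.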

> 2. About the tick handler: only CPU0 has a tick handler, and the global tick is maintained by CPU0. The other CPUs are never notified by the tick handler, so their RR scheduling cannot work, and preemption from the tick interrupt cannot work.

No, that is not true.  The timer interrupt may run on only one CPU, but the scheduler will run (on that CPU) and then handle the timer interrupt for all CPUs.  See for example the use of

int  sched_cpu_select(cpu_set_t affinity);

in sched/sched/*.  That is the logic that picks which CPU a new task will run on.  It executes on any CPU, and related logic can start or stop tasks running on other CPUs.
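
A CPU selector driven by an affinity mask might look roughly like the sketch below. This is an assumption-laden toy, not the NuttX implementation: the mask type, the function name, the core count, and the policy (pick the allowed CPU whose running task has the lowest priority) are all illustrative choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4   /* Hypothetical core count for this sketch */

/* Hypothetical stand-in for cpu_set_t: bit n set means CPU n is allowed. */

typedef uint32_t cpu_mask_t;

/* Return the allowed CPU whose running task has the lowest priority
 * (an idle CPU is modeled as priority 0), or -1 if the mask is empty.
 */

static int select_cpu(cpu_mask_t affinity, const int running_prio[NCPUS])
{
  int best = -1;

  assert(running_prio != NULL);

  for (int cpu = 0; cpu < NCPUS; cpu++)
    {
      if ((affinity & ((cpu_mask_t)1 << cpu)) == 0)
        {
          continue;   /* The task may not run on this CPU */
        }

      if (best < 0 || running_prio[cpu] < running_prio[best])
        {
          best = cpu;
        }
    }

  return best;
}
```

Note that the selector can run on any core: it only reads per-CPU state and returns an index, and starting the task on that core is a separate step.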

Greg

patacongo

Dec 25, 2019, 9:09:44 AM
to NuttX
I am very interested in getting SMP stabilized on Cortex-A and would be happy to help you debug any issues.  What hardware are you using?  Is it internal proprietary hardware, or something commercially available that we could work on together?

tugouxp tugouxp

Dec 25, 2019, 8:09:34 PM
to nu...@googlegroups.com
@Xiang Xiao
"up_cpu_pausereq" is used to check whether there is a pending SGI2 interrupt that needs to be dealt with during the try-lock operation.
But what happens if an SGI is received after "enter_critical_section" has succeeded and the critical section has been set up?
Would the SGI interrupt break the critical section and cause race conditions?

tugouxp tugouxp

Dec 25, 2019, 8:37:19 PM
to nu...@googlegroups.com
Thanks for your kind help; I appreciate your offer.

I am working on a dual-core Cortex-A7 (the public ARM version) with a GICv2 controller and 64 MB of DRAM. I am now seeing some stability issues that are very likely related to the atomic and SGI operations.
I need to investigate more deeply before I can offer more information.

On Wed, Dec 25, 2019 at 10:09 PM patacongo <spud...@gmail.com> wrote:
> I am very interested in getting SMP stabilized on Cortex-A and would be happy to help you debug any issues.  What hardware are you using?  Is it internal proprietary hardware, or something commercially available that we could work on together?


Gregory Nutt

Dec 25, 2019, 8:42:36 PM
to nu...@googlegroups.com
> Thanks for your kind help; I appreciate your offer.
>
> I am working on a dual-core Cortex-A7 (the public ARM version) with a
> GICv2 controller and 64 MB of DRAM. I am now seeing some stability
> issues that are very likely related to the atomic and SGI operations.
> I need to investigate more deeply before I can offer more information.
>
Also look at boards/arm/imx6/sabre-6quad/README.txt.  There is more
complete status and some good debug tips there.


tugouxp tugouxp

Dec 26, 2019, 4:22:56 AM
to NuttX
Excuse me, which Cortex-M3/4 platforms have been fully tested in SMP mode?
Could you share some information about those boards?

Masayuki Ishikawa

Dec 26, 2019, 6:42:12 AM
to nu...@googlegroups.com
Hello,

> Excuse me, which Cortex-M3/4 platforms have been fully tested in SMP mode?

As I wrote before, the processors that I used to test NuttX SMP were the LC823450 and Spresense (cxd5602).

As for the Sony Spresense, I recommend that you buy both the Spresense main board and the Spresense extension board.
To run the NuttX SMP kernel on Spresense, you need to use the latest bootloader, which is available from the Spresense SDK site.

Thanks,
Masayuki

On Thu, Dec 26, 2019 at 6:22 PM tugouxp tugouxp <tug...@gmail.com> wrote: