A smooth running ArduCopter on a RPI A+

361 visninger
Gå til det første ulæste opslag

msue...@googlemail.com

ulæst,
26. mar. 2016, 14.05.1926.03.2016
til drones-discuss

Hi developers,


Inspired by the BBBMini project I spend some time in the last months to get the ArduCopter running on a standard RPI A+ and would like to share my experiences now.


My hardware setup is as follow:

Raspberry PI A+ running a Linux PREEMPT_RT kernel, Sensors are the MPU9250\AK8963 and MS5611 connected to the SPI bus, GPS on serial /dev/ttyAPM.

After some source code configurations to compile the ArduCopter project for this setup, I realized many overruns in the 400 Hz main loop by getting performance messages like:


PERF: 57/4000 15373 685


Because of this (heavy) overruns I searched for the reasons and found two:


1. The SPI bus


SPI sensors will be read by ArduPilot's highest prio 1ms timer-thread. This thread then calls the standard Linux SPI driver synchronously. The Linux driver however use it's own kernel thread to perform SPI operations, but the priority of this thread is less than the priority of the 1ms timer thread.

My Conclusion: Using the Linux driver leads to a priority inversion which blocks the whole ArduCopter for occasionally long times. There were two options I thought about to improve the real-time capability:

  • Using the RPI SPI hardware directly

  • Using a software bit-banging driver

I focused on bit-banging:

Sensors like the MPU9250 can be clocked up to 20Mhz. Reachable with software bit-banging and hal.gpio methods to get\set gpio pins is between 1.5Mhz and 2.0Mhz on the RPI. With RPI specific gpio access macros, a clock rate between 6.5 Mhz and 7.5 Mhz could be reached. Sofware bit-banging is (as expected) slower than using SPI hardware, but keep in mine only few bytes have to be read each 1ms. First tests shows: bit-baning only with 1.5Mhz (to read the 14 bytes of the MPU9250's sensor) is faster (more than double) as using the Linux SPI driver and (more importend) with a predictable real-time behavior. This was confirmed by performance messages I got with bit-banging:


PERF: 15/4000 3298 1721


What do you think about an integration of a bit-banging SPI driver into ArduPilot?


2. The main Loop priority


The ArduCopter's main loop isn't the highest prior thread and can be interrupted (for example) by the 1ms timer thread and by backround Linux IRQ-threads which handles USB stuff. The main loop however contains a time critical part, which is waiting for the next loop time, reading AHRS values and executing the rate controllers. In detail:


void Copter::loop(){

     // wait for an INS sample

     ins.wait_for_sample();

     ...

     // IMU DCM Algorithm

     // --------------------

     read_AHRS();


     // run low level rate controllers that only require IMU data

     attitude_control.rate_controller_run();

   ...


I changed the code by increasing the priority temporary as follow:


void Copter::loop(){

 

   → set the priority to highest possible, go sleeping but return immediately when time is elapsed

     // wait for an INS sample

     ins.wait_for_sample();

     ...

     // IMU DCM Algorithm

     // --------------------

     read_AHRS();

     // run low level rate controllers that only require IMU data

     attitude_control.rate_controller_run();

   → set the priority back to default


With this changes only I got worst case performance messages (with video broadcasting in the backround) like this:


PERF: 0/4000 2791 2392


What do you think about temporary increasing the priority of the main-loop thread?


Markus

Andrew Tridgell

ulæst,
27. mar. 2016, 17.24.4827.03.2016
til drones-discuss
Hi Markus,

nice work!

> SPI sensors will be read by ArduPilot's highest prio 1ms timer-thread. This
> thread then calls the standard Linux SPI driver synchronously. The Linux
> driver however use it's own kernel thread to perform SPI operations, but
> the priority of this thread is less than the priority of the 1ms timer
> thread.
>
> My Conclusion: Using the Linux driver leads to a priority inversion which
> blocks the whole ArduCopter for occasionally long times.

hmm, while this is true, I'd expect the time spent in the kernel thread
for each SPI transfer to be short. For example, the mpu9250 is running
the bus at 20MHz and each transfer is 14 bytes, so we'd expect transfers
to be 6usec or so. I don't think that explains your results unless
something else is going on.

> Sensors like the MPU9250 can be clocked up to 20Mhz. Reachable with
> software bit-banging and hal.gpio methods to get\set gpio pins is between
> 1.5Mhz and 2.0Mhz on the RPI. With RPI specific gpio access macros, a clock
> rate between 6.5 Mhz and 7.5 Mhz could be reached. Sofware bit-banging is
> (as expected) slower than using SPI hardware, but keep in mine only few
> bytes have to be read each 1ms. First tests shows: bit-baning only with
> 1.5Mhz (to read the 14 bytes of the MPU9250's sensor) is faster (more than
> double) as using the Linux SPI driver and (more importend) with a
> predictable real-time behavior. This was confirmed by performance messages
> I got with bit-banging:
>
>
> PERF: 15/4000 *3298 *1721

the really interesting thing is the great reduction in the worst case
time.

> What do you think about an integration of a bit-banging SPI driver into
> ArduPilot?

I'd like Lucas to look at this as well, but I'm not opposed to having
bit-banging as an option. It would be good to understand why it helps so
much though. If your results are consistent then you've reduced the
worst case time by 12ms. That is a very long delay. I wonder where that
is coming from?

> 2. The main Loop priority
>
>
> The ArduCopter's main loop isn't the highest prior thread and can be
> interrupted (for example) by the 1ms timer thread and by backround Linux
> IRQ-threads which handles USB stuff. The main loop however contains a time
> critical part, which is waiting for the next loop time, reading AHRS values
> and executing the rate controllers. In detail:
> ....
> → *set the priority to highest possible, go sleeping but return
> immediately when time is elapsed *
> ...
> What do you think about temporary increasing the priority of the main-loop
> thread?

this should definately go in, but we should use the
hal.scheduler->delay_microseconds_boost() method. That method is already
called in the right places in AP_InertialSensor library, and is
implemented in HAL_PX4:

https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_HAL_PX4/Scheduler.cpp#L125

I added it to PX4 for exactly the reason you want it for HAL_Linux. We
just didn't implement it in AP_HAL_Linux yet.

Adding that to PX4 helped a lot and I think it would also help for
HAL_Linux, as you have found.

Thanks for working on this!

Cheers, Tridge

msue...@googlemail.com

ulæst,
28. mar. 2016, 09.00.3128.03.2016
til drones-discuss

Hi Tridge,


I can't explain the reasons in detail (I'am not a linux expert), but found a discussion that may be does https://www.raspberrypi.org/forums/viewtopic.php?f=44&t=19489&start=142.


In recent kernel version the SPI driver has obviously been improved, but when reading the whole discussion, I have some doubts that using this driver in real time applications like ArduPilot is the right way. I think the Linux SPI stack was designed with other use cases in mine.


>>I'd like Lucas to look at this as well, but I'm not opposed to having bit-banging as an option.


In this case I will polish my current bit-banging implementation and make a pull request within the next week.


>> this should definately go in, but we should use the hal.scheduler->delay_microseconds_boost() method.


I looked for this function and how it is implemented in the PX4 HAL. A little bit 'magic' for me is the constant “APM_MAIN_PRIORITY_BOOST_USEC”. How was it determined? Includes this time also the execution of the rate controllers on a PX4 with boost priority?


Bye for now, Markus

Lucas De Marchi

ulæst,
28. mar. 2016, 13.25.3028.03.2016
til drones-discuss
Hi Markus,

On Sat, Mar 26, 2016 at 2:16 PM, msuevern via drones-discuss
<drones-...@googlegroups.com> wrote:
> Hi developers,
>
>
> Inspired by the BBBMini project I spend some time in the last months to get
> the ArduCopter running on a standard RPI A+ and would like to share my
> experiences now.
>
>
> My hardware setup is as follow:
>
> Raspberry PI A+ running a Linux PREEMPT_RT kernel, Sensors are the

which version?

> MPU9250\AK8963 and MS5611 connected to the SPI bus, GPS on serial
> /dev/ttyAPM.
>
> After some source code configurations to compile the ArduCopter project for
> this setup, I realized many overruns in the 400 Hz main loop by getting
> performance messages like:
>
>
> PERF: 57/4000 15373 685

ough. That's a long delay. I'm asking which version of the kernel
above because there's probably a bug there that you are reproducing.
I've seen on some rpi-based hardware a huge workload on irq thread
when using PREEMPT_RT. Here the patch set is causing more harm than
it's helping with the latency.


> Because of this (heavy) overruns I searched for the reasons and found two:
>
>
> 1. The SPI bus
>
>
> SPI sensors will be read by ArduPilot's highest prio 1ms timer-thread. This
> thread then calls the standard Linux SPI driver synchronously. The Linux
> driver however use it's own kernel thread to perform SPI operations, but the
> priority of this thread is less than the priority of the 1ms timer thread.

It shouldn't use a separate kernel thread. At least not in upstream
kernel. It should just configure the spi controller and be done with
that. In any case, Linux has built-in support for priority inversion,
so even if the RPI-based kernel used a kernel thread, an inversion in
the priority should cause the low priority thread to be boosted
automatically.

Besides that, kernel thread usually run within the 50-99 rt
priorities. And our threads are in the 1-15 range, which means our
thread will never have higher priority than the kernel threads.


What does the following command return to you?

ps -eL -o tid,comm,class,rtprio| grep -ve "-$"

> My Conclusion: Using the Linux driver leads to a priority inversion which
> blocks the whole ArduCopter for occasionally long times. There were two

As mentioned above this should really not occur. I think what you are
seeing is not due to PI. In order to check a simple thing to do would
be to change the priority of the kernel thread. This can be as simple
as using the chrt utility. See this example for changing the priority
of the kernel thread for iwlwifi interrupts:
http://showterm.io/9ab8e42c03389e72c2c2c

> options I thought about to improve the real-time capability:
>
> Using the RPI SPI hardware directly
>
> Using a software bit-banging driver
>
> I focused on bit-banging:
>
> Sensors like the MPU9250 can be clocked up to 20Mhz. Reachable with software
> bit-banging and hal.gpio methods to get\set gpio pins is between 1.5Mhz and
> 2.0Mhz on the RPI. With RPI specific gpio access macros, a clock rate
> between 6.5 Mhz and 7.5 Mhz could be reached. Sofware bit-banging is (as
> expected) slower than using SPI hardware, but keep in mine only few bytes
> have to be read each 1ms. First tests shows: bit-baning only with 1.5Mhz (to
> read the 14 bytes of the MPU9250's sensor) is faster (more than double) as
> using the Linux SPI driver and (more importend) with a predictable real-time

Bit-banging has its own problems. It can't be more predictable
particularly because you will have failed transactions that need to be
retried, mostly due to rescheduling of the bit-banging thread. I don't
think this is would really be a viable approach - it brings a lot of
headache.


> behavior. This was confirmed by performance messages I got with bit-banging:
>
>
> PERF: 15/4000 3298 1721
>
>
> What do you think about an integration of a bit-banging SPI driver into
> ArduPilot?

The minlure board is actually using bit-bang for the second I2C. It
brings lots of headache for us. I really advise against this solution.
If done at all, it should be abstracted by the kernel, and not done as
a userspace thread. See
http://lxr.free-electrons.com/source/drivers/spi/spi-gpio.c

It should be mostly a matter of declaring the spi pins and let this
driver handle the bitbang. From ardupilot's pov, what will happen is
that a new /dev/spidevX.Y will appear.

> 2. The main Loop priority
>
>
> The ArduCopter's main loop isn't the highest prior thread and can be
> interrupted (for example) by the 1ms timer thread and by backround Linux
> IRQ-threads which handles USB stuff. The main loop however contains a time

IRQ threads should be of higher priority, but you are right wrt the
timer thread.

> I changed the code by increasing the priority temporary as follow:
>
>
> void Copter::loop(){
>
>
>
> → set the priority to highest possible, go sleeping but return
> immediately when time is elapsed
>
> // wait for an INS sample
>
> ins.wait_for_sample();
>
> ...
>
> // IMU DCM Algorithm
>
> // --------------------
>
> read_AHRS();
>
> // run low level rate controllers that only require IMU data
>
> attitude_control.rate_controller_run();
>
> → set the priority back to default
>
>
> With this changes only I got worst case performance messages (with video
> broadcasting in the backround) like this:

I'd rather say we don't need all of this. The main thread should be
the one with highest priority. Even if the Linux kernel didn't have
the priority inversion feature, we should be running the main thread
priority with the highest priority. Reason is that this thread is a
consumer of the timer thread and will schedule out as soon as it
doesn't have more data to execute. If it has new data, we want it to
run. Anyway, some relevant articles on the matter for those wanting to
read more about the priority inversion/inheritance:

https://lwn.net/Articles/360699/
https://lwn.net/Articles/178253/
http://man7.org/linux/man-pages/man2/futex.2.html


> PERF: 0/4000 2791 2392
>
>
> What do you think about temporary increasing the priority of the main-loop
> thread?

Could you try the following? Just increase the main thread priority
above the timer thread and check what numbers you are getting. If
still getting bad numbers we can think on more ways to debug it.
Increasing the main thread prio is something I proposed ~ a year ago
but I didn't have convincing numbers since the minnowboard has enough
cpu bandwidth not to need it. I'm carrying this patch for a long
time in search of numbers to justify it for upstream:

https://github.com/lucasdemarchi/ardupilot/commit/2634ad8798251edc8aae54e3ffb2a90611e33187

Thanks for looking into this


Lucas De Marchi

Lucas De Marchi

ulæst,
28. mar. 2016, 14.12.2628.03.2016
til drones-discuss
On Mon, Mar 28, 2016 at 2:25 PM, Lucas De Marchi
<lucas.d...@gmail.com> wrote:
> Hi Markus,
>
> On Sat, Mar 26, 2016 at 2:16 PM, msuevern via drones-discuss
> <drones-...@googlegroups.com> wrote:
>> Hi developers,
>>
>>
>> Inspired by the BBBMini project I spend some time in the last months to get
>> the ArduCopter running on a standard RPI A+ and would like to share my
>> experiences now.
>>
>>
>> My hardware setup is as follow:
>>
>> Raspberry PI A+ running a Linux PREEMPT_RT kernel, Sensors are the
>
> which version?
>
>> MPU9250\AK8963 and MS5611 connected to the SPI bus, GPS on serial
>> /dev/ttyAPM.

Oh... and now reading more on the link you pointed out. It's not
entirely true, but the additional overhead of rescheduling for such
small transfers is. This should only introduce a few microseconds,
though.

On the few microseconds overhead the good thing to do would be to use
the kernel drivers for sensors, i.e. the IIO subsystem. We've been
busy enabling exactly these sensors you are talking about to be used
inside the kernel (except for GPS). From userspace you will only get
a chardev to read and interpret the data.

Lucas De Marchi

msue...@googlemail.com

ulæst,
10. apr. 2016, 10.12.1410.04.2016
til drones-discuss

Hi Luca,


thank you for your answer and sorry for the late response. Last week I invested the problem more in detail and want to share my findings with you.

To your questions:


>>which version?


The version I'm using is „3.18.9-rt5+“


>>I've seen on some rpi-based hardware a huge workload on irq thread
>>when using PREEMPT_RT. Here the patch set is causing more harm than
>>it's helping with the latency


I checked the workload, but found no aberrations.


>>It shouldn't use a separate kernel thread. At least not in upstream
>>kernel.  It should just configure the spi controller and be done with
>>that.


I can't verify this behaviour. Please see my scheduler tracing results below.


 In any case, Linux has built-in support for priority inversion,
>>so even if the RPI-based kernel used a kernel thread, an inversion in
>>the priority should cause the low priority thread to be boosted
>>automatically.

Yes, but only when using semaphores, if I understand your linked articles correctlly. 


>>Besides that, kernel thread usually run within the 50-99 rt
>>priorities. And our threads are in the 1-15 range, which means our
>>thread will never have higher priority than the kernel threads.


>>What does the following command return to you?

>> ps -eL -o tid,comm,class,rtprio| grep -ve "-$"


The command shows:


TID COMMAND CLS RTPRIO

3 ksoftirqd/0 FF 1

10 rcub/0 FF 1

11 rcuc/0 FF 1

13 posixcputmr/0 FF 99

20 irq/65-ARM Mail FF 50

37 irq/16-bcm2708_ FF 50

41 irq/66-VCHIQ do FF 50

44 irq/32-dwc_otg FF 50

45 irq/32-dwc_otg_ FF 50

46 irq/32-dwc_otg_ FF 50

48 irq/24-DMA IRQ FF 50

49 irq/25-DMA IRQ FF 50

50 irq/84-mmc0 FF 50

252 irq/79-bcm2708_ FF 50

3261 irq/80-bcm2708_ FF 50

3385 ArduCopter.elf FF 12

3386 sched-timer FF 15

3387 sched-uart FF 14

3388 sched-rcin FF 13

3389 sched-tonealarm FF 11

3390 sched-io FF 10

3391 irq/83-uart-pl0 FF 50


Not shown by this command are the kworker-threads, which are used in the SPI-Driver.


ps -eL -o tid,comm,class,pri,rtprio shows for example:


3394 kworker/0:0 TS 19 -

3395 kworker/u2:4 TS 19 -

3403 kworker/u2:3 TS 19 -

3411 kworker/u2:1 TS 19 -

3415 kworker/u2:2 TS 19 -

3418 ps TS 19 -


>>As mentioned above this should really not occur. I think what you are
>>seeing is not due to PI. In order to check a simple thing to do would
>>be to change the priority of the kernel thread. This can be as simple
>>as using the chrt utility. See this example for changing the priority


To find out what's happens exactly, I used the linux kernel tracing capabilities to trace schedule-switches together with the tool kernelshark for visualization.


For this test, a recent 3.4dev version of ArduCopter compiled for „pfxmini“ (release version) was used. Target was a Rasberry PI A+ running at 900Mhz. As before, Sensors are the MPU9250\AK8963 and MS5611 connected to the SPI bus.


The first picture shows the reading sequence of the MPU9250's gyro and acceleration values.




The sequence in detail:


1. The thread 'sched-timer' starts

2. ArduCopter calls ioctrl() of the SPIBus to read sensor data. The message is then queued in the SPI driver, and the sched-timer thread goes sleeping.

3. The kworker-thread, which belongs to the SPI driver queue starts, enqueues the message and starts the transmission together with IRQ (irq/80-bcm2708)

4. At point 4 the thread 'rcin' gets ready and starts immediately, because it has a higher priority as the SPIs kworker-thread and as a result, the SPI communication gets stuck.

5. ioctrl() calls returns

6. sched-timer finished.


Because kworker-threads are running with priorities below the real time threads, all ArduCopter threads may interrupt the SPI communication. Shown below is a worst case picture, where the timer-thread blocks for more then 14ms.




I don't know whether there are any faults in my considerations, but it exactly described my observations.


Markus





Auto Generated Inline Image 1
Auto Generated Inline Image 2

Mirko

ulæst,
12. apr. 2016, 20.54.4612.04.2016
til drones-discuss
Hi Markus,

tried to reproduce this with my BeagleBone BBBmini Kernel 4.1.21-bone-rt-r20. I can not see this timer thread-blocks here, but will look further.

Mirko


Bildschirmfoto vom 2016-04-12 22:07:52.png
Svar alle
Svar til forfatter
Videresend
0 nye opslag