Timing issues on ~master

136 views
Skip to first unread message

Jonathan Challinger

unread,
Sep 7, 2016, 4:03:01 PM9/7/16
to Randy Mackay, Andrew Tridgell, drones-...@googlegroups.com
Hey Tridge, Randy,

I've been running into very random timing issues recently. I've attached a log. The version in the log is based on 4ca1b584089b4c2511e8dc8f76fa72c3528724be. The log is of an indoor stabilize flight. It seems like the timing issues build up over a period of time and then suddenly go back to normal.

Do you have any thoughts on what might be causing this? The vehicle has LOG_REPLAY enabled.

figure_1.png
figure_2.png
figure_3.png

Andrew Tridgell

unread,
Sep 8, 2016, 7:08:03 AM9/8/16
to Jonathan Challinger, Randy Mackay, drones-...@googlegroups.com
Hi Jon,

I've seen bad timing too, but I haven't been able to reproduce it in
bench tests.

The timing errors I've seen are typically 12ms delays, once every 30s or
so. It doesn't happen in every flight. So not really the same as what
you're seeing.

What we really need first is a way to reproduce this in an environment
where we can debug it properly.

Cheers, Tridge

Andrew Tridgell

unread,
Sep 8, 2016, 11:01:21 PM9/8/16
to Jonathan Challinger, Randy Mackay, drones-...@googlegroups.com
Hi Jon,

I just tried without canbus enabled, and that seems to have reduced the
timing issues a lot. Do you have canbus enabled?

Cheers, Tridge

Jonathan Challinger

unread,
Sep 8, 2016, 11:07:29 PM9/8/16
to drones-...@googlegroups.com

Shouldn't be, not using CAN for anything yet.


--
You received this message because you are subscribed to the Google Groups "drones-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to drones-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrew Tridgell

unread,
Sep 8, 2016, 11:21:07 PM9/8/16
to Jonathan Challinger, drones-...@googlegroups.com
> Shouldn't be, not using CAN for anything yet.

can you check if its enabled in BRD_CAN_ENABLE anyway?

Jonathan Challinger

unread,
Sep 16, 2016, 4:29:49 PM9/16/16
to Andrew Tridgell, drones-...@googlegroups.com
BRD_CAN_ENABLE  0.000000

I'm going to add a mechanism to log more information about the scheduler, so we can be sure it isn't APM's fault.

Jonathan Challinger

unread,
Sep 16, 2016, 5:13:19 PM9/16/16
to Andrew Tridgell, drones-...@googlegroups.com
Wow, Copter's timing has been really badly neglected. I'm logging every time a scheduler task overruns, and the result for a Pixhawk sitting on a desk is this:

Task #22: {'OvrCount': 112, 'MaxOvr': 47423, 'AvgOvr': 648}
Task #25: {'OvrCount': 1926, 'MaxOvr': 5410, 'AvgOvr': 1472}
Task #30: {'OvrCount': 5, 'MaxOvr': 5219, 'AvgOvr': 1316}
Task #9: {'OvrCount': 9167, 'MaxOvr': 948, 'AvgOvr': 353}
Task #18: {'OvrCount': 458, 'MaxOvr': 946, 'AvgOvr': 502}
Task #28: {'OvrCount': 3411, 'MaxOvr': 764, 'AvgOvr': 421}
Task #14: {'OvrCount': 363, 'MaxOvr': 388, 'AvgOvr': 134}
Task #2: {'OvrCount': 24, 'MaxOvr': 379, 'AvgOvr': 249}
Task #4: {'OvrCount': 386, 'MaxOvr': 359, 'AvgOvr': 157}
Task #16: {'OvrCount': 5796, 'MaxOvr': 355, 'AvgOvr': 79}
Task #10: {'OvrCount': 839, 'MaxOvr': 341, 'AvgOvr': 135}
Task #29: {'OvrCount': 583, 'MaxOvr': 331, 'AvgOvr': 150}
Task #12: {'OvrCount': 135, 'MaxOvr': 330, 'AvgOvr': 130}
Task #34: {'OvrCount': 251, 'MaxOvr': 314, 'AvgOvr': 137}
Task #0: {'OvrCount': 134, 'MaxOvr': 313, 'AvgOvr': 161}
Task #23: {'OvrCount': 48, 'MaxOvr': 291, 'AvgOvr': 143}
Task #17: {'OvrCount': 105, 'MaxOvr': 289, 'AvgOvr': 124}
Task #3: {'OvrCount': 22, 'MaxOvr': 286, 'AvgOvr': 198}
Task #26: {'OvrCount': 98, 'MaxOvr': 253, 'AvgOvr': 106}
Task #11: {'OvrCount': 84, 'MaxOvr': 252, 'AvgOvr': 142}
Task #27: {'OvrCount': 73, 'MaxOvr': 249, 'AvgOvr': 108}
Task #31: {'OvrCount': 5, 'MaxOvr': 241, 'AvgOvr': 146}
Task #1: {'OvrCount': 111, 'MaxOvr': 236, 'AvgOvr': 107}
Task #37: {'OvrCount': 18, 'MaxOvr': 229, 'AvgOvr': 143}
Task #19: {'OvrCount': 17, 'MaxOvr': 214, 'AvgOvr': 108}
Task #32: {'OvrCount': 19, 'MaxOvr': 210, 'AvgOvr': 106}
Task #15: {'OvrCount': 23, 'MaxOvr': 194, 'AvgOvr': 119}
Task #35: {'OvrCount': 2, 'MaxOvr': 175, 'AvgOvr': 148}
Task #20: {'OvrCount': 8, 'MaxOvr': 173, 'AvgOvr': 121}
Task #36: {'OvrCount': 10, 'MaxOvr': 170, 'AvgOvr': 120}
Task #21: {'OvrCount': 31, 'MaxOvr': 166, 'AvgOvr': 78}
Task #5: {'OvrCount': 50, 'MaxOvr': 164, 'AvgOvr': 72}
Task #39: {'OvrCount': 10, 'MaxOvr': 163, 'AvgOvr': 125}
Task #13: {'OvrCount': 11, 'MaxOvr': 144, 'AvgOvr': 104}
Task #38: {'OvrCount': 4, 'MaxOvr': 139, 'AvgOvr': 99}
Task #7: {'OvrCount': 59, 'MaxOvr': 135, 'AvgOvr': 79}
Task #8: {'OvrCount': 6, 'MaxOvr': 105, 'AvgOvr': 91}
Task #6: {'OvrCount': 19, 'MaxOvr': 105, 'AvgOvr': 71}

I'll add names to the tasks, but I know the top 3:
#22 is gcs_check_input
#25 is gcs_data_stream_send
#30 is dataflash_periodic (probably to be expected given the logging load)

I'll do a histogram of gcs_check_input's run times next.

Jonathan Challinger

unread,
Sep 16, 2016, 5:27:45 PM9/16/16
to Andrew Tridgell, drones-...@googlegroups.com
Sorry, MaxOvr and AvgOvr are misleading names. They are total runtime of the task, not overrun time.

Anyway, task #22 there turned out to have two outliers: one at ~48ms and one ~1ms. The one at 48ms was right at the end when I began the log download, so that probably has something to do with it.

#25 overrunning could have to do with having USB connected.

#30 overrunning possibly has to do with my logging all the tens of thousands of scheduling overruns.

I'll go collect a loiter log.

Jonathan Challinger

unread,
Sep 16, 2016, 9:33:21 PM9/16/16
to Andrew Tridgell, drones-...@googlegroups.com
I haven't yet reproduced the timing issue seen in the log, but I did take a look at what needs to be done to the scheduler tasks table in ArduCopter:
  • All of the tasks below 300us should probably be bumped to 300us because there are interrupts that can take 200us or so. Mostly so that scheduler debugging can be useful again, rather than flooding the console with overruns.
  • rc_loop needs to be bumped to 500us
  • read_rangefinder needs to be bumped to 1400us. This is because I am using LightwareI2C, and AP_HAL_PX4::I2CDevice::read is blocking for up to 1200us.
  • one_hz_loop needs to be bumped to 1500us. Not certain why, yet.
  • gcs_check_input and gcs_data_stream_send need to be bumped to 1400us and 1700us respectively. Not certain why, yet...
  • ten_hz_logging_loop needs to be bumped to 750us
I did a test flight with all this, and despite having no large scheduler overrun during flight, there were PM messages that reported MaxT as 12ms. I looked at the timing of the main loop by looking at the timestamps on a 400hz log message, and it seems that these are occasional, single events (figure_1.png). This is in contrast to what I observed in 172.BIN, where the main loop time built up over a period of time and then suddenly snapped back to normal (figure_2.png)


figure_1.png
figure_2.png
Reply all
Reply to author
Forward
0 new messages