I used to have an i.MX8-based UDOO board in my living room, so I'm very familiar with that SoC :)
I think you should approach this in layers:
1. First, I'd look at sched_switch + sched_waking events to understand whether the latency is CPU-bound or blocking/contention-bound.
2. If it's CPU-bound, you can use callstack sampling. You can generate configs via tools/cpu_profile (see docs). However, I don't recall what it takes to run it on Linux rather than Android.
+Ryan Savitski here
3. If it's blocking-bound, you need to follow the sched_waking events (the UI shows them as arrows) and use the "critical path lite" feature to figure out where it's blocked.
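To make steps 1 and 3 a bit more concrete, here's a minimal Python sketch of the per-thread bookkeeping that sched_switch + sched_waking make possible. It is NOT Perfetto's API and not how the UI or critical path lite work internally; the `SchedSwitch`/`SchedWaking` classes and `breakdown()` are hypothetical simplifications of the real ftrace fields. It splits one thread's time into running, runnable (waiting for a CPU) and blocked, and counts which threads woke it up: mostly running points to step 2, mostly blocked points to step 3.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class SchedSwitch:  # hypothetical, simplified sched_switch record
    ts: int          # timestamp (ns)
    prev_tid: int    # thread leaving the CPU
    prev_state: str  # "R" = preempted but still runnable; "S"/"D" = blocked
    next_tid: int    # thread getting the CPU


@dataclass
class SchedWaking:  # hypothetical, simplified sched_waking record
    ts: int          # timestamp (ns)
    tid: int         # thread being woken up
    waker_tid: int   # thread that issued the wakeup


Event = Union[SchedSwitch, SchedWaking]


def breakdown(events: List[Event], tid: int) -> Tuple[Dict[str, int], Counter]:
    """Split one thread's time into running / runnable / blocked (ns)
    and count which threads woke it up. Time before the thread's first
    event and after its last one is ignored."""
    totals = {"running": 0, "runnable": 0, "blocked": 0}
    wakers = Counter()
    state: Optional[str] = None  # unknown until we see an event for `tid`
    since = 0

    for ev in sorted(events, key=lambda e: e.ts):
        if isinstance(ev, SchedWaking):
            if ev.tid != tid:
                continue
            # Only a wakeup of a blocked (or not yet seen) thread starts a
            # new runnable interval; other wakeups are ignored.
            if state in (None, "blocked"):
                if state == "blocked":
                    totals["blocked"] += ev.ts - since
                wakers[ev.waker_tid] += 1
                state, since = "runnable", ev.ts
        else:
            if ev.prev_tid == tid:
                # Switched out: close the running interval.
                if state == "running":
                    totals["running"] += ev.ts - since
                state = "runnable" if ev.prev_state == "R" else "blocked"
                since = ev.ts
            if ev.next_tid == tid:
                # Switched in: close whatever interval it was waiting in.
                if state in ("runnable", "blocked"):
                    totals[state] += ev.ts - since
                state, since = "running", ev.ts
    return totals, wakers


# Toy trace for a hypothetical thread 42 (timestamps in ns).
example = [
    SchedSwitch(0, 1, "S", 42),    # 42 gets the CPU
    SchedSwitch(100, 42, "S", 2),  # 42 blocks
    SchedWaking(400, 42, 7),       # thread 7 wakes 42
    SchedSwitch(450, 2, "R", 42),  # 42 runs again
    SchedSwitch(500, 42, "R", 3),  # 42 is preempted
]
totals, wakers = breakdown(example, 42)
print(totals, dict(wakers))
# {'running': 150, 'runnable': 50, 'blocked': 300} {7: 1}
```

In this toy trace thread 42 spends most of its time blocked and was woken by thread 7, so the next thing to look at would be what thread 7 was doing (the chain the UI arrows let you follow), not a CPU profile of 42.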
It could be many different things. For example, you could be blocked waiting for a GPU fence (honestly, the GPU on that chipset isn't great).
You can try a callstack sampling config based on sched_switch, so you get a sample exactly whenever a thread switches in or out, rather than just every xx ms. Ryan can help you with that (Ryan: we should put this in the docs / FAQ, it's the third time this year that I've ended up discussing it with somebody).
I honestly put most of my bets on the GPU driver ;)
Good luck with your project!