--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beagleboard/eg05dc5tkbdb1ae3up4aoi92h9er1lq1sn%404ax.com.
For more options, visit https://groups.google.com/d/optout.
Note that you might need an -rt or -xenomai kernel to achieve reliable
operation; I've seen the non-rt kernels occasionally "wander off into
the weeds" for several hundred ms at a time.
--
Charles Steinkuehler
cha...@steinkuehler.net
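One way to see whether a given kernel "wanders off into the weeds" is to measure how late a periodic task wakes up. Below is a minimal sketch in the spirit of the rt-tests `cyclictest` tool. It is not a replacement for that tool. The helper names are hypothetical, and the 1 ms period used in the test is an arbitrary choice. Running it on a stock kernel and then on an -rt kernel should give a rough comparison of worst-case wake-up lateness. A meaningful run would need far more iterations than the test uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: timespec -> nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Sleep until an absolute deadline every period_ns and record how late
 * each wake-up was. Returns the worst lateness in nanoseconds, or -1 if
 * clock_nanosleep() fails (including interruption by a signal).
 */
int64_t max_wakeup_latency_ns(int64_t period_ns, int iterations)
{
    struct timespec next, now;
    int64_t worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < iterations; i++) {
        /* Advance the absolute deadline by one period. */
        int64_t deadline = ts_to_ns(&next) + period_ns;
        next.tv_sec  = deadline / 1000000000LL;
        next.tv_nsec = deadline % 1000000000LL;

        /* TIMER_ABSTIME avoids drift from accumulating relative sleeps. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;

        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = ts_to_ns(&now) - deadline;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```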
I thought using select() to wait for notification of an event (by "listening" on the UIO device files) would free the ARM CPU to do other things while waiting. I also expected it to give the user-space application the fastest possible wake-up to send more data. Is there a better way?
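For reference, here is a minimal sketch of the select() approach described above. It assumes the PRU event is exposed through a UIO device file such as /dev/uio0; that path and the helper name are hypothetical. The Linux UIO interface lets a process select() on the device file and then read() a 4-byte interrupt count. Depending on the UIO driver, the interrupt may also need to be cleared or re-armed before the next wait, and this sketch does not do that.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Block until fd is readable or timeout_ms elapses. For a UIO device a
 * readable fd means an interrupt arrived; reading 4 bytes returns the
 * total event count. Returns 1 on an event, 0 on timeout, -1 on error.
 */
int wait_for_uio_event(int fd, int timeout_ms, uint32_t *event_count)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The process sleeps in the kernel here and uses no CPU while waiting. */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready; /* 0 = timeout, -1 = error (errno set) */

    /* Consume the event so the next select() blocks again. */
    if (read(fd, event_count, sizeof *event_count) != (ssize_t)sizeof *event_count)
        return -1;
    return 1;
}
```

Since any readable descriptor works, the test drives the helper with a pipe instead of a real UIO device.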
OK, I will use a busy-wait loop with usleep() and test it. I used select() because I thought it would let me do other things at the same time. I need another process, thread, or loop in this same application serving audio data to another client, synchronized with this data. My understanding was that a process blocked in select() frees the CPU for other work but still wakes up quickly to refresh the buffer when needed.
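Here is a sketch of the sleep-and-poll alternative. It assumes the PRU signals that data is wanted by writing a non-zero word into RAM that is mapped into the process. That flag protocol and the helper name are hypothetical. The sketch uses nanosleep(), which was also tried later in this thread, because usleep() is obsolete in current POSIX. Every iteration costs a system call. A wake-up can also lag the flag change by up to the sleep interval plus scheduling latency.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Poll a (hypothetical) shared-memory flag until it becomes non-zero,
 * sleeping interval_us between checks. Returns the number of sleeps
 * performed, or -1 if the flag is still zero after max_polls sleeps.
 */
long poll_flag(volatile const uint32_t *flag, long interval_us, long max_polls)
{
    struct timespec ts = {
        .tv_sec  = interval_us / 1000000,
        .tv_nsec = (interval_us % 1000000) * 1000,
    };
    long sleeps = 0;

    while (*flag == 0) {          /* volatile: re-read memory every time */
        if (sleeps == max_polls)
            return -1;
        nanosleep(&ts, NULL);     /* one system call per poll */
        sleeps++;
    }
    return sleeps;
}
```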
BTW, I have only mentioned the problems, but it almost works. In my tests I streamed 12,500 4 KiB buffers from the ARM to the PRU. On the PRU side, I used the cycle-accurate CYCLE counter to measure whether the PRU ever had to wait for the next buffer fill. It turns out the PRU had to wait about 180 times, or about 1.4% of the buffer fills. The worst-case wait (stall) time was ~5 ms.
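These figures can be put into time units. Assuming the AM335x PRU core clock of 200 MHz (5 ns per cycle), a CYCLE delta of 1,000,000 is 5 ms. A single 32-bit delta can span at most about 21.5 s. Below is a small helper for the conversion and the stall percentage. The names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 200 MHz PRU core clock (AM335x PRU-ICSS), i.e. 5 ns/cycle. */
#define PRU_CLOCK_HZ 200000000.0

/* Convert a CYCLE-register delta to microseconds. */
double cycles_to_us(uint32_t cycles)
{
    return (double)cycles * 1e6 / PRU_CLOCK_HZ;
}

/* Percentage of buffer fills on which the PRU stalled. */
double stall_percent(unsigned stalls, unsigned buffers)
{
    return buffers ? 100.0 * stalls / buffers : 0.0;
}
```

With the numbers above, stall_percent(180, 12500) is 1.44, and the ~5 ms worst case corresponds to roughly 1,000,000 PRU cycles.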
@William Hermans I thought I'd share the result of my efforts to reliably stream data from the ARM host (Linux user space) to the PRU.

I instrumented the PRU ASM code to use the CYCLE register for very precise measurements. I then ran tests that tracked how many times and for how long the PRU stalled waiting for data from the ARM host, along with the worst single stall. First I tested my current implementation using select(). Then I replaced select() with usleep() (and nanosleep()). Finally I replaced it with a brute-force busy-wait loop that never sleeps and never releases the CPU.

The results were surprising. With usleep() (and similar methods), the number of stalls, the total stall time and the worst-case stall time were all significantly worse than with select(). Even the busy-wait loop without sleep() was worse.

I did a bit of research. sleep() and related functions are implemented with a system call (in the old days sleep() was built on alarm(), so I read). So sleep() goes through the call gate and a context switch just like select() does. My theory is that select() is more efficient precisely because of this: one call to select() costs one system call/context switch per interrupt. The process is put to sleep (taken off the run queue) and the OS carries on. When the trigger event happens, the OS makes the process runnable again and control returns to user space. With the sleep() method, there are many calls per "interrupt", each polling a memory location for the signal from the PRU. So what select() handles with one user space->kernel space->user space transition could take dozens of these transitions with sleep().

I don't claim to be an expert, and if there is a flaw in this theory I'm open to hearing what it is. But that is my theory at the moment.
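The instrumentation described above lives in PRU assembly. As an illustration only, here is the same bookkeeping (stall count, total stall time, worst stall) written in C. The struct, the function and the threshold parameter are all hypothetical. The threshold stands for the normal cost of finding data already waiting, so that ordinary fast checks are not counted as stalls. In the real setup, start and end would be CYCLE samples taken around the wait for the next buffer.

```c
#include <stdint.h>

/* Running stall statistics, all in PRU cycles. */
struct stall_stats {
    uint32_t count;        /* number of stalls */
    uint64_t total_cycles; /* sum of all stall durations */
    uint32_t worst_cycles; /* longest single stall ("worst offender") */
};

/* Record one wait; waits of threshold cycles or less are not stalls. */
void stall_record(struct stall_stats *s, uint32_t start, uint32_t end,
                  uint32_t threshold)
{
    uint32_t waited = end - start;
    if (waited <= threshold)
        return;
    s->count++;
    s->total_cycles += waited;
    if (waited > s->worst_cycles)
        s->worst_cycles = waited;
}
```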
What I ended up doing was compressing the data so that one "frame" fits in PRU memory at once. The PRU needs to send out a full frame with microsecond-level timing precision for all data in that frame. Between frames there is slack. By compressing the data, I can load a full frame into the PRU0/1 data RAM (DRAM) and shared RAM, and then kick off writing out the frame. Now the timing of all transfers between registers, scratch and PRU DRAM is (or appears to be) deterministic. So I've sidestepped the problem of unpredictable latency waiting for data from the ARM host.

I hope this might help someone else with similar requirements.
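Here is a sketch of splitting one compressed frame across the PRU memories. It assumes the AM335x sizes: 8 KB data RAM per PRU and 12 KB shared RAM, or 28 KB in total. It fills them in order: PRU0 DRAM, then PRU1 DRAM, then shared RAM. The layout and the function name are hypothetical. The actual firmware may reserve part of DRAM for its own variables, in which case the usable capacities shrink.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PRU_DRAM_SIZE   (8u * 1024u)   /* per-PRU data RAM (AM335x) */
#define PRU_SHARED_SIZE (12u * 1024u)  /* shared RAM (AM335x) */

/*
 * Copy one frame into PRU0 DRAM, then PRU1 DRAM, then shared RAM.
 * In a real program the three pointers would come from mmap()ing the
 * PRU memory; with uncached device mappings, word-sized copies may be
 * needed instead of memcpy(). Returns 0 on success, -1 if too large.
 */
int load_frame(const uint8_t *frame, size_t len,
               uint8_t *dram0, uint8_t *dram1, uint8_t *shared)
{
    uint8_t *const dst[3] = { dram0, dram1, shared };
    const size_t cap[3] = { PRU_DRAM_SIZE, PRU_DRAM_SIZE, PRU_SHARED_SIZE };

    if (len > cap[0] + cap[1] + cap[2])
        return -1; /* frame does not fit: needs more compression */

    for (int i = 0; i < 3 && len > 0; i++) {
        size_t n = len < cap[i] ? len : cap[i];
        memcpy(dst[i], frame, n);
        frame += n;
        len -= n;
    }
    return 0;
}
```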
On a bare-metal microcontroller, sleep() is typically a busy loop, but on
Linux sleep()/usleep()/nanosleep() each result in a system call, which
explains the latency differences. Also note that a busy loop on Linux can
still be interrupted or preempted, which adds latency.
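To check the last point, one option is to spin without ever sleeping and record the largest gap between consecutive clock reads. Any large gap means the loop lost the CPU even though it never yielded. This is a minimal sketch and the names are hypothetical. On many Linux platforms clock_gettime() is served by the vDSO without a full system call, but that depends on the kernel and architecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-wait for duration_ns; return the largest gap between clock reads. */
int64_t max_spin_gap_ns(int64_t duration_ns)
{
    int64_t start = now_ns(), prev = start, worst = 0;

    while (prev - start < duration_ns) {
        int64_t t = now_ns();
        if (t - prev > worst)
            worst = t - prev;   /* a big jump = interrupted or preempted */
        prev = t;
    }
    return worst;
}
```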