Re: [beagleboard] How to get a high frequency Gpio Input Sampling Rate

randy rodes

Apr 5, 2013, 10:13:26 PM

to beagl...@googlegroups.com

On Fri, Apr 5, 2013 at 6:21 PM, Rafael Machado <rfa...@gmail.com> wrote:
> tl;dr:
> How to achieve a high sampling frequency on gpio input data reading
> (beaglebone) ?
>

Theoretically:
It is best to stay inside the kernel when capturing data at high
frequency, to avoid context switches to userspace.

The fastest way to try this would be to hack your code into some existing
debugfs entry, e.g. one you read with "cat /sys/kernel/debug/pm_debug/... ",
reusing existing infrastructure.

Start a timer, print your GPIO, and see how well it is sampling.
Next you could have a dmtimer raise an interrupt on the ARM and do
the same in the interrupt handler.

If you have to pump data to userspace, some kind of poll/select will
have to be implemented to wake up the userspace app when the sampling timer
expires.

>
> My goal is to collect lots of digital input samples (via GPIOs) at a
> rate of ~200 kHz.
> I'm using the latest BeagleBone.
>
>
> So far, I've tried:
>
> - usleep
> - a dummy busy waiting loop watching CLOCK_REALTIME via clock_gettime
> - watching dmtimer2 values (mmaped)
>
> Even when I comment out the lines that actually read the input GPIOs
> (i.e. the loop body is completely empty),
> I cannot get below a sampling period of ~30 microseconds (~34 kHz).
>
>
> I'm currently pinning my hopes on one of the following:
>
> - pru
> - /dev/rtc
> - change my ubuntu arm to xenomai or some realtime patched linux
> - write a kernel module
> - write assembly
>
> I don't know yet which of the above ideas are completely absurd and which
> are worth a shot,
> and I really need some guidance or shared experiences on this subject.
>
>
> Thanks.
>

Rafael Machado

Apr 8, 2013, 11:59:28 AM

to beagl...@googlegroups.com

Thanks Randy.

So you are basically suggesting that I put my code into some RAM-based fs, like debugfs/tmpfs/ramfs, in order to avoid file write cycles. Is this any better than mmapping my GPIOs' file descriptors?

I'm not quite sure which timer I should use to deliver interrupts to a userspace app: dmtimer or dmtimer2?

I'm going to play around with poll/select/epoll eventually. Thanks for the tip.

The main issue bothering me is that I cannot get Ubuntu ARM to interrupt my userspace app at periods smaller than ~30ms. Even if I blank out the loop internals (i.e., delete the entire GPIO read overhead), I cannot get better interrupt periods, even with dmtimer2 mmapped.

Do you think plain poll/select/epoll techniques or another (patched) distro could be of any use here?

To clarify, an interrupt snippet in my code would be something like this (I got it from some topic here, but can't find the link right now):

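    #define PERIOD 1000       /* timer ticks to busy-wait */
    #define TCRR (0x3c / 4)   /* DMTIMER counter register (TCRR), as a 32-bit word index */

    /* Needs <fcntl.h>, <stdio.h>, <sys/mman.h>, <sys/types.h>; /dev/mem requires root. */
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    volatile u_int32_t *dmtimer2_regs = (u_int32_t *)mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0x48040000);

    u_int32_t t0 = 0, t1 = 0;
    dmtimer2_regs[TCRR] = 0;           /* restart the counter, so t0 = 0 is its start value */
    while ((t1 - t0) < PERIOD) {
        t1 = dmtimer2_regs[TCRR];      /* poll the free-running counter */
    }
    printf("delta=%u. t0=%u. t1=%u\n", t1 - t0, t0, t1);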

Or something like this (via clock_gettime):

    /* Run at the highest SCHED_FIFO priority to reduce preemption (needs root). */
    struct sched_param sch;
    sch.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(0, SCHED_FIFO, &sch) == -1) perror("Scheduling error");

    /* Two back-to-back reads: the difference is the cost of reading the clock itself. */
    struct timespec spec_start, spec_end;
    clock_gettime(CLOCK_REALTIME, &spec_start);
    clock_gettime(CLOCK_REALTIME, &spec_end);
    printf("delta_gettime=%lf us\n", diff_nsec(&spec_start, &spec_end) / 1000.0);
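The diff_nsec() helper is not shown in the message. A minimal sketch of what such a helper might look like, assuming it returns the elapsed time in nanoseconds as a double (which would match the %lf format and the division by 1000.0 above):

```c
#include <time.h>

/* Hypothetical reconstruction of the diff_nsec() helper used above:
 * elapsed time from *start to *end, in nanoseconds. */
double diff_nsec(const struct timespec *start, const struct timespec *end)
{
    double sec  = (double)(end->tv_sec - start->tv_sec);
    double nsec = (double)(end->tv_nsec - start->tv_nsec);  /* may be negative */
    return sec * 1e9 + nsec;
}
```

For interval measurements, CLOCK_MONOTONIC is usually a safer choice than CLOCK_REALTIME, because the latter can be stepped while the program is running.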

Anyway, on my computer (Arch Linux, 64-bit) I can get results down to 0.5us.

On the BeagleBone (with Ubuntu ARM, 32-bit) I cannot get below 30ms.

Thank you

Jason Kridner

Apr 8, 2013, 11:45:45 PM

to beagl...@googlegroups.com

Under JavaScript (node.js) under Angstrom using multiple processes and messages in userspace I'm getting single digit ms responses (epoll POLLPRI). I'm kinda confused why yours is so slow.

Of course, I think we should really try to figure out what you are doing. If you are trying to count fast events, you probably want to use the eCAP or PRU hardware and not thrash the main CPU.
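For reference, the epoll/POLLPRI approach mentioned above boils down to waiting on a sysfs GPIO value file until the kernel reports an edge. A minimal sketch using poll(), assuming the pin has already been exported and its edge file set to rising, falling or both. The helper names and the gpio60 path in the note below are illustrative assumptions, not taken from this thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Interpret the text read from a sysfs GPIO "value" file ("0\n" or "1\n").
 * Returns 0 or 1, or -1 if the buffer is empty or unexpected. */
int parse_gpio_value(const char *buf, long len)
{
    if (len < 1) return -1;
    if (buf[0] == '0') return 0;
    if (buf[0] == '1') return 1;
    return -1;
}

/* Block until the kernel signals an edge on an already-open sysfs value
 * file, then return the new level (0 or 1).
 * Returns -2 on timeout and -1 on error. */
int wait_for_edge(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
    char buf[4];

    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) return -1;
    if (n == 0) return -2;

    /* sysfs attributes must be re-read from offset 0 after each event */
    if (lseek(fd, 0, SEEK_SET) < 0) return -1;
    ssize_t len = read(fd, buf, sizeof buf);
    return parse_gpio_value(buf, (long)len);
}
```

Typical use would be to open /sys/class/gpio/gpio60/value read-only, do one dummy read to clear the initial state, and then call wait_for_edge(fd, 1000) in a loop. Every wake-up still costs a trip through the scheduler, so this is probably better suited to occasional events than to sampling at 200 kHz.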

Thank you

Rafael Machado

Apr 9, 2013, 1:39:04 AM

to beagl...@googlegroups.com, jason....@hangerhead.com

Thanks Jason.

This eCAP idea seems quite promising, but I'm getting mixed information about it from the web:

- eCAP driver for capture mode not supported:

eCAP is described in http://processors.wiki.ti.com/index.php/AM335x_PWM_Driver%27s_Guide#eCAP_2 and discussed in http://e2e.ti.com/support/arm/sitara_arm/f/791/t/249460.aspx (a fairly recent topic) as not available through a driver ("The current release of the driver supports only PWM mode").

As I understand it, this would basically mean writing my own eCAP capture assembly code (guided by the AM3359 technical reference).

- eCAP driver for capture mode supported:

However, an eCAP driver is described as available in http://comments.gmane.org/gmane.linux.ports.arm.omap/80653, and I can see the (also very recent) code for it in

https://gitorious.org/linux-pwm/linux-pwm/blobs/blame/bdd7cf97153d354f654379563483bdb5a774ef16/drivers/pwm/pwm-tiecap.c

Is this eCAP capture driver available in Angstrom upstream?

Is it worth changing my embedded OS (Ubuntu ARM) to Angstrom (so I can get the cutting-edge patches)?

In addition to this plethora of eCAP questions, I'd like to know whether the PRU is more suitable for my needs (reading GPIOs at high, well-controlled frequencies).

I'm just getting started with embedded systems and don't yet know how to evaluate this decision (PRU vs eCAP).

Thank you very much.

Chris Micali

Apr 10, 2013, 4:40:38 PM

to beagl...@googlegroups.com, jason....@hangerhead.com

I'm using the PRU to do something like this now at rates between 1-2 MS/s (16-bit samples). It has been non-trivial though, so prepare for some work if you end up going down this route. That said, apart from eCAP (which I don't know much about), I could not find another way to do this.
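To put those rates in perspective: the PRU cores on the AM335x are clocked at 200 MHz, and most instructions complete in a single 5 ns cycle. A rough per-sample cycle budget, as a sketch (the function name is made up for illustration):

```c
#include <stdint.h>

/* PRU core clock on the AM335x: 200 MHz, i.e. 5 ns per cycle. */
#define PRU_CLOCK_HZ 200000000u

/* Number of PRU cycles available between two samples at the given rate. */
uint32_t pru_cycles_per_sample(uint32_t sample_rate_hz)
{
    return PRU_CLOCK_HZ / sample_rate_hz;
}
```

At the 200 kHz target from the original question this gives 1000 cycles per sample; at 2 MS/s it gives 100.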

Rafael Machado

Apr 11, 2013, 12:12:26 AM

to beagl...@googlegroups.com, jason....@hangerhead.com

Chris,

This just gave me some hope. At least I know it is possible to do this with a BeagleBone.

Do you have any sources or materials to share with me besides the resources on https://github.com/beagleboard/am335x_pru_package ?

I still don't know whether I should go down this route or the eCAP one (I also haven't yet tried putting Xenomai or some RTOS over my current Ubuntu ARM).

Thank you

Chris Micali

Apr 11, 2013, 1:56:36 PM

to beagl...@googlegroups.com, jason....@hangerhead.com

Rafael,

This blog post helped me a lot: http://blog.boxysean.com/2012/08/12/first-steps-with-the-beaglebone-pru/

I've also posted a DDR example here: https://github.com/sagedevices/am335x_pru_package/tree/master/pru_sw/example_apps

Also the TI wiki has been helpful: http://processors.wiki.ti.com/index.php/Programmable_Realtime_Unit and http://processors.wiki.ti.com/index.php/PRU_Assembly_Instructions

http://hipstercircuits.com/ has had a couple great PRU posts also

-c

Rafael Machado

Apr 11, 2013, 11:26:01 PM

to beagl...@googlegroups.com, jason....@hangerhead.com

Thanks Chris.

I'm certainly going to dig into those links.

In addition to that, I'd like to add a complementary question to this thread:

The 30us bottleneck I mentioned in the very first message is actually T=30.5717 us.

Well ... T^-1 = 32.768KHz, which is something of a canonical number, mentioned several times in the Technical Reference Manual (http://www.ti.com/lit/ug/spruf98x/spruf98x.pdf ), for example as precisely the debouncing timer frequency for a GPIO in input mode.

Do you guys have any thoughts on this? It cannot be mere coincidence.
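One plausible explanation (an assumption, not something confirmed in this thread) is that the kernel's timekeeping is driven by a 32.768 kHz counter. In that case software timestamps can only advance in steps of 1/32768 s, which is about 30.518 us. A sketch of how this could be checked from userspace, by recording the smallest non-zero step between consecutive clock_gettime() readings (function names are illustrative):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Period of one tick, in whole nanoseconds, for a clock running at freq_hz. */
int64_t tick_period_ns(int64_t freq_hz)
{
    return 1000000000LL / freq_hz;
}

/* Smallest non-zero step seen between consecutive clock_gettime() calls,
 * in nanoseconds, or -1 if the clock failed or never advanced. */
int64_t min_clock_step_ns(clockid_t clk, long iterations)
{
    struct timespec prev, now;
    int64_t best = -1;

    if (clock_gettime(clk, &prev) != 0) return -1;
    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(clk, &now) != 0) return -1;
        int64_t step = (int64_t)(now.tv_sec - prev.tv_sec) * 1000000000LL
                     + (int64_t)(now.tv_nsec - prev.tv_nsec);
        if (step > 0 && (best < 0 || step < best))
            best = step;   /* keep the finest granularity observed so far */
        prev = now;
    }
    return best;
}
```

If min_clock_step_ns(CLOCK_REALTIME, 1000000) comes back close to tick_period_ns(32768), i.e. about 30517 ns, the limit is in the clock being read rather than in the sampling loop. On Linux the active clocksource can also be read from /sys/devices/system/clocksource/clocksource0/current_clocksource.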

@Chris:

How many input GPIOs are you controlling with the PRU?

I think I read in another topic about a certain 48-GPIO limit.

Thank you

Rafael Machado

Apr 12, 2013, 10:25:59 AM

to beagl...@googlegroups.com, jason....@hangerhead.com

Oops... wrong technical reference link.

http://www.ti.com/litv/pdf/spruh73g is the correct one

Rafael Machado

Apr 13, 2013, 12:05:14 AM

to beagl...@googlegroups.com, jason....@hangerhead.com

Hi. I've (re)tried the experiment.

This time I'm reading values on the scope instead of relying on plain software timing calculations (nanosleep, clock_gettime, etc).

I'd like to share some results.

Scope Images: http://share.pho.to/1oteV

The experiment idea is quite simple:

An input GPIO reads an externally generated square wave (scope CH2 - blue wave), and another GPIO, configured as an output, "mirrors" the value just read (this GPIO is connected to scope CH1 - yellow wave).

Conceptually, there are just two consecutive lines of code inside a main infinite loop.
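The actual code is not shown in the post. A sketch of what such a two-line loop body could look like with memory-mapped GPIO registers, assuming both pins sit in the same GPIO bank. The register offsets are the ones listed in the AM335x TRM, while the function name and pin masks are illustrative assumptions:

```c
#include <stdint.h>

/* AM335x GPIO register offsets in bytes (per the AM335x TRM). */
#define GPIO_DATAIN        0x138
#define GPIO_CLEARDATAOUT  0x190
#define GPIO_SETDATAOUT    0x194

/* One pass of the mirror loop: sample the input pin, copy it to the output pin.
 * regs points at the base of one GPIO bank; in_mask and out_mask select the pins.
 * Returns the masked input level (non-zero means high). */
static inline uint32_t mirror_once(volatile uint32_t *regs,
                                   uint32_t in_mask, uint32_t out_mask)
{
    uint32_t level = regs[GPIO_DATAIN / 4] & in_mask;   /* the "input" line  */
    if (level)
        regs[GPIO_SETDATAOUT / 4] = out_mask;           /* the "output" line */
    else
        regs[GPIO_CLEARDATAOUT / 4] = out_mask;
    return level;
}
```

On hardware, regs would come from an mmap() of /dev/mem at the bank's base address (0x4804C000 for GPIO1), in the same way as the dmtimer2 mapping earlier in the thread, and the call would sit inside a while (1) loop.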

The conclusions are as follows:

- The initial overhead to read the square wave and write it back to another GPIO is always >=760ns.

- For each additional input line of code before the output line (i.e., increasing the GPIO input payload), there is an observed 80ns increase in the phase difference between CH2 and CH1. This 80ns increase is pretty much constant across experiments (i.e., for 4 additional inputs, there is a ~300ns increase).

- Any external square wave (CH2) of f<500KHz is captured and mirrored back to CH1 with a good frequency match (approximately the same frequency as the input square wave).

- Anything beyond that and the sampling is compromised (frequency mismatch, high level for too long, low level for too long, etc).

- For any input square wave (CH2) with f<=200KHz, the phase difference (between CH1 and CH2) is small (<25% of the total period).

- For anything beyond that frequency, the phase difference is >25% of the total period.
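These figures can be folded into a simple linear model: about 760 ns for one read plus one write, and about 80 ns per extra read. A small sketch of that arithmetic, treating the measured numbers as fixed constants (a simplification; the function names are illustrative):

```c
#include <stdint.h>

#define BASE_LATENCY_NS 760u  /* measured: one input read plus one output write */
#define PER_READ_NS      80u  /* measured: each additional input read */

/* Estimated input-to-output delay for a loop with extra_reads additional reads. */
uint32_t mirror_latency_ns(uint32_t extra_reads)
{
    return BASE_LATENCY_NS + PER_READ_NS * extra_reads;
}

/* Delay as a fraction of the input period, in parts per thousand:
 * latency_ns * freq_hz / 1e9 is the fraction, times 1000 gives permille. */
uint32_t phase_lag_permille(uint32_t latency_ns, uint32_t freq_hz)
{
    return (uint32_t)(((uint64_t)latency_ns * freq_hz) / 1000000u);
}
```

With four extra reads the model gives 760 + 4 * 80 = 1080 ns. At 200 kHz (5 us period) the base 760 ns is about 15% of the period, and at 500 kHz (2 us period) about 38%. Since 760 ns is a lower bound, the real 25% crossover can sit lower than this model suggests.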

Now, the most intriguing observation:

- The time values observed by the software are always >=30.571 us (32.768KHz).

This sampling period is obviously wrong, since I've read the actual value on the scope.

Chris Micali

Apr 14, 2013, 2:27:42 PM

to beagl...@googlegroups.com, jason....@hangerhead.com

Rafael,

I'm using about 10 pins for the PRU, I think: mostly outputs, but a couple of inputs. The BeagleBone only has a subset of the PRU pins brought out and available, but I think you could get up to 16-20 PRU I/O pins (I don't fully remember). You can find the exact number by using the TI Pinmux Tool and the BeagleBone SRM to see which pins are available on the BeagleBone headers.

-c

randy rodes

Apr 15, 2013, 1:02:06 PM

to beagl...@googlegroups.com, jason....@hangerhead.com

Can anyone share some _REAL_ code from the PRU work with GPIO sampling you are doing?
I am quite interested to see how it works and may be able to make use
of it as well.

thanks in advance
Randy
