Capture one Analog channel in continuous mode at near maximum speed?


Stewart

Jun 20, 2016, 3:29:43 PM
to BeagleBoard
I'm looking to write a simple app for BBB.  When started from the command line, it would set up the ADC in continuous mode and read ~1 M samples from e.g. AN0 into memory.  After the capture is complete, it would write the data to a file and exit.

Ideally, it would run at the hardware limit of 1.6 MSPS (15 cycles of 24 MHz adc_clk per sample).  If that's not practical, 800 KSPS or better would be acceptable.
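
As a rough sanity check of those figures, here is a minimal sketch of the capture budget. The 24 MHz clock and 15 cycles per sample are the numbers quoted above; storing each 12-bit sample in 2 bytes and the function name `capture_budget` are assumptions made only for illustration.

```python
ADC_CLK_HZ = 24_000_000      # adc_clk quoted in the post
CYCLES_PER_SAMPLE = 15       # cycles per conversion quoted in the post
BYTES_PER_SAMPLE = 2         # assumption: each 12-bit sample is stored as a uint16


def capture_budget(n_samples, adc_clk_hz=ADC_CLK_HZ, cycles=CYCLES_PER_SAMPLE):
    """Return (sample rate in Hz, capture time in s, buffer size in bytes)."""
    rate_hz = adc_clk_hz / cycles
    return rate_hz, n_samples / rate_hz, n_samples * BYTES_PER_SAMPLE


rate_hz, seconds, size = capture_budget(1_000_000)
print(f"{rate_hz / 1e6:.1f} MSPS, {seconds:.3f} s, {size / 1e6:.1f} MB")
```

If the hardware really sustained that rate, a 1 M sample capture would take well under a second and fit easily in RAM, so the hard part is moving the data out of the ADC, not storing it.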

What is an easy way to do this?  Most Beaglebone ADC examples sample at kilohertz rates or slower.

This guide: http://processors.wiki.ti.com/index.php/Linux_Core_ADC_User%27s_Guide speaks of 200 KSPS.  What is the limitation here?

I've seen various suggestions to use the PRU, but don't understand why.  I would think that since DMA would be required anyway, there should be no requirement to otherwise access the hardware with tight timing.  If PRU is indeed necessary, is there a suitable example or tutorial?  (None of the libpruio built-in examples deal with rapid sampling or large amounts of data.)

Any other ideas for a simple way to capture data fast will be gratefully appreciated.

Thanks.

John Syne

Jun 20, 2016, 4:51:24 PM
to beagl...@googlegroups.com
I have been working on adding DMA to the ADC driver, but currently it throws overflow errors before DMA starts. The DMA should trigger when the ADC FIFO reaches a predefined threshold, but for some reason there is a delay before DMA triggers. The ADC driver uses the IIO framework and I’m using their experimental DMA buffer framework, which has its share of issues. I’m trying to diagnose the error by replicating the setup in Starterware. Unfortunately the CCS debugger isn’t all that helpful, so now I’m trying to get my Lauterbach working with Starterware, but I have to translate the CCS GEL scripts to Lauterbach Practice scripts.

Regarding the sampling rate, the datasheet does specify 200ksps, but if you set up the sample delay, open delay, etc., it should be possible to achieve something like 1.5Msps, but I haven’t been able to verify this yet.

Regards,
John




--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beagleboard/aeaf9852-fb4c-4fd1-9594-8aad0ad5fd3c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

William Hermans

Jun 20, 2016, 5:53:47 PM
to beagl...@googlegroups.com
The ADC module is a 200ksps SAR module . . .You're only going to be able to sample 200k samples per second . . .

Additionally you can use:

  1. PRUs ( Programmable Real-time Units )
  2. IIO ( industrial IO )
  3. /dev/mem + mmap()

To read 200ksps. Personally, I've proven that /dev/mem + mmap() can work for reading 200ksps for 7 channel simultaneously. But CPU usage is so high that you're not going to be able to do a whole lot more in addition to reading the ADC in this fashion. Hence, the PRUs are best used, as this offers hardware offload (very little CPU load needed, and only when reading values out).

John Syne

Jun 20, 2016, 6:21:36 PM
to beagl...@googlegroups.com
You should read the info Stewart provided. There is a conflict in the AM3358 datasheet: it says max 200ksps, but the register settings show that you can configure the ADC for 1.6Msps. There are discussions on E2E about this issue and no one from TI has said you cannot achieve 1.6Msps.

Regards,
John




snelso...@gmail.com

Jun 20, 2016, 6:35:08 PM
to BeagleBoard
Wow, I think that /dev/mem + mmap() is the easy solution!  Many thanks.

But can you please confirm that by "200ksps for 7 channel simultaneously" you mean that each of the 7 channels is being sampled 200k times per second.  If so, that's 1.4M captures per second, enough for my project.

> But CPU usage is so high, that you're not going to be able to do a whole lot more in addition to reading the ADC in this fashion.

This is essentially a lab experiment.  The program only needs to run correctly once (though it will take many executions to get all the external factors right).  On each run, the BBB has no other tasks to perform.  It won't start writing the results to a file until after the capture is complete.  If CPU usage during the capture is 95%, that's fine.  If it's 105%, I'll slow the sample rate a bit to avoid losing samples.  But if it's 250%, then I'm in trouble.

If not missing samples requires running in single-user mode with no networking and only a serial console, that's IMO a minor nuisance compared to learning about PRU, etc. and debugging a much more complex program.

evilwulfie

Jun 20, 2016, 6:39:09 PM
to beagl...@googlegroups.com
Um, no. The 7 channels go into an analog mux and you can only have one input selected at a time.

John Syne

Jun 20, 2016, 6:48:43 PM
to beagl...@googlegroups.com
You don’t read the IIO driver that way. You read the samples from /dev/iio:device0. You can modify the open delay and sample delay in the devicetree, but because the current driver uses interrupts, the CPU utilization increases as you increase the sample rate. Also, the converter is specified as 200ksps, but there is only one converter for all channels, which uses a multiplexer to select each ADC channel. So if you use 2 channels, then each channel is sampled at 100ksps.

Regards,
John




John Syne

Jun 20, 2016, 6:59:19 PM
to beagl...@googlegroups.com

snelso...@gmail.com

Jun 20, 2016, 7:40:01 PM
to BeagleBoard
I'm well aware of the analog mux and fully understand that if N channels are being sampled, the maximum sample rate for each channel is reduced by a factor of N.

However, Mr. Hermans, who clearly knows what he is talking about, stated "I've proven that /dev/mem + mmap() can work for reading 200ksps for 7 channel simultaneously."
It's reasonable to assume that he would not have included the "7 channel simultaneously" part unless it was somehow relevant to the performance he observed.

And, if the ADC is actually capable of doing conversions at a 1.6 MHz rate, then it could sample all 8 channels at 200KSPS each, and maybe that's what Mr. Hermans was observing.

William Hermans

Jun 20, 2016, 7:45:04 PM
to beagl...@googlegroups.com
> However, Mr. Hermans, who clearly knows what he is talking about, stated "I've proven that /dev/mem + mmap() can work for reading 200ksps for 7 channel simultaneously."
> It's reasonable to assume that he would not have included the "7 channel simultaneously" part unless it was somehow relevant to the performance he observed.

You're not understanding correctly . . . the ADC module is rated for 200ksps only. Not 1M, not 2M, 200ksps.

William Hermans

Jun 20, 2016, 7:55:03 PM
to beagl...@googlegroups.com
Page 3:

– 12-Bit Successive Approximation Register (SAR) ADC
• 200K Samples per Second
• Input can be Selected from any of the Eight Analog Inputs Multiplexed Through an 8:1 Analog Switch
• Can be Configured to Operate as a 4-Wire, 5-Wire, or 8-Wire Resistive Touch Screen Controller (TSC) Interface

John Syne

Jun 20, 2016, 8:05:55 PM
to beagl...@googlegroups.com
The ADC uses a sequencer and you can only read the samples when the conversions are complete. The sequencer provides an interrupt or triggers a DMA transfer when EOC (End of Conversion) is reached. Since you cannot service interrupts in userspace, you will have to poll repeatedly to wait for the conversion to complete. Given the non-deterministic nature of Linux, you will miss some of the conversions when your processor utilization is high. Polling itself will cause high CPU utilization. Hence /dev/mem + mmap() won’t work.

Regards,
John




snelso...@gmail.com

Jun 20, 2016, 8:06:16 PM
to BeagleBoard
> You're not understanding correctly . . . the ADC module is rated for 200ksps only. Not 1M, not 2M, 200ksps.

So your "7 channel simultaneously" comment had nothing to do with the total sample rate, but merely indicates that this 200 KSPS rate was achieved even though some additional overhead (e.g. distributing the captured data into 7 different buffers) was present.  Is that correct?

But in that case, I'm confused by john3909's link, wherein a TI spokesman stated that the 200 kSPS value is for 3 MHz ADC clock, though the maximum ADC clk frequency is 24 MHz.  A further clarification was that the 24 MHz ADC clk could only be achieved when the master oscillator is 24 MHz, but I confirmed at https://github.com/CircuitCo/BeagleBone-Black/blob/master/BBB_SCH.pdf?raw=true that Y2 is indeed 24 MHz on production boards.
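
To make the relationship between the two figures concrete, here is a small sketch that sweeps the ADC clock divider. It assumes the 15 cycles per sample quoted at the start of the thread and the 24 MHz Y2 oscillator; the function name `rate_for_divider` is hypothetical. It is only arithmetic and says nothing about whether the analog front end holds its accuracy at the faster settings.

```python
OSC_HZ = 24_000_000          # Y2 oscillator on the BeagleBone Black schematic
CYCLES_PER_SAMPLE = 15       # figure quoted at the start of the thread


def rate_for_divider(divider, osc_hz=OSC_HZ, cycles=CYCLES_PER_SAMPLE):
    """Conversion rate in samples/s for a given ADC clock divider."""
    return osc_hz / divider / cycles


for divider in (8, 4, 2, 1):
    adc_clk_mhz = OSC_HZ / divider / 1e6
    ksps = rate_for_divider(divider) / 1e3
    print(f"divider {divider}: adc_clk {adc_clk_mhz:g} MHz -> {ksps:g} kSPS")
```

A divider of 8 gives a 3 MHz ADC clock and exactly 200 kSPS, which matches the datasheet number; a divider of 1 gives the 1.6 MSPS figure.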

William Hermans

Jun 20, 2016, 8:18:36 PM
to beagl...@googlegroups.com
Stop second-guessing what I mean by what I type, and just read that damned datasheet. I even pasted a quote of the relevant information, and it's quite clear.

John Syne

Jun 20, 2016, 8:20:44 PM
to beagl...@googlegroups.com
Currently the ADC driver is configured for 16x oversampling (averaging), Open Delay = 152 cycles and Sample Delay = 1 cycle.


 time in us for processing a single channel, calculated as follows:

 num cycles = open delay + (sample delay + conv time) * averaging

 num cycles: 152 + (1 + 13) * 16 = 376

 clock frequency: 24MHz / 8 = 3MHz
 clock period: 1 / 3MHz = 333ns

 processing time: 376 * 333ns = 125us
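
The same calculation as a minimal sketch, so other settings can be tried. The formula and the 13-cycle conversion time come from the lines above; the function name `step_time_us` is hypothetical.

```python
def step_time_us(open_delay, sample_delay, averaging, adc_clk_hz, conv_cycles=13):
    """Return (cycles, time in microseconds) for one channel step.

    Uses the formula from the post:
    cycles = open delay + (sample delay + conv time) * averaging
    """
    cycles = open_delay + (sample_delay + conv_cycles) * averaging
    return cycles, cycles * 1e6 / adc_clk_hz


# Current driver settings: 152 / 1 / 16x at 24 MHz / 8 = 3 MHz
cycles, t_us = step_time_us(open_delay=152, sample_delay=1, averaging=16,
                            adc_clk_hz=24_000_000 / 8)
print(cycles, round(t_us, 1))
```

At roughly 125 us per step, that configuration yields only about 8,000 conversions per second, far below the 200ksps rating.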



Regards,
John




snelso...@gmail.com

Jun 20, 2016, 8:27:04 PM
to BeagleBoard
Thanks for the link.  Given the recent update to the data sheet, it seems pretty likely that the 200 kSPS total sample rate limit is correct, even though that conflicts with both the (older) TRM and the comments in 2014 by Biser Gatchev-XID.  So, it appears to be a waste of time to attempt 800 kSPS+.


Sorry to have bothered you all,
Stewart

John Syne

Jun 20, 2016, 8:36:28 PM
to beagl...@googlegroups.com
Well, I think it is still to be determined whether the ADC can sample faster than 200ksps, and that is why I’m adding DMA to this driver. But in any case, why not use a higher speed ADC connected via SPI or McASP? You can purchase boards that sample at a higher rate and connect those to the BBB.

Regards,
John




William Hermans

Jun 20, 2016, 8:50:02 PM
to beagl...@googlegroups.com
So going back to what I said earlier. When using /dev/mem + mmap(), yes, I was (actually) reading more than 200ksps from 7 channels simultaneously, for a total of somewhere around 1.5Msps.

***********BUT***********

Only 1 sample in ~8-9 per second was valid. So what I proved is possible is completely different from what is accurate / possible for the AM335x's on die ADC module.

John Syne

Jun 20, 2016, 9:06:22 PM
to beagl...@googlegroups.com
That is a totally different issue. You were reading the same sample over and over again as opposed to increasing the sample rate by changing the clock divider, open delay, sample delay, etc. In any case, at 200ksps, each sample occurs every 5uS. How is a user space app going to process samples at 5uS? Even when you poll for the EOC with 8 channels configured, you still have to service the samples every 40uS and that is still not possible from a userspace app. 

My guess is the app you had was reading the same sample at a higher rate than 1.5msps and then the scheduler switched out your app to service background tasks and then return a while later and then your app would read the same sample again. The average sample rate would then result in 1.5msps. Not a good idea. You should enable the channel ID to see when you miss samples. 
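
A minimal sketch of that check, assuming the step/channel ID tag is enabled and that each FIFO word carries the ID in bits 19:16 and the 12-bit sample in bits 11:0 (verify this layout against the TRM before relying on it). The helper names are hypothetical.

```python
def decode_fifo_word(word):
    """Split a FIFO word into (step_id, sample). The bit layout is an assumption."""
    return (word >> 16) & 0xF, word & 0xFFF


def find_gaps(words, n_steps):
    """Return indices where the step ID is not the next one in the round-robin."""
    gaps = []
    expected = None
    for i, word in enumerate(words):
        step_id, _ = decode_fifo_word(word)
        if expected is not None and step_id != expected:
            gaps.append(i)
        expected = (step_id + 1) % n_steps
    return gaps


# Three steps cycling 0, 1, 2; the word for step 1 of the second cycle is missing.
words = [0x00123, 0x10456, 0x20789, 0x00124, 0x2078A]
print(find_gaps(words, n_steps=3))  # [4]
```

This only detects drops when more than one step is enabled. With a single channel every word carries the same ID, so a FIFO overflow flag or a known test signal would be needed instead.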

Regards,
John





ybea...@rehut.com

Jun 20, 2016, 9:14:26 PM
to beagl...@googlegroups.com
The ADC is a SAR ADC. Overclocking it is likely to have side effects on the
analog end of things which will reduce accuracy.

Ignoring that it is a very bad idea to keep doing things in userland:

On Monday, June 20, 2016 18:06:09 John Syne wrote:
> That is a totally different issue. You were reading the same sample over and
> over again as opposed to increasing the sample rate by changing the clock
> divider, open delay, sample delay, etc. In any case, at 200ksps, each
> sample occurs every 5uS. How is a user space app going to process samples
> at 5uS? Even when you poll for the EOC with 8 channels configured, you
> still have to service the samples every 40uS and that is still not possible
> from a userspace app.

You could use the FIFOs to relax the read timing requirements, but you still
risk getting scheduled away for a (comparatively) long period of time. This
probably means you will also need to check the status for FIFO overflows and....

>
> My guess is the app you had was reading the same sample at a higher rate
> than 1.5msps and then the scheduler switched out your app to service
> background tasks and then return a while later and then your app would read
> the same sample again. The average sample rate would then result in
> 1.5msps. Not a good idea. You should enable the channel ID to see when you
> miss samples.

It is not just missing samples, but what is the overall accuracy of the data
when the ADC module is overclocked? Does it appear anywhere near the 12 bits
of accuracy?

--
Hunyue Yau
http://www.hy-research.com/

William Hermans

Jun 20, 2016, 9:32:39 PM
to beagl...@googlegroups.com
When you're writing directly to memory addresses (registers), you can't tell me what is, and what isn't. *This* is exactly what the PRU does when accessing peripheral modules. But, you'd be surprised what you can accomplish from userland when you pay close attention to what you should *not* do in order to remain performant.

Anyway, reading from a memory location and putting that value into another location does not really take a lot of computational power, and if you're using an rt kernel, the scheduler is going to run in a tighter loop, offering lower latency. But again, you have to be smart about how you go about things. Printing every, or even *any*, samples to stdout will slow things down considerably. Also, if you use a lot of API calls that have to go back and forth to / from the kernel . . . that's going to slow things down considerably also. etc . . .


John Syne

Jun 20, 2016, 9:36:51 PM
to beagl...@googlegroups.com
You make a very good point and we have no idea what the ADC front end bandwidth is. You would think that TI would add some provisions in the ADC setup that can be whatever you want as long as you do not exceed 200ksps. The 200ksps I believe comes from a setting where clock divider = 8, open delay =1, sample delay = 0 and oversample = 1x. The fact that they allow a 24MHz clock, and no warning is interesting. This was also discussed on E2E and no one from TI shut the idea down.

Anyway, if I can get the DMA to work, then we can test the concept ;-) If nothing else, at least we get a proper 200ksps ADC working on the ARM directly. No need for PRU.

Regards,
John

John Syne

Jun 20, 2016, 9:43:57 PM
to beagl...@googlegroups.com
The latency in the RT kernel is only relevant for interrupts (preemption). The scheduler will still swap out user space apps to service background tasks and that is the same as the regular kernel. In any case, the ADC uses a sequencer to do the samples and those samples get stored in the FIFO, so you have to wait until the FIFO count is greater than a certain number, then read until the FIFO is empty, and poll again for samples. From what I recall, you were just reading the samples from the IIO driver. I don’t believe you had code to set up the sequencer, or maybe I’m wrong?
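
The wait-then-drain pattern can be sketched against a simulated FIFO. Everything here is hypothetical: `FakeAdcFifo` is a stand-in for the hardware, its depth of 64 words is an assumption, and a real program would read the count and data registers instead of a Python deque.

```python
from collections import deque


class FakeAdcFifo:
    """Stand-in for the hardware FIFO; not a real driver interface."""

    def __init__(self, depth=64):
        self.depth = depth
        self.words = deque()
        self.overflowed = False

    def push(self, word):
        """Called by the simulated sequencer for each finished conversion."""
        if len(self.words) >= self.depth:
            self.overflowed = True   # FIFO full: this sample is lost
        else:
            self.words.append(word)

    def count(self):
        return len(self.words)

    def pop(self):
        return self.words.popleft()


def drain(fifo, threshold, out):
    """If at least `threshold` words are waiting, read until empty."""
    if fifo.count() < threshold:
        return 0
    n = 0
    while fifo.count() > 0:
        out.append(fifo.pop())
        n += 1
    return n


fifo, captured = FakeAdcFifo(depth=64), []
for sample in range(200):
    fifo.push(sample)
    if sample % 40 == 39:            # pretend the poller only runs every 40 samples
        drain(fifo, threshold=32, out=captured)
drain(fifo, threshold=1, out=captured)
print(len(captured), fifo.overflowed)
```

As long as the poller comes back before the FIFO fills, nothing is lost. Stretch the gap between polls beyond the FIFO depth (for example, poll every 100 samples) and `overflowed` becomes true, which is the failure mode being discussed.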

Regards,
John




William Hermans

Jun 20, 2016, 10:05:12 PM
to beagl...@googlegroups.com
I never shared my /dev/mem + mmap() code with the group / publicly, for what should be obvious reasons. In fact, I do not think I shared it with anyone. My reasoning is that if someone can figure out how to do this on their own, then they probably earned the right, and with that comes knowledge and responsibility.

As far as interrupts and rt. That does not matter. Interrupts happen all the time, and the more you have, for whatever reason, the slower your application will be. Regardless if your app generated that interrupt or not.

So if i understood what someone was talking about concerning remoteproc not long ago. Moving interrupts to userspace . . . would be a very very bad idea. You'd get all kind of context switching between processes / interrupts fighting for time. Then jumping into and out of kernel space, potentially copying data . . . yeah, very very bad idea.

William Hermans

Jun 20, 2016, 10:27:39 PM
to beagl...@googlegroups.com
I seemed to have lost a post here that I made. *Somehow* . . .

Anyway, I never shared my /dev/mem + mmap() code with anyone, and I will never post it on this group. So no one here would know what I've done in code concerning that. My reason is simple. In order to use code of that nature, you need to earn the right to do so, and hopefully have an understanding of what could happen if you're not careful.

Most of that code is peripheral setup, and the rest is simply reading from the ADC buffer, and then printf() to screen. However, in order to get the best performance you never let that data get put on screen; you pipe the output of the executable to a file instead. *That* increases performance drastically, and is a happy medium for not having to write a bunch of read() / write() / open() calls for a simple test. Perfect? Who cares, I never did. I proved a point to myself, and that is all that matters to me. I proved to myself that /dev/mem + mmap() works fine, and if you have an application that does not need to spend a lot of CPU time doing other things, then it would work fine. As it is, reading multiple ADC channels as fast as you can should probably be done using the PRUs, simply so you can use the CPU time saved to do other things, perhaps even displaying that data to the outside world from a web server.

Interrupts. They happen, and frequently. So it does not matter if your app generated interrupts or not. Your app will constantly be interrupted by them. So if you're using an rt kernel, "return latency" will be less. Meaning, your app should be able to get things done faster.

Which brings me to another point. I hope I was misunderstanding someone earlier this week talking about remoteproc and bringing interrupts to userspace. *That* would be a terrible idea, and would generate all kinds of context switching between userland, kernel space, processes, interrupts, copying data to / from kernel space. . .  yeah it would be a bloody mess. But you know what. That will just give me another reason to avoid what I'm already avoiding now.  So, for me, no big deal. I guess.

William Hermans

Jun 20, 2016, 10:49:10 PM
to beagl...@googlegroups.com
By the way, when I say read from the ADC buffer, I do not mean that piece of garbage /dev/iio:device0. I mean the ADC hardware buffer, FIFO0DATA, described on page 1095 of the TRM.

John Syne

Jun 21, 2016, 2:22:36 AM
to beagl...@googlegroups.com
Ah, I thought you were talking about this solution:


Otherwise, you would have to replicate much of the ADC driver in userspace and then loop: wait for FIFO0COUNT > 0, then read samples from FIFO0DATA until FIFO0COUNT = 0, doing this in a way that doesn’t hog the CPU but is still fast enough to avoid overflowing the FIFO. At 1.5Msps, you would have to do this in less than 21us, assuming an average FIFO count of 32. I just don’t see it, but maybe you have a trick that I don’t know about ;-)
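
The 21us figure follows from dividing the FIFO count by the sample rate. A small sketch of that deadline at a few of the rates discussed in this thread (the function name `drain_deadline_us` is hypothetical):

```python
def drain_deadline_us(fifo_count, rate_sps):
    """Microseconds the ADC needs to produce `fifo_count` new samples."""
    return fifo_count * 1e6 / rate_sps


for rate in (200_000, 800_000, 1_500_000):
    deadline = drain_deadline_us(32, rate)
    print(f"{rate / 1e3:g} kSPS: {deadline:.1f} us per 32 samples")
```

Every pass of the loop has to finish inside that window, every time, or samples are lost.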

Did you enable StepID? This way you can see if you missed any samples. 

Regards,
John




TJF

Jun 21, 2016, 4:36:42 AM
to BeagleBoard
Hello Stewart!


On Monday, June 20, 2016 at 21:29:43 UTC+2, Stewart wrote:
> (None of the libpruio built-in examples deal with rapid sampling or large amounts of data.)

The next libpruio version contains an example called rf_file, which does exactly what you are targeting. It uses the ring buffer mode to fetch data and writes it to file(s). It will be published when kernel development (>4.x) isn't experimental any more and I can start to write install instructions.

But you can use it today. See the following thread for details: https://groups.google.com/forum/#!topic/beagleboard/kxxucJAci2c

Regarding sampling rate:

I was able to sample at 1.6 MHz (one channel). But I wasn't able to fetch the samples from the FIFO at that speed. That means the number of samples is limited to 256. (I didn't try DMA yet.)

The maximum transfer rate for the FIFO is different on each chip. My boards reach about 240-250 ksps. I guess TI specifies 200 ksps to be on the safe side. Targeting 800 ksps is a waste of time; for that target you'll need additional hardware.

You can tune libpruio for higher sampling rates by adapting the number 5000 to your needs (period in [ns]) in the code of the  following link https://groups.google.com/d/msg/beagleboard/kxxucJAci2c/5nwXwyXZJQAJ
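
For picking that number, a minimal sketch of the conversion between a sampling period in nanoseconds and a sample rate. It does not call libpruio; the helper names are hypothetical.

```python
def rate_for(period_in_ns):
    """Sample rate in samples/s for a sampling period given in nanoseconds."""
    return 1e9 / period_in_ns


def period_ns_for(rate_in_sps):
    """Sampling period in whole nanoseconds for a target rate in samples/s."""
    return round(1e9 / rate_in_sps)


print(rate_for(5000))           # 200000.0
print(period_ns_for(250_000))   # 4000
```

So the default of 5000 ns corresponds to 200 ksps, and the 240-250 ksps reported above would need a period of about 4000 to 4170 ns.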

BR