Behavior of Ring Buffer with Respect to mmap()

242 views
Skip to first unread message

BJ Dillon

unread,
Dec 12, 2017, 5:17:15 PM12/12/17
to Comedi: Linux Control and Measurement Device Interface

I have a piece of C code that copies the contents of comedi's ring buffer to a file-backed memory space (mmap). I am seeing some strange behavior when using  comedi_get_buffer_contents() and comedi_mark_buffer_read().  I was hoping to confirm the expected behavior for these functions. 

The intended design of my code is to spawn a thread that periodically calls comedi_get_buffer_contents(). On new data appearing in the ring buffer, this thread copies that data to the memory map's address space. On completing the copy, the thread returns to periodically calling comedi_get_buffer_contents() until new data appears. For my application, this loop needs a low execution profile, so calls to things like comedi_mark_buffer_read() need to be restricted in the number of times they are used during this loop. Presently,  comedi_mark_buffer_read() is called once per ring buffer fill. In other words, I'd like my code to reset the read offset when comedi reaches the last valid address of the buffer -- think of a carriage return on a typewriter.  

Comedi may overwrite the first few bytes of the ring buffer by the time my code has re-started copying, but because the entire ring buffer has already been copied, no data is lost. 

Here's my trouble... This code has worked well on an older P4 / debian 7 system. However when I migrated this code and pci daq card to a xeon / debian 9 system my code dies on account of comedi_get_buffer_contents() returning a -1 after  comedi_mark_buffer_read() is called with the size of the ring buffer. 

Here are my questions:  Has anything changed in the implementation of the ring buffer system between 2013-ish and now? Secondly, what is the expected behavior of these two functions used in this way -- does / should comedi detect ring buffer overwrites / lost data? 

Thanks, everyone.


Ian Abbott

unread,
Dec 13, 2017, 6:11:30 AM12/13/17
to comed...@googlegroups.com, BJ Dillon
There have been various changes to the buffer handling ioctls since
2013, so the behaviour will have changed a bit.

I wouldn't recommend calling comedi_mark_buffer_read() only every
"buffer size" bytes, as that would make leave the buffer vulnerable to
overflow errors because the part of the buffer that hasn't been marked
as read yet cannot be filled with new data.

One option would be call comedi_mark_buffer_read() slightly more often
so that the buffer never fills up completely.

Another option is to roll your own function that combines the
functionality of comedi_get_buffer_contents() and
comedi_mark_buffer_read(), since they both use the same comedi ioctl
internally. Here is one possible (untested) function:

int mark_read_and_get_new(comedi_t *it, unsigned int subdev,
unsigned int mark_read_bytes,
unsigned int *get_new_bytes)
{
int ret;
int fd;
comedi_bufinfo bi;

fd = comedi_fileno(it);
if (fd == -1)
return -1;
memset(&bi, 0, sizeof(bi));
bi.subdevice = subdev;
bi.bytes_read = mark_bytes_read;
ret = ioctl(fd, COMEDI_BUFINFO, &bi);
if (ret < 0)
return -1;
/* get new contents */
*get_new_bytes = bi.buf_write_count - bi.buf_read_count;
/* return amount marked read */
return bi.bytes_read;
}

The above function returns the same value as comedi_mark_buffer_read().
The 'mark_read_bytes' parameter can be set to 0 if you don't want to
mark anything as read. As a side-effect, it sets the '*get_new_bytes'
parameter to what would be returned by comedi_get_buffer_contents().
There are some slight differences in error handling, because it does not
update the "comedi errno" returned by the comedi_errno() function.
However, it does update the standard C 'errno'.

--
-=( Ian Abbott @ MEV Ltd. E-mail: <abb...@mev.co.uk> )=-
-=( Web: http://www.mev.co.uk/ )=-

BJ Dillon

unread,
Dec 21, 2017, 11:21:03 AM12/21/17
to Comedi: Linux Control and Measurement Device Interface
Hello Ian,

Thank you so very much for your quick response. And sorry about my tardy one -- I was at a conference most of last week.

I've had a chance to tinker around with this a bit more and my code was fairly easily modified with the insights you provided. Again, thank you.

However, I have come across another complication with the buffer that I can't seem to understand and I was hoping I could bend your ear for a second time?

It appears as though comedi is selectively dropping a few records from each buffer update. The number of records that are dropped seems to depend on the number of channels I am sampling.

As I mentioned briefly in my last email, my code runs in a loop. Each cycle checks comedi_get_buffer_contents(). If there is a buffer update, the contents are read out and (now) instantly marked as read using comedi_mark_buffer_read(). If there is no update, the loop sleeps for a bit, wakes up, and rechecks comedi_get_buffer_contents().

This method seems to work very well using one or two channels. In these cases it appears that comedi_get_buffer_contents() regularly returns "4096" as the number of bytes ready to be read. This would seem to make sense as the page size on my system is 4k. However when sampling from three, five, six, or seven channels comedi_get_buffer_contents() consistently returns 4092, 4090, 4092, or 4088 bytes, respectively. Which wouldn't worry me so much, other than it appears that the bytes between the number reported by comedi_get_buffer_contents() and 4096 represent records that are not being reported / copied into the buffer / recorded?  As a result,  when plotting the time histories of each channel, after each 409x bytes, the plot lines shift up by a number of channels that corresponds to the number of missing bytes. For example in the case of sampling from three channels (4092 byte buffer updates)  the signal measured by channel three (physical) gets put into the array representing channel one (logical). Channel two (physical) is put into channel three (logical) and channel one (physical) is put into channel two (logical). This permutation (or maybe rotation?) continues for as long as data is being sampled -- every ~4k bytes. 

To add to this strangeness, if I request 61,440 records from three channels, comedi will give me 61,440 records, but the last 20 records (modulus of the "missing" bytes) are just extra channel samples.

At first I thought this was some mishandling of the buffer copy process or implementation of the comedi_mark_buffer_read() function on my part. To test this, I set the buffer to its maximum value, started a recording, waited until a large number of records built up (many 4k sets) in the buffer and then read the buffer sequentially from start to finish in one go (just address++ from 0 to to comedi_get_max_buffer_size()). This method results in the same behavior as described above - after each ~4k bytes a few records go "missing" and the lower channels shift up by a number that corresponds to the number of "missing" bytes.

Does this seem to make any sense to you? Can you recommend any other tests to confirm this behavior?   BTW: I'm using a MCC PCI-DAS6402/16 card. 
 
Thanks,
Brandon

DSC_9107_mod.JPG

BJ Dillon

unread,
Feb 12, 2018, 12:46:04 PM2/12/18
to Comedi: Linux Control and Measurement Device Interface
Just checking in on this. Any thoughts? 

Thanks
-Brandon

Frank Mori Hess

unread,
Feb 12, 2018, 3:35:40 PM2/12/18
to comed...@googlegroups.com
On Thu, Dec 21, 2017 at 11:21 AM, BJ Dillon <bdi...@vt.edu> wrote:
> This method seems to work very well using one or two channels. In these
> cases it appears that comedi_get_buffer_contents() regularly returns "4096"
> as the number of bytes ready to be read. This would seem to make sense as
> the page size on my system is 4k. However when sampling from three, five,
> six, or seven channels comedi_get_buffer_contents() consistently returns
> 4092, 4090, 4092, or 4088 bytes, respectively. Which wouldn't worry me so
> much, other than it appears that the bytes between the number reported by
> comedi_get_buffer_contents() and 4096 represent records that are not being
> reported / copied into the buffer / recorded? As a result, when plotting
> the time histories of each channel, after each 409x bytes, the plot lines
> shift up by a number of channels that corresponds to the number of missing
> bytes. For example in the case of sampling from three channels (4092 byte
> buffer updates) the signal measured by channel three (physical) gets put
> into the array representing channel one (logical). Channel two (physical) is
> put into channel three (logical) and channel one (physical) is put into
> channel two (logical). This permutation (or maybe rotation?) continues for
> as long as data is being sampled -- every ~4k bytes.
>
> To add to this strangeness, if I request 61,440 records from three channels,
> comedi will give me 61,440 records, but the last 20 records (modulus of the
> "missing" bytes) are just extra channel samples.

Are you using TRIG_WAKE_EOS? That flag determines whether the driver
uses dma transfers, I would guess you are not using it. You could try
setting it to see if the problem is being caused by DMA. Also, does
your command ask for a fixed number of samples? DMA transfers should
occur in fixed size chunks of 4096 bytes for your board, unless you
asked for a specific number of scans and it is transferring the last
chunk of data.

Frank Mori Hess

unread,
Feb 13, 2018, 9:33:46 PM2/13/18
to comed...@googlegroups.com
Looking at the version of Comedi in the Linux kernel repos, it looks
like it will transfer less than 4096 bytes if you have run out of
space in comedi's buffer. It does set the COMEDI_CB_OVERFLOW flag
when this happens, but that doesn't appear to actually cause the
acquisition to stop (it's been a long time but I think it used to).
So the bytes that didn't fit into comedi's buffer get discarded,
causing your problems.

--
Frank

Frank Mori Hess

unread,
Feb 14, 2018, 1:00:53 AM2/14/18
to comed...@googlegroups.com
On Tue, Feb 13, 2018 at 9:33 PM, Frank Mori Hess <fmh...@gmail.com> wrote:
>
> Looking at the version of Comedi in the Linux kernel repos, it looks
> like it will transfer less than 4096 bytes if you have run out of
> space in comedi's buffer. It does set the COMEDI_CB_OVERFLOW flag
> when this happens, but that doesn't appear to actually cause the
> acquisition to stop (it's been a long time but I think it used to).
> So the bytes that didn't fit into comedi's buffer get discarded,
> causing your problems.
>

Oops, I've changed my mind. Now I believe the problem is due to
comedi_nsamples_left() in comedi/drivers.c returning the wrong value.
It rounds down an intermediate variable:

unsigned int nscans = nsamples / cmd->scan_end_arg;

which can cause the returned number of samples to be less than it
should be. Can you tell I'm debugging without hardware?

--
Frank

BJ Dillon

unread,
Feb 14, 2018, 4:20:55 PM2/14/18
to Comedi: Linux Control and Measurement Device Interface
Hey Frank,

Super awesome reply. Very helpful. I'm still looking into it from your prescriptive, but wanted to let you know your comments are helping -- I just need a bit more time to wade through all the C. 

-Brandon 

Ian Abbott

unread,
Feb 15, 2018, 11:49:55 AM2/15/18
to comed...@googlegroups.com
That code will only be executed if the command's 'stop_src' is
TRIG_COUNT. Otherwise, comedi_nsamples_left() will just return the
number of samples value that was passed into it.

Brandon, what are you setting 'stop_src' to in your command?

BJ Dillon

unread,
Feb 16, 2018, 12:41:33 AM2/16/18
to Comedi: Linux Control and Measurement Device Interface
Hello Ian!

Thanks for chiming in on the conversation. I know if anyone can fix this issue, its you. I also recognize that you're probably a pretty busy guy, so a double thanks for your time. 

To answer your question.... I'm not setting 'stop_src' to anything explicitly. But it appears that a call to comedi_get_cmd_generic_timed() sets it to 0x20 (aka 'TRIG_COUNT'). Hope that helps.


My investigation, however, is now focused on my card's specific driver (drivers/das6402.c).  Here's what I know so far...

_comedi_get_buffer_contents() in lib/buffer.c is the function that returns the screwy values mentioned in my posts above (e.g. 4092 when it should be 4096). It seems like these values are generated from one line of code:  "return bi.buf_write_count - bi.buf_read_count; "

These values only appear to be meaningfully modified in two functions found in comedi/comedi_fops.c:  comedi_buf_write_free() and comedi_buf_read_free()  

It also seems that the only functions that call comedi_buf_write_free() and comedi_buf_read_free() are comedi_buf_get() and comedi_buf_put(). And since it seems like these are the functions that manipulate the buffer during an async acquisition, I figured that if 4092 bytes are being reported by comedi_get_buffer_contents() it really means that only 4092 meaningful bytes are available in the buffer -- it doesn't look like much can go wrong in that chain of function calls. 

On the off chance that I was wrong in my analysis and 4096 meaningful bytes are being written to the buffer but for some reason (that's beyond my understanding of the code) only 4092 are being reported as being avaiable, I modified my program to read out 4096 bytes when 4092 were being reported. The behavior is as you might expect, the extra bytes are garbage. This leads me to the conclusion that somewhere at a level that's dealing directly with _comedi_ioctl() and the kernel ioctl() that only 4092 bytes are being commanded to be transferred to the buffer. But I'm at a bit of a loss as to where to start searching for that functionality. As a guess, it seems that comedi_buf_put() is generally called from the specific DAQ card's driver. So for me, that would be drivers/das6402.c. 

This is where things get confusing. Older version of das6402.c (prior to linux 3.15) makes calls to comedi_buf_put() inside of a while(1) loop which is housed inside of das6402_ai_fifo_dregs(). That would seem to make sense in my general understanding of how the kernel should transfer data off of the card and into main memory. But newer versions of das6402.c (starting in linux 3.15-rc1) don't define or use a das6402_ai_fifo_dregs() function, nor does the driver seem to make any calls to comedi_buf_put(). 

It's going to take me some time to understand how this new driver works. But I can answer one of my initial questions: "Has anything changed in the implementation of the ring buffer system between 2013-ish and now?" 

It would seem that the change (linux 3.15 was mid 2014'ish) wasn't necessarily in the ring buffer implementation, but rather how my card's driver is interacting with it -- at least that's the working theory. 

If any of you have some insight on this, I'd love to hear it! In the mean time it looks like I'll be digging through drivers/das6402.c

Thanks!
-Brandon 

Frank Mori Hess

unread,
Feb 16, 2018, 12:57:07 AM2/16/18
to comed...@googlegroups.com
On Fri, Feb 16, 2018 at 12:41 AM, BJ Dillon <bdi...@vt.edu> wrote:

> If any of you have some insight on this, I'd love to hear it! In the mean
> time it looks like I'll be digging through drivers/das6402.c
>

Your driver is drivers/cb_pcidas64.c and see the drain_dma_buffers() function.

Ian Abbott

unread,
Feb 16, 2018, 5:31:25 AM2/16/18
to comed...@googlegroups.com
On 16/02/18 05:41, BJ Dillon wrote:
> Hello Ian!
>
> Thanks for chiming in on the conversation. I know if anyone can fix this
> issue, its you. I also recognize that you're probably a pretty busy guy,
> so a double thanks for your time.
>
> To answer your question.... I'm not setting 'stop_src' to anything
> explicitly. But it appears that a call to comedi_get_cmd_generic_timed()
> sets it to 0x20 (aka 'TRIG_COUNT'). Hope that helps.

What happens if you set stop_src = TRIG_NONE and stop_arg = 0 after the
call to comedi_get_cmd_generic_timed()? If Frank is correct about the
faulty comedi_nsamples_left() behaviour (as I think he is), that would
avoid the problem with the channel data going missing at the ends of the
4096-byte blocks. However, the command will now run "forever" until you
call comedi_cancel() for the subdevice, or call comedi_close() fore the
device.

BJ Dillon

unread,
Feb 16, 2018, 4:59:50 PM2/16/18
to Comedi: Linux Control and Measurement Device Interface
Hey Ian and Frank,

Setting those two variables as directed solved the problem. Time histories of all channel combinations work beautifully now. 

I understand the rounding error, but I'm having trouble grasping how that trouble fits into the bigger picture. I've (clearly) missed how comedi_nsamples_left() is incorporated into the schema of an async acquisition session. 


-Brandon 

Ian Abbott

unread,
Feb 19, 2018, 5:13:05 AM2/19/18
to comed...@googlegroups.com
On 16/02/18 21:59, BJ Dillon wrote:
> Hey Ian and Frank,
>
> Setting those two variables as directed solved the problem. Time
> histories of all channel combinations work beautifully now.

That's good news!

>
> I understand the rounding error, but I'm having trouble grasping how
> that trouble fits into the bigger picture. I've (clearly) missed how
> comedi_nsamples_left() is incorporated into the schema of an async
> acquisition session.

The trouble is that the broken version of comedi_nsamples_left() isn't
doing what it is supposed to. It is meant to return the smaller of the
number of samples passed in as a parameter and the number of samples
required to complete the acquisition. When stop_src is TRIG_COUNT, the
bug makes comedi_nsamples_left() return the smaller of number of samples
passed in rounded down to a multiple of the scan length (scan_arg) and
the number of samples required to complete the acquisition.

The drain_dma_buffers() function in
"drivers/staging/comedi/drivers/cb_pcidas64.c" is trying to copy all
filled DMA buffers to the comedi buffer, and it only gets one chance to
do so. It calls comedi_nsamples_left() with the number of samples in
the DMA buffer and expects that to return the value passed in except at
the end of acquisition when it may be smaller. It duly transfers the
number of samples returned by comedi_nsamples_left() from the DMA buffer
to the comedi buffer and moves on to the next DMA buffer if any.
However, the bug in comedi_nsamples_left() means it may not have
transferred a few samples from the end of each DMA buffer, leading to
the problem you have observed.

Setting stop_src to TRIG_NONE avoids the bug. Frank's fix should make
it into the mainline Linux kernel tree ready for the 4.17 kernel, and
then trickle down to the maintained stable and longterm kernels, but
until then, you will need to use the workaround.

BJ Dillon

unread,
Feb 19, 2018, 1:45:56 PM2/19/18
to Comedi: Linux Control and Measurement Device Interface
Hey Ian,

Sounds good. I'm very happy with the work around. 

Thank you for your help and also to Frank. 

-Brandon 
Reply all
Reply to author
Forward
0 new messages