Scheduler question

Daniel O'Connor

Feb 3, 2011, 9:56:45 PM

to freebsd-hackers Hackers

Hi,
I am writing a program which reads from a data acquisition chassis connected to a radar via USB. The interface is a Cypress FX2 and I am communicating via libusb.

The program starts a thread which sits in a loop doing nothing but calling libusb_handle_events_timeout(), which in turn runs a callback when a transfer completes. Each transfer is in a struct which has a mutex and a 'done' flag (the former protects the latter); the flag is set when libusb runs the callback.

The main thread sits in a loop waiting for the next transfer to be done and, when it is, copies the data out to be further processed and then written out to disk and/or passed to another process for some further mangling.

After each USB transfer is done with (ie the main thread has passed it all out for further processing) the main thread re-submits it to libusb.
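
A minimal sketch of the two-thread handoff described above, written in Python rather than C and without libusb. The names Transfer, event_thread, main_loop and run_demo are hypothetical, and the event thread only stands in for the loop around libusb_handle_events_timeout(). The message does not say whether the main thread polls the flag or sleeps on a condition variable; polling with a short sleep is assumed here.

```python
import threading
import time


class Transfer:
    """One USB transfer slot; `done` is protected by `lock`."""

    def __init__(self):
        self.lock = threading.Lock()
        self.done = False
        self.data = b""


def event_thread(transfers, payloads):
    # Stand-in for the libusb event loop: each simulated completion
    # callback stores the data and sets the done flag under the mutex.
    for xfer, payload in zip(transfers, payloads):
        with xfer.lock:
            xfer.data = payload
            xfer.done = True


def main_loop(transfers):
    # Consume transfers in submission order, polling each done flag.
    out = []
    for xfer in transfers:
        while True:
            with xfer.lock:
                if xfer.done:
                    out.append(xfer.data)  # copy out for processing
                    xfer.done = False      # slot is free to re-submit
                    break
            time.sleep(0.0005)             # assumed polling interval
    return out


def run_demo():
    # Returns True when every payload arrives intact and in order.
    payloads = [bytes([i]) * 4 for i in range(8)]
    transfers = [Transfer() for _ in payloads]
    t = threading.Thread(target=event_thread, args=(transfers, payloads))
    t.start()
    received = main_loop(transfers)
    t.join()
    return received == payloads
```

The point of the structure is that the event thread never touches the disk, so in principle it should keep servicing completions while the main thread is blocked on I/O.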

I only have about 10 milliseconds of buffering (96kbyte FIFO, 8Mbyte/sec) in the hardware, however I have about 128Mb of USB requests queued up to libusb. hps@ informed me that libusb will only queue 16kbyte (2msec) in the kernel at one time although I have increased this.
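
The buffering figures above can be checked with a little arithmetic. This sketch assumes binary units (1 kbyte = 1024 bytes, 1 Mbyte = 1024 * 1024 bytes), which the message does not specify; buffer_ms is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * 1024


def buffer_ms(buffer_bytes, rate_bytes_per_s):
    """Time in milliseconds that a buffer of the given size covers."""
    return 1000.0 * buffer_bytes / rate_bytes_per_s


fifo_ms = buffer_ms(96 * KIB, 8 * MIB)    # hardware FIFO
kernel_ms = buffer_ms(16 * KIB, 8 * MIB)  # default libusb kernel queue
print(f"FIFO {fifo_ms:.1f} ms, kernel queue {kernel_ms:.2f} ms")
```

Under those assumptions the FIFO covers a little under 12 ms and the default kernel queue a little under 2 ms, in line with the "about 10 milliseconds" and "2msec" quoted.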

I hooked up a logic analyser and I can see most of the time it's fairly regularly transferring 16k of data every 2msec.

If I load up the disk by, eg, tar -cf /dev/null /local0 I find it drops out and I can see gaps in the transfers until eventually the FIFO fills up and it stops.

I am wondering if this is a scheduler problem (or I am expecting too much :) in that it is not running my libusb thread reliably under load. The other possibility is that it is a USB issue, although I am looking at using isochronous transfers instead of bulk.

I just noticed that the USB controller and ATA controller share an IRQ, but I wouldn't expect that to cause a problem..

This is running on FreeBSD 8.1-STABLE, Core 2 Duo with ICH9 chipset.

Thanks.

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"

Ivan Voras

Feb 4, 2011, 6:19:45 AM

to freebsd...@freebsd.org

On 04/02/2011 03:56, Daniel O'Connor wrote:

> I hooked up a logic analyser and I can see most of the time it's fairly regularly transferring 16k of data every 2msec.
>
> If I load up the disk by, eg, tar -cf /dev/null /local0 I find it drops out and I can see gaps in the transfers until eventually the FIFO fills up and it stops.
>
> I am wondering if this is a scheduler problem (or I am expecting too much :) in that it is not running my libusb thread reliably under load. The other possibility is that it is a USB issue, although I am looking at using isochronous transfers instead of bulk.

I'm surprised this isn't complained about more often - I also regularly
see that file system activity blocks other, non-file-using processes
which are mostly CPU and memory intensive (but since I'm not running
realtime things, it fell under the "good enough" category). Maybe there
is a global-ish lock of some kind which the VM or the VFS holds
which would interfere with normal operation of other processes (maybe
when the processes use malloc() to grow their memory?).

Could you try 2 things:

1) instead of doing file IO, could you directly use a disk device (e.g.
/dev/ad0), possibly with some more intensive utility than dd (e.g.
"diskinfo -vt") and see if there is any difference?

2) if there is a difference in 1), try modifying your program to not
use malloc() in the critical path (if applicable) and/or use mlock(2)?

Daniel O'Connor

Feb 4, 2011, 6:46:20 AM

to Ivan Voras, freebsd...@freebsd.org

On 04/02/2011, at 21:48, Ivan Voras wrote:
>> I am wondering if this is a scheduler problem (or I am expecting too much :) in that it is not running my libusb thread reliably under load. The other possibility is that it is a USB issue, although I am looking at using isochronous transfers instead of bulk.
>
> I'm surprised this isn't complained about more often - I also regularly see that file system activity blocks other, non-file-using processes which are mostly CPU and memory intensive (but since I'm not running realtime things, it fell under the "good enough" category). Maybe there is a global-ish lock of some kind which the VM or the VFS holds which would interfere with normal operation of other processes (maybe when the processes use malloc() to grow their memory?).

I guess for an interactive user anything less than 100msec is probably not noticeable unless it happens reasonably regularly when watching a video.

> Could you try 2 things:
>
> 1) instead of doing file IO, could you directly use a disk device (e.g. /dev/ad0), possibly with some more intensive utility than dd (e.g. "diskinfo -vt") and see if there is any difference?

OK, I'll give it a shot.

> 2) if there is a difference in 1), try modifying your program to not use malloc() in the critical path (if applicable) and/or use mlock(2)?

It doesn't allocate memory once it's going, everything is preallocated before the data transfer starts.

I'll have a go with mlock() and see what happens.

Thanks :)

Ivan Voras

Feb 4, 2011, 7:40:31 PM

to freebsd...@freebsd.org

On 04/02/2011 12:45, Daniel O'Connor wrote:
>
> On 04/02/2011, at 21:48, Ivan Voras wrote:
>>> I am wondering if this is a scheduler problem (or I am expecting too much :) in that it is not running my libusb thread reliably under load. The other possibility is that it is a USB issue, although I am looking at using isochronous transfers instead of bulk.
>>
>> I'm surprised this isn't complained about more often - I also regularly see that file system activity blocks other, non-file-using processes which are mostly CPU and memory intensive (but since I'm not running realtime things, it fell under the "good enough" category). Maybe there is a global-ish lock of some kind which the VM or the VFS holds which would interfere with normal operation of other processes (maybe when the processes use malloc() to grow their memory?).
>
> I guess for an interactive user anything less than 100msec is probably not noticeable unless it happens reasonably regularly when watching a video.
>
>> Could you try 2 things:
>>
>> 1) instead of doing file IO, could you directly use a disk device (e.g. /dev/ad0), possibly with some more intensive utility than dd (e.g. "diskinfo -vt") and see if there is any difference?
>
> OK, I'll give it a shot.
>
>> 2) if there is a difference in 1), try modifying your program to not use malloc() in the critical path (if applicable) and/or use mlock(2)?
>
> It doesn't allocate memory once it's going, everything is preallocated before the data transfer starts.
>
> I'll have a go with mlock() and see what happens.

Did you find anything interesting?

Daniel O'Connor

Feb 4, 2011, 9:14:11 PM

to Ivan Voras, freebsd...@freebsd.org

On 05/02/2011, at 11:09, Ivan Voras wrote:
>> It doesn't allocate memory once it's going, everything is preallocated before the data transfer starts.
>>
>> I'll have a go with mlock() and see what happens.
>
> Did you find anything interesting?

I'll be looking at it on Monday, I will let you know :)

Daniel O'Connor

Feb 6, 2011, 8:42:03 PM

to Daniel O'Connor, freebsd...@freebsd.org, Ivan Voras

On 05/02/2011, at 12:43, Daniel O'Connor wrote:
> On 05/02/2011, at 11:09, Ivan Voras wrote:
>>> It doesn't allocate memory once it's going, everything is preallocated before the data transfer starts.
>>>
>>> I'll have a go with mlock() and see what happens.
>>
>> Did you find anything interesting?
>
> I'll be looking at it on Monday, I will let you know :)

No luck with mlock(), so it doesn't appear that paging is the issue :(

Ivan Voras

Feb 6, 2011, 9:33:47 PM

to Daniel O'Connor, freebsd...@freebsd.org

On 7 February 2011 02:41, Daniel O'Connor <doco...@gsoft.com.au> wrote:
>
> On 05/02/2011, at 12:43, Daniel O'Connor wrote:
>> On 05/02/2011, at 11:09, Ivan Voras wrote:
>>>> It doesn't allocate memory once it's going, everything is preallocated before the data transfer starts.
>>>>
>>>> I'll have a go with mlock() and see what happens.
>>>
>>> Did you find anything interesting?
>>
>> I'll be looking at it on Monday, I will let you know :)
>
> No luck with mlock(), so it doesn't appear that paging is the issue :(

I'm also interested in raw device vs file system access!

Daniel O'Connor

Feb 6, 2011, 10:13:31 PM

to Ivan Voras, freebsd...@freebsd.org

On 07/02/2011, at 13:02, Ivan Voras wrote:
>>> I'll be looking at it on Monday, I will let you know :)
>>
>> No luck with mlock(), so it doesn't appear that paging is the issue :(
>
> I'm also interested in raw device vs file system access!

Oops, sorry.. I just tried that now but it doesn't improve things :(

I am writing directly to /dev/ad10 but stressing /dev/ad14 (sudo tar -cf /dev/null /local0)

It is interesting also that if I have md5's soaking up CPU then it's much less likely to start streaming properly and generally bombs out straight away. If I start it streaming and then start md5 it stays running... (even if it's rtprio'd)

Ivan Voras

Feb 7, 2011, 5:39:24 AM

to Daniel O'Connor, freebsd...@freebsd.org

On 07/02/2011 04:12, Daniel O'Connor wrote:
>
> On 07/02/2011, at 13:02, Ivan Voras wrote:
>>>> I'll be looking at it on Monday, I will let you know :)
>>>
>>> No luck with mlock(), so it doesn't appear that paging is the issue :(
>>
>> I'm also interested in raw device vs file system access!
>
> Oops, sorry.. I just tried that now but it doesn't improve things :(

Meaning: you still get jitter?

> I am writing directly to /dev/ad10 but stressing /dev/ad14 (sudo tar -cf /dev/null /local0)

Can you do only one of those things? I.e. leave all the file systems
alone and just do something like 'diskinfo -vt /dev/ad14'?

Daniel O'Connor

Feb 7, 2011, 7:39:27 AM

to Ivan Voras, freebsd...@freebsd.org

On 07/02/2011, at 21:07, Ivan Voras wrote:
>>> I'm also interested in raw device vs file system access!
>>
>> Oops, sorry.. I just tried that now but it doesn't improve things :(
>
> Meaning: you still get jitter?

Yes, well I didn't measure the read frequency but it dropped out (stopped streaming due to a full FIFO) no less often.

>> I am writing directly to /dev/ad10 but stressing /dev/ad14 (sudo tar -cf /dev/null /local0)
>
> Can you do only one of those things? I.e. leave all the file systems
> alone and just do something like 'diskinfo -vt /dev/ad14'?

OK, I wrote the data to /dev/null from USB and ran diskinfo in a loop and it doesn't drop out.

Ivan Voras

Feb 7, 2011, 8:07:50 AM

to Daniel O'Connor, freebsd...@freebsd.org

On 7 February 2011 13:38, Daniel O'Connor <doco...@gsoft.com.au> wrote:

>>> I am writing directly to /dev/ad10 but stressing /dev/ad14 (sudo tar -cf /dev/null /local0)
>>
>> Can you do only one of those things? I.e. leave all the file systems
>> alone and just do something like 'diskinfo -vt /dev/ad14'?
>
> OK, I wrote the data to /dev/null from USB and ran diskinfo in a loop and it doesn't drop out.

Maybe I misunderstood you and it's a different problem than what I was
experiencing; is this a better description of your problem:

1) you have a program communicating with a USB device
2) it reads from the device and writes to a file
3) you experience stalls when you write the data received from the USB
device to the file but only if the file system you're writing on is
also loaded by something else - heavy reads?

?

Daniel O'Connor

Feb 7, 2011, 8:45:20 AM

to Ivan Voras, freebsd...@freebsd.org

On 07/02/2011, at 23:36, Ivan Voras wrote:
>> OK, I wrote the data to /dev/null from USB and ran diskinfo in a loop and it doesn't drop out.
>
> Maybe I misunderstood you and it's a different problem than what I was
> experiencing; is this a better description of your problem:
>
> 1) you have a program communicating with a USB device
> 2) it reads from the device and writes to a file
> 3) you experience stalls when you write the data received from the USB
> device to the file but only if the file system you're writing on is
> also loaded by something else - heavy reads?
>
> ?

Yes, however CPU loading also seems to affect it.

Unfortunately I don't have a useful measurement to show the problem - ie I don't have a metric which correlates with the hardware FIFO filling up.
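
One candidate metric, offered as an assumption rather than something tried in this thread: timestamp each transfer completion on the host and track the largest gap between consecutive completions. A gap approaching the FIFO depth in time would suggest the hardware came close to overflowing. worst_gap_ms is a hypothetical helper, and the timestamps below are made up; real ones might come from a monotonic clock read in the completion callback.

```python
def worst_gap_ms(completion_times_s):
    """Largest interval between consecutive completions, in ms."""
    gaps = [b - a for a, b in zip(completion_times_s, completion_times_s[1:])]
    return 1000.0 * max(gaps) if gaps else 0.0


# Made-up timestamps (seconds): regular 2 ms completions with one stall.
stamps = [0.000, 0.002, 0.004, 0.016, 0.018]
print(f"worst gap: {worst_gap_ms(stamps):.1f} ms")
```

Host-side completion times only approximate what the FIFO sees, so this would at best correlate with overflow rather than measure it directly.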

This makes the testing rather annoying :)

Matthew Dillon

Feb 10, 2011, 3:29:26 PM

to Daniel O'Connor, freebsd...@freebsd.org, Ivan Voras

It sounds like there are at least two issues involved.

The first could be a buffer cache starvation issue due to the load on
the filesystem from the tar. If the usb program is doing any filesystem
operation at all, even at low bandwidths, it could be hitting blockages
due to the disk intensive tar eating up available buffer cache buffers
(e.g. causing an excessive ratio of dirty buffers vs clean buffers).
This would NOT be a scheduler problem per se, but instead a kernel
resource management problem.

The way to test this is to double-buffer or triple-buffer the output
via shared memory. A pipe might not do the job if it gets stuck doing
direct transfers (I eventually gave up trying to optimize pipes in DFly
due to a similar problem and just run everything through a kernel buffer
now). Still, it may be possible to test against this particular problem
by having the program write to a pipe and another program or fork handle
the actual writing to the disk or filesystem.
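
A rough sketch of the decoupling idea, as an in-process variant: a bounded queue and a writer thread take the place of shared memory or a pipe between two processes, so this is not the exact arrangement suggested above. The names writer and acquire are hypothetical, and depth=3 is an assumed stand-in for triple buffering.

```python
import io
import queue
import threading


def writer(q, sink):
    # Drain buffers until the None sentinel arrives; only this thread
    # ever blocks on the (possibly slow) sink.
    while True:
        buf = q.get()
        if buf is None:
            break
        sink.write(buf)


def acquire(buffers, sink, depth=3):
    q = queue.Queue(maxsize=depth)  # depth=3 approximates triple buffering
    t = threading.Thread(target=writer, args=(q, sink))
    t.start()
    dropped = 0
    for buf in buffers:
        try:
            q.put_nowait(buf)       # never block the acquisition side
        except queue.Full:
            dropped += 1            # writer fell behind; count the overrun
    q.put(None)                     # blocking put so the sentinel gets in
    t.join()
    return dropped


sink = io.BytesIO()
dropped = acquire([b"x" * 16] * 10, sink)
print(len(sink.getvalue()), "bytes written,", dropped, "buffers dropped")
```

Counting drops instead of blocking makes a stalled writer show up as a number rather than as a silent delay on the acquisition side.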

Another way to test this is to comment out the writing in the usb program
entirely and see if things improve.

--

The second issue sounds more scheduler-related. Try running the
usb program at nice -20? You could even run it at a pseudo-realtime
priority using rtprio but nice -20 had better work properly against
an md5 or there is something seriously broken in the scheduler.

Dynamic priority handling is supposed to deal with this sort of thing
automatically, particularly if the usb program is not using a lot of
cpu, but sometimes it can't tell whether a newly-exec'd program is
going to be interactive or batch until after it has run for a while.

Tuning initial conditions after an exec for the scheduler is not an
easy task. Simply giving a program a more batch/bulk-run priority on
exec and letting the dynamic priority shift it more to interactive
operation tends to mess up interactive shells in the face of
cpu-intensive system operation, for example. Theoretically dynamic
priority handling should bump up the priority for the usb program well
beyond any initial conditions for exec once it has been running a while,
assuming it doesn't use tons of cpu.

--

An md5, or any single-file reading operation, would not overload the
buffer cache. File writing and/or multi-file operations (such as a
tar extraction or a tar-up) can create blockages in the buffer cache.

It takes a considerable amount of VM/buffer-cache tuning to get those
subsystems to pipeline properly and sometimes things can go stale and
stop pipelining properly for months without anyone realizing it.

-Matt

Daniel O'Connor

Feb 14, 2011, 9:00:23 PM

to Matthew Dillon, freebsd...@freebsd.org, Ivan Voras

On 11/02/2011, at 6:58, Matthew Dillon wrote:
> It sounds like there are at least two issues involved.
>
> The first could be a buffer cache starvation issue due to the load on
> the filesystem from the tar. If the usb program is doing any filesystem
> operation at all, even at low bandwidths, it could be hitting blockages
> due to the disk intensive tar eating up available buffer cache buffers
> (e.g. causing an excessive ratio of dirty buffers vs clean buffers).
> This would NOT be a scheduler problem per se, but instead a kernel
> resource management problem.

OK..
Note that my program is split into 2 threads and queues up a large number of buffers. One thread just calls the libusb event handler so if the main thread is blocked for IO it should still run.. right? :)

> The way to test this is to double-buffer or triple-buffer the output
> via shared memory. A pipe might not do the job if it gets stuck doing
> direct transfers (I eventually gave up trying to optimize pipes in DFly
> due to a similar problem and just run everything through a kernel buffer
> now). Still, it may be possible to test against this particular problem
> by having the program write to a pipe and another program or fork handle
> the actual writing to the disk or filesystem.

Hmm.. in effect I have this as I write all data to disk via mbuffer and this did help, but it still drops out which seems to indicate to me that my libusb event loop thread is being stalled.

Note that the total CPU consumed by it is very low (<1%) and that thread does no I/O.

>
> Another way to test this is to comment out the writing in the usb program
> entirely and see if things improve.

If I write to /dev/null it works fine.

> The second issue sounds more scheduler-related. Try running the
> usb program at nice -20? You could even run it at a pseudo-realtime
> priority using rtprio but nice -20 had better work properly against
> an md5 or there is something seriously broken in the scheduler.

Unfortunately neither of these improves things; I am pretty surprised a nice -20 or rtprio'd thread doesn't beat a pure CPU user doing no IO :(

>
> Dynamic priority handling is supposed to deal with this sort of thing
> automatically, particularly if the usb program is not using a lot of
> cpu, but sometimes it can't tell whether a newly-exec'd program is
> going to be interactive or batch until after it has run for a while.
>
> Tuning initial conditions after an exec for the scheduler is not an
> easy task. Simply giving a program a more batch/bulk-run priority on
> exec and letting the dynamic priority shift it more to interactive
> operation tends to mess up interactive shells in the face of
> cpu-intensive system operation, for example. Theoretically dynamic
> priority handling should bump up the priority for the usb program well
> beyond any initial conditions for exec once it has been running a while,
> assuming it doesn't use tons of cpu.

Hmm.. It is unfortunate the hinting mechanisms are very coarse :(

> An md5, or any single-file reading operation, would not overload the
> buffer cache. File writing and/or multi-file operations (such as a
> tar extraction or a tar-up) can create blockages in the buffer cache.

The md5 process is just reading /dev/null - I run it to soak up the CPU because in production the system will be doing CPU intensive data analysis.

> It takes a considerable amount of VM/buffer-cache tuning to get those
> subsystems to pipeline properly and sometimes things can go stale and
> stop pipelining properly for months without anyone realizing it.

:(
I am waiting on a new buffer card with 8 times bigger FIFOs which should help I hope..

Also I am writing a kernel driver in the hope it will be more robust :)

Daniel O'Connor

Feb 18, 2011, 3:07:49 AM

to Daniel O'Connor, freebsd-hackers Hackers

On 04/02/2011, at 13:26, Daniel O'Connor wrote:
> I only have about 10 milliseconds of buffering (96kbyte FIFO, 8Mbyte/sec) in the hardware, however I have about 128Mb of USB requests queued up to libusb. hps@ informed me that libusb will only queue 16kbyte (2msec) in the kernel at one time although I have increased this.

We have upped the hardware FIFO size to 768kbyte, which is 91msec at 8Mbyte/sec, although because we only start reading out when it's 1/6th full the effective buffer is 75msec.
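
The effective-buffer figure follows from the start threshold: once readout begins at 1/6 full, only the remaining 5/6 of the FIFO is headroom. A small sketch, with headroom_ms as a hypothetical helper and the byte counts as assumptions (768 * 1000 bytes of FIFO and 8 * 1024 * 1024 bytes/sec, picked only because they land near the quoted 91msec):

```python
def headroom_ms(fifo_bytes, rate_bytes_per_s, start_fraction):
    """Time left before overflow once readout starts at start_fraction full."""
    total_ms = 1000.0 * fifo_bytes / rate_bytes_per_s
    return total_ms * (1.0 - start_fraction)


# Assumed units: 768 * 1000 bytes of FIFO, 8 * 1024 * 1024 bytes/sec.
total = headroom_ms(768_000, 8 * 1024 * 1024, 0.0)
effective = headroom_ms(768_000, 8 * 1024 * 1024, 1.0 / 6.0)
print(f"total {total:.1f} ms, effective {effective:.1f} ms")
```

With these inputs the total comes to roughly 91.6 ms and the headroom to roughly 76 ms, in the same ballpark as the figures quoted; other unit conventions shift both by a few percent.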

It does seem much more resilient to CPU load, however heavy disk activity on the same drive still stalls it for too long :(

Given the large buffering in the program it does seem very odd that it would stall for long enough unless both threads are put to sleep while one is waiting for disk IO (which seems like a bug to me).

BTW I have changed to -current (without WITNESS).

Daniel O'Connor

Feb 23, 2011, 9:23:14 PM

to Daniel O'Connor, freebsd-hackers Hackers

On 04/02/2011, at 13:26, Daniel O'Connor wrote:

> I am writing a program which reads from a data acquisition chassis connected to a radar via USB. The interface is a Cypress FX2 and I am communicating via libusb.

I ended up writing a kernel driver (thank you hps for usb_fifo_*!) and it has greatly improved things which is good news for me :)

I will run some of the tests suggested by various people soon; I have to wait for a new PC to do them though.