Reading UDP at high rates, and select

Jorgen Grahn

Mar 13, 2010, 3:23:42 AM
I have a few different applications which have to multiplex reading
from several UDP sockets, and also doing other things. Performance
is important to me.

Today I have (A) a plain old select() loop. When an UDP socket is
readable, I try to read one datagram from it, process it, and go back
to select.
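
A rough sketch of (A) in C (error handling, the other work in the loop,
and the real processing -- here a placeholder handle() -- all left out):

    /* (A) sketch: one recvfrom() per readable socket, then back to
       select(). */
    #include <sys/types.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    /* placeholder for the real per-datagram processing */
    void handle(int fd, const void *buf, ssize_t len,
                const struct sockaddr_storage *src);

    void loop_a(const int *fds, int count)
    {
        for (;;) {
            fd_set rd;
            int maxfd = -1;
            FD_ZERO(&rd);
            for (int i = 0; i < count; i++) {
                FD_SET(fds[i], &rd);
                if (fds[i] > maxfd) maxfd = fds[i];
            }
            if (select(maxfd + 1, &rd, NULL, NULL, NULL) < 0)
                continue;                       /* e.g. EINTR */
            for (int i = 0; i < count; i++) {
                if (!FD_ISSET(fds[i], &rd)) continue;
                char buf[65536];
                struct sockaddr_storage src;
                socklen_t len = sizeof src;
                ssize_t n = recvfrom(fds[i], buf, sizeof buf, 0,
                                     (struct sockaddr *)&src, &len);
                if (n >= 0) handle(fds[i], buf, n, &src);
            }
        }
    }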

[David, we can skip the blocking discussion here -- I know from the
previous thread that the read may block here, at least on Linux.]

Fine, but this has a select setup + wakeup + read overhead for every
single UDP datagram. I could use the Linux epoll interface, but that
would only optimize the select setup part.

Another option is (B) nonblocking UDP sockets. Select as before, but
instead of reading once, read and process until EAGAIN. If I have a
growing Rx queue on the socket this should be better ... but on the
other hand if I'm faster than the UDP sender this means I call
recvmsg() twice for each UDP message.
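
The read loop for (B) would then be something like this (sketch; the
fd is assumed to be O_NONBLOCK already, handle() is a placeholder):

    /* (B) sketch: after select() reports the fd readable, read
       until EAGAIN. */
    #include <sys/types.h>
    #include <sys/socket.h>

    /* placeholder for the real per-datagram processing */
    void handle(int fd, const void *buf, ssize_t len,
                const struct sockaddr_storage *src);

    void drain(int fd)
    {
        for (;;) {
            char buf[65536];
            struct sockaddr_storage src;
            socklen_t len = sizeof src;
            ssize_t n = recvfrom(fd, buf, sizeof buf, 0,
                                 (struct sockaddr *)&src, &len);
            if (n < 0) {
                /* EAGAIN/EWOULDBLOCK means the queue is empty;
                   anything else is a real error -- either way,
                   go back to select() */
                break;
            }
            handle(fd, buf, n, &src);
        }
    }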

A third option (C) is a thread for each UDP socket which just blocks
in a recvmsg() loop. But I don't want threads.

Are there other options than (A), (B), (C)?

What do people typically do? Maybe I should read the bind sources;
a DNS server should have exactly this kind of problem.

I'm beginning to dislike UDP for this reason too -- does anyone
agree with me? With TCP you can wake up and consume a huge chunk of
data if you have a huge userspace buffer to read into. With UDP all
you get is one datagram. Maybe there should be a variation of
recv/readv/etc which lets you provide buffer space for many
datagrams (and their source addresses).
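
(Linux, as it happens, recently grew something along these lines:
recvmmsg(), new in kernel 2.6.33. A sketch of how it might look,
assuming the syscall and a wrapper for it are available; handle() is
a placeholder as before:)

    /* recvmmsg() sketch: receive up to VLEN datagrams, with their
       source addresses, in one system call.  Linux-specific. */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    #define VLEN  32
    #define BUFSZ 2048

    /* placeholder for the real per-datagram processing */
    void handle(int fd, const void *buf, ssize_t len,
                const struct sockaddr_storage *src);

    int read_many(int fd)
    {
        static char bufs[VLEN][BUFSZ];
        static struct sockaddr_storage srcs[VLEN];
        struct mmsghdr msgs[VLEN];
        struct iovec iovs[VLEN];

        memset(msgs, 0, sizeof msgs);
        for (int i = 0; i < VLEN; i++) {
            iovs[i].iov_base = bufs[i];
            iovs[i].iov_len  = BUFSZ;
            msgs[i].msg_hdr.msg_iov     = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen  = 1;
            msgs[i].msg_hdr.msg_name    = &srcs[i];
            msgs[i].msg_hdr.msg_namelen = sizeof srcs[i];
        }
        int n = recvmmsg(fd, msgs, VLEN, MSG_DONTWAIT, NULL);
        for (int i = 0; i < n; i++)
            handle(fd, bufs[i], msgs[i].msg_len, &srcs[i]);
        return n;   /* -1 with errno == EAGAIN when the queue is empty */
    }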

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

David Schwartz

Mar 13, 2010, 6:35:56 AM
On Mar 13, 12:23 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:

> Fine, but this has a select setup + wakeup + read overhead for every
> single UDP datagram.  I could use the Linux epoll interface, but that
> would only optimize the select setup part.

If you're dealing with even a moderate number of sockets, and the
typical case is when few of those sockets are heavily active, the work
the kernel has to do on each entry and exit into 'select' is pretty
large. This overhead is removed with epoll.

Look at an extreme case. You have 1,000 sockets and only 1 is active.
When you call 'select', if you're caught up, the kernel has to put
your process on 1,000 wait queues, only to remove it from all 1,000 a
split-second later when you receive the data. Then, if you're fast,
you call 'select' again before another packet is received, and you
repeat the process, having to do 2,000 wait queue operations per
packet received.
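
For comparison, a minimal epoll version of the loop (a sketch: the
sockets are registered once up front, and drain() is a placeholder
for however the readable socket gets read):

    /* epoll sketch: register each fd once; only the wait happens
       per wakeup. */
    #include <sys/epoll.h>

    void drain(int fd);   /* placeholder: read the socket */

    void loop_epoll(const int *fds, int count)
    {
        int ep = epoll_create1(0);
        for (int i = 0; i < count; i++) {
            struct epoll_event ev = { .events = EPOLLIN,
                                      .data.fd = fds[i] };
            epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev);
        }
        for (;;) {
            struct epoll_event evs[64];
            int n = epoll_wait(ep, evs, 64, -1);
            for (int i = 0; i < n; i++)
                drain(evs[i].data.fd);
        }
    }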

> Another option is (B) nonblocking UDP sockets. Select as before, but
> instead of reading once, read and process until EAGAIN.  If I have a
> growing Rx queue on the socket this should be better ... but on the
> other hand if I'm faster than the UDP sender this means I call
> recvmsg() twice for each UDP message.

It won't matter, because you just called 'recvmsg' on the same socket.
Everything will be warm in the cache. The second 'recvmsg' call, if
and only if you are so caught up that you get back there before
another packet is received, won't really matter.

It's nuts to not be worried about 'select' overhead but to be worried
about an extra 'recvmsg' call. (Assuming you have other sockets to
'select' on too, at least.)

> A third option (C) is a thread for each UDP socket which just blocks
> in a recvmsg() loop.  But I don't want threads.

If you have only a very small number of important UDP sockets, this is
likely your best bet.
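
In sketch form (handle() is a placeholder for the real processing,
and it has to be thread-safe here):

    /* (C) sketch: one thread per UDP socket, blocking in recvfrom(). */
    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* placeholder; must be safe to call from several threads */
    void handle(int fd, const void *buf, ssize_t len,
                const struct sockaddr_storage *src);

    static void *reader(void *arg)
    {
        int fd = *(const int *)arg;
        for (;;) {
            char buf[65536];
            struct sockaddr_storage src;
            socklen_t len = sizeof src;
            ssize_t n = recvfrom(fd, buf, sizeof buf, 0,
                                 (struct sockaddr *)&src, &len);
            if (n >= 0) handle(fd, buf, n, &src);
        }
        return NULL;
    }

    /* started with pthread_create(&tid[i], NULL, reader, &fds[i])
       for each socket */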

>   I'm beginning to dislike UDP for this reason too -- does anyone
>   agree with me?  With TCP you can wake up and consume a huge chunk of
>   data if you have a huge userspace buffer to read into. With UDP all
>   you get is one datagram.

That's not a fair comparison. With TCP, if you're caught up and have
CPU to spare, you will wind up reading only one packet's worth of
received data anyway. With UDP if you fall behind, you can read as
many datagrams as you want in a fast loop.

> Maybe there should be a variation of
>  recv/readv/etc which lets you provide buffer space for many
>  datagrams (and their source addresses).

To do what? Reduce user/kernel transitions in the least expensive case
where you immediately 'recvmsg' on the same socket? I don't think the
optimization would be significant.

DS

Jorgen Grahn

Mar 13, 2010, 7:31:03 AM
On Sat, 2010-03-13, David Schwartz wrote:
> On Mar 13, 12:23 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
>
>> Fine, but this has a select setup + wakeup + read overhead for every
>> single UDP datagram.  I could use the Linux epoll interface, but that
>> would only optimize the select setup part.
>
> If you're dealing with even a moderate number of sockets, and the
> typical case is when few of those sockets are heavily active, the work
> the kernel has to do on each entry and exit into 'select' is pretty
> large. This overhead is removed with epoll.

Yes, I have personally seen it be a problem (on Linux, with a few dozen
UDP sockets). And I have seen your earlier postings on this. Doesn't
hurt to repeat it, though.

...


>> Another option is (B) nonblocking UDP sockets. Select as before, but
>> instead of reading once, read and process until EAGAIN.  If I have a
>> growing Rx queue on the socket this should be better ... but on the
>> other hand if I'm faster than the UDP sender this means I call
>> recvmsg() twice for each UDP message.
>
> It won't matter, because you just called 'recvmsg' on the same socket.
> Everything will be warm in the cache. The second 'recvmsg' call, if
> and only if you are so caught up that you get back there before
> another packet is received, won't really matter.
>
> It's nuts to not be worried about 'select' overhead but to be worried
> about an extra 'recvmsg' call. (Assuming you have other sockets to
> 'select' on too, at least.)

I never said I wasn't worried about select() overhead. But perhaps I
overestimate the cost of a system call (recvmsg() on an empty socket,
and as you say with a warm cache).

>> A third option (C) is a thread for each UDP socket which just blocks
>> in a recvmsg() loop.  But I don't want threads.
>
> If you have only a very small number of important UDP sockets, this is
> likely your best bet.

OK. So you're saying the people who don't want threads are likely to
do (B), then?

I *did* check the bind9 sources, but they used Paul Vixie's eventlib
which I'm sure is great for these kinds of things, but made it hard to
quickly see what was going on.

>>   I'm beginning to dislike UDP for this reason too -- does anyone
>>   agree with me?  With TCP you can wake up and consume a huge chunk of
>>   data if you have a huge userspace buffer to read into. With UDP all
>>   you get is one datagram.
>
> That's not a fair comparison. With TCP, if you're caught up and have
> CPU to spare, you will wind up reading only one packet's worth of
> received data anyway. With UDP if you fall behind, you can read as
> many datagrams as you want in a fast loop.
>
>> Maybe there should be a variation of
>>  recv/readv/etc which lets you provide buffer space for many
>>  datagrams (and their source addresses).
>
> To do what? Reduce user/kernel transitions in the least expensive case
> where you immediately 'recvmsg' on the same socket? I don't think the
> optimization would be significant.

Yes, my whole reasoning was based on the assumption that system calls
are expensive, and the observation that the syscall rate grows when
the UDP read load increases, but not so in the TCP case.

I don't really know if they *are* expensive -- on my target system and
on Unix in general.

David Schwartz

Mar 13, 2010, 8:29:12 AM
On Mar 13, 4:31 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:

> OK. So you're saying the people who don't want threads are likely to
> do (B), then?

Yes.

> > To do what? Reduce user/kernel transitions in the least expensive case
> > where you immediately 'recvmsg' on the same socket? I don't think the
> > optimization would be significant.

> Yes, my whole reasoning was based on the assumption that system calls
> are expensive, and the observation that the syscall rate grows when
> the UDP read load increases, but not so in the TCP case.

In the typical TCP case, you will get around to calling 'read' once
per packet received. For UDP under high load, the majority of your
'recvmsg' calls will get a packet. And, of course, we know TCP is
superior for bulk data.

> I don't really know if they *are* expensive -- on my target system and
> on Unix in general.

Not likely. You may be working to optimize a case that's already
pretty darn efficient.

There's an easy way you can test. Create a UDP socket, bind it, set it
non-blocking, then call 'recvmsg' on it a million times. See how long
it takes.
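
Something along these lines (rough sketch, error handling omitted;
every recvmsg() should come back -1/EAGAIN since nothing ever sends
to the socket):

    /* Time a million recvmsg() calls on an empty, non-blocking UDP
       socket. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <fcntl.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in a;
        memset(&a, 0, sizeof a);
        a.sin_family = AF_INET;          /* any address, any port */
        bind(fd, (struct sockaddr *)&a, sizeof a);
        fcntl(fd, F_SETFL, O_NONBLOCK);

        char buf[2048];
        struct iovec iov = { buf, sizeof buf };
        struct msghdr msg;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < 1000000; i++)
            recvmsg(fd, &msg, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double s = (t1.tv_sec - t0.tv_sec)
                 + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.0f ns per empty recvmsg()\n", s * 1e9 / 1e6);
        return 0;
    }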

DS

EJP

Mar 15, 2010, 1:26:41 AM
On 13/03/2010 7:23 PM, Jorgen Grahn wrote:
> When an UDP socket is
> readable, I try to read one datagram from it, process it, and go back
> to select.

Assuming you're in non-blocking mode, read as many datagrams as you can
at this point, until you get zero or whatever it is in your API that
tells you there is no more data. Also assuming this thread has nothing
else to do ;-)

Jorgen Grahn

Mar 15, 2010, 7:46:38 AM
On Mon, 2010-03-15, EJP wrote:
> On 13/03/2010 7:23 PM, Jorgen Grahn wrote:
>> When an UDP socket is
>> readable, I try to read one datagram from it, process it, and go back
>> to select.
>
> Assuming you're in non-blocking mode, read as many datagrams as you can
> at this point, until you get zero or whatever it is in your API that
> tells you there is no more data.

Yes, that's what I described as my option (B) a few lines further down.

> Also assuming this thread has nothing
> else to do ;-)

That's a good point. Mine *does* have other work to do, so I can, for
example, limit myself to max 50 recv()s on an fd after one select() says
it's readable.
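
In sketch form (handle() standing in for the real per-datagram work):

    /* Capped drain: at most MAX_BATCH datagrams per readable fd, so
       the rest of the work still gets its turn.  If the cap is hit,
       the next select() reports the fd readable again. */
    #include <sys/types.h>
    #include <sys/socket.h>

    #define MAX_BATCH 50

    void handle(int fd, const void *buf, ssize_t len);  /* placeholder */

    void drain_some(int fd)
    {
        for (int i = 0; i < MAX_BATCH; i++) {
            char buf[65536];
            ssize_t n = recv(fd, buf, sizeof buf, 0);
            if (n < 0)
                break;       /* EAGAIN: queue empty (or a real error) */
            handle(fd, buf, n);
        }
    }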
