
Linux equivalent for ioctlsocket(FIONREAD) on datagram sockets


already...@yahoo.com

Mar 9, 2009, 7:16:02 AM
Hi

Winsock2 implements two variants of the FIONREAD control code, which differ
non-trivially when applied to datagram (UDP) sockets.
Specifically:
ioctlsocket(FIONREAD):
If s is message oriented (for example, type SOCK_DGRAM), FIONREAD
still returns the amount of pending data in the network buffer,
however, the amount that can actually be read in a single call to the
recv function is limited to the data size written in the send or
sendto function call.

WSAIoctl(FIONREAD):
If s is message oriented (for example, type SOCK_DGRAM), FIONREAD
returns the size of the first datagram (message) queued on the
socket.

According to the udp(7) Linux man page: "FIONREAD (SIOCINQ) ... returns the
size of the next pending datagram in the integer in bytes, or 0 when no
datagram is pending."

So the Linux ioctl(SIOCINQ) is an exact equivalent of the Windows
WSAIoctl(FIONREAD).
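
For reference, a minimal sketch of that Linux call (FIONREAD and SIOCINQ are
the same ioctl request here, and nothing beyond <sys/ioctl.h> is assumed):

#include <sys/ioctl.h>   /* ioctl(), FIONREAD (== SIOCINQ for UDP sockets) */

/* Size in bytes of the FIRST queued datagram (not the total backlog);
 * 0 if nothing is pending, -1 on error with errno set. */
int next_datagram_size(int sockfd)
{
    int n = 0;
    if (ioctl(sockfd, FIONREAD, &n) < 0)
        return -1;
    return n;
}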

Now the question: what is the Linux equivalent of the Windows
ioctlsocket(FIONREAD)?
Motivation: I want to read as many messages as possible from a blocking UDP
socket without the danger of blocking.
On Windows I do something like this (leaving error handling aside for the
sake of brevity):
select(...);                        /* wait until data is queued */
ioctlsocket(s, FIONREAD, &nOctets); /* total bytes pending (Winsock) */
while (nOctets > 0)
{
    rcvlen = recvfrom(s, ...);      /* one datagram per call */
    handle_rx_message();
    nOctets -= rcvlen;
}

On Linux, code like the above produces correct results, but it doesn't
achieve the original goal of minimizing the number of system calls: since
SIOCINQ reports only the size of the next datagram, each ioctl() pays for at
most one recvfrom().

Regards,
Michael

David Schwartz

Mar 9, 2009, 8:05:15 AM
On Mar 9, 4:16 am, already5cho...@yahoo.com wrote:

> Now the question: what is the Linux equivalent of the Windows
> ioctlsocket(FIONREAD)?

> Motivation: I want to read as many messages as possible from a blocking
> UDP socket without the danger of blocking.

It cannot be done. A blocking socket can always block.

Suppose, for example, the system call tells you that there are four
packets. By the time you read three of them, one of them has been
dropped (UDP packets are not reliable, remember). Now, when you go to
read the fourth packet, you hang.

The only way to ensure that you don't block is to set the socket non-
blocking.

DS

already...@yahoo.com

Mar 9, 2009, 12:01:59 PM
On Mar 9, 2:05 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Mar 9, 4:16 am, already5cho...@yahoo.com wrote:
>
> > Now the question: what is the Linux equivalent of the Windows
> > ioctlsocket(FIONREAD)?
> > Motivation: I want to read as many messages as possible from a
> > blocking UDP socket without the danger of blocking.
>
> It cannot be done. A non-blocking socket can always block.
>
> Suppose, for example, the system call tells you that there are four
> packets. By the time you read three of them, one of them has been
> dropped (UDP packets are not reliable, remember). Now, when you go to
> read the fourth packet, you hang.
>

First, I have never seen a TCP/IP stack discard a packet that it has
already put in the socket buffer.
Second, even if such an event happens, in my application the next packet
typically arrives soon enough to unblock without major problems.
Third, even if, for some reason I can't imagine, the stack decided to
drop the last packet in the stream, I can live with it. The application
in question is just a tester for some equipment. If it hangs once in 100
years, so be it.

> The only way to ensure that you don't block is to set the socket non-
> blocking.
>
> DS

Indeed, going non-blocking I can achieve the same number of system calls:
instead of wasting a syscall on ioctl() at the beginning of the reception
batch, I'd waste a syscall on one excess recvfrom() at the end of the batch.
However, I don't like the idea, because I use the same socket for
send/sendto, and for sends I strongly prefer blocking semantics.
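
For what it's worth, Linux offers a middle ground here: recvfrom() accepts a
per-call MSG_DONTWAIT flag, so the receive loop can be non-blocking while the
socket itself (and hence send()) stays blocking. A sketch, with
handle_rx_message() standing in for the handler from the pseudocode above
(given a buffer/length signature here):

#include <errno.h>
#include <sys/socket.h>

void drain_socket(int sockfd)
{
    char buf[65536];                       /* max UDP payload */
    for (;;) {
        /* MSG_DONTWAIT makes only this one call non-blocking. */
        ssize_t rcvlen = recvfrom(sockfd, buf, sizeof(buf),
                                  MSG_DONTWAIT, NULL, NULL);
        if (rcvlen < 0)
            break;  /* EAGAIN/EWOULDBLOCK: drained; other errno: real error */
        handle_rx_message(buf, (size_t)rcvlen);
    }
}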

David Schwartz

Mar 9, 2009, 6:25:01 PM
On Mar 9, 9:01 am, already5cho...@yahoo.com wrote:

> First, I have never seen a TCP/IP stack discard a packet that it has
> already put in the socket buffer.

It doesn't matter what you have seen. It matters what the standards
say and what behavior is guaranteed.

When I wrote code two years ago, I had never seen a Core i7 CPU. But
people who have my code expect that they can upgrade their CPUs and
still have my code work. And it will, because I follow the standards
and rely on guarantees and don't assume something can't happen just
because I've never seen it happen.

> Second, even if such an event happens, in my application the next
> packet typically arrives soon enough to unblock without major problems.

Then just call 'recvmsg' if that's good enough for you.

> Third, even if, for some reason I can't imagine, the stack decided to
> drop the last packet in the stream, I can live with it. The application
> in question is just a tester for some equipment. If it hangs once in
> 100 years, so be it.

I think you're missing the point. You are asking for a way to do
something and I am explaining why such a way does not exist. In other
words, I've invalidated your use case. Such a thing does not exist
because it is not useful. Nobody is going to create something that
sort of works some of the time.

> Indeed, going non-blocking I can achieve the same number of system
> calls: instead of wasting a syscall on ioctl() at the beginning of the
> reception batch, I'd waste a syscall on one excess recvfrom() at the
> end of the batch.

You're concerned about this fine level of optimization on the one hand,
and on the other hand you're not even concerned whether it operates
correctly? I call bullshit.

> However, I don't like the idea, because I use the same socket for
> send/sendto, and for sends I strongly prefer blocking semantics.

No system I know of implements blocking semantics for non-local UDP
sends. In fact, in principle, there is no way to provide blocking UDP
semantics. How would you know what to wait for?

I think you've not quite fully grasped that UDP is unreliable. No
matter how long you wait to send the datagram, it still may get
dropped. It's your code's responsibility to do transmit pacing.

DS

already...@yahoo.com

Mar 9, 2009, 7:51:22 PM
David,

Your line of thinking is not constructive. Your suggestions are
practically useless. It seems you're hanging around here not in order to
help other people but for the sole purpose of massaging your own ego.
I'd greatly appreciate it if you didn't bother answering my questions in
the future.

Regards,
Michael

David Schwartz

Mar 9, 2009, 7:59:19 PM

You are welcome to be intentionally ignorant. But don't spit at the
people who are trying to help you.

The solution to your problem is this simple:

1) Set the socket non-blocking.

2) Don't worry about what happens when you send. The semantics of
blocking and non-blocking UDP sends are essentially the same. The only
difference is that a blocking send may silently discard a datagram,
whereas a non-blocking send will return EWOULDBLOCK. You can simply
ignore EWOULDBLOCK or use it as an extra hint that the datagram was
dropped.
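
A sketch of step 2, assuming step 1 was done with the fcntl() incantation
that appears later in the thread:

#include <errno.h>
#include <sys/socket.h>

/* Fire-and-forget UDP send on a non-blocking socket: a full local send
 * buffer is treated the same as a drop anywhere else along the path. */
void send_datagram(int sockfd, const void *msg, size_t len)
{
    if (send(sockfd, msg, len, 0) < 0 &&
        (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* Datagram discarded locally; optionally count it as loss. */
    }
}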

DS

already...@yahoo.com

Mar 9, 2009, 9:17:32 PM

Now you are trying to be constructive.
However, your concentration on the letter of the standard prevents you
from understanding its spirit.
Yes, UDP is defined as unreliable. But it is also defined as "best
effort". It would be practically useless without the latter.
Now imagine an IP stack that adheres to both the letter and the spirit of
the standard. Imagine a fast CPU/memory/IO bus. Imagine a slow physical
line. Take into account that for modern CPUs/memory/IO buses even 1 GbE
is a "slow line". Imagine that the fast CPU sends a burst of UDP
datagrams longer than SEND_BUFFER through a blocking socket.
Now what should an IP stack/packet driver that adheres to the "best
effort" spirit of the UDP standard do in that particular case? Drop the
packet? No way, that's against the spirit. The correct way would be to
block the calling thread until the NIC hardware (probably through DMA)
reads one or more packets from the socket's send buffer, freeing up space
for the next one. Comprende?
I am absolutely sure that on 10/100 Ethernet links all popular
general-purpose OSes behave exactly as described above. I didn't check on
faster links, but hopefully it's the same.


Now, if you really want to be constructive...
The original question is not particularly interesting. I'd ask a more
generic question:
What is the best way to receive a fast UDP stream (on the order of 20K to
50K packets per second) while dropping as few packets as possible?
On Windows the [pseudo]code presented in my original post easily achieves
a packet error rate of ~1E-7 but hits a wall when we try to do better.
And on Windows when you hit the wall... well, you hit the wall.
On Linux (Ubuntu 8.10 x64) so far I see packet error rates in excess of
1E-5. And yes, I tried both blocking and non-blocking sockets. The
non-blocking variant significantly reduces the CPU load but, at least on
fast computers, makes no material difference to the error rates. The fact
that my current Linux results are so horribly bad leaves me the hope that
I am doing something wrong... Maybe I should try a scatter-gather read?
Or something else?

I understand that quite a few people will try to suggest that I
shouldn't want :( to receive UDP datagrams at a low error rate. All these
people are welcome to say it right here, but I sincerely don't promise a
polite response.

David Schwartz

Mar 9, 2009, 9:51:10 PM
On Mar 9, 6:17 pm, already5cho...@yahoo.com wrote:

> Now you are trying to be constructive.

You simply don't like the answer.

> However, your concentration on the letter of the standard prevents you
> from understanding its spirit.
> Yes, UDP is defined as unreliable. But it is also defined as "best
> effort". It would be practically useless without the latter.

Right, but everything is designed based on the premise that it is
unreliable. You cannot *ever* assume it is reliable.

> Now imagine an IP stack that adheres to both the letter and the spirit
> of the standard. Imagine a fast CPU/memory/IO bus. Imagine a slow
> physical line. Take into account that for modern CPUs/memory/IO buses
> even 1 GbE is a "slow line". Imagine that the fast CPU sends a burst of
> UDP datagrams longer than SEND_BUFFER through a blocking socket.

That would be a bug. The sending application is responsible for
transmit pacing in a UDP application.

> Now what should an IP stack/packet driver that adheres to the "best
> effort" spirit of the UDP standard do in that particular case? Drop the
> packet? No way, that's against the spirit. The correct way would be to
> block the calling thread until the NIC hardware (probably through DMA)
> reads one or more packets from the socket's send buffer, freeing up
> space for the next one. Comprende?

Right, but you can't design based on that. The exact same problem can
happen one hop away on the other side of a router. So what point is
there in going to special effort to solve this one case when it leaves
another version of the exact same problem unsolved?

With UDP, you need a general solution to bottlenecks that can occur
anywhere. There is little point to special solutions to special
bottlenecks, unless you have some special way to know that that's the
only problem you are going to have.

The APIs and interfaces are designed for the general case.

> What is the best way to receive a fast UDP stream (on the order of 20K
> to 50K packets per second) while dropping as few packets as possible?

Obviously, it depends on the operating system.

> On Windows the [pseudo]code presented in my original post easily
> achieves a packet error rate of ~1E-7 but hits a wall when we try to do
> better. And on Windows when you hit the wall... well, you hit the wall.

On Windows, posting lots of overlapped I/O requests and using a pool
of threads is the best you can do.

> On Linux (Ubuntu 8.10 x64) so far I see packet error rates in excess of
> 1E-5. And yes, I tried both blocking and non-blocking sockets. The
> non-blocking variant significantly reduces the CPU load but, at least
> on fast computers, makes no material difference to the error rates. The
> fact that my current Linux results are so horribly bad leaves me the
> hope that I am doing something wrong... Maybe I should try a
> scatter-gather read? Or something else?

Set the receive queue as large as you can. Keep a thread blocked on
'recvmsg'. Make sure that this thread does as little work as possible
before it gets back to 'recvmsg'. Keep your own internal queue of
received datagrams.
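
A sketch of that first step (on Linux the request is silently clamped to
the net.core.rmem_max sysctl, so raise that cap first if you need more):

#include <stdio.h>
#include <sys/socket.h>

void grow_receive_queue(int sockfd, int bytes)
{
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,
                   &bytes, sizeof(bytes)) < 0)
        perror("setsockopt(SO_RCVBUF)");

    /* Read back what the kernel actually granted; Linux reports double
     * the requested value to account for its own bookkeeping overhead. */
    int granted = 0;
    socklen_t len = sizeof(granted);
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &granted, &len) == 0)
        printf("receive queue: %d bytes\n", granted);
}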

Keep a pool of free memory chunks so you don't block in the normal
allocator. Pre-allocate, say, 1,000 packet buffers with the necessary
space to form them into a linked list. Your loop looks like this:

1) Receive a UDP packet into my pre-allocated buffer.
2) Add it to my own linked list of packets.
3) Try to acquire the lock on the master linked list of packets. If
fewer than 10 packets received, do so non-blocking. If more than 10,
do so blocking.
4) If we failed to acquire the lock, jump to step 1.
5) Add our linked list of packets to the end of the system's linked
list.
6) Release the lock.
7) Go to step 1.

Keep all memory allocation out of the fast path. Do not block on the
linked list that is shared with other threads unless too much data has
backed up.

Note that you basically cannot do this with a single thread. There are
simply too many ways you can unexpectedly block. A sketch of that receive
thread follows below.
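
The sketch fills in the loop under stated assumptions: names such as
struct packet, init_pool() and master_lock are invented for illustration,
and the consumer thread (which drains the master list and returns buffers
to the free list) is omitted:

#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>

#define BUF_COUNT 1000
#define BUF_SIZE  2048                 /* enough for the expected datagrams */

struct packet {
    struct packet *next;
    size_t         len;
    char           data[BUF_SIZE];
};

static struct packet  pool[BUF_COUNT]; /* pre-allocated; no malloc in fast path */
static struct packet *free_list;
static struct packet *master_head, *master_tail;
static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;

static void init_pool(void)            /* call once at startup */
{
    for (int i = 0; i < BUF_COUNT - 1; i++)
        pool[i].next = &pool[i + 1];
    free_list = &pool[0];
}

void *rx_thread(void *arg)
{
    int sockfd = *(int *)arg;
    struct packet *local_head = NULL, *local_tail = NULL;
    int batched = 0;

    for (;;) {
        /* 1) Receive into a pre-allocated buffer. */
        struct packet *p = free_list;
        if (p == NULL)
            break;                     /* pool exhausted; refill omitted */
        free_list = p->next;
        ssize_t n = recv(sockfd, p->data, BUF_SIZE, 0);
        if (n < 0) {
            p->next = free_list;       /* return the buffer on error */
            free_list = p;
            continue;
        }
        p->len = (size_t)n;
        p->next = NULL;

        /* 2) Append to our own linked list. */
        if (local_tail) local_tail->next = p; else local_head = p;
        local_tail = p;
        batched++;

        /* 3)-4) Trylock while the batch is small; block once it grows. */
        if (batched <= 10) {
            if (pthread_mutex_trylock(&master_lock) != 0)
                continue;              /* lock busy: go receive more */
        } else {
            pthread_mutex_lock(&master_lock);
        }

        /* 5) Splice our batch onto the master list, 6) release the lock. */
        if (master_tail) master_tail->next = local_head;
        else             master_head = local_head;
        master_tail = local_tail;
        pthread_mutex_unlock(&master_lock);

        /* 7) Start a new batch. */
        local_head = local_tail = NULL;
        batched = 0;
    }
    return NULL;
}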

> I understand that quite a few people will try to suggest that I
> shouldn't want :( to receive UDP datagrams at a low error rate. All
> these people are welcome to say it right here, but I sincerely don't
> promise a polite response.

It's definitely bad to allow your code to introduce extra packet loss.
You should do your best to avoid losing data that has gone to all the
trouble of getting to your system.

DS

jakas...@gmail.com

Jun 28, 2012, 3:53:19 AM
Hi, I hope I'm not too late. To set a socket as non-blocking in Linux, use fcntl() with F_SETFL and O_NONBLOCK. I prefer to do it like this:

fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL, 0) | O_NONBLOCK);

This way, any other flags associated with the socket are preserved.
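
If you'd rather stay with the ioctl() family this thread keeps circling
around, FIONBIO toggles the same non-blocking flag (a sketch; it can't
clobber other file status flags because it touches only this one):

#include <stdio.h>       /* perror */
#include <sys/ioctl.h>   /* ioctl, FIONBIO */

int on = 1;
if (ioctl(sockfd, FIONBIO, &on) < 0)
    perror("ioctl(FIONBIO)");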