I have increased the number of buffers NUM_64, NUM_128, NUM_SYS_64,
NUM_SYS_128, etc. in the netBufLib.h file. When I see the ENOBUFS error I check
netStackDataPoolShow() and netStackSysPoolShow() and there are no "failed to
find space" errors.
I have also sent a SO_SNDBUF socket option down to increase that value
without success.
Does anyone have any other ideas as to what may be causing this error?
Thanks,
King L
Are there any free buffers? Don't rely on the error counts being correct!
There is also a buffer pool allocated by the ethernet driver (ptr in
END structure).
Try calling netStackDataPoolShow() when you get ENOBUFS - should
show whether that pool is out of data, or if the problem is elsewhere.
IIRC there are 'problems' in that there is no local flow control
for transmitted datagrams, so if you are sending a lot back to back
some will be discarded before reaching the media. IMHO this is
a serious bug, the error rate of a (correctly built) local LAN
segment is very close to zero so it is actually reasonable to
assume that almost all datagrams that leave one system will
(in the absence of bridges and routers) be detected by the target
system - if only as a lost packet because of a lack of receive
buffers.
David
Thanks for the advice David.
I have called netStackDataPoolShow() when I got ENOBUFS, but there always
seem to be plenty of buffers available.
Cheers,
Paul
As to LANs and their packet loss rate, I am not sure
there exists such a thing as a "properly designed network",
in reality. Therefore applications should be designed with the
assumption that packets do get lost, and mitigate that.
But even in a "perfect LAN", there are still wires, which have
a certain S/N ratio and a certain BER which translates to a
certain packet loss percentage. There is no communication
medium in the world that has a zero BER (Bit Error Rate),
so however small your BER is, it translates to a certain
number of lost packets, due to the laws of physics.
--
Leonid Rosenboim
Firstly, I would expect sendmsg to block. POSIX (I know VxWorks isn't
POSIX, but) says:
http://www.opengroup.org/onlinepubs/007904975/functions/sendmsg.html
If space is not available at the sending socket to hold the message to
be transmitted and the socket file descriptor does not have O_NONBLOCK
set, the sendmsg() function shall block until space is available. If
space is not available at the sending socket to hold the message to be
transmitted and the socket file descriptor does have O_NONBLOCK set, the
sendmsg() function shall fail.
The correct error is EAGAIN or EWOULDBLOCK.
> In my experience, if you don't ignore these errors,
> all packets the application sends successfully
> do reach the wire.
But you should have reasonable 'back pressure' flow control
from the ethernet MAC back to your application.
>
> As to LANs and their packet loss rate, I am not sure
> there exists such a thing as a "properly designed network",
> in reality. Therefore applications should be designed with the
> assumption that packets do get lost, and mitigate that.
Certainly they should not assume that packets are not lost, but it
is reasonable to assume that none are gratuitously discarded.
> But even in a "perfect LAN", there are still wires, which have
> a certain S/N ratio and a certain BER which translates to a
> certain packet loss percentage.
I did some tests a few years ago, the error rate on my LAN segment
was 0. IIRC I was sending small packets as fast as I could...
> There is no communication
> medium in the world that has a zero BER (Bit Error Rate),
> so however small your BER is, it translates to a certain
> amount of lost packets, due to laws of physics.
But error rates < 1 in 10^8 are achievable. I would expect a
local ethernet segment to be at least that good.
David
You may be right there, this could be a bug in the VxWorks
sockets implementation, which I think predates the POSIX
standards.
> > In my experience, if you don't ignore these errors,
> > all packets the application sends successfully
> > do reach the wire.
>
> But you should have reasonable 'back pressure' flow control
> from the ethernet MAC back to your application.
>
This is indeed achievable. Good L-2 switches use the "collision"
signal to back-pressure the sender when in half-duplex mode,
while some newer MACs support the PAUSE flow control frames in
hardware, so flow control is there for you to use at the MAC level,
which will eventually translate into ENOBUFS for the application,
and that sounds like good enough flow control to me.
> >
> > As to LANs and their packet loss rate, I am not sure
> > there exists such a thing as a "properly designed network",
> > in reality. Therefore applications should be designed with the
> > assumption that packets do get lost, and mitigate that.
>
> Certainly they should not assume that packets are not lost, but it
> is reasonable to assume that none are gratuitously discarded.
Layer-2 errors are simply discarded, and there is no way to inform
the sender of that. Also, when a Layer-2 switch does not have
proper flow control, packets are discarded simply due to congestion.
This is specifically notable when devices of different rates are connected
to the same switch (i.e. some are 10 Mbps while others are 100 Mbps).
>
> > But even in a "perfect LAN", there are still wires, which have
> > a certain S/N ratio and a certain BER which translates to a
> > certain packet loss percentage.
>
> I did some tests a few years ago, the error rate on my LAN segment
> was 0. IIRC I was sending small packets as fast as I could...
>
I think that large packets are better suited to this kind of test,
and a fairly long test period is required too.
> > There is no communication
> > medium in the world that has a zero BER (Bit Error Rate),
> > so however small your BER is, it translates to a certain
> > amount of lost packets, due to laws of physics.
>
> But error rates < 1 in 10^8 are achievable. I would expect a
> local ethernet segment to be at least that good.
>
OK, assuming the BER is 1e-8 and an interface speed of 100 Mbps
(that is 1e+8 bits/sec), at 100% interface utilization one bit error
is expected every second on average.
I think that the actual BER on Cat-5e cable at 100 Mbps is better -
wireless connections often specify a 1e-6 BER, so I would expect a
Cat-5 cable to perform at 1e-12 or better (need to google a bit to find
specific test results), but even a 1e-12 BER would result in a single
bit error roughly every 3 hours. Also, BER would depend on a wide range
of factors like cable length, connector and jumper impedance matching.
Still, one would conclude that an application should never assume
that no packets are lost on a local LAN segment.
Quite probably, but I don't recall us changing that when standardising
sockets for X/Open...
> This is indeed achievable. Good L-2 switches use the "collision"
> signal to back-pressure the sender when in half-duplex mode,
> while some newer MACs support the PAUSE flow control frames in
> hardware, so flow control is there for you to use at the MAC level,
> which will eventually translate into ENOBUFS for the application,
> and that sounds like good enough flow control to me.
And if you are inside a collision domain (eg just coax or hubs)
then there is never a problem.
> Layer-2 errors are simply discarded, and there is no way to inform
> the sender of that. Also, when a Layer-2 switch does not have
> proper flow control, packets are discarded simply due to congestion.
> This is specifically notable when devices of different rates are connected
> to the same switch (i.e. some are 10 Mbps while others are 100 Mbps)
Agreed, I did say that intermediate switches/routers mustn't be
discarding packets. Some early switches were particularly broken
and would discard packets if the target interface was busy!
Similarly some bridge/routers that used the AMD lance in promiscuous
mode would never (ever) send a packet onto a 99% busy LAN.
> I think that large packets are better suited to this kind of test,
> and a fairly long test period is required too.
I probably left it running overnight (at least)...
>>But error rates < 1 in 10^8 are achievable. I would expect a
>>local ethernet segment to be at least that good.
>>
>
> OK, assuming the BER is 1e-8 and an interface speed of 100 Mbps
> (that is 1e+8 bits/sec), at 100% interface utilization one bit error
> is expected every second on average.
I was probably thinking of a packet error rate.
> I think that the actual BER on Cat-5e cable at 100 Mbps is better -
> wireless connections often specify a 1e-6 BER, so I would expect a
> Cat-5 cable to perform at 1e-12 or better (need to google a bit to find
> specific test results), but even a 1e-12 BER would result in a single
> bit error roughly every 3 hours. Also, BER would depend on a wide range
> of factors like cable length, connector and jumper impedance matching.
Yes, you do need to have good quality cables. UTP probably isn't
as critical as coax though.
>
> Still, one would conclude that never-ever should an application assume
> that no packets are lost on a local LAN segment.
Indeed, but OTOH they should also not expect them to be discarded.
The error recovery of most protocols doesn't work well in the presence
of large numbers of lost packets. I have made protocols work in some
fairly horrid scenarios (usually systems with 8-bit ISA ethernet cards
with 4k of buffer space....) but it is quite tricky, non-intuitive,
and best avoided if possible.
David
The recovery for lost packets should work well relative to the "expected"
packet loss rate (which in turn does depend on avg packet size).
But all should have reasonably well tested recovery mechanisms, especially
now with the 802.11b WLANs popping up like mushrooms, where BER is
in the 1e-6 range (after FEC effect).
There are really nice tools these days for serious people to test their
wares before they hit the market. For example, SHUNRA have a network
simulator that can be instructed to emulate any kind of BER, delay, jitter
and other parameters, so you can be sure a product works well before it
ships.
Indeed, and the error rate of 802.11b is such that the recovery
algorithms of TCP implementations probably need looking at.
The losses on 802.11b are not helped by the fact that it is having
to do rate adaptation as well. I gave up using the wireless card
under vxworks for NFS (vxworks server) because the error rate was
so high (probably confounded by writing to EEPROM).
> There are really nice tools these days for serious people to test their
> wares before they hit the market. For example, SHUNRA have a network
> simulator that can be instructed to emulate any kind of BER, delay, jitter
> and other parameters, so you can be sure a product works well before it
> ships.
I've not seen these, but they will only simulate the errors that have
been thought of. It is the unknown errors that cause the problems.
Minimising the recovery time under adverse conditions isn't easy!
I suspect that we actually agree :-)
David