
Silent UDP message loss -- possible silent failure on sendto? (Solaris 10)


A. McKenney

Dec 22, 2010, 1:34:25 PM
I'm using Solaris 10, if that matters.

We are sending data via multicast, with sender
and receiver on the same LAN, and every now
and then, messages are getting lost.
I'd like to know if we can be sure that they are
actually getting sent. (Yes, we're also checking the
rest of the path.)

Our application uses "sendto()" on a non-blocking socket.
If it gets EINTR, it retries, otherwise, if it gets an error,
it logs the error number. I see no error messages.

Is it safe to assume that if sendto() does not get
any error, then the message went out? Or is there
some place we should look for statistics about UDP
messages that get lost in the system, prior to getting
onto the network?
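
The send logic described above can be sketched in Python, whose socket module wraps the same sendto() call. This is only an illustration under assumptions: the helper name try_send and the destination 127.0.0.1:9 are hypothetical, and Python 3.5 or later already retries EINTR internally. The point it shows is that a successful return only means the local stack accepted the datagram, and that on a non-blocking socket EAGAIN/EWOULDBLOCK means nothing was queued at all.

```python
import socket


def try_send(sock, payload, addr):
    """Attempt one sendto(); return (accepted, errno_or_None).

    accepted=True only means the local stack took the datagram,
    not that it reached the wire or any receiver.
    """
    try:
        sent = sock.sendto(payload, addr)
    except BlockingIOError as exc:
        # EAGAIN/EWOULDBLOCK on a non-blocking socket: nothing was queued.
        return False, exc.errno
    except OSError as exc:
        # Any other errno; the caller decides whether to log or stop.
        return False, exc.errno
    return sent == len(payload), None


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    # 127.0.0.1:9 is a placeholder destination, not taken from the thread.
    print(try_send(sock, b"hello", ("127.0.0.1", 9)))
    sock.close()
```

Logging the EAGAIN case separately from other errors may be worthwhile, since it is a loss that happens before the datagram ever enters the stack.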

Måns Rullgård

Dec 22, 2010, 2:06:41 PM
"A. McKenney" <alan_mc...@yahoo.com> writes:

> I'm using Solaris 10, if that matters.
>
> We are sending data via multicast, with sender
> and receiver on the same LAN, and every now
> and then, messages are getting lost.
> I'd like to know if we can be sure that they are
> actually getting sent. (Yes, we're also checking the
> rest of the path.)

You could always connect a transparent sniffer between the sender and
the network. Then you'd know for sure.

--
Måns Rullgård
ma...@mansr.com

Scott Lurndal

Dec 22, 2010, 2:45:53 PM
"A. McKenney" <alan_mc...@yahoo.com> writes:
>I'm using Solaris 10, if that matters.
>
>We are sending data via multicast, with sender
>and receiver on the same LAN, and every now
>and then, messages are getting lost.
>I'd like to know if we can be sure that they are
>actually getting sent. (Yes, we're also checking the
>rest of the path.)

The 'U' in UDP stands for "Unreliable". The packet may
be silently dropped anywhere along the line, from the host, any switch
or router, or on ingress. Most devices record such drops in their SNMP
statistics on dropped packets. Look at netstat -i.

If your application can't tolerate dropped packets, it shouldn't
be using UDP.

scott

Barry Margolin

Dec 22, 2010, 8:02:32 PM
In article <RzsQo.1724$Nl3....@news.usenetserver.com>,
sc...@slp53.sl.home (Scott Lurndal) wrote:

> "A. McKenney" <alan_mc...@yahoo.com> writes:
> >I'm using Solaris 10, if that matters.
> >
> >We are sending data via multicast, with sender
> >and receiver on the same LAN, and every now
> >and then, messages are getting lost.
> >I'd like to know if we can be sure that they are
> >actually getting sent. (Yes, we're also checking the
> >rest of the path.)
>
> The 'U' in UDP stands for "Unreliable".

While it's true that the protocol is unreliable, the U stands for "User".

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

David Schwartz

Dec 22, 2010, 10:30:33 PM
On Dec 22, 10:34 am, "A. McKenney" <alan_mckenn...@yahoo.com> wrote:
> I'm using Solaris 10, if that matters.

> We are sending data via multicast, with sender
> and receiver on the same LAN, and every now
> and then, messages are getting lost.
> I'd like to know if we can be sure that they are
> actually getting sent.  (Yes, we're also checking the
> rest of the path.)

Design the protocol such that clients send reception reports back to
the sender.

> Our application uses "sendto()" on a non-blocking socket.
> If it gets EINTR, it retries, otherwise, if it gets an error,
> it logs the error number.  I see no error messages.

You will probably never get an error. What could go wrong with attempting
to send a message?

> Is it safe to assume that if sendto() does not get
> any error, then the message went out?

No. If you need to know if a message went out, you need to design that
into your protocol with acknowledgments.

> Or is there
> some place we should look for statistics about UDP
> messages that get lost in the system, prior to getting
> onto the network?

Keep the statistics if you need them. It's not a service UDP provides.

If you think about it, it wouldn't do you any good. Even if the packet
makes the wire, it could be dropped by the switch on the other end of
the wire anyway.
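
A minimal sketch of the reception-report idea, assuming a hypothetical wire format that the thread does not specify: each datagram carries a 32-bit sequence number, and the receiver tracks which numbers it has not seen so that it can report them back. The names wrap and GapTracker are illustrative only, and sequence wraparound is deliberately ignored.

```python
import struct

HEADER = struct.Struct("!I")  # 32-bit sequence number, network byte order


def wrap(seq, payload):
    """Sender side: prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload


class GapTracker:
    """Receiver side: remember which sequence numbers never arrived."""

    def __init__(self):
        self.expected = 0
        self.missing = set()

    def receive(self, datagram):
        (seq,) = HEADER.unpack_from(datagram)
        if seq >= self.expected:
            # Everything between the last seen number and this one is missing.
            self.missing.update(range(self.expected, seq))
            self.expected = seq + 1
        else:
            # A late or retransmitted datagram fills an earlier gap.
            self.missing.discard(seq)
        return datagram[HEADER.size:]

    def report(self):
        """Sequence numbers to include in a reception report."""
        return sorted(self.missing)


if __name__ == "__main__":
    tracker = GapTracker()
    for seq in (0, 1, 3, 5):
        tracker.receive(wrap(seq, b"data"))
    print(tracker.report())
```

The same counters also give the sender a measured loss rate, which is the statistic the original question was looking for.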

DS

Rainer Weikusat

Dec 23, 2010, 6:42:22 AM
sc...@slp53.sl.home (Scott Lurndal) writes:
> "A. McKenney" <alan_mc...@yahoo.com> writes:
>>I'm using Solaris 10, if that matters.
>>
>>We are sending data via multicast, with sender
>>and receiver on the same LAN, and every now
>>and then, messages are getting lost.
>>I'd like to know if we can be sure that they are
>>actually getting sent. (Yes, we're also checking the
>>rest of the path.)
>
> The 'U' in UDP stands for "Unreliable". The packet may
> be silently dropped anywhere along the line, from the host, any switch
> or router, or on ingress.

This is not part of the protocol definition. UDP is IP with an
additional header for local multiplexing (providing 'ports' in order
to route UDP datagrams to one of possibly many applications listening
for them) and IP contains neither provisions for detecting whether a
datagram was lost somewhere on the path from source to destination nor
any mechanism to cope with such an event. One can imagine that the
inventors of the protocol had two different error scenarios in mind:

- transmission errors resulting in datagram corruption
- transient memory shortage occurring in some device on the
  path (including the receiving device)

Specifically, RFC791 contains the following paragraph in the
description of the 'example SEND operation':

    When the user sends a datagram, it executes the SEND call
    supplying all the arguments. The internet protocol module, on
    receiving this call, checks the arguments and prepares and
    sends the message. If the arguments are good and the datagram
    is accepted by the local network, the call returns
    successfully. If either the arguments are bad, or the
    datagram is not accepted by the local network, the call
    returns unsuccessfully. On unsuccessful returns, a reasonable
    report must be made as to the cause of the problem, but the
    details of such reports are up to individual implementations.

which makes it pretty clear that the intent was not to 'advise' the
sending host that this piece of user data may be dropped onto the
floor without notice for any conceivable reason.

[...]

> If your application can't tolerate dropped packets, it shouldn't
> be using UDP.

If the application can't tolerate lost packets, it must not use any
kind of network (or computer, FWIW), for the simple reason that a
power outage can theoretically happen anywhere and last for any amount
of time alone (in my experience, datagram loss >= 20% will usually
lead to TCP connection aborts and additionally make them basically
unusable until this has happened). Provided the application can
tolerate lost packets, some scheme intended to keep 'packet loss' in
normal operation conditions (including transient 'network failures')
within tolerable limits needs to be implemented or used.

Ersek, Laszlo

Dec 23, 2010, 3:45:37 PM
(Adding comp.protocols.tcp-ip.)

On Wed, 22 Dec 2010, David Schwartz wrote:

> On Dec 22, 10:34 am, "A. McKenney" <alan_mckenn...@yahoo.com> wrote:

>> We are sending data via multicast, with sender
>> and receiver on the same LAN, and every now
>> and then, messages are getting lost.
>> I'd like to know if we can be sure that they are
>> actually getting sent.  (Yes, we're also checking the
>> rest of the path.)
>
> Design the protocol such that clients send reception reports back to
> the sender.

An example is "TFTP Multicast Option", http://tools.ietf.org/html/rfc2090
(Because the master client and the server work in lock-step, it is
probably not very performant, but I suppose it is fairly famous and
historical.)


>> Our application uses "sendto()" on a non-blocking socket.
>> If it gets EINTR, it retries, otherwise, if it gets an error,
>> it logs the error number.  I see no error messages.
>
> You will probably never get an error. What could go wrong with attempting
> to send a message?

Sendto() has some error values.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/sendto.html
http://docs.sun.com/app/docs/doc/816-5170/sendto-3socket?l=en&n=1&a=view

I think some of those can happen even if the program is correct. I believe
the best approach is to identify critical errors (that the program can't
recover from), like EBADF (programming error) or EMSGSIZE (programming
error or configuration error), and stop sending in those cases. Ignore
other errors for transfer purposes (errors signalling "dynamic" conditions
like ENOBUFS are platform-dependent anyway) and consider those packets
lost somewhere on the network, to be detected by retransmission requests
or lack of timely acknowledgements.

Though the inverse may be more robust, i.e. identifying transient errors
explicitly and giving up on anything else. For example, ISTR sendto() can
return the non-standard EPERM on Linux if an iptables rule forbids the
packet. If one relies on the portable specification only and so doesn't
know about EPERM, then the default-to-give-up approach will immediately
catch this permanent error, while the default-to-retry one will probably
time out much later (if the programmer coded a timeout).
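
The "default-to-give-up" policy could look roughly like the sketch below. Which errno values count as transient is an assumption made for illustration; as noted above, the set is platform-dependent. The names TRANSIENT and classify are hypothetical.

```python
import errno

# Assumed set of "dynamic" conditions; adjust per platform.
TRANSIENT = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS, errno.EINTR}


def classify(err):
    """Treat only known-transient errors as lost packets; stop on the rest."""
    return "retry" if err in TRANSIENT else "give_up"


if __name__ == "__main__":
    for code in (errno.ENOBUFS, errno.EMSGSIZE, errno.EPERM, errno.EBADF):
        print(errno.errorcode[code], classify(code))
```

With this arrangement an unexpected value such as EPERM stops the sender immediately instead of being counted as ordinary packet loss.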

Alan, does your sender implement some sort of rate limiter or flow
control? (The flow control in the TFTP example is the lock-step, in fact a
sliding window of size 1; though I'm sure I'll be shredded for improper
use of "flow control".) I think trying to send as fast as the CPU allows
is wasted effort, under common circumstances.
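
As an illustration of a sender-side rate limiter, here is a token-bucket sketch. It is one possible scheme rather than anything the thread prescribes; the class name, the byte-based accounting, and the injectable clock are all assumptions. The caller asks how long to wait before a datagram of a given size may go out, sleeps for that long, and asks again.

```python
import time


class TokenBucket:
    """Allow bursts up to burst_bytes, then at most rate_bytes_per_s."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def delay_for(self, nbytes):
        """Return 0.0 and consume tokens if nbytes may be sent now,
        otherwise return the number of seconds to wait before retrying."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return 0.0
        return (nbytes - self.tokens) / self.rate


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1_000_000, burst_bytes=64_000)
    for _ in range(5):
        wait = bucket.delay_for(1400)
        if wait:
            time.sleep(wait)
            bucket.delay_for(1400)
        # sendto() would go here
```

A datagram larger than burst_bytes can never be sent with this sketch, so the burst size has to be at least the largest message.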

lacos

David Schwartz

Dec 25, 2010, 11:57:25 AM
On Dec 23, 12:45 pm, "Ersek, Laszlo" <la...@caesar.elte.hu> wrote:

> > You will probably never get an error. What could go wrong with
> > attempting to send a message?

> Sendto() has some error values.

That's because sendto is protocol-independent and supports many
protocols where things can go wrong in attempting to send a message.
And, of course, there's stuff like EFAULT or ENOTSOCK that indicate
program errors. You may also see EACCES for UDP if the system has some
kind of firewall-like thing installed.

But most implementations don't support any 'network' error return
codes for UDP sendto if a datagram is dropped because they don't
retain program flow far enough to do that. By the time they know the
datagram didn't hit the wire, sendto has already returned to the
caller.

DS

Nicolas George

Dec 25, 2010, 12:45:21 PM
David Schwartz wrote in message
<a0a419d9-8627-4ba1...@o11g2000prf.googlegroups.com>:

> But most implementations don't support any 'network' error return
> codes for UDP sendto if a datagram is dropped because they don't
> retain program flow far enough to do that.

If I remember my Stevens correctly, that is not true: on most
implementations, if a UDP socket was connect()ed, the kernel maintains an
error status and send returns an error on the next call after an error
packet is received.

Nobody

Dec 25, 2010, 5:37:25 PM
On Sat, 25 Dec 2010 17:45:21 +0000, Nicolas George wrote:

>> But most implementations don't support any 'network' error return
>> codes for UDP sendto if a datagram is dropped because they don't
>> retain program flow far enough to do that.
>
> If I remember my Stevens correctly, that is not true: on most
> implementations, if a UDP socket was connect()ed, the kernel maintains an
> error status and send returns an error on the next call after an error
> packet is received.

Dropped or corrupted packets don't result in an error being returned to
the sender. Errors only occur for things like "no route to host".

Maxwell Lol

Dec 25, 2010, 6:42:10 PM
"A. McKenney" <alan_mc...@yahoo.com> writes:

> I'm using Solaris 10, if that matters.
>
> We are sending data via multicast, with sender
> and receiver on the same LAN, and every now
> and then, messages are getting lost.

I did this years ago, and two things that helped minimize dropped
messages were to (1) increase the buffer size, and (2) use realtime
scheduling. Also (3) make sure you rate-limit the transmission to not
exceed the available bandwidth.
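
For point (1), a hedged sketch of requesting larger socket buffers. The post does not say which buffer was enlarged, so both SO_SNDBUF and SO_RCVBUF are shown; the helper name and the 1 MiB figure are arbitrary. The OS may clamp or adjust the request, which is why the value is read back.

```python
import socket


def request_buffer(sock, option, wanted_bytes):
    """Ask for a socket buffer size and return what the OS reports back."""
    sock.setsockopt(socket.SOL_SOCKET, option, wanted_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, option)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # 1 MiB is an arbitrary illustrative value.
        print("SO_SNDBUF:", request_buffer(sock, socket.SO_SNDBUF, 1 << 20))
        print("SO_RCVBUF:", request_buffer(sock, socket.SO_RCVBUF, 1 << 20))
```

If the reported value is much smaller than the request, a system-wide limit is probably in effect and would need to be raised by the administrator.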

David Schwartz

Dec 26, 2010, 1:47:58 AM
On Dec 25, 9:45 am, Nicolas George <nicolas$geo...@salle-s.org> wrote:
> David Schwartz wrote in message
> <a0a419d9-8627-4ba1-95cf-329977c83...@o11g2000prf.googlegroups.com>:

> > But most implementations don't support any 'network' error return
> > codes for UDP sendto if a datagram is dropped because they don't
> > retain program flow far enough to do that.

> If I remember my Stevens correctly, that is not true: on most
> implementations, if a UDP socket was connect()ed, the kernel maintains an
> error status and send returns an error on the next call after an error
> packet is received.

That is correct. But that error, as you noted, is reported on a
subsequent call to 'sendto'.
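
The behaviour discussed here can be probed with a small experiment. This is a sketch under assumptions: it uses the loopback address, a port that was free a moment earlier, and hypothetical helper names, and the outcome is platform-dependent. On systems that behave as described, one would expect the first send() to succeed and a later call to fail once an ICMP "port unreachable" has come back.

```python
import socket
import time


def free_udp_port():
    """Bind to port 0, note the port the OS picked, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tmp:
        tmp.bind(("127.0.0.1", 0))
        return tmp.getsockname()[1]


def probe_connected_send(port, attempts=3, pause=0.05):
    """Return the errno of the first failing call, or None if all succeed."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect(("127.0.0.1", port))
            for _ in range(attempts):
                sock.send(b"ping")
                time.sleep(pause)  # give an ICMP error time to arrive
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    print(probe_connected_send(free_udp_port()))
```

Either way, the error says nothing about datagrams that were silently dropped; it only reflects an error report that some host chose to send back.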

DS

A. McKenney

Dec 29, 2010, 12:26:35 PM
On Dec 25, 11:57 am, David Schwartz <dav...@webmaster.com> wrote:

> But most implementations don't support any 'network' error return
> codes for UDP sendto if a datagram is dropped ...
> .... By the time they know the
> datagram didn't hit the wire, sendto has already returned to the
> caller.

Based on other problems we have encountered,
I'm fairly certain that Solaris 10 returns from the
sendto() call before the message actually gets
sent (or not sent.)

OK, assuming that Solaris may fail to send a UDP
message after a successful return from sendto(),
where would one look for statistics on such lost
messages?

For messages lost on the receiving end, we run
"netstat -s" -- are there fields there that would
show messages lost on the sending host, too?

Roy Smith

Dec 29, 2010, 9:04:05 PM
In article
<46de45af-8951-489a...@p38g2000vbn.googlegroups.com>,
"A. McKenney" <alan_mc...@yahoo.com> wrote:

I'm tempted to quip, "Which part of unreliable did you not understand?",
but I'll behave myself.

UDP is a "best effort" kind of service. You send a packet, and if all
the planets align just right, it'll get where you sent it. If not, it
just disappears into the ether, without a trace. Most OS's have some
kind of instrumentation which tells you about the general state of UDP
traffic (for example, the SNMP udpInErrors counter), but specific
information about a specific UDP packet? You're barking up the wrong
protocol.

You might try using some kind of packet sniffer (tcpdump, snoop,
ethereal, etc.) to watch for outgoing traffic on a physical interface,
but that's clumsy, complicated, and not particularly reliable anyway.

David Schwartz

Dec 29, 2010, 11:16:30 PM
On Dec 29, 9:26 am, "A. McKenney" <alan_mckenn...@yahoo.com> wrote:

> Based on other problems we have encountered,
> I'm fairly certain that Solaris 10 returns from the
> sendto() call before the message actually gets
> sent (or not sent.)

Of course.

> OK, assuming that Solaris may fail to send a UDP
> message after a successful return from sendto(),
> where would one look for statistics on such lost
> messages?

If you need such support, code it. Have the other ends send you
reception reports if you need them.

> For messages lost on the receiving end, we run
> "netstat -s" -- are there fields there that would
> show messages lost on the sending host, too?

Check your operating system documentation. There is no requirement
that it even have any way to establish this.

DS

Jorgen Grahn

Dec 30, 2010, 3:29:58 AM
["Followup-To:" header set to comp.protocols.tcp-ip.]

On Thu, 2010-12-30, David Schwartz wrote:
> On Dec 29, 9:26 am, "A. McKenney" <alan_mckenn...@yahoo.com> wrote:
>
>> Based on other problems we have encountered,
>> I'm fairly certain that Solaris 10 returns from the
>> sendto() call before the message actually gets
>> sent (or not sent.)
>
> Of course.
>
>> OK, assuming that Solaris may fail to send a UDP
>> message after a successful return from sendto(),
>> where would one look for statistics on such lost
>> messages?
>
> If you need such support, code it. Have the other ends send you
> reception reports if you need them.

I didn't read the first postings as carefully as I should have, but I
assume/hope he *has* code to cope with packet loss, but sees losses
which are higher than he expected and wants to analyze the cause.

>> For messages lost on the receiving end, we run
>> "netstat -s" -- are there fields there that would
>> show messages lost on the sending host, too?
>
> Check your operating system documentation.

If Solaris is like Linux, it's hard to find such documentation, so I'm
not surprised he asked here.

> There is no requirement
> that it even have any way to establish this.

You'd expect a serious OS to count all drops before the link layer
*somewhere*, wouldn't you? Not per application, socket or port though
-- the stack loses track of such things pretty quickly.

My best bet would be netstat -s like he writes, and in particular the
UDP and IP (or IPv6) counters. Or some Solaris-specific alternative.
Also look in the packet filter stuff (whatever Solaris has instead of
Linux 'iptables -vL') because I don't think packets which get stuck in
the firewall show up in netstat -s.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Richard Kettlewell

Dec 30, 2010, 5:19:42 AM
"A. McKenney" <alan_mc...@yahoo.com> writes:
> Based on other problems we have encountered,
> I'm fairly certain that Solaris 10 returns from the
> sendto() call before the message actually gets
> sent (or not sent.)
>
> OK, assuming that Solaris may fail to send a UDP
> message after a successful return from sendto(),
> where would one look for statistics on such lost
> messages?
>
> For messages lost on the receiving end, we run
> "netstat -s" -- are there fields there that would
> show messages lost on the sending host, too?

udpOutErrors (netstat -Pudp -s) and ipOutDiscards (netstat -Pip -s)
might be worth a look. Search the opensolaris kernel source for
ipIfStatsOutDiscards and udpOutErrors for the details of the conditions
they apply to.
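
To watch these counters over time, one could snapshot the netstat output before and after a test run and compare. The sketch below assumes the "name = value" layout that Solaris netstat -s prints; the sample text and its numbers are made up for illustration, and the function names are hypothetical.

```python
import re

COUNTER = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_counters(text):
    """Extract every 'name = value' pair from netstat -s style output."""
    return {name: int(value) for name, value in COUNTER.findall(text)}


def delta(before, after, names):
    """Change in the selected counters between two snapshots."""
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}


# Made-up snapshots in the assumed format of `netstat -Pudp -s`.
SAMPLE_BEFORE = """
UDP     udpInDatagrams      = 39228     udpInErrors         =     0
        udpOutDatagrams     =  2455     udpOutErrors        =     3
"""
SAMPLE_AFTER = """
UDP     udpInDatagrams      = 39300     udpInErrors         =     0
        udpOutDatagrams     =  2555     udpOutErrors        =     5
"""

if __name__ == "__main__":
    before = parse_counters(SAMPLE_BEFORE)
    after = parse_counters(SAMPLE_AFTER)
    print(delta(before, after, ("udpOutDatagrams", "udpOutErrors")))
```

The counters are system-wide, so a nonzero difference only shows that some sender on the host lost datagrams during the interval.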

--
http://www.greenend.org.uk/rjk/

Rick Jones

Dec 30, 2010, 2:46:41 PM
In comp.protocols.tcp-ip Richard Kettlewell <r...@greenend.org.uk> wrote:
> udpOutErrors (netstat -Pudp -s) and ipOutDiscards (netstat -Pip -s)
> might be worth a look. Search the opensolaris kernel source for
> ipIfStatsOutDiscards and udpOutErrors for the details of the
> conditions they apply to.

There is, though, the problem of knowing if those discards were "your"
datagrams or some other program's, yes? Since those datagrams can be
discarded any number of places along the network, while it may help in
the search it may be far from sufficient.

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Richard Kettlewell

Dec 30, 2010, 3:57:30 PM
Rick Jones <rick....@hp.com> writes:
> In comp.protocols.tcp-ip Richard Kettlewell <r...@greenend.org.uk> wrote:

>> udpOutErrors (netstat -Pudp -s) and ipOutDiscards (netstat -Pip -s)
>> might be worth a look. Search the opensolaris kernel source for
>> ipIfStatsOutDiscards and udpOutErrors for the details of the
>> conditions they apply to.
>
> There is, though, the problem of knowing if those discards were "your"
> datagrams or some other program's, yes? Since those datagrams can be
> discarded any number of places along the network, while it may help in
> the search it may be far from sufficient.

Yes, you're right. I guess you could try to work out a "background"
drop rate when your program wasn't running, but even then, if your
program is responsible for pushing things "over the edge", the results
might be misleading.

--
http://www.greenend.org.uk/rjk/

Ian Collins

Dec 30, 2010, 5:04:36 PM

If sendto is being used, it is unlikely that the socket is connected. I
believe it is an error to "sendto" a connected endpoint.

--
Ian Collins

Geoff Clare

Jan 3, 2011, 8:34:18 AM
Ian Collins wrote:

Some systems give an EISCONN error, but on some systems the sendto()
succeeds and the datagram is sent to the specified address (overriding
the pre-specified address set up with connect()).
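
Which of the two behaviours a given system has can be checked directly. The sketch below is an assumption-laden probe on the loopback interface with a hypothetical function name; it reports either "sent" or the symbolic errno name, and makes no claim about what any particular OS returns.

```python
import errno
import socket


def sendto_on_connected_socket():
    """Return "sent" or the symbolic errno name (for example "EISCONN")."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as first, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as second, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        first.bind(("127.0.0.1", 0))
        second.bind(("127.0.0.1", 0))
        # Connect to one receiver, then explicitly address the other.
        sender.connect(first.getsockname())
        try:
            sender.sendto(b"x", second.getsockname())
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "sent"


if __name__ == "__main__":
    print(sendto_on_connected_socket())
```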

--
Geoff Clare <net...@gclare.org.uk>
