
RFC 1191 compliance


Randy Turner
Jun 21, 1993, 11:21:02 AM

There was a question posted here recently regarding the fact
that Solaris 2.2 uses RFC 1191 to obtain the most efficient
Path MTU (PMTU). I wasn't aware that all systems (routers
especially) conformed to RFC 1191...I thought it was still a
draft internet standard? Can anyone confirm this?

Thanks!
Randy

--
-----------------------------------------------------------------------------
Randy Turner
QMS, Inc.
rtu...@aqm.com

Tony Li
Jun 21, 1993, 3:02:16 PM

In article <rturner.740676062@imagen> rtu...@imagen.com (Randy Turner) writes:

> There was a question posted here recently regarding the fact
> that Solaris 2.2 uses RFC 1191 to obtain the most efficient
> Path MTU (PMTU). I wasn't aware that all systems (routers
> especially) conformed to RFC 1191...I thought it was still a
> draft internet standard? Can anyone confirm this?

MTU discovery is (as of RFC 1360) a proposed standard. Many router vendors
rely on customer input rather than standards track progress for
implementing features.

Tony

Steve Heimlich
Jun 21, 1993, 5:39:56 PM

In article <rturner.740676062@imagen> rtu...@imagen.com (Randy Turner) writes:
>
> There was a question posted here recently regarding the fact
> that Solaris 2.2 uses RFC 1191 to obtain the most efficient
> Path MTU (PMTU). I wasn't aware that all systems (routers
> especially) conformed to RFC 1191...I thought it was still a
> draft internet standard? Can anyone confirm this?
>
> Thanks!
> Randy

According to RFC 1410, the latest IAB Official Protocol Standards RFC,
1191 is currently a Draft Standard. 1191 does address backward
compatibility issues, however, and a reasonable implementation will
allow PMTU discovery to be disabled (as suggested in the text) on a
per-route basis.

Steve

Barry Margolin
Jun 21, 1993, 7:34:23 PM

In article <rturner.740676062@imagen> rtu...@imagen.com (Randy Turner) writes:
> There was a question posted here recently regarding the fact
> that Solaris 2.2 uses RFC 1191 to obtain the most efficient
> Path MTU (PMTU). I wasn't aware that all systems (routers
> especially) conformed to RFC 1191...I thought it was still a
> draft internet standard? Can anyone confirm this?

According to RFC 1360, the latest "IAB OFFICIAL PROTOCOL STANDARDS" RFC,
Path MTU is in the Proposed Standard state with Elective status. But
proposed standards generally don't move up the standards track unless
people actually try to use them (the whole point of proposed and draft
standards is to encourage people to implement them before they're cast in
stone).

Hosts that use PMTU discovery don't require that the routers implement the
"Next-Hop MTU" extension in RFC 1191. That extension makes PMTU more
accurate and efficient, but hosts can get by without it, and RFC 1191
explicitly mentions this:

Hosts MUST be able to deal with Datagram Too Big messages that do not
include the next-hop MTU, since it is not feasible to upgrade all the
routers in the Internet in any finite time. A Datagram Too Big
message from an unmodified router can be recognized by the presence
of a zero in the (newly-defined) Next-Hop MTU field. (This is
required by the ICMP specification [7], which says that "unused"
fields must be zero.)

The PMTU discovery protocol was designed to allow interoperability between
hosts and routers that don't agree on their conformance. There could be
problems if an old-style host (erroneously) rejects a "Datagram Too Big"
message with a non-zero "unused" field (which is now used for the Next-Hop
MTU field).

--
Barry Margolin
System Manager, Thinking Machines Corp.

bar...@think.com {uunet,harvard}!think!barmar

Jeffrey Mogul
Jun 21, 1993, 8:51:50 PM

To clear up some confusion: on 8 March 1993, RFC1191 was advanced from
Proposed Standard to Draft Standard, along with a little extra advice:

The IESG has approved the Internet Draft "Path MTU Discovery" RFC1191
as a Draft Standard. This document is the product of the concluded
MTU Discovery Working Group. The IESG recommends that a companion
document "IESG Advice from Experience with Path MTU Discovery"
describing operational experience with the protocol in the Internet
be published as an Informational document. The IESG contact persons
are Philip Almquist, Stev Knowles and Dave Piscitello.

That "advice" was published as RFC1435.

According to the rules as I understand them, RFC1191 is eligible to
be advanced to Standard in September 1993. I have heard little
negative comment on the design, so I expect this will happen more
or less on schedule.

-Jeff

Randy Turner
Jun 22, 1993, 12:28:27 PM

mo...@pa.dec.com (Jeffrey Mogul) writes:

>-Jeff


Thanks for all the replies. The reason I asked is that if there
is not a considerable number of machines out there that support it
then I cannot justify the development effort to add it to our current
IP code. I am trying to draw up a project to add several enhancements
to our current (albeit outdated) IP implementation, and the
performance enhancements possible with PMTU discovery would definitely
help, but only if the majority of the networks we are installed in
support RFC 1191. I also understand that vendor inclusion of proposed
standards helps move the standards process along, but in my case I
have a political battle if I use that reasoning.

At the moment, we are contemplating IP multicasting, header prediction,
and we are also looking at bypassing the TCP checksum calculation for
packets sent to and received from hosts on the same network (or subnet).
Has anyone else considered bypassing checksum calculations for TCP
connections on the same network?

Steve Heimlich
Jun 22, 1993, 4:08:48 PM

In article <rturner.740766507@imagen> rtu...@imagen.com (Randy Turner) writes:
> and we are also looking at bypassing the TCP checksum calculation for
> packets sent to and received from hosts on the same network (or subnet).
> Has anyone else considered bypassing checksum calculations for TCP
> connections on the same network?

This is a really bad idea.

Steve

Donald L. Nash
Jun 22, 1993, 5:46:58 PM

In article <rturner.740766507@imagen> rtu...@imagen.com (Randy Turner) writes:
> and we are also looking at bypassing the TCP checksum calculation for
> packets sent to and received from hosts on the same network (or subnet).
> Has anyone else considered bypassing checksum calculations for TCP
> connections on the same network?

This is illegal according to the TCP specification and the Host
Requirements RFC, not to mention being a patently bad idea. Section
4.2.2.7 of RFC 1122 (Host Requirements part 1) says it quite nicely:

4.2.2.7 TCP Checksum: RFC-793 Section 3.1

Unlike the UDP checksum (see Section 4.1.3.4), the TCP
checksum is never optional. The sender MUST generate it and
the receiver MUST check it.

To do otherwise would be to compromise TCP's guarantee of reliable data
delivery.

Although I'm not an implementor, I do know that much work has gone into
optimizing the TCP checksum algorithm. I believe that Dave Borman at
Cray Research Inc. vectorized the checksum algorithm and got some, ah,
interesting results out of that, but QMS and/or Imagen printers probably
don't have vector processors in them. :-) Short of that, there are
other techniques which I am sure someone more knowledgeable than I would
be happy to share with you. I'm passingly familiar with some of the
techniques, but not enough to describe them well enough for you to
implement them.

On the other hand, are your printers really so fast that the TCP
checksum code is a bottleneck, or is this for something other than a
printer? If your TCP implementation is capable of delivering data to
your print engine faster than the engine can process the data, then you
probably don't need to worry about speeding up your TCP.

++Don Nash

Internet: D.N...@utexas.edu The University of Texas System
THEnet: THENIC::DON Office of Telecommunication Services

Stephen C. Trier
Jun 22, 1993, 8:03:15 PM

In article <207uki$2...@geraldo.cc.utexas.edu> D.N...@utexas.edu (Donald L. Nash) writes:
>On the other hand, are your printers really so fast that the TCP
>checksum code is a bottleneck, or is this for something other than a
>printer?

As a side point, as Vernon Schryver likes to point out ;-), it is
possible to write a one-copy TCP in Unix. If this is a printer or
other dedicated box and you have Ethernet hardware that can handle it,
you could even do a zero-copy TCP.

Every copy you remove will add roughly as much performance as disabling
checksums, but without putting data at risk.

RFC 1071 has a discussion of fast checksum techniques and sample code.

Stephen

--
Stephen Trier (tr...@ins.cwru.edu - MIME OK)
Network Software Engineer
IRIS/INS/T
Case Western Reserve University

Randy Turner
Jun 22, 1993, 6:20:55 PM

heim...@watson.ibm.com (Steve Heimlich) writes:

>Steve


I can understand Steve's and others' opinions regarding turning off
checksumming in any case. However, if you can verify that the
two hosts are communicating over the same LAN, and the underlying
data link layer (Ethernet in my case) is already performing a
reliable data-check operation, then I feel that the checksum operation
is redundant. Since TCP does not currently support turning off
checksumming, an initial implementation would just not bother to
checksum inbound packets. It would still checksum outbound packets.

This approach is also discussed in a paper given at the 1993 Usenix
West Symposium titled:
"Measurement, Analysis, and Improvement of UDP/IP
Throughput for the DECStation 5000"

By Jonathan Kay (jk...@cs.ucsd.edu) and
Joseph Pasquale (pasq...@cs.ucsd.edu)

In their paper, they also discuss how the probability of data
corruption during I/O bus transfers can be effectively ignored,
given the fact that disk I/O transfers are not checksummed by
the OS prior to delivery of data to user space.

The only real problem I see in eliminating the TCP inbound checksum
is making sure that a particular TCP connection is established
between two hosts on the same LAN.

Steve Heimlich
Jun 22, 1993, 11:15:01 PM

In article <rturner.740787655@imagen> rtu...@imagen.com (Randy Turner) writes:
>
> The only real problem I see in eliminating the TCP inbound checksum
> is making sure that a particular TCP connection is established
> between two hosts on the same LAN.

Tell me how you plan to do this. It's kind of like describing how
to be a millionaire:

first, get a million dollars...

Apologies for ripping off an old Steve Martin line.

I've seen two examples of equipment failures which were detected
only by end to end checksum. The first we noticed after discovering
that our NFS data was screwed up and our data was hosing us down
(then we noticed that this particular NFS had disabled checksums in
its UDP, a practice which thankfully seems to be falling into disfavor).
The second we noticed when some routing protocols delivered completely
bogus routes. Neither were fun.

As someone else mentioned, there are many things you can
do to improve a TCP implementation which do not put the data integrity
at risk. It really doesn't matter how fast that schematic gets to
the print engine if it has wires missing somewhere.

Steve

Rick Jones
Jun 22, 1993, 11:42:12 PM

Do the checksums.

I know of one TCP/IP product that can negotiate "no TCP checksums" for
"local" lan connections between cooperating nodes, and I have seen at
least one instance of lan cards going bad in such a way as to silently
corrupt data. Besides, with all these brouters, proxy-arp, and Knuth
knows what else you have really *no* way of knowing beyond a
reasonable doubt that your connection is really a local one.

It's a big, bad network - let's be careful out there... ;-)

rick jones
Just because I'm paranoid doesn't mean the network isn't out to get me...

Tim Ramsey
Jun 23, 1993, 12:38:56 AM

rtu...@imagen.com (Randy Turner) writes:

> I can understand Steve and others opinions regarding turning off
> checksumming in any case. However, if you can verify that the
> two hosts are communicating over the same LAN, and the underlying
> data link layer (Ethernet in my case) is already performing a
> reliable datacheck operation, then I feel that the checksum operation
> is redundant.

Not true. I have seen a bad Ethernet interface cause UDP data
corruption. Even if the data link layer does perform checksumming,
this does not help if you have a bad interface.

--
Tim Ramsey, t...@matt.ksu.ksu.edu
PGP2.2 public key available via keyserver, finger, or email.
MIME mail accepted (eagerly :)
Member of the League for Programming Freedom and the ACLU.

Doug Siebert
Jun 23, 1993, 12:46:05 AM

r...@cup.hp.com (Rick Jones) writes:

>Do the checksums.

>I know of one TCP/IP product that can negotiate "no TCP checksums" for
>"local" lan connections between cooperating nodes, and I have seen at
>least one instance of lan cards going bad in such a way as to silently
>corrupt data. Besides, with all these brouters, proxy-arp, and Knuth
>knows what else you have really *no* way of knowing beyond a
>reasonable doubt that your connection is really a local one.

>It's a big, bad network - let's be careful out there... ;-)


Don't most modern TCP implementations, especially on RISC processors, do the
checksum in parallel with the data copy, and thus waste no extra processing
power doing the checksum? I seem to recall this being discussed a while
back... Or was it the case that *theoretically* this can be done, but the
implementations the poster had talked about (in particular he had done it on
a Sparc and an i486) were just experimental and not real-world production
code?

I'm sure some people here would know if real-world implementations do this,
since you are from HP, Rick, I'll pick on you and ask you if HP's TCP does
this? It does seem to be very quick (of course HP's hardware is quick so it
could cover up a slow TCP implementation underneath :-) ) How about Sun, DEC,
SGI, IBM, and the rest of the vendors?

--
Doug Siebert | "I don't have to take this abuse
Internet: dsie...@isca.uiowa.edu | from you - I've got hundreds of
NeXTMail: dsie...@chop.isca.uiowa.edu | people waiting in line to abuse
ICBM: 41d 39m 55s N, 91d 30m 43s W | me!" Bill Murray, Ghostbusters

Vernon Schryver
Jun 23, 1993, 2:21:13 AM

In article <rturner.740766507@imagen>, rtu...@imagen.com (Randy Turner) writes:
> ...

> Thanks for all the replies. The reason I asked is that if there
> is not a considerable number of machines out there that support it
> then I cannot justify the development effort to add it to our current
> IP code....

We care about MTU discovery because it's so ugly to use 1500 byte packets
over FDDI rings. Not to mention HIPPI. Not to mention 500 byte packets.


It seems to me that systems that only deal with ethernets could get by
with a hack-switch that would override the H.R. RFC rules about using
500 byte packets to distant networks. Several years ago, when we
couldn't drive ethernets to saturation, such a switch named
"allnetsarelocal", after the 4.3BSD "subnetsarelocal", made some
customers happy.


> packets sent to and received from hosts on the same network (or subnet).
> Has anyone else considered bypassing checksum calculations for TCP
> connections on the same network?

One IP (or TCP/IP) host cannot reliably tell if the other host is
"on the same network". Besides hassles with subnets, there are
bridges and switching hubs which make it impossible to define
the notion "same network" in a useful sense in this context.


Vernon Schryver, v...@sgi.com

Randy Turner
Jun 22, 1993, 10:47:43 PM

D.N...@utexas.edu (Donald L. Nash) writes:


>In article <rturner.740766507@imagen> rtu...@imagen.com (Randy Turner) writes:
>> and we are also looking at bypassing the TCP checksum calculation for
>> packets sent to and received from hosts on the same network (or subnet).
>> Has anyone else considered bypassing checksum calculations for TCP
>> connections on the same network?

>This is illegal according to the TCP specification and the Host
>Requirements RFC, not to mention being a patently bad idea. Section
>4.2.2.7 of RFC 1122 (Host Requirements part 1), says it quite nicely:

> 4.2.2.7 TCP Checksum: RFC-793 Section 3.1

> Unlike the UDP checksum (see Section 4.1.3.4), the TCP
> checksum is never optional. The sender MUST generate it and
> the receiver MUST check it.

/*
True, that's what it says. However, RFC-793 was published before there
were sophisticated data-link layer services such as Ethernet or FDDI.
Since these data-link/MAC-layer services provide a more robust error
checking method (CRC-16/CRC-32), I would think if RFC-793 was written
today, there would probably be some type of clause that
included a per-connection optional checksum method.
*/

>To do otherwise would be to compromise TCP's guarantee of reliable data
>delivery.

>Although I'm not an implementor, I do know that much work has gone into
>optimizing the TCP checksum algorithm. I believe that Dave Borman at
>Cray Research Inc. vectorized the checksum algorithm and got some, ah,
>interesting results out of that, but QMS and/or Imagen printers probably
>don't have vector processors in them. :-) Short of that, there are
>other techniques which I am sure someone more knowledgable than I would
>be happy to share with you. I'm passingly familiar with some of the
>techniques, but not enough to describe them well enough for you to
>implement.

/*
We already use an algorithm similar to RFC-1071 that combines
a data copy with the checksum calculation, and is also written
in assembly language (68020) with extensive loop unrolling.

Actually, the fact that we are using a 16 MHz 68020,
and the fact that we have to handle multiple protocol stacks
(TCP/IP, DECNet, EtherTalk, Novell), is part of the reason I am
seeking to squeeze everything I can out of the code.
*/

>On the other hand, are your printers really so fast that the TCP
>checksum code is a bottleneck, or is this for something other than a
>printer? If your TCP implementation is capable of delivering data to
>your print engine faster than the engine can process the data, then you
>probably don't need to worry about speeding up your TCP.

/*
You would be surprised to find out what our network requirements
are. And it is not only the print engine that determines the
producer/consumer ratio in our printers......
*/
> ++Don Nash

>Internet: D.N...@utexas.edu The University of Texas System
>THEnet: THENIC::DON Office of Telecommunication Services


Thanks for the replies!

Tony Li
Jun 23, 1993, 2:16:21 AM

In article <rturner.740766507@imagen> rtu...@imagen.com (Randy Turner) writes:

> Thanks for all the replies. The reason I asked is that if there
> is not a considerable number of machines out there that support it
> then I cannot justify the development effort to add it to our current
> IP code. I am trying to draw up a project to add several enhancements
> to our current (albeit outdated) IP implementation, and the
> performance enhancements possible with PMTU discovery would definitely
> help, but only if the majority of the networks we are installed in
> support RFC 1191.

If it helps, cisco implemented PMTU discovery in version 8.3 of our
software. This code has been available for about 1.5 years now. By my
best guesstimate, that code or more recent is running in about 90% of our
customers' networks.

Tony

Donald L. Nash
Jun 23, 1993, 10:50:56 AM

In article <rturner.740803663@imagen>, rtu...@imagen.com (Randy Turner)
writes:

> True, that's what it says. However, RFC-793 was published before there
> were sophisticated data-link layer services such as Ethernet or FDDI.
> Since these data-link/MAC-layer services provide a more robust error
> checking method (CRC-16/CRC-32), I would think if RFC-793 was written
> today, there would probably be some type of clause that
> included a per-connection optional checksum method.

RFC 1122 re-affirms what was written in RFC 793, and 1122 was published in
October 1989. Sophisticated data-links were very much in existence in
1989, but this did not encourage anyone to relax the checksum requirement
when 1122 was written. And as others have pointed out, even Ethernet's
error checking method is open to failure if the network interface is bad.
Also as others have pointed out, there is no reliable way to tell when
your peer is on the same LAN as you are. End-to-end checksums are *the*
only way to guarantee data integrity.

> You would be surprised to find out what our network requirements
> are.

I'll take your word on this, although you have piqued my curiosity with
this statement.

++Don

obe...@ptavv.llnl.gov
Jun 22, 1993, 9:17:10 PM

In Article <rturner.740803663@imagen>

rtu...@imagen.com (Randy Turner) writes:
>D.N...@utexas.edu (Donald L. Nash) writes:
>>This is illegal according to the TCP specification and the Host
>>Requirements RFC, not to mention being a patently bad idea. Section
>>4.2.2.7 of RFC 1122 (Host Requirements part 1), says it quite nicely:
>
>> 4.2.2.7 TCP Checksum: RFC-793 Section 3.1
>
>> Unlike the UDP checksum (see Section 4.1.3.4), the TCP
>> checksum is never optional. The sender MUST generate it and
>> the receiver MUST check it.
>
>/*
> True, that's what it says. However, RFC-793 was published before there
> were sophisticated data-link layer services such as Ethernet or FDDI.
> Since these data-link/MAC-layer services provide a more robust error
> checking method (CRC-16/CRC-32), I would think if RFC-793 was written
> today, there would probably be some type of clause that
> included a per-connection optional checksum method.
>*/

It never ceases to amaze me how far people are willing to go to rationalize
stupid behavior. (I don't exclude myself from that...I've pulled some lulus in
my time.)

First, there are LOTS of ways for data corruption to occur outside of the
Ethernet FCS.

Second, this section is from RFC1122, not 793. 1122 was written in 1989, long
after Ethernet became dominant and even after the introduction of FDDI. When
Bob and company put a "MUST" in 1122, they meant it. I do disagree with some
things in 1122, but skipping the checksum in TCP would NEVER be one of them. I
have little doubt that if 1122 were written today, it would read exactly the
same on this issue.

More important, 1122 compliance is a requirement in a lot of specs. It's in
every one that I put out if it involves IP. If I ever receive a bid that
does not include 1122 compliance, I would probably reject it. This is called
negative sales impact or losing money, hardly something a business would want.
This can also impact severely on performance reviews of the programmers
involved.

Finally, please tell me just how you can be sure that the two systems are on
the same LAN? And if there is a bridge between them, they are NOT on the same
LAN. I've seen broken bridges corrupt packets and give them the correct FCS.

R. Kevin Oberman Lawrence Livermore National Laboratory
Internet: kobe...@llnl.gov (510) 422-6955

Disclaimer: Being a know-it-all isn't easy. It's especially tough when you
don't know that much. But I'll keep trying. (Both)

Peter Desnoyers
Jun 23, 1993, 9:50:43 AM

rtu...@imagen.com (Randy Turner) writes:

>> 4.2.2.7 TCP Checksum: RFC-793 Section 3.1

>> Unlike the UDP checksum (see Section 4.1.3.4), the TCP
>> checksum is never optional. The sender MUST generate it and
>> the receiver MUST check it.

>/*
> True, that's what it says. However, RFC-793 was published before there
> were sophisticated data-link layer services such as Ethernet or FDDI.
> Since these data-link/MAC-layer services provide a more robust error
> checking method (CRC-16/CRC-32), I would think if RFC-793 was written
> today, there would probably be some type of clause that
> included a per-connection optional checksum method.
>*/

I wouldn't rush to judgement so quickly. The old Arpanet was
essentially an X.25 network, and most of the hosts at one point were
on this net. The same arguments could have been made then - if they
were, they were rejected. (in notable comparison to TP0 over X.25...)

However, it does come to mind that printer output data (as opposed to
e.g. downloading software or fonts) has a particularly transient
characteristic - a printer is a write-only device, so if you get a
data error it only affects that single printout. It's not like a
source file getting transferred with FTP, where an error can result in
unrecoverable bit-rot for the rest of time.

In fact, if you had data on packet error frequencies, you could
compare that to the probability of mucking up a printout due to e.g. a
paper jam, and possibly conclude that a printer is nowhere near
reliable enough to worry about an e.g. 10^-10 rate of undetected
packet errors. And if you only turned off checksums in the receive
direction it would be transparent to external devices.

I still think it would be a bad idea, though.

Peter Desnoyers
--

Craig Partridge
Jun 23, 1993, 12:37:33 PM

> True, that's what it says. However, RFC-793 was published before there
> were sophisticated data-link layer services such as Ethernet or FDDI.
> Since these data-link/MAC-layer services provide a more robust error
> checking method (CRC-16/CRC-32), I would think if RFC-793 was written
> today, there would probably be some type of clause that
> included a per-connection optional checksum method.

Not true. Ethernet existed long before RFC-793 came out. The TCP checksum
is needed by the End-To-End Argument.

Note the whole question of bypassing TCP checksums, and ways to make TCP go
fast and do the checksum at little or no cost, was discussed in late March '93
on this newsgroup.

Craig

Randy Turner
Jun 23, 1993, 10:50:10 AM

t...@sam.ksu.ksu.edu (Tim Ramsey) writes:

>rtu...@imagen.com (Randy Turner) writes:


/*
Normally a bad ethernet interface will cause error statuses
to be returned by the ethernet device driver. This is one way
to detect packet failure. If the interface is malfunctioning
in any other way so as to preclude the problem from being
reflected through status bits, then you will probably never even
get your TCP connection started. Hardware errors at the NIC
are really not very common, and if they were, then detection of
the problem should not be relegated to TCP checksum failures,
rather, you would be having other problems long before that, such
as ARP queries being trashed.
*/

Randy Turner
Jun 23, 1993, 11:54:33 AM

obe...@ptavv.llnl.gov writes:


>First, there are LOTS of ways for data corruption to occur outside of the
>Ethernet FCS.

/*
Probably. But if you go to extremes of providing case histories
of intricate hardware failure, then even the validity of RFC1122
starts to break down. There are assumptions made about the
reliability of hardware for a particular protocol specification.
There are limits of tolerability in the specs. Even for an
implementation that fully conforms to RFC793 and RFC1122, you
can still get blown away by memory parity errors when transferring
the buffer from TCP to the socket layer.
*/

>Second, this section is from RFC1122, not 793. 1122 was written in 1989, long
>after Ethernet became dominant and even after the introduction of FDDI. When
>Bob and company put a "MUST" in 1122, they meant it. I do disagree with some
>things in 1122, but skipping the checksum in TCP would NEVER be one of them. I
>have little doubt that if 1122 were written today, it would read exactly the
>same on this issue.

/*
I am still talking about providing full RFC793 and 1122 support, but
also adding the option, on a per-connection basis, of disabling
inbound checksums, if the system has a priori knowledge that a
particular connection from a particular host is on the same LAN.
For a connection of this type, a lot of the robustness provided by
TCP would go unused (e.g. packet reordering) and the connection
would probably be making use of TCP flow control only.
*/

>More important, 1122 compliance is a requirement in a lot of specs. It's in
>every one that I put out if it involves IP. If I ever receive a bid that
>does not include 1122 compliance, I would probably reject it. This is called
>negative sales impact or losing money, hardly something a business would want.
>This can also impact severely on performance reviews of the programmers
>involved.

>Finally, please tell me just how you can be sure that the two systems are on
>the same LAN? And if there is a bridge between them, they are NOT on the same
>LAN. I've seen broken bridges corrupt packets and give them the correct FCS.

/*
If you've got a bridge that can corrupt packets and give them the
correct FCS, then chances are there is a failure mode in this
configuration that would break even the RFC1122 assumptions,
not to mention wreak havoc of all kinds with other network
activity. The permutations of that kind of failure are pretty
staggering.
*/


>R. Kevin Oberman Lawrence Livermore National Laboratory
>Internet: kobe...@llnl.gov (510) 422-6955

>Disclaimer: Being a know-it-all isn't easy. It's especially tough when you
>don't know that much. But I'll keep trying. (Both)

Michael Witt
Jun 23, 1993, 6:00:21 PM

Consider this case: you have a single IP "subnet", which is actually
a number of bridged Ethernets. As far as I can see, there is no way
TCP could ever know about the bridges.

Now you have lost end-to-end reliability. Bad memory in any of the
bridges could cause corrupted data to be accepted by TCP.

There are probably other more "interesting" cases. Especially if you
allow subnets other than Ethernet.

-Mike

Casey Leedom
Jun 23, 1993, 6:06:33 PM

| From: pet...@pjd.dev.cdx.mot.com (Peter Desnoyers)

|
| However, it does come to mind that printer output data (as opposed to
| e.g. downloading software or fonts) has a particularly transient
| characteristic - a printer is a write-only device, so if you get a data
| error it only affects that single printout. It's not like a source file
| getting transferred with FTP, where an error can result in unrecoverable
| bit-rot for the rest of time.

I understand your point, but I'd hate to be the person talking to the
president of an advertising agency that lost a multimillion dollar
contract because the printer silently corrupted an ad layout which was
then unknowingly sent on to a customer. Not a good thing at all.

Again, I understand that you yourself are not backing any move to allow
a non-checksum option on a protocol advertised as being reliable. You
were just trying to think up some circumstance where it *might* be okay.
Unfortunately your write-once throwaway printout of a letter from Mom is
someone else's bread and butter.

I think the bottom line is that TCP is advertised as a reliable
protocol. Throwing away the checksum is akin to fraudulent advertisement.

Casey

Vernon Schryver
Jun 23, 1993, 7:29:40 PM

In article <rturner.740850873@imagen>, rtu...@imagen.com (Randy Turner) writes:
> ...

> I am still talking about providing full RFC793 and 1122 support, but
> also adding the option, on a per-connection basis, of disabling
> inbound checksums, if the system has a priori knowledge that a
> particular connection from a particular host is on the same LAN.
>...


You really ought to pay a little more attention to who is saying
turning off checksums is a bad idea. They include (in overlapping
groups):

-people with some experience as designers, implementors and
maintainers of fast network stuff, including TCP checksums.
-people with a lot of experience as users, including using
NFS/UDP implementations with checksums off.
-big customers.

Note particularly that last category.

Many people would reflexively and permanently disqualify any
printer-controller vendor that even optionally turns off TCP/IP
checksums. Besides the obvious difficulties of being confident the
option is disabled on a black box like a printer, many people would
decide that a vendor that doesn't know better than to turn off TCP
checksums probably doesn't know better than to make other, more
interesting performance "improvements", and that the fun of discovering
those "improvements" is not enough to make up for the hassles of
eventually throwing out the vendor and getting products that work.

It might be different if turning off the TCP checksum could gain you 2X
in performance. However, assuming your printer cannot do more than 5
pages/sec, and doesn't need more than one or two MByte/page (i.e. less
than 100Mbit/sec), the TCP checksum does not amount to 25% of the work,
and that work can be made to disappear into hardware for a low price.
All it takes is some 16-bit summers in the incoming data path.

If you're talking about a piddling 10 Mbit/sec ethernet controller for
a 5 page/minute postscript printer that needs a few 100Kbit/sec ...
well, enough.
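
For reference, the checksum under discussion is the RFC 1071 16-bit
one's-complement sum, which is exactly what such hardware summers would
compute. A minimal software sketch (Python used here purely for
illustration):

```python
def internet_checksum(data):
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF
```

RFC 1071's worked example (words 0001 f203 f4f5 f6f7) sums to ddf2, so
the transmitted checksum is its complement, 220d; re-summing the data
plus that checksum yields zero, which is how a receiver verifies it.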


Vernon Schryver, v...@sgi.com

Len Fishler

unread,
Jun 23, 1993, 7:52:23 PM6/23/93
to
In article <peterd.7...@pjd.dev.cdx.mot.com>, pet...@pjd.dev.cdx.mot.com (Peter Desnoyers) writes:

... stuff deleted

>However, it does come to mind that printer output data (as opposed to
>e.g. downloading software or fonts) has a particularly transient
>characteristic - a printer is a write-only device, so if you get a
>data error it only affects that single printout. It's not like a
>source file getting transferred with FTP, where an error can result in
>unrecoverable bit-rot for the rest of time.

Gee, I think I'd like to get checks from that printer (:-)). A single bit
error could make me a rich man.

>I still think it would be a bad idea, though.

Agreed.

- Len Fishler -
fishl...@tandem.com

Dave Mischler

unread,
Jun 23, 1993, 9:25:56 PM6/23/93
to
To all vendors of products that implement TCP:

If your product *ever* fails to calculate and check the TCP checksum,
please put this info in your ad so I can remember not to buy your
product.

Thanks.

Dave Mischler
misc...@cubic.com

Jon Kay

unread,
Jun 24, 1993, 12:15:55 AM6/24/93
to

Sigh. Looks like it's time for the quarterly argumentathon on
this.
OK, let's take a closer look at this data that's SO precious
that we have to spend half our software processing time to calculate a
SECONDARY checksum. First off, even I don't believe that Ethernet
interfaces or bridges always work, nor that the Ethernet CRC
catches every error. My argument is that having this awfully
expensive checksum doesn't buy very much in overall system
reliability.
From the basic structure of the way things tend to be done on
modern systems, it seems safe to conclude that the data came from a
disk. Even if it didn't come directly from a disk, the data was
generated based on data that did come from a disk. Like networks,
disks have reliability problems. Probably the most common problem is
that media can go bad. Well, that's OK. On each sector of almost
every disk in existence there's an ECC or CRC covering the contents
of the sector. The controller hardware embedded in the disk drive
checks or generates it each time a sector is read from or written to
disk.
BUT - that ECC is only seen by the controller in the disk
drive. It is NOT passed out beyond there. The host disk interface
doesn't see it. The host software never sees it. That ECC in the
disk drive's controller is the only effort made at a checksum in the entire
process of disk I/O. When I asked a file systems expert if anybody
even thought about doing checksumming in software on file system
blocks (the moral equivalent), he made noises like I was crazy. As
far as I am aware, the closest anybody's ever come to such a thing
for a filesystem on a general-purpose machine is the checksum over
four bytes out of each block in LFS. Thus, in theory, if
the disk controller, host interface, memory interface, or CPU goes
bad, they could merrily corrupt disk blocks without being detected.
And that's where most network data comes from.
In that kind of world, why on earth is a secondary checksum
so important? You can put dozens of checksums on the network data
and there's little reason to believe that it would have any impact
on resistance to data corruption overall, because after traveling
over the safe network, the data will get corrupted on the way to
the disk drive.
I would still agree that an additional checksum would do no
harm if it were cheap. But it isn't. Even if you do the checksum and
copy combination work that Craig Partridge mentioned, the checksum
only becomes cheap if your memory system is slow enough to cover
the latency of the extra summing instructions - if your processor
architecture happened to have a sum-with-carry instruction, that
helped. A 68020 has a sum-with-carry instruction, but the 68020 is
*SO* hopelessly slow relative to its memory system that I don't
expect that to be anything like enough to make checksumming cheap.
It has been pointed out that a broken bridge can corrupt
packets. True. However, the chance that any given packet will be
corrupted is not a boolean, but rather a fraction. Since the chance
that a bridge itself will have problems is small, if you put a
bridge into a network, it would probably raise the chance of
corruption overall by only an extremely tiny amount, the more so
since a bridge should not recalculate FCSs but rather pass them
through. In any case, I expect that the resulting reliability will
still be several orders of magnitude better than disk reliability.

cr...@sics.se (Craig Partridge) writes


> The TCP checksum is needed by the End-To-End Argument.

But TCP/IP does not implement an end-to-end checksum. The
data comes from sockets, which got it from a user process, which in
turn probably got it from disk, where the checksum is no longer in
evidence. The data goes to a different user process, which in turn
is going to put it on a disk. The user processes and disk
interface and controllers are free to scribble all over that data
without fear of detection.

> This is very illegal according to the TCP specification and the Host

> Requirements RFC ...

Near as I can tell, the strictures against turning off checksumming were
considered in an atmosphere of blanket switches controlling whether
checksumming was done at all, ever. If you completely turn off
checksumming, then start sending packets across the Internet, you
are likely to be unhappy. Redundant checksum avoidance is a very
different beast, as it is able to detect such things.

> You really ought to pay a little more attention to who is saying
> turning off checksums is a bad idea. They include (in overlapping
> groups):
>
> -people with some experience as designers, implementors and
> maintainers of fast network stuff, including TCP checksums.

...notably Vernon Schryver, who has put gobs of time into doing
the Internet checksum in hardware, and thus seems to have a
particular hatred for this scheme, even though he doesn't implement
an end-to-end checksum either.
Those who are considering boycotting poor Randy should
boycott SGI too. It is true that SGI lives up to the letter of
RFC1122 (the checksums ARE calculated, no question), but SGI
violates the original end-to-end spirit of protecting packets from
host memory to host memory, raising the chance that corrupt packets
will go undetected.

Examining the role of checksums in the network has long been
unfashionable in certain circles. Yet I cannot seem to locate any
actual studies on the subject, even though the actual impact of
turning off checksums does not seem like an obvious issue at all.

So far, only one person has provided any numbers:


heim...@watson.ibm.com (Steve Heimlich) says:
> I've seen two examples of equipment failures which were detected
> only by end to end checksum.

How many years have you been working in networking? Remember that
for each of those failures, half of the entire packetload going
across networks you were responsible for would have been processed
faster; your user community would have seen prompts coming back
faster and thus been that much happier and more productive. How many
disk failures resulting in corrupt data occurred during the same
period? In my own decade in this business, I've been hit by disk
problems more times than I can easily count. The misprinted check is
going to happen because of a bad disk controller, not because of the
network.

Jon

Tim Ramsey

unread,
Jun 24, 1993, 2:45:15 AM6/24/93
to
matt% netstat -s
ip:
        0 bad header checksums
icmp:
        0 bad checksums
tcp:
        21036149 packets received
        144 discarded for bad checksums
udp:
        0 bad checksums

This is on a Solbourne running 4.1A.3 (akin to 4.1.1). The running kernel
apparently has UDP checksums disabled. :(

Vernon Schryver

unread,
Jun 24, 1993, 1:37:49 AM6/24/93
to
In article <51...@sdcc12.ucsd.edu>, jk...@cs.ucsd.edu (Jon Kay) writes:
> ...

> In my own decade in this business, I've been hit by disk
> problems more times than I can easily count. The misprinted check is
> going to happen because of a bad disk controller, not because of the
> network.


How many of those disk problems produced undetected errors?

In other words, how many of those disk problems were similar to the
undetected and undetectable network problems we're talking about?

In other words, how many of those disk problems did not involve reading
and writing the medium (since, as you note, that's protected with an
ECC), but involved undetected errors while transferring data between the
controller and main memory?


Vernon Schryver, v...@sgi.com

Warner Losh

unread,
Jun 24, 1993, 12:43:57 PM6/24/93
to
In article <51...@sdcc12.ucsd.edu>, jk...@cs.ucsd.edu (Jon Kay) writes:
> In my own decade in this business, I've been hit by disk
> problems more times than I can easily count. The misprinted check is
> going to happen because of a bad disk controller, not because of the
> network.

It has been my experience, both as a product tester and as a customer,
that network problems happen an order of magnitude more often than
disk problems (except floppies, which don't count). The TCP checksums
make these transient network problems invisible to me so I don't have
to worry about them. Removing the checksums removes one more layer
of assurance that the data will get there safely; with them, the
network layer is effectively prevented from hosing packets unnoticed.

The problems were especially severe when we were on a network that had
a bridge that would munch the ends of packets every 100,000 packets or
so. The checksums were recomputed in this bridge, so the ethernet
hardware on the other side didn't detect the error. The problem showed
up with odd NFS corruption (this was back in the bad old days of
disabled UDP checksums), but FTP would always be fine, even on multiple
megabyte files.

I haven't had a disk failure that went undetected. All the disk
failures that have bitten me were caught at the time. I would think
that after all the years I've been around I would have noticed at
least one glitch after the fact.

Granted, this isn't "hard evidence," but it has been my experience and
my judgement that checksums are indeed worth it. They are there so
that people with slightly flaky networks can still mostly use them
effectively. Given the kinds of networks I've seen people build, I
certainly would never purchase something that disabled TCP checksums,
nor would I design something without them.

Oh well, just my two cents worth.

Warner
--
Warner Losh i...@boulder.parcplace.COM ParcPlace Boulder
I've almost finished my brute force solution to subtlety.

Peter Desnoyers

unread,
Jun 24, 1993, 10:47:47 AM6/24/93
to
Just to add a few more flames to the fire, I thought I would ask how
many of us use printers which are connected over serial ports (with no
end-to-end check) rather than via TCP over e.g. Ethernet?

Continuing, though -

v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

>It might be different if turning off the TCP checksum could gain you 2X
>in performance. However, assuming your printer cannot do more than 5
>pages/sec, and doesn't need more than one or two MByte/page (i.e. less
>than 100Mbit/sec), the TCP checksum does not amount to 25% of the
>work,

Assuming that there is one CPU in the printer, you are going to have
to spend cycles on both I/O (TCP) and imaging. If the printer running
flat-out uses (for example) 50% of its cycles on imaging and 50% on
TCP, then cutting TCP cycles by half increases the printing speed by
a third. If TCP takes only 10%, though, it will hardly be worth it.
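
Peter's trade-off is just Amdahl's law; a quick sketch (Python, with
the fractions above as hypothetical inputs):

```python
def overall_speedup(tcp_fraction, tcp_reduction):
    """Amdahl's law: overall speedup when tcp_fraction of total cycles
    is reduced by tcp_reduction (0.5 = half of those cycles removed)."""
    remaining = (1.0 - tcp_fraction) + tcp_fraction * (1.0 - tcp_reduction)
    return 1.0 / remaining

# 50% of cycles in TCP, half of them cut: 1.33x (about a one-third gain)
# 10% of cycles in TCP, half of them cut: ~1.05x (hardly worth the risk)
```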

>and that work can be made to disappear into hardware for a low price.
>All it takes is some 16-bit summers in the incoming data path.

For the cost of that much hardware they could probably change the '020
to a 29K or R3000 and speed up the imaging software to boot.

>If you're talking about a piddling 10 Mbit/sec ethernet controller for
>a 5 page/minute postscript printer that needs a few 100Kbit/sec ...
>well, enough.

Another way to ask this question - in the most I/O intensive use of
the printer (bit-map data in some form, I assume) how many
instructions per byte does it take to take the data from TCP and print
it? Is the overhead of an add for every other byte (i.e. the checksum
overhead) significant in comparison to this?

Peter Desnoyers
--

Erick Engelke

unread,
Jun 24, 1993, 2:58:11 PM6/24/93
to
jk...@cs.ucsd.edu (Jon Kay) writes:
> Sigh. Looks like it's time for the quarterly argumentathon on
>this.
> OK, let's take a closer look at this data that's SO precious
>that we have to spend half our software processing time to calculate a
>SECONDARY checksum. First off, even I don't believe that Ethernet
>interfaces or bridges always work, nor that the Ethernet CRC
>catches every error. My argument is that having this awfully
>expensive checksum doesn't buy very much in overall system
>reliability.

Your argument assumes that the Ethernet card (like the disk controller)
correctly does its own CRC and the network driver does the intelligent
thing with the packet. The world needs the hopeful, but I'm more
of a realist.

A year ago, someone on our site bought a truckload of DEC's new
Ethernet cards for PCs.

We installed the cards, used the supplied drivers, and all my TCP
code worked but the non-TCP based filesystem was constantly getting
corrupted. Adding a software checksum made the system work instantly.

It was 1992, the manufacturer was DEC, the quantities sold were
probably significant, and the CRC didn't work. I don't care if
your software runs on PCs, Crays or Sinclair ZX-81's, you NEED
to be able to interoperate with all those broken cards!

Add all the people who are using non-Ethernet hardware which emulates
Ethernets at the driver levels. On PCs, for example, most SLIP
implementations have an Ethernet emulation mode so the software
doesn't know the difference. Do you want your stuff to appear
broken to them too?

Erick
--

gary s anderson

unread,
Jun 24, 1993, 3:54:23 PM6/24/93
to
The list of testimonials from individuals saved by checksums or burned
by disabled checksums is extremely large. This is quite obvious from
the posts in this thread. The bottom line is that the checksum has
significant value and as a general rule should be used at all times.

There are two things, that have not been mentioned in this thread,
which deserve some mention.

1) The TCP (UDP) checksum has a number of holes. The most prevalent
is the inability to detect word transpositions. There are other
subtle hardware problems which corrupt messages that still pass the
checksum test, but I'll leave an opportunity for others to tell their
old war stories. Also, it has been pointed out that TCP checksums
are only between the transport layers (I'm not going to even get into
transparent transport layer gateways). Consequently, the data may
have many unprotected data paths outside the scope of the TCP
connection. My point is that individuals with extremely high data
integrity requirements write application protocols to guarantee
reliability and do not rely on the presence of lower layer
checksumming. Assuming said application was sufficiently reliable,
this MIGHT be a case where TCP checksums are not needed.

2) When the entire data path is known (either a priori or via
negotiation), including both end systems and the connecting
media, there MAY be some cases where TCP checksums MIGHT not
be needed (or wanted):

a) If adding a checksum phase would increase the probability
of an error. "Loopback" is a possible example. Effectively,
anything which moves data unnecessarily (especially through
less secure paths), in order to checksum, may be a candidate.

b) A "performance" case where the added wild card is that
the application may tolerate some error loss (e.g. image
which can be "touched up" later).

c) A "performance" case where a highly reliable point-to-point
link is used.

NOTE - "performance" simply means pressing the envelope of the
available technology.

WARNING - in all of these "MIGHT" cases there is a HUGE assumption
that a desired level of reliability can be maintained between the peer
transport entities.
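
Gary's first point is easy to demonstrate: one's-complement addition is
commutative, so swapping any two aligned 16-bit words leaves the TCP
checksum unchanged. A small self-contained illustration (Python,
even-length data assumed):

```python
def internet_checksum(data):
    """RFC 1071 one's-complement 16-bit checksum (even-length data)."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

original   = b"\x12\x34\xab\xcd\x56\x78"
transposed = b"\xab\xcd\x12\x34\x56\x78"  # first two 16-bit words swapped

# Different payloads, identical checksum: the transposition is invisible.
assert internet_checksum(original) == internet_checksum(transposed)
```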

The fundamental problem with an Ethernet Printer application (or most any
peer-to-peer Ethernet applications) is that you do not know the level of
integrity of the entire path (bridges, adapters, memory, OS's, peer
applications, etc.). Consequently, you cannot make any justifiable
determinations on the inherent reliability. Unless you are writing your
own extremely reliable application, disabling the checksum will
open up a new hole for potential failures. Your customers and
other vendors (who might be errantly blamed for data integrity errors)
are not likely to look very favorably on your solution if this hole
is ever hit.

My simple rule is to use the checksums unless you are absolutely sure
you can deliver a sufficient level of reliability.

Gary

Randy Turner

unread,
Jun 24, 1993, 11:24:11 AM6/24/93
to
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:


>Vernon Schryver, v...@sgi.com


We have no intention of releasing a product that is not fully
compliant with all of the host requirements as specified in
RFC 1122, as well as all of the other RFCs to which our network
software corresponds, whether it be TCP,UDP,IP,ICMP, ARP, etc.

The discussion was purely hypothetical and was meant to generate
a discussion on potential optimizations that might be applied
to a TCP implementation to increase throughput.

Further, we would not initiate such modifications to our product
without adequate industry-wide backing and approval of such a
change, since interoperability is what we are trying to achieve.

Currently, it seems, there is no guaranteed way of verifying the
conditions (topology) for a particular connection so as to allow
a TCP implementation to make shortcuts, so it seems that the
original suggestion is not currently possible.

I would like to point out that we are constantly being asked by
customers if there is some way to increase throughput since
these customers are not only sending data to a print engine, but
also to a spool device. So for our larger departmental type
printers with high-end throughput (32 ppm+), we are required to
spool incoming data to local disk(s) with some files in the 10's
of megabytes in size. These customers choose to let the printer
spool the jobs rather than the host computer. So our throughput
requirements are based more on disk-to-disk transfers rather than
disk-to-print-engine transfers.

Randy Turner

unread,
Jun 24, 1993, 12:05:01 PM6/24/93
to

Since bypassing checksums is out, someone mentioned in an earlier
discussion that there was a tech.paper or RFC that contains the
latest draft proposals for TCP/IP performance enhancements. Does
anyone know which document this was?

Thanks!

Thomas V Torrico

unread,
Jun 24, 1993, 10:27:29 AM6/24/93
to
In article <208mp0...@sam.ksu.ksu.edu> t...@sam.ksu.ksu.edu (Tim Ramsey)
writes:
This whole thread is rather interesting. I've been in a situation where
no checksumming is particularly desirable.
Using a 'smart' adapter to offload physical transport for HIPPI (High
Performance Parallel Interface) at 100 MBytes/Sec (yes bytes) the
calculation of the checksum takes about the same time as the packet
transmission.
The hardware guarantees accurate delivery, plus the adapter has both data
and parity checking. Why checksum on the same local net?
Using UDP with checksumming we could only achieve around 19 MBytes/sec.
Without checksumming, it jumped to 27 MBytes/sec.
--
Make all things as simple as possible but no simpler. -- Albert Einstein

Vernon Schryver

unread,
Jun 24, 1993, 8:43:56 PM6/24/93
to
In article <rturner.740935451@imagen>, rtu...@imagen.com (Randy Turner) writes:
> ...
> .... So for our larger departmental type

> printers with high-end throughput (32 ppm+), we are required to
> spool incoming data to local disk(s) with some files in the 10's
> of megabytes in size. These customers choose to let the printer
> spool the jobs rather than the host computer. So our throughput
> requirements are based more on disk-to-disk transfers rather than
> disk-to-print-engine transfers.

Disk-to-disk over ethernet or something faster?

One would hope that any file transfer protocol at or below ethernet
speeds would be entirely limited by the speed of the medium. The
cycles needed for the TCP/IP checksum should be insignificant today.

Depending on various things, an ethernet runs TCP/IP between
~850KByte/sec and ~1150KByte/sec. That's a sizable variation,
and is in principle controllable.


Vernon Schryver, v...@sgi.com

Randy Turner

unread,
Jun 25, 1993, 12:06:07 AM6/25/93
to
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

>Depending on various things, an ethernet runs TCP/IP between
>~850KByte/sec and ~1150KByte/sec. That's a sizable variation,
>and is in principle controllable.

I'm assuming that the 850K to 1150KByte/sec numbers you
mention are the theoretical maximums based on media speed
and protocol overhead-vs-max-data per pkt...(?)

And I have heard of some TCP/IP implementations approaching
these figures when benchmarked on an end-to-end basis. However,
the limiting factor for TCP/IP throughput (in my experience) has
usually been the protocol and application processing at either end of
the medium. Unfortunately in my case I do not have what most
would term "a bitchin' processor" by today's standards. Our
current implementation uses a 16MHz 68020, and it also has to
handle up to 3 other protocol stacks simultaneously over the
same medium, and not a whale of a lot of RAM to use for buffering
either (i.e. large window sizes.., etc).

We are maintaining adequate data rates however across each of the
protocol stacks to keep most of our print engines busy. However,
there are print engines coming down the pipe that will definitely
be able to process data at 100Kbytes/sec in the future (as our
friends at Xerox will tell you, they already exist...). These
printers will possess 32ppm+ (and I do mean +) which is why I am
trying to look ahead at ways we can maximize our data rates
through our current 68020-based implementation without requiring
a new hardware design.

Andy Newman

unread,
Jun 25, 1993, 12:38:27 AM6/25/93
to
pet...@pjd.dev.cdx.mot.com (Peter Desnoyers) writes:
>
>However, it does come to mind that printer output data (as opposed to
>e.g. downloading software or fonts) has a particularly transient
>characteristic - a printer is a write-only device, so if you get a
>data error it only affects that single printout. It's not like a
>source file getting transferred with FTP, where an error can result in
>unrecoverable bit-rot for the rest of time.

To quote you: "I wouldn't rush to judgement so quickly". Printers
are no longer simple output-only things. A printer these days is a network
service that offers PDL interpretation. Usually, but not always, the PDL
program has a side-effect of spitting a page out of the engine.

--
Andy Newman (an...@research.canon.oz.au)

GUEST Massimo Barontini

unread,
Jun 25, 1993, 3:37:34 AM6/25/93
to

I think the technical paper is an article from 1993 Winter USENIX :

"Measurement, Analysis, and Improvement of
UDP/IP Throughput for the DECstation 5000"

J.Kay & J.Pasquale - University of California, San Diego



Massimo Barontini

ma...@king.ico.olivetti.com

Steve Heimlich

unread,
Jun 25, 1993, 9:16:32 AM6/25/93
to
In article <rturner.740981167@imagen> rtu...@imagen.com (Randy Turner) writes:
>v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>
>>Depending on various things, an ethernet runs TCP/IP between
>>~850KByte/sec and ~1150KByte/sec. That's a sizable variation,
>>and is in principle controllable.
>
> I'm assuming that the 850K to 1150KByte/sec numbers you
> mention are the theoretical maximums based on media speed
> and protocol overhead-vs-max-data per pkt...(?)

Nope, this is measured. Here are the results for an ftp over
a busy ethernet. I get this speed all the time, more at night
when the ether is less busy. I get about 800KB/sec through a
router and 2 busy ethernets.

Steve

# ftp kingbee
Connected to kingbee.watson.ibm.com.
220 kingbee.watson.ibm.com FTP server (Version 4.1 Sat Nov 23 12:52:09 CST 1991)
ready.
Name (kingbee:root): heimlich
331 Password required for heimlich.
Password:
230 User heimlich logged in.
ftp> bin
200 Type set to I.
ftp> append
(local-file) /unix
(remote-file) /dev/null
200 PORT command successful.
150 Opening data connection for /dev/null.
226 Transfer complete.
1498122 bytes sent in 1.574 seconds (929.5 Kbytes/s)

Vernon Schryver

unread,
Jun 25, 1993, 11:03:22 AM6/25/93
to
In article <rturner.740981167@imagen>, rtu...@imagen.com (Randy Turner) writes:
> v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>
> >Depending on various things, an ethernet runs TCP/IP between
> >~850KByte/sec and ~1150KByte/sec. That's a sizable variation,
> >and is in principle controllable.
>
> I'm assuming that the 850K to 1150KByte/sec numbers you
> mention are the theoretical maximums based on media speed
> and protocol overhead-vs-max-data per pkt...(?)

Not theoretical, but measured with `ttcp` on workstations I'm paid
to care about. (ttcp.c is available on sgi.com)

The variation is a function of collision rates, and involves what some
consider a bug in the Ethernet protocol. Under saturation, one station
or the other tends to suffer more collisions, and to back-off into the
weeds, instead of resetting its backoff counter as soon as it notices
some other station won the medium.


> And I have heard of some TCP/IP implementations approaching
> these figures when benchmarked on an end-to-end basis. However
> usually the case with TCP/IP throughput (in my experience) has
> been the protocol and application processing at either end of
> the medium. Unfortunately in my case I do not have what most
> would term "a bitchin' processor" by todays standards. Our
> current implementation uses a 16mhz 68020, and it also has to
> handle up to 3 other protocol stacks simultaneously over the
> same medium, and not a whale of a lot of RAM to use for buffering
> either (i.e. large window sizes.., etc).

A 16MHz 68020 running UNIX should be able to do at least 400KByte/sec
through TCP/IP over ethernet even if you must talk to a slow VME board
with the ethernet hardware, and must copy bytes from mbufs to or from
user space. (How I know that? Hint: look at the insides of an old
Silicon Graphics IRIS 2000 or 3000.) A dedicated 16MHz 68020 system
without those handicaps should be able to run at Ethernet medium
speeds.

Van Jacobson reported making 68000-based Suns saturate ethernets,
as measured in 1988 or 1989.

As has often been reported, a TCP/IP packet requires less than 400
instructions to handle, exclusive of checksumming and byte copying,
assuming a reasonable C compiler, and reasonable care in UNIX style
protocol code.

You need only about 800 packets/sec to saturate an ethernet (~600
1500-byte data packets, and 200-300 ACKs). 800*400 instructions/sec
at 16MHz is less than 10% of the CPU.
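
Vernon's budget, spelled out (his 800-packet and 400-instruction
figures; the one-instruction-per-cycle rate is a deliberately generous
simplification):

```python
# Rough CPU budget for TCP/IP on a 16 MHz 68020 driving a saturated
# ethernet, using the figures from the post above.
packets_per_sec = 800           # ~600 data packets plus 200-300 ACKs
instr_per_packet = 400          # protocol handling, excluding checksum/copies
cpu_instr_per_sec = 16_000_000  # generous 1-instruction-per-cycle assumption

fraction = packets_per_sec * instr_per_packet / cpu_instr_per_sec
# 320,000 / 16,000,000 = 2% of the CPU; even if a real 68020 averages
# 3-4 cycles per instruction, the fraction stays well under 10%.
```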

A printer controller should not have to copy the bytes. A system with
a 68020 is not likely to have the cache problems that are the real
bottleneck in modern systems. (Checksumming is a complete, utter
irrelevance if you are forced to byte-copy and take 100-cycle cache
misses. Think what 150MHz superscalar CPUs suffer with typical DRAM
access times.)


> We are maintaining adequate data rates however across each of the
> protocol stacks to keep most of our print engines busy. However,
> there are print engines coming down the pipe that will definitely
> be able to process data at 100Kbytes/sec in the future (as our
> friends at Xerox will tell you, they already exist...). These
> printers will possess 32ppm+ (and I do mean +) which is why I am
> trying to look ahead at ways we can maximize our data rates
> through our current 68020-based implementation without requiring
> a new hardware design.

At least 3 workstation vendors are shipping systems that do more than
80Mbit/sec, i.e. about 10 MByte/sec, over TCP/IP/FDDI. 100KB/s is slooooow.


Vernon Schryver, v...@sgi.com

Vernon Schryver

unread,
Jun 25, 1993, 11:05:55 AM6/25/93
to
In article <20e9vu$8...@olivea.ATC.Olivetti.Com>, elef...@flash.ATC.Olivetti.Com (GUEST Massimo Barontini) writes:
>
> I think the technical paper is an article from 1993 Winter USENIX :
>
> "Measurement, Analysis, and Improvement of
> UDP/IP Throughput for the DECstation 5000"
>
> J.Kay & J.Pasquale - University of California, San Diego


There have been many papers about TCP/IP performance over the years.
It would be smart to also look for "Van Jacobson" in bibliographies.


Vernon Schryver, v...@sgi.com

Rich Holland

unread,
Jun 25, 1993, 11:41:06 AM6/25/93
to
t...@sam.ksu.ksu.edu (Tim Ramsey) writes:

>matt% netstat -s
>ip:
> 0 bad header checksums
>icmp:
> 0 bad checksums
>tcp:
> 21036149 packets received
> 144 discarded for bad checksums
>udp:
> 0 bad checksums

>This is on a Solbourne running 4.1A.3 (akin to 4.1.1). The running kernel
>apparently has UDP checksums disabled. :(


godiva% netstat -s
ip:
        10452170 total packets received
        0 bad header checksums
icmp:
        0 bad checksums
tcp:
        10062681 packets received
        0 discarded for bad checksums
udp:
        0 bad checksums

This is on an IBM RS/6000 320h running AIX 3.2.3. The running kernel
apparently has UDP checksums enabled. :-)

--
Rich Holland (hol...@godiva.ne.ksu.edu)
723 Allison Ave, #8, Manhattan, KS 66502
(913) 776-5789

Vernon Schryver

unread,
Jun 25, 1993, 11:33:41 AM6/25/93
to

Caveats:

FTP tends to measure performance of the file systems at the source and the
destination more than the performance of the network hardware or software.

A benchmark of less than 10 or 20 seconds is often misleading because
of start up and shut down transients. I bet in this particular case
the bytes were still going over the wire when FTP decided things were
finished and computed its number.


Vernon Schryver, v...@sgi.com

Craig Partridge

unread,
Jun 25, 1993, 12:20:17 PM6/25/93
to
>rtu...@imagen.com (Randy Turner) writes

>
> And I have heard of some TCP/IP implementations approaching
> these figures when benchmarked on an end-to-end basis. However
> usually the case with TCP/IP throughput (in my experience) has
> been the protocol and application processing at either end of
> the medium.

I'm sorry, but this particular comment gets my goat. Taken to its essentials
it says "my experience with j-random implementation of protocol Y allows
me to make general comments about the possible performance of protocol Y."
The networking field has been plagued with these sorts of incautious
evaluations of protocols. Trying another tack, it is sort of like
someone saying "my experience with the PDP-10 entitles me to be an
expert in computer and processor design." Certainly working with the PDP-10
will teach someone a lot, but it doesn't make them an expert on all things.


> Unfortunately in my case I do not have what most
> would term "a bitchin' processor" by todays standards. Our
> current implementation uses a 16mhz 68020, and it also has to
> handle up to 3 other protocol stacks simultaneously over the
> same medium, and not a whale of alot of RAM to use for buffering
> either (i.e. large window sizes.., etc).

The first TCP/IP that was benchmarked at full Ethernet speeds ran on a similar
platform -- a SUN 2 workstation using a Lance chipset in 1988. Observe that
the number of protocol stacks you handle should have almost no impact on
performance (if it does, you've done something wrong). Since then, protocol
implementations have improved further, so that 1990 vintage workstations
(e.g., HP Snake) have been benchmarked doing TCP/IP at 100+ Mb/s (e.g.,
about 13 Mbyte/s).

The point here is that the best TCP/IP implementations currently available
can easily achieve the performance you're looking for, on your processor.

Craig

Randy Turner

Jun 25, 1993, 4:41:20 PM
heim...@watson.ibm.com (Steve Heimlich) writes:


>Nope, this is measured. Here are the results for an ftp over
>a busy ethernet. I get this speed all the time, more at night
>when the ether is less busy. I get about 800KB/sec through a
>router and 2 busy ethernets.

>Steve

># ftp kingbee
>Connected to kingbee.watson.ibm.com.
>220 kingbee.watson.ibm.com FTP server (Version 4.1 Sat Nov 23 12:52:09 CST 1991)
> ready.
>Name (kingbee:root): heimlich
>331 Password required for heimlich.
>Password:
>230 User heimlich logged in.
>ftp> bin
>200 Type set to I.
>ftp> append
>(local-file) /unix
>(remote-file) /dev/null
>200 PORT command successful.
>150 Opening data connection for /dev/null.
>226 Transfer complete.
>1498122 bytes sent in 1.574 seconds (929.5 Kbytes/s)


/*
I think the performance numbers for FTP are skewed somewhat and
don't give an accurate measure of the throughput that a
particular TCP/IP implementation is capable of. Or maybe I
should say that it cannot be reliably used as a basis for
comparison. I say this because when we send small files from
Sun IPC to Sun IPC using FTP we get huge throughput numbers
because of FTP's rounding up to the nearest second for data
transfer, something on the order of 2.5Mb/sec, when we
know for a fact that the sustained throughput of TCP/IP running
on SunOS 4.1 is something like 260KBytes/sec.

*/

Vernon Schryver

Jun 25, 1993, 8:14:59 PM
In article <rturner.741040880@imagen>, rtu...@imagen.com (Randy Turner) writes:
> ...
> when we
> know for a fact that the sustained throughput of TCP/IP running
> on SunOS 4.1 is something like 260KBytes/sec.


I trust you mean "disk-to-disk file transfers over TCP/IP (probably via
FTP)" only get 260KB/s. It would boggle my mind to hear that Sun has
reduced their performance by almost a factor of 4. For that matter,
I'm only slightly less boggled to hear that they only get 260KB/s
through FTP. Something closer to 1MByte/sec is what I would expect.

Are you sure of that number? Is there any chance that something was
broken in the network? Terrible packet loss rates perhaps? Bad
ethernet terminations? Late collisions? Could it have been a file
transfer from one diskless machine to another, so that the data would
go over the wire 4 times?


Vernon Schryver, v...@sgi.com

Juergen Wagner

Jun 26, 1993, 7:59:38 AM
In article <rturner.741040880@imagen> rtu...@imagen.com (Randy Turner) writes:
>heim...@watson.ibm.com (Steve Heimlich) writes:
>
>>Nope, this is measured. Here are the results for an ftp over
>>a busy ethernet. I get this speed all the time, more at night
>>when the ether is less busy. I get about 800KB/sec through a
>>router and 2 busy ethernets.
>
...

>/*
> I think the performance numbers for FTP are skewed somewhat and
> don't give an accurate measure of the throughput that a
> particular TCP/IP implementation is capable of. Or maybe I
> should say that it cannot be reliably used as a basis for
> comparison. I say this cause when we send small files from
> Sun IPC to Sun IPC using FTP we get huge throughput numbers
> because of FTP's rounding up to the nearest second for data
> transfer, something on the order of like 2.5Mb/sec. when we
> know for a fact that the sustained throughput of TCP/IP running
> on SunOS 4.1 is something like 260KBytes/sec.
>
>*/

It seems to me there are -- at least to some extent -- religious
issues involved here. I have read quite a number of different figures,
all characterizing TCP throughput on some network in some
configuration. Although I believe I have followed the discussion from
its beginning, I am not sure I remember what exactly the idea was
behind dropping TCP checksums and the throughput-numbers discussion
that evolved from it. Could you clarify your original objective? Were you
concerned about building printers printing faster than Ethernet
transmission speed? Maybe you need an ATM interface for that printer
then :-) :-)?

Being somewhat interested in those figures myself, I ran a few tests
on our network (a busy backboned network with heavy Decnet/IP/Apple
traffic during normal work time). All connections (with the exception
of those to the same host) were through two learning DEC bridges. I
also ran a test between a SPARC 10/30 and a SPARC 2 over several
bridges and at least four routers (number in kBytes/s measured with
ttcp -t -s/ttcp -r -s, mean over 10 trials, 2048 packets with 8192
bytes each, all machines with SunOS 4.1.3).

Receiver:      10/30   IPX     1    self   localhost
Sender:
Sparc 10/30      780   570   670    2860      3210
Sparc IPX        587   392   397    1230      1332
Sparc 1          779   644   680     950      1010

What you seem to know as a fact (260 KBytes/s max between two SunOS
4.1.3 machines) surprised me, so I tried it myself...

I also tried ftp and nfs in comparison with ttcp figures between two
10/30s separated by a single learning bridge (transfer for a 32 MByte
file from one machine to the other):

ttcp 810
nfs 944
ftp 970

To my big surprise, these figures were much better for nfs and ftp
than for ttcp. Can anybody explain this? I don't see where we have
overhead with a simple ttcp connection in comparison with ftp (the
timings were done by "time", including program startup times). NFS
might be faster by using UDP.

--Juergen

J_Wa...@iao.fhg.de
gan...@csli.stanford.edu

Randy Turner

Jun 26, 1993, 1:55:38 AM
cr...@sics.se (Craig Partridge) writes:

>>rtu...@imagen.com (Randy Turner) writes
>>
>> And I have heard of some TCP/IP implementations approaching
>> these figures when benchmarked on an end-to-end basis. However
>> usually the case with TCP/IP throughput (in my experience) has
>> been the protocol and application processing at either end of
>> the medium.
>
>I'm sorry, but this particular comment gets my goat. Taken to its essentials
>it says "my experience with j-random implementation of protocol Y allows
>me to make general comments about the possible performance of protocol Y."
>The networking field has been plagued with these sorts of incautious
>evaluations of protocols. Trying another tack, it is sort of like
>someone saying "my experience with the PDP-10 entitles me to be an
>expert in computer and processor design." Certainly working with the PDP-10
>will teach someone a lot, but it doesn't make them an expert on all things.

/*
Sorry for getting your goat, but the first part of your comment,
"my experience with j-random implementation of protocol Y," is
basically what I was saying. It's the second part of your comment,
"make general comments about the possible performance of protocol
Y," that I don't find in my earlier comment. I was merely expressing
my opinion, given my experiences with benchmarking and profiling
two or three TCP/IP implementations, of where I thought some
end-to-end throughput is constrained. Benchmarks like "ttcp" I would
think would be less constrained.


>
>The first TCP/IP that was benchmarked at full Ethernet speeds ran on a similar
>platform -- a SUN 2 workstation using a Lance chipset in 1988. Observe that
>the number of protocol stacks you handle should have almost no impact on
>performance (if it does, you've done something wrong). Since then, protocol
>implementations have improved further, so that 1990 vintage workstations
>(e.g., HP Snake) have been benchmarked doing TCP/IP at 100+ Mb/s (e.g.,
>about 13 Mbyte/s).
>
>

Craig

/*
The number of protocol stacks in our case makes more of a difference
than usual because we are running in an embedded system with RAM
constraints. Since we have to adequately support multiple,
simultaneous connections over multiple protocols, we have to
limit the TCP receive window sizes, thus limiting the pipeline
capability, and in certain circumstances causing zero-window probes
to be generated under heavily loaded conditions.

Also, I'm assuming that the 13Mbyte/sec number is running over
something that is not Ethernet, which opens up a whole other can
of worms for performance enhancements (e.g. Path MTU discovery).
*/

Randy

Roy Smith

Jun 26, 1993, 9:50:56 AM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> I trust you mean "disk-to-disk file transfers over TCP/IP (probably via
> FTP)" only get 260KB/s.

Vernon makes a good point -- if you are going to run an
experiment to measure something, you have to know what you're measuring.
Clocking FTP transfer times is not a very good way to measure TCP
throughput. I just tried a little experiment. I wrote a small client which
opens a TCP connection to port 9 (the discard port) and proceeds to write
some number of 1k blocks of data to it and times how long it takes. I don't
even generate any real data; I just allocate a "char buf[1024]" and then do:

for (i = 0; i < count; i++)
write (s, buf, 1024);

I tried it with 3 machines. The first was a SGI 4D/320-GTX, which
gave about 980 kb/sec (something like 80% of the theoretical ethernet
maximum), the second was a SGI 4D/25 (or maybe it's even a 4D/20?), which
gave 550 kb/sec. Both of these machines are on the same ethernet segment as
my client, a DECStation 5000/240. Presumably at this time of day the
ethernet is pretty idle (as are the hosts in question). The third machine I
tried was a vax (don't remember which model, a 6000-series, I think) running
VMS. The vax is on an ethernet segment a couple of bridges away from me.
That machine yielded 460 kb/sec. I ran each test a few times, and the
results were pretty consistent (not quite consistent enough to really
justify 2 significant figures, but close). The results also scaled well
(i.e. writing 10 times as many packets took 10 times as long; the figures
above were for 10000 packets).

While all of these machines have a fairly hefty amount of processing
power, they are nothing special compared to commodity hardware that's
available today at affordable prices for the desktop (I'd bet as far as pure
integer MIPS go, the mid-range Macintoshes already beat the poor little
4D/25). And, as somebody already pointed out, Van Jacobson was saturating
ethernets with Sun-3/50's years ago (essentially the same hardware as what's
in a Mac-IIcx). Bottom line is, I don't see TCP checksumming as being a
major bottleneck, and certainly not in a printer.
--
Roy Smith <r...@nyu.edu>
Hippocrates Project, Department of Microbiology, Coles 202
NYU School of Medicine, 550 First Avenue, New York, NY 10016
"This never happened to Bart Simpson."

Vernon Schryver

Jun 26, 1993, 10:09:52 AM
In article <1993Jun26.1...@Csli.Stanford.EDU>, gan...@Csli.Stanford.EDU (Juergen Wagner) writes:
> .... (number in kBytes/s measured with

> ttcp -t -s/ttcp -r -s, mean over 10 trials, 2048 packets with 8192
> bytes each, all machines with SunOS 4.1.3).
> ...

> Receiver: 10/30 IPX 1 self localhost
> Sender:
> Sparc 10/30 780 570 670 2860 3210
> Sparc IPX 587 392 397 1230 1332
> Sparc 1 779 644 680 950 1010

> ...

Does "self" mean using the machine's name with the sending ttcp,
while "localhost" means using "localhost"? If so, there should be little
difference in BSD style network code, because the ethernet driver
should never be involved. A BSD style kernel should notice that the
local name is being used and use the "loopback driver", regardless of
whether "localhost" or the machine's name is used.

There is a lot of variation in those numbers. The ~400KB/s numbers
strike me as strangely low. `netstat` should be used to see if there
is some problem on the ethernet for those machines. Look for TCP
retransmissions and "duplicate acks". (Duplicate acks are often a sign
of retransmissions.) Collision counts might be illuminating.


> I also tried ftp and nfs in comparison with ttcp figures between two
> 10/30s separated by a single learning bridge (transfer for a 32 MByte
> file from one machine to the other):
> ttcp 810
> nfs 944
> ftp 970
>
> To my big surprise, these figures were much better for nfs and ftp
> than for ttcp. Anybody can explain this? I don't see where we have
> overhead with a simple ttcp connection in comparison with ftp (the
> timings were done by "time", including program startup times). NFS
> might be faster by using UDP.

> ...

In my experience, on fast media and fast CPU's, UDP is generally slower
than TCP for several reasons that are best explained with kernel profiles.

Three samples of {810,944,970} for 16MByte strike me as about the
"same" value.

Also, numbers above 800KB/s for Ethernet are in the regime of Ethernet
protocol bug I keep talking about. If you do anything to reduce the
number of collisions, you can increase the result from `ttcp`, up to
1150KB/s on hardware I know about. Ways to decrease `ttcp` collisions
and increase speed on at least some non-solar workstations include:
-reduce the TCP window to ~10K.
-fiddle with the TCP delayed ACK code to reduce the number of ACKs.
-use a "polite" (a.k.a. "broken") AMD 7990 LANCE ethernet chip.

The 7990 delays starting to transmit after an Ethernet deferral by up
to about 25 microseconds. If another packet is seen early enough
during that time, the LANCE will simply defer again, not increase its
back-off counter, and not force a collision. Not increasing its
backoff counter is the important part.


Vernon Schryver, v...@sgi.com

Vernon Schryver

Jun 26, 1993, 12:42:51 PM
In article <20f6ai$d...@newserv.ksu.ksu.edu>, hol...@godiva.ne.ksu.edu (Rich Holland) writes:
> t...@sam.ksu.ksu.edu (Tim Ramsey) writes:
[netstat numbers]


Based on the excellent idea of looking at `netstat`, I looked at two
servers (mostly netnews), and found in the two days they've been up,
they had a total of 57 TCP packets with bad checksums, in about
5,000,000 TCP packets. Three other machines used mostly for UNIX
source serving had a total of 11 TCP packets with bad checksums in
about 60,000,000 TCP packets. There were fewer UDP checksum errors,
but more than 0.

Each of those bad TCP checksums would have been an undetected error.
No one cares about netnews, but people are paid to care about UNIX source.


With a few more reports about other vendors' hardware, I hope we can
declare the idea of turning off TCP checksums officially dead.


Vernon Schryver, v...@sgi.com

Vernon Schryver

Jun 26, 1993, 12:12:24 PM
In article <20hk80$o...@calvin.NYU.EDU>, r...@mchip00.med.nyu.edu (Roy Smith) writes:
> ...

> I just tried a little experiment. I wrote a small client which
> opens a TCP connection to port 9 (the discard port)...

> I tried it with 3 machines. The first was a SGI 4D/320-GTX, ...

> ... The third machine I
> tried was a vax ...

With ttcp, you can vary user buffer size, TCP window, user buffer
alignment, use UDP instead of TCP, and so on, on both the sender and
the receiver. Ttcp also tries to compute elapsed time and CPU cycles.
Ttcp ports easily to any machine with sockets or a "socket
compatibility library."

Anyone writing applications needing network speed should pay attention to
those parameters. The effects of changing the alignment of the buffer,
for example, might be surprising.

Silicon Graphics ships a ttcp binary and the source, ttcp.c. It is the
same as the "improved" version on sgi.com, not the official Ballistic
Research Lab version on brl.mil and also sgi.com. (I never
remember where ttcp is in the CDROMs. Find it by mounting a CDROM and
`grep ttcp *.idb`.) SGI is also shipping Cray's test, named something
like netperf or nettest, perhaps not until IRIX 5.something.


> ... While all of these machines have a fairly hefty amount of processing
> power,...

I'd rather implement the TCP checksum on a commodity 25MHz 80486 mother
board, one that runs its DRAMs in burst mode, than on a 12MHz 4D/20, or
even a 20MHz 4D/25. For that matter, a 25MHz 80486 motherboard can
beat a 33MHz 4D/33 doing the checksum. Besides the handy carry bit
which allows a 0.25 cycle/byte 1's-complement add instead of 0.5
cycle/byte, burst mode is useful on data not in the cache, which is
rather likely when computing the checksum.

As I keep saying, cache effects are far more important in most modern
hardware than the CPU cycles needed to compute the checksum.


Vernon Schryver, v...@sgi.com

Randy Turner

Jun 26, 1993, 4:56:23 PM

I think your numbers are considerably higher than the numbers
I quoted due to the hardware you are running. I forgot to
mention that my numbers were generated on a Sun-3 box which
is probably an order of magnitude (ok, maybe an exaggeration)
slower than a Sparc 10.

Randy

Randy Turner

Jun 26, 1993, 4:42:18 PM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:


>Vernon Schryver, v...@sgi.com

Sorry, I forgot to mention this is on Sun-3 hardware.
We obtained these numbers from an engineer at Sun in order to
compare our implementation's performance against them. And I believe his
comment was that this is a sustained aggregate data rate from one
host to another, from application layer to application layer, over an
Ethernet at 15% utilization. He didn't mention any adverse network
conditions, but that's not to say there weren't any. His application
may have been disk-related, I don't know. However, our implementation
is disk-related, except when printer-local spooling is disabled.

Randy

Randy Turner

Jun 26, 1993, 4:59:26 PM
r...@mchip00.med.nyu.edu (Roy Smith) writes:

>And, as somebody already pointed out, Van Jacobson was saturating
>ethernets with Sun-3/50's years ago (essentially the same hardware as what's
>in a Mac-IIcx). Bottom line is, I don't see TCP checksumming as being a
>major bottleneck, and certainly not in a printer.
>--
>Roy Smith <r...@nyu.edu>
>Hippocrates Project, Department of Microbiology, Coles 202
>NYU School of Medicine, 550 First Avenue, New York, NY 10016
>"This never happened to Bart Simpson."

Now wait a minute...I didn't say it was a major bottleneck, just
another possible performance issue that shows up during TCP/IP
profiling.....

Randy

Steve Heimlich

Jun 26, 1993, 6:14:38 PM
In article <rturner.741040880@imagen> rtu...@imagen.com (Randy Turner) writes:
>heim...@watson.ibm.com (Steve Heimlich) writes:
>
>
>>Nope, this is measured. Here are the results for an ftp over
>>a busy ethernet. I get this speed all the time, more at night

[...]


> I think the performance numbers for FTP are skewed somewhat and
> don't give an accurate measure of the throughput that a

Sorry for not clarifying that. You're right. In this case, however,
the numbers are ok (I have an independent monitor which I watch
during the transfer). A 66MB transfer from a mapped file gave about
the same results. ttcp would be a better tool (and a sniffer better yet).

In some private mail, I mentioned that I'd be interested in using
something like Jon Kay's work to negotiate a >stronger< checksum than
TCP currently uses.

Steve

Dave Carr

Jun 28, 1993, 10:16:58 AM
In <1993Jun23.2...@kentrox.com> mw...@kentrox.com (Michael Witt) writes:

>Now you have lost end-to-end reliability. Bad memory in any of the
>bridges could cause corrupted data to be accepted by TCP.

Buzzt! Wrong. A bridge will/should preserve the original FCS of the frame
according to 802.1d. A router however does not.
--
Dave Carr | dc...@gandalf.ca | It's what you learn,
Principal Designer | TEL (613) 723-6500 | after you know it all,
Gandalf Data Limited | FAX (613) 226-1717 | that counts.

George Ross

Jun 28, 1993, 11:57:37 AM
In article <20f6ai$d...@newserv.ksu.ksu.edu>, hol...@godiva.ne.ksu.edu (Rich Holland) writes:
> udp:
> 0 bad checksums
>
> This is on an IBM RS/6000 320h running AIX 3.2.3. The running kernel
> apparently has UDP checksums enabled. :-)

Remember this is counting received checksum errors. If the other end has
checksums disabled then it will send you a plus-zero instead, so your end won't
check the received checksum. You would only see this counter incrementing
if both ends had checksums enabled and there were packet corruption.
--
George D M Ross, Department of Computer Science, University of Edinburgh
Kings Buildings, Mayfield Road, Edinburgh, Scotland, EH9 3JZ
Mail: gd...@dcs.ed.ac.uk Voice: 031-650 5147 Fax: 031-667 7209

Jeffrey Mogul

Jun 28, 1993, 8:48:11 PM
In article <rturner.740766507@imagen> rtu...@imagen.com (Randy Turner) writes:
> Thanks for all the replies. The reason I asked is that if there
> is not a considerable number of machines out there that support it
> then I cannot justify the development effort to add it to our current
> IP code. I am trying to draw up a project to add several enhancements
> to our our current (albeit outdated) IP implementation, and the
> performance enhancements possible with PMTU discovery would definitely
> help, but only if the majority of the networks we are installed in
> support RFC 1191. I also understand that vendor inclusion of proposed
> standards help move the standards process along, but in my case I
> have a political battle if I use that reasoning.

The end-host mechanism described in RFC1191 was specifically designed to
work whether or not any or all of the routers support RFC1191. The router
support is only there to make the process more efficient.

In other words: all IP routers that have *ever* conformed to the
specifications, past or present, support RFC1191. Period. Read the
RFC again if you don't understand this.

Of course, there are probably routers out there that don't conform to the
original ICMP specification. If you run into one of these, Path MTU
Discovery could cause trouble.

-Jeff

Jeffrey Mogul

Jun 28, 1993, 9:00:13 PM
In article <ipk...@rhyolite.wpd.sgi.com> v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>Based on the excellent idea of looking at `netstat`, I looked at two
>servers (mostly netnews), and found in the two days they've been up,
>they had a total of 57 TCP packets with bad checksums, in about
>5,000,000 TCP packets. Three other machines used mostly for UNIX
>source serving had a total of 11 TCP packets with bad checksums in
>about 60,000,000 TCP packets. There were fewer UDP checksum errors,
>but more than 0.
>
>With a few more reports about other vendor's hardware, I hope we can
>declare the idea of turning off TCP checksums officially dead.

Although I do not entirely agree with Jon Kay's argument in favor
of turning TCP checksums off, your netstat-based argument (in the
form given here) does not rule it out. That is, you have made an
error in logic, or at least have left out a step in the proof.

Jon clearly argues for turning off checksums only when the underlying
path is highly reliable (e.g., with uninterrupted CRC protection).
One could debate whether it is ever possible to guarantee this, but let's
assume that this can be done by magic (or administrative means).

None of the netstat numbers posted so far have distinguished between
local-LAN and remote-LAN checksum errors (nor could they, as far as
I am aware, because BSD-based systems don't keep separate statistics).
Unless you can provide some sort of evidence that some of those 57
bad TCP checksums came from "local" packets, then these numbers don't
support your side of the argument.

Remember, I am not saying that your position is wrong. I am saying
that your evidence is less than it appears to be. It might be an
interesting research mini-project to collect the right set of numbers.

-Jeff

Vernon Schryver

Jun 28, 1993, 10:40:56 PM
In article <20o46t$5...@usenet.pa.dec.com>, mo...@pa.dec.com (Jeffrey Mogul) writes:
> Although I do not entirely agree with Jon Kay's argument in favor
> of turning TCP checksums off, your netstat-based argument (in the
> form given here) does not rule it out. That is, you have made an
> error in logic, or at least have left out a step in the proof.
>
> Jon clearly argues for turning off checksums only when the underlying
> path is highly reliable (e.g., with uninterrupted CRC protection).
> One could debate whether it is ever possible to guarantee this, but let's
> assume that this can be done by magic (or administrative means).
>
> None of the netstat numbers posted so far have distinguished between
> local-LAN and remote-LAN checksum errors (nor could they, as far as
> I am aware, because BSD-based systems don't keep separate statistics).
> Unless you can provide some sort of evidence that some of those 57
> bad TCP checksums came from "local" packets, then these numbers don't
> support your side of the argument.
>
> Remember, I am not saying that your position is wrong. I am saying
> that your evidence is less than it appears to be. It might be an
> interesting research mini-project to collect the right set of numbers.
>
> -Jeff


You're right, except that the hassles you refer to in collecting the
right set of numbers must be considered as part of the proof.

Depending on magic or administrative means might be ok, if both the
consequences and likelihood of the magic failing or the spell being
uttered wrong were low. Given evidence that failures are quite likely
with bad magic (or maybe even with good magic, should some of those 57
errors be local), and given the difficulties of ensuring something
equivalent to uninterrupted CRC protection, and given that computing
the TCP checksum takes such a small number of cycles, don't you think the
case is proven?

----

Some have mentioned that genuine 802.whatever bridges do not
recompute the Ethernet CRC when forwarding packets. Does that mean
that no FDDI-Ethernet or Token Ring-Ethernet bridge is compliant? Such
devices have no alternative except to recompute the CRC, albeit one
hopes only after checking it.


Vernon Schryver, v...@sgi.com

Tom Fitzgerald

Jun 29, 1993, 12:09:09 AM
i...@boulder.parcplace.com (Warner Losh) writes:

> I haven't had an undetected disk failure that showed up.

While I'm not sure I 100% understand that sentence, I'd have to agree with
the spirit. A salesman who offered me a general-purpose TCP/IP
implementation where TCP checksums could be disabled would be treated like
a disk salesman who promised great disk performance by disabling ECC. He'd
be asked to leave.

I can see why it would be tempting to treat a smart printer controller as a
special case: the data is going onto paper and will never be seen by
software again; if the user sees a misprint on the paper, he can have it
re-printed. But I just can't accept my own explanation - we're talking
about print output that will, *occasionally*, be going to a typesetter for
distribution as customer docs; purchase orders, paychecks, employee
reviews, etc, and I don't want my users to even *think* about getting in
the habit of doing any kind of printing with checksums disabled. I don't
even want it to be an option, because if it is, someone will use it at the
wrong time and we'll all be burned. It just isn't worth it.

There's a relevant line from some book on software performance (maybe one
of the Programming Pearls books): you can get amazing performance
improvements, as long as you don't care if the answer is right.

--
Tom Fitzgerald Wang Labs fi...@wang.com "I went to the universe today;
1-508-967-5278 Lowell MA, USA It was closed...."

Johnny Eriksson

Jun 29, 1993, 6:50:03 AM
In article <1993Jun28....@gandalf.ca> dc...@gandalf.ca (Dave Carr) writes:

! >Now you have lost end-to-end reliability. Bad memory in any of the
! >bridges could cause corrupted data to be accepted by TCP.
!
! Buzzt! Wrong. A bridge will/should preserve the original FCS of the frame
! according to 802.1d. A router however does not.

If the bridge is broken it enjoys the privilege of ignoring 802.1d as much
as it wants, which may be plenty...

--Johnny

Steinar Haug

Jun 29, 1993, 9:36:06 AM
> udp:
> 0 bad checksums
>
> This is on an IBM RS/6000 320h running AIX 3.2.3. The running kernel
> apparently has UDP checksums enabled. :-)

We run all our Suns with NFS checksums enabled. I just made a quick check.
One fileserver (11 days uptime) had 2 UDP checksum errors; one fileserver
(25 days uptime) had 3 UDP checksum errors. Not exactly huge numbers, but
having the checksum turned on makes me sleep much better at night!

Steinar Haug, system/networks administrator
SINTEF RUNIT, University of Trondheim, NORWAY
Email: Steina...@runit.sintef.no, Steina...@delab.sintef.no

Casper H.S. Dik

Jun 29, 1993, 9:11:45 AM
Steina...@runit.sintef.no (Steinar Haug) writes:

>> udp:
>> 0 bad checksums
>>
>> This is on an IBM RS/6000 320h running AIX 3.2.3. The running kernel
>> apparently has UDP checksums enabled. :-)

>We run all our Suns with NFS checksums enabled. I just made a quick check.
>One fileserver (11 days uptime) had 2 UDP checksum errors; one fileserver
>(25 days uptime) had 3 UDP checksum errors. Not exactly huge numbers, but
>having the checksum turned on makes me sleep much better at night!

One probable generator of bad UDP checksums is DNS. We see most
bad checksums on our nameservers.

Some bad checksums occur on clients. They could have been the result
of a packet from far away, but I wouldn't bet on it.

Casper

Mark Reardon

Jun 29, 1993, 1:42:05 PM
Some bridge vendors have used hardware that does not support
transmitting an ethernet packet with the CRC in memory and
instead require recomputing it on transmit. Other vendors
have controllers that have to be reset to switch modes and
since the bridge needs to have a CRC on the spanning tree packets
they leave it on. This has caused a heated discussion in some
bridge groups because few vendors want to throw out current
product to support this requirement.

Note I am not arguing as to whether the preservation of the
original CRC is proper or correct. I am instead pointing out
that not all vendors necessarily comply.

--
_____________________________________________
Mark Reardon AT&T Tridom (404-514-3383)
email: m...@tridom.eng.tridom.com, attmail!tridom!mwr

Jon Kay

Jun 29, 1993, 8:13:26 PM
Vernon Schryver, v...@sgi.com, enunciates:
> In article <51...@sdcc12.ucsd.edu>, jk...@cs.ucsd.edu (Jon Kay) writes:
> > ...
> > In my own decade in this business, I've been hit by disk
> > problems more times than I can easily count. The misprinted check is
> > going to happen because of a bad disk controller, not because of the
> > network.
>
> How many of those disk problems produced undetected errors?

I ran into one just last week on our server. I found a bogus
directory that had been copied by tar from another directory without
passing over a network. What produced it? So long as it's not the
network (which it wasn't), my argument remains intact. It was
undetected for months; I only noticed it by sheer accident.
I have also had a Sun 3/60 with a bad on-disk controller for
years and seen all sorts of fun errors crop up. I suppose in a sense
they weren't undetected, since it was possible to figure out WHEN they
happened - just not WHERE.

Vernon Schryver, v...@sgi.com, continues:
> In other words, how many of those disk problems did not involve reading
> and writing the medium (since, as you note, that's protected with an
> ECC), but involved undetected errors while transfering data between the
> controller and main memory?

I'm sure you have all seen gobs of disk problems not caused by
media, especially those of you living in areas with lots of lightning
storms - e.g., results of crashes. Only a limited amount of stuff
gets written out, and often it gets written inaccurately. Running
SunOS 4.0.1 on the East Coast, I could count on my 'C' shared library
(libc.so.whatever) being crunched by every thunderstorm that beat me
to the 'halt' command. Thank God fsck is as good as it is!


i...@boulder.parcplace.com (Warner Losh) declares:
> In article <51...@sdcc12.ucsd.edu>, jk...@cs.ucsd.edu (Jon Kay) writes:
> I haven't had an undetected disk failure that showed up. All the disk
> failures that have bitten me have been detected. I would think after
> all the years I've been around that I might notice at least one glitch
> after the fact.

I'm sure you've at least run into a crashed filesystem.
Remember, it is possible to have data corruption that you don't know
about even if you know something bad happened. Fsck cannot diagnose
corrupt data blocks, nor does it always even do the correct thing with
inodes, though it tries quite hard.
But this raises an interesting question. How often DO
truly undetected disk failures occur? By definition, it's hard to
tell, since they are, well, undetected. When one does see a munged
file, the tendency is to assume that operator error produced it;
there's really no way to tell. It gets harder and harder to tell as
one tends to have more and more gigabytes of data in which an error
might be hiding. It would be a fascinating exercise to add a checksum
at the filesystem level and see how many errors are caught.
But I think that it's possible that a lot of you *have* seen
the results of "undetected" disk failures. Let's think about this
for a second. If a corruption did occur, what sort of disk access
would it occur in? Read, write, metadata read, metadata write? Well,
a lot of recent filesystem work (Keith Muller's MFS, the trace-driven
FS analysis that led to LFS, etc.) has shown that metadata writes are
the most common kind of disk access, so assuming a random
distribution, they should be the most commonly corrupted accesses as
well. So, are they happening? Well, many of those of you who are or
have been sysadmins have probably noticed that when you bring machines
down that have been up for months or more, they tend not to fsck
cleanly. Of course, on such occasions, one tends to be more
interested in getting the machine running than in diagnosing the
problem, but this is completely consistent with the idea that metadata
writes get corrupted occasionally.

Jon

Jon Kay

Jun 29, 1993, 8:18:31 PM

v...@rhyolite.wpd.sgi.com (Vernon Schryver) articulates:
> One would hope that any file transfer protocol at or below ethernet
> speeds would be entirely limited by the speed of the medium.

Most of the installed base today is neither Alpha nor SGI
Crimson. In fact, even a lot of relatively recent low-end machines on
peoples' desks today (IPCs, 5000/20s, etc.) cannot fill up an Ethernet
using the software they've got. Of course, even if every vendor
started shipping Checksum Redundancy Avoidance today, the machines
shipped would be able to fill Ethernets easily. On the other hand,
more and more FDDI networks are being sold, and I'm sure 100baseT
variants will sell like hotcakes when they come out.
Maybe more to the point, though, one needs cycles to run the
application programs driving the network.

Vernon Schryver, v...@sgi.com sez:
> The
> cycles needed for the TCP/IP checksum should be insignificant today.

That's why you put a hardware checksummer onboard your FDDI interfaces
despite the added risk to customers.


cr...@sics.se (Craig Partridge) writes:
> The first TCP/IP that was benchmarked at full Ethernet speeds ran on a similar
> platform -- a SUN 2 workstation using a Lance chipset in 1988. Observe that

^3!!


> the number of protocol stacks you handle should have almost no impact on
> performance (if it does, you've done something wrong). Since then, protocol
> implementations have improved further, so that 1990 vintage workstations
> (e.g., HP Snake) have been benchmarked doing TCP/IP at 100+ Mb/s (e.g.,
> about 13 Mbyte/s).

> The point here is that the best TCP/IP implementations currently available


> can easily achieve the performance you're looking for, on your processor.

From Jacobson's posting back in '88:
| 3/60 to 3/60 | 3/280 to 3/60 |
| (LANCE to LANCE) | (Intel to LANCE) |
| socket | |
| buffer task to | task to |
| size task wire | task wire |
|(packets) (KB/s) (Mb/s) | (KB/s) (Mb/s) |
| 1 384 3.4 | 337 3.0 |
...
| 12 1001 8.9 | 715 6.3 |

This is 91% Ethernet speed (not saturation, though quite close,
especially for a 3/60), on a 20% faster processor than Randy has.
So if he has 17k of memory for TCP windows and spends no time at all
on application-level processing (NOT!), and installed the 4.3Reno
networking code (a tidy couple months' work), and is using a LANCE, he
could hope for 800 KB/s. If nothing else goes wrong. It's easy to
see why he's so interested in TCP improvements. Especially
easy-to-install ones. A 4.3-Tahoe-based OS on a 3/60, SunOS 4.1.1,
could only do about 600 KB/s with the wind behind it, immense socket
buffers, and nothing else happening. Randy probably has a Tahoe-based
OS (600 * 0.8 = 480. And he may not have room for the 64k socket
buffers with which that measurement was taken).
I think this raises an interesting point. 4.3 Reno (the OS
which those numbers come from) is available to Randy, but the more
recent implementations are not, and may never be if 4.4 BSD is
released without the newest networking code (and maybe not even then
if "4.4-Lite" doesn't happen). And there was no mention of newer
networking code in the 4.4 announcement that went out today.

Jon

Vernon Schryver

Jun 29, 1993, 10:26:14 PM
In article <51...@sdcc12.ucsd.edu>, jk...@cs.ucsd.edu (Jon Kay) writes:
> ...
> It would be a fascinating exercise to add a checksum
> at the filesystem level and see how many errors are caught. ...

About 24 years ago, I helped do exactly that to a file system not so
very different from UNIX's, on an SDS-940, to eventually find a
hardware problem in a channel. There were enough spare bits in each
slot of the equivalent of an indirect block to add a checksum. Every
midnight the system spent hours checksumming and de-fragmenting files.
The operator would deal with any discovered problems in the morning.

More recently, I saw people at Silicon Graphics do something
functionally similar to catch a bug that was messing up some rather
active, ~30GByte file systems used for UNIX source. (Yeah, 30GB of
UNIX source trees. Disgusting, isn't it?) I think they've turned
off the nightly checks since the checks stopped discovering anything.

My point is that errors happen, in networks more than with disks,
because network data is fondled more often. No check that is effective
and available should be discarded or turned off without a large
improvement in performance or system-cost.

The TCP checksum is effective, as shown by detected errors. The TCP
checksum is cheap, as shown by systems that are cheap and fast and
compute it.

Some have said that the hardware assistance for the TCP checksum I like
is expensive. I do not understand that. How much do 2 or 3 16-bit
adders cost? Can you even quantify their parts cost? From another
direction, the SGI Indigo FDDI board uses a 16MHz 29030 to compute the
checksum, handle the MAC and PHYs, run the host DMA, and keep the MAC's
DMA going. The bus is available to it 30-80% of the time, being tied
up by DMA the rest of the time. In other words, it is a puny and
bus-starved processor by today's standards, but has no trouble
computing TCP and UDP checksums at full FDDI speeds, as well as all of
the rest of what it must do.

Once again, the performance cost of computing the checksum by the main
CPU is insignificant compared to cache effects in modern systems.

The only reason to do much work on the TCP checksum is if the work
can let the CPU avoid the cache costs of computing it. If you must
byte-copy the data, there is no reason to worry about the checksum,
because the cache costs of the copies will be where you're spending
most of your time.


Vernon Schryver, v...@sgi.com

Vernon Schryver

Jun 29, 1993, 11:52:55 PM
In article <51...@sdcc12.ucsd.edu>, jk...@cs.ucsd.edu (Jon Kay) writes:
>
> v...@rhyolite.wpd.sgi.com (Vernon Schryver) articulates:
> > One would hope that any file transfer protocol at or below ethernet
> > speeds would be entirely limited by the speed of the medium.
>
> Most of the installed base today is neither Alpha nor SGI
> Crimson. In fact, even a lot of relatively recent low-end machines on
> peoples' desks today (IPCs, 5000/20s, etc.) cannot fill up an Ethernet
> using the software they've got. Of course, even if every vendor
> started shipping Checksum Redundancy Avoidance today, the machines
> shipped would be able to fill Ethernets easily...

You are wrong to say that turning off the TCP checksum is necessary or
sufficient to make such systems saturate ethernets--regardless of
whether you turn it off smart or dumb. Just as a dumb experiment, test
your throughput, then take the nearest such machine, and using adb or
dbx patch the subroutine call to in_cksum() in the 4.3BSD tcp_input.c
to jump around the bad-checksum-drop code, and re-test your
throughput. I just now tried that on a pair of 20MHz R3000 based
systems using 4.3BSD-Net2 TCP code.

You are wrong to say that IPCs and 5000/20s were slower than the
machines Van Jacobson used to saturate ethernets in the dim past.


> Vernon Schryver, v...@sgi.com sez:
> > The
> > cycles needed for the TCP/IP checksum should be insignificant today.
>
> That's why you put a hardware checksummer onboard your FDDI interfaces
> despite the added risk to customers.

Please read my preceding note tonight about the costs of that
checksummer, and the puny hardware it uses.

That outboard checksumming is worthwhile, but only because I also fixed
things to prevent byte copies. 0.5 cycles/byte is measurable, but not
significant compared to the cache costs, which are 10 to 500 times
higher than 0.5 cycles/byte, even on old R3000-based systems like DEC's.

> From Jacobson's posting back in '88:

> |(packets) (KB/s) (Mb/s) | (KB/s) (Mb/s) |

> | 12 1001 8.9 | 715 6.3 |
>
> This is 91% Ethernet speed (not saturation, though quite close,

If you read Van's old postings, you'll notice that he used an oscilloscope
to determine how hard he was driving the wire. He was saturating it.

1001KByte/sec is more than ethernet speed or less, depending on the
number of collisions you suffer, which depends on many things, from the
TCP window size to bugs in the Ethernet MAC (without the right bugs,
you can be limited to ~850KByte/sec). The maximum speed on an ethernet
is an interesting subject.


> ... A 4.3-Tahoe-based OS on a 3/60, SunOS 4.1.1,


> could only do about 600 KB/s with the wind behind it, immense socket
> buffers, and nothing else happening.

SunOS 4.1.1 is "4.3-Tahoe-based"? That's interesting news.

Last I checked, a lot of Van Jacobson's code was in recent 4.3BSD.
That Sun's code did not perform the same as real 4.3-Tahoe does not
mean that Sun did anything wrong, but neither does it say much about
4.3-Tahoe.


> I think this raises an interesting point. 4.3 Reno (the OS
> which those numbers come from) is available to Randy, but the more
> recent implementations are not, and may never be if 4.4 BSD is
> released without the newest networking code (and maybe not even then
> if "4.4-Lite" doesn't happen). And there was no mention of newer
> networking code in the 4.4 announcement that went out today.

The source that has been on UUNET for years, as well as other places
including CDROMs, has header prediction, et al. It does not have the "squashed stack",
but it is fast enough to drive an FDDI ring at speed.

The limits to TCP/IP are almost never in the TCP/IP or checksum code,
but in uipc_socket*.c.


vjs

Tim Ramsey

Jun 30, 1993, 4:24:27 AM
gd...@dcs.ed.ac.uk (George Ross) writes:

>Remember this is counting received checksum errors. If the other end has
>checksums disabled then it will send you a plus-zero instead, so your end won't
>check the received checksum. You would only see this counter incrementing
>if both ends had checksums enabled and there were packet corruption.

Actually, you should see this counter incrementing iff the remote end had
checksums enabled and there were packet corruption. From RFC1122 (Host
Requirements -- Communication Layers):

4.1.3.4 UDP Checksums
...
If a UDP datagram is received with a checksum that is non-
zero and invalid, UDP MUST silently discard the datagram.

The local end must detect UDP datagrams with invalid checksums even if it
has UDP checksums disabled.

--
Tim Ramsey, t...@matt.ksu.ksu.edu
PGP2.3 public key available via keyserver, finger, or email.
Member of the League for Programming Freedom and the ACLU.
"We're opposed to gratuitous whacking-off." -- Geoff Collyer

Steinar Haug

Jun 30, 1993, 9:40:39 AM
> Most of the installed base today is neither Alpha nor SGI
> Crimson. In fact, even a lot of relatively recent low-end machines on
> peoples' desks today (IPCs, 5000/20s, etc.) cannot fill up an Ethernet
> using the software they've got. Of course, even if every vendor

Depends on your definition of "the software they've got". An IPC can fill
an Ethernet quite nicely when running ttcp. That doesn't mean it'll be able
to fill the same Ethernet running FTP...

Roy Smith

Jun 30, 1993, 10:23:27 AM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> The TCP checksum is effective, as shown by detected errors.

Just to pick a nit, all that the statistics people have quoted over
the past week or so show is that TCP checksuming catches *some* errors.
They give no indication as to how effective it is at catching all errors, or
even most of the errors. It's well known that there are certain kinds of
errors which will corrupt a packet yet still pass checksum (i.e. transposed
quadwords). I have no reason to believe that errors like that actually
occur with any significant frequency, but lacking any hard evidence to the
contrary, it remains a possibility. Has anybody ever instrumented a system
to check for things like this?

Not that I'm actually suggesting anybody turn off TCP checksums.
I'm just being pedantic. The vast body of evidence and logic certainly says
that the cost/benefit ratio is so low that turning them off would be just
plain silly.

Barry Margolin

Jun 30, 1993, 12:26:44 PM
In article <20rijr...@sam.ksu.ksu.edu> t...@sam.ksu.ksu.edu (Tim Ramsey) writes:
>The local end must detect UDP datagrams with invalid checksums even if it
>has UDP checksums disabled.

Unfortunately, that's not what 4.3bsd and earlier do, and I suspect that's
the kind of system his examples were on. The BSD udp_cksum flag controls
both sending and checking. It's really annoying, since it's much easier
for us to make sure that our file servers are properly configured than all
the workstations, but it only takes one misconfigured workstation with a
buggy ethernet interface to screw up a whole bunch of files.

--
Barry Margolin
System Manager, Thinking Machines Corp.

bar...@think.com {uunet,harvard}!think!barmar

Peter Desnoyers

Jun 30, 1993, 12:20:18 PM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

>Some have said that the hardware assistance for the TCP checksum I like
>is expensive. I do not understand that. How much do 2 or 3 16-bit
>adders cost?

Remember that we were originally talking about a laser printer, not a
workstation. At the very minimum, that adder will cost you a board
rev. More likely it will cost you a gate array where you used to have
just an unbuffered bus, and at least 48 or 64 pins.

That's easily more expense than you want to deal with in a laser
printer, especially when it would be cheaper to upgrade the processor,
and the resulting machine would probably run TCP faster.

As far as I can tell, a separate adder only makes sense in something
like a high-end workstation, where there are few options left for
increasing processor speed, and the cost of a few adders (and the
logic to read their results, etc.) is minor in comparison to the cost
of any other speedups.

Peter Desnoyers

Vernon Schryver

Jun 30, 1993, 4:08:26 PM
In article <peterd.7...@pjd.dev.cdx.mot.com>, pet...@pjd.dev.cdx.mot.com (Peter Desnoyers) writes:
> v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>
> >Some have said that the hardware assistance for the TCP checksum I like
> >is expensive. I do not understand that. How much do 2 or 3 16-bit
> >adders cost?
>
> Remember that we were originally talking about a laser printer, not a
> workstation. At the very minimum, that adder will cost you a board
> rev. More likely it will cost you a gate array where you used to have
> just an unbuffered bus, and at least 48 or 64 pins.

agreed.

> That's easily more expense than you want to deal with in a laser
> printer, especially when it would be cheaper to upgrade the processor,
> and the resulting machine would run probably TCP faster.

agreed.

> As far as I can tell, a separate adder only makes sense in something
> like a high-end workstation, where there are few options left for
> increasing processor speed, and the cost of a few adders (and the
> logic to read their results, etc.) is minor in comparison to the cost
> of any other speedups.

I don't quite buy this. If you were starting a printer controller
today, you'd need an Ethernet MAC, something to move packets (either
DMA or a CPU like a 29K doing loadm/storem) between the ethernet and
some memory, a CPU to control that stuff, a CPU to make pixels out of
Ethernet packets, something (perhaps DMA or a CPU loadm/storem) to move
pixels from memory to the laser, LED, even parallel printer port.
(Of course, you'll not have >1 CPU)

If you use a CPU to get the packets from the Ethernet chip, you probably
could hide add instructions in the delay slots of an unrolled loop of
loads and stores, thereby computing the TCP checksum for free.

If you build some kind of PAL or ASIC or gate array to run the ethernet
DMA, couldn't you hide an adder? Wouldn't you naturally build silicon
for a new printer controller, just to keep the cost down on what should
be a high volume product, shipped with printers for years to come?

I just realized that I've been saying 2-3 adders because I've been told
it's hard to make cheap 16-bit adders run at FDDI or HIPPI speeds.
Ethernet is sloooow; you only need a single, 16-bit end-around-carry
adder that cycles in 1.6 microseconds. That's trivial, isn't it?

Hardware assist for TCP checksumming is a bit of a hassle on output,
because you need to stuff the checksum for the whole packet into the
front of the same packet. Hardware assist for received packets is
trivial, and requires no more than the 16-bit sum of the entire
packet. Clear the adder at the start of the packet, and fetch its
contents at the end. If you have an odd byte, don't bother to add it,
or zero fill and add it. (Yes, all that sums too much or too
little, but that is no problem. Everyone please think about how the TCP
checksum works before arguing with me about that detail.)


Vernon Schryver, v...@sgi.com

Randy Turner

Jun 30, 1993, 5:20:45 PM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

>agreed.

>agreed.


>Vernon Schryver, v...@sgi.com

The thing that really complicates the issue of optimizing the
checksum through means described above is the fact that a
pseudo-header has to be formed and checksummed first.....
It's not just a straightforward summing of a packet as an
additional bit of logic in an ASIC for Ethernet DMA...

Vernon Schryver

Jul 1, 1993, 11:22:17 AM
In article <rturner.741475245@imagen>, rtu...@imagen.com (Randy Turner) writes:
> v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> ...

> >Hardware assist for TCP checksumming is a bit of a hassle on output,
> >because you need to stuff the checksum for the whole packet into the
> >front of the same packet. Hardware assist for received packets is
> >trivial, and requires no more than the 16-bit sum of the entire
> >packet. Clear the adder at the start of the packet, and fetch its
> >contents at the end. If you have an odd byte, don't bother to add it,
> >or zero fill and add it. (Yes, all that sums too much or too
> >little, but is no problem. Everyone please think about how the TCP
> >checksum works before arguing with me about that detail.)
>
> The thing that really complicates the issue of optimizing the
> checksum through means described above is the fact that a
> pseudo-header has to be formed and checksummed first.....
> It's not just a straightforward summing of a packet as an
> additional bit of logic in an ASIC for Ethernet DMA...


Depending on what is meant, that is false. Of course you add the
checksum of the pseudo-header, just like any TCP implementation.
You need not handle the pseudo-header first.

Remember that cksum(A+B)-cksum(B)=cksum(A).

Hardware support for input TCP checksum is absolutely trivial, subject
only to constraints on the speed of the adder(s) compared to the speed
of the medium.

I'm not talking about academic theories. The most recent SGI FDDI
card for SGI's low cost workstations has been shipping in volume for
some time.


Vernon Schryver, v...@sgi.com

Erick Engelke

Jul 1, 1993, 12:01:03 PM
jk...@cs.ucsd.edu (Jon Kay) writes:
>
>> How many of those disk problems produced undetected errors?
>
> I ran into one just last week on our server. I found a bogus
>directory that had been copied by tar from another directory without
>passing over a network. What produced it? So long as it's not the
>network (which it wasn't), my argument remains intact. It was
>undetected for months; I only ran noticed it by sheer accident.

I'm actually in an excellent position to answer this question.

Glancing up I see status monitors constantly reporting on the forty-two
network servers I manage. Though our system is trivial, providing
disk service rather than fileservice, it gives an excellent opportunity
to answer this sort of question.

Our diskservers report all disk CRC errors, all network checksum and
other network errors, and they checksum physical memory in all background
cycles (ie. most of the time) because we keep all data structures
in memory checksummed. In other words, all parts of the system are
monitored separately.

What I have said so far does not answer your question about end-to-end
integrity, but this does:

On those servers there are a total of approximately 30,000 disk
sectors which the server checksums on every read and write.

Though these special sectors constitute less than 0.05 % of the
total disk space, they are distributed relatively evenly over the
many disk surfaces (they are control blocks preceding each virtual
disk space), so they give an excellent idea of exactly what you ask.

These sectors are checked to provide user privileges very frequently,
and each is read at least once a day by the daily-save procedure to
see if there has been any activity in the next few blocks which may
need to be archived.

Sooooo, the results are surprisingly in favour of believing the CRC.
We keep track of all CRC errors and have found that they appear very
frequently in the same day whenever we get a checksum error on the
checksummed blocks. So if you get a few CRCs, and your OS reports them
to you, start ordering a spare drive because this one will be going
to the shop soon.

Our experience is that the CRC is not perfect, but that errors tend
to appear in clumps, and the CRC detects many of them, enough for
you to put your emergency plan into action.

Check your backups, check your data, you've been given advance
notice of an impending failure. Of course, that only applies to OS's
which keep track of the CRC failure status.

As for the "lightning strikes, blizzards happen" attitude, a UPS solves
that for anyone willing to spend the money.

Erick

Rick Jones

Jul 1, 1993, 2:22:17 PM
Vernon Schryver (v...@rhyolite.wpd.sgi.com) wrote:

: I'm not talking about academic theories. The most recent SGI FDDI


: card for SGI's low cost workstations has been shipping in volume for
: some time.

Nor is Vernon talking SGI specific. Two of the three FDDI cards from
HP (Series 800 NIO, and Series 700 Integrated), offer either both
out/in, or in checksum offload.

rick jones

Randy Turner

Jul 1, 1993, 2:15:21 PM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

>Remember that cksum(A+B)-cksum(B)=cksum(A).


>Vernon Schryver, v...@sgi.com

Right, that's what I was saying, that there is more to the
implementation than just summing the data portion in hardware.
After that's done, there is still some pseudo-header construction
and summing to be done in software.
And while all of this sounds like a wonderful speedup, it
shoots down the usual layered approach to network software, in that
we have taken functions of the transport layer and embedded that
functionality in what normally would be the data link layer. There
would also have to be additional logic to handle a multi-protocol stack
implementation over this network interface, meaning the hardware would
have to be able to detect that the packet is a UDP/TCP packet
before doing the summing, if I understand what you are describing.

Vernon Schryver

Jul 1, 1993, 6:03:35 PM
In article <rturner.741550521@imagen>, rtu...@imagen.com (Randy Turner) writes:
> ...

> And while all of this sounds like a wonderful speedup, it
> shoots down the usual layered approach...

hmmph. That "usual layered approach" to network protocols exists only in
1. academic papers about how things ought to be.
2. standards committee view graphs.
3. standards documents
4. low speed, often buggy, naive implementations.

Layering is an excellent way to think about networks, and to roughly
structure code and hardware, but no more.

The boxes and layers in the standards documents are perfectly fine
descriptions of how things should appear to work from outside your
black box. However, anyone who tries to implement exactly the boxes
and layers, the Signal_This and Indicate_That functions in the OSI, IEEE,
and ANSI standards, almost always (well, always in my experience) fails
to produce a salable or even interesting product.

Imagine how useful would be an Ethernet chip that was actually
implemented with the Pascal code in the IEEE-802.3 standard.

Doing the TCP checksum in the link layer need not be a particularly big
layering violation. There are far larger violations of Standard
Committee Law than having your link layer code pass both the data and
the 1's complement sum of all of that data to the next layer up. Eventually
the TCP code (in which ever layer you put it) would receive its data and
adjusted 1's complement sum, and do the obvious.

Note: the SGI FDDI checksum scheme just flat out violates the layers.
The link layer adds a bit to the data that says "this was good; trust me".
The Layering Standards Mavens are welcome to arrest me.


Why do so many otherwise rational people who would never outlaw
absolutely every use of goto and every use of assembly language have
such an unhealthy respect for network layering?

Vernon Schryver, v...@sgi.com

obe...@ptavv.llnl.gov

Jul 1, 1993, 6:30:23 AM
In Article <rturner.741550521@imagen>
rtu...@imagen.com (Randy Turner) writes:

> And while all of this sounds like a wonderful speedup, it
> shoots down the usual layered approach to network software, in that
> we have taken functions of the transport layer and embedded that
> functionality in what normally would be the data link layer. There
> would also have to be additional logic to handle a multi-protocol stack
> implementation over this network interface , meaning the hardware would
> have to be able to detect that the packet is a UDP/TCP packet
> before doing the summing , if I understand what you are describing.

1. We're talking real world, not standards committees here. Perfect layering is
really a myth in every system I've ever worked with.

2. IP predates the OSI model by some years and doesn't really fit it well.

3. Tell me, which layer is ICMP in? Are you really sure about that?

R. Kevin Oberman Lawrence Livermore National Laboratory
Internet: kobe...@llnl.gov (510) 422-6955

Disclaimer: Being a know-it-all isn't easy. It's especially tough when you
don't know that much. But I'll keep trying. (Both)

Marcus J Ranum

Jul 1, 1993, 10:08:43 PM
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>Note: the SGI FDDI checksum scheme just flat out voilates the layers.
>The link layer adds a bit to the data that says "this was good; trust me".
>The Layering Standards Mavens are welcome to arrest me.

Don't worry on that account.

They're all too busy trying to figure out how to make theirs
work fast, to go out arresting people.

In all seriousness, layering software is useful. Once you
prototype something in a layered manner it's *REAL* easy to profile
it and see where the bottlenecks are, and *THEN* to optimize it.
This is what I always thought of as "computer science":

10 Hypothesize
20 Perform experiments
30 Evaluate results
40 GOTO 10

A real scientist will throw out the theory if the outputs
of lines 20 and 30 are promising and/or make your product more
competitive on the market.

mjr.

Tom Evans

Jul 2, 1993, 4:13:15 AM
In article <20s7kv$h...@calvin.NYU.EDU>, r...@mchip00.med.nyu.edu (Roy Smith) writes:
> v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> > The TCP checksum is effective, as shown by detected errors.
>
> ... show is that TCP checksuming catches *some* errors... It's well

> known that there are certain kinds of errors which will corrupt a
> packet yet still pass checksum (i.e. transposed quadwords).

Transposed shorts and longs too. And one that bit me last month - if
the packet is thought to be longer than it originally was (length
field calculation stuffup) and the extra "data" is +0 or -0, it
doesn't get detected either.

========================
Tom Evans t...@wcc.oz.au
Webster Computer Corp P/L, 1270 Ferntree Gully Rd Scoresby, Melbourne 3179
Victoria, Australia 61-3-764-1100 FAX ...764-1179 A.C.N. 004 818 455

Vernon Schryver

Jul 2, 1993, 11:56:30 AM
In article <32...@wcc.oz.au>, t...@wcc.oz.au (Tom Evans) writes:
> In article <20s7kv$h...@calvin.NYU.EDU>, r...@mchip00.med.nyu.edu (Roy Smith) writes:
> > v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> > > The TCP checksum is effective, as shown by detected errors.
> >
> > ... show is that TCP checksuming catches *some* errors... It's well
> > known that there are certain kinds of errors which will corrupt a
> > packet yet still pass checksum (i.e. transposed quadwords).
>
> Transposed shorts and longs too. And one that bit me last month - if
> the packet is thought to be longer than it originally was (length
> field calculation stuffup) and the extra "data" is +0 or -0, it
> doesn't get detected either.


I agree.

If I were arguing about the right checksum, I would ask for something
that would distribute almost as easily over concatenation
(ck(A+B)-ck(B)=ck(A)), but would have detected at least some
transpositions, something like a rotate between 16-bit adds. Of course
today, you'd probably use 32-bit adds, which detects transposed
16-bit-words.
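For what it's worth, the rotate-between-adds idea can be sketched like this
(an illustrative toy, not any deployed checksum); one rotate of the running
sum per word is enough to break the commutativity that hides transpositions:

```python
def rot_cksum(words):
    """Toy checksum: rotate the running sum left one bit before each 16-bit add."""
    s = 0
    for w in words:
        s = ((s << 1) | (s >> 15)) & 0xffff   # 16-bit rotate left by 1
        s += w
        s = (s & 0xffff) + (s >> 16)          # end-around carry
    return (~s) & 0xffff

# Unlike the plain ones'-complement sum, this detects a transposed pair:
assert rot_cksum([0x1234, 0xabcd]) != rot_cksum([0xabcd, 0x1234])
```

It only "almost" distributes over concatenation, as the post says: to combine
partial sums of A and B into the sum of A+B you must also rotate A's partial
sum once per word of B, so you need the block lengths.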

On the other hand, all of the hardware and software bugs I've seen
discovered with or covered by the TCP checksum were not simple
transpositions, or also involved other problems.

As long as you're sending data bit-serial, I don't suppose
transpositions are as likely as other fun like "simple" noise.

Vernon Schryver, v...@sgi.com

Walter Underwood

unread,
Jul 2, 1993, 2:39:48 PM7/2/93
to
The TCP checksum won't catch transpositions, but it will usually catch
blocks of 1's or 0's. Does your Ethernet controller have parity memory?
What happens if one of those chips goes bad?

wunder

Jon Kay

unread,
Jul 3, 1993, 3:09:54 AM7/3/93
to
mo...@pa.dec.com (Jeffrey Mogul) writes:
> >In article <ipk...@rhyolite.wpd.sgi.com> v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> >Based on the excellent idea of looking at `netstat`, I looked at two
> >servers (mostly netnews), and found in the two days they've been up,
> ...
> Jon clearly argues for turning off checksums only when the underlying
> path is highly reliable (e.g., with uninterrupted CRC protection).
> ...
> None of the netstat numbers posted so far have distinguished between
> local-LAN and remote-LAN checksum errors (nor could they, as far as
> I am aware, because BSD-based systems don't keep separate statistics).
> Unless you can provide some sort of evidence that some of those 57
> bad TCP checksums came from "local" packets, then these numbers don't
> support your side of the argument.
> ...

Exactly. Thanks. As you point out, what's important is how
many of the errors are local versus remote, since remote errors would
be detected under redundant checksum avoidance. It is tough to get
the correct numbers (that doesn't stop us from looking, but it's
mighty time-consuming - don't expect an answer tomorrow). Preliminary
digging on netstat numbers from various machines supported the notion
that bad checksums mostly come from remote machines.
One preliminary thing we did do was to see what sort of patterns of bad
checksums cropped up on DECstations with checksums turned on. The
DECstations' main servers were also DECstations with checksums turned
on. Now, my idea was that if I was correct and that WANs really are
the major source of errors, there would be fewer logged UDP errors
than logged TCP errors, because most packets sent over WANs are TCP
packets. On the other hand, most packets that only traverse LANs are
UDP packets. Thus, the overwhelming majority of errors should show up
in TCP. This turned out to be the case, even though most packets
overall are UDP packets (so you expect more UDP errors if even a tenth
of errors come from LANs).

Jon

Jon Kay

unread,
Jul 3, 1993, 3:12:48 AM7/3/93
to

There is always a cost to reliability. The question is how
much you're willing to trade for how much reliability. Clearly, more
reliability is always better. We would all like our computers to
never make mistakes. However, that is not our primary goal. It is
trivial to design a computer that never makes any mistakes at all - so
long as it doesn't actually have to do anything, it can never make any
mistakes - a brick of lead meets the specs.
Moving along the (price/performance) / reliability curve, if
you're willing to fork over a few million dollars to buy an extremely
specialized product, you can buy systems with thorough (for today)
fault detection and tolerance. You will still get bugs and data
corruption on this machine; at most you can hope that they will be
much rarer. A small number of you reading this note may be reading it
on such a machine. The majority of you, though, have opted to go way
further down the curve. You have traded reliability for
price/performance. You considered a certain amount of unreliability
reasonable. Some of you went even further and bought the cheapest
clone PC you could lay your hands on, trading the many subtle
incompatibilities of PCs for a low price.
So how much reliability is "reasonable?" The more expensive
an operation is, the less "reasonable"; the more errors it catches,
the more "reasonable."
I feel that data checksumming for general TCP/IP data
transmission is reasonable, especially given that wide-area links are
often slow and error-prone. We have not suggested eliminating
software checksums entirely.
Performing a software data checksum across a LAN is a
different story. There it is usually secondary to the LAN CRC
implemented in hardware, and given the higher bandwidths, the checksum
is likely to be an imposing overhead. Yes, both network interfaces
and bridges can cause corrupt packets - but remember, both the CRC and
the bridge/controller have to break for packet corruption to go
undetected. Of course both happen, but in my experience no more often
than other system problems.
So, given such a gray area, I think it is reasonable to
propose dropping the software checksum, in the limited case when a
packet is not routed, and allow users to decide whether they want to
pay the cost or not, rather than forcing the cost on everyone.

Jon

Jon Kay

unread,
Jul 3, 1993, 3:51:27 AM7/3/93
to
> That outboard checksumming is worthwhile, but only because I also fixed
> things to prevent byte copies. 0.5 cycles/byte is measurable, but not
> significant compared to the cache costs, which are 10 to 500 times
> higher than 0.5cycle/B, even on old R3000 based systems like DEC's.

Well, we're almost in agreement, for once. One of us must be sick
:-). The possibility of not even touching the cache is a big
reason why I'm pushing redundant checksum avoidance, though by no
means the only reason - checksumming remains a big factor even in a
Reno-or-earlier copying environment.
Then again, some numbers you've posted suggest that some of
the memory systems you're playing with are slower relative to the CPU
than the ones in the DECstations and Alphas I deal with (I simply
won't argue over whether SGIs have slower memory systems or faster
CPUs....).

Jon

Craig E Jackson

unread,
Jul 5, 1993, 12:23:33 PM7/5/93
to
Just to throw my oar into this discussion:

1. Several persons have gotten hysterical about corrupt printouts. Randy
from QMS has mentioned that he supports multiple protocol stacks.
Most likely, one of those stacks is NetWare; NetWare *does* rely on a
reliable link layer. I think that Novell stockholders will testify that
such a reliance is not an instant formula for commercial failure.

2. I don't know what QMS is working on, but there *are* fast printers out
there. Xerox was doing 2 pages/second 13-14 years ago, with 11/34s and 8080s.
A Postscript web press was announced recently. (A web press is the big
mutha with the huge spinning drums on which they print books, etc.)

3. As others have said, a simple 16-bit checksum just isn't that good. I
would suggest that it is insufficient to be the primary error check for
any IP data link layer.

4. Lots of people have quoted TCP speeds for this processor or that. Few
have mentioned how much of the processor was left for the application(s) when
running at such speeds. I would suspect that a printer vendor would prefer
that a very high percentage of the processor be delivered to PDL interpretation
and imaging.
--
Craig Jackson
c...@world.std.com

Roy Smith

unread,
Jul 5, 1993, 1:20:09 PM7/5/93
to
c...@world.std.com (Craig E Jackson) writes:
> 4. Lots of people have quoted TCP speeds for this processor or that. Few
> have mentioned how much of the processor was left for the application(s)
> when running at such speeds. I would suspect that a printer vendor would
> prefer that a very high percentage of the processor be delivered to PDL
> interpretation and imaging.

Van Jacobson used to saturate ethernets running TCP with a Sun-3/50,
which is a 16 MHz 68020, no cache. I suspect that compared to the cost of
everything else in a printer fast enough to worry about the speed of TCP
checksumming over ethernet, a cacheless 16 MHz 68020's worth of MIPS would
just get lost in the noise.

David L Stevens

unread,
Jul 6, 1993, 1:12:22 AM7/6/93
to
I really hate to encourage this discussion anymore, but I guess I'm
not old and wise enough to *not* put in some observations.

1) Weird and unlikely hardware failure scenarios are at best a straw
man argument. Ok, suppose the board DMA's bogus data into the
driver buffer. Sure, TCP checksums will catch it. But suppose
you have the (my guess) much more likely hardware failure of
failed memory chips in any of the (often multiple) copies of
the data up to and including the user buffer itself. It doesn't
catch it, and if the environment is one in which not doing
checksums doubles performance, it's obviously better to leave
the Ethernet checksum as one on the data.

2) I believe vjs suggested doing the checksum via special purpose
chips rather than in software. Great! Why not put those on
an Ethernet board and do them on the packets as they arrive...
oh, they're already there and already done, now, aren't they?
The whole point...

I'm not at all advocating that checksums should not be used, or that
it's worth a hack to not compute them. But some of the arguments I've seen
for keeping them don't seem to have much support in reality to me. Under the
conditions of the original post, a TCP checksum *is* mostly redundant and is
*not* worth a significant reduction in throughput for the infinitesimal cases
it would catch (versus the number of cases it still wouldn't). If anyone was
designing the system from scratch as an integrated whole to just do printing
on a LAN, there wouldn't be 2 checksums on every part of the packet.

B U T.....

The great thing about using TCP in the first place-- in fact, the whole
point-- is that it allows your customers and/or you to extend the system or
use it in ways you might not anticipate today. That means that full compliance
with the protocol is a good idea. That is, if you want to keep your customers.
Further, one of those extensions might be to use it on a non-ethernet medium
that doesn't do checksumming, or to use it on a wide-area network where who
knows what sort of battering might go on. Having an end-to-end checksum is
almost a necessity, then, and that is the only argument I've seen that makes
any sense.
So what if in 1986 one machine somewhere, after hundreds of billions
of packets, had some funny Ethernet DMA problem that the TCP checksum caught!
How much data, by comparison, has not been reliably delivered because of any
number of other hardware failures, not to mention bugs in the TCP and/or user
program software in that time frame? Answer: a *LOT*.
So, two handy rules of thumb:

1) If the world will end if one bit is incorrect and is delivered to
an application once every trillion packets, don't use TCP. Don't use Ethernet.
Don't use any computer I've ever heard of, or any large software or hardware
system I've ever heard of. In fact, don't build it at all; humanity will thank
you.

2) If you're printing bitmaps of pamphlets for the "Nixon in '96" campaign
and one of the pixels is the wrong color, but it prints twice as fast, your
customers would probably rather you left out checksums. Though they'd like
it much more if you got a real processor to run it on, or a decent TCP where
the checksum speed doesn't dominate in the first place.

If you want to *claim* compliance without losing the speed, take the
VJS approach of using an "outboard" checksummer-- the one on the Ethernet
board, with a little fiddling. Subtract off the contribution from the headers
and add in the pseudo header to whatever the Ethernet checksum gets you. You
avoid resumming the data-- handy for very large packets, I suppose. But if
there's a DMA problem, the planet might end [see ROT #1].
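One caveat on that fiddling: it only works if the outboard hardware produces
a ones'-complement sum over the frame - the Ethernet CRC itself cannot be
adjusted this way. Given such a sum, the arithmetic is simple, since in ones'
complement subtracting a word is the same as adding its complement. An
illustrative sketch with made-up header, data, and pseudo-header words:

```python
def fold(s):
    """Fold carries into the low 16 bits (end-around carry)."""
    while s >> 16:
        s = (s & 0xffff) + (s >> 16)
    return s

def adjusted_cksum(outboard_sum, header_words, pseudo_words):
    """Remove the header contribution from an outboard ones'-complement sum
    and fold the pseudo-header in, yielding the TCP checksum over
    pseudo-header + data without re-reading the data."""
    s = outboard_sum
    for w in header_words:
        s = fold(s + ((~w) & 0xffff))   # subtract a header word
    for w in pseudo_words:
        s = fold(s + w)                 # add a pseudo-header word
    return (~s) & 0xffff

def inet_cksum(words):
    """From-scratch RFC 1071 checksum, for comparison."""
    return (~fold(sum(words))) & 0xffff

# Made-up 16-bit words: two "header" words, two data words, two
# "pseudo-header" words.
header, data, pseudo = [0x4500, 0x0030], [0x1234, 0xabcd], [0xc0a8, 0x0001]
outboard = fold(sum(header + data))     # what the board summed: header + data

# The adjusted sum matches a from-scratch checksum over pseudo-header + data.
assert adjusted_cksum(outboard, header, pseudo) == inet_cksum(pseudo + data)
```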

Are we done now?
--
+-DLS (d...@mentor.cc.purdue.edu)

gary s anderson

unread,
Jul 7, 1993, 1:00:14 PM7/7/93
to
In article <C9q94...@mentor.cc.purdue.edu>, d...@mentor.cc.purdue.edu (David L Stevens) writes:
|> I really hate to encourage this discussion anymore, but I guess I'm
|> not old and wise enough to *not* put in some observations.
|>
|> 1) Weird and unlikely hardware failure scenarios are at best a straw
|> man argument. Ok, suppose the board DMA's bogus data into the
|> driver buffer. Sure, TCP checksums will catch it. But suppose
|> you have the (my guess) much more likely hardware failure of
|> failed memory chips in any of the (often multiple) copies of
|> the data up to and including the user buffer itself. It doesn't
|> catch it, and if the environment is one in which not doing
|> checksums doubles performance, it's obviously better to leave
|> the Ethernet checksum as one on the data.
|>

You obviously haven't played with RISC systems and their VME buses :-)

The CPU, memory, and disk paths may have parity but the VME buses and
peripherals have had much looser requirements. The "high performance"
and/or "high power" adapters tend to have interesting problems, especially
if the buses are loaded. Network adapter vendors tend to ignore
(or at least they don't rush to correct the problem) their
integrity problems, because the protocol checksums catch most of their
short-comings. When a remote entity chooses to ignore the computed
checksum, they now open a "hole" which the peer had assumed was
adequately covered.

NOTE - in case you want to talk about most likely problems, single
bit loss has been the most prevalent problem with the few VME buses
I've had the pleasure to deal with.

My previous point about needing to "know" the entire path is still
valid even with your arguments. You must "know" that all elements in
the path (routers, bridges, peer systems, etc.) are not relying on
the checksum to Cover Their A...s, before making a unilateral decision
to ignore the checksum!

|> [deleted a bunch of stuff.....]

I don't disagree with your philosophy, but it will be interesting
to see who's "box" needs to be upgraded when the customer
complains about seeing printing errors :-)

Vernon Schryver

unread,
Jul 7, 1993, 4:32:43 PM7/7/93
to

Just to beat this horse some more ...

The non-zero TCP and UDP checksum error counters I mentioned recently
almost certainly were counting errors that occurred in traffic
that travelled routes that were completely protected by link-layer
checksums, either FDDI or Ethernet. It is almost certain that
none of the errors occurred on traffic going over less protected
56K or T1 or fractional-T1 wide area network links. I say this
based on my knowledge of the network and traffic involved.

This implies those ~100 errors
1. occurred in router memory or buses (including workstations
acting as routers, as well as various Cisco routers).
2. happened in host memory or buses.
3. were committed by various bridges or switching hubs.
4. were link layer errors not detected by the FDDI or Ethernet
checksums.

Isn't it true that cases (1) through (3) are essentially the same?
Does it matter if an error happens when a packet is being forwarded by
a router or bridge or when it happens when a packet is being sent or
received? I think it does not.

If you agree, then does it matter in any of those four cases whether
the packet was going directly from one host to its destination or if
peers were not directly connected?

In other words, what difference does it make if an error occurs on a
direct link or on a route? If it is safe to turn off TCP checksums for
a direct link, then it should be about 1/(N+1) as safe to turn off TCP
checksums when the peers are separated by N routers and N+1 links with
link-layer checksums.

If you accept that, then don't you also accept the statement that for
all values of N, the danger P of turning off the TCP checksum is, for
all practical purposes, the same as the danger P*(N+1)?
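The P*(N+1) figure is just the small-probability linear approximation of
"at least one of the N+1 links corrupts the packet." A quick numerical
illustration (the per-link probability p is entirely made up):

```python
# Illustrative only: p is an assumed per-link chance that a packet is
# corrupted in a way the link-layer checksum misses.
p = 1e-9

for n_routers in (0, 1, 4, 9):
    links = n_routers + 1
    exact = 1 - (1 - p) ** links        # chance at least one link corrupts it
    approx = p * links                  # the P*(N+1) rule of thumb
    # For small p the linear approximation is good to well under 1%.
    assert abs(exact - approx) < 0.01 * approx
```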


In other words, non-zero `netstat` numbers are convincing evidence that
the TCP checksum is a Good Thing.


Vernon Schryver, v...@sgi.com

Randy Turner

unread,
Jul 7, 1993, 5:53:27 PM7/7/93
to
v...@rhyolite.wpd.sgi.com (Vernon Schryver) writes:


>Vernon Schryver, v...@sgi.com

Ok, I'll pick up the whip for the horse for awhile...

I agree in part with your probabilities for router/bridge errors,
but I 'm not sure I understand the conclusion you draw from the math.
It seems that for increasing values of N, the probability of failure
in one of the N intermediates also increases. In other words, the
more pieces you have between the connected hosts, the more likely
it is that one of them might screw up. For a direct link (i.e.
both hosts tapped into the same Ethernet segment) your odds for
failure are somewhat less.

Also, I have been studying some traffic here on our network, and
it appears that the bulk of our traffic is small packets (less than
1K bytes), and with small packets, the optimization techniques for
checksums probably don't pay off much. In other words, for a lot
of this traffic, the payback for checksum optimization, or even
elimination wouldn't be worth the risk. In earlier comments, it
was mentioned that the checksum algorithm shouldn't really be one
of the more burdensome throughput bottlenecks. However, as network
media speeds increase, and MTU sizes also increase, for those
vendors that don't have hardware support for checksumming, the
checksum algorithm may not scale well - meaning that the link layer
stuff is all much faster, but the packets are still being serialized
through the same checksum algorithm.

(I should note that the profiling I've done on our network was only
for TCP packets. I would imagine there are huge amounts of full
MTU-size, fragmented Ethernet packets representing NFS UDP traffic
that would be consuming cycles for checksumming, provided
checksumming was enabled for these packets.)

We have been using a modified version of Jacobson's checksum
algorithm, as presented in RFC 1071. We are now in the process of
combining this algorithm with an existing buffer copy in our stack
to see what gains we can achieve. My guess is, since Jacobson's
algorithm was using extensive loop unrolling, and was thereby
neutralizing the 68020 I-cache advantages, we may see some
improvement.
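The copy/checksum combination described here is the classic fusion trick:
touch each word exactly once, doing both the store and the add while the
word is in a register, so the data makes only one trip through the cache.
A sketch of the idea (illustrative Python with made-up data; a real
implementation would be an unrolled C or assembly loop over wider loads):

```python
def copy_and_cksum(src):
    """Copy a buffer of 16-bit words while accumulating the RFC 1071 sum,
    so the data is only walked once."""
    dst = []
    s = 0
    for w in src:
        dst.append(w)                    # the copy half of the fused loop
        s += w                           # the checksum half
    while s >> 16:
        s = (s & 0xffff) + (s >> 16)     # defer end-around carries to the end
    return dst, (~s) & 0xffff

data = [0x1234, 0xabcd, 0x0f0f]
copied, ck = copy_and_cksum(data)
assert copied == data
```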
