I have a somewhat complex request for info involving file transfer
between hundreds of SUN3 and SUN2 workstations.  These workstations
are networked together using Ethernet (802.3); the protocols in use are
TCP/IP and UDP.
I would like to have the ability to transfer large application
programs to all nodes on the network simultaneously.
Why I think it may be possible to do this:
1.) The Ethernet packet can contain what is commonly called a multicast
bit within the destination address.  Thus, I should be able to set this
bit to broadcast, or "spray", my large application program (i.e. 10 MB-30 MB)
to all nodes on the network.  Using the multicast bit, I should also be
able to set up a table of nodes that I wish to distribute the program to.
That is, if the bit is 0 the destination is one specific address; if it is
1 the destination is a group of nodes; and all 1's in the address field
indicate all nodes.
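For reference, that multicast bit is the low-order bit of the first octet of
the Ethernet destination address, and the all-ones address is the broadcast
address.  A small C sketch of how a destination address might be classified;
the helper names here are made up for illustration:

    #include <stdio.h>
    #include <string.h>

    #define ETHER_ADDR_LEN 6    /* an Ethernet (802.3) address is 6 octets */

    /* The group/individual ("multicast") bit is the low-order bit of the
     * first octet of the destination address. */
    static int is_multicast(const unsigned char addr[ETHER_ADDR_LEN])
    {
        return (addr[0] & 0x01) != 0;
    }

    /* The broadcast address is the all-ones address ff:ff:ff:ff:ff:ff,
     * a special case of a multicast address. */
    static int is_broadcast(const unsigned char addr[ETHER_ADDR_LEN])
    {
        static const unsigned char bcast[ETHER_ADDR_LEN] =
            { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
        return memcmp(addr, bcast, ETHER_ADDR_LEN) == 0;
    }

    int main(void)
    {
        unsigned char host[ETHER_ADDR_LEN]  = { 0x08, 0x00, 0x20, 0x01, 0x02, 0x03 };
        unsigned char bcast[ETHER_ADDR_LEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

        printf("host address:      multicast=%d broadcast=%d\n",
               is_multicast(host), is_broadcast(host));
        printf("broadcast address: multicast=%d broadcast=%d\n",
               is_multicast(bcast), is_broadcast(bcast));
        return 0;
    }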
2.) Some user-level programs already do something similar to what
I want. For instance, "wall" will broadcast a message to nodes
on your network. The command "rcp" will copy one or a group of
files to one particular destination at a time.  I want something like "wall"
or "rwall" combined with "rcp": a command that can copy my file or files
simultaneously to all nodes, or to a subgroup of nodes, on the network.
I know NFS allows an application to be mounted on many nodes and accessed
simultaneously, but that is not what I want.  I want to distribute new copies
of an application, once a week, to stand-alone machines as well as file
servers, and each rcp (or, dread the thought, cartridge taping) can consume
half an hour per node.  Thus 20 nodes means 20 x 1/2 hour by rcp from a master
database, or somewhat less if multiple tapes are made or the files are tarred
via rsh from a server with a 1/2" tape drive.
I would like to be able to say something like:
distribute -g <tablefile> <application>
where -g is the option for a group and tablefile is the database that
contains a list of node names or Internet addresses.
distribute -a <application>
where -a means all nodes (no table needed).
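A first cut at such a tool, with no broadcasting at all, could simply read the
table file and run rcp to each node in turn.  A rough C sketch along those
lines; the program and its behavior are hypothetical, not an existing command:

    /* distribute.c - rough sketch of the requested tool, without any
     * broadcasting: copy <application> to every node named in <tablefile>,
     * one rcp at a time.  This program is hypothetical, not an existing
     * command. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        FILE *tab;
        char node[256], cmd[1024];

        if (argc != 4 || strcmp(argv[1], "-g") != 0) {
            fprintf(stderr, "usage: distribute -g <tablefile> <application>\n");
            return 1;
        }
        if ((tab = fopen(argv[2], "r")) == NULL) {
            perror(argv[2]);
            return 1;
        }

        /* One node name or Internet address per line in the table file;
         * the application is copied to the same pathname on each node. */
        while (fscanf(tab, "%255s", node) == 1) {
            sprintf(cmd, "rcp %s %s:%s", argv[3], node, argv[3]);
            printf("%s\n", cmd);
            if (system(cmd) != 0)
                fprintf(stderr, "distribute: copy to %s failed\n", node);
        }
        fclose(tab);
        return 0;
    }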
I am not that familiar with the networking code on SUN's and was
wondering the following:
Can it be done?
Is this beyond the ability of the Ethernet itself?
When "wall" does a broadcast - is it simultaneous to all nodes or
consecutive?
Will I need some sort of daemon process running all the time on
each node, waiting for a signal to allow broadcast file transmissions,
or can /etc/inetd already handle this type of request with little or no
code tweaking?
What kind of error checking do I have to do to verify that
the program was transmitted successfully, without lost or corrupted
packets, at either the source or the destination?
Has anyone created a program that can do this?
If not, can someone get me started as to the process or code that
I might need to access or create to accomplish what I want?
For instance:
you can alter "inetd" or you have to create a new daemon
you must access these libraries and change this/that
you must use these calls
etc., etc.
Thanks in advance for all advice!!!
--
Art Crotty
Computervision Corp., 14 Crosby Drive, Bldg. 5-1, Bedford, Mass. 01730
Ma Bell: (617) 275-1800
UUCP: { decvax,raybed2 }!cvbnet!acrotty
"The fool wanders, the wise man travels."
First, the Ethernet is not a reliable medium. This means that any
individual packet may be dropped. All protocols currently used to
send files include some sort of acknowledgement that the packet really
got there. If an ack is not received, the sender resends the packet.
This is true of FTP, rcp, and NFS, though the actual details of the
protocols are different for NFS and the other two. So a broadcast
distribution protocol would have to keep a list of the sites that are
expected to be receiving, and keep resending each packet until it has
gotten an ack from every receiver. Since the acks would all be sent
at the same time, you would have guaranteed collisions on the
Ethernet. Probably you would want some sort of randomized delay
before sending the ack. This would be a nontrivial design problem,
and probably there would be other implications that I have not
noticed. But an experienced protocol designer could probably solve
the problem.
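A sketch of what the sender's "resend until every receiver has acked" loop
might look like for a single data block, assuming each receiver answers with
a small UDP ack datagram.  Everything here, including the port number, packet
contents, and retry limit, is invented for illustration:

    /* bsend.c - sketch of "resend the block until every receiver has acked".
     * The port number, packet contents, ack format, and retry limit are
     * all invented for illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    #define DATA_PORT 5500
    #define MAX_RECV  64

    int main(int argc, char **argv)
    {
        struct sockaddr_in dst, from, recvers[MAX_RECV];
        int acked[MAX_RECV];
        int s, i, n, tries, pending, on = 1;
        int nrecv = argc - 1;
        socklen_t fromlen;
        char block[1024] = "block 0 of the application image";
        char ack[64];

        if (nrecv < 1 || nrecv > MAX_RECV) {
            fprintf(stderr, "usage: bsend receiver-addr ...\n");
            return 1;
        }
        for (i = 0; i < nrecv; i++) {
            memset(&recvers[i], 0, sizeof recvers[i]);
            recvers[i].sin_family = AF_INET;
            recvers[i].sin_addr.s_addr = inet_addr(argv[i + 1]);
            acked[i] = 0;
        }

        s = socket(AF_INET, SOCK_DGRAM, 0);
        setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof on);

        memset(&dst, 0, sizeof dst);
        dst.sin_family = AF_INET;
        dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);  /* 255.255.255.255 */
        dst.sin_port = htons(DATA_PORT);

        for (tries = 0; tries < 5; tries++) {
            /* (Re)broadcast the block to everyone at once. */
            sendto(s, block, sizeof block, 0,
                   (struct sockaddr *)&dst, sizeof dst);

            /* Collect acks until 2 seconds of silence. */
            for (;;) {
                fd_set rd;
                struct timeval tv;
                FD_ZERO(&rd);
                FD_SET(s, &rd);
                tv.tv_sec = 2;
                tv.tv_usec = 0;
                if (select(s + 1, &rd, NULL, NULL, &tv) <= 0)
                    break;
                fromlen = sizeof from;
                n = recvfrom(s, ack, sizeof ack, 0,
                             (struct sockaddr *)&from, &fromlen);
                if (n <= 0)
                    continue;
                for (i = 0; i < nrecv; i++)   /* mark this receiver as acked */
                    if (from.sin_addr.s_addr == recvers[i].sin_addr.s_addr)
                        acked[i] = 1;
            }

            for (pending = 0, i = 0; i < nrecv; i++)
                if (!acked[i])
                    pending++;
            if (pending == 0) {
                printf("all %d receivers acked\n", nrecv);
                return 0;
            }
            fprintf(stderr, "try %d: %d receivers missing, rebroadcasting\n",
                    tries + 1, pending);
        }
        return 1;
    }

Even in this toy form the collision problem is visible: every receiver will
try to answer the same broadcast at nearly the same instant, which is why
some randomized delay before sending the ack would be needed.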
You imply that you are going to be updating hundreds of Suns. I would
be somewhat wary of the idea of hundreds of Suns on a single Ethernet.
When we asked Sun about this, they recommended no more than 50
diskless Suns on a single Ethernet. Our measurements suggest that
this number is about right. Of course if the machines are not
diskless, more should be possible. But there is a limit. If you have
hundreds of machines, they are probably going to be on more than one
Ethernet, with gateways. Broadcasts do not go through gateways,
unless special provisions are made. This is a good thing. It
protects networks from other networks where a machine has decided to
start spraying the network with high-speed broadcasts (a failure mode
that is not uncommon when you are playing with experimental network
software). There are also problems in making sure that loops don't
occur. If a gateway forwards broadcasts from one interface to the
other, any very interesting topology will end up with broadcasts
looping around the network. These problems can be solved, and indeed
there is an RFC describing multi-network broadcasts, but you should
realize that there are design issues involved with broadcast protocols
that involve more than one Ethernet.
My suspicion is that this is not worth doing. I suggest instead using
a branching tree distribution. I.e. your master sends to 10 machines
and each of them to 10 more, or something like that. Note that the
Ethernet should be able to support a number of simultaneous transfers,
as long as they are not broadcasts. The limit on network bandwidth
for most machines (including Suns) is the machine's own Ethernet
hardware and software.  The fastest real transfers I have seen are
1 Mbit/sec, and even that requires special care; 200 Kbit/sec is more
normal.  Thus the Ethernet should be able to support a reasonable
number of simultaneous copies, as implied by the branching-tree
model.  Collisions would not be the problem here that they would be
with the broadcast scenario, since the various copies would quickly
lose any synchronization that they might have.
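A sketch of the fan-out step of such a branching tree: the master copies the
file to up to ten children in parallel, one rcp per child; a full tree would
then use rsh to have each child repeat the step on its own share of the
remaining hosts.  The program below is illustrative only:

    /* fanout.c - one level of a branching-tree distribution: copy <file>
     * to up to FANOUT hosts in parallel, one rcp per host.  A full tree
     * would then rsh each of those hosts to repeat this step on its own
     * share of the remaining host list.  Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define FANOUT 10

    int main(int argc, char **argv)
    {
        int i, status, failures = 0;
        char remote[512];

        if (argc < 3) {
            fprintf(stderr, "usage: fanout <file> host1 [host2 ...]\n");
            return 1;
        }

        /* Start one rcp per child host; they all run at the same time. */
        for (i = 2; i < argc && i < 2 + FANOUT; i++) {
            if (fork() == 0) {
                sprintf(remote, "%s:%s", argv[i], argv[1]);
                execlp("rcp", "rcp", argv[1], remote, (char *)NULL);
                perror("rcp");
                _exit(127);
            }
        }

        /* Wait for all the copies and count any failures. */
        while (wait(&status) > 0)
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                failures++;

        if (failures)
            fprintf(stderr, "fanout: %d copies failed\n", failures);
        return failures != 0;
    }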
A multicast-based ftp is not impossible, but it certainly doesn't match
the communications model of any of the popular existing protocol families
(tcp/ip, sna, decnet, osi, xns, etc.). TCP/IP didn't even standardize
the value of the broadcast *address* until recently!
Note that most existing broadcast applications on Ethernets assume
unreliable broadcast, and are generally used for sending status information
(or requests for information). In almost all cases, the amount of
information to be transferred is limited to a single packet.
Conclusion: it's a good topic for research, but don't expect anyone to
implement such a beast in the near future. And don't ever expect to
see it layered on TCP/IP.
Except that the Sun driver for the "ie" interface doesn't understand
multicasts. You'd have to change that driver, provide "ioctl" calls to set
the multicast address group, and provide a way, in whatever protocol you
used, to specify that a packet is to go to a multicast group.
> 2.) Some user-level programs already do something similar to what
> I want. For instance, "wall" will broadcast a message to nodes
> on your network.
No, it won't.  The "rwall" command will send messages to other machines;
however, it does not "broadcast" them in the sense in which that term is
used when discussing networks, i.e. it does not use the Ethernet broadcast
facilities.  If it is asked to send messages to a set of machines, it does
so by running through an enumeration of those machines and sending to them
one at a time.
> I know NFS allows mounting of an application to nodes and simultaneous
> access of that application - but that is not what I want. I want to
> distribute to stand-alone machines as well as file servers new copies
> of an application once a week and each rcp or "dread the thought"
> cartridge taping can consume 1/2 hour per node.
You may, in this case, want to have the stand-alone machines get the
application via NFS.
> I would like to be able to say something like:
>
> distribute -g <tablefile> <application>
>
> where -g is the option for group and tablefile is the database that
> contains a list of nodes with names or internet addresses
Even if IP supported multicast groups, this would not be straightforward.
You can't assign a host to a multicast group; that host has to add
*itself* to the multicast group. As such, you'd have to start by telling
the hosts in that list to join a particular multicast group (you'd also have
to either 1) reserve a multicast group for this or 2) find some way of
finding an unused group and choosing it).
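For a sense of what "the host adds *itself* to the group" looks like, here is
roughly how a receiver joins an IP multicast group using the setsockopt()
interface that was standardized later; nothing like it existed on the systems
under discussion, and the group address and port below are arbitrary examples:

    /* join.c - sketch of a receiver adding *itself* to an IP multicast
     * group, using the setsockopt() interface that was standardized later;
     * nothing like it existed on the systems discussed here.  The group
     * address and port are arbitrary examples. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in local;
        struct ip_mreq mreq;
        char buf[1500];

        memset(&local, 0, sizeof local);
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        local.sin_port = htons(5501);                 /* example port */
        if (bind(s, (struct sockaddr *)&local, sizeof local) < 0) {
            perror("bind");
            return 1;
        }

        /* The receiving host, not the sender, asks to join the group. */
        mreq.imr_multiaddr.s_addr = inet_addr("239.1.2.3");  /* example group */
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        if (setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                       &mreq, sizeof mreq) < 0) {
            perror("IP_ADD_MEMBERSHIP");
            return 1;
        }

        /* From here on, datagrams sent to 239.1.2.3 port 5501 arrive here. */
        if (recv(s, buf, sizeof buf, 0) > 0)
            printf("got a multicast datagram\n");
        return 0;
    }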
I think there may be some RFCs discussing the use of multicast addresses in
IP, but I doubt that there are any standard implementations of this for
UNIX. At best, they're probably experimental. At worst, they don't exist.
There are a lot of complicated issues involved in putting multicast support
into IP.
Of course, as stated before, you'd have to whack on the networking code
quite a bit to teach it about multicast addresses, anyway.
> Can it be done?
Maybe, if you're willing to learn a lot about IP, Ethernet, and the 4.2BSD
networking code, and make *lots* of changes to it. I don't guarantee that
it'd be possible even then.
> When "wall" does a broadcast - is it simultaneous to all nodes or
> consecutive?
As I mentioned above, "wall" doesn't do broadcasts at all; since "rwall"
doesn't do them as Ethernet broadcasts, they are consecutive. (No, "rwall"
doesn't fork off N processes, one per machine.)
> What kind of error checking do I have to do for testing that
> the program was successfully transmitted without losing packets
> or corrupting packets - at source or destination?
Lots. TCP doesn't understand broadcasts, much less multicasts, and can't
really be made to. As such, you'd have to provide your own flow control and
error recovery.
As Charles Hedrick pointed out, this won't work well (if at all) if you want
to update hosts that aren't on the same Ethernet, either.
I think the best advice is "try something else".
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
g...@sun.com (or g...@sun.arpa)
If my suggestion leads you to a workable protocol, Guy is still right about
the type of skills you will need to use it. Expect to learn more than you
want to know about the messy details of the system, hardware, and LAN.
--
Steve Langdon ...!{decwrl,sun,hplabs,ihnp4,cbosgd}!amdahl!sjl +1 408 746 6970
[I speak for myself not others.]
The short answer to your question is you *can* broadcast your updated
programs to your other nodes, but you shouldn't. The reason for this
will take some explaining.
The protocols you said you had on your Ethernet, TCP/IP/UDP, are
actually two separate protocols that can coexist harmoniously: TCP/IP,
which will guarantee packet delivery to one node only, and UDP, which
guarantees nothing.  One of UDP's features is that, since it transmits
packets without waiting for any kind of acknowledgement, it is able to
send to a special broadcast address and have 'billions and billions'
of machines (which are also set up to receive on this same broadcast
address) receive them without the overwhelming overhead that would
otherwise be required in such a case.  Many people erroneously equate "UDP"
with "broadcast", when in fact broadcast is merely a special case of UDP.
As you can probably guess, if you choose to broadcast your
updates to all your Sun workstations, you run the risk of randomly
dropping packets or losing bits of information in other ways.  This
risk is even greater if the other Suns are transmitting information to
each other (using TCP/IP, no doubt) in the background at the same
time.  An example: in my studies of UDP reliability, it was common for
a Sun3 to send 100 UDP packets and have a Sun2 receive only 65 of
them.  (This result is amplified by the fact that the Sun3 sends them
faster than the Sun2 can physically receive them.  Sun2 to Sun2
generally yields better than 98% of the message when lots of other
Ethernet activity is taking place.)
My recommendation is to use NFS, as it was designed for
precisely your situation.  (The original posting didn't state why the
option was ruled out.)  If that option isn't acceptable, the next best
option is to write a shell script that sequentially rcp's the file to every
node individually.  (rcp uses the TCP/IP protocol; it's no dummy!)
Sorry about that---and good luck.
--
Gary Friedman
Jet Propulsion Laboratory
UUCP: {sdcrdcf,ihnp4,bellcore}!psivax!mc0!garyf
ARPA: ...mc0!ga...@cit-vax.ARPA
> As you can probably guess, if you choose to broadcast your
> updates to all your Sun workstations, you run the risk of randomly
> dropping packets or losing bits of information in other ways.
Well, you *could* have the receiving hosts send back acknowledgments when
they received the broadcast packets. This would be an excellent way to melt
down an Ethernet, though, given the number of hosts that would receive the
broadcast packet. The sending host would also have to know *all* the hosts
the broadcast would go to, in order to know whether it got all the
acknowledgments it should.  It would also have to know what to do if it
didn't get acknowledgments from all the hosts: should it retransmit only to
the hosts that didn't get it (if 75% of them didn't get it, this could flood
the Ethernet), or to all of them?  In addition, the code on the receiving
end would have to be able to deal with packets received out of order, or with
duplicate packets (especially if the response to negative or missing
acknowledgments is a broadcast retransmission).
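One common way to handle the duplicate and out-of-order cases is to number
every data block and have each receiver keep a bitmap of the block numbers it
has already accepted, so a re-broadcast block is simply dropped.  A fragment
of what that receiver-side bookkeeping might look like; the packet header
layout and all names are invented:

    /* Receiver-side bookkeeping for a numbered-block broadcast transfer.
     * The header layout and all names are invented. */
    #include <stdio.h>

    #define MAX_BLOCKS 32768            /* e.g. a 30 MB file in 1 KB blocks */

    struct blk_hdr {
        unsigned long file_id;          /* which transfer this block belongs to */
        unsigned long blkno;            /* block number within the file */
        unsigned long nbytes;           /* bytes of data after the header */
    };

    static unsigned char seen[MAX_BLOCKS / 8];      /* one bit per block */

    /* Return 1 if the block is new (write it at offset blkno * blocksize),
     * 0 if it is a duplicate or out of range and should be dropped. */
    static int accept_block(const struct blk_hdr *h)
    {
        unsigned long i = h->blkno;
        if (i >= MAX_BLOCKS)
            return 0;
        if (seen[i / 8] & (1 << (i % 8)))
            return 0;                   /* already have this block */
        seen[i / 8] |= (1 << (i % 8));
        return 1;
    }

    int main(void)
    {
        struct blk_hdr h = { 1, 7, 1024 };
        printf("first copy of block 7:  %s\n", accept_block(&h) ? "accept" : "drop");
        printf("second copy of block 7: %s\n", accept_block(&h) ? "accept" : "drop");
        return 0;
    }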
> My recommendation is to use NFS, as it was designed for
> precisely your situation.  (The original posting didn't state why the
> option was ruled out.)  If that option isn't acceptable, the next best
> option is to write a shell script that sequentially rcp's the file to every
> node individually.  (rcp uses the TCP/IP protocol; it's no dummy!)
And NFS uses UDP/IP. NFS operations return a success/failure indication
and, if a failure, an error code; this return message acts as the
acknowledgment.
In article <3...@mc0.UUCP> ga...@mc0.UUCP (Gary Friedman) explained that
TCP/IP guarantees safe delivery for point-to-point links and that UDP
allows for broadcasting, but at the price of reduced reliability. Gary
then suggests that Art probably wants to use NFS, since it's designed to
allow efficient sharing of a file by many hosts.
Clearly, UDP is not suitable for times when you really care if the
data gets delivered or not (rwho uses UDP, doesn't it?). As I understand
it, NFS uses UDP as the underlying transport protocol, but to improve
performance Sun has turned off checksumming in NFS/UDP packets.
Presumably NFS does its own error checking at a higher level, so they can
get away with ignoring checksums at the lower levels.
Has anybody done any studies to determine if this causes any
problems? I've heard random comments by people on the net that they don't
like what Sun did, but has anybody taken a serious look at the situation
and found cases where corrupted UDP packets have caused user-visible NFS
errors? On the other hand, has anybody made any measurements to see just
how much NFS would be slowed down if UDP checksumming were turned back on?
--
Roy Smith, {allegra,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
Nope. As far as I can determine (by looking at the rpc stuff that's
going on, and by being familiar with how NFS hands things off to UDP/IP
to have them sent), there is no checksumming going on. In fact, the
standard Sun kernel has UDP checksums turned off both in udp_output()
and in the NFS/RPC kernel UDP fastsend routine. Without source, there
isn't any way (other than doing strange things with patching binaries)
to turn checksumming on in the kudp_fastsend() routine; the routine
just doesn't do it...
> Has anybody done any studies to determine if this causes any
>problems? I've heard random comments by people on the net that they don't
>like what Sun did...
Random comments from Chris or from me, probably...
>...but has anybody taken a serious look at the situation
>and found cases where corrupted UDP packets have caused user-visible NFS
>errors?
We've not looked at it seriously, but I'm sure that it's possible.
It's not too likely, it seems; some people from Sun have said, "yeah,
we know we're not 100% safe, but we've never heard of any problems."
The IP header checksum still happens; I would hope (and this perhaps
is what Sun is thinking) that if a packet were corrupted, the IP header
would be trashed along with the rest of the packet, so the header checksum
would catch it.  Then again, the header is only 20 bytes, and the result
of a standard NFS read operation is roughly 4K long...
Does anyone know how your average Joe Ethernet board hiccups, if and
when it does?  Is this a valid assumption?
>On the other hand, has anybody made any measurements to see just
>how much NFS would be slowed down if UDP checksumming were turned back on?
Again, no hard data, but the Sun on my desk runs a kernel that goes
through the UDP output code for all my kernel RPC, and I have checksums
turned on. It seems a little slower, but not much.
-Steve
--
Spoken: Steve Miller ARPA: st...@mimsy.umd.edu Phone: +1-301-454-4251
CSNet: steve@umcp-cs UUCP: {seismo,allegra}!umcp-cs!steve
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742
I haven't seen any problems with NFS because of this; however, the
lack of checksums does cause problems with rwho.
One of our networks is a Proteon 10 Mbit Pronet.
[ Note-- I refuse to blame Proteon for the extent of our problems. I am
positive that the low quality cables we use are causing our problems. ]
Every once in a while, the network flakes out and lots of bad packets
get generated.  The Proteon boards are supposed to have a built-in
checksum that catches single- and double-bit errors, but these packets
have lots of errors.  When UDP gets the packets, it just passes them
up to rwho.
Well, rwho gets the packets and checks to make sure the hostname is
all printable ASCII.  This is reasonable, but often the errors
cause the letters to change case or to become different printable
ASCII characters.
This leads to some pretty strange host lists when one runs 'ruptime'.
I have also seen this kind of mangled rwho packet on one of our
Ethernets, but, once again, I haven't seen the lack of checksums
affecting NFS.
--
Dave Cohrs
(608) 262-1204
..!{harvard,ihnp4,seismo,topaz}!uwvax!dave
da...@rsch.wisc.edu
UDP checksumming is switched off - but not just for performance reasons. I
understand that some implementations can't calculate the checksums properly
anyway. There has been some discussion of this in mod.protocols.tcp-ip.
There is no "higher-level" checking that I'm aware of. Presumably, you
can get away with the IP checksums and the ethernet frame checksums in the
IP fragments when constructing a UDP packet that comes in off the net.
If there were corruption problems, these should show up as errors in NFS
"headers" - RPC structures, file handles, and so on. The kernel would
scream about these if and when they happened.
> Has anybody done any studies to determine if this causes any
>problems? I've heard random comments by people on the net that they don't
>like what Sun did, but has anybody taken a serious look at the situation
>and found cases where corrupted UDP packets have caused user-visible NFS
>errors? On the other hand, has anybody made any measurements to see just
>how much NFS would be slowed down if UDP checksumming were turned back on?
We've had no problems - mind you, we've not been actively looking for them.
Our users would be complaining if they found their files corrupted. So far
there have been no complaints.  Turning UDP checksumming on (or off) may not
be all that informative if a buggy checksum implementation introduces
erroneous "bad" packets.
I will try to establish some performance figures in the next day or so on a
couple of SUN-3's. Naturally, I'll post the results....
Jim
ARPA: jim%cs.stra...@ucl-cs.arpa, j...@cs.strath.ac.uk
UUCP: j...@strath-cs.uucp, ...!seismo!mcvax!ukc!strath-cs!jim
JANET: j...@uk.ac.strath.cs
"JANET domain ordering is swapped around so's there'd be some use for rev(1)!"
"Checksum is the 16-bit one's complement of the one's complement of the pseudo
header of informantionfromthe IP header, the UDP header, and the data. . .".
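That computation, done over a flat buffer rather than a chain of mbufs, looks
roughly like this (the textbook algorithm, not Sun's actual in_cksum):

    /* The Internet checksum over a flat buffer: the 16-bit one's complement
     * of the one's complement sum.  The kernel's in_cksum walks a chain of
     * mbufs; this is the textbook flat-buffer version. */
    #include <stdio.h>

    unsigned short in_cksum_flat(const unsigned char *buf, int len)
    {
        unsigned long sum = 0;

        /* Sum 16-bit words; an odd trailing byte is padded with zero. */
        while (len > 1) {
            sum += (unsigned long)((buf[0] << 8) | buf[1]);
            buf += 2;
            len -= 2;
        }
        if (len == 1)
            sum += (unsigned long)(buf[0] << 8);

        /* Fold the carries back in (end-around carry). */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);

        return (unsigned short)~sum;    /* one's complement of the sum */
    }

    int main(void)
    {
        unsigned char data[] = { 0x45, 0x00, 0x00, 0x1c, 0x00, 0x00 };
        printf("checksum = 0x%04x\n", in_cksum_flat(data, sizeof data));
        return 0;
    }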
I would like to point out that the 32-bit CRC generated with every Ethernet
packet and checked by the receiver is a far more reliable detector of
transmission errors (by orders of magnitude?) than that artifact of the
first generation of computers, the checksum.  If your Ethernet driver passes
corrupted packets up to the higher protocol levels, it is because it is
ignoring the fact that the Ethernet controller chip has run out of memory, or
some similar problem, and not because an error has crept past the CRC
checking logic.  In my experience, we have never encountered an Internet
checksum error with our very vanilla-flavored implementation of 4.2BSD
networking (at least, not since I fixed the 82586 Ethernet controller chip
NO RESOURCES bug in my Ethernet controller driver ;-).
BTW, a ones-complement checksum across a threaded list of mbufs takes a lot
longer than you might intuit. The cleverest assembly language programming
really pays off. I added a switch in my in_cksum routine which causes it to
immediately return a zero. Makes a network of Celerity C1200s and C1260s
run about 8% faster with a heavy TCP networking workload. Problem is that
the "foreign" machines on our net don't understand this and refuse to talk
to me.
R. L. (Ron) McDaniels
CELERITY COMPUTING . 9692 Via Excelencia Way . San Diego, California . 92126
(619) 271-9940 . {decvax || ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ron
"Yes, my Precious. . . we hates them socket(2)eses!"
|NFS uses UDP as the underlying transport protocol but to improve
|performance, Sun has turned off checksumming in NFS/UDP packets.
|... Has anybody done any studies to determine if this causes any problems?
No studies, but I have an anecdote. Last week we had an Intel Ethernet
controller board go slightly bad on a Sun-2/120 running 3.0. TCP/IP worked
fine, but NFS had rare bit errors without issuing any diagnostic messages.
The bit errors were in executable files, causing core dumps that were oddly
reproducible due to caching. I wasted some time tracking this down. Had NFS
checksummed, the problem would had been evident.
-- Paul Eggert, SDC Santa Monica
Or perhaps the CRC checking logic has failed, or the CRC was correct
but the transfer from Ethernet memory to host memory failed, or
any number of other possible glitches. However, it is true that
one must trust the hardware to some extent. (The exact extent is
often a matter of debate.) For the sake of argument I will assume
that Ethernet reliability is high enough that a software check is
not worthwhile.
But what proponents of no software checksums seem not to have
considered is this: Not all networks are Ethernets. There are
other systems out there. Many of these systems have considerably
higher error rates than Ethernets. By disabling software checksums
you preclude the use of these less-reliable but nonetheless useful
alternative networks.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: ch...@mimsy.umd.edu
For people who think that the swell checksumming your favorite
Ethernet board does for you is enough, think again: I once had an Interlan
controller whose on-board packet buffer memory went flakey.  So the bits
coming off the wire were appropriately checksummed and passed, then
trashed on the board and passed up to some unwitting Chaosnet software
(which, unlike the IP model, doesn't do checksumming) and on up to an
FTP program which happily wrote the wrong bits to a file.
Moral of the story: Do end-to-end checksumming.
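Concretely, that means the sending application computes a checksum over the
file before it goes anywhere, and the receiving application recomputes it
from what actually landed on its disk.  A trivial sketch of the idea, using
a weak additive checksum for brevity (a CRC would be stronger):

    /* End-to-end check: checksum the file contents at the application level
     * before sending and after receiving, and compare.  A weak additive
     * checksum is used here for brevity; a CRC would be stronger. */
    #include <stdio.h>
    #include <stdlib.h>

    unsigned long file_cksum(const char *path)
    {
        FILE *f = fopen(path, "r");
        unsigned long sum = 0;
        int c;

        if (f == NULL) {
            perror(path);
            exit(2);
        }
        while ((c = getc(f)) != EOF)
            sum += (unsigned long)c;    /* simple additive checksum */
        fclose(f);
        return sum;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: e2e <original> <received-copy>\n");
            return 2;
        }
        if (file_cksum(argv[1]) != file_cksum(argv[2])) {
            fprintf(stderr, "e2e: checksum mismatch, copy is corrupt\n");
            return 1;
        }
        printf("e2e: checksums match\n");
        return 0;
    }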
-- Nat Mishkin
Apollo Computer Inc.
{mit-eddie, wanginst, yale}!apollo!mishkin
But it does absolutely nothing to detect errors in the Ethernet controller,
the low-level software, and the hardware and software of any gateways
through which the packets pass. As the Xerox people have been saying for
years, if you want to be sure the data is getting there intact, you put
a checksum (or CRC, or whatever) on it as it leaves the sending application,
and check that checksum when it reaches the receiving application.
Particularly when the "application" is something like NFS, which could make
an incredible mess if packets got garbled, there is something to be said
for such "end-to-end" error checking.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
I seem to recall a paper in _Computer Networks_ about a year and a half ago
that made a rather convincing case for end-to-end error checking. It's
really obvious when you think about it - error checking in lower level
protocols can really do nothing for your confidence level in upper level
protocols, because the criteria they use to evaluate an acceptable rate
of errors may be entirely different from your own.
In fact, the authors went on to suggest that it always be possible to turn
lower level error checking off, as a performance enhancement, since the
upper level protocol *should* do it anyway.
Of course, this is an environment where communications engineers calculate
acceptable error rates. Do we do that in computers, hmmm ;-) ?
Andy "Krazy" Glew. Gould CSD-Urbana. USEnet: ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801 ARPAnet: aglew@gswd-vms