
Network problem with DN3500


Andreas Neubacher

Mar 26, 1993, 5:48:38 PM
Hi everybody!

I'm sorry to bother the net with this, but I read the FAQ, read the manuals,
tried all experiments I could think of --- without getting anywhere.
Actually, the only result of my experimenting is a fairly good (?) problem
description, which I hope will be sufficient for somebody on the net to give
me some help ... So, here comes:


1. PROBLEM DESCRIPTION
----------------------

One of our DN3500 (called "fichte") displays the following strange behaviour
since yesterday afternoon:
- If I am logged in on "fichte", I cannot copy files *to* "fichte"'s
disk (or more precisely: the copy will abort after having transferred a
multiple of 8 kB).
- If I am logged in on "fichte", I *can* copy files *from* "fichte" to
another one.
- If I am logged in on some other machine, I can copy files *to* and *from*
"fichte".

Now some more details: We have an Apollo Internetwork with an Ethernet as
its hardware base. This Ethernet also carries TCP/IP traffic for various
other machines (DECstations and the like).
The behaviour described above only happens when the file being
copied is on a machine which is *not* on the same segment as "fichte".
(There are several networks connected by a router and several segments per
network connected by multiport repeaters.) Nevertheless, the Ethernet seems
to be ok, as all the other machines do not experience such problems, and
even a 'netstat -s' on "fichte" does not show any errors.
The strange things here are that
- no other machines are affected (only "fichte"),
- the problem appears only with machines not on the same Ethernet segment,
- it only appears for incoming data.


2. EXAMPLES
-----------

So, if I am on "fichte" I get behaviour as follows:

# ls -l //pinie/usr/lib/sendmail
-rwsr-xr-x 1 root 163135 Feb 18 15:27 //pinie/usr/lib/sendmail
# cp //pinie/usr/lib/sendmail /tmp/sendmail
cp: /tmp/sendmail: No such file or directory
# ls -l /tmp/sendmail
-rwsr-xr-x 1 root 8192 Mar 26 22:50 /tmp/sendmail

(Note that "pinie" is a node which is not on the same segment as "fichte".)
From "pinie", I can do the following:

# cp /usr/lib/sendmail //fichte/tmp
# ls -l //fichte/tmp/sendmail
-rwxrwxrwx+ 1 root 163135 Mar 26 23:35 //fichte/tmp/sendmail

If I try to execute a file which does not reside on "fichte"'s disk, I get

# vmstat
remote node failed to respond to request (OS/network)

Here 'vmstat' is a file softlinked to an executable on "pinie".


3. AN OBVIOUS CLUE?
-------------------

Now, I found an obvious pointer to the source of the problem when I tried
'nodestat -l' on "fichte":

# nodestat -l


The net_ID.node_ID of this node is 100.19355.

**** Node 19355 **** //fichte
Time 1993/03/26.23:39:59 Up since 1993/03/26.12:34:20

Net I/O: total= 64232 rcvs = 33377 xmits = 30855

2080 page-in requests issued.
160 page-out requests issued.
363 page-in requests serviced.
107 page-out requests serviced.
Detected concurrency violations -- read: 0 write: 0

Hdwr xmits      45854    Hdwr rcvs       56763
CRC errors         30    Misalignments    1010
No resource         0    Overruns            0
Adapter err         0    Full socket         1
Bad interrupt       0    Xmit errors         0
Ctlr_# = 0 Unit_# = 0
Winchester I/O: total= 120926 reads= 24140 writes= 96786

Not ready           0    Contrlr busy        0
Seek error          0    Equip check         0
Drive time out      0    Overrun             0
CRC error percentage: 0.00%

System configured with 8.0 mb of memory.
A total of 0 parity errors were detected.

Now, the "Misalignments 1010" certainly looks serious. The problem is that I
have no idea what "misalignments" are in this context and how I could cure
them ...
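
To put the counters above in proportion, the error counts can be compared
with the number of frames received. The sketch below is only arithmetic on
the figures printed by 'nodestat -l'. It assumes that the "CRC errors" and
"Misalignments" counters refer to received frames and can therefore be
divided by "Hdwr rcvs", which the nodestat output itself does not state. The
function name is made up for illustration.

```python
def error_rate_percent(errors, frames_received):
    """Return errors as a percentage of frames received."""
    if frames_received <= 0:
        raise ValueError("frames_received must be positive")
    return 100.0 * errors / frames_received

# Counters copied from the 'nodestat -l' output on "fichte".
HDWR_RCVS = 56763
CRC_ERRORS = 30
MISALIGNMENTS = 1010

# Assumption: both error counters apply to received frames.
crc_pct = error_rate_percent(CRC_ERRORS, HDWR_RCVS)
misaligned_pct = error_rate_percent(MISALIGNMENTS, HDWR_RCVS)

print(f"CRC errors:    {crc_pct:.2f}% of received frames")
print(f"Misalignments: {misaligned_pct:.2f}% of received frames")
```

Under that assumption, roughly 1.8% of the received frames were counted as
misaligned. In general Ethernet usage an alignment error is a frame that does
not end on a byte boundary and fails its checksum, which usually points at
the physical layer. Whether nodestat uses the term in exactly this sense is
not confirmed here.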


I'd appreciate any help. If response is by e-mail, I will summarize for the
net.

Thanks,
Andreas.
--
-------------------------------------------------------------------------------
Andreas NEUBACHER, Research Institute for Symbolic Computation, Johannes Kepler
University, 4040 Linz, Austria. aneu...@risc.uni-linz.ac.at !Packed signature!

Jinfu Chen

Mar 30, 1993, 12:41:32 PM
In article <1993Mar29.1...@alijku05.edvz.uni-linz.ac.at> aneu...@risc.uni-linz.ac.at (Andreas Neubacher) writes:
> Well, to make a long story (i.e. an afternoon of checking the
>ethernet) short, we found a bad ethernet cable which had been installed
>about a week ago! Note that the only machines (obviously) affected were two
>Apollos and an X-terminal, and none of these machines went off the net
>completely.
>
>What did I learn?
>
>1) Apollo DDS breaks much more easily than TCP/IP.

How did you come to that conclusion? A bad Ethernet connection will break
any protocol. I don't know if DDS has an error recovery feature like TCP's.
If it doesn't, that explains why DDS breaks "easier" than TCP. TCP/IP may
not appear to be affected as much as other protocols, but your overall network
traffic will slow down. We recently had a similar incident in which a bad
10base-T transceiver caused 10% bad packets. TCP-based applications seemed to
run fine, except extremely slowly. However, UDP-based applications such as NFS
were going to hell. It took us several days to find the cause, with the help
of a network analyzer.
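
The difference between TCP and UDP behaviour described above can be
illustrated with a little arithmetic. The sketch assumes that an NFS request
over UDP moves 8192 bytes in one datagram, that the datagram is split into IP
fragments on a 1500-byte Ethernet MTU, that each fragment is lost
independently with probability 10%, and that losing any fragment loses the
whole datagram. Only the 10% figure comes from the incident; the transfer
size, header sizes, and function names are illustrative assumptions.

```python
import math

def fragment_count(payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed for one UDP datagram (assumed sizes)."""
    # Fragment data must be a multiple of 8 bytes (except the last fragment).
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((payload_bytes + udp_header) / per_fragment)

def datagram_survival(payload_bytes, frame_loss):
    """Probability that every fragment of the datagram arrives."""
    return (1.0 - frame_loss) ** fragment_count(payload_bytes)

LOSS = 0.10  # the 10% bad packets mentioned above
fragments = fragment_count(8192)
p_ok = datagram_survival(8192, LOSS)
print(f"{fragments} fragments, {p_ok:.0%} chance the datagram arrives intact")
```

With these assumptions an 8 kB datagram needs 6 fragments and arrives intact
only about half the time, so every second request has to be repeated in full.
TCP only has to resend the segment that was lost, which would fit with TCP
applications being slow but usable while NFS struggled.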

>5) If we had had a good Ethernet checker (i.e. some nice hardware tool)

Agree.

6) Apollo Token Ring is more reliable than Ethernet. At least it either works
or doesn't work.
