
Curious pings on SCO 5.0.4/6


Stefan Marquardt

Nov 5, 2003, 12:14:04 PM
Hello,

Sometimes we have very slow network connections on many SCO PCs with
5.0.4 and 5.0.6.

PING kasseob4 (10.22.136.54): 56 data bytes
64 bytes from kasseob4 (10.22.136.54): icmp_seq=3 ttl=62 time=40 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=0 ttl=62 time=3080 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=1 ttl=62 time=2080 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=2 ttl=62 time=1090 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=4 ttl=62 time=2240 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=5 ttl=62 time=1240 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=6 ttl=62 time=240 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=7 ttl=62 time=2400 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=8 ttl=62 time=1400 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=9 ttl=62 time=400 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=13 ttl=62 time=40 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=10 ttl=62 time=3080 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=11 ttl=62 time=2080 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=12 ttl=62 time=1090 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=14 ttl=62 time=820 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=21 ttl=62 time=40 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=15 ttl=62 time=6110 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=16 ttl=62 time=5110 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=17 ttl=62 time=4120 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=18 ttl=62 time=3120 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=19 ttl=62 time=2120 ms
64 bytes from kasseob4 (10.22.136.54): icmp_seq=20 ttl=62 time=1120 ms


A PC with the same HW, same OS, and same NIC works fine connected to the
same hub. And sometimes another PC that has worked fine for a few weeks
develops this error.

Solution: Reboot

The ping times look like a sine curve, running in a loop.
The RS 50x release supplement is installed.
NIC: SMC EtherPower II driver (ver 2.0.5)
HW: SMC EtherPower II 9432BFTX 10/100Mbps - PCI Bus# 0, Device# 4, Funct


Any ideas ?

Stefan

Bill Vermillion

Nov 5, 2003, 1:05:05 PM
In article <14biqvckqcobdh1h3...@4ax.com>,

>Solution: Reboot

>Any ideas ?

Some auto-negotiation could be failing. And if something goes into
fdx mode, the hubs are the culprit. Switches are cheap enough that you
can toss all the hubs into the trash.

As to the large times stepping down to a small time: that is typical
of not being able to make the connection; when the path opens up, the
timing of the first packet sent gets high, and each succeeding one is
lower as they are all answered sequentially in real time.

Those times are very typical of an intermittent connection - and
also of something doing HDX on an FDX link and generating collisions -
which FDX doesn't have.

You will be best served by fixing the port speeds on all NICs.


--
Bill Vermillion - bv @ wjv . com

Bela Lubkin

Nov 5, 2003, 2:38:24 PM, to sco...@xenitec.ca
Bill Vermillion wrote:

Actually, those ping times are typical of DNS lookup failures. Or not
failures, but timeouts.

ping does its sending semi-autonomously, in the background, while it
processes received packets in the foreground. It receives a packet,
does a reverse DNS lookup to convert its IP address to a name. If that
reverse lookup takes 5 seconds, no other received packets are processed
during that time, but more packets are still _sent_ once a second by the
background processing. Then the DNS lookup completes. ping reports the
correct time for that first packet. The second packet was sent early in
the DNS wait, but didn't get read until much later (after the DNS lookup
completed), so its round-trip time is misreported. Each subsequent
packet looks like it took 1 second less, because each was _sent_ one
second later than the previous one.
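
One quick way to test that theory on the affected box (a sketch; the
address is the one from the original post, and it assumes nslookup is
installed):

  # Time one reverse lookup by hand.  If this stalls for seconds,
  # the resolver, not the wire, explains the decrementing times.
  time nslookup 10.22.136.54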

Now, the _cause_ of the DNS timeouts might be something like what you're
talking about...

>Bela<

Bill Vermillion

Nov 5, 2003, 5:05:02 PM
In article <20031105193...@sco.com>,

I'd agree if the long numbers settled down to the lower numbers -
e.g. going from 3000 down to 40 - but going from 3080 to 2080 to
1080 to 40, then to 2240, 1240, 240, doesn't fit with any DNS I've
seen. Once you have the DNS info, all the times should settle down
to similar numbers except for delays.

Also the packets are coming back out of sequence.

I see packet order of 3,1,2,4,5,6,7,8,9,13,10,11,12,14,21 ...

That surely does not indicate DNS to me. If I'm overlooking
something obvious in DNS, please do let me know.

>ping does its sending semi-autonomously, in the background,
>while it processes received packets in the foreground. It
>receives a packet, does a reverse DNS lookup to convert its
>IP address to a name. If that reverse lookup takes 5 seconds,
>no other received packets are processed during that time, but
>more packets are still _sent_ once a second by the background
>processing. Then the DNS lookup completes. ping reports the
>correct time for that first packet. The second packet was sent
>early in the DNS wait, but didn't get read until much later
>(after the DNS lookup completed), so its round-trip time is
>misreported. Each subsequent packet looks like it took 1 second
>less, because each was _sent_ one second later than the previous
>one.

And then the packet times would remain relatively even after the
huge numbers decremented. I didn't explain the large-to-small
number behavior as well as you.

>Now, the _cause_ of the DNS timeouts might be something like
>what you're talking about...

Looking back at those sequence numbers and packets not being
returned in order, do you still feel that way?

It's almost as if some packets are taking a different route back.
It definitely is screwy.

Bill

Bela Lubkin

Nov 5, 2003, 7:38:05 PM, to sco...@xenitec.ca
Bill Vermillion wrote:

[regarding:]

> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=3 ttl=62 time=40 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=0 ttl=62 time=3080 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=1 ttl=62 time=2080 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=2 ttl=62 time=1090 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=4 ttl=62 time=2240 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=5 ttl=62 time=1240 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=6 ttl=62 time=240 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=7 ttl=62 time=2400 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=8 ttl=62 time=1400 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=9 ttl=62 time=400 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=13 ttl=62 time=40 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=10 ttl=62 time=3080 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=11 ttl=62 time=2080 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=12 ttl=62 time=1090 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=14 ttl=62 time=820 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=21 ttl=62 time=40 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=15 ttl=62 time=6110 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=16 ttl=62 time=5110 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=17 ttl=62 time=4120 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=18 ttl=62 time=3120 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=19 ttl=62 time=2120 ms
> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=20 ttl=62 time=1120 ms

Bela>> Actually, those ping times are typical of DNS lookup failures. Or not
Bela>> failures, but timeouts.

Bill> I'd agree if the long numbers settled down to the lower numbers -
Bill> e.g. going from 3000 down to 40 - but going from 3080 to 2080 to
Bill> 1080 to 40, then to 2240, 1240, 240, doesn't fit with any DNS I've
Bill> seen. Once you have the DNS info, all the times should settle down
Bill> to similar numbers except for delays.

For whatever reason (I'll speculate in a moment), OSR5 `ping` does a
reverse DNS lookup of _every_ packet it receives. It doesn't try to
cache IP-to-name information. This is probably so that if you had a
long-running ping and one day someone changed that address's name, ping
would suddenly start reporting the new name. This _could_ have been
implemented with a cache, some knowledge of DNS record timeouts, etc.,
but it wasn't.

The data above is _almost_ characteristic of OpenServer ping's handling
of DNS timeouts. But after a closer look I think I agree that something
different is going on.

Look at packets number 0 1 2 3. Because ping is sending them
monotonically at 1 second intervals, they were sent at times like
1000000.23, 1000001.23, 1000002.23, 1000003.23. Now, if they had been
received in sequence, here's what I would believe had happened: the
first packet came back 40ms after it was sent. ping read the packet,
noted that interval, did an RDNS lookup. The lookup took several
seconds. While it was waiting, its background thread continued to send
more pings, and their replies also came back in ~40ms each. But ping
didn't read them until its foreground reader thread came back from the
RDNS lookup, so _as far as it could tell_ they had taken much longer.
The reply to the packet sent at 1000000.23 was received at 1000000.27,
40ms later, and read immediately. The reply to the 1000001.23 packet
was received at 1000001.27, but ping didn't _read_ it until 1000004.30,
so it reported that as a ~3-second turnaround.

But:

Bill> Also the packets are coming back out of sequence.
Bill>
Bill> I see packet order of 3,1,2,4,5,6,7,8,9,13,10,11,12,14,21 ...
Bill>
Bill> That surely does not indicate DNS to me. If I'm overlooking
Bill> something obvious in DNS, please do let me know.

I think you're right, because of the out of order receipt. That means
that there was blockage somewhere along the way. Some router between
the two machines was holding either the outgoing packets or the replies
-- _not_ losing them, just holding them and eventually letting them all
fly at once. During this holding period they got out of order (which is
fairly normal, routers do not guarantee in-order delivery). When ping
finally received them back, it reported them as having taken various
times about 1.0 second apart, because they all arrived at the same time
but were _sent_ 1 second apart.

Bela>> ping does its sending semi-autonomously, in the background,
Bela>> while it processes received packets in the foreground. It
Bela>> receives a packet, does a reverse DNS lookup to convert its
Bela>> IP address to a name. If that reverse lookup takes 5 seconds,
Bela>> no other received packets are processed during that time, but
Bela>> more packets are still _sent_ once a second by the background
Bela>> processing. Then the DNS lookup completes. ping reports the
Bela>> correct time for that first packet. The second packet was sent
Bela>> early in the DNS wait, but didn't get read until much later
Bela>> (after the DNS lookup completed), so its round-trip time is
Bela>> misreported. Each subsequent packet looks like it took 1 second
Bela>> less, because each was _sent_ one second later than the previous
Bela>> one.

Bill> And then the packet times would remain relatively even after the
Bill> huge numbers decremented. I didn't explain the large-to-small
Bill> number behavior as well as you.

Bela>> Now, the _cause_ of the DNS timeouts might be something like
Bela>> what you're talking about...

Bill> Looking back at those sequence numbers and packets not being
Bill> returned in order, do you still feel that way?
Bill>
Bill> It's almost as if some packets are taking a different route back.
Bill> It definitely is screwy.

A router along the way is going into a mode where it collects but does
not forward packets; then waking back up and forwarding several seconds
worth of collected packets. The down time is plausible for e.g. an ISDN
link to go through the stages of: link drops; router notices link is
down; router redials; link comes back up.

>Bela<

Stefan Marquardt

Nov 6, 2003, 10:02:00 AM
On Wed, 05 Nov 2003 18:05:05 GMT, b...@wjv.comREMOVE (Bill Vermillion)
wrote:

>You will be best served by fixing the port speeds on all NICs.

How can I see from remote whether it's half or full duplex?

Device MAC address in use Factory MAC Address
------ ------------------ -------------------
/dev/net1 00:04:e2:0a:84:10 00:04:e2:0a:84:10

Multicast address table
-----------------------
01:00:5e:00:00:01

FRAMES
Unicast Multicast Broadcast Error Octets Queue Length
---------- --------- --------- ------ ----------- ------------
In: 18597 0 4481 0 2725265 0
Out: 18692 0 2 0 3830565 0

DLPI Module Info: 2 SAPs open, 18 SAPs maximum
473 frames received destined for an unbound SAP

MAC Driver Info: Media_type: Ethernet
Min_SDU: 14, Max_SDU: 1514, Address length: 6
Interface speed: 100 Mbits/sec

DLPI Restarts Info: Last queue size: 0
Last send time: 5352483
Restart in progress: 0
Interface Version: MDI 100

ETHERNET SPECIFIC STATISTICS

Collision Table - The number of frames successfully transmitted,
but involved in at least one collision:

Frames Frames
------- -------
1 collision 0 9 collisions 0
2 collisions 0 10 collisions 0
3 collisions 0 11 collisions 0
4 collisions 0 12 collisions 0
5 collisions 0 13 collisions 0
6 collisions 0 14 collisions 0
7 collisions 0 15 collisions 0
8 collisions 0 16 collisions 0

0 collisions = Switch ?

Stefan

Bill Vermillion

Nov 6, 2003, 11:15:01 AM
In article <20031106003...@sco.com>,

That seems a rather bizarre way to do things - but that's just my
way of thinking about it. Most name-based tools look up an IP for
a given name. A change of the name on an IP would typically be
seen only on a local network, would it not? Anything outside is
going to rely on someone else's DNS, and when the address/IP
resolution is made upstream - even if it has to go to the root
servers to get that IP initially - the next level up will cache
that name/IP resolution as long as the TTL is still valid. That's
my impression, but I've never looked at the source code.

>The data above is _almost_ characteristic of OpenServer ping's
>handling of DNS timeouts. But after a closer look I think I
>agree that something different is going on.

Many readers on this list are fairly recent - e.g. since the new
internet came up in the early-mid 1990s. But I've been reading
your posts for 17 to 18 years - going back to the old Dr. Dobbs
forum on CompuServe - and I dropped C'serve in '86 when I brought up
my own usenet node. This is the first time I've ever had a single
question about anything you posted, and I think I must finally be
starting to understand all this mess. I save a good deal of your
posts, and if I went back and searched the floppy archives I have,
I think I'd even find messages from your mother.

I can envision that something somewhere is waiting until it gets a
packet big enough to send a minimum amount of data, or waits a
pre-determined interval before returning it. But why, I have no idea.

But what we don't know is just how far apart the pinged IP is.
The original poster had munged the original IP and gave no clue
as to what/where it was. Long delays could be indicative of a
typical land/satellite link: the sent data goes via land line, and
the return data is intercepted - as I recall at level 3 of the ISO
stack - and diverted to an uplink and then down to the end user.
That would almost guarantee a minimum of about 700ms. And I would
think you really would want to aggregate the data and send it in
bigger chunks.

I've read about problems on what are called 'elephants' [LFNs -
Long Fat Networks - very high speed, long-distance links where they
make the packets HUGE and use large windows, because otherwise the
data is slowed by the handshake/protocols/etc. of small packets and
few outstanding]. Probably has nothing to do with this, but it
reminded me of discussion I'd recently seen.

That had not crossed my mind, as I've not worked with ISDN in quite
a while - though at one ISP we had several customers using it; they
thought it would be cheaper. We proposed a dedicated T1 from Florida
to Ohio, but they looked at the ISDN cost and did not realize there
was a connect charge on each connection; running a remote mail kiosk,
they figured it would be cheaper than the $1500 for the PtP T1. When
they got their first phone bill of over $5000 they realized that we
did know what we were talking about.

So an ISDN line could be it - and IF the line is used for both voice
and IP, the data channel will drop back to 64K when the voice is in
use. However, now that many places are out-sourcing their dialup
systems, getting a bondable ISDN is almost impossible: to bond the
channels you have to come in on the same PRI. At one place the
'modem' bank occupied about 15 feet of rack space. Each rack had at
least 5 of the Lucent/Max units - each with a DS3 and each handling
about 600 lines. So with about 30,000 lines available it would be
only by extreme chance that you could get two links into the same
PRI, and that's why the only way to get a bonded system is direct
from the telco, if that's still possible.

This is an interesting problem, and if any of our theories are
correct, it will probably be solved only by an onsite person who
knows networking intimately.

Dave Gresham

Nov 6, 2003, 11:41:53 AM
In article <hgokqvomnv9dpneo4...@4ax.com>,

Stefan Marquardt <erase-this.st...@hagebau.de> wrote:
>On Wed, 05 Nov 2003 18:05:05 GMT, b...@wjv.comREMOVE (Bill Vermillion)
>wrote:
>
>>You will be best served by fixing the port speeds on all NICs.
>
>How can I see from remote whether it's half or full duplex?
>
>Device MAC address in use Factory MAC Address
>------ ------------------ -------------------
>/dev/net1 00:04:e2:0a:84:10 00:04:e2:0a:84:10
>
<snip>

>MAC Driver Info: Media_type: Ethernet
> Min_SDU: 14, Max_SDU: 1514, Address length: 6
> Interface speed: 100 Mbits/sec
>

SCO OpenServer 5.0.5 by default sets the NIC cards to auto-negotiate.
I quote from /etc/conf/pack.d/e3H/space.h:

"Media type may be overridden. Default is to let the NIC determine
the speed and duplex mode."

For a good description of this, go to Tony's site at:

http://aplawrence.com/SCOFAQ/scotec4.html#duplexspeed
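
To see which media/duplex knobs a given driver exposes before editing
anything, something like this works (a sketch; the e3H path is the one
quoted above -- the directory under /etc/conf/pack.d differs per
driver):

  # list candidate tunables in the driver's space.h
  egrep -in "duplex|media|speed" /etc/conf/pack.d/e3H/space.h
  # after changing space.h (or using netconfig), relink the kernel
  # and reboot for the change to take effect:
  # /etc/conf/cf.d/link_unix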

Dave

Jeff Liebermann

Nov 6, 2003, 11:37:12 AM
On Thu, 06 Nov 2003 16:02:00 +0100, Stefan Marquardt
<erase-this.st...@hagebau.de> wrote:

>How can I see from remote whether it's half or full duplex?

I couldn't find any incantation that displays this.

You may wanna run:
ndstat -l
and see if there are any obvious errors. However, I suspect you won't
find any.

If your target is local, see if you can create errors by using ping
flood.
ping -f target_IP
Hopefully, this will give ndstat something to chew upon.

My guess(tm) is that you have a bad switch, bad cable, miswired cable,
or 100baseT to 10baseT transition where the internal buffer in the
switch is losing packets. If the switch is a managed switch with an
IP address, try pinging the switch and see if the problem persists.
Also see if you can extract SNMP statistics from the switch if it's a
managed switch. Also try pinging other machines on the network. The
idea is to isolate the common network segment that's causing a
problem.
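
A crude way to do that isolation from the SCO box (a sketch; the three
addresses are hypothetical stand-ins for the managed switch, a neighbor
PC, and the problem host):

  for h in 10.22.136.1 10.22.136.53 10.22.136.54
  do
      echo "== $h"
      ping -c 10 -n $h | tail -2    # just the loss and min/avg/max lines
  done

If only the paths that cross one particular switch or cable show the
pattern, that segment is the suspect.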

I've also successfully induced a similar problem with some creative
wiring on the ethernet cable. I had the polarity of one of the data
wires reversed. Everything sorta functioned but I had lots of delays
that did NOT show up on the server diagnostic output. The switch and
card were apparently spending their time doing almost continuous NWAY
negotiations. The only clue was that the lights on the switch port
would sometimes do a weird dance when there was no traffic. It was
difficult to see as it sorta looked like normal traffic. If you have
any home made cables, I suggest you check them.

Since the target machine is on the local LAN, I can safely assume that
all packets are coming and going directly to the target and are NOT
being routed through some circuitous route. Just to be sure, run:
netstat -rn
and see if the routing table looks sane.

There are two different chips used on this card. The old one uses a
DEC Tulip chip. The current version uses an "Epic" 83C170 chip. The
Linux Epic driver code mentions that some chips have a hardware
multicast filter flaw. That should not affect ping which is unicast.
http://www.scyld.com/network/epic100.html

--
Jeff Liebermann 150 Felker St #D Santa Cruz CA 95060
(831)421-6491 pgr (831)336-2558 home
http://www.LearnByDestroying.com AE6KS
je...@comix.santa-cruz.ca.us je...@cruzio.com

Bill Vermillion

Nov 6, 2003, 12:35:00 PM
In article <3faa79d1$0$41289$a186...@newsreader.visi.com>,

>http://aplawrence.com/SCOFAQ/scotec4.html#duplexspeed

And for a very good description of what happens when things are
mis-matched, see http://www.cisco.com/warp/public/473/46.html

Though it was written about a Cisco switch, it documents how
manufacturers adding their own enhancements make the problem of
automatically determining duplex mode and transfer speed impossible
in some circumstances. A chart shows what happens in each of the
instances.

Jeff Liebermann

Nov 6, 2003, 12:14:33 PM
On Thu, 6 Nov 2003 00:38:05 GMT, Bela Lubkin <be...@sco.com> wrote:

>For whatever reason (I'll speculate in a moment), OSR5 `ping` does a
>reverse DNS lookup of _every_ packet it receives.

Duz it also do this if one pings by IP address instead of by name?
One would assume that the IP address doesn't change in mid session and
therefore does not require a reverse DNS lookup.

Bela Lubkin

Nov 6, 2003, 1:16:03 PM, to sco...@xenitec.ca
Bill Vermillion wrote:

> In article <20031106003...@sco.com>,
> Bela Lubkin <be...@sco.com> wrote:

> >For whatever reason (I'll speculate in a moment), OSR5 `ping` does a
> >reverse DNS lookup of _every_ packet it receives. It doesn't try to
> >cache IP-to-name information. This is probably so that if you had a
> >long-running ping and one day someone changed that address's name, ping
> >would suddenly start reporting the new name. This _could_ have been
> >implemented with a cache, some knowledge of DNS record timeouts, etc.,
> >but it wasn't.
>
> That seems a rather bizarre way to do things - but that's just my
> way of thinking about it. Most name-based tools look up an IP for
> a given name. A change of the name on an IP would typically be
> seen only on a local network, would it not? Anything outside is
> going to rely on someone else's DNS, and when the address/IP
> resolution is made upstream - even if it has to go to the root
> servers to get that IP initially - the next level up will cache
> that name/IP resolution as long as the TTL is still valid. That's
> my impression, but I've never looked at the source code.

You could have a "link watcher daemon" running, something like:

# Record link status every 2 minutes, forever

ping -i 120 123.456.789.012 > /usr/spool/linkwatch 2>&1 &

If the owner of "123.456.789.012" changed its name one day, it would
eventually show up in your log. Maybe not right away, because even
though ping did an RDNS lookup for every packet, the server might not
time out the old name for a few hours. But eventually it would see the
name change.

ping could have some awareness of DNS TTL, and only re-lookup an address
whose expiration time had passed. The fact is, it doesn't.

The fastest packets in the original ping output were 40ms -- clearly not
a satellite or anything particularly weird. In fact let me re-quote a
bit of it:

> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=3 ttl=62 time=40 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=0 ttl=62 time=3080 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=1 ttl=62 time=2080 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=2 ttl=62 time=1090 ms

Bear in mind that each sequential packet is _sent_ 1 second after the
previous one. So if two packets were sent at times 100000.23 and
100001.23, and both were received simultaneously at 100002.50, what
would ping report? It would show 2270ms for the first and 1270ms for
the second. If we assume that the above four packets were sent at exact
1-second interval, they were received at almost exactly the same time:

icmp_seq=3 time=40 ms sent=100003.23 received=100003.27
icmp_seq=0 time=3080 ms sent=100000.23 received=100003.31
icmp_seq=1 time=2080 ms sent=100001.23 received=100003.31
icmp_seq=2 time=1090 ms sent=100002.23 received=100003.32

I built that by assigning an arbitrary time to the first packet
(100000.23), then adding 1.00 second to each according to its sequence
number, then adding the reported round-trip time. This reads as if some
part of the link was down for at least 3 seconds, delaying either the
outbound or return trip of packets 0-2. The delay between return
receipt of packets 3 & 0 suggests that other data was also stuck in the
buffers, otherwise all 4 returns would have been received back to back.
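
That reconstruction can be mechanized for longer captures (a sketch in
awk; it assumes the OSR5 output format shown above and an exact
1-second send interval, with t=0 at the moment packet 0 was sent):

  awk '/icmp_seq=/ {
      split($0, a, "icmp_seq="); split(a[2], s, " "); seq = s[1]
      split($0, b, "time=");     split(b[2], t, " "); rtt = t[1]
      printf "seq=%-3d sent=%8.2fs  read=%8.2fs  reported=%5dms\n",
             seq, seq, seq + rtt / 1000, rtt
  }' ping.log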

The fact that packets 0 & 1 were received during the same 10ms timer
tick sets a lower limit on the speed of the slowest link -- if those
packets were traveling back to back, the slowest link must be able to
transmit at least 64 bytes per 10ms, or 6400 bytes/sec -- roughly
51Kbps, on the order of a single 64Kbps ISDN channel. We don't know
much about the upper limit (any number of other, non-ping packets
could have been traveling at the same time).

> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=21 ttl=62 time=40 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=15 ttl=62 time=6110 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=16 ttl=62 time=5110 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=17 ttl=62 time=4120 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=18 ttl=62 time=3120 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=19 ttl=62 time=2120 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=20 ttl=62 time=1120 ms

This bit shows that the link sometimes goes down for as much as 6
seconds at a time.

In each of these two examples, we got back the last sequential packet
first, and its time was very low. Many router-like devices have
behaviors where receiving a packet to a particular known remote
destination will cause link bring-up. It seems likely that the packet
which caused link bring-up would also travel the link first. It is also
common that packets already in holding buffers will not actively cause
link bring-up (that is, the router may attempt bring-up when the packet
is first received, but if it is unsuccessful, it won't try again until
_another_ packet triggers the behavior). This router is pretty good
about keeping packets in received order, but the bring-up triggering
behavior causes the small amount of disordering we see in the output.

Two other segments are different:

> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=4 ttl=62 time=2240 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=5 ttl=62 time=1240 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=6 ttl=62 time=240 ms

> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=7 ttl=62 time=2400 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=8 ttl=62 time=1400 ms
> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=9 ttl=62 time=400 ms

In each of these segments, it looks like some _other_ packet, other
business this system had with stuff over the link, caused link bring-up.
None of the ping packets are out of order, and the newest packet clearly
had been held up for longer than the physical link turnaround time.

This analysis seems like exactly the sort of thing that an expert system
could be good at. Sure enough, searching on ``"expert system" ping
routers troubleshooting'' turns up a bunch of matches. I wonder if any
of them are actually any good?

I suspect the original sample we were shown was during an especially bad
period, that usually pings are clean with only occasional excursions. I
bet if we had 1000 pings' worth of results, we could diagnose it much
more closely (or at least an expert system could, it would have the
patience...)

>Bela<

Bela Lubkin

Nov 6, 2003, 1:36:09 PM, to sco...@xenitec.ca
Jeff Liebermann wrote:

> On Thu, 6 Nov 2003 00:38:05 GMT, Bela Lubkin <be...@sco.com> wrote:
>
> >For whatever reason (I'll speculate in a moment), OSR5 `ping` does a
> >reverse DNS lookup of _every_ packet it receives.
>
> Duz it also do this if one pings by IP address instead of by name?
> One would assume that the IP address doesn't change in mid session and
> therefore does not require a reverse DNS lookup.

`ping` _does_ do RDNS lookups on numeric pings:

$ ping -c 1 192.122.209.42
PING 192.122.209.42 (192.122.209.42): 56 data bytes
64 bytes from deepthought.armory.com (192.122.209.42): icmp_seq=0 ttl=51 time=80 ms

--- 192.122.209.42 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 80/80/80 ms

If you want to suppress RDNS lookups, use "-n".

>Bela<

Jeff Liebermann

Nov 6, 2003, 5:35:11 PM

Good. Then it can be suppressed. May I suggest that Mr Marquardt try
it with:
ping -n 10.22.136.54
which should either identify or eliminate DNS as the probable culprit.


--
# Jeff Liebermann 150 Felker St #D Santa Cruz CA 95060
# 831.336.2558 voice http://www.LearnByDestroying.com
# je...@comix.santa-cruz.ca.us
# 831.421.6491 digital_pager je...@cruzio.com AE6KS

Bill Vermillion

Nov 7, 2003, 6:05:01 AM
In article <20031106181...@sco.com>,

Bela Lubkin <be...@sco.com> wrote:
>Bill Vermillion wrote:
>
>> In article <20031106003...@sco.com>,
>> Bela Lubkin <be...@sco.com> wrote:

>> >For whatever reason (I'll speculate in a moment), OSR5 `ping` does a
>> >reverse DNS lookup of _every_ packet it receives. It doesn't try to

>> >cache IP-to-name information. ...

...

>> That seems a rather bizarre way to do things - but that's just my

>> way of thinking about it. ...

>You could have a "link watcher daemon" running, something like:

> # Record link status every 2 minutes, forever

> ping -i 120 123.456.789.012 > /usr/spool/linkwatch 2>&1 &

>If the owner of "123.456.789.012" changed its name one day, it would
>eventually show up in your log. Maybe not right away, because even
>though ping did an RDNS lookup for every packet, the server might not
>time out the old name for a few hours. But eventually it would see the
>name change.

On the machines that are in my rackspace [even client machines that
I don't have access to] I run 'arpwatch'.

As to pinging every 2 minutes and then logging, one approach would
be to record the data to a file. Then the next time it runs, do a
diff of the new run vs the file, and if it's the same throw it
away. If it changes, save the old file with an extension, save the
new data, and send email to the admin. My BSD servers do a similar
check every day for any changes at all to any SUID/SGID files
regarding size, time, owners, etc. I hate logs that have too much
redundant data.
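
A sketch of that scheme (the paths and the "admin" address are
hypothetical; the sed strips the always-varying RTT and min/avg/max
fields so that only name, sequence, or loss changes count as
"different"):

  new=/usr/spool/linkwatch.new
  old=/usr/spool/linkwatch.last
  ping -c 5 10.22.136.54 2>&1 |
      sed -e 's/time=[0-9][0-9]* ms//' -e '/^round-trip/d' > $new
  if [ -f $old ] && diff $old $new > /dev/null
  then
      rm -f $new                        # nothing changed, discard
  else
      [ -f $old ] && mv $old $old.prev  # keep one previous copy
      mv $new $old
      mailx -s "linkwatch: ping output changed" admin < $old
  fi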

>ping could have some awareness of DNS TTL, and only re-lookup
>an address whose expiration time had passed. The fact is, it
>doesn't.

Is this behavior only in the OSR5 ping, or is it this way in UW7
also? But I tend to use pings differently, as mostly I'm in a
different environment. Trying to isolate a problem where we could not
determine whether it was the telco's side or ours, the telco guy built
a 3 city/state loop to test an ATM circuit that seemed to fail after a
set number of bytes. So we did a flood ping with 1000-byte packets,
and with a 3rd person monitoring via serial we found a buffer overflow
that locked a router, which was then able to sense that and reset
itself. So I look at pings differently, and the RDNS on every packet
doesn't fit my mindset. So thanks for that information.

...

>The fastest packets in the original ping output were 40ms --
>clearly not a satellite or anything particularly weird. In fact
>let me re-quote a bit of it:

Yup. I was just thinking of different things on the long pings and
all I did was cloud the issue. Sorry about that.

>> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=3 ttl=62 time=40 ms
>> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=0 ttl=62 time=3080 ms
>> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=1 ttl=62 time=2080 ms
>> >> >> >64 bytes from kasseob4 (10.22.136.54): icmp_seq=2 ttl=62 time=1090 ms

>Bear in mind that each sequential packet is _sent_ 1 second after the
>previous one. So if two packets were sent at times 100000.23 and
>100001.23, and both were received simultaneously at 100002.50, what
>would ping report? It would show 2270ms for the first and 1270ms for
>the second. If we assume that the above four packets were sent at exact
>1-second interval, they were received at almost exactly the same time:

> icmp_seq=3 time=40 ms sent=100003.23 received=100003.27
> icmp_seq=0 time=3080 ms sent=100000.23 received=100003.31
> icmp_seq=1 time=2080 ms sent=100001.23 received=100003.31
> icmp_seq=2 time=1090 ms sent=100002.23 received=100003.32

>I built that by assigning an arbitrary time to the first packet
>(100000.23), then adding 1.00 second to each according to its sequence
>number, then adding the reported round-trip time. This reads as if some
>part of the link was down for at least 3 seconds, delaying either the
>outbound or return trip of packets 0-2. The delay between return
>receipt of packets 3 & 0 suggests that other data was also stuck in the
>buffers, otherwise all 4 returns would have been received back to back.

I think Jeff's take on a bad switch or cable may be it. An FDX/HDX
mismatch could also be it: if the HDX side detects a collision (and it
would, because the FDX side doesn't use collisions) then it backs off
for a retry, as it is supposed to. I think I mentioned checking things
like that in the first reply.

...

>In each of these two examples, we got back the last sequential
>packet first, and its time was very low. Many router-like
>devices have behaviors where receiving a packet to a particular
>known remote destination will cause link bring-up. It seems
>likely that the packet which caused link bring-up would also
>travel the link first.

I see that.

I hope that when this is resolved the original poster will give us
the details of the problem. So often we see a question, and answers
given, and then never learn what fixed it.

>I suspect the original sample we were shown was during an
>especially bad period, that usually pings are clean with only
>occasional excursions. I bet if we had 1000 pings' worth of
>results, we could diagnose it much more closely (or at least an
>expert system could, it would have the patience...)

Typical of many things I've seen. Not quite enough data to really
know what is going on.

I'll bow out of this thread for now.

Thanks for all the information

Stefan Marquardt

Nov 7, 2003, 8:46:25 AM
On 06 Nov 2003 16:41:53 GMT, gre...@visi.com (Dave Gresham) wrote:

>For a good description of this, go to Tony's site at:
>
>http://aplawrence.com/SCOFAQ/scotec4.html#duplexspeed

I can't tell from "ndstat -l" whether it's running full duplex or
not.

I think this tool isn't correct.

I just made some tests with a 1Gb Broadcom card in another SCO
5.0.7 host.

ndstat shows 1000 Mbit.
If I reconnect the cable I get "100Mb full duplex" on the console,
while the switch shows 100Mb half duplex.

Stefan

Stefan Marquardt

Nov 7, 2003, 8:53:51 AM
On Thu, 06 Nov 2003 08:37:12 -0800, Jeff Liebermann
<je...@comix.santa-cruz.ca.us> wrote:

>My guess(tm) is that you have a bad switch, bad cable, miswired cable,
>or 100baseT to 10baseT transition where the internal buffer in the
>switch is losing packets. If the switch is a managed switch with an
>IP address, try pinging the switch and see if the problem persists.
>Also see if you can extract SNMP statistics from the switch if it's a
>managed switch. Also try pinging other machines on the network. The
>idea is to isolate the common network segment that's causing a
>problem.

Connect nic -> 3com SuperStack II switching hub -> Cisco -> Cisco ->
Host.

Next curious thing (I did all of this remotely):
I set the speed on kasseob4 fixed to 100Mb and half duplex.
While the system was coming up, the customer told me that the PC behind him
(same HW except nic -> 3com ISA) lost its network connection.
At first I couldn't believe that I was the cause of the error!
I started some pings to this PC from the host, and after 16 seconds I
got the first ping back:

PING kasseob3 (10.22.136.53): 56 data bytes
64 bytes from kasseob3 (10.22.136.53): icmp_seq=0 ttl=62 time=18780 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=1 ttl=62 time=17800 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=2 ttl=62 time=16790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=3 ttl=62 time=15780 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=4 ttl=62 time=14780 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=5 ttl=62 time=13790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=6 ttl=62 time=12790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=7 ttl=62 time=11790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=8 ttl=62 time=10790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=9 ttl=62 time=9790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=10 ttl=62 time=8790 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=11 ttl=62 time=7800 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=12 ttl=62 time=6800 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=13 ttl=62 time=5810 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=14 ttl=62 time=4810 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=15 ttl=62 time=3810 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=16 ttl=62 time=2810 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=17 ttl=62 time=1810 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=18 ttl=62 time=810 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=19 ttl=62 time=190 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=20 ttl=62 time=40 ms
64 bytes from kasseob3 (10.22.136.53): icmp_seq=21 ttl=62 time=100 ms

We repeated that. 100% reproducible: the host kasseob4 with the fixed
speed knocks out kasseob3 for a few seconds, and the ping times
decrease by one second per second.
kasseob4 had no connection at the fixed speed; I had to lead the
customer through netconfig to change it back.

This problem happens 1-2 times every month.


The new problem of knocking out the other PC is very funny.

I am sending a technician to the customer to check the hub and cables.

Regards,
Stefan

Stefan Marquardt

Nov 7, 2003, 10:38:57 AM
On Thu, 06 Nov 2003 22:35:11 GMT, Jeff Liebermann
<je...@comix.santa-cruz.ca.us> wrote:

>On Thu, 6 Nov 2003 18:36:09 GMT, Bela Lubkin <be...@sco.com> wrote:
>
>>Jeff Liebermann wrote:
>>
>>> On Thu, 6 Nov 2003 00:38:05 GMT, Bela Lubkin <be...@sco.com> wrote:
>>>
>>> >For whatever reason (I'll speculate in a moment), OSR5 `ping` does a
>>> >reverse DNS lookup of _every_ packet it receives.
>>>
>>> Duz it also do this if one pings by IP address instead of by name?
>>> One would assume that the IP address doesn't change in mid session and
>>> therefore does not require a reverse DNS lookup.
>
>>`ping` _does_ do RDNS lookups on numeric pings:
>>
>> $ ping -c 1 192.122.209.42
>> PING 192.122.209.42 (192.122.209.42): 56 data bytes
>> 64 bytes from deepthought.armory.com (192.122.209.42): icmp_seq=0 ttl=51 time=80 ms
>>
>> --- 192.122.209.42 ping statistics ---
>> 1 packets transmitted, 1 packets received, 0% packet loss
>> round-trip min/avg/max = 80/80/80 ms
>>
>>If you want to suppress RDNS lookups, use "-n".
>>
>>>Bela<
>
>Good. Then it can be suppressed. May I suggest that Mr Marquardt try
>it with:
> ping -n 10.22.136.54
>which should either identify or eliminate DNS as the probable culprit.


I just got the same error at another customer, where the PC is
connected locally:

# > ping -n 10.18.24.43
PING 10.18.24.43 (10.18.24.43): 56 data bytes
64 bytes from 10.18.24.43: icmp_seq=2 ttl=64 time=0 ms
64 bytes from 10.18.24.43: icmp_seq=0 ttl=64 time=2020 ms
64 bytes from 10.18.24.43: icmp_seq=1 ttl=64 time=1010 ms
64 bytes from 10.18.24.43: icmp_seq=3 ttl=64 time=2400 ms
64 bytes from 10.18.24.43: icmp_seq=4 ttl=64 time=1390 ms
64 bytes from 10.18.24.43: icmp_seq=5 ttl=64 time=390 ms
64 bytes from 10.18.24.43: icmp_seq=10 ttl=64 time=0 ms
64 bytes from 10.18.24.43: icmp_seq=6 ttl=64 time=4040 ms
64 bytes from 10.18.24.43: icmp_seq=7 ttl=64 time=3030 ms
64 bytes from 10.18.24.43: icmp_seq=8 ttl=64 time=2020 ms
64 bytes from 10.18.24.43: icmp_seq=9 ttl=64 time=1010 ms
64 bytes from 10.18.24.43: icmp_seq=15 ttl=64 time=0 ms
64 bytes from 10.18.24.43: icmp_seq=11 ttl=64 time=4040 ms
64 bytes from 10.18.24.43: icmp_seq=12 ttl=64 time=3030 ms
64 bytes from 10.18.24.43: icmp_seq=13 ttl=64 time=2020 ms
64 bytes from 10.18.24.43: icmp_seq=14 ttl=64 time=1010 ms
64 bytes from 10.18.24.43: icmp_seq=16 ttl=64 time=2390 ms
64 bytes from 10.18.24.43: icmp_seq=17 ttl=64 time=1380 ms
64 bytes from 10.18.24.43: icmp_seq=18 ttl=64 time=370 ms

--- 10.18.24.43 ping statistics ---
20 packets transmitted, 19 packets received, 5% packet loss
round-trip min/avg/max = 0/1660/4040 ms

I used ping -n: so it is not a DNS problem.

I did an rlogin to it (very slow) and tried a ping on localhost:

kassedo3 >ping -c20 localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=5 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=6 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=7 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=8 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=9 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=10 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=11 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=12 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=13 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=14 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=15 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=16 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=17 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=18 ttl=64 time=0 ms
64 bytes from localhost (127.0.0.1): icmp_seq=19 ttl=64 time=0 ms

After rebooting the PC:

sender > ping kassedo3
PING kassedo3 (10.18.24.43): 56 data bytes
64 bytes from kassedo3 (10.18.24.43): icmp_seq=0 ttl=64 time=10 ms
64 bytes from kassedo3 (10.18.24.43): icmp_seq=1 ttl=64 time=0 ms
64 bytes from kassedo3 (10.18.24.43): icmp_seq=2 ttl=64 time=0 ms
64 bytes from kassedo3 (10.18.24.43): icmp_seq=3 ttl=64 time=0 ms
64 bytes from kassedo3 (10.18.24.43): icmp_seq=4 ttl=64 time=0 ms
64 bytes from kassedo3 (10.18.24.43): icmp_seq=5 ttl=64 time=0 ms

--- kassedo3 ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 0/1/10 ms


Regards,
Stefan

Bill Vermillion

Nov 7, 2003, 2:25:05 PM
In article <qi8nqvgpg5j33g27e...@4ax.com>,

Stefan Marquardt <erase-this.st...@hagebau.de> wrote:
>On Thu, 06 Nov 2003 08:37:12 -0800, Jeff Liebermann
><je...@comix.santa-cruz.ca.us> wrote:

>>My guess(tm) is that you have a bad switch, bad cable, miswired cable,
>>or 100baseT to 10baseT transition where the internal buffer in the
>>switch is losing packets. If the switch is a managed switch with an
>>IP address, try pinging the switch and see if the problem persists.
>>Also see if you can extract SNMP statistics from the switch if it's a
>>managed switch. Also try pinging other machines on the network. The
>>idea is to isolate the common network segment that's causing a
>>problem.
>
>Connect nic -> 3com SuperStack II switching hub -> Cisco -> Cisco ->
>Host.
>
>Next curious thing (i did all remote)
>I set on kasseob4 the speed fix to 100MB and halfduplex.
>During the system comes up the customer told me that the PC behind him
>(same HW except nic -> 3com ISA) lost network connection.

-----------------------------^^^^--- that's really old technology,
and the card could be failing. Can you put in a newer, faster
PCI card?

And as to the Cisco, be sure to check that link I posted a message
or so back from the Cisco site. Follow that doc, look at all the
possible choices, and set things up the way they say on everything,
and you might be working again.

>This problem happens every month 1-2 times.

Now that really sounds like something is in automatic mode and
failing to get the correct duplex setting. Getting the wrong
speed - unless something is doing conversion - just fails.

>The new problem with knocking out the other PC is very funny.

>I send a technical assistance to the customer to check hub and cables.

Have him check all the settings discussed in the Cisco document.

Jean-Pierre Radley

Nov 7, 2003, 2:20:18 PM, to erase-this.st...@hagebau.de
Stefan Marquardt typed (on Fri, Nov 07, 2003 at 02:46:25PM +0100):

Did you install the Broadcom driver that SCO provided on Aug 25th?

--
JP

Bela Lubkin

Nov 7, 2003, 2:35:56 PM, to sco...@xenitec.ca
Bill Vermillion wrote:

> >You could have a "link watcher daemon" running, something like:
>
> > # Record link status every 2 minutes, forever
>
> > ping -i 120 123.456.789.012 > /usr/spool/linkwatch 2>&1 &
>
> >If the owner of "123.456.789.012" changed its name one day, it would
> >eventually show up in your log. Maybe not right away, because even
> >though ping did an RDNS lookup for every packet, the server might not
> >time out the old name for a few hours. But eventually it would see the
> >name change.
>
> On the machines that are in my rackspace [even client machines that
> I don't have access to] I run 'arpwatch'.
>
> As to pinging every 2 minutes and then logging, one approach would
> be to record the data to a file. Then the next time it runs, do a
> diff on the new run vs the file, and if it's the same throw it
> away. If it changes, save the old file with an extension, save the
> new data, and send email to the admin. My BSD servers do a similar
> check everyday for any changes at all to any SUID/SGID files
> regarding size, time, owners etc. I hate logs that have too much
> redundant data.

My point was that `ping` is written such that a daemon like that would
work. I wasn't suggesting that was a really good way to write a
link-watcher daemon; rather, that the original author of the RDNS lookup
code in `ping` thought that someone, somewhere, would probably do that,
and the resulting behavior should be reasonable. To that author,
"reasonable" meant that if the public name of some IP address changed,
the change should become apparent in the output of an already-running
`ping` session.

It could have been done better, at the expense of a lot more code in
`ping`. Other `ping` implementations probably do this stuff
differently.

The simplest thing to do would be to cache the most recent RDNS result.
That has two problems: one, it isn't sensitive to name changes, and two,
it doesn't help if you're receiving multiple replies. You can ping your
local net's broadcast address and you'll get replies from most of the
machines on the net. Caching one RDNS result wouldn't help at all in
that situation. Also, you might be pinging a remote host and getting
two replies due to some sort of routing problem, bug in the remote host,
etc. -- the very sorts of things that `ping` helps you figure out.
Caching the most recent result would help _many_ cases, though.
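
For instance, the multiple-replies case looks like this (the broadcast
address is a guess based on the thread's 10.22.136.x examples; adjust
it to your own netmask):

  # every host that answers the broadcast triggers its own reverse
  # lookup, so caching a single name wouldn't help here
  ping -c 3 10.22.136.255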

It would also help to split the action into three threads: a sender, a
reader, and a reporter. Then read times wouldn't be affected by RDNS
delays. You might see a long gap in the printout, but when it finally
printed out, each ping's turnaround time would be accurate.

I've thought many times about making changes like this, and so far I've
left it alone because it would be very easy to break `ping` -- introduce
worse emergent behaviors than it currently has...

> >ping could have some awareness of DNS TTL, and only re-lookup
> >an address whose expiration time had passed. The fact is, it
> >doesn't.
>
> Is this behavior only in the OSR5 ping, or is it this way in UW7
> also?

I don't know. I suspect so, given the common heritage of the OSR5 and
UW7 TCP/IP stacks.

>Bela<

Mike Brown

Nov 7, 2003, 7:06:28 PM

I have seen similar problems when an intelligent switch/router was
playing with its arp table. It would hold all the packets for a
moment, then forward them all in a clump.

You could try packet sniffing to see what is happening, or maybe
replace the switch with a dumb hub as a test.
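
If a sniffer is available (tcpdump is shown here; it is not part of
stock OSR5, but any sniffer will do), watching the echo traffic
directly would show whether the replies really arrive in a clump:

  # -n: no DNS lookups; watch only ICMP to/from the problem host
  tcpdump -n icmp and host 10.22.136.54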

Mike

--
Michael Brown

The Kingsway Group

Rainer Zocholl

Nov 8, 2003, 6:00:00 PM
(Bill Vermillion) 05.11.03 in /comp/unix/sco/misc:

>In article <14biqvckqcobdh1h3...@4ax.com>,
>Stefan Marquardt <erase-this.st...@hagebau.de> wrote:
>>Hello,
>>
>>Sometimes we have very slow network connections on many SCO PCs
>>with 5.0.4 and 5.0.6.
>>
>>PING kasseob4 (10.22.136.54): 56 data bytes
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=3 ttl=62 time=40 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=0 ttl=62 time=3080 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=1 ttl=62 time=2080 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=2 ttl=62 time=1090 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=4 ttl=62 time=2240 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=5 ttl=62 time=1240 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=6 ttl=62 time=240 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=7 ttl=62 time=2400 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=8 ttl=62 time=1400 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=9 ttl=62 time=400 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=13 ttl=62 time=40 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=10 ttl=62 time=3080 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=11 ttl=62 time=2080 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=12 ttl=62 time=1090 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=14 ttl=62 time=820 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=21 ttl=62 time=40 ms
>>64 bytes from kasseob4 (10.22.136.54): icmp_seq=15 ttl=62 time=6110 ms
>>64 bytes from kasseob4 [...]

>Some auto-negotiation could be failing.

ACK.
Or the LAN is very heavily loaded.
(A loop?)

>And if something goes into fdx mode the hubs are the culprit.
>Swithces are cheap enough to
>toss all the hubs into the trash.

>Those times are very typical of an intermittent connection - and
>also something doing HDX on a FDX and generating collisions - which
>FDX doesn't have.

>You will be best served by fixing the port speeds on all NICs.

Normally that "fixing" causes the auto-negotiation to fail!
It means the switch "thinks" it is connected to a hub, because the
card no longer talks "N-way" (it looks like a dumb hub).
And a connected hub means: "no full duplex".
But the card may still try full duplex... ouch.


Simply said: N-way has no way to negotiate against a fixed data rate.
(That is why 3Com ignores that point in their Windows software...)

Typically you will see the LEDs on the switch indicating collisions.

So:
Never turn auto-negotiation off if you have reasonably modern equipment.


Stefan Marquardt could try increasing the ping packet size to
force the problem, or starting dozens of pings in the background
to put a little more load on the LAN.

Also, might the network have routing problems?
I have seen cases where packets were routed (on the way back,
invisible to a normal traceroute) over a WAN connection...


Sometimes a "netstat -i" can be very interesting.
Are there more than 0 errors listed?
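
Rainer's suggestions as concrete commands (a sketch; the size and
counts are arbitrary, and the packet-size flag varies -- some older
BSD-style pings take the size as a trailing argument instead of -s):

  ping -c 20 -s 1400 10.22.136.54          # bigger packets
  for i in 1 2 3 4 5
  do
      ping -n 10.22.136.54 > /dev/null &   # extra background load
  done
  netstat -i                               # any Ierrs/Oerrs/Collis > 0?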

Rainer Zocholl

Nov 8, 2003, 6:23:00 PM
begin (Stefan Marquardt) 07.11.03 in /comp/unix/sco/misc:

>On Thu, 06 Nov 2003 08:37:12 -0800, Jeff Liebermann
><je...@comix.santa-cruz.ca.us> wrote:

>>My guess(tm) is that you have a bad switch, bad cable, miswired
>>cable, or 100baseT to 10baseT transition where the internal buffer in
>>the switch is losing packets. If the switch is a managed switch with
>>an IP address, try pinging the switch and see if the problem
>>persists. Also see if you can extract SNMP statistics from the switch
>>if it's a managed switch. Also try pinging other machines on the
>>network. The idea is to isolate the common network segment that's
>>causing a problem.

>Connect nic -> 3com SuperStack II switching hub -> Cisco -> Cisco ->
>Host.

Are they all running the "best" software version available?
Do you know/have you checked the Ciscos?
Some of them use chips that have big trouble with short(!) cables.
Try 50m (fifty meters) of extra cable, or the recommended adapter, to
fix that.

I have seen that too; it was caused by a broken "sniffing" hub.
Power-cycling the hub helped.


>The new problem with knocking out the other PC is very funny.

>I send a technical assistance to the customer to check hub and cables.

Once I saw a customer who managed to connect his laptop as the
9th device on an 8-port switch... ;-)

Surprise, surprise:
His notebook, connected to the "free" port paralleled(!) with the
uplink X-port, worked. But the data rate was miserable...

Stefan Marquardt

Nov 18, 2003, 3:57:39 AM

Broadcom Gigabit Ethernet Driver (ver 6.0.1a)

I don't know what date it is from...

Stefan

Jean-Pierre Radley

Nov 18, 2003, 12:20:04 PM, to erase-this.st...@hagebau.de
Stefan Marquardt typed (on Tue, Nov 18, 2003 at 09:57:39AM +0100):

If you installed the bcme driver published by SCO last August, then
'custom' would show you that package. Does it?

--
JP
