Setup: a local ethernet network. Two machines. One PDP-11 and one VAX.
Running ping, once at a time, from the PDP-11 and the VAX.
(The PDP-11 running my own IP-stack, just in case you wonder.)
Now, I would expect ping times to be somewhat similar from both sides,
but they were not.
Running on the PDP-11:
.ping krille
PING Krille.Update.UU.SE (130.238.19.20): 16 data bytes
16 bytes from 130.238.19.20: icmp_seq=1 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=2 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=3 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=4 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=5 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=6 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=7 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=8 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=9 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=10 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=11 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=12 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=13 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=14 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=15 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=16 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=17 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=18 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=19 ttl=255 time=0 ms
16 bytes from 130.238.19.20: icmp_seq=20 ttl=255 time=0 ms
----Krille.Update.UU.SE PING Statistics----
20 packets transmitted, 20 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
.
Running from the VAX:
Krille:local/bqt> ping mim
PING mim.Update.UU.SE (130.238.19.211): 56 data bytes
64 bytes from 130.238.19.211: icmp_seq=0 ttl=64 time=121.044 ms
64 bytes from 130.238.19.211: icmp_seq=1 ttl=64 time=205.748 ms
64 bytes from 130.238.19.211: icmp_seq=2 ttl=64 time=204.506 ms
64 bytes from 130.238.19.211: icmp_seq=3 ttl=64 time=234.709 ms
64 bytes from 130.238.19.211: icmp_seq=4 ttl=64 time=30.544 ms
64 bytes from 130.238.19.211: icmp_seq=5 ttl=64 time=160.720 ms
64 bytes from 130.238.19.211: icmp_seq=6 ttl=64 time=69.346 ms
64 bytes from 130.238.19.211: icmp_seq=7 ttl=64 time=8.665 ms
64 bytes from 130.238.19.211: icmp_seq=8 ttl=64 time=8.767 ms
64 bytes from 130.238.19.211: icmp_seq=9 ttl=64 time=9.040 ms
64 bytes from 130.238.19.211: icmp_seq=10 ttl=64 time=8.972 ms
64 bytes from 130.238.19.211: icmp_seq=11 ttl=64 time=11.386 ms
64 bytes from 130.238.19.211: icmp_seq=12 ttl=64 time=8.109 ms
64 bytes from 130.238.19.211: icmp_seq=13 ttl=64 time=18.590 ms
64 bytes from 130.238.19.211: icmp_seq=14 ttl=64 time=7.756 ms
64 bytes from 130.238.19.211: icmp_seq=15 ttl=64 time=7.390 ms
64 bytes from 130.238.19.211: icmp_seq=16 ttl=64 time=199.498 ms
64 bytes from 130.238.19.211: icmp_seq=17 ttl=64 time=7.783 ms
64 bytes from 130.238.19.211: icmp_seq=18 ttl=64 time=8.776 ms
64 bytes from 130.238.19.211: icmp_seq=19 ttl=64 time=8.236 ms
^C
----mim.Update.UU.SE PING Statistics----
20 packets transmitted, 20 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 7.390/66.979/234.709/84.827 ms
Krille:local/bqt>
Some additional details, that might be worth pointing out: The resultion
of times on the PDP-11 is 20ms, so anything less than that will just be
0. So the consistent 0ms times are not a bug. If I ping something a bit
further away I do indeed get longer times, which are visible. It's just
that this is close enough, and fast enough, that not a single clock tick
happens to pass before the reply comes back.
The VAX was running a make update on a pkgsrc directory meanwhile, but
this was equally true for the pings in both directions.
The wildly varying times when running ping from the VAX to the PDP-11
seems weird, and not good. Obviosuly, the IP stack itself is running at
full speed without problems, since ICMP echo reply packets go out as
soon as the echo request comes in. So, somewhere between the IP stack,
and userland, serious delays are introduced. Or else scheduling is
messing things up. Or something else I can't really think of is causing
this behaviour.
Anyone seen anything similar, or have any good ideas on why this
happens? I think it's a sign that something is not right anyway.
Oh, and the mandatory:
Krille:local/bqt> uname -a
NetBSD Krille.Update.UU.SE 5.99.40 NetBSD 5.99.40 (Krille) #367: Sat Jan
15 01:56:58 CET 2011
ro...@GW.SoftJAR.SE:/usr/obj/sys/arch/vax/compile/Krille vax
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: b...@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-...@muc.de
However, the fact that it happens much less on the 4000/90 suggest that
it might be some problem that is easier to hit on a slow machine, or
else takes longer to recover from on a slow machine.
Johnny
> I just repeated this test on a VAX-4000/90 and a MacBook PRO, and I can
> observe the same thing, although with less frequency. But occasional
> pings from the VAX to the Mac also get about 200 ms time, but not in the
> other direction. (With some load on the VAX.)
I tried your ping test on a couple of my VAX boxes, both running 5.1
GENERIC, and see similar symptoms. The machines were a 3100/10e and a
3100/95, both running a C compile at the same time. The 95 showed
occasional jumps to 200ms but on the, much slower, 10e the jumps were
mostly to around 100-130ms with just a few in the 200-300ms range.
I also tried it on a Cobalt RaQ 2 running 5.0.2 and that produced a
199ms result as well so it's not just a VAX effect.
Short excerpts from the logs are included below.
George
== ping test on Microvax 3100/95 ==
64 bytes from 10.0.1.164: icmp_seq=591 ttl=255 time=1.680 ms
64 bytes from 10.0.1.164: icmp_seq=592 ttl=255 time=1.852 ms
64 bytes from 10.0.1.164: icmp_seq=593 ttl=255 time=1.808 ms
64 bytes from 10.0.1.164: icmp_seq=594 ttl=255 time=1.806 ms
64 bytes from 10.0.1.164: icmp_seq=595 ttl=255 time=200.686 ms
64 bytes from 10.0.1.164: icmp_seq=596 ttl=255 time=1.803 ms
64 bytes from 10.0.1.164: icmp_seq=597 ttl=255 time=1.831 ms
64 bytes from 10.0.1.164: icmp_seq=598 ttl=255 time=1.723 ms
64 bytes from 10.0.1.164: icmp_seq=599 ttl=255 time=1.848 ms
== ping test on Microvax 3100/10e ==
64 bytes from 10.0.1.164: icmp_seq=576 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=577 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=578 ttl=255 time=20.000 ms
64 bytes from 10.0.1.164: icmp_seq=579 ttl=255 time=20.000 ms
64 bytes from 10.0.1.164: icmp_seq=580 ttl=255 time=20.000 ms
64 bytes from 10.0.1.164: icmp_seq=581 ttl=255 time=120.000 ms
64 bytes from 10.0.1.164: icmp_seq=582 ttl=255 time=20.000 ms
64 bytes from 10.0.1.164: icmp_seq=583 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=584 ttl=255 time=110.000 ms
64 bytes from 10.0.1.164: icmp_seq=585 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=586 ttl=255 time=10.000 ms
...
64 bytes from 10.0.1.164: icmp_seq=1396 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1397 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1398 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1399 ttl=255 time=150.000 ms
64 bytes from 10.0.1.164: icmp_seq=1400 ttl=255 time=20.000 ms
64 bytes from 10.0.1.164: icmp_seq=1401 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1402 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1403 ttl=255 time=50.000 ms
64 bytes from 10.0.1.164: icmp_seq=1404 ttl=255 time=260.000 ms
64 bytes from 10.0.1.164: icmp_seq=1405 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1406 ttl=255 time=10.000 ms
64 bytes from 10.0.1.164: icmp_seq=1407 ttl=255 time=10.000 ms
== ping text on Cobalt RaQ 2 ==
64 bytes from 10.0.1.164: icmp_seq=241 ttl=255 time=2.064 ms
64 bytes from 10.0.1.164: icmp_seq=242 ttl=255 time=2.015 ms
64 bytes from 10.0.1.164: icmp_seq=243 ttl=255 time=1.993 ms
64 bytes from 10.0.1.164: icmp_seq=244 ttl=255 time=2.033 ms
64 bytes from 10.0.1.164: icmp_seq=245 ttl=255 time=199.899 ms
64 bytes from 10.0.1.164: icmp_seq=246 ttl=255 time=2.004 ms
64 bytes from 10.0.1.164: icmp_seq=247 ttl=255 time=2.014 ms
64 bytes from 10.0.1.164: icmp_seq=248 ttl=255 time=2.021 ms
64 bytes from 10.0.1.164: icmp_seq=249 ttl=255 time=1.753 ms
Thanks for the check. Interesting that it's not just on the VAX. But the
question remains, why...?
Anyone who could test some other architecture?
Johnny
By the way. I tested on x86, running NetBSD 3.99.15, and I cannot
reproduce it on that machine.
Johnny
>
> Johnny
> Thanks for the check. Interesting that it's not just on the VAX. But the
> question remains, why...?
>
> Anyone who could test some other architecture?
I do not see this on my amigas (m68k).
It would look like it's likely related to scheduling, since the 200ms
delay seems suspiciously like two time slices. On my 4000/60, I can
easily see them if I'm running openssl speed at the same time. I haven't
had a chance to test with a slower 4000/VLC yet. I've done a ktrace of
the ping process, and that shows ~199ms betweel the poll() call and the
return.
It's almost as if the network software interrupt has to wait for the
current process to finish its 100ms time slice, then the software
interrupt signal the ping process that it has data, but the other process
gets scheduled for another 100ms time slice before the ping process is
scheduled. I don't know how this would happen though - the vax appears
to be using __HAVE_FAST_SOFTINTS, which I think should allow the software
interrupt to run as soon as the network interrupt code finishes, which
should then do what ever it does to wake up the poll() in the ping
process. The current interrupted process would run the rest of its 100ms
time slice, and the ping process should run right after that, which should
show at most a 100ms delay. I would also think that I would also be
seeing some 100ms delays on my amiga, but I haven't seen anything close to
that so far. It's also interesting that at least one mips port shows
this. I haven't tried any of my pmax systems yet, but I plan on trying
that sometime.
--
Michael L. Hitch mhi...@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA