2.11BSD patch 490 causes issue in arp resolution

Sytse van Slooten

Aug 14, 2025, 5:57:33 PM
to [PiDP-11]
Hi.

Patch 490 updates the C compiler to fix a bug in how it handles right shifts of unsigned long values.

However, using the patched compiler to build the kernel breaks the kernel's ARP mechanism. I'm not sure exactly why yet, and I'm not sure I understand the compiler's workings well enough to investigate.

What typically happens is that, instead of being addressed to the MAC address in the ARP table (i.e., the one visible with arp -a), frames are sent out with a destination whose first two bytes are 0x46:0x37, followed by four bytes that represent the IP address. And of course that completely breaks all networking that relies on ARP.

/usr/src/sys/netinet/if_ether.c at least has a questionable macro called ARPTAB_HASH that uses right shifts ("questionable" meaning I'm not sure I understand what it does). It might be that the old compiler behaviour masked a problem with that macro, or elsewhere in that source file.

I'll investigate further when time allows (and I'm also hoping Johnny will look into the issue :-)), but until then it's probably better not to apply #490 yet.

Cheers
Sytse

Steven A. Falco

Aug 14, 2025, 6:08:04 PM
to pid...@googlegroups.com
It is curious that you are having a problem with patch 490. I've put all four of the recent patches onto two units: one is a Pi5 running SIMH and the other is a PDP2011 FPGA. Both appear to be fine; I can rsh into them.

But perhaps if I did a completely clean build of the kernel it might show the problem. When I have some time I'll try doing that.

Steve

Johnny Billquist

Aug 14, 2025, 6:15:39 PM
to pid...@googlegroups.com
The problem with the right shift that was fixed in 490 was that the
compiler had a hidden assumption that the value to be shifted was in
specific registers. But if you declared variables as registers in your
code, it could actually place the value to be shifted into other
registers, and as you might guess, if that happened, what you actually
got in the end was total garbage and corruption.

But I've not heard of anyone else having problems with it, and it's been
out for months...

But what Sytse seems to be saying also sounds extremely strange. The
actual IP address should never appear in the MAC address a packet is
sent to. Sounds like something is very broken for that to happen. Maybe
a clean build would be a good idea, if that hasn't been done yet?

Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: b...@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol

Sytse van Slooten

Aug 14, 2025, 6:16:01 PM
to Steven A. Falco, pid...@googlegroups.com
The issue will only appear if you actually recompile /usr/src/sys/netinet/if_ether.c, and that wouldn't typically happen if you just run make.

I'm quite sure, though, that the issue will appear if you do a full rebuild, i.e. make clean; make; make install.

I tried several times, on clean copies of a known good system: building a kernel with the patched compiler breaks arp consistently. And rolling back #490, then rebuilding the kernel on a broken system makes it work as it should. The only open question is exactly why it happens, not that it does.
> --
> You received this message because you are subscribed to the Google Groups "[PiDP-11]" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pidp-11+u...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/pidp-11/3352065b-5c02-4641-8060-bdd93ae8d66a%40gmail.com.

Sytse van Slooten

Aug 14, 2025, 6:18:00 PM
to Johnny Billquist, pid...@googlegroups.com
"Maybe a clean build would be a good idea"

I typically only do full builds. It's my way of stress testing the system.

Johnny Billquist

Aug 14, 2025, 6:47:52 PM
to Sytse van Slooten, pid...@googlegroups.com
I'll definitely check next week when I'm back home...

Johnny

Malcolm Ray

Aug 14, 2025, 7:49:00 PM
to pid...@googlegroups.com
After a cursory glance at the code, I'm having a hard time seeing how
ARPTAB_HASH could have this effect. As the name suggests, it's just a
hash function for the ARP table. It returns a bucket number, which
always falls within the number of buckets in the table. Worst case
scenario is that it could slow down table lookups and insertions.

Malcolm Ray

unread,
Aug 14, 2025, 8:42:05 PMAug 14
to pid...@googlegroups.com
FWIW, I see the same behaviour after a build from clean. Here's output
from tcpdump -e on the host running simh:

01:34:00.148791 08:00:2b:06:2d:f8 > 46:37:c0:a8:64:03, ethertype IPv4 (0x0800), length 80: 192.168.100.68.1030 > 192.168.100.3.53: 2+ A? ntp0.redacted. (38)

c0:a8:64:03 == 192.168.100.3

That's on a patchlevel 494 system.

Martin Renters

Aug 14, 2025, 8:44:23 PM
to Sytse van Slooten, Johnny Billquist, pid...@googlegroups.com
I just rebuilt everything (make build && make installsrc) and built a new kernel in a cleaned directory and I’m seeing the same behaviour that Sytse is seeing.

In my case, the newly built kernel hangs at the “Assuming NETWORKING system …” and I can get it to continue by typing Control-C. After logging in, I can ping my local address, but get a kernel panic if I do an arp of that address.

# ifconfig de0
de0: flags=63<UP,BROADCAST,NOTRAILERS,RUNNING>
        inet 192.168.5.99 netmask 0xffffff00 broadcast 192.168.5.255
# ping 192.168.5.99
PING 192.168.5.99 (192.168.5.99): 56 data bytes
64 bytes from 192.168.5.99: icmp_seq=0 ttl=255 time=16.667 ms
64 bytes from 192.168.5.99: icmp_seq=1 ttl=255 time=16.667 ms
64 bytes from 192.168.5.99: icmp_seq=2 ttl=255 time=16.667 ms
^C
--- 192.168.5.99 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 16.667/16.666/16.667 ms
# arp 192.168.5.99
Unexpected net trap (0)
ka6 37160 aps 147044
pc 22526 ps 50240
panic: net crashed
syncing disks... done

dumping to dev 5001 off 512
dump succeeded


On a pre-490 compiled kernel I get:

# ifconfig de0
de0: flags=63<UP,BROADCAST,NOTRAILERS,RUNNING>
        inet 192.168.5.99 netmask 0xffffff00 broadcast 192.168.5.255
# ping 192.168.5.99
PING 192.168.5.99 (192.168.5.99): 56 data bytes
64 bytes from 192.168.5.99: icmp_seq=0 ttl=255 time=16.667 ms
64 bytes from 192.168.5.99: icmp_seq=1 ttl=255 time=16.667 ms
64 bytes from 192.168.5.99: icmp_seq=2 ttl=255 time=16.667 ms
^C
--- 192.168.5.99 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 16.667/16.666/16.667 ms
# arp 192.168.5.99
192.168.5.99 (192.168.5.99) -- no entry


Martin




Malcolm Ray

Aug 14, 2025, 9:29:29 PM
to pid...@googlegroups.com
Here I try to ping 192.168.100.98, and it sends out an arp for that
address:

01:46:15.755574 08:00:2b:06:2d:f8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.100.98 tell 192.168.100.68, length 46

A response is sent:

01:46:15.756025 c0:3f:d5:62:7f:d7 > 08:00:2b:06:2d:f8, ethertype ARP (0x0806), length 60: Reply 192.168.100.98 is-at c0:3f:d5:62:7f:d7, length 46

This ends up in the arp table:

# arp -a
? (192.168.100.98) at c0:3f:d5:62:7f:d7

Then we send the ping to the wrong MAC address:

01:46:15.766793 08:00:2b:06:2d:f8 > f6:d8:a8:48:5e:01, ethertype IPv4 (0x0800), length 98: 192.168.100.68 > 192.168.100.98: ICMP echo request, id 32768, seq 0, length 64

Note that this time it's not the '46:37' MAC address. But repeating the
ping later, I did see that:

02:17:37.744502 08:00:2b:06:2d:f8 > 46:37:c0:a8:64:62, ethertype IPv4 (0x0800), length 98: 192.168.100.68 > 192.168.100.98: ICMP echo request, id 51712, seq 0, length 64

Incidentally, patch 482 removes /vmunix, updates
/usr/src/usr.sbin/arp/arp.c accordingly, but then forgets to build and
install it, so my system had a non-functional arp binary until I fixed
that.

I think tomorrow I'll add some debugging to log details of the kernel
arp table lookup.

terry-...@glaver.org

Aug 14, 2025, 10:18:58 PM
to [PiDP-11]
On Thursday, August 14, 2025 at 9:29:29 PM UTC-4 Sheepless wrote:
> This ends up in the arp table:
>
> # arp -a
> ? (192.168.100.98) at c0:3f:d5:62:7f:d7
>
> Then we send the ping to the wrong MAC address:
>
> 01:46:15.766793 08:00:2b:06:2d:f8 > f6:d8:a8:48:5e:01, ethertype IPv4 (0x0800), length 98: 192.168.100.68 > 192.168.100.98: ICMP echo request, id 32768, seq 0, length 64

This could point the finger back at ARPTAB_HASH if the issue only
happens when looking up ARP entries, not when installing or removing them.

It would be interesting to see whether manually installing an entry with arp -s
and verifying it with arp -a (to confirm correct insertion) still causes
packets to be sent to the wrong MAC address.

> Incidentally, patch 482 removes /vmunix, updates
> /usr/src/usr.sbin/arp/arp.c accordingly, but then forgets to build and
> install it, so my system had a non-functional arp binary until I fixed
> that.

I see that here, too (on a patch 489 kernel):
# arp -a
arp: /vmunix: bad namelist

So, the future patch that corrects the 490 issue should also instruct the
user to rebuild arp:

# cd /usr/src/usr.sbin
# make
# make install
# make clean

terry-...@glaver.org

Aug 16, 2025, 10:05:35 PM
to [PiDP-11]
On Thursday, August 14, 2025 at 10:18:58 PM UTC-4 terry-...@glaver.org wrote:
> # cd /usr/src/usr.sbin

Of course, this should be "cd /usr/src/usr.sbin/arp". Running make from arp's parent directory
should be harmless, but may cause unnecessary recompiles (for example, a patch that fixed
a typo in a comment in a source file would cause that program to be rebuilt, even though there are
no functional changes).

John Bruner

Aug 17, 2025, 4:31:01 PM
to Martin Renters, Sytse van Slooten, Johnny Billquist, pid...@googlegroups.com

I did a little digging, and I think I see the problem: it’s in netinet/if_ether.c, arpresolve(), which is declared with

arpresolve(ac, m, destip, desten, usetrailers)
    register struct arpcom *ac;
    struct mbuf *m;
    register struct in_addr *destip;
    register u_char *desten;
    int *usetrailers;
{

The compiler assigns R2 to desten and loads it from the value on the stack:

_arpresolve:
~~arpresolve:
      jsr   r5,csv
      mov   4(r5),r4
      ~ac=r4
      ~m=6
      mov   10(r5),r3
      ~destip=r3
      mov   12(r5),r2
      ~desten=r2

However, when it looks for the address in arptab using the ARPTAB_LOOK macro, which in turn uses ARPTAB_HASH, the compiler uses R2 as a scratch variable to compute the hash:

    s = splimp();
    ARPTAB_LOOK(at, destip->s_addr);

L16:  movb 177776,r0; movb $240,177776
      mov   r0,-40(r5)
      ~n=177736
      mov   $-20,-(sp)
      mov   2(r3),r1
      mov   (r3),r0
      jsr   pc,ulsh
      tst   (sp)+
      mov   r1,r0
      mov   2(r3),r2
      mov   (r3),r1
      xor   r2,r0
      bic   $-100000,r0
      mov   r0,r1
      sxt   r0
      div   $15,r0
      mul   $106,r1
      add   $_arptab,r1
      mov   r1,-12(r5)
      clr   -42(r5)

Later, the resolved MAC address is supposed to be copied to the buffer at desten, but r2 no longer points there. The out of bounds write will almost certainly cause some other eventual issue, but in addition, the caller will see whatever happened to be in the desten buffer at the time arpresolve was called.

 

--John

Malcolm Ray

Aug 17, 2025, 6:17:55 PM
to John Bruner, Martin Renters, Sytse van Slooten, Johnny Billquist, pid...@googlegroups.com
I was looking at arpioctl() in if_ether.c, since this easily demonstrates the problem. Using /usr/sbin/arp, you can manually add a new ARP entry, then attempt to read it back.
Both these operations use arpioctl().

In a pre-490 kernel, this works as expected. But post-490, the read attempt usually fails to find the entry just added.

With the -a flag, /usr/sbin/arp just reads the table from /dev/mem, without using ioctls, so you can verify that the new entry is indeed present, even when the ioctl says it isn't!

Debugging shows that the ARPTAB_HASH macro returns different values pre- and post-490. The fact that the right shift is responsible is easily demonstrated:
temporarily replace ARPTAB_HASH with one which doesn't use shift, and the problem goes away. This is not a fix!

The problem is, again, trashing of r2, which in the case of arpioctl() is used for this:

register struct sockaddr_in *sin;

This ARP problem is the canary in the coal mine, because the patch 490 change could potentially cause problems all over.
It's fortunate that it caused a problem which is so easily noticed! Other problems could be more elusive.

For the time being, I'm reverting the 490 change.

Johnny Billquist

Aug 17, 2025, 6:23:44 PM
to Malcolm Ray, John Bruner, Martin Renters, Sytse van Slooten, pid...@googlegroups.com
I do agree that it's probably best to revert 490. But be aware that the
compiler pre 490 also can generate completely bad code for unsigned long
right shifts. Just for different conditions, and ones that were
undetected for a long time.

So in a sense this is a headache no matter which version you use. But
since the current code causes breakage in the kernel, I would call it
more serious.

Johnny

John Bruner

Aug 17, 2025, 10:47:30 PM
to Johnny Billquist, Malcolm Ray, Martin Renters, Sytse van Slooten, pid...@googlegroups.com
arpioctl has the same issue as I noted for arpresolve. In this case, the compiler assigns r2 to hold "sin":

arpioctl(cmd, data)
int cmd;
caddr_t data;
{
register struct arpreq *ar = (struct arpreq *)data;
register struct arptab *at;
register struct sockaddr_in *sin;
int s;

but then it uses r2 as a scratch register while calculating the hash in ARPTAB_LOOK. If the ioctl is SIOCSARP and there was no ARP table entry, it will try to create one using arptnew:

switch (cmd) {

case SIOCSARP: /* set entry */
if (at == NULL) {
at = arptnew(&sin->sin_addr);

but r2 was clobbered, so &sin->sin_addr points to garbage.

When generating code for ARPTAB_HASH, the compiler loads the 32-bit value as an input to the XOR. It puts the upper half in r1 and (incorrectly) the lower half in r2. r2 is then XORed with r0 (lower word of the shift result). r1 is never used. So, one way to work around this is to adjust the casts in ARPTAB_HASH so that the inputs to the XOR are both shorts:

#define ARPTAB_HASH(a) \
((((short)((a) >> 16) ^ (short)(a)) & 0x7fff) % ARPTAB_NB)

Another option would be to cast the address of (a) to a (short *) and use address arithmetic to XOR the two shorts, avoiding the shift entirely.

Either way, this erroneous code generation could impact other things as well, so it seems best to fix or revert the change in 490.

--John

John Bruner

Aug 18, 2025, 9:45:17 AM
to terry-...@glaver.org, [PiDP-11]

An interesting coincidence: when simh is using SLIRP for networking, the IP address does appear in the MAC address. (See slirp/slirp.c, where special_ethaddr is defined to be 52:55:IP:IP:IP:IP). This is visible in the "arp" output on the client. I don't think slirp cares about the MAC address it receives from the client, which may be why this hasn't been more visible. I didn't notice it, and I've done clean builds since applying patch 490.

 

Incidentally, a very minor bug that I noticed while looking into this is this printf in 2.11BSD's netinet/if_ether.c, in the function in_arpinput():

                printf("arp: ether address is broadcast for IP address %x!\n",
                    ntohl(isaddr.s_addr));

The format should be "%lx", because the IP address is a long.

 

--John

 
