Bug in EtherTalk Phase 2 version 2.5.5

John Davy

Jan 26, 1994, 8:47:52 PM

A bug (or deliberate change in behaviour?) has been introduced into Apple's
EtherTalk Phase 2 system extension between version 2.5.3 and version 2.5.5.
The bug occurs in the way AppleTalk addresses are associated with ethernet
addresses. It is only known to occur on a network with Webster Multiport
Gateways. It is believed to be due to the way Webster Multiport Gateways
overcome the AppleTalk ImageWriter bug. Tom Evans of Webster plans to
change the way the Multiport gateway overcomes the AppleTalk ImageWriter
bug, but this will not occur until the release after next of the Multiport
gateway code.

The bug is that after a search in an AppleTalk zone for AppleTalk nodes,
EtherTalk Phase 2 version 2.5.5 associates the Multiport ethernet address
with all the nodes that are found. If packets are sent to and from any node
from the Macintosh with EtherTalk Phase 2 version 2.5.5, they are sent via
the Multiport even if the sending and receiving nodes are on the same side
of the Multiport. In the best of all possible worlds, this works, and only
doubles network bandwidth usage. Unfortunately my site network has some
known problems and loses occasional packets. Because of the 30 second TRel
packet timeout in AppleShare, Personal File Sharing in System 7 becomes
unusable with only a very small percentage of packet loss.
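
To see why a small loss rate hurts so much, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each lost packet independently costs one full 30 second timeout and that nothing else slows the transfer; the function name and the 1% loss rate are hypothetical, not measurements from this network.

```python
def expected_stall_seconds(packets: int, loss_rate: float, timeout_s: float = 30.0) -> float:
    """Expected time spent waiting on timeouts.

    Crude model (an assumption): every lost packet independently costs one
    full timeout, and nothing else delays the transfer.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    return packets * loss_rate * timeout_s


# Example: a transfer of 1,000 packets at a 1% loss rate.
stall = expected_stall_seconds(1000, 0.01)
print(f"expected stall: {stall:.0f} s")  # prints "expected stall: 300 s"
```

Even a 1% loss rate adds minutes of waiting to a transfer that would otherwise take seconds on a quiet thin-ethernet segment.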

To partially overcome the problems in our network, the network is
extensively subdivided by intelligent ethernet bridges which learn whether
to forward ethernet packets. If my packets stay behind the bridge in my
building I have no packet loss and Personal File Sharing works perfectly
between my two Macintoshes situated only 2 metres apart on the thin ethernet
cable. The bug in EtherTalk Phase 2 version 2.5.5 means that the packets
have to go via one of three Multiport gateways on our network, and thus
have to leave the safety of the network in my building and run the risk of
being lost on the site network. This makes Personal File Sharing unusably
slow.

The association of ethernet addresses with AppleTalk addresses in EtherTalk
Phase 2 is aged and after a period of nonuse the association will be
redetermined when needed, this time correctly. While packets are being
exchanged (i.e. while file transfer or echo testing is taking place), the
association will not be aged. Doing another node search at any time will
reset the associations to incorrect values and the timeout will have to
occur before correct values are obtained.
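
A minimal Python sketch of the aging behaviour just described, under stated assumptions: cache keys are (network, node) pairs, an entry is refreshed on every use, and it is dropped once it has been idle longer than `max_idle` seconds. The 60 second figure comes from a later post in this thread; the class and method names are hypothetical, not EtherTalk internals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DdpAddr = Tuple[int, int]  # (network number, node ID)


@dataclass
class Entry:
    mac: str          # Ethernet address associated with the AppleTalk address
    last_used: float  # time of the last lookup or glean, in seconds


class AarpCache:
    """Toy AppleTalk-to-Ethernet address cache with idle aging (hypothetical)."""

    def __init__(self, max_idle: float) -> None:
        self.max_idle = max_idle
        self._entries: Dict[DdpAddr, Entry] = {}

    def glean(self, addr: DdpAddr, mac: str, now: float) -> None:
        # Gleaning simply overwrites the entry, whether or not it is correct.
        self._entries[addr] = Entry(mac, now)

    def lookup(self, addr: DdpAddr, now: float) -> Optional[str]:
        entry = self._entries.get(addr)
        if entry is None:
            return None
        if now - entry.last_used > self.max_idle:
            # Aged out: the caller must resolve the address afresh.
            del self._entries[addr]
            return None
        entry.last_used = now  # ongoing traffic keeps the entry alive
        return entry.mac


cache = AarpCache(max_idle=60.0)
node_b = (1000, 2)  # hypothetical AppleTalk address of the second Macintosh
cache.glean(node_b, "multiport-mac", now=0.0)  # wrong entry left by a node search
print(cache.lookup(node_b, now=30.0))   # multiport-mac: refreshed by use
print(cache.lookup(node_b, now=89.0))   # multiport-mac: only 59 s idle
print(cache.lookup(node_b, now=200.0))  # None: aged out, so it will be re-resolved
```

The usage lines mirror the symptom: as long as echo or file traffic continues, the wrong entry never ages out, and a fresh node search (another `glean`) puts it straight back.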

Tom Evans of Webster says that the bug in EtherTalk Phase 2 Version 2.5.5
is that it is performing its "AARP address resolution by gleaning" and
doesn't "check the hop counts".

The nature of the bug was originally determined by observing that two
Macintoshes on an isolated segment of thin ethernet did not show the
problem (now known to be due to there being no Multiport on the segment),
and by the accidental observation of the forwarding lights on an
intelligent ethernet bridge. Until then I had no idea that my test echo
packets were going outside my building. Up to this point I had spent a very
large amount of time changing hardware and software, and had observed that
the problem was intermittent (now known to be due to the timeout).

Using the LANWatch software running on a PC to capture packets, I was able
to establish that in the packet loss case, the packets were being sent to
one of four AppleTalk routers on our network (three Webster Multiport
gateways and one Novell server running NetWare for Macintosh), and
then back to the other Macintosh. The return packets also went via the same
router. Since the routers were all outside my building, this explains why
packets were going in and out through the bridge. In the non-loss case the
packets were going direct between the two Macintoshes.

The NBP packet sequence which occurred when the search in the zone was
being performed was

1. First Macintosh sends NBP packet to router,
2. Router multicasts NBP packet,
3. Second Macintosh sends NBP packet to router,
4. Router sends NBP packet to first Macintosh.

Both the third and fourth NBP packets had the AppleTalk source address of
the second Macintosh and the AppleTalk destination address of the first
Macintosh. I had to look at the ethernet source and destination addresses
to see where they had really started and finished. EtherTalk Phase 2
version 2.5.5 must incorrectly assume that the ethernet address of the
sender of the fourth packet is the ethernet address of the second
Macintosh.
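
The four-packet exchange can be replayed in a small Python sketch to show where the Ethernet and AppleTalk address layers disagree. The node labels A, B and R and the Ethernet names are hypothetical. Frame 2's DDP source (the first Macintosh rather than the router) is taken from the later explanation in this thread of how the Multiport handled the ImageWriter bug. `naive_glean` models gleaning that ignores hop counts; since all four frames carry source addresses from the local cable, a network-range check would not reject any of them either.

```python
from typing import Dict, List, NamedTuple


class Frame(NamedTuple):
    step: int
    receiver: str  # node that processes the frame
    eth_src: str   # Ethernet source address in the frame header
    ddp_src: str   # AppleTalk source address in the DDP header (symbolic)


# Real Ethernet address of each AppleTalk node (hypothetical labels).
ETHERNET_OF = {"A": "eth-A", "B": "eth-B", "R": "eth-R"}

FRAMES: List[Frame] = [
    Frame(1, receiver="R", eth_src="eth-A", ddp_src="A"),  # A asks the router
    Frame(2, receiver="B", eth_src="eth-R", ddp_src="A"),  # router multicasts lookup
    Frame(3, receiver="R", eth_src="eth-B", ddp_src="B"),  # B replies via router
    Frame(4, receiver="A", eth_src="eth-R", ddp_src="B"),  # router forwards to A
]


def naive_glean(frames: List[Frame]) -> Dict[str, Dict[str, str]]:
    """Per receiver, map each DDP source to the Ethernet source it arrived from."""
    tables: Dict[str, Dict[str, str]] = {}
    for f in frames:
        tables.setdefault(f.receiver, {})[f.ddp_src] = f.eth_src
    return tables


def mismatched_steps(frames: List[Frame]) -> List[int]:
    """Frames whose Ethernet source is not the DDP source's own interface."""
    return [f.step for f in frames if f.eth_src != ETHERNET_OF[f.ddp_src]]


tables = naive_glean(FRAMES)
print(mismatched_steps(FRAMES))  # [2, 4]
print(tables["A"])               # {'B': 'eth-R'}: A now reaches B via the router
print(tables["B"])               # {'A': 'eth-R'}
```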

I then discovered that on my network I could use the hop count in
Apple's InterPoll software to determine if the bug was occurring. For the
case in which the sender and receiver are on the same side of all routers,
a hop count of zero means no bug and a hop count of one means that the bug
is occurring.

Tom Evans of Webster has pointed out to me that:-

>This will work on your site, but is not a reliable test on others.
>Routers compliant with the Apple-Internet-Router-V3 interpretation of
>AppleTalk don't increment the hop-count on the last hop, so if anyone
>tries to duplicate your problem with an Apple Internet Router V3:
>
> InterPoll will always show "0 hops" no matter what, and
>
> The AIR always sends the forwarded packets with 0 hop-counts
> and this will probably make the "address gleaning code"
> operate differently (either fixing it or making it worse).
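
Tom Evans's caveat can be made concrete with a small Python function that lists how many routers a packet may have crossed for a given hop count under each convention. It assumes every router on the path follows the same convention; the function name and parameter are hypothetical.

```python
from typing import Set


def routers_crossed(observed_hops: int, last_hop_increments: bool) -> Set[int]:
    """Possible router counts for a packet arriving with `observed_hops`.

    last_hop_increments=True: every router increments the hop count (the
    Multiport behaviour described in this thread).
    last_hop_increments=False: the final router does not increment (the
    Apple Internet Router V3 behaviour quoted above).
    """
    if not 0 <= observed_hops <= 15:
        raise ValueError("the DDP hop count is a 4-bit field (0-15)")
    if last_hop_increments:
        return {observed_hops}
    if observed_hops == 0:
        return {0, 1}  # local, or exactly one router away: cannot tell which
    return {observed_hops + 1}


print(routers_crossed(0, last_hop_increments=True))   # {0}: the InterPoll test works
print(routers_crossed(0, last_hop_increments=False))  # {0, 1}: ambiguous under AIR V3
```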

This bug was originally noticed with DaynaPORT ethernet cards. It turns out
that it will happen with any Macintosh ethernet hardware which uses
EtherTalk Phase 2 version 2.5.5. I would like to thank Dave Scott of Dayna
and Alison Kidd and Peter Svans of Conexus (Dayna's Australian
distributors) for their assistance in this matter. I would also like to
thank Tom Evans of Webster for his help on this matter and on many other
networking problems (bugs in MacTCP, NCSA Telnet, etc.).

--
| John Davy CSIRO Division of Building, Construction and Engineering |
| Post Office Box 56, Highett, Victoria, Australia 3190 |
| Internet: jo...@mel.dbce.csiro.au or John...@dbce.csiro.au |
| Tel: +61 3 252 6054 (D) +61 3 252 6000 (SB) +61 3 570 72 71 (H) |
| Fax: +61 3 252 6252 or +61 3 252 6240 or +61 3 252 6244 |
|______________________________________________________________________|

David Stine

Jan 28, 1994, 4:40:56 AM

In article <johnd-270...@jondmac.mel.dbce.csiro.au> jo...@mel.dbce.csiro.au (John Davy) writes:
>
>The association of ethernet addresses with AppleTalk addresses in EtherTalk
>Phase 2 is aged and after a period of nonuse the association will be
>redetermined when needed, this time correctly. While packets are being
>exchanged (i.e. while file transfer or echo testing is taking place), the
>association will not be aged. Doing another node search at any time will
>reset the associations to incorrect values and the timeout will have to
>occur before correct values are obtained.

This is known as "AARP cache aging" -- on a cisco router, we typically age
out any AARP entry for which we have not seen an AARP Reply in four hours.

>Tom Evans of Webster says that the bug in EtherTalk Phase 2 Version 2.5.5
>is that it is performing its "AARP address resolution by gleaning" and
>doesn't "check the hop counts".

This sounds like it is true. And it gets even better. In a recent support
problem, we have seen that AppleShare V4 expects routers to perform address
gleaning. If the router has timed out the AARP entry for the node on which
the AppleShare 4.0 server is running, the licensing check fails.

Address gleaning is a concept which can ultimately wreck the
performance of any router implementation. You end up having to sniff every
damn packet sent from an end-node, looking to associate the source MAC
address with the source DDP address. And if the packet's source network
number matches the network number of the cable on which we are hearing the
packet, we then have to insert the MAC address and the DDP address into the
AARP cache if it isn't already there.

Now, there is an alternative to checking the DDP network number of the
packet for matching the cable -- you can check the hop count. If the
packet's hop count is zero, that means the packet originated from an
end-node on the cable where the router hears it. Perhaps a little less
rigorous, since you then have the opportunity to stuff the AARP cache with
a DDP address which isn't valid for the cable in question, but it is a
necessary, but not sufficient[1], condition for AARP cache consistency.

If you're trying to route packets as quickly as possible, you won't be
after you implement gleaning. Gleaning is a sure-fire performance killer.

Side note to Tom Evans of Webster: you might want to do both checks
(hop count and network number), Tom. If you're able to, run a performance
test on your router with and without
gleaning. I'm going to bet that you aren't happy with the performance hit
you take. I know we were not.

dsa

[1] the term "necessary and sufficient" is important here. For those who
are grounded in mathematics, you already know what I mean. For those
not, here is an explanation:

- Checking the packet hop count is necessary to prove that the packet
comes into the router node from an end node which is on a cable
connected to the router.

- But the hop count is not sufficient to _prove_ that the end node is
directly connected to the router. You need to check the network
number, since with hop count reduction schemes and other such,
it is now possible that you could end up receiving a packet from
a neighboring router or a tunnel which has a zero hop-count. So
you _must_ check the network number for a match to prove
that the packet is "local" to the router's cable.
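
David Stine's recommendation to check both conditions can be sketched as a Python predicate a router might apply to each received packet before gleaning. The `Port` type, the network range values and the function name are hypothetical; the point is only that the hop-count test alone admits a zero-hop packet from a tunnel or neighbouring router, while adding the network-range test rejects it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    net_start: int  # first network number in the cable's range
    net_end: int    # last network number (same as net_start on a nonextended cable)

    def is_local(self, net: int) -> bool:
        return self.net_start <= net <= self.net_end


def may_glean(port: Port, src_net: int, hop_count: int) -> bool:
    """Glean only packets that pass both tests from the footnote above."""
    zero_hops = hop_count == 0          # necessary: not forwarded by a counting router
    local_net = port.is_local(src_net)  # needed to prove the sender is on this cable
    return zero_hops and local_net


ethernet = Port(1000, 1009)
print(may_glean(ethernet, 1003, 0))  # True: local end node
print(may_glean(ethernet, 2000, 0))  # False: zero hops, but from a tunnel or neighbour
print(may_glean(ethernet, 1003, 1))  # False: forwarded by another router
```

Because this predicate runs on every received packet, it also illustrates the per-packet cost David Stine describes.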

Tom Evans

Jan 31, 1994, 6:42:58 PM

In article <2iamj8$7...@cronkite.cisco.com>, dst...@cisco.com (David Stine) writes:
> In article <johnd-270...@jondmac.mel.dbce.csiro.au> jo...@mel.dbce.csiro.au (John Davy) writes:
> >
> >The association of ethernet addresses with AppleTalk addresses in EtherTalk
> >Phase 2 is aged and after a period of nonuse
>

> This sounds like it is true. And it gets even better. In a recent support
> problem, we have seen that AppleShare V4 expects routers to perform address
> gleaning. If the router has timed out the AARP entry for the node on which
> the AppleShare 4.0 server is running, the licensing check fails.

Can you let us know the visible symptoms of this situation so that
network managers (and AppleTalk Router vendors) can recognise it when
it happens to us?

Does this mean that AppleShare V4 is deliberately incompatible with
AppleTalk as-she-is-written-in-the-book? Is this a bug? Has this been
reported? Does Apple know about this? Is it likely to be fixed (add
tons of smileys to this paragraph to taste :-).

Or does this mean that AppleShare V4 specifically REDEFINES the way
that all AppleTalk Routers have to work and that everybody that
wants to use AppleShare V4 has to pay money to have every AppleTalk
router that they own updated to support V4 (except the obsolete ones
which presumably they have to throw away and pay to replace)?

Enquiring Minds and so forth...

> Now, there is an alternative to checking the DDP network number of the
> packet for matching the cable -- you can check the hop count. If the
> packet's hop count is zero, that means the packet originated from an
> end-node on the cable where the router hears it.

That used to be the "Specification" for AppleTalk; at least it
definitely was while the AppleTalk Internet Router V2 was the
"reference implementation". When V3 came out the specification
changed (it now agrees with the book, which AIR V2 didn't).

If the hop count is zero it NOW means that the originating host is
either ZERO or ONE hop away, and you can't tell which (from the
hop-count). This of course stopped the "Best Router Cache" in
AppleTalk in Macintoshes dead in its tracks (because it did use the
hop-count), and this didn't get fixed until AppleTalk V58.1.1.

You gotta check the source network number against the network range,
exactly like it says to do in Inside AppleTalk on page 4-20.

========================
Tom Evans t...@wcc.oz.au
Webster Computer Corp P/L, 11 Glenvale Crescent Mulgrave, Melbourne 3170
Victoria, Australia 61-3-560-1100 FAX ...560-0067 A.C.N. 004 818 455

w...@cup.portal.com, AppleLink: "WEBSTER.USA"
2109 O'Toole Avenue, Suite J SAN JOSE CA 95131 - 1303 CALIFORNIA
1-408-954-8054 FAX 1-408-954-1832 AppleLink WEBSTER.USA

eric doc kampman

Feb 1, 1994, 9:36:49 AM

In article <johnd-270...@jondmac.mel.dbce.csiro.au>,
jo...@mel.dbce.csiro.au (John Davy) wrote:

> A bug (or deliberate change in behaviour?) has been introduced into Apple's
> EtherTalk Phase 2 system extension between version 2.5.3 and version 2.5.5.
> The bug occurs in the way AppleTalk addresses are associated with ethernet
> addresses. It is only known to occur on a network with Webster Multiport
> Gateways. It is believed to be due to the way Webster Multiport Gateways
> overcome the AppleTalk ImageWriter bug. Tom Evans of Webster plans to
> change the way the Multiport gateway overcomes the AppleTalk ImageWriter
> bug

I'll bite. What's the AppleTalk Imagewriter bug?

--
d...@miracle.farallon.com
nono no blamee me employer, onlee blamee me
******************************************************************
Look for the thing you can't find/Seeing with eyes makes you blind
You know you're out of your mind

Jens-Uwe Mager

Feb 2, 1994, 8:19:12 AM

In article <doc-0102...@163.176.8.222> (comp.protocols.appletalk), d...@miracle.farallon.com (eric doc kampman) writes:

> > Tom Evans of Webster plans to
> > change the way the Multiport gateway overcomes the AppleTalk ImageWriter
> > bug
>
> I'll bite. What's the AppleTalk Imagewriter bug?

If I remember correctly the ImageWriter does not answer NBP lookups
properly. Instead of sending the answer to the node mentioned in the
NBP header it will send the reply to the sender of the NBP lookup,
which might be a router. Most routers I know compensate for that by
forwarding the NBP reply to the originator of the broadcast request
that caused the lookup. The way I have solved the problem in our
routing software costs about 1 kB of data to keep information about recent
BrRq's; this might not be feasible in the RAM-limited world of a
typical router box.
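
A rough Python sketch of the recently-seen-BrRq cache approach described here. Keying on the 8-bit NBP ID alone, the entry lifetime and the size limit are simplifying assumptions of this sketch, and the class and method names are hypothetical; a real router would also have to handle NBP ID collisions between different requesters and fit its own memory budget.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Requester = Tuple[int, int, int]  # (network, node, socket) from the BrRq tuple


@dataclass
class Pending:
    requester: Requester
    expires_at: float


class BrRqCache:
    """Hypothetical recently-seen-BrRq cache with a bounded number of entries."""

    def __init__(self, lifetime: float = 5.0, max_entries: int = 64) -> None:
        self.lifetime = lifetime
        self.max_entries = max_entries
        self._pending: Dict[int, Pending] = {}

    def remember(self, nbp_id: int, requester: Requester, now: float) -> None:
        """Record who sent the BrRq with this NBP ID."""
        if nbp_id not in self._pending and len(self._pending) >= self.max_entries:
            # Evict the entry closest to expiry to keep memory use fixed.
            victim = min(self._pending, key=lambda k: self._pending[k].expires_at)
            del self._pending[victim]
        self._pending[nbp_id] = Pending(requester, now + self.lifetime)

    def redirect(self, nbp_id: int, now: float) -> Optional[Requester]:
        """For a LkUp-Reply addressed to the router, return the real requester."""
        entry = self._pending.get(nbp_id)
        if entry is None or now > entry.expires_at:
            return None
        return entry.requester


cache = BrRqCache()
cache.remember(42, (1000, 10, 253), now=0.0)
print(cache.redirect(42, now=1.0))   # (1000, 10, 253): forward the reply there
print(cache.redirect(42, now=10.0))  # None: too old to trust
```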

______________________________________________________________________________
Jens-Uwe Mager j...@anubis.han.de
30177 Hannover j...@helios.de
Brahmsstr. 3 Tel.: +49 511 660238

John Davy

Feb 8, 1994, 7:09:41 PM

In article <36...@wcc.oz.au>, t...@wcc.oz.au (Tom Evans) wrote:

>
> Firstly there's the behaviour to accommodate the ImageWriter bug.
> Secondly, Apple have changed EtherTalk to "aggressively glean" in
> EtherTalk version 2.5.5. Thus EtherTalk "gleans" the DDP address of
> the requesting Macintosh from the LkUp packet and stores the hardware
> address of the MPG. It then sends the NBP reply via the router. If
> the original requester is running EtherTalk V2.5.5 as well, it
> gleans the wrong address from the NBP Reply packet and the two
> Macintoshes exchange data via the Router "for ever" or until an AARP
> cache ages out. This only happens with "ImageWriter Bug Bypass #2".
>
> This didn't used to happen because the incoming packet has a hop-count
> of one, and you shouldn't glean hop-count-one packets, but EtherTalk
> seemingly now does glean from hop-count-one-packets. Presumably it
> ignores the hop-count now that the Internet Router V3 broke the
> seven-or-eight-year-old "zero-hop-count-is-this-net" assumption.
> A bit of "defensive programming" wouldn't go astray guys. :-)
>
The version of EtherTalk on the responding machine (Macintosh or printer)
does not appear to matter. Presumably the gleaning behaviour of EtherTalk
when responding has been incorrect for some time, if not forever. The
correct behaviour would be for the responding machine to respond direct to
the ethernet address of the requesting machine, if the requesting machine
is on the same network range. Then the change of gleaning behaviour of the
requesting EtherTalk between version 2.5.3 and 2.5.5 could not matter.

Jens-Uwe Mager

Feb 9, 1994, 4:23:44 AM

In article <36...@wcc.oz.au> (comp.protocols.appletalk), t...@wcc.oz.au (Tom Evans) writes:
> 3. From: j...@anubis.han.de (Jens-Uwe Mager)

> Most routers I know compensate for that by forwarding the NBP
> reply to the originator of the broadcast request that caused the
> lookup (by keeping a recently-seen-BrRq cache).
>
> I've never heard of (3) until today. The AIR and FastPath do (1), but
> do it differently to each other!

Oops, I formulated it wrong, I wanted to say:

Most routers I know compensate for that in some way; we do so by
forwarding the NBP reply to the originator of the broadcast request
that caused the lookup (by keeping a recently-seen-BrRq cache).

Sorry for the generalization.

John Davy

Feb 17, 1994, 1:28:03 AM

I have just downloaded Network Software Installer version 1.4.2, now that
it is finally on ftp.apple.com. EtherTalk Phase 2 version 2.5.6 has the
same bug as version 2.5.5. Replacing these versions with version 2.5.3
makes the bug go away. I have already reported on this bug in detail to
this news group. Briefly, the bug is that under some circumstances the aarp
table is incorrect. This bug appears to have been deliberately introduced
because of the zero hop count behaviour of the Apple Internet Router
version 3.0. I am still trying to obtain version 2.5.4 of EtherTalk Phase 2
for testing purposes. It is part of Network Software Installer version 1.4.
If anyone has it, could they please email me a copy? Thanks in advance.

John Davy

Feb 18, 1994, 12:48:25 AM

Thanks to Ainsley Calladine who kindly emailed me a copy of EtherTalk Phase
2 version 2.5.4, I have been able to verify that the aarp bug was
introduced in the transition from version 2.5.4 to 2.5.5. I believe that
the bug was deliberately introduced, by ignoring the hop count, because of
the following extract from the NSI 1.4.1 release notes.

>EtherTalk and TokenTalk version 2.5.5
>=====================================
>
>A change was made to the AMT table maintenance for supporting Router 3.0
>(which uses 0 hop counts).

At this stage I am running with NSI 1.4.2, with EtherTalk Phase 2 version
2.5.6 replaced with version 2.5.4.

When Webster releases a version of the Multiport gateway code which handles
the ImageWriter bug in a different way, I will experiment with the more
recent versions of EtherTalk Phase 2 and report the results to this news
group.

John Davy

Feb 23, 1994, 12:42:24 AM

Version 2.15 of Megan (the code for Webster Multiport gateways) is now
available on ftp.connect.com.au. I have downloaded it and rebooted the
three Webster Multiport gateways that are on my network using this new
code. The way the ImageWriter NBP bug is handled has changed with this
release of the code.

With previous versions of Megan, and with EtherTalk Phase 2 version 2.5.5
or 2.5.6, I had reported to this news group that Macs on the same piece of
ethernet exchanged packets via the Multiport gateway rather than direct to
each other, unless the 60 second aarp cache ageing was allowed to occur
after a search in a zone. Because of packet loss on my network, this
behaviour caused me severe problems for reasons which I have described in
earlier postings.

Using versions of EtherTalk Phase 2 with version numbers less than or equal
to 2.5.4 made the problems disappear. I can now report that use of Megan
version 2.15 makes the problems disappear even when EtherTalk Phase 2
version 2.5.6 is used. Although I haven't tested it, this presumably also
applies with version 2.5.5 of EtherTalk Phase 2.

Why did these problems occur? Here is my understanding of the situation.
AppleTalk ImageWriters have a bug that causes network problems if they
receive NBP packets with short DDP headers. Webster's programmers
discovered by experiment that the bug could be overcome by inserting the
AppleTalk address of the sending Macintosh into the DDP header of NBP
packets that the router was forwarding in place of the router's AppleTalk
address. This solved the bug by forcing a long DDP header to be used since
the sender's AppleTalk address in the DDP header was different from the
router's AppleTalk address. Short DDP packets don't contain the sender's
and receiver's addresses. The Multiport gateway increments the hop count
even on the last hop, unlike the Apple Internet Router 3.0. It appears that
EtherTalk Phase 2 and AppleTalk used the hop count to determine how many
routers the packet had come via, for both maintaining the aarp table by
gleaning and maintaining a best router cache. Checking for a nonzero hop
count stopped EtherTalk from associating the Multiport gateway's ethernet
address with the Macintosh's AppleTalk address in the long DDP packet of
the forwarded NBP packet.
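
To make the short/long distinction concrete, here is a Python sketch that packs both DDP header forms as laid out in Inside AppleTalk: the 13-byte long header carries a 4-bit hop count plus source and destination network and node numbers, while the 5-byte short header carries only the length, sockets and a type, leaving node addresses to the LocalTalk (LLAP) frame. The function names and example field values are hypothetical, and checksum calculation is omitted (a DDP checksum of 0 means "not computed").

```python
import struct

NBP_SOCKET = 2  # well-known Names Information Socket used by NBP
NBP_TYPE = 2    # DDP type for NBP packets


def long_ddp_header(hops: int, length: int, dst_net: int, src_net: int,
                    dst_node: int, src_node: int, dst_socket: int,
                    src_socket: int, ddp_type: int, checksum: int = 0) -> bytes:
    """Pack a 13-byte long DDP header (checksum 0 means "not computed")."""
    if not 0 <= hops <= 15 or not 0 <= length <= 0x3FF:
        raise ValueError("hop count is 4 bits and length is 10 bits")
    first_word = (hops << 10) | length  # 2 unused bits, hop count, length
    return struct.pack(">HHHHBBBBB", first_word, checksum, dst_net, src_net,
                       dst_node, src_node, dst_socket, src_socket, ddp_type)


def short_ddp_header(length: int, dst_socket: int, src_socket: int,
                     ddp_type: int) -> bytes:
    """Pack a 5-byte short DDP header: no networks, nodes or hop count."""
    if not 0 <= length <= 0x3FF:
        raise ValueError("length is 10 bits")
    return struct.pack(">HBBB", length, dst_socket, src_socket, ddp_type)


def hop_count(header: bytes) -> int:
    """Extract the hop count from a long DDP header."""
    return (struct.unpack(">H", header[:2])[0] >> 10) & 0x0F


long_hdr = long_ddp_header(1, 40, 1000, 1000, 255, 20, NBP_SOCKET, NBP_SOCKET, NBP_TYPE)
short_hdr = short_ddp_header(40, NBP_SOCKET, NBP_SOCKET, NBP_TYPE)
print(len(long_hdr), len(short_hdr), hop_count(long_hdr))  # 13 5 1
```

Since the short header has no source node field, a receiver takes the sender's node from the LLAP frame; a DDP source that differs from the transmitting router therefore needs a long header, which is the effect the Webster workaround relied on.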

When the AppleTalk Internet Router 3.0 changed to not incrementing the hop
counter on the last hop, both the aarp cache and best router cache were
broken. Because the hop count was now meaningless for determining if the
packet had come from the local network range, EtherTalk Phase 2 versions
2.5.5 and later ignore the hop count and correctly check if the network
number is in the local network range. When the address of the responding
Macintosh is in the local network range, EtherTalk Phase 2 version 2.5.5
and later associates the responding Macintosh's AppleTalk address in the
long DDP header with the ethernet address of the Multiport gateway by
"gleaning", instead of performing an aarp request. The best router cache
problem was also "fixed" for use with AIR 3.0 in AppleTalk version 58.1.1
which came with EtherTalk Phase 2 version 2.5.5 on Network Software Installer version
1.4.1.
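
The trade-off described in the last two paragraphs can be sketched in Python as two gleaning predicates, one trusting only the hop count and one trusting only the network range, applied to packets of the kinds discussed in this thread. The network range 1000-1009 and the packet values are hypothetical, and both functions are simplified models of the behaviour reported here, not Apple's code.

```python
from dataclasses import dataclass

LOCAL_RANGE = range(1000, 1010)  # hypothetical cable range 1000-1009


@dataclass(frozen=True)
class Received:
    eth_src: str  # Ethernet address the frame arrived from
    src_net: int  # DDP source network number
    hops: int     # DDP hop count as received


def glean_by_hops(pkt: Received) -> bool:
    """Simplified model of EtherTalk up to 2.5.4: glean zero-hop packets only."""
    return pkt.hops == 0


def glean_by_range(pkt: Received) -> bool:
    """Simplified model of 2.5.5 and later: hop count ignored, range checked."""
    return pkt.src_net in LOCAL_RANGE


CASES = {
    # NBP reply from a Mac on this cable, forwarded by a Multiport (hop count 1).
    "multiport-forwarded": Received("eth-multiport", 1003, 1),
    # Packet from a remote network via an AIR V3 router (last hop not counted).
    "air3-remote": Received("eth-air", 2000, 0),
    # Packet sent directly by a Mac on this cable.
    "direct": Received("eth-B", 1003, 0),
}

for name, pkt in CASES.items():
    print(f"{name:20} hops-only={glean_by_hops(pkt)!s:5} range-only={glean_by_range(pkt)}")
```

Each single test fails on one case: the hop-only rule treats the zero-hop remote packet as local, and the range-only rule records the Multiport's Ethernet address for a Macintosh on the same cable.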

Megan version 2.15 overcomes the AppleTalk ImageWriter NBP bug by forcing
NBP packets which are being forwarded onto LocalTalk legs to always have
long DDP headers. This solves the aarp table problem in EtherTalk Phase 2
in two ways. First, the fix only applies on LocalTalk legs. Secondly, there
is no change of sending address in the long DDP header. This is not as nice
a way to solve the problem, because it forces a link between different
layers of the well layered code in Megan.
