
NLB and "Converging for an unknown reason"


Paul Barber

Jan 23, 2004, 4:03:55 PM
I have three Windows 2003 Servers set up in a Network Load
Balancing Cluster. Each server has two network cards,
a "public" interface with a 10.0.0.x address and
a "heartbeat" interface with a 192.168.1.x address. The
public is where the general traffic comes from, the
heartbeat is how they speak with each other. I put each
server's heartbeat IP address in all of their HOSTS files.

The cluster is set up in Unicast mode, configured for a
single, unique IP address with no affinity in the port
rules.

Three times now, I have had a server just "fall off" the
public (10.0.0.x) network. Most recently it was the
server with cluster Host ID 2. In the event logs of the
servers, they all have an entry for WLBS event ID #65
stating "Initiating convergence on host 'x'. Reason:
Host 'y' is converging for an unknown reason." where 'x'
always refers to the ID of the server whose log I'm
reading and 'y' is one of the other two (but not
consistent). So host 1 says it's converging on itself
because 3 is converging for an unknown reason, but host 2
says it's converging on itself because 1 is converging for
an unknown reason.
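
To compare the logs across hosts, the Event ID 65 messages can be tabulated as (reporting host, blamed host) pairs. A minimal Python sketch, assuming the message text has already been exported from each server's event log as plain strings; the function name, the sample strings, and the exact quoting of the host IDs are assumptions, not the confirmed log format:

```python
import re

# Matches the Event ID 65 wording quoted above. The quotes around the
# host IDs are treated as optional because the exact log format is assumed.
PATTERN = re.compile(
    r"Initiating convergence on host '?(\d+)'?\.\s*Reason:\s*"
    r"Host '?(\d+)'? is converging for an unknown reason"
)

def blame_pairs(messages):
    """Return (reporting_host, blamed_host) for each matching message."""
    pairs = []
    for msg in messages:
        match = PATTERN.search(msg)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs

# Hypothetical sample strings modeled on the description above.
sample = [
    "Initiating convergence on host 1. Reason: Host 3 is converging for an unknown reason.",
    "Initiating convergence on host 2. Reason: Host 1 is converging for an unknown reason.",
]
print(blame_pairs(sample))
```

If every host blames a different peer, as here, the pairs point at no single faulty host.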

In the end 1 and 3 are online and functioning, and 2 won't
respond over its 10.0.0.x interface (though it will ping
on its 192.168.1.x interface). NLB manager on 2 shows 2
as the only host participating in the cluster and its
state is "Converging", from which it never changes even
after stopping and starting NLB on that host. NLB manager
on 1 and 3 both show 1 and 3 in the cluster,
both "Converged".

Microsoft's TechNet says that if NLB Event ID 65 is
followed by NLB 28, 29, 71 or 72, then everything's fine.
If not (which is my case) I should "use Network Load
Balancing (NLB) Manager to determine the specific
configuration problem and then correct as appropriate."

Does anyone know what I can do?

Marc Reynolds [MSFT]

Jan 23, 2004, 4:46:12 PM
First to clear up a misconception - NLB's heartbeat uses the load balanced
network interface. So in your case the heartbeat is using the 10.0.0.x
address not the 192.168.1.x address.

Second, your issue sounds like there may be a problem with whatever the
10.0.0.x interface is plugged into (switch?). Try plugging the load
balanced interfaces of all three servers into a dumb hub and uplink the hub
to your switch or router. If you still have a problem post your wlbs
displays from all 3 servers.

--

Thanks,
Marc Reynolds
Microsoft Technical Support

This posting is provided "AS IS" with no warranties, and confers no rights.



Paul Barber

Jan 24, 2004, 1:26:16 PM
After reading your reply and looking at Windows' "Help &
Support" for converging problems, I looked at my switch
and network card settings. All the server NICs were set
to Auto speed and duplex whereas the switch was set to
100 Mb/s / full duplex. I set the NICs for 100 / full as
well. So far, so good, but I'll wait a week or so of
uptime before declaring the problem solved.
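
The mismatch described above can be illustrated with a small model. This is a minimal Python sketch, assuming the commonly observed behavior where a port left on auto-negotiation falls back to half duplex when its link partner is forced and therefore does not negotiate; the function names are hypothetical and the model ignores speed:

```python
def negotiated_duplex(local, peer):
    """Duplex that `local` ends up using when linked to `peer`.

    Each argument is "auto", "full", or "half".
    """
    if local != "auto":
        # A forced setting is used as-is, whatever the peer does.
        return local
    if peer == "auto":
        # Both sides negotiate; assume both support full duplex.
        return "full"
    # The peer is forced and does not negotiate, so the auto side
    # cannot learn its duplex and is assumed to fall back to half.
    return "half"

def has_duplex_mismatch(side_a, side_b):
    """True if the two ends of the link settle on different duplex modes."""
    return negotiated_duplex(side_a, side_b) != negotiated_duplex(side_b, side_a)

# NIC on auto, switch forced to full: the original configuration.
print(has_duplex_mismatch("auto", "full"))
# Both forced to full: the corrected configuration.
print(has_duplex_mismatch("full", "full"))
```

Under this model the original settings produce a mismatch, while forcing both ends to the same value (or leaving both on auto) does not.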

Thanks for the reply.

