
strange route issue in powerha 5.4?


rs6000er

Jun 24, 2009, 11:22:48 AM
Hi all,

We recently upgraded HACMP (PowerHA) from 5.2 to 5.4.

During failover testing we found a strange network issue. After the
standby node brought the service IP address (172.15.100.8) online on
its standby NIC, we were able to log in to the standby node with
telnet 172.15.100.8, which is hosted on that standby NIC.

But when we ran ping or traceroute to an IP address outside our
network, for example google.com or ibm.com, the NIC that holds the
service IP address did not send any packets out at all.

Instead, the primary NIC, which holds the boot address of the standby
node (172.15.103.79), took over and sent the ping or traceroute
packets to the outside world.

On the other hand, every test within our intranet behaved normally. We
checked the DNS server, the routing table of the standby node, and the
gateway setup, and they all looked good.

The following is the routing table from the standby node.

Routing tables
Destination      Gateway          Flags  Refs      Use  If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          172.15.100.150   UG        1    28215  en4  -    -
127/8            127.0.0.1        U         9   921983  lo0  -    -
172.15.100.0     172.15.103.79    UHSb      0        0  en4  -    -  =>
172.15.100/22    172.15.103.79    U         6  4212671  en4  -    -
172.15.103.79    127.0.0.1        UGHS      0   671937  lo0  -    -
172.15.103.255   172.15.103.79    UHSb      0      288  en4  -    -
172.16.60.0      172.16.60.60     UHSb      0        0  en5  -    -  =>
172.16.60/22     172.16.60.60     U         2  2712283  en5  -    -
172.16.60.60     127.0.0.1        UGHS      0   382645  lo0  -    -
172.16.63.255    172.16.60.60     UHSb      0        1  en5  -    -

Route Tree for Protocol Family 24 (Internet v6):
::1              ::1              UH        0      236  lo0  -    -
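
One rough way to read the table above is to apply plain longest-prefix
matching to it. The sketch below does only that, using the subnet and
default routes from the table; host and broadcast entries are left
out, and a real kernel lookup considers more than prefix length, so
treat it as an illustration rather than a model of AIX. The function
name pick_interface and the test address 192.0.2.1 (a documentation
address standing in for any external host) are assumptions made for
the example.

```python
import ipaddress

# Simplified copy of the IPv4 routes shown above: (destination, interface).
# Host and broadcast entries are left out for brevity.
ROUTES = [
    ("0.0.0.0/0", "en4"),        # default via 172.15.100.150
    ("127.0.0.0/8", "lo0"),
    ("172.15.100.0/22", "en4"),
    ("172.16.60.0/22", "en5"),
]


def pick_interface(dest, routes=ROUTES):
    """Return the interface of the most specific route that covers dest."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, ifname in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


# 192.0.2.1 is a placeholder for any host outside the local networks.
print(pick_interface("192.0.2.1"))     # only the default route covers it
print(pick_interface("172.16.60.10"))  # covered by the en5 subnet route
```

Under this simplified reading, anything outside the two local /22
networks can only match the default route, which sits on en4.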


As you can see, the default route is set up on en4 (the primary NIC,
which holds the boot IP address). Is it possible to manually add a
route entry so that en5 (the standby NIC, which holds the service IP
address after failover) can reach the outside as well, like this?

Routing tables
Destination      Gateway          Flags  Refs      Use  If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          172.15.100.150   UG        1    28217  en4  -    -
127/8            127.0.0.1        U        10   922176  lo0  -    -
172.15.100.0     172.15.103.79    UHSb      0        0  en4  -    -  =>
172.15.100/22    172.15.103.79    U         6  4213902  en4  -    -  =>
172.15.100/22    172.15.100.8     UG        0        6  en5  -    -
172.15.103.79    127.0.0.1        UGHS      0   672133  lo0  -    -
172.15.103.255   172.15.103.79    UHSb      0      288  en4  -    -
172.16.60.0      172.16.60.60     UHSb      0        0  en5  -    -  =>
172.16.60/22     172.16.60.60     U         2  2712830  en5  -    -
172.16.60.60     127.0.0.1        UGHS      0   382726  lo0  -    -
172.16.63.255    172.16.60.60     UHSb      0        1  en5  -    -

Route Tree for Protocol Family 24 (Internet v6):
::1              ::1              UH        0      236  lo0  -    -

thanks in advance,

Frank

casey b

Jun 25, 2009, 8:53:23 AM
Hello Frank,

There is a PowerHA forum hosted on IBM's developerWorks.
You may want to try posting future questions in that forum.

http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1611

If you don't get an answer here, or there, IBM support would be glad
to talk to you.


Let's start here:


172.15.100/22    172.15.103.79    U         6  4213902  en4  -    -  =>
172.15.100/22    172.15.100.8     UG        0        6  en5  -    -

I wouldn't do that myself. AIX should create several routes when you
create an alias or change the base address of an interface. Adding
another route like that will at best mask the root problem.
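
To spot this kind of doubled-up entry in a longer table, a small
parser can group the rows by destination. This is only a sketch: it
assumes the whitespace-separated column order shown in this thread
(Destination, Gateway, Flags, Refs, Use, If), and the function names
are made up for the example.

```python
from collections import defaultdict

# The two rows quoted above, as whitespace-separated netstat -rn fields.
ROWS = """\
172.15.100/22 172.15.103.79 U 6 4213902 en4 - - =>
172.15.100/22 172.15.100.8 UG 0 6 en5 - -
"""


def routes_by_destination(text):
    """Group (gateway, flags, interface) tuples by destination."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Route rows have numeric Refs and Use columns; headers do not.
        if len(fields) < 6 or not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        dest, gateway, flags, _refs, _use, ifname = fields[:6]
        groups[dest].append((gateway, flags, ifname))
    return dict(groups)


def duplicated(text):
    """Return destinations that have more than one route entry."""
    return {d: r for d, r in routes_by_destination(text).items() if len(r) > 1}


for dest, routes in duplicated(ROWS).items():
    print(dest, "->", routes)
```

Feeding it the full table instead of ROWS would list every destination
that has more than one entry, which is a quick way to see what a
manual route add has changed.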

It is hard for me to understand what your configuration is. You didn't
mention whether it is IPAT via replacement or IPAT via aliasing. The
addresses and subnets that you gave seem to indicate IPAT via
replacement, but the routing table you gave indicates that there is
another address still on en5. (If it were IPAT via replacement, I
would have expected that address to be removed.)
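
For reference, the subnet membership of the addresses mentioned in
this thread can be checked mechanically. The sketch below uses only
the addresses and /22 prefixes from the posted routing tables; the
labels and the helper name subnet_of are assumptions made for the
example. What the result implies for the IPAT mode still depends on
the cluster definition, which the thread does not show.

```python
import ipaddress

# Subnets and addresses taken from the routing tables in this thread.
subnets = {
    "en4 subnet": ipaddress.ip_network("172.15.100.0/22"),
    "en5 subnet": ipaddress.ip_network("172.16.60.0/22"),
}
addresses = {
    "service": "172.15.100.8",
    "en4 boot": "172.15.103.79",
    "en5 address": "172.16.60.60",
    "default gateway": "172.15.100.150",
}


def subnet_of(addr):
    """Return the label of the subnet containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    for label, net in subnets.items():
        if ip in net:
            return label
    return None


for name, addr in addresses.items():
    print(f"{name:16} {addr:15} {subnet_of(addr)}")
```

The service address, the en4 boot address and the default gateway all
fall inside 172.15.100.0/22, while the address that remains on en5 is
in 172.16.60.0/22.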


So, here is what I would do:
1) Check my HA config, and make sure that my addresses are in the
correct networks for replacement or aliasing.

2) Check hacmp.out to see where the ifconfig for the service label
is, and see if there are any errors or irregularities.

3) Check my AIX level, and see if I am backlevel.
(Not that a blind upgrade will help, but you may be dealing with an
AIX problem that has already been fixed and released.)
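
For step 2, a plain text search is usually enough, but the same filter
can be sketched in a few lines. This assumes the ifconfig call and the
service address (or label) appear as ordinary text on the same line of
the log; the function names are hypothetical, and the location of
hacmp.out is left to the caller because it is not given in this
thread.

```python
def find_ifconfig_lines(lines, needle):
    """Yield (line_number, text) for lines mentioning ifconfig and needle."""
    for number, line in enumerate(lines, start=1):
        if "ifconfig" in line and needle in line:
            yield number, line.rstrip("\n")


def scan_log(path, needle):
    """Scan a log file on disk; the path is supplied by the caller."""
    with open(path, errors="replace") as log:
        return list(find_ifconfig_lines(log, needle))
```

Calling scan_log with the path to hacmp.out and "172.15.100.8" returns
the matching line numbers, which gives a starting point for reading
the surrounding event output for errors.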

As I mentioned before, if this doesn't get you closer to resolving the
problem, IBM support can take a look at your software levels and all
of the HA logs to determine more precisely what is happening.

Hope this helps,
Casey
