What are the commands and best ways to troubleshoot the network and
HBAs? "aggr-1" is the virtual interface aggregate of two interfaces
at the server. "atlanta" is one of the two nodes in the Sun
Cluster. Thanks, Bill
Mar 1 12:24:07 atlanta in.mpathd[309]: [ID 23451 daemon.error] NIC
failure detected on aggr-1 of group ipmp0
Mar 1 12:24:07 atlanta Cluster.PNM: [ID 23452 daemon.notice] sc_ipmp0:
state transition from OK to DOWN.
Mar 1 12:24:23 atlanta in.mpathd[309]: [ID 300212 daemon.error] NIC
repair detected on aggr-1 of group ipmp0
Mar 1 12:24:23 atlanta Cluster.PNM: [ID 234234 daemon.notice]
sc_ipmp0: state transition from DOWN to OK.
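A quick, portable way to gauge how often the link is flapping is to count the in.mpathd failure/repair events in the syslog. This is only a sketch, using the two messages above as sample data; on the live node you would point it at /var/adm/messages instead of the temp file:

```shell
# Count IPMP failure/repair events. Sample log lines copied from this
# thread; on the cluster node, point "$log" at /var/adm/messages.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 1 12:24:07 atlanta in.mpathd[309]: [ID 23451 daemon.error] NIC failure detected on aggr-1 of group ipmp0
Mar 1 12:24:23 atlanta in.mpathd[309]: [ID 300212 daemon.error] NIC repair detected on aggr-1 of group ipmp0
EOF
grep -c 'NIC failure detected' "$log"   # how many times the NIC failed
grep -c 'NIC repair detected' "$log"    # how many times it recovered
rm -f "$log"
```

If the counts are high and the timestamps are close together, the link is flapping rather than staying down, which points at cabling, switch ports, or probe timeouts more than at a dead NIC.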
It's a NIC failure.
See the output of these commands:
# fmadm faulty (it will show you the correct result)
# prtdiag -v
If this NIC is onboard, you may have to replace the motherboard, but
before that, try connecting with a new cable.
# fmadm faulty returns nothing. Does that mean all the network parts
in the server, including the HBAs, are good?
# prtdiag -v doesn't show any errors either.
If we are to swap in different cables, do we swap them out one at a
time, since the aggregate virtual interface is built on 2 physical
connections?
Is there any command or anything else we will need when swapping out
these cables?
Thanks, Bill
What does the link status show?
dladm show-dev
It might be a bad cable or switch port. I find I have to swap more
cables than network adapters, so I look there first.
We have the following output from the "dladm" commands. How does it
look? Thanks, Bill
# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full
e1000g3 link: unknown speed: 0 Mbps duplex: half
e1000g4 link: unknown speed: 0 Mbps duplex: half
# dladm show-link
e1000g0 type: non-vlan mtu: 1500 device: e1000g0
e1000g1 type: non-vlan mtu: 1500 device: e1000g1
e1000g2 type: non-vlan mtu: 1500 device: e1000g2
aggr-1 type: non-vlan mtu: 1500 aggregation: key 1
# dladm show-aggr
key: 1 (0x0001) policy: L4 address: 0:21:28:4f:f8:3c (auto)
    device    address           speed      duplex  link  state
    e1000g0   0:21:28:4f:f8:3c  1000 Mbps  full    up    attached
    e1000g2   0:21:28:4f:f8:3e  1000 Mbps  full    up    attached
The solution for us was to update the system. Since you are talking
about IPMP, it is surely the same issue: if possible, apply the latest
Recommended patch cluster.
We faced the same issue with IPMP, where an interface was
automatically getting switched over, the same kind you are facing
now. We tried everything; I don't remember exactly what the kernel
level was, but the problem was solved after the update.
It worked for us.
Interesting point. We have the following Solaris 10 release version
and kernel patch level. Do they look similar to yours before or
after the patch application? Thanks, Bill
Oracle Solaris 10 9/10
kernel level: 144488-06
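For comparison, on Solaris 10 the running kernel patch is embedded in the `uname -v` string (e.g. Generic_144488-06). A small sketch to pull the patch level out, using a hard-coded sample value from this thread so it runs anywhere:

```shell
# On the real host you would use: ver=$(uname -v)
ver="Generic_144488-06"   # sample value matching this thread
patch=${ver#Generic_}     # strip the "Generic_" prefix
echo "$patch"             # prints the kernel patch level: 144488-06
```

`showrev -p` then lists the patches actually installed, so you can compare against the latest Recommended cluster.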
Another thing to consider is network congestion. Depending on the
configuration of IPMP, it is possible that a busy router, or the
target of the ping in a probe-based configuration, is dropping the
ping packets. That could cause the interface to be failed over and
then, as the traffic goes away, to fail back. Try increasing the value
in /etc/default/mpathd to FAILURE_DETECTION_TIME=20000 (20 seconds) or
more.
The current value for FAILURE_DETECTION_TIME is 10000. Do we need to
restart any process or program after increasing the value
to 20000 ? Thx, Bill
#
# Time taken by mpathd to detect a NIC failure in ms. The minimum time
# that can be specified is 100 ms.
#
FAILURE_DETECTION_TIME=10000
I believe you do need to restart (or at least signal) in.mpathd for
the change to take effect.
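A sketch of the change, done here on a scratch copy of the file so the example is safe to run anywhere. The `pkill -HUP in.mpathd` step at the end is what makes in.mpathd re-read /etc/default/mpathd on the live node without a reboot:

```shell
# Demonstrate the one-line edit on a scratch copy of the config.
conf=$(mktemp)
printf 'FAILURE_DETECTION_TIME=10000\n' > "$conf"   # current setting
# GNU sed -i shown; on Solaris 10, edit /etc/default/mpathd directly.
sed -i 's/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=20000/' "$conf"
cat "$conf"   # now reads FAILURE_DETECTION_TIME=20000
rm -f "$conf"
# On the live node, after editing the real /etc/default/mpathd:
#   pkill -HUP in.mpathd   # signal in.mpathd to re-read its config
```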
Looks OK. There's nothing goofy like 100 Mb half duplex or nonsense
like that. Does the switch they're connected to show that the ports
are happy?