Re: AIX 5.2 ml03 with HA/ES 5.1 cluster resync

bigtiny

Oct 1, 2004, 9:36:42 AM

You say that clcomd is running on node2... have you verified this via
lssrc?
There is a known AIX problem that sometimes causes clcomd NOT to start
automatically. If this happens, you'll see this kind of error. The
other errors may be side effects of this problem... I wouldn't worry
about trying to solve them until you solve this initial problem. Also,
did you reboot node2 after installing HACMP? If not, then do so
before trying to start cluster services.
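A minimal sketch of the lssrc check, assuming the subsystem is named clcomdES (as in the sync warning quoted below) and that lssrc prints its usual Subsystem/Group/PID/Status columns. The helper name clcomd_is_active is made up for illustration; it only parses text on stdin, so it can be tried away from the cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads `lssrc -s clcomdES` output on stdin and
# succeeds only if the clcomdES line ends with the status "active".
clcomd_is_active() {
    awk '$1 == "clcomdES" && $NF == "active" { found = 1 } END { exit !found }'
}

# On the node itself (AIX with HACMP installed), the check could be used as:
#   lssrc -s clcomdES | clcomd_is_active || startsrc -s clcomdES
```

If the subsystem shows as inoperative, starting it by hand with startsrc and re-running the sync is the obvious next step.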

bigtiny

empet...@yahoo.com (Pete's) wrote in message news:<6724a51f.04093...@posting.google.com>...
> Node1 - p630-6C4, AIX 5.2 ml03 & hacmp 5.1(patched to 5.1.0.5). This
> node currently running in a 1 node cluster, however, node 2 has been
> configured in the HA configuration. clcomd is currently active.
>
> Node2 - p660-6H1, AIX 5.2 ml03 & hacmp 5.1(patched to 5.1.0.5). This
> node is currently not part of the cluster.
>
> Both nodes are connected to shared scsi disk, heartbeat over serial
> connection, two 10/100 ethernet adapters.
>
> I've performed the following on node1 & 2,
> stty < /dev/tty1
> Results were as expected. That command execution on Node1 waited
> until it was entered on Node2. I successfully ping'd standby adapters
> and boot/service adapters. /usr/es/sbin/cluster/etc/clhosts &
> /usr/es/sbin/cluster/etc/rhosts files are the same on both nodes.
> /etc/hosts on both nodes are similar.
> Note, on Node1, HA is up and running with the resources. The problem
> I'm having is synchronization, here's the error:
>
> WARNING: Unable to communicate with the remote node: Node2.
> Please check that node: Node2 has the
> /usr/es/sbin/cluster/etc/rhosts
> file configured and the clcomdES subsystem running.
>
> It then gives libodm: The specified object class does not exist. Check
> path name and permissions.
>
> Then gives a warning that the standby adapter is not properly
> configured. Then an error that /dev/tty1 does not exist on node2.
> Then gives an error this time on the stby adapter on Node 2.
>
> clcomd is running on Node2 and the rhosts file is exactly the same on
> Node1 as it is on Node2. Anyone have any ideas? Since I'm adding a
> node to the cluster, do I need to shutdown HA on Node1 and then
> perform the synchronization?
>
> TIA,
> Pete's

Pete's

Oct 1, 2004, 5:08:18 PM

Snipped some text.

I verified it once more, it is showing as active. After installing
HA, OS maintenance levels and patches, the server was rebooted.

TIA
Pete's

bigtiny

Oct 4, 2004, 12:37:02 PM

Can you post your cluster configuration?
You've added node2 and its ethernet adapters to your configuration, right?

bigtiny

Pete's

Oct 6, 2004, 9:30:24 AM

Cluster configuration is below. I have added the ethernet adapters to the config.

TIA,
Pete's

Cluster Topology:

NODE node1:
    Network net_ether_01
        node1       192.168.3.31
        node1_boot  192.168.3.30
        node1_stby  192.168.14.38
    Network net_rs232_01
        node1_tty1  /dev/tty1

NODE node2:
    Network net_ether_01
        node2_boot  192.168.3.45
        node2_stby  192.168.14.5
    Network net_rs232_01
        node2_tty1  /dev/tty1

Resource Group:

Resource Group Name                                RG_1
Node Relationship                                  cascading
Site Relationship                                  ignore
Participating Node Name(s)                         node1 node2
Dynamic Node Priority
Service IP Label                                   node1
Filesystems                                        ALL
Filesystems Consistency Check                      logredo
Filesystems Recovery Method                        parallel
Filesystems/Directories to be exported
Filesystems to be NFS mounted
Network For NFS Mount
Volume Groups                                      vg01
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary  false
Disks
GMD Replicated Resources
PPRC Replicated Resources
Connections Services
Fast Connect Services
Shared Tape Resources
Application Servers                                APP_Server_1 APP_Server_2
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Delayed Fallback Timer
Miscellaneous Data Resource Group Attributes
Automatically Import Volume Groups                 false
Inactive Takeover                                  true
Cascading Without Fallback                         false
SSA Disk Fencing                                   false
Filesystems mounted before IP configured           false

Run Time Parameters:

Node Name                                          node1
Debug Level                                        high
Format for hacmp.out                               Standard

Node Name                                          node2
Debug Level                                        high
Format for hacmp.out                               Standard

bigtiny

Oct 7, 2004, 9:47:25 AM

I take it that 'node1' is your svc label?
Try putting it on a different subnet than the one the boot adapters are
using... this should cause the network to be created as 'aliased'.
Work was done in HACMP 5.1 to make aliased networks the default, and
these use a different subnet for the svc label. There were some
special things one had to do in configuration (I can't remember
exactly what they were...) to configure a traditional network. Anyway,
put the svc label on a different subnet and try to sync/verify. The
other thing you can do is look at your network definition: smitty
hacmp, then down to change/show network, and see if the network is
marked as aliased. If it is, then having this svc label on the same
subnet as the boot is definitely a problem.
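A rough way to check the subnet question from the shell, using the addresses in the topology Pete's posted. The netmask 255.255.255.0 is an assumption (the thread never states one), and the helper names network_of and same_subnet are made up for illustration.

```shell
#!/bin/sh
# Print the network address for a dotted-quad IPv4 address ($1) and netmask ($2).
# The body runs in a subshell so the IFS change does not leak to the caller.
network_of() (
    IFS=.
    set -- $1 $2          # split both arguments into eight octets
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# Succeed if addresses $1 and $2 fall in the same network under netmask $3.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

# svc label node1 (192.168.3.31) against node1_boot (192.168.3.30),
# under the ASSUMED netmask 255.255.255.0:
if same_subnet 192.168.3.31 192.168.3.30 255.255.255.0; then
    echo "svc label and boot adapter share a subnet"
fi
```

Under that assumed mask the svc label and the boot adapter do land on the same subnet, which is the situation described above.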

Pete's

Oct 7, 2004, 4:40:54 PM

Thanks for the help, this has been resolved. Bone-headed move on my
part. The problem: in /usr/es/sbin/cluster/etc/rhosts, for whatever
reason, I had both IP addresses and host names. I removed the host
names, dynamically sync'd, and voila. In fact, having the host names
in there makes clrsh fail.
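For anyone hitting the same thing, here is a small sketch that flags rhosts lines containing anything other than a bare IPv4 address. It assumes one entry per line, which is what worked here; the function name rhosts_non_ip_lines is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "lineno: text" for every non-blank line of the
# file named in $1 that is not a bare dotted-quad IPv4 address.
# Exit status is 1 if any such line was found, 0 otherwise.
rhosts_non_ip_lines() {
    awk '
        /^[ \t]*$/ { next }                        # ignore blank lines
        $0 !~ /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*$/ {
            print NR ": " $0                       # report the offending line
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# On each node:
#   rhosts_non_ip_lines /usr/es/sbin/cluster/etc/rhosts
```

A line such as "192.168.3.45 node2_boot" gets reported, while a line holding only "192.168.3.45" passes.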

Pete's