CWSIS1546I: Message Engine errors when second server started on a cluster

Kim.N...@ca.com

unread,

Mar 7, 2007, 10:05:37 PM3/7/07

to

I have a cluster defined with 2 servers, each associated with a remote node. I have configured a message engine with an MS SQL data source. I used the SIB DDL generator to configure the SIB tables and loaded them into the database. When I start the first server on the cluster, the ME starts correctly (the green arrow shows), and I can execute my application on the server. However, when I start the second server on the cluster, the ME arrow goes to partial start, and I see the following errors displayed on the System.out log on the 2nd node:

[3/7/07 16:21:11:094 EST] 0000002b SibMessage I [IMSBus:im_cluster.000-IMSBus] CWSIS1546I: The messaging engine, ME_UUID=F71BDEECA9C65CE4, INC_UUID=1b3591c72e463809, has lost an existing lock or failed to gain an initial lock on the data store.
[3/7/07 16:21:11:109 EST] 0000002b SibMessage I [IMSBus:im_cluster.000-IMSBus] CWSIS1538I: The messaging engine, ME_UUID=F71BDEECA9C65CE4, INC_UUID=1b3591c72e463809, is attempting to obtain an exclusive lock on the data store.

I can still execute the requests going to the first server, but all requests going to the second server fail.

I have switched the order of starting the servers and still get the same error each time I start the 2nd node (no matter which node it is). I have tried dropping the SIB tables and re-generating them, to no avail. It doesn't look like a problem with the data. It almost looks as if the 1st server gets a lock on the ME and won't give it up.
If I define a second ME, I get the same pattern, with errors reported on both ME's.

3/7/07 16:21:21:125 EST] 0000002b SibMessage I [IMSBus:im_cluster.000-IMSBus] CWSIS1546I: The messaging engine, ME_UUID=F71BDEECA9C65CE4, INC_UUID=1b3591c72e463809, has lost an existing lock or failed to gain an initial lock on the data store.
[3/7/07 16:21:21:141 EST] 0000002b SibMessage I [IMSBus:im_cluster.000-IMSBus] CWSIS1538I: The messaging engine, ME_UUID=F71BDEECA9C65CE4, INC_UUID=1b3591c72e463809, is attempting to obtain an exclusive lock on the data store.
[3/7/07 16:21:21:516 EST] 0000002c SibMessage I [IMSBus:im_cluster.001-IMSBus] CWSIS1546I: The messaging engine, ME_UUID=087FE97502A32E28, INC_UUID=1b3dd1c72e463ae8, has lost an existing lock or failed to gain an initial lock on the data store.
[3/7/07 16:21:21:516 EST] 0000002c SibMessage I [IMSBus:im_cluster.001-IMSBus] CWSIS1538I: The messaging engine, ME_UUID=087FE97502A32E28, INC_UUID=1b3dd1c72e463ae8, is attempting to obtain an exclusive lock on the data store.

I have searched for this error on the forum and elsewhere, but have not seen anything that really seems to resolve my issue.

Kim.N...@ca.com

unread,

Mar 8, 2007, 11:07:48 AM3/8/07

to

Well, I have narrowed it down a bit, I believe.

When the first server comes up, the System.out log on the associated node reads:

[3/8/07 10:54:37:219 EST] 00000028 SibMessage I [IMSBus:im_cluster.000-IMSBus] CWSIS1537I: The messaging engine, ME_UUID=067733C71A6E4ACD, INC_UUID=1b43135532424779, has acquired an exclusive lock on the data store.

The INC_UUID 1b43135532424779 is written to the SIBOWNER table.

When the second server comes up, the System.out log on the associated node reads:

[3/8/07 11:01:20:406 EST] 00000026 SibMessage I [IMSBus:im_cluster.000-IMSBus] CWSIS1546I: The messaging engine, ME_UUID=067733C71A6E4ACD, INC_UUID=1b599cf43247239d, has lost an existing lock or failed to gain an initial lock on the data store.

This is a different INC_UUID. It is never written to the SIBOWNER table (because we never get a lock on the data store, I guess?).

I am assuming that there is perhaps some aspect of configuration which would allow the 2nd ME_UUID, INC_UUID identity to have access to the data store also. Is this correct? Does anyone know what is wrong here?

Paul Ilechko

unread,

Mar 8, 2007, 11:35:58 AM3/8/07

to

When you create a bus and assign a cluster to the bus, you should by
default only have one messaging engine running in one server in the
cluster, because the default HA policy for messaging engines is "1 of
N". If you were to change that policy you would see the problem that you
are having, which is lock contention in the database, so that is not
advisable.

To get true WLM across the cluster, you need to create a second
messaging engine and define the policies to prefer for each of the
engines a different app server in the cluster. Each defined messaging
engine should have its own unique message store which basically provides
a logical partitioning of the destinations supported by the Bus.

Kim.N...@ca.com

unread,

Mar 8, 2007, 5:35:41 PM3/8/07

to

Thank you very much, Paul. That was quite clear. I created 2 ME's and the policies governing them. Now everything works fine.

After my initial get-it-going configurations, I stayed away from CloudScape. However, it occurs to me that there is perhaps nothing wrong with it in this configuration, where each ME is associated with a specific server. I tested failover as well as load balancing, and it works fine. When one node goes down, the requests from a session "stuck" to the node are re-directed nicely to the other node, which presumably executes them on its own ME. There doesn't seem to be any "moving" of the ME from the node which went down to its backup node.

So -- 2 follow-on questions:

1) Is CloudScape in fact a permissible database to use in this configuration?
2) In cases where each cluster member has its own ME, one cluster goes down, and the backup picks up, the business of the ME "moving" from the down cluster to the backup does not pertain -- is that correct? This is something that has confused me a bit.

Thanks very much for your help.

Paul Ilechko

unread,

Mar 8, 2007, 5:55:20 PM3/8/07

to

Kim.N...@ca.com wrote:
> Thank you very much, Paul. That was quite clear. I created 2 ME's and
> the policies governing them. Now everything works fine.
>
> After my initial get-it-going configurations, I stayed away from
> CloudScape. However, it occurs to me that there is perhaps nothing
> wrong with it in this configuration, where each ME is associated with
> a specific server. I tested failover as well as load balancing, and
> it works fine. When one node goes down, the requests from a session
> "stuck" to the node are re-directed nicely to the other node, which
> presumably executes them on its own ME. There doesn't seem to be any
> "moving" of the ME from the node which went down to its backup node.
>
> So -- 2 follow-on questions:
>
> 1) Is CloudScape in fact a permissible database to use in this
> configuration?

It's supported, but we don't really recommend it for production
environments. It's fine for test purposes but lacks the tools of more
industrial strength products.

2) In cases where each cluster member has its own ME,
> one cluster goes down, and the backup picks up, the business of the
> ME "moving" from the down cluster to the backup does not pertain --
> is that correct? This is something that has confused me a bit.

I'm not sure I fully understand your question. The ME in a cluster is a
singleton service. If the cluster member (appserver) running the ME goes
down, the HAManager will find a new server to run the process and will
start it on that cluster member. An ME will not move between clusters.
If you have two MEs in the same cluster, and you configure them to each
prefer one application server in a two member cluster, and one server
goes down, then both MEs will in fact run in the same appserver.

Martin Phillips

unread,

Mar 9, 2007, 3:32:43 AM3/9/07

to

> 1) Is CloudScape in fact a permissible database to use in this
> configuration?

You can use cloudscape, but you have to use network cloudscape as the
database has to be available from anywhere in the cluster (to allow the ME
that fails over to another server to reconnect to its own database).

> 2) In cases where each cluster member has its own ME, one cluster goes
> down, and the backup picks up, the business of the ME "moving" from the
> down cluster to the backup does not pertain -- is that correct? This is
> something that has confused me a bit.

(I hope this helps to answer your question!)

There are several different ways of setting up a Cluster with relation to
the SIBus.

1: Simple HA > You have one ME in the cluster where the ME can run on any
server in the cluster - the ME is always available. if the server the ME is
running on fails it will be moved to another running server and again be
available for work.

2: WLM (workload balancing) without HA > You have one ME for every server in
the cluster. The HA Policies are configured so that each ME can run on only
a single 'prefered' server in the cluster. In this case if a server goes
down then the ME that is 'pinned' to it is also unavailable, as are any
messages that are on the partitions of destinations associated with that ME
(this is a possible configuration, but I can't think of a good reason for
sitting it up in preference to option 3 below).

3: WLM with HA > You have one ME for every server in the cluster. The HA
Policies are configured so that each ME prefers to run on a seperate server
in the cluster, so when things are all working you have one ME per server.
HA Policies are configured so that every ME can failover to one or more
other servers in the cluster. If a server fails, the ME that was running
there will failover to another server in the cluster. When the server that
failed comes back up, te ME should failback to that server (assuming the HA
Policy is setup with failback enabled).

Section 11.3 in the WAS 6 Admin Redbook and the WAS Infocenter do have good
information on this sort of stuff.

--

Martin Phillips
mphi...@uk.ibm.com

<Kim.N...@ca.com> wrote in message
news:26474926.1173393404...@ltsgwas010.sby.ibm.com...

Kim.N...@ca.com

unread,

Mar 9, 2007, 1:21:12 PM3/9/07

to

Thanks very much for your terrific answers, Martin and Paul. I had Option 2 configured but was having a problem with Option 3. The Admin Redbook Section 11 helped me out there. It is now working nicely for load balancing and failover, and I can see the ME failing over and back.