Re: Weblogic Issue: Error joining cluster (2.4-ee)


Mehmet Dogan

Oct 23, 2012, 9:33:03 AM
to haze...@googlegroups.com
Jack,

Can you check whether you have set the 'hazelcast.socket.bind.any' property on your standalone nodes and on the Weblogic server? Two sockets using the same interface should not be able to bind the same port, but they can if they use different network interfaces, or if one binds a specific interface (127.0.0.1) while the other binds to any local address (0.0.0.0). 
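For reference, the property mentioned above can be set directly in hazelcast.xml. This is a minimal sketch against the 2.x config schema; the property name comes from the thread, and the value false (bind only the picked interface instead of 0.0.0.0) is shown purely as an example:

```xml
<hazelcast>
    <properties>
        <!-- false: listen only on the picked interface, not on all local addresses -->
        <property name="hazelcast.socket.bind.any">false</property>
    </properties>
</hazelcast>
```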

@mmdogan




On Mon, Oct 22, 2012 at 7:23 PM, Jack Gould <jjg...@gmail.com> wrote:
Starting a Hazelcast instance in a Weblogic 10.3.5 server results in errors when joining an existing Hazelcast cluster.  When the Weblogic server starts, it does NOT auto increment to port 5704.  Instead, it connects as 5701 and disrupts the entire cluster.  

This happens in both multicast and IP configurations (I have not tried an interface-based configuration).  Putting the Hazelcast JARs on the Weblogic classpath or inside the deployed EAR's APP-INF/lib results in the same behavior.

However, if I start Weblogic first and then run the stand-alone nodes, the cluster operates as expected.  Details below.  Has anyone else experienced this issue?  

Thanks, in advance, for any help!

- Jack Gould

Given a three-node cluster, each running stand-alone in a bash terminal:

#====[ Before starting Weblogic ]=============================================#
Oct 22, 2012 8:36:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [3] {
Member [127.0.0.1]:5701
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
}

Oct 22, 2012 8:36:17 AM com.hazelcast.impl.LifecycleServiceImpl
INFO: [127.0.0.1]:5703 [dev] Address[127.0.0.1]:5703 is STARTED

As noted above, when the Weblogic server starts, it does NOT auto-increment to port 5704.  Instead, it connects as 5701 and disrupts the entire cluster.  Below are some relevant messages from the Weblogic console and the Hazelcast terminal windows:

From Weblogic console:

Oct 22, 2012 8:59:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5701 [dev] 

Members [3] {
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703
Member [127.0.0.1]:5701 this
}

Oct 22, 2012 8:59:27 AM com.hazelcast.impl.FactoryImpl
WARNING: [127.0.0.1]:5701 [dev] null
java.util.concurrent.TimeoutException
at com.hazelcast.impl.ExecutorManager$MemberCall.get(ExecutorManager.java:611)


From the terminal running the original 5701 node:

#====[ Before starting Weblogic ]=============================================#
Oct 22, 2012 8:36:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5701 [dev] 

Members [3] {
Member [127.0.0.1]:5701 this
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703
}

#====[ After starting Weblogic ]==============================================#
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.Connection
INFO: [127.0.0.1]:5701 [dev] Connection [Address[127.0.0.1]:5703] lost. Reason: java.io.EOFException[null]
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.ReadHandler
WARNING: [127.0.0.1]:5701 [dev] hz._hzInstance_1_dev.IO.thread-2 Closing socket to endpoint Address[127.0.0.1]:5703, Cause:java.io.EOFException
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5701 [dev] 51947 accepted socket connection from /127.0.0.1:5703
Oct 22, 2012 8:59:14 AM com.hazelcast.cluster.MembersUpdateCall
WARNING: [127.0.0.1]:5701 [dev] Received MembersUpdateCall from Address[127.0.0.1]:5702, but current master is Address[127.0.0.1]:5701
Oct 22, 2012 8:59:29 AM com.hazelcast.impl.PartitionManager
WARNING: [127.0.0.1]:5701 [dev] This is the master node and received a PartitionRuntimeState from Address[127.0.0.1]:5702. Ignoring incoming state! 
[ above message repeats every 10 seconds until shutdown ]


From the terminal running the 5703 node:

#====[ Before starting Weblogic ]=============================================#
Oct 22, 2012 8:36:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [3] {
Member [127.0.0.1]:5701
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
}

Oct 22, 2012 8:36:17 AM com.hazelcast.impl.LifecycleServiceImpl
INFO: [127.0.0.1]:5703 [dev] Address[127.0.0.1]:5703 is STARTED
Oct 22, 2012 8:36:17 AM com.hazelcast.impl.management.ManagementCenterService
INFO: [127.0.0.1]:5703 [dev] Hazelcast will connect to Management Center on address: http://127.0.0.1:8080/mancenter/

#====[ After starting Weblogic ]==============================================#
Oct 22, 2012 8:59:07 AM com.hazelcast.nio.SocketAcceptor
INFO: [127.0.0.1]:5703 [dev] 5703 is accepting socket connection from /127.0.0.1:51938
Oct 22, 2012 8:59:07 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5703 [dev] 5703 accepted socket connection from /127.0.0.1:51938
Oct 22, 2012 8:59:07 AM com.hazelcast.nio.ConnectionManager
WARNING: [127.0.0.1]:5703 [dev] Connection [/127.0.0.1:5701 -> Address[127.0.0.1]:5701] live=true, client=false, type=MEMBER is already bound  to Address[127.0.0.1]:5701
Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
WARNING: [127.0.0.1]:5703 [dev] New join request has been received from an existing endpoint! => Member [127.0.0.1]:5701 Removing old member and processing join request...
Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] Removing Address Address[127.0.0.1]:5701
Oct 22, 2012 8:59:08 AM com.hazelcast.impl.Node
INFO: [127.0.0.1]:5703 [dev] ** setting master address to Address[127.0.0.1]:5702
Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [2] {
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
}

Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] Removing Address Address[127.0.0.1]:5701
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.Connection
INFO: [127.0.0.1]:5703 [dev] Connection [Address[127.0.0.1]:5701] lost. Reason: Explicit close
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.SocketAcceptor
INFO: [127.0.0.1]:5703 [dev] 5703 is accepting socket connection from /127.0.0.1:51947
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5703 [dev] 5703 accepted socket connection from /127.0.0.1:51947
Oct 22, 2012 8:59:14 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [3] {
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
Member [127.0.0.1]:5701
}

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/cUL6fYxZNbsJ.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.


Mehmet Dogan

Oct 24, 2012, 8:14:58 AM
to haze...@googlegroups.com
The cause of the second issue is that the second instance has selected '10.20.84.213' as its local address, so when the first node tries to connect, the second replies with a wrong-bind-request message. The default value of 'hazelcast.socket.bind.any' is true, so the second node accepts connections on all local interfaces. You should either add '10.20.84.213' to the member list or configure the correct interface:

<tcp-ip enabled="true">
    <interface>127.0.0.1</interface>
    <interface>10.20.84.213</interface>
</tcp-ip>

OR

<interfaces enabled="true">
    <interface>10.20.84.*</interface>
</interfaces>


Can you also post debug/finest-level node startup logs for the first issue? They should look like this:

15:14:03,646  INFO [AddressPicker] - Interfaces is enabled, trying to pick one address matching to one of: [127.0.0.1]
15:14:03,647  WARN [AddressPicker] - Picking loopback address [127.0.0.1]; setting 'java.net.preferIPv4Stack' to true.
15:14:03,656 DEBUG [AddressPicker] - inet reuseAddress:true
15:14:03,658 DEBUG [AddressPicker] - Trying to bind inet socket address:/127.0.0.1:5701
15:14:03,660 DEBUG [AddressPicker] - Bind successful to inet socket address:/127.0.0.1:5701
15:14:03,661  INFO [AddressPicker] - Picked Address[127.0.0.1]:5701, using socket ServerSocket[addr=/127.0.0.1,localport=5701], bind any local is false
15:14:03,661 DEBUG [AddressPicker] - Using public address the same as the bind address. Address[127.0.0.1]:5701

@mmdogan




On Tue, Oct 23, 2012 at 6:37 PM, Jack Gould <jjg...@gmail.com> wrote:
Mehmet,

    Thanks for the reply.  I do not have that variable defined.  Below is my complete hazelcast.xml file (without license).  It is basically the 2.4-ee version, modified for IP-only connectivity, with the addition of a near-cache map configuration.  There are no hazelcast properties defined as JVM arguments for my Weblogic server.  I have deployed hazelcast.xml in a directory added to Weblogic's startup CLASSPATH.

    While trying to work around the original issue, I am now encountering a different issue.  Running two Weblogic instances, each as a Hazelcast node with the configuration below, I get the following messages when the second instance starts:

From console of Weblogic 1:

Oct 23, 2012 11:17:35 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5701 [dev] 58369 accepted socket connection from /127.0.0.1:5702
Oct 23, 2012 11:17:35 AM com.hazelcast.nio.Connection
INFO: [127.0.0.1]:5701 [dev] Connection [Address[127.0.0.1]:5702] lost. Reason: java.io.EOFException[null]


From console of Weblogic 2:

[10.20.84.213]:5702 [dev] Wrong bind request from Address[127.0.0.1]:5701! This node is not requested endpoint: Address[127.0.0.1]:5702


Hazelcast.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Copyright (c) 2008-2012, Hazel Bilisim Ltd. All Rights Reserved.
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~      http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-2.4.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <license-key>**************************</license-key>
    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
    <management-center enabled="true">http://127.0.0.1:8080/mancenter</management-center>
    <network>
        <port auto-increment="true">5701</port>
        <outbound-ports>
            <!--
            Allowed port range when connecting to other nodes.
            0 or * means use system provided port.
            -->
            <ports>0</ports>
        </outbound-ports>
        <join>
            <multicast enabled="false">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
            </multicast>
            <tcp-ip enabled="true">
                <interface>127.0.0.1</interface>
            </tcp-ip>
            <aws enabled="false">
                <access-key>my-access-key</access-key>
                <secret-key>my-secret-key</secret-key>
                <!--optional, default is us-east-1 -->
                <region>us-west-1</region>
                <!--optional, default is ec2.amazonaws.com. If set, region shouldn't be set as it will override this property -->
                <hostHeader>ec2.amazonaws.com</hostHeader>
                <!-- optional, only instances belonging to this group will be discovered, default will try all running instances -->
                <security-group-name>hazelcast-sg</security-group-name>
                <tag-key>type</tag-key>
                <tag-value>hz-nodes</tag-value>
            </aws>
        </join>
        <interfaces enabled="false">
            <interface>10.10.1.*</interface>
        </interfaces>
        <ssl enabled="false" />
        <socket-interceptor enabled="false" />
        <symmetric-encryption enabled="false">
            <!--
               encryption algorithm such as
               DES/ECB/PKCS5Padding,
               PBEWithMD5AndDES,
               AES/CBC/PKCS5Padding,
               Blowfish,
               DESede
            -->
            <algorithm>PBEWithMD5AndDES</algorithm>
            <!-- salt value to use when generating the secret key -->
            <salt>thesalt</salt>
            <!-- pass phrase to use when generating the secret key -->
            <password>thepass</password>
            <!-- iteration count to use when generating the secret key -->
            <iteration-count>19</iteration-count>
        </symmetric-encryption>
        <asymmetric-encryption enabled="false">
            <!-- encryption algorithm -->
            <algorithm>RSA/NONE/PKCS1PADDING</algorithm>
            <!-- private key password -->
            <keyPassword>thekeypass</keyPassword>
            <!-- private key alias -->
            <keyAlias>local</keyAlias>
            <!-- key store type -->
            <storeType>JKS</storeType>
            <!-- key store password -->
            <storePassword>thestorepass</storePassword>
            <!-- path to the key store -->
            <storePath>keystore</storePath>
        </asymmetric-encryption>
    </network>
    <partition-group enabled="false"/>
    <executor-service>
        <core-pool-size>16</core-pool-size>
        <max-pool-size>64</max-pool-size>
        <keep-alive-seconds>60</keep-alive-seconds>
    </executor-service>
    <queue name="default">
        <!--
            Maximum size of the queue. When a JVM's local queue size reaches the maximum,
            all put/offer operations will get blocked until the queue size
            of the JVM goes down below the maximum.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size-per-jvm>0</max-size-per-jvm>
        <!--
            Name of the map configuration that will be used for the backing distributed
            map for this queue.
        -->
        <backing-map-ref>default</backing-map-ref>
    </queue>
    <map name="default">
        <!--
            Number of backups. If 1 is set as the backup-count for example,
            then all entries of the map will be copied to another JVM for
            fail-safety. 0 means no backup.
        -->
        <backup-count>1</backup-count>
        <!--
            Number of async backups. 0 means no backup.
        -->
        <async-backup-count>0</async-backup-count>
        <!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <time-to-live-seconds>0</time-to-live-seconds>
        <!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map. Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <max-idle-seconds>0</max-idle-seconds>
        <!--
            Valid values are:
            NONE (no eviction),
            LRU (Least Recently Used),
            LFU (Least Frequently Used).
            NONE is the default.
        -->
        <eviction-policy>NONE</eviction-policy>
        <!--
            Maximum size of the map. When max size is reached,
            map is evicted based on the policy defined.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size policy="cluster_wide_map_size">0</max-size>
        <!--
            When max. size is reached, specified percentage of
            the map will be evicted. Any integer between 0 and 100.
            If 25 is set for example, 25% of the entries will
            get evicted.
        -->
        <eviction-percentage>25</eviction-percentage>
        <!--
            While recovering from split-brain (network partitioning),
            map entries in the small cluster will merge into the bigger cluster
            based on the policy set here. When an entry merges into the
            cluster, there might be an existing entry with the same key already.
            Values of these entries might be different for that same key.
            Which value should be set for the key? Conflict is resolved by
            the policy set here. Default policy is hz.ADD_NEW_ENTRY

            There are built-in merge policies such as
            hz.NO_MERGE      ; no entry will merge.
            hz.ADD_NEW_ENTRY ; entry will be added if the merging entry's key
                               doesn't exist in the cluster.
            hz.HIGHER_HITS   ; entry with the higher hits wins.
            hz.LATEST_UPDATE ; entry with the latest update wins.
        -->
        <merge-policy>hz.ADD_NEW_ENTRY</merge-policy>
    </map>

    <map name="near-*">
        <!--
            Number of backups. If 1 is set as the backup-count for example,
            then all entries of the map will be copied to another JVM for
            fail-safety. 0 means no backup.
        -->
        <backup-count>1</backup-count>
        <!--
            Number of async backups. 0 means no backup.
        -->
        <async-backup-count>0</async-backup-count>
        <!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <time-to-live-seconds>0</time-to-live-seconds>
        <!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map. Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <max-idle-seconds>0</max-idle-seconds>
        <!--
            Valid values are:
            NONE (no eviction),
            LRU (Least Recently Used),
            LFU (Least Frequently Used).
            NONE is the default.
        -->
        <eviction-policy>NONE</eviction-policy>
        <!--
            Maximum size of the map. When max size is reached,
            map is evicted based on the policy defined.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size policy="cluster_wide_map_size">0</max-size>
        <!--
            When max. size is reached, specified percentage of
            the map will be evicted. Any integer between 0 and 100.
            If 25 is set for example, 25% of the entries will
            get evicted.
        -->
        <eviction-percentage>25</eviction-percentage>
        <!--
            While recovering from split-brain (network partitioning),
            map entries in the small cluster will merge into the bigger cluster
            based on the policy set here. When an entry merges into the
            cluster, there might be an existing entry with the same key already.
            Values of these entries might be different for that same key.
            Which value should be set for the key? Conflict is resolved by
            the policy set here. Default policy is hz.ADD_NEW_ENTRY

            There are built-in merge policies such as
            hz.NO_MERGE      ; no entry will merge.
            hz.ADD_NEW_ENTRY ; entry will be added if the merging entry's key
                               doesn't exist in the cluster.
            hz.HIGHER_HITS   ; entry with the higher hits wins.
            hz.LATEST_UPDATE ; entry with the latest update wins.
        -->
        <merge-policy>hz.ADD_NEW_ENTRY</merge-policy>

        <near-cache>
            <!--
                Maximum size of the near cache. When max size is reached,
                cache is evicted based on the policy defined.
                Any integer between 0 and Integer.MAX_VALUE. 0 means
                Integer.MAX_VALUE. Default is 0.
            -->
            <max-size>250</max-size>
            <!--
                Maximum number of seconds for each entry to stay in the near cache. Entries that are
                older than <time-to-live-seconds> will get automatically evicted from the near cache.
                Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
            -->
            <time-to-live-seconds>0</time-to-live-seconds>

            <!--
                Maximum number of seconds each entry can stay in the near cache as untouched (not-read).
                Entries that are not read (touched) more than <max-idle-seconds> value will get removed
                from the near cache.
                Any integer between 0 and Integer.MAX_VALUE. 0 means
                Integer.MAX_VALUE. Default is 0.
            -->
            <max-idle-seconds>120</max-idle-seconds>

            <!--
                Valid values are:
                NONE (no extra eviction, <time-to-live-seconds> may still apply),
                LRU  (Least Recently Used),
                LFU  (Least Frequently Used).
                NONE is the default.
                Regardless of the eviction policy used, <time-to-live-seconds> will still apply.
            -->
            <eviction-policy>LFU</eviction-policy>

            <!--
                Should the cached entries get evicted if the entries are changed (updated or removed).
                true or false. Default is true.
            -->
            <invalidate-on-change>true</invalidate-on-change>

        </near-cache>
    </map>

    <!-- Add your own semaphore configurations here:
        <semaphore name="default">
            <initial-permits>10</initial-permits>
            <semaphore-factory enabled="true">
                <class-name>com.acme.MySemaphoreFactory</class-name>
            </semaphore-factory>
        </semaphore>
    -->

    <!-- Add your own map merge policy implementations here:
    <merge-policies>
            <map-merge-policy name="MY_MERGE_POLICY">
            <class-name>com.acme.MyOwnMergePolicy</class-name>
        </map-merge-policy>
    </merge-policies>
    -->

</hazelcast>
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/flI9hH_GyIQJ.

Jack Gould

Mar 4, 2013, 3:34:55 PM
to haze...@googlegroups.com
This is an old post, but I did finally figure out exactly what caused the issue.  In retrospect, it seems so obvious.  Sigh.  I'm posting this in case others encounter a similar issue.

The issue turned out to be that the JVMs running the "stand-alone" Hazelcast nodes included the "-Djava.net.preferIPv4Stack=true" argument, but the Weblogic server instances were launched without it.  OS X was basically allowing one process to open port 5701 on an IPv4 address, and another process to open port 5701 on an IPv6 address.
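The dual-bind behavior described above is easy to reproduce with plain java.net sockets; this is a minimal sketch (not from the thread, and it assumes the machine has an IPv6 loopback available). One socket binds an ephemeral port on the IPv4 loopback, and a second socket then binds the very same port number on the IPv6 loopback without an "address already in use" error, because the two binds live in different address families:

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class DualStackBindDemo {
    // Binds an IPv4-only socket to an ephemeral port, then binds a second
    // socket to the same port on the IPv6 loopback. Returns the shared port.
    static int bindBoth() throws Exception {
        try (ServerSocket v4 = new ServerSocket()) {
            v4.bind(new InetSocketAddress("127.0.0.1", 0));
            int port = v4.getLocalPort();
            try (ServerSocket v6 = new ServerSocket()) {
                // Different address family, so this does not collide with the IPv4 bind.
                v6.bind(new InetSocketAddress("::1", port));
                return port;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Both address families bound port " + bindBoth());
    }
}
```

This is exactly why a node expecting "port in use, auto-increment to 5702" can instead come up on 5701: the port is only in use on the other IP stack.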

Moral of the story: make sure all of your JVM options agree with each other.
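One quick way to verify that the options actually agree is to print the effective system property from inside each JVM (standalone node and Weblogic alike). A small hypothetical diagnostic, not part of the thread:

```java
public class PreferIPv4Check {
    // Reports the effective value of the flag; null means it was never passed.
    static String describe() {
        return "java.net.preferIPv4Stack=" + System.getProperty("java.net.preferIPv4Stack");
    }

    public static void main(String[] args) {
        // Run this in every JVM that should join the cluster and compare the output.
        System.out.println(describe());
    }
}
```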

- Jack


On Monday, October 22, 2012 12:23:26 PM UTC-4, Jack Gould wrote:
[ ... ]