Re: Weblogic Issue: Error joining cluster (2.4-ee)


Mehmet Dogan

Oct 23, 2012, 9:33:03 AM
to haze...@googlegroups.com
Jack,

Can you check whether you have set the 'hazelcast.socket.bind.any' property on your standalone nodes and on the Weblogic server? Two sockets using the same interface should not be able to bind the same port, but they can if they use different network interfaces, or if one binds a specific interface (127.0.0.1) while the other binds to any local address (0.0.0.0). 
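For reference, the property mentioned above can be set directly in hazelcast.xml. This is a minimal sketch against the 2.x config schema; the property name comes from the thread, and the value false (bind only the picked interface instead of 0.0.0.0) is shown purely as an example:

```xml
<hazelcast>
    <properties>
        <!-- false: listen only on the picked interface, not on all local addresses -->
        <property name="hazelcast.socket.bind.any">false</property>
    </properties>
</hazelcast>
```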

@mmdogan




On Mon, Oct 22, 2012 at 7:23 PM, Jack Gould <jjg...@gmail.com> wrote:
Starting a Hazelcast instance in a Weblogic 10.3.5 server results in errors when joining an existing Hazelcast cluster.  When the Weblogic server starts, it does NOT auto increment to port 5704.  Instead, it connects as 5701 and disrupts the entire cluster.  

This happens in both multicast and IP configurations (I have not tried an interface-based configuration).  Putting the Hazelcast JARs on the Weblogic classpath or inside the deployed EAR's APP-INF/lib results in the same behavior.

However, if I start Weblogic first and then run the stand-alone nodes, the cluster operates as expected.  Details below.  Has anyone else experienced this issue?  

Thanks, in advance, for any help!

- Jack Gould

Given a three-node cluster, each running stand-alone in a bash terminal:

#====[ Before starting Weblogic ]=============================================#
Oct 22, 2012 8:36:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [3] {
Member [127.0.0.1]:5701
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
}

Oct 22, 2012 8:36:17 AM com.hazelcast.impl.LifecycleServiceImpl
INFO: [127.0.0.1]:5703 [dev] Address[127.0.0.1]:5703 is STARTED

As noted above, when the Weblogic server starts, it does NOT auto-increment to port 5704.  Instead, it connects as 5701 and disrupts the entire cluster.  Below are some relevant messages from the Weblogic console and the Hazelcast terminal windows:

From Weblogic console:

Oct 22, 2012 8:59:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5701 [dev] 

Members [3] {
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703
Member [127.0.0.1]:5701 this
}

Oct 22, 2012 8:59:27 AM com.hazelcast.impl.FactoryImpl
WARNING: [127.0.0.1]:5701 [dev] null
java.util.concurrent.TimeoutException
at com.hazelcast.impl.ExecutorManager$MemberCall.get(ExecutorManager.java:611)


From the terminal running the original 5701 node:

#====[ Before starting Weblogic ]=============================================#
Oct 22, 2012 8:36:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5701 [dev] 

Members [3] {
Member [127.0.0.1]:5701 this
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703
}

#====[ After starting Weblogic ]==============================================#
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.Connection
INFO: [127.0.0.1]:5701 [dev] Connection [Address[127.0.0.1]:5703] lost. Reason: java.io.EOFException[null]
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.ReadHandler
WARNING: [127.0.0.1]:5701 [dev] hz._hzInstance_1_dev.IO.thread-2 Closing socket to endpoint Address[127.0.0.1]:5703, Cause:java.io.EOFException
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5701 [dev] 51947 accepted socket connection from /127.0.0.1:5703
Oct 22, 2012 8:59:14 AM com.hazelcast.cluster.MembersUpdateCall
WARNING: [127.0.0.1]:5701 [dev] Received MembersUpdateCall from Address[127.0.0.1]:5702, but current master is Address[127.0.0.1]:5701
Oct 22, 2012 8:59:29 AM com.hazelcast.impl.PartitionManager
WARNING: [127.0.0.1]:5701 [dev] This is the master node and received a PartitionRuntimeState from Address[127.0.0.1]:5702. Ignoring incoming state! 
[ above message repeats every 10 seconds until shutdown ]


From the terminal running the 5703 node:

#====[ Before starting Weblogic ]=============================================#
Oct 22, 2012 8:36:15 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [3] {
Member [127.0.0.1]:5701
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
}

Oct 22, 2012 8:36:17 AM com.hazelcast.impl.LifecycleServiceImpl
INFO: [127.0.0.1]:5703 [dev] Address[127.0.0.1]:5703 is STARTED
Oct 22, 2012 8:36:17 AM com.hazelcast.impl.management.ManagementCenterService
INFO: [127.0.0.1]:5703 [dev] Hazelcast will connect to Management Center on address: http://127.0.0.1:8080/mancenter/

#====[ After starting Weblogic ]==============================================#
Oct 22, 2012 8:59:07 AM com.hazelcast.nio.SocketAcceptor
INFO: [127.0.0.1]:5703 [dev] 5703 is accepting socket connection from /127.0.0.1:51938
Oct 22, 2012 8:59:07 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5703 [dev] 5703 accepted socket connection from /127.0.0.1:51938
Oct 22, 2012 8:59:07 AM com.hazelcast.nio.ConnectionManager
WARNING: [127.0.0.1]:5703 [dev] Connection [/127.0.0.1:5701 -> Address[127.0.0.1]:5701] live=true, client=false, type=MEMBER is already bound  to Address[127.0.0.1]:5701
Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
WARNING: [127.0.0.1]:5703 [dev] New join request has been received from an existing endpoint! => Member [127.0.0.1]:5701 Removing old member and processing join request...
Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] Removing Address Address[127.0.0.1]:5701
Oct 22, 2012 8:59:08 AM com.hazelcast.impl.Node
INFO: [127.0.0.1]:5703 [dev] ** setting master address to Address[127.0.0.1]:5702
Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [2] {
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
}

Oct 22, 2012 8:59:08 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] Removing Address Address[127.0.0.1]:5701
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.Connection
INFO: [127.0.0.1]:5703 [dev] Connection [Address[127.0.0.1]:5701] lost. Reason: Explicit close
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.SocketAcceptor
INFO: [127.0.0.1]:5703 [dev] 5703 is accepting socket connection from /127.0.0.1:51947
Oct 22, 2012 8:59:08 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5703 [dev] 5703 accepted socket connection from /127.0.0.1:51947
Oct 22, 2012 8:59:14 AM com.hazelcast.cluster.ClusterManager
INFO: [127.0.0.1]:5703 [dev] 

Members [3] {
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703 this
Member [127.0.0.1]:5701
}

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/cUL6fYxZNbsJ.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.


Mehmet Dogan

Oct 24, 2012, 8:14:58 AM
to haze...@googlegroups.com
The cause of the second issue is that the second instance has selected '10.20.84.213' as its local address, so when the first node tries to connect, the second replies with a wrong-bind-request message. The default value of 'hazelcast.socket.bind.any' is true, so the second node accepts connections on all local interfaces. You should either add '10.20.84.213' to the member list or configure the correct interface:

<tcp-ip enabled="true">
    <interface>127.0.0.1</interface>
    <interface>10.20.84.213</interface>
</tcp-ip>

OR

<interfaces enabled="true">
    <interface>10.20.84.*</interface>
</interfaces>


Can you also post debug/finest-level node startup logs for the first issue? They should look like this:

15:14:03,646  INFO [AddressPicker] - Interfaces is enabled, trying to pick one address matching to one of: [127.0.0.1]
15:14:03,647  WARN [AddressPicker] - Picking loopback address [127.0.0.1]; setting 'java.net.preferIPv4Stack' to true.
15:14:03,656 DEBUG [AddressPicker] - inet reuseAddress:true
15:14:03,658 DEBUG [AddressPicker] - Trying to bind inet socket address:/127.0.0.1:5701
15:14:03,660 DEBUG [AddressPicker] - Bind successful to inet socket address:/127.0.0.1:5701
15:14:03,661  INFO [AddressPicker] - Picked Address[127.0.0.1]:5701, using socket ServerSocket[addr=/127.0.0.1,localport=5701], bind any local is false
15:14:03,661 DEBUG [AddressPicker] - Using public address the same as the bind address. Address[127.0.0.1]:5701

@mmdogan




On Tue, Oct 23, 2012 at 6:37 PM, Jack Gould <jjg...@gmail.com> wrote:
Mehmet,

    Thanks for the reply.  I do not have that variable defined.  Below is my complete hazelcast.xml file (without license).  It is basically the 2.4-ee version, modified for IP-only connectivity, with the addition of a near-cache map configuration.  There are no hazelcast properties defined as JVM arguments for my Weblogic server.  I have deployed hazelcast.xml in a directory added to Weblogic's startup CLASSPATH.

    While trying to work around the original issue, I am now encountering a different issue.  Running two Weblogic instances, each as a Hazelcast node with the configuration below, I get the following messages when the second instance starts:

From console of Weblogic 1:

Oct 23, 2012 11:17:35 AM com.hazelcast.nio.ConnectionManager
INFO: [127.0.0.1]:5701 [dev] 58369 accepted socket connection from /127.0.0.1:5702
Oct 23, 2012 11:17:35 AM com.hazelcast.nio.Connection
INFO: [127.0.0.1]:5701 [dev] Connection [Address[127.0.0.1]:5702] lost. Reason: java.io.EOFException[null]


From console of Weblogic 2:

[10.20.84.213]:5702 [dev] Wrong bind request from Address[127.0.0.1]:5701! This node is not requested endpoint: Address[127.0.0.1]:5702


Hazelcast.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Copyright (c) 2008-2012, Hazel Bilisim Ltd. All Rights Reserved.
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~      http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-2.4.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <license-key>**************************</license-key>
    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
    <management-center enabled="true">http://127.0.0.1:8080/mancenter</management-center>
    <network>
        <port auto-increment="true">5701</port>
        <outbound-ports>
            <!--
            Allowed port range when connecting to other nodes.
            0 or * means use system provided port.
            -->
            <ports>0</ports>
        </outbound-ports>
        <join>
            <multicast enabled="false">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
            </multicast>
            <tcp-ip enabled="true">
                <interface>127.0.0.1</interface>
            </tcp-ip>
            <aws enabled="false">
                <access-key>my-access-key</access-key>
                <secret-key>my-secret-key</secret-key>
                <!--optional, default is us-east-1 -->
                <region>us-west-1</region>
                <!--optional, default is ec2.amazonaws.com. If set, region shouldn't be set as it will override this property -->
                <hostHeader>ec2.amazonaws.com</hostHeader>
                <!-- optional, only instances belonging to this group will be discovered, default will try all running instances -->
                <security-group-name>hazelcast-sg</security-group-name>
                <tag-key>type</tag-key>
                <tag-value>hz-nodes</tag-value>
            </aws>
        </join>
        <interfaces enabled="false">
            <interface>10.10.1.*</interface>
        </interfaces>
        <ssl enabled="false" />
        <socket-interceptor enabled="false" />
        <symmetric-encryption enabled="false">
            <!--
               encryption algorithm such as
               DES/ECB/PKCS5Padding,
               PBEWithMD5AndDES,
               AES/CBC/PKCS5Padding,
               Blowfish,
               DESede
            -->
            <algorithm>PBEWithMD5AndDES</algorithm>
            <!-- salt value to use when generating the secret key -->
            <salt>thesalt</salt>
            <!-- pass phrase to use when generating the secret key -->
            <password>thepass</password>
            <!-- iteration count to use when generating the secret key -->
            <iteration-count>19</iteration-count>
        </symmetric-encryption>
        <asymmetric-encryption enabled="false">
            <!-- encryption algorithm -->
            <algorithm>RSA/NONE/PKCS1PADDING</algorithm>
            <!-- private key password -->
            <keyPassword>thekeypass</keyPassword>
            <!-- private key alias -->
            <keyAlias>local</keyAlias>
            <!-- key store type -->
            <storeType>JKS</storeType>
            <!-- key store password -->
            <storePassword>thestorepass</storePassword>
            <!-- path to the key store -->
            <storePath>keystore</storePath>
        </asymmetric-encryption>
    </network>
    <partition-group enabled="false"/>
    <executor-service>
        <core-pool-size>16</core-pool-size>
        <max-pool-size>64</max-pool-size>
        <keep-alive-seconds>60</keep-alive-seconds>
    </executor-service>
    <queue name="default">
        <!--
            Maximum size of the queue. When a JVM's local queue size reaches the maximum,
            all put/offer operations will get blocked until the queue size
            of the JVM goes down below the maximum.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size-per-jvm>0</max-size-per-jvm>
        <!--
            Name of the map configuration that will be used for the backing distributed
            map for this queue.
        -->
        <backing-map-ref>default</backing-map-ref>
    </queue>
    <map name="default">
        <!--
            Number of backups. If 1 is set as the backup-count for example,
            then all entries of the map will be copied to another JVM for
            fail-safety. 0 means no backup.
        -->
        <backup-count>1</backup-count>
        <!--
            Number of async backups. 0 means no backup.
        -->
        <async-backup-count>0</async-backup-count>
        <!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <time-to-live-seconds>0</time-to-live-seconds>
        <!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map. Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <max-idle-seconds>0</max-idle-seconds>
        <!--
            Valid values are:
            NONE (no eviction),
            LRU (Least Recently Used),
            LFU (Least Frequently Used).
            NONE is the default.
        -->
        <eviction-policy>NONE</eviction-policy>
        <!--
            Maximum size of the map. When max size is reached,
            map is evicted based on the policy defined.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size policy="cluster_wide_map_size">0</max-size>
        <!--
            When max. size is reached, specified percentage of
            the map will be evicted. Any integer between 0 and 100.
            If 25 is set for example, 25% of the entries will
            get evicted.
        -->
        <eviction-percentage>25</eviction-percentage>
        <!--
            While recovering from split-brain (network partitioning),
            map entries in the small cluster will merge into the bigger cluster
            based on the policy set here. When an entry merges into the
            cluster, there might be an existing entry with the same key already.
            Values of these entries might be different for that same key.
            Which value should be set for the key? Conflict is resolved by
            the policy set here. Default policy is hz.ADD_NEW_ENTRY

            There are built-in merge policies such as
            hz.NO_MERGE      ; no entry will merge.
            hz.ADD_NEW_ENTRY ; entry will be added if the merging entry's key
                               doesn't exist in the cluster.
            hz.HIGHER_HITS   ; entry with the higher hits wins.
            hz.LATEST_UPDATE ; entry with the latest update wins.
        -->
        <merge-policy>hz.ADD_NEW_ENTRY</merge-policy>
    </map>

    <map name="near-*">
        <!--
            Number of backups. If 1 is set as the backup-count for example,
            then all entries of the map will be copied to another JVM for
            fail-safety. 0 means no backup.
        -->
        <backup-count>1</backup-count>
        <!--
            Number of async backups. 0 means no backup.
        -->
        <async-backup-count>0</async-backup-count>
        <!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <time-to-live-seconds>0</time-to-live-seconds>
        <!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map. Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
        <max-idle-seconds>0</max-idle-seconds>
        <!--
            Valid values are:
            NONE (no eviction),
            LRU (Least Recently Used),
            LFU (Least Frequently Used).
            NONE is the default.
        -->
        <eviction-policy>NONE</eviction-policy>
        <!--
            Maximum size of the map. When max size is reached,
            map is evicted based on the policy defined.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size policy="cluster_wide_map_size">0</max-size>
        <!--
            When max. size is reached, specified percentage of
            the map will be evicted. Any integer between 0 and 100.
            If 25 is set for example, 25% of the entries will
            get evicted.
        -->
        <eviction-percentage>25</eviction-percentage>
        <!--
            While recovering from split-brain (network partitioning),
            map entries in the small cluster will merge into the bigger cluster
            based on the policy set here. When an entry merges into the
            cluster, there might be an existing entry with the same key already.
            Values of these entries might be different for that same key.
            Which value should be set for the key? Conflict is resolved by
            the policy set here. Default policy is hz.ADD_NEW_ENTRY

            There are built-in merge policies such as
            hz.NO_MERGE      ; no entry will merge.
            hz.ADD_NEW_ENTRY ; entry will be added if the merging entry's key
                               doesn't exist in the cluster.
            hz.HIGHER_HITS   ; entry with the higher hits wins.
            hz.LATEST_UPDATE ; entry with the latest update wins.
        -->
        <merge-policy>hz.ADD_NEW_ENTRY</merge-policy>

        <near-cache>
            <!--
                Maximum size of the near cache. When max size is reached,
                cache is evicted based on the policy defined.
                Any integer between 0 and Integer.MAX_VALUE. 0 means
                Integer.MAX_VALUE. Default is 0.
            -->
            <max-size>250</max-size>
            <!--
                Maximum number of seconds for each entry to stay in the near cache. Entries that are
                older than <time-to-live-seconds> will get automatically evicted from the near cache.
                Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
            -->
            <time-to-live-seconds>0</time-to-live-seconds>

            <!--
                Maximum number of seconds each entry can stay in the near cache as untouched (not-read).
                Entries that are not read (touched) more than <max-idle-seconds> value will get removed
                from the near cache.
                Any integer between 0 and Integer.MAX_VALUE. 0 means
                Integer.MAX_VALUE. Default is 0.
            -->
            <max-idle-seconds>120</max-idle-seconds>

            <!--
                Valid values are:
                NONE (no extra eviction, <time-to-live-seconds> may still apply),
                LRU  (Least Recently Used),
                LFU  (Least Frequently Used).
                NONE is the default.
                Regardless of the eviction policy used, <time-to-live-seconds> will still apply.
            -->
            <eviction-policy>LFU</eviction-policy>

            <!--
                Should the cached entries get evicted if the entries are changed (updated or removed).
                true or false. Default is true.
            -->
            <invalidate-on-change>true</invalidate-on-change>

        </near-cache>
    </map>

    <!-- Add your own semaphore configurations here:
        <semaphore name="default">
            <initial-permits>10</initial-permits>
            <semaphore-factory enabled="true">
                <class-name>com.acme.MySemaphoreFactory</class-name>
            </semaphore-factory>
        </semaphore>
    -->

    <!-- Add your own map merge policy implementations here:
    <merge-policies>
            <map-merge-policy name="MY_MERGE_POLICY">
            <class-name>com.acme.MyOwnMergePolicy</class-name>
        </map-merge-policy>
    </merge-policies>
    -->

</hazelcast>
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/flI9hH_GyIQJ.

Jack Gould

Mar 4, 2013, 3:34:55 PM
to haze...@googlegroups.com
This is an old post, but I did finally figure out exactly what caused the issue.  In retrospect, it seems so obvious.  Sigh.  I'm posting this in case others encounter a similar issue.

The issue turned out to be that the JVMs running the "stand-alone" Hazelcast nodes included the "-Djava.net.preferIPv4Stack=true" argument, but the Weblogic server instances were launched without it.  OS X was basically allowing one process to open port 5701 on an IPv4 address, and another process to open port 5701 on an IPv6 address.
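The dual-bind behavior described above is easy to reproduce with plain java.net sockets; this is a minimal sketch (not from the thread, and it assumes the machine has an IPv6 loopback available). One socket binds an ephemeral port on the IPv4 loopback, and a second socket then binds the very same port number on the IPv6 loopback without an "address already in use" error, because the two binds live in different address families:

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class DualStackBindDemo {
    // Binds an IPv4-only socket to an ephemeral port, then binds a second
    // socket to the same port on the IPv6 loopback. Returns the shared port.
    static int bindBoth() throws Exception {
        try (ServerSocket v4 = new ServerSocket()) {
            v4.bind(new InetSocketAddress("127.0.0.1", 0));
            int port = v4.getLocalPort();
            try (ServerSocket v6 = new ServerSocket()) {
                // Different address family, so this does not collide with the IPv4 bind.
                v6.bind(new InetSocketAddress("::1", port));
                return port;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Both address families bound port " + bindBoth());
    }
}
```

This is exactly why a node expecting "port in use, auto-increment to 5702" can instead come up on 5701: the port is only in use on the other IP stack.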

Moral of the story: make sure all of your JVM options agree with each other.
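One quick way to verify that the options actually agree is to print the effective system property from inside each JVM (standalone node and Weblogic alike). A small hypothetical diagnostic, not part of the thread:

```java
public class PreferIPv4Check {
    // Reports the effective value of the flag; null means it was never passed.
    static String describe() {
        return "java.net.preferIPv4Stack=" + System.getProperty("java.net.preferIPv4Stack");
    }

    public static void main(String[] args) {
        // Run this in every JVM that should join the cluster and compare the output.
        System.out.println(describe());
    }
}
```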

- Jack


On Monday, October 22, 2012 12:23:26 PM UTC-4, Jack Gould wrote:
[ ... ]