Problem with clustering configuration in OpenShift


John Sanda

Oct 9, 2017, 3:08:49 PM
to jgroups-dev
I am running a WildFly 10 application in OpenShift that uses JGroups/Infinispan for clustering. The Infinispan version is 8.1.8.Final. The JGroups version is 3.6.4.Final. I run into problems when two replicas are started at the same time. I see the following in my logs:

WARN  [org.jgroups.protocols.ASYM_ENCRYPT] (thread-11,ee,hawkular-metrics-4npd3) hawkular-metrics-4npd3: unrecognized cipher; discarding message from hawkular-metrics-j9d1q
...
WARN  [org.jgroups.protocols.openshift.KUBE_PING] (thread-2,ee,hawkular-metrics-4npd3) Error sending ping request: url [http://10.129.0.17:8888], clusterName [ee], attempts[1]: Connection refused (Connection refused)
...
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-3,ee,hawkular-metrics-4npd3) ISPN000094: Received new cluster view for channel server: [hawkular-metrics-4npd3|1] (2) [hawkular-metrics-4npd3, hawkular-metrics-j9d1q]
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-3,ee,hawkular-metrics-4npd3) ISPN000094: Received new cluster view for channel hawkular-metrics: [hawkular-metrics-4npd3|1] (2) [hawkular-metrics-4npd3, hawkular-metrics-j9d1q]
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-3,ee,hawkular-metrics-4npd3) ISPN000094: Received new cluster view for channel web: [hawkular-metrics-4npd3|1] (2) [hawkular-metrics-4npd3, hawkular-metrics-j9d1q]
INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-3,ee,hawkular-metrics-4npd3) ISPN000094: Received new cluster view for channel hawkular-alerts: [hawkular-metrics-4npd3|1] (2) [hawkular-metrics-4npd3, hawkular-metrics-j9d1q]
ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (thread-3,ee,hawkular-metrics-4npd3) ISPN000136: Error executing command PrepareCommand, writing keys [buckets, previousPartition, currentPartition]: org.infinispan.util.concurrent.TimeoutException: Replication timeout for hawkular-metrics-j9d1q
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:772)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:615)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47)
	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:16)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2017-08-08 01:28:38,169 ERROR [org.infinispan.transaction.impl.TransactionCoordinator] (thread-3,ee,hawkular-metrics-4npd3) ISPN000097: Error while processing a prepare in a single-phase transaction: org.infinispan.util.concurrent.TimeoutException: Replication timeout for hawkular-metrics-j9d1q
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:772)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:615)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47)
	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:16)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Here is my configuration from standalone.xml:

        <subsystem xmlns="urn:jboss:domain:jgroups:4.0">
            <channels default="ee">
                <channel name="ee" stack="tcp"/>
            </channels>
            <stacks>
                <stack name="udp">
                    <transport type="UDP" socket-binding="jgroups-udp"/>
                    <protocol type="kubernetes.KUBE_PING"/>
                    <protocol type="MERGE3"/>
                    <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="ASYM_ENCRYPT">
                        <property name="encrypt_entire_message">true</property>
                        <property name="sym_keylength">128</property>
                        <property name="sym_algorithm">AES</property>
                        <property name="asym_keylength">512</property>
                        <property name="asym_algorithm">RSA</property>
                    </protocol>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="UFC"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG2"/>
                    <protocol type="AUTH">
                        <property name="auth_class">org.jgroups.auth.MD5Token</property>
                        <property name="auth_value">${jgroups.password}</property>
                        <property name="token_hash">MD5</property>
                    </protocol>
                </stack>
                <stack name="tcp">
                    <transport type="TCP" socket-binding="jgroups-tcp"/>
                    <protocol type="kubernetes.KUBE_PING" socket-binding="jgroups-mping"/>
                    <protocol type="MERGE3"/>
                    <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
                    <protocol type="FD"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="ASYM_ENCRYPT">
                        <property name="encrypt_entire_message">true</property>
                        <property name="sym_keylength">128</property>
                        <property name="sym_algorithm">AES</property>
                        <property name="asym_keylength">512</property>
                        <property name="asym_algorithm">RSA</property>
                    </protocol>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG2"/>
                    <protocol type="AUTH">
                        <property name="auth_class">org.jgroups.auth.MD5Token</property>
                        <property name="auth_value">${jgroups.password}</property>
                        <property name="token_hash">MD5</property>
                    </protocol>
                </stack>
            </stacks>
        </subsystem>
        ...
    <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
        <socket-binding name="jgroups-mping" interface="private" port="0" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45700"/>
        <socket-binding name="jgroups-tcp" interface="private" port="7600"/>
        <socket-binding name="jgroups-tcp-fd" interface="private" port="57600"/>
        <socket-binding name="jgroups-udp" interface="private" port="55200" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45688"/>
        <socket-binding name="jgroups-udp-fd" interface="private" port="54200"/>
        <socket-binding name="modcluster" port="0" multicast-address="224.0.1.105" multicast-port="23364"/>
    </socket-binding-group>

Are there any problems with my configuration? If I restart the pod, then I do not encounter the above errors in the logs.

Thanks

John

Bela Ban

Oct 10, 2017, 6:08:01 AM
to jgrou...@googlegroups.com
Hmm, this is the old version of KUBE_PING, but it seems Kubernetes
cannot be contacted:

http://10.129.0.17:8888

You may have to configure KUBE_PING with a port of 8888. It is best to run
kubectl exec -it container bash and check which address:port the Kubernetes
master uses.
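
If a shell in the container is not convenient, the same reachability check can be sketched in Java. This is only a minimal illustration, not part of JGroups or KUBE_PING: the class name PortProbe and the method isReachable are hypothetical, and the default address and port are simply the ones from the "Connection refused" log line above. It only tells you whether something accepts TCP connections on that address:port, not whether it is the Kubernetes master or another pod.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused", unknown hosts and connect timeouts all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the log line in this thread; override via arguments.
        String host = args.length > 0 ? args[0] : "10.129.0.17";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8888;
        boolean ok = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Running it from inside each pod against the other pod's address, right after scaling up, should show whether the port is simply not open yet at the time of the first ping attempt.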

I'm afraid I don't really know the old code; the new KUBE_PING is at [1]
but requires JGroups 4.x.

[1] https://github.com/jgroups-extras/jgroups-kubernetes

--
Bela Ban, JGroups lead (http://www.jgroups.org)

John Sanda

Oct 17, 2017, 4:58:17 PM
to jgroups-dev
Hi Bela,

We are using the old version of KUBE_PING because we are running on EAP. I have not been able to reproduce the problem on the upstream version, which uses a more recent version of ISPN (I am not sure of the JGroups version offhand, though). The error happens when I scale down to zero pods and back up to 2. The "Error sending ping request..." message comes from one pod trying to contact the other replica. At one point I thought the problem stemmed from scaling down and then back up while pods were still in the terminating state, but I have also run into it when there are no terminating pods. Any suggestions on configuration changes I might want to try?

Bela Ban

Oct 19, 2017, 4:21:48 AM
to jgrou...@googlegroups.com
Hi John,

I'm afraid I'm not familiar with the old version of KUBE_PING, but
perhaps the OpenShift team can help you out? Sebastian might be able
to help you, too.

--
Bela Ban | http://www.jgroups.org
