Hi Paul, thanks for your patience.
So it seems modcluster was a red herring for me.
I just want ha-singleton and session replication.
Ok, let me try again please:
I have a working ha-singleton failover between hosts node1 10.10.10.2 and node2
10.10.10.3:
21:20:59,325 INFO [org.infinispan.CLUSTER] (thread-11,ejb,node1) [Context=default] ISPN100010: Finished rebalance with members [node1, node2], topology id 5
Now I log into a new web session on node1.
Then I access the http port of node2. Often node2 has no session and acts completely independently.
Sometimes node2 stalls for 15s and then says:
21:21:45,163 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (non-blocking-thread--p8-t15) ISPN000136: Error executing command GetKeyValueCommand on Cache 'xxxportal.ear.xxxportal-war.war', writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey(ZrZPPP5MOuDVxXokBfAO8bZ-MmAHdlapH9GpG9ud) and requestor GlobalTransaction{id=13, addr=node2, remote=false, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=12, addr=node2, remote=false, xid=null, internalId=-1}
What could be the cause?
I also saw your 2021 post about this message and changed the isolation from REPEATABLE_READ to READ_COMMITTED.
Now it seems I don't get this exception anymore, but the nodes still share no session.
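The isolation change amounts to something like the following in the infinispan subsystem. This is a minimal sketch, not a copy of my file: it assumes the stock "web" cache container and "dist" cache that a standalone-ha.xml ships with, and it leaves out the other attributes and child elements.

```xml
<!-- Sketch only: container and cache names are the standalone-ha.xml defaults, assumed here -->
<cache-container name="web" default-cache="dist">
    <distributed-cache name="dist">
        <!-- was isolation="REPEATABLE_READ" -->
        <locking isolation="READ_COMMITTED"/>
        <transaction mode="BATCH"/>
    </distributed-cache>
</cache-container>
```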
(With the UDP setup, I was able to change e.g. a filter in a list in my webapp, and the other node would reflect this upon reload, so I know the app can do it.)
Network-wise, node1 can connect to 10.10.10.3 port 7600 all right, and the other direction works too.
(When shutting down node2 I just saw
22:22:28,408 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger-10,ejb,node2) node2: socket to node1 was closed gracefully
so I guess the network is really ok)
Config (please let me know if you need to see more):
<subsystem xmlns="urn:jboss:domain:jgroups:8.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="TCPPING" socket-binding="jgroups-tcp">
                <property name="initial_hosts">10.10.10.2[7600],10.10.10.3[7600],10.10.10.4[7600]</property>
            </socket-protocol>
            <protocol type="MERGE3"/>
            <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <socket-binding name="ajp" port="${jboss.ajp.port:8009}"/>
    <socket-binding name="http" port="${jboss.http.port:8080}"/>
    <socket-binding name="https" port="${jboss.https.port:8443}"/>
    <socket-binding name="jgroups-tcp" interface="private" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" interface="private" port="57600"/>
...
As I was unable to find a current working example for WildFly 26, I had to patch together pieces I found for older
versions and adapt the syntax/schema. Maybe my tcp stack is incomplete?
I also initially had a different socket-binding="jgroups-tcpping" on the socket-protocol,
but such a binding is defined nowhere, so I changed it to jgroups-tcp. Do I need a dedicated one?
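To make the question concrete: as far as I can tell, the newer schema also offers a socket-discovery-protocol element that takes outbound socket bindings instead of the initial_hosts property. Below is a sketch of what I think that would look like. The binding names jgroups-node1 and jgroups-node2 are made up for illustration, and this is untested on my setup; a third host would be added the same way.

```xml
<!-- In the tcp stack, in place of the TCPPING socket-protocol element -->
<socket-discovery-protocol type="TCPPING" socket-bindings="jgroups-node1 jgroups-node2"/>

<!-- In the socket-binding-group; binding names are hypothetical -->
<outbound-socket-binding name="jgroups-node1">
    <remote-destination host="10.10.10.2" port="7600"/>
</outbound-socket-binding>
<outbound-socket-binding name="jgroups-node2">
    <remote-destination host="10.10.10.3" port="7600"/>
</outbound-socket-binding>
```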
BTW, I start the server using
./bin/standalone.sh --debug 18787 -Djboss.server.base.dir=standalone --server-config=standalone.xml -Djboss.socket.binding.port-offset=0 -Djboss.node.name=node1 -Djboss.bind.address.private=10.10.10.2 -Djboss.bind.address.management=10.10.10.2
(my standalone.xml is really a standalone-ha.xml)
Does this make any sense to you?
Thanks & Cheers Tom.