I'm running a cluster of 3 Keycloak 13.0.1 Docker instances on AWS Fargate.
Recently we have been facing an issue with Infinispan when the load balancer kills one of the Keycloak tasks and a new one should be spawned.
So much of what is going on there is outside my expertise that I don't know what might be wrong.
There are a bunch of ERRORs in the Keycloak logs, in the order below (some a few seconds apart and some a few minutes apart):
First:
12:01:16,825 ERROR [org.jboss.modcluster] (ServerService Thread Pool -- 58) MODCLUSTER000034: Failed to start advertise listener: java.net.SocketException: bad argument for IP_MULTICAST_IF: address not bound to any interface
12:05:28,303 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 58) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache-container.keycloak: org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache-container.keycloak: org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Initial state transfer timed out for cache org.infinispan.CONFIG on ip-172-1-1-1
Then a bunch of these:
12:05:28,750 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([
12:05:28,976 ERROR [org.jboss.as] (Controller Boot Thread) WFLYSRV0026: Keycloak 13.0.1 (WildFly Core 15.0.1.Final) started (with errors) in 264013ms - Started 726 of 1107 services (52 services failed or missing dependencies, 695 services are lazy, passive or on-demand)
12:05:29,222 ERROR [org.infinispan.topology.LocalTopologyManagerImpl] (timeout-thread--p17-t1) ISPN000230: Failed to start rebalance for cache http-remoting-connector: java.util.concurrent.CompletionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 6 from ip-172-2-2-2
Then a TERM signal at 2021-06-18T12:39:07.031
And then there are dozens of ERRORs like these:
12:38:29,983 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (non-blocking-thread--p14-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache 'work', writing keys [task::ClearExpiredClientInitialAccessTokens]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key task::ClearExpiredClientInitialAccessTokens and requestor CommandInvocation:ip-172-3-3-3:574814. Lock is held by CommandInvocation:ip-172-2-2-2:574663
12:38:34,618 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p18-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache 'work', writing keys [task::ClearExpiredClientInitialAccessTokens]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1570820 from ip-172-4-4-4
12:38:37,022 ERROR [io.undertow.request] (default task-1304) UT005023: Exception handling request to /auth/realms/myrealm/protocol/openid-connect/auth: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1570865 from ip-172-5-5-5
12:38:39,477 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p18-t1) ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1570904 from ip-172-6-6-6
This results in an unresponsive Keycloak cluster. The login form is displayed, but every action fails with an internal error and nothing helps.
I have to shut everything down and clear the Infinispan database tables; that allows a fresh start of the tasks and establishing the cluster again, but at the price of a huge downtime.
I would be grateful for help here. Thanks.
Best,
Szymon