Downtime during deployment due to Infinispan sync errors (HA in Kubernetes)

Imre Kelényi

Jul 14, 2021, 6:40:05 PM
to Keycloak User
Hi,

We are hosting Keycloak 14 in a K8s cluster in HA mode (3 instances), set up via the codecentric Helm chart. Everything works fine, except that when a pod is terminated (e.g. during a redeploy of the service), we start receiving internal server errors from the other pods:

2021-07-14 21:51:42,222 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p17-t1) ISPN000136: Error executing command RemoveCommand on Cache 'authenticationSessions', writing keys [abaa03b2-a829-4a7b-8a36-afe9e63cd82d]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1038 from auth-service-1

It seems that the running pods serving the requests are still trying to sync with the terminated pod's Infinispan instance (which obviously fails). This results in errors for the clients trying to request access tokens from the service (= downtime).
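For context, the relevant part of our Helm values looks roughly like this (simplified; the headless service name and namespace below are placeholders, not our actual values):

    replicas: 3
    extraEnv: |
      # JGroups discovery over the headless service (DNS_PING);
      # dns_query is a placeholder for the real service DNS name
      - name: JGROUPS_DISCOVERY_PROTOCOL
        value: dns.DNS_PING
      - name: JGROUPS_DISCOVERY_PROPERTIES
        value: dns_query=keycloak-headless.default.svc.cluster.local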

Any ideas what could be wrong?

Imre Kelényi

Jul 15, 2021, 3:03:45 AM
to Keycloak User
Also, what's even weirder is that if I set the cache owners count to 1 (or omit the setting entirely to use the default values), I still see the Infinispan errors in the log when one of the pods goes down. Shouldn't Keycloak instances stop syncing their caches when there is only 1 cache owner?

    - name: CACHE_OWNERS_COUNT
      value: "1"
    - name: CACHE_OWNERS_AUTH_SESSIONS_COUNT
      value: "1"

Ben Shaver

Jul 16, 2021, 3:51:59 AM
to Keycloak User
Even if you set the cache owners to 1, there is still replication.
A brief explanation of how the distributed cache works:

Say you have 2 instances of Keycloak, and the session cache is configured as a distributed-cache with owners=1. When a user tries to log in, the cache entry is created on either of the instances (even if the request goes to instance A, the entry could be created on instance B). So yes, there is only one owner, but the Infinispan cache is a key-value cache: even though the value lives on a single instance, the key ownership itself is replicated across the cluster no matter what the owners setting is.
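To make that concrete, the cache definitions in Keycloak's standalone-ha.xml look roughly like this (simplified sketch; owners controls how many copies of each value exist, not whether the nodes talk to each other):

    <cache-container name="keycloak">
        <transport lock-timeout="60000"/>
        <!-- one copy of each value, but segment/key ownership is
             still coordinated across the whole cluster -->
        <distributed-cache name="sessions" owners="1"/>
        <distributed-cache name="authenticationSessions" owners="1"/>
    </cache-container>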

On Thursday, July 15, 2021 at 10:03:45 AM UTC+3, imre.k...@gmail.com wrote:

Imre Kelényi

Jul 17, 2021, 10:08:15 AM
to Keycloak User
Thanks, I've also come to the same conclusion :(

The solution I'm looking into now is setting up a separate Infinispan cluster and making Keycloak use it as the caching layer, so that the cache cluster stays operational while Keycloak pods are terminated.
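The cross-datacenter section of the Keycloak docs describes this kind of wiring via a remote-store on each cache, pointing at the external Infinispan server over Hot Rod. Roughly (the remote-servers name is a placeholder, and details vary by version):

    <distributed-cache name="sessions" owners="1">
        <!-- offload entries to the external Infinispan cluster -->
        <remote-store cache="sessions" remote-servers="remote-cache" shared="true"
                      passivation="false" fetch-state="false" purge="false" preload="false">
            <property name="rawValues">true</property>
            <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
        </remote-store>
    </distributed-cache>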

Ben Shaver

Jul 17, 2021, 1:36:15 PM
to Keycloak User
If this is your goal, check out Data Grid.
I think that's what you need.
On Saturday, July 17, 2021 at 5:08:15 PM UTC+3, imre.k...@gmail.com wrote:

Susant Padhi

Jul 1, 2022, 11:59:30 AM
to Keycloak User
Hello, would you please share your solution? I am looking for Keycloak HA on Kubernetes and came across https://keycloak.discourse.group/t/cannot-login-administrator-console-when-running-keycloak-with-replicas-1-on-kubernetes/9022/5. My question is: what would be the right solution for caching? I am using Keycloak 14.0.0; is it necessary to include keycloak.cli in the Docker image?