Hi,
I am running a Keycloak cluster in Kubernetes with an HA configuration and JGroups KUBE_PING (most of the time my setup has 3 cluster nodes). This setup works fine in my UAT clusters. But in my production setup, if one pod goes down (because of a new deployment, a pod deletion, or a liveness probe failure), all the other pods also go into an unready state, resulting in application downtime of about 3 minutes.
Can someone please help me to identify the issue here? I can provide more information if required.
PS: I am using keycloak-6.0.1 for this setup (I know it is an older version and I am planning to upgrade). I am also using /auth/realms/master as the readiness probe and /auth/ as the liveness probe, with a 10s timeout.
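For reference, this is roughly what the probe configuration looks like (the paths and the 10s timeout are as described above; the port and the other probe values are only illustrative, not my exact production settings):

readinessProbe:
  httpGet:
    path: /auth/realms/master
    port: 8080          # illustrative; whatever port the Keycloak HTTP listener uses
  timeoutSeconds: 10
  periodSeconds: 10     # illustrative
  failureThreshold: 3   # illustrative
livenessProbe:
  httpGet:
    path: /auth/
    port: 8080
  timeoutSeconds: 10
  periodSeconds: 15     # illustrative
  failureThreshold: 3   # illustrative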
Thanks and Regards,
Tishan.
I’ve been having issues with Keycloak HA as well, where a cluster member unexpectedly leaving the cluster causes Keycloak to stop functioning properly.
I’m using JDBC_PING though.
David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia
Office: 02 9212 0899
Online: 02 8005 0595
I did notice that there seems to be a bug with JDBC_PING (at least in older versions of Keycloak) where nodes that leave the cluster don’t clean up after themselves: the database connection appears to be closed before the jgroups table can be updated, which has left the jgroups table containing a lot of invalid entries and caused a lot of performance problems. Using “remove_old_coords_on_view_change” and “remove_all_files_on_view_change” works around that issue, though, as the jgroups table gets rewritten whenever there is a new cluster view.
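For anyone else hitting this, here is roughly where such a property goes in the JGroups stack in standalone-ha.xml. A sketch only: the stack name, the datasource JNDI name and the surrounding protocols are illustrative and will differ per setup, and the attribute shown is the remove_all_data_on_view_change spelling discussed later in this thread (the remove_all_files_on_view_change spelling above may apply to other JGroups versions). See also the keycloak.org JDBC_PING blog post linked further down.

<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="JDBC_PING">
        <!-- datasource backing the jgroups ping table; JNDI name is illustrative -->
        <property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS</property>
        <!-- rewrite the ping table on every new cluster view so that entries
             left behind by crashed members get cleaned up;
             remove_old_coords_on_view_change is the narrower alternative -->
        <property name="remove_all_data_on_view_change">true</property>
    </protocol>
    <!-- remaining protocols of the stack (MERGE3, FD_SOCK, pbcast.NAKACK2, ...) unchanged -->
</stack>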
Regarding Infinispan, we use a lot of the out-of-the-box “Standalone Clustered Configuration”; I haven’t delved deep enough into the Infinispan configuration (yet). Looking now… it seems that there is a range of local, distributed (most with 1 owner, one with 2 owners), invalidation, and replicated caches out of the box. I have been thinking about experimenting with making more of the caches replicated.
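To make that concrete, this is the sort of change I have been considering in the “keycloak” cache container in standalone-ha.xml. A sketch only: the cache names come from the stock configuration, but neither option below is something I have validated in production.

<cache-container name="keycloak">
    <transport lock-timeout="60000"/>
    <!-- option A: keep the cache distributed but hold a backup copy of each entry on a second node -->
    <distributed-cache name="sessions" owners="2"/>
    <!-- option B (the experiment mentioned above): make the cache fully replicated,
         so every node holds every entry -->
    <!-- <replicated-cache name="sessions"/> -->
    <!-- the same idea applies to authenticationSessions, offlineSessions, clientSessions,
         offlineClientSessions and loginFailures; local and invalidation caches unchanged -->
</cache-container>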
The Keycloak clustering leaves a bit to be desired so far.
David Cook
Thanks for sharing that. Definitely looks like an either/or. I’d only tried the properties on a test instance, so hadn’t looked into it too deeply yet. I was just basing it off what I read at https://www.keycloak.org/2019/08/keycloak-jdbc-ping.
Are you still having problems even with “remove_all_data_on_view_change”?
David Cook