Scalable Keycloak setup with Kubernetes


j.lak...@gmail.com

Aug 19, 2020, 6:22:40 PM
to Keycloak User
Hi All,

I'm setting up Keycloak in clustered/HA mode within a Kubernetes ecosystem, using an external shared Postgres DB (AWS RDS). Given that the expected workload is in the tens of thousands of requests, my initial idea was to start with a few standalone Keycloak instances. However, going through the docs it became clear that you can't simply treat Keycloak as a stateless worker node and easily scale out to deal with higher load and scale in when the load goes down.

Assuming my understanding is right, the way to go would be Standalone Clustered Mode. However, that doesn't seem very Kubernetes-friendly either: there are Infinispan cache and discovery issues, and upgrades are awkward (you can't simply update the K8s manifest with a new Docker image). I also noticed mentions of several different Docker images throughout the documentation.
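For context on the discovery issue: the usual way to make Infinispan cluster discovery work on Kubernetes is JGroups DNS_PING against a headless Service, so each pod can resolve its peers by DNS. A minimal sketch (the Service name `keycloak-headless` is illustrative, and the `JGROUPS_DISCOVERY_*` variables are the ones documented for the jboss/keycloak image; verify against your image version):

```shell
# Headless Service: a DNS query for its name returns every Keycloak pod IP,
# which JGroups DNS_PING uses for cluster member discovery.
cat > keycloak-headless.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless   # illustrative name
spec:
  clusterIP: None           # headless: DNS resolves to pod IPs directly
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7600
EOF

# The jboss/keycloak image is then pointed at that Service via container
# env vars (assumed from the image's docs; check your version):
#   JGROUPS_DISCOVERY_PROTOCOL=dns.DNS_PING
#   JGROUPS_DISCOVERY_PROPERTIES=dns_query=keycloak-headless.default.svc.cluster.local
```

With this in place, pods joining or leaving the Deployment/StatefulSet are picked up by Infinispan automatically, which is what makes scale-out/scale-in workable at all.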


Then there's the Keycloak operator: https://github.com/keycloak/keycloak-operator . However, it appears to suffer from a number of issues: https://keycloak.discourse.group/t/various-issues-with-keycloak-operator/845

I was curious: what would be the recommended way to set up a scalable Keycloak production cluster on Kubernetes, assuming one starts from scratch?

Many thanks,
Jacek



ed.le...@googlemail.com

Aug 24, 2020, 4:28:40 AM
to Keycloak User
Hi Jacek,

Have you had a look at the Codecentric Keycloak Helm chart? It's got pretty good out-of-the-box support for setting up Infinispan, and for the most part it just works. We had a big spike in usage after a release last week and it handled ~2,400 logins per minute without too much drama (just make sure your DB backend can take it).

The implementation isn't perfect: it doesn't do any automatic scaling out of the box, and there are some nuances we discovered when implementing our own setup, but it sure beats trying to run Keycloak on a single server.
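For anyone landing on this thread later, a multi-replica install of that chart can look roughly like this. The value names (`replicas`, `extraEnv`) follow the codecentric chart's conventions, but treat them as assumptions and check the values.yaml of the chart version you actually install:

```shell
# Minimal values file for a 3-replica Keycloak backed by an external
# Postgres. Value names assumed from the codecentric chart; the DB
# address is a placeholder.
cat > keycloak-values.yaml <<'EOF'
replicas: 3

extraEnv: |
  # Let pods discover each other via JGroups DNS_PING (assumed env vars
  # for the underlying jboss/keycloak image)
  - name: JGROUPS_DISCOVERY_PROTOCOL
    value: dns.DNS_PING
  - name: DB_VENDOR
    value: postgres
  - name: DB_ADDR
    value: my-rds-endpoint.example.com   # placeholder
EOF

# Then install from the codecentric repo:
#   helm repo add codecentric https://codecentric.github.io/helm-charts
#   helm install keycloak codecentric/keycloak -f keycloak-values.yaml
```

The important point is that replication plus discovery has to be configured together; bumping `replicas` alone won't give you a working cluster.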

Happy to go into more detail if you're interested :)

Best,
Ed

Shobhan Nath

May 17, 2021, 4:04:32 AM
to Keycloak User
Hello all,

How did you solve automatic scaling with the Codecentric Keycloak Helm chart, and what nuances did you discover while implementing your cluster? Another question: did you set up an external Infinispan Kubernetes cluster?

Any pointers would be very helpful for me.

Thanks

dev

May 17, 2021, 5:38:14 AM
to Keycloak User
Hi,

I'm also using the Codecentric Keycloak Helm chart, but so far I haven't found a solution for creating an external Infinispan Kubernetes cluster.

I believe it's not easy to achieve, and the starting point should be the WildFly Infinispan Subsystem: https://docs.wildfly.org/21/High_Availability_Guide.html#Infinispan_Subsystem


Thanks

Thomas Darimont

May 17, 2021, 7:23:12 AM
to ed.le...@googlemail.com, Keycloak User
Hi Ed et al.,

I'm one of the maintainers of the codecentric Keycloak Helm chart. I'd be interested as well to learn how the chart could be improved :)

Cheers,
Thomas



Edmond Lepedus

May 18, 2021, 11:42:37 AM
to Thomas Darimont, Keycloak User
Hi Thomas,

Thanks for reaching out, and for your fantastic work on the Keycloak chart! I’m personally loving it. We’ve been using Keycloak under K8s since 2016, but our setup was very basic and super-flaky, losing sessions when pods restarted, etc. I moved everything over to the Codecentric chart about a year ago, and it was a massive improvement. We started out on v8 of the chart, I believe, and it didn’t support horizontal scaling out of the box, but I wrote my own HPA and it worked reasonably well.

The biggest issue we faced was when a minor deployment accidentally upgraded us to version 9 of the chart (we weren’t pinning our chart versions!) and broke "All Teh Things!!1!". While fixing that, I noticed that scaling support had been added, so I removed my HPA and configured it as per the documentation. This was late November last year, and it’s been running without a problem since, serving many thousands of users each day.

The thing I found most difficult was configuring the autoscaling behaviour. For a long time we either had too many pods, or things were crawling due to memory pressure, but the metrics would show RAM usage under 50%, so the autoscaling wouldn’t kick in. IIRC, it wasn’t until I added the ‘behaviours’ section that things really stabilised and started running well. It’s been great ever since, though.
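To give a flavour of what that `behavior` tuning looks like (hypothetical numbers, not Ed's actual config), here is an HPA on the autoscaling/v2beta2 API, which is where the `behavior` field first appeared:

```shell
# HorizontalPodAutoscaler with a `behavior` section to damp flapping:
# scale up quickly under load, scale down slowly and one pod at a time.
# All numbers are illustrative, not a recommendation.
cat > keycloak-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2beta2   # `behavior` was introduced in v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: keycloak
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 min before scaling down
      policies:
        - type: Pods
          value: 1                      # drop at most one pod at a time
          periodSeconds: 300
EOF
```

One design note: scaling a JVM app on memory utilisation is tricky, because the JVM tends to hold on to heap it has already claimed, so "RAM under 50%" in the metrics can coexist with real memory pressure inside the pods; CPU-based targets plus a conservative `scaleDown` window tend to behave more predictably.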



Best,
Ed