How to support 5+ million sessions?

Silvan Jegen

Aug 19, 2021, 11:42:19 AM
to Keycloak User
Hi everyone

Ideally we want to support 5+ million sessions in our Keycloak cluster to be able to keep our users logged in for 3 months. Our current cluster of 8 machines (with 16GB of RAM each) has issues supporting 1.5 million sessions because our machines start to reboot due to an out-of-memory condition.
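
As a rough back-of-envelope check of what those numbers imply, the sketch below divides the cluster's total RAM by the session count. It assumes that 16GB means 16 GiB, that all RAM is available to the session cache (it is not: the JVM, Keycloak and the OS take their share), and that `owners` (a hypothetical parameter for the number of copies kept per entry) is 1 unless set otherwise. The result is therefore only an upper bound on the per-session budget, not a measurement.

```python
# Back-of-envelope memory budget per session, using the numbers from the post.
# ASSUMPTION: every byte of RAM is usable for sessions, so this is an upper bound.

GIB = 1024 ** 3


def bytes_per_session(nodes: int, ram_gib_per_node: int, sessions: int, owners: int = 1) -> float:
    """Upper bound on the bytes available per stored copy of a session.

    `owners` is the number of copies of each entry kept in the cluster
    (an assumed parameter; use your cache's actual setting).
    """
    total_bytes = nodes * ram_gib_per_node * GIB
    return total_bytes / (sessions * owners)


current = bytes_per_session(nodes=8, ram_gib_per_node=16, sessions=1_500_000)
target = bytes_per_session(nodes=8, ram_gib_per_node=16, sessions=5_000_000)

print(f"at 1.5M sessions: {current / 1024:.1f} KiB per session")  # 89.5 KiB
print(f"at 5.0M sessions: {target / 1024:.1f} KiB per session")   # 26.8 KiB
```

Under these assumptions, reaching 5 million sessions on the same hardware would require each session to fit in under 27 KiB, and less than that once replication and non-cache memory are accounted for.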

Is there any good practice for scaling Keycloak that we can follow? Currently we are using an Infinispan cache on each machine, but we have read online that a standalone Infinispan cluster may be better suited to this use case, for example.

Does anyone have some experience with maintaining similar session counts to share with us?

Alternatively, can somebody point us to some Keycloak experts who could consult for us on this matter?

Many thanks for your support!


Cheers,

Silvan

Phil Fleischer

Aug 21, 2021, 1:28:52 PM
to Silvan Jegen, Keycloak User
Hey Silvan,

We have a similar workload.

I have not seen a great single source of documentation. Common sense would tell you a standalone Infinispan cluster would be preferred, but we haven’t had much success ourselves (functionally it works... BUT under load in production we’ve not been able to successfully implement without numerous runtime errors).

In our case we chose fewer nodes with much larger memory footprints, with JVM options tuned to the memory needs of Infinispan, not Keycloak (see link below). In our experience, the more nodes are in use, the more cache syncing is involved, and that consumes compute.

This is obviously just our impression, but might help you work through it.  If you have any breaking news would love to hear about it!

— Phil

-server
-Xmx32G
-Xms32G
-Xmn8G
-XX:+UseG1GC



--
You received this message because you are subscribed to the Google Groups "Keycloak User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-user/91fcf4dc-eea0-4570-bd93-473bea128769n%40googlegroups.com.

Phil Fleischer

Aug 21, 2021, 1:35:57 PM
to Silvan Jegen, Keycloak User
A few more functional comments…

Keep in mind that the offline session timeout (3 months) applies to INACTIVE sessions, meaning the user has neither used your app nor refreshed the token in that time. Depending on the app, that may or may not be a really long time to go without logging in.

Another thing: I’m not convinced that the per-session memory size of offline sessions is the same for every app. In my opinion, whatever is cached internally as an object probably has way too many relations expanded to need this much memory for 5 million records, but lo and behold… such is life…

— Phil

Silvan Jegen

Aug 23, 2021, 7:47:33 AM
to Phil Fleischer, Keycloak User
Hi Phil

On Sat, Aug 21, 2021 at 7:28 PM Phil Fleischer
<phillip....@gmail.com> wrote:
> I have not seen a great single source of documentation. Sense would tell you a standalone infinispan cluster would be preferred but we haven’t had much success ourselves (functionally it works... BUT under load in production we’ve not been able to successfully implement without numerous runtime errors)

hm, that is somewhat discouraging :P


> In our case we chose fewer nodes with much larger memory footprints with JVM options tuned to the memory needs of infinispan not keycloak (see link below). In our experience the more nodes in use, the more cache syncing that is involved and consuming compute.

At some point, scaling up (mostly) vertically will not be feasible
anymore though. I wonder whether there is a way to address the issue of
needing that much RAM in the first place ...


> This is obviously just our impression, but might help you work through it. If you have any breaking news would love to hear about it!

If we find something, we will definitely let you know!


Cheers,

Silvan

Garth

Aug 23, 2021, 2:00:44 PM
to keyclo...@googlegroups.com
Hi Phil,

Can you elaborate on "we’ve not been able to successfully implement without numerous runtime errors"? I've done a similar setup (albeit not at the same scale) with no problems. I'm curious if it is related to config, cluster size, load, etc. or some combination of factors.

Also, one of the driving factors behind using a remote infinispan was to use a backing cache store so that we wouldn't lose sessions on instance failure. Are you able to use an infinispan cache store when running it in the Keycloak process? We ran into some problems doing that given some of the peculiarities of how Keycloak uses it when it's local.

Best regards,
Garth


Phil Fleischer

Aug 23, 2021, 5:22:10 PM
to Garth, keyclo...@googlegroups.com
Hey Garth,

We had already been using the cluster configuration with embedded sessions for a while before we began to look at going to remote Infinispan, basically using the standalone-ha.xml configuration with some minor configuration changes.

When we switched the configuration to remote (following documentation and similar to GitHub link below) we experienced performance and timeout issues (see error below) loading the offline sessions from the database. Eventually the timeouts exceeded Keycloak's tolerance for failures (20). I checked with our team and we didn’t have runtime issues; it was only during the session-loading phase.

We haven’t thrown in the towel yet, but it was a project larger than we expected.

We are working through some documents covering these issues. Disclaimer: they may vary with the version of Keycloak/Infinispan.

— Phil


2021-08-18 19:34:15,945 ERROR [org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer] (ServerService Thread Pool -- 55) ExecutionException when computed future. Errors: 37: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:852)
        at org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer.startLoadingImpl(InfinispanCacheInitializer.java:160)
        at org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer.startLoading(InfinispanCacheInitializer.java:108)
        at org.keycloak.models.sessions.infinispan.initializer.DBLockBasedCacheInitializer.startLoading(DBLockBasedCacheInitializer.java:75)
        at org.keycloak.models.sessions.infinispan.initializer.CacheInitializer.loadSessions(CacheInitializer.java:41)
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$2.run(InfinispanUserSessionProviderFactory.java:175)
        at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.loadPersistentSessions(InfinispanUserSessionProviderFactory.java:161)




Christian Becker

Aug 23, 2021, 6:11:43 PM
to Keycloak User
Hey,

we also have a similar workload (~3M offline sessions / 6 months inactivity) and the most important setting is to use G1GC, else the GC will kill your cluster quite easily.

We’re running it as one single Infinispan cluster, but we’ve split the workloads. Only a few selected nodes are behind the load balancer, and these nodes hold no data. So there are some nodes with all the Infinispan data and a few nodes which only serve frontend requests. This also makes it easier to deploy some of our plugins, as we basically only need to deploy them on these “frontend” nodes. This might not be strictly necessary, but it gives us a better feeling and better control over what’s happening in the cluster.

Unfortunately, we’ve recently also experienced timeouts, as described by Phil. We don’t know yet what causes them, but they are concerning and seem to happen at random. If we replace a node, it either works and joins successfully or it doesn’t and suddenly we have a single node without any cluster knowledge. Fortunately this has only happened to our “backend” nodes so far and customer impact is limited, though sometimes the cluster gets “unstable” and takes a while to settle before the error rate is 0 again.

Fortunately these issues were only transient so far and didn’t require a full cluster restart (which is another pain point with that many sessions, as a cold start takes ~15 minutes). It is also still very painful that whenever there’s a major version bump, you need to cold start the cluster, which takes quite some time for the database migrations and then the aforementioned 15 minutes to fetch all sessions from the database. Even when all of this works successfully, we’ve had a 50% failure rate during startup, as sometimes it just suddenly decides to time out on Infinispan requests and you can try again...
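
To put the cold start figure in perspective, here is a small sketch of the implied load rate, using the ~3M sessions and ~15 minutes mentioned in this post. The projection to 5 million sessions assumes that load time scales linearly with the session count, which is an assumption and not something measured in this thread.

```python
# Implied session load rate during a cold start, from the figures in this post.
# ASSUMPTION: load time scales linearly with the number of sessions.


def load_rate(sessions: int, minutes: float) -> float:
    """Sessions loaded per second, given a total count and a duration."""
    return sessions / (minutes * 60)


def projected_minutes(sessions: int, rate_per_second: float) -> float:
    """Minutes needed to load `sessions` at a constant rate."""
    return sessions / rate_per_second / 60


rate = load_rate(3_000_000, 15)
print(f"implied rate: {rate:.0f} sessions/s")  # 3333 sessions/s
print(f"5M sessions at that rate: {projected_minutes(5_000_000, rate):.0f} min")  # 25 min
```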

FTR: We’re still on 9.0, as we’re running redhat-sso and not vanilla Keycloak.

In most cases we were debugging these issues on our own, as redhat support requires quite a lot of data and in most cases this data collection helps us to find the cause on our own. Though they were very helpful with some recent jgroups issue which caused some infinispan issues as well.

Cheers,
Christian

Thomas Darimont

Aug 24, 2021, 2:28:47 AM
to Christian Becker, Keycloak User
Hello all,

Regarding the long Keycloak start times with offline sessions in older Keycloak / RH-SSO versions:
If you haven't already configured it, you might benefit from setting the sessionsPerSegment setting of the user session SPI to a higher value like 512 (defaults to 64).
Note that for larger values you might have to increase the workmem of your database.

See: 
https://medium.com/swlh/how-to-make-keycloak-start-up-faster-when-there-are-a-lot-of-offline-sessions-78ee49a907cb
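
A quick sketch of why a larger value can help, assuming the preload splits all offline sessions into segments of sessionsPerSegment entries and loads each segment as one unit of work (that is a reading of the setting's name, not a confirmed description of the loader). The `segment_count` helper and the 3 million session total are illustrative only; the total is the rough figure mentioned earlier in this thread.

```python
import math


def segment_count(total_sessions: int, sessions_per_segment: int) -> int:
    """Number of segments needed to cover all sessions (last one may be partial)."""
    return math.ceil(total_sessions / sessions_per_segment)


total = 3_000_000  # ASSUMPTION: roughly the offline session count mentioned above

for per_segment in (64, 512):
    print(f"sessionsPerSegment={per_segment}: {segment_count(total, per_segment)} segments")
# sessionsPerSegment=64: 46875 segments
# sessionsPerSegment=512: 5860 segments
```

Fewer, larger segments mean fewer round trips but more rows per unit of work, which fits the note above about possibly needing more database work memory.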

Also note that since Keycloak 14.0.0 it is possible to disable the preloading of offline sessions on startup. This significantly reduces startup time for Keycloak, since offline sessions are then loaded into memory on an as-needed basis, and you can also limit the number of offline sessions that remain in memory via cache eviction policies in Infinispan.
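
The general idea of on-demand loading combined with a bounded cache can be illustrated with a few lines of Python. This is only a conceptual sketch using least-recently-used eviction; it is not how Keycloak or Infinispan implement it, and `LazySessionCache`, `load_from_db` and `fake_db` are hypothetical names made up for the example.

```python
from collections import OrderedDict
from typing import Callable


class LazySessionCache:
    """Loads a session on first access and evicts the least recently used one."""

    def __init__(self, load_from_db: Callable[[str], dict], max_entries: int):
        self._load = load_from_db      # stand-in for a database lookup
        self._max = max_entries        # upper bound on sessions kept in memory
        self._entries: "OrderedDict[str, dict]" = OrderedDict()
        self.db_reads = 0

    def get(self, session_id: str) -> dict:
        if session_id in self._entries:
            self._entries.move_to_end(session_id)  # mark as most recently used
            return self._entries[session_id]
        self.db_reads += 1                         # cache miss: go to the database
        session = self._load(session_id)
        self._entries[session_id] = session
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)      # evict the least recently used
        return session

    def __len__(self) -> int:
        return len(self._entries)


# Hypothetical stand-in for the offline session table.
fake_db = {f"s{i}": {"id": f"s{i}"} for i in range(5)}

cache = LazySessionCache(fake_db.__getitem__, max_entries=2)
for sid in ("s0", "s1", "s0", "s2"):
    cache.get(sid)

print(len(cache), cache.db_reads)  # 2 3
```

Nothing is read at startup, memory use stays bounded by `max_entries`, and the price is a database read whenever an evicted session is needed again.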

In my keycloak-project-example I recently added an example configuration that shows how you can run keycloak with an external infinispan cluster: https://github.com/thomasdarimont/keycloak-project-example/tree/main/deployments/local/cluster/haproxy-external-ispn

This gives you much more control over the Infinispan configuration and also allows you to do a full Keycloak cluster restart without using sessions.

I also just posted yesterday an example of using the jdbc-store support of Infinispan to persist user sessions in the database instead of only having them in memory, which you might find interesting as well: https://groups.google.com/g/keycloak-dev/c/sOBzG76f2FE

Cheers,
Thomas

Thomas Darimont

Aug 24, 2021, 2:32:11 AM
to Christian Becker, Keycloak User
> without using sessions
Should be without losing sessions ;-)

Typed while on the go :)

Zhandos Zhylkaidar

Aug 24, 2021, 3:12:43 AM
to Thomas Darimont, Keycloak User
Hello Thomas,

Thanks for the information!
I have also read your previous email regarding jdbc-store and write through being disabled in 15.0.2.

So far, I see that there are a couple of ways of storing sessions externally, one being offline sessions and the other jdbc-store write-through.

I am curious which one is the "right" way to do it. And what are the advantages of one over the other?

Thanks, 
Zhandos.

Phil Fleischer

Aug 24, 2021, 9:23:28 AM
to Zhandos Zhylkaidar, Thomas Darimont, Keycloak User
Thanks Everyone! We are on an older version so some of these options would require some upgrades but it is good to know. 

Regarding Zhandos's question: I personally feel most modern RDBMSs have a built-in cache mechanism, which is typically easier to manage and as fast as a remote Infinispan. If it were up to me, I’d make this the default and add a provider for a custom cache only if necessary. Infinispan is very ingrained in the project, though that may recently have started to unwind with the move to Quarkus.

— Phil


Garth

Aug 24, 2021, 7:47:30 PM
to keyclo...@googlegroups.com
+1 for another implementation that does not require Infinispan. Do we know where the new data store project from the Keycloak team stands? I found a design proposal once upon a time, but never heard about progress. It is definitely one of the biggest pain points for customers I work with. I'm happy to give time to that project if there is already a definition.


Phil Fleischer

Aug 25, 2021, 2:06:43 AM
to Christian Becker, Keycloak User
As a side note, versioning is definitely a big variable… We are using 5.0.0 and infinispan 9.4 so we are trying to upgrade but time is money as they say.

Sven-Torben Janus

Sep 1, 2021, 2:24:32 PM
to Keycloak User
> Do we know where the new data store project is from the Keycloak team?
I think there was some work for the store.x and there is already a new map-based implementation (https://issues.redhat.com/browse/KEYCLOAK-14550).
With the JPA store (https://issues.redhat.com/browse/KEYCLOAK-17632) using a database would be possible, but I have not seen any effort on this lately.