Performance/Cache issues in keycloak 12.0.4

886 views
Skip to first unread message

Matysiak Joerg (IOC/PAU1)

unread,
Apr 28, 2021, 11:19:57 AM4/28/21
to keyclo...@googlegroups.com, EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)
Hi,

we're facing some performance issues with keycloak 12.0.4 when creating and deleting many realms.
We couldn't reproduce the same with keycloak 11.

Digging deeper into the problem makes us think that this may be a problem that may also occur in other scenarios as well.

Looks like the cache gets into an inconsistent state that isn't repaired automatically.
The inconsistent cache obviously cannot be used by keycloak, so that all requests have to be handled by the database,
which causes a massive performance breakdown.

Restarting keycloak or clearing the realm cache seems to repair a running keycloak.

Test setup:

* Image based on quay.io/keycloak/keycloak:12.0.4
* configuration: standalone-ha.xml
* Problem observered in these environments:
* single instance running in docker on developer machine using postgres
* multi-instance environment running on Azure Kubernetes using mssql

Test actions:

* Created 150 realms, delete them again, repeat 1-5 times
* operations run sequentially (non parallel)

Oberservations:

* At some point in time keycloak becomes very slow, requests take multiple seconds
* this happens in ~80% of the cases, we also had test runs where it didn't happen

What we already found out:

* some objects e.g. clientscope "offline_access" or "web-origin" are no longer found in realm cache
* the cached object in [1] is null, even though there is an object with the id in the cache
* looks like the revision information in the "revisions" cache [2] does no longer match the revision information within the object in the "cache" cache [3]
* written in meta-code:
* cache.get(id).revision contains a small number (0 in most times, which seems to be correct) (see [3])
* revisions.get(id) returns a much higher number even if the object was not updated in between
* therefore the cached object is considered to be outdated (condition at [4] is true)
* the object is reloaded from the database

* the cache cannot be updated because there is already an object in the cache with the same id
* the call in [5] does not update the object (observed when debugging into the details of infinispan)

* since the cache is not updated, this happens from now on all the time when the object is requested (starting with "the cached object is null")

* This situation can only be fixed by restarting keycloak or clearing the realm cache manually

[1] https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/RealmCacheSession.java#L1196
[2] "revisions" cache: https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L58
[3] "cache" cache: https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L59
[4] https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L107
[5] https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L164


Did someone observe similar things?
Is there already a solution for this?

Until now we didn't manage to find out why the revisions cache is (falsely) updated, does someone have an explanation for this?

Mit freundlichen Grüßen / Best regards

Jörg Matysiak

Product Area User Management (IOC/PAU1)
Bosch.IO GmbH | Stuttgarter Straße 130 | 71332 Waiblingen | GERMANY | www.bosch.io

Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling


Pedro Igor Craveiro e Silva

unread,
Apr 28, 2021, 11:34:02 AM4/28/21
to Matysiak Joerg (IOC/PAU1), keyclo...@googlegroups.com, EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)
Would you be able to run the same tests using https://github.com/keycloak/keycloak/pull/7857?

Recently, we found that Infinispan changed the encoding configuration which required a fix in both Wildfly and KC. That PR includes both. Your issue does not seem related to the original issue we are fixing in that PR but perhaps it might be related.

The behavior is weird and I did not hear anyone else facing the same issue.

--
You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/AM0PR10MB1923441C493C5DEC7FEA1B45ED409%40AM0PR10MB1923.EURPRD10.PROD.OUTLOOK.COM.

Matysiak Joerg (IOC/PAU1)

unread,
Apr 29, 2021, 2:28:40 AM4/29/21
to Pedro Igor Craveiro e Silva, keyclo...@googlegroups.com, EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)

Hi Pedro,

 

thanks for your reply and the hint about the PR.

 

We will run test with this version and keep you posted about the results

 

Mit freundlichen Grüßen / Best regards

Jörg Matysiak


Product Area User Management (IOC/PAU1)
Bosch.IO GmbH | Stuttgarter Straße 130 | 71332 Waiblingen | GERMANY | www.bosch.io

Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)

unread,
Apr 29, 2021, 3:47:52 AM4/29/21
to Matysiak Joerg (IOC/PAU1), Pedro Igor Craveiro e Silva, keyclo...@googlegroups.com

Hi Pedro,

 

we tried it and the issue still occurs with the version in the mentioned PR https://github.com/keycloak/keycloak/pull/7857.

 

Mit freundlichen Grüßen / Best regards

Benjamin Weimer

Project Delivery Berlin 2 (IOC/PDL2)
Bosch.IO GmbH | Ziegelei 7 | 88090 Immenstaad | GERMANY |
www.bosch.io
Tel. +49 30 726112-0 |
external.Ben...@bosch-si.com



Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)

unread,
Apr 29, 2021, 6:14:29 AM4/29/21
to EXTERNAL Weimer Benjamin (TNG, IOC/PDL2), Matysiak Joerg (IOC/PAU1), Pedro Igor Craveiro e Silva, keyclo...@googlegroups.com

Hi,

 

we were just able to also reproduce the issue on Keycloak 11.0.3 by creating about 500 realms. Only a few objects in the cache got corrupted in the previously described way. So the performance breakdown was not as impactful as in the tests we ran on Keycloak 12.0.4. Anyways the issue seems not to be only related to the Keycloak 12 update.

 

Mit freundlichen Grüßen / Best regards

Benjamin Weimer


Project Delivery Berlin 2 (IOC/PDL2)
Bosch.IO GmbH | Ziegelei 7 | 88090 Immenstaad | GERMANY |
www.bosch.io
Tel. +49 30 726112-0 |
external.Ben...@bosch-si.com

Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

Matysiak Joerg (IOC/PAU1)

unread,
Apr 30, 2021, 9:35:03 AM4/30/21
to keyclo...@googlegroups.com, EXTERNAL Weimer Benjamin (TNG, IOC/PDL2), Pedro Igor Craveiro e Silva
Hi,

we did some progress in analysing the issue.

It can easily and fast be reproduced when changing the value of REALM_REVISIONS_CACHE_DEFAULT_MAX to 100 in
model/infinispan/src/main/java/org/keycloak/connections/infinispan/InfinispanConnectionProvider.java.

After starting keycloak and adding 2-3 additional realms the caches gets into an inconsistent state.

Our analysis:

According to the infinispan documentation in [1].
older entries are deleted from the "revisions" cache [2] when the maximum cache size is reached.
The object is still present in the "cache" cache [3].

The condition at [4] in the get() method will be true the next time the object is requested.
This will lead to a new entry in "revisions" in the addRevisioned() method, see [5].

Later in this method putForExternalRead() is called to update the object in the "cache" cache, see [6].
Unfortunatly this does not update the cache since there is still an object with the same id in it.
See also infinispan doc [7].

=> The cache stays inconsistent from now on: each request on the affected object leads to a database access

We're currently preparing a pull request to fix this issue.
(Removing the object from the "cache" cache when the inconsistency is detected will make the cache consitent again.)

Kind regards

Benjamnin and Jörg

[1] https://infinispan.org/docs/stable/titles/configuring/configuring.html#eviction-config
[4] https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L100
[5] https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L144
[6] https://github.com/keycloak/keycloak/blob/12.0.4/model/infinispan/src/main/java/org/keycloak/models/cache/infinispan/CacheManager.java#L164
[7] https://infinispan.org/docs/stable/titles/developing/developing.html#putforexternalread_operation

Pedro Igor Craveiro e Silva

unread,
Apr 30, 2021, 12:16:14 PM4/30/21
to Matysiak Joerg (IOC/PAU1), keyclo...@googlegroups.com, EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)
Interesting, it makes more sense now that you were able to reproduce in 11 too.

I guess the assumption was that having a revision cache size equal to the size of the cache would be enough. But eviction might happen without respect to this "logical" mapping between revisions and the cache, thereby causing this inconsistency. Does it make sense?

I'm also wondering if in your case, you should not consider increasing the size of your caches? Perhaps you are noticing this now due to an increase in your realms data?

EXTERNAL Weimer Benjamin (TNG, IOC/PDL2)

unread,
May 3, 2021, 2:12:32 AM5/3/21
to Pedro Igor Craveiro e Silva, Matysiak Joerg (IOC/PAU1), keyclo...@googlegroups.com

Hi,

 

Exactly, this happens since the “revisions” cache and “cache” cache get out of sync. An entry in “revisions” is removed without removing it from the “cache”.

 

This actually didn’t happen in one of our production Keycloaks. We found this during a load test of Keycloak 12. Increasing the cache size might also have the effect to not run into this bug. Running into this issue in a productive Keycloak is definitely an indicator to increase the cache size or reduce the eviction time.

 

Mit freundlichen Grüßen / Best regards

Benjamin Weimer


Project Delivery Berlin 2 (IOC/PDL2)
Bosch.IO GmbH | Ziegelei 7 | 88090 Immenstaad | GERMANY |


Tel. +49 30 726112-0 |
external.Ben...@bosch-si.com


Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling


Von: Pedro Igor Craveiro e Silva <pigor.c...@gmail.com>
Gesendet: Freitag, 30. April 2021 18:16
An: Matysiak Joerg (IOC/PAU1) <Joerg.M...@bosch.io>
Cc: keyclo...@googlegroups.com; EXTERNAL Weimer Benjamin (TNG, IOC/PDL2) <external.Ben...@bosch.io>
Betreff: Re: [keycloak-dev] Performance/Cache issues in keycloak 12.0.4

 

Interesting, it makes more sense now that you were able to reproduce in 11 too.

Schuster Sebastian (IOC/PAU1)

unread,
May 3, 2021, 4:04:22 AM5/3/21
to EXTERNAL Weimer Benjamin (TNG, IOC/PDL2), Pedro Igor Craveiro e Silva, Matysiak Joerg (IOC/PAU1), keyclo...@googlegroups.com

Hi Pedro,

 

no matter how the cache sizes are configured, this piece of code in CacheManager.java:171 is most likely buggy:

// revisions cache has a lower value than the object.revision, so update revision and add it to cache
revisions.put(id, object.getRevision());
if (lifespan < 0) cache.putForExternalRead(id, object);
else cache.putForExternalRead(id, object, lifespan, TimeUnit.MILLISECONDS);

since cache.putForExternalRead calls CacheImpl.java:887 putIfAbsent(key, value, metadata, EnumUtil.mergeBitSets(PFER_FLAGS, explicitFlags), pferContextBuilder);

which, according to the documentation only adds and does not update. So the realm cache is simply not updated as the code above suggests.

 

Best regards,

Sebastian

 

 

Mit freundlichen Grüßen / Best regards

Dr.-Ing. Sebastian Schuster

Project Delivery Berlin 22 (IOC/PDL22)
Bosch.IO GmbH | Ullsteinstr. 128 | 12109 Berlin | GERMANY | www.bosch.io
Tel. +49 30 726112-485 | Mobil +49 152 02177668 | Telefax +49 30 726112-100 | Threema / Threema Work: MF9VMEAE | Sebastian...@bosch.io



Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

--

You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.

Matysiak Joerg (IOC/PAU1)

unread,
May 3, 2021, 4:55:11 AM5/3/21
to keyclo...@googlegroups.com, Schuster Sebastian (IOC/PAU1), EXTERNAL Weimer Benjamin (TNG, IOC/PDL2), Pedro Igor Craveiro e Silva

 

Mit freundlichen Grüßen / Best regards

Jörg Matysiak


Product Area User Management (IOC/PAU1)
Bosch.IO GmbH | Stuttgarter Straße 130 | 71332 Waiblingen | GERMANY | www.bosch.io

Sitz: Berlin, Registergericht: Amtsgericht Charlottenburg; HRB 148411 B
Aufsichtsratsvorsitzender: Dr.-Ing. Thorsten Lücke; Geschäftsführung: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

Reply all
Reply to author
Forward
0 new messages