Custom Metrics in Keycloak


Thomas Darimont

Sep 17, 2021, 1:21:01 PM
to Keycloak Dev
Dear Keycloak developers,

I've been using various metrics integrations with Keycloak for many years. 
The most prominent one is probably the aerogear/keycloak-metrics-spi project
on GitHub [1]. 
Although this plugin became quite popular among Keycloak users, it has some caveats:
- It uses the Prometheus library directly to write metrics
- It does not provide a way to add custom metrics via an SPI (despite "SPI" being in its name)
- It only records a limited set of metrics
- It does not integrate with the existing metrics infrastructure in Wildfly / Quarkus.

In my projects, I usually build on my smallrye-metrics-extension [2],
which leverages the smallrye-metrics / eclipse-metrics infrastructure and works with
Keycloak and Keycloak-X. This worked up until Keycloak 13.0.1, where the smallrye-metrics
subsystem components were removed; however, with my workaround [3], my extension
can be used with Keycloak 15.0.2 without problems.

Over many different projects, I learned that most of the time, users are interested in a
fixed set of application-specific core metrics but occasionally need some custom metrics.

Some metrics are simple counters, but others need to be computed dynamically
and defined as gauges. Some metrics I often see in projects are the following:

Counters:
- User Login: Success, Errors
- User Logout: Success, Errors
- Client Login: Success, Errors
- OAuth Token Refresh: Success, Errors
- OAuth CodeToToken: Success, Errors

Gauges:
- Number of users per realm
- Number of clients per realm
- Number of roles per realm
- Number of groups per realm
- Number of scopes per realm
- Keycloak Server Version (!)
- Time to compute metrics (!)
- Number of user sessions per realm and client
- Number of offline sessions per realm and client

Some custom metrics:
- Number of disabled users per realm
- Number of blocked users per realm
- Number of users with unverified emails per realm
- Number of users with no email per realm
- Number of users with no phone number per realm
- Number of users by credential type (2FA) per realm
- Distribution of granted consents by clients / realms
- Number of users older than X per realm
- Number of users newer than X per realm
- Distribution of login durations per realm (computed from timestamp at auth-session start)

To support those use cases, I needed to find a way to support simple counters and custom metrics.

Simple counters can be updated by an EventListenerProvider, and custom metrics can 
be computed, e.g. by calling a Keycloak API or executing a database query.
Since collecting some metrics might be expensive, I needed to defer computations 
to a later time. I also needed to avoid unnecessary work due to concurrent calculations.
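A minimal sketch of the deferral idea, assuming nothing beyond the JDK: `BufferedValue` is a hypothetical name, not part of Keycloak or the poc/quarkus-metrics branch. It recomputes an expensive value (e.g. a per-realm user count) at most once per refresh interval, and an `AtomicBoolean` guard ensures that concurrent callers never run the same computation at the same time.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: wraps an expensive computation so that it runs at most
// once per refresh interval and never concurrently.
final class BufferedValue<T> {

    private final Supplier<T> computation;    // the expensive work, e.g. a database query
    private final long refreshIntervalNanos;  // how long a computed value counts as fresh
    private final LongSupplier clock;         // time source, e.g. System::nanoTime
    private final AtomicBoolean computing = new AtomicBoolean(false);

    private volatile T value;                 // last computed value, null until the first computation
    private volatile long computedAt;         // clock reading at the time of that computation

    BufferedValue(Supplier<T> computation, Duration refreshInterval, LongSupplier clock) {
        this.computation = computation;
        this.refreshIntervalNanos = refreshInterval.toNanos();
        this.clock = clock;
    }

    /** Returns the buffered value, recomputing it only if it is stale and nobody else is already doing so. */
    T get() {
        T current = value;
        boolean stale = current == null || clock.getAsLong() - computedAt >= refreshIntervalNanos;
        // compareAndSet lets exactly one caller perform the recomputation.
        if (stale && computing.compareAndSet(false, true)) {
            try {
                current = computation.get();
                computedAt = clock.getAsLong();
                value = current;
            } finally {
                computing.set(false);
            }
        }
        // Callers that lost the race get the previous value (or null before the first computation).
        return current;
    }
}
```

Callers that lose the race receive the previous value instead of waiting, which trades a little freshness for never blocking on an expensive query. The injectable clock only exists to make the behaviour testable; in production `System::nanoTime` would be the natural choice.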

After some experiments, I came up with an IMHO relatively simple setup that enables all
the above use cases, which works well for Keycloak and Keycloak-X alike :)
My current implementation is based on smallrye-metrics / eclipse metrics, but it could
also be abstracted to support micrometer-based metrics orchestration.

I plan to create an SPI that enables users to collect metrics like the ones mentioned 
above with an easy-to-use API that hooks into the metrics facilities of the underlying
Keycloak platform, be it Wildfly or Quarkus.

I haven't found the time to write a fully fledged SPI yet, but you can get a first impression of
this in my poc/quarkus-metrics [4] branch.

Note that for simplicity, I placed all classes in the Quarkus module. In an actual
implementation, I would probably place some of those types in the keycloak-services module.

In the poc/quarkus-metrics branch, we can declare custom smallrye metrics like this:
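```
import org.eclipse.microprofile.metrics.Metadata;
import org.eclipse.microprofile.metrics.MetricType;

// Holder for the metric definitions referenced as Metrics.SERVER_VERSION, Metrics.USERS_TOTAL, ...
public final class Metrics {

    public static final Metadata SERVER_VERSION = Metadata.builder()
        .withName("keycloak_server_version")
        .withDescription("Keycloak Server Version")
        .withType(MetricType.GAUGE)
        .build();

    public static final Metadata USERS_TOTAL = Metadata.builder()
        .withName("keycloak_users_total")
        .withDescription("Total users")
        .withType(MetricType.GAUGE)
        .build();

    // ... further metric definitions
}
```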

After that, users can create custom metrics computations with a simple interface:
```
public class DefaultMetricProvider implements MetricProvider {

    //...

    @Override
    public void updateRealmMetrics(KeycloakSession session, RealmModel realm, MetricUpdater metricUpdater) {

        // Performs the dynamic metrics collection: this is called when metrics need to be refreshed

        metricUpdater.updateMetricValue(Metrics.USERS_TOTAL, realm, session.users().getUsersCount(realm));
        // ...
    }

    @Override
    public void registerMetrics(MetricRegistry metricRegistry, MetricAccessor metricAccessor) {

        // we should only register metrics here and avoid expensive initializations!
        metricRegistry.register(Metrics.SERVER_VERSION, (Gauge<Double>) () -> 0.0, tag("version", Version.VERSION));
        // ...
    }
}
```

With this in place, metrics are computed lazily when the /metrics endpoint is called, and the
results are buffered to avoid wasting resources.

Counter-based metrics are captured with an EventListenerProvider, as
shown in [5]. I have not yet added support for collecting latency distributions, but plan to do so shortly.
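To illustrate the counter side, here is a hedged sketch of what the body of an `EventListenerProvider#onEvent` implementation might delegate to. It uses only the JDK; the event type strings mirror Keycloak's `EventType` constants (`LOGIN`, `LOGIN_ERROR`, ...), the metric names follow the example output in this post, and `EventCounters` itself is hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: maps Keycloak-style event types to labelled counters.
final class EventCounters {

    // Event type -> counter name (as in the example /metrics output, without the "application_" prefix).
    private static final Map<String, String> METRIC_BY_EVENT_TYPE = Map.of(
        "LOGIN", "keycloak_user_login_success_total",
        "LOGIN_ERROR", "keycloak_user_login_error_total",
        "LOGOUT", "keycloak_user_logout_success_total",
        "CODE_TO_TOKEN", "keycloak_oauth_code_to_token_success_total",
        "REFRESH_TOKEN", "keycloak_oauth_token_refresh_success_total",
        "REFRESH_TOKEN_ERROR", "keycloak_oauth_token_refresh_error_total");

    // Key = metric name plus sorted labels, e.g. keycloak_user_login_success_total{client_id="account-console",realm="demo"}
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Would be called from EventListenerProvider#onEvent with values taken from the event. */
    void onEvent(String eventType, String realm, String clientId, String error) {
        String metric = METRIC_BY_EVENT_TYPE.get(eventType);
        if (metric == null) {
            return; // not an event type we count
        }
        Map<String, String> labels = new TreeMap<>(); // sorted, so equal label sets give equal keys
        labels.put("realm", realm);
        if (clientId != null) labels.put("client_id", clientId);
        if (error != null) labels.put("error", error);
        counters.computeIfAbsent(key(metric, labels), k -> new LongAdder()).increment();
    }

    /** Current value of one counter series; 0 if it was never incremented. */
    long count(String metric, Map<String, String> labels) {
        LongAdder adder = counters.get(key(metric, new TreeMap<>(labels)));
        return adder == null ? 0 : adder.sum();
    }

    private static String key(String metric, Map<String, String> sortedLabels) {
        StringJoiner joiner = new StringJoiner(",", metric + "{", "}");
        sortedLabels.forEach((name, value) -> joiner.add(name + "=\"" + value + "\""));
        return joiner.toString();
    }
}
```

Keeping labels in a `TreeMap` means the same label set always yields the same series key regardless of insertion order, and `LongAdder` keeps increments cheap when many logins happen concurrently.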

# Running the example

To play with the example, just check out my branch poc/quarkus-metrics and run
the `org.keycloak.quarkus._private.IDELauncher` from the keycloak-quarkus-server-app module.
Then create a realm and a user and log in to the account-console.

Then browse to http://localhost:8080/q/metrics, and you'll see a set of JVM, server, and
Keycloak metrics in the OpenMetrics format. I added an example excerpt below.

Do you think this is worth pursuing?

Looking forward to your thoughts :)

Cheers,
Thomas



# Example metrics output

```
# HELP application_keycloak_admin_event_UPDATE_total Generic KeyCloak Admin event
# TYPE application_keycloak_admin_event_UPDATE_total counter
application_keycloak_admin_event_UPDATE_total{realm="demo",resource="USER"} 2.0
# HELP application_keycloak_clients_total Total clients
# TYPE application_keycloak_clients_total gauge
application_keycloak_clients_total{realm="demo"} 6.0
application_keycloak_clients_total{realm="master"} 7.0
# HELP application_keycloak_groups_total Total groups
# TYPE application_keycloak_groups_total gauge
application_keycloak_groups_total{realm="demo"} 1.0
application_keycloak_groups_total{realm="master"} 0.0
# HELP application_keycloak_metrics_refresh_total_milliseconds Duration of Keycloak Metrics refresh in milliseconds.
# TYPE application_keycloak_metrics_refresh_total_milliseconds gauge
application_keycloak_metrics_refresh_total_milliseconds 4.0
# HELP application_keycloak_oauth_code_to_token_success_total Total code to token exchanges
# TYPE application_keycloak_oauth_code_to_token_success_total counter
application_keycloak_oauth_code_to_token_success_total{client_id="account-console",provider="keycloak",realm="demo"} 5.0
# HELP application_keycloak_oauth_token_refresh_error_total Total errors during token refreshes
# TYPE application_keycloak_oauth_token_refresh_error_total counter
application_keycloak_oauth_token_refresh_error_total{client_id="account-console",error="invalid_token",provider="keycloak",realm="demo"} 1.0
# HELP application_keycloak_oauth_token_refresh_success_total Total token refreshes
# TYPE application_keycloak_oauth_token_refresh_success_total counter
application_keycloak_oauth_token_refresh_success_total{client_id="account-console",realm="demo"} 1.0
# HELP application_keycloak_server_version Keycloak Server Version
# TYPE application_keycloak_server_version gauge
application_keycloak_server_version{version="16.0.0-SNAPSHOT"} 0.0
# HELP application_keycloak_user_login_error_total Total errors during user logins
# TYPE application_keycloak_user_login_error_total counter
application_keycloak_user_login_error_total{client_id="account-console",error="invalid_user_credentials",provider="keycloak",realm="demo"} 4.0
application_keycloak_user_login_error_total{client_id="account-console",error="user_disabled",provider="keycloak",realm="demo"} 1.0
application_keycloak_user_login_error_total{client_id="account-console",error="user_not_found",provider="keycloak",realm="demo"} 1.0
# HELP application_keycloak_user_login_success_total Total successful user logins
# TYPE application_keycloak_user_login_success_total counter
application_keycloak_user_login_success_total{client_id="account-console",provider="keycloak",realm="demo"} 5.0
# HELP application_keycloak_user_logout_success_total Total successful user logouts
# TYPE application_keycloak_user_logout_success_total counter
application_keycloak_user_logout_success_total{provider="keycloak",realm="demo"} 1.0
# HELP application_keycloak_users_total Total users
# TYPE application_keycloak_users_total gauge
application_keycloak_users_total{realm="demo"} 1.0
application_keycloak_users_total{realm="master"} 1.0
```

Pedro Igor Craveiro e Silva

Sep 20, 2021, 3:38:29 PM
to Thomas Darimont, Keycloak Dev
Hi Thomas,

Sounds interesting. Perhaps we can resume the discussion here: https://github.com/keycloak/keycloak-community/pull/177.

I think the design document above provides a baseline we could rely on, and your proposal would be a very good addition to it, also introducing a specific SPI for metrics (plus any metrics we might be missing in the discussion).

My main concern is the calculation of realm metrics, which can be costly, and its impact once you have a certain number of realms. IIRC, micrometer allows you (at least for gauges) to manually set the value of a metric at runtime, so perhaps we could use that to perform async updates to the gauges you mentioned (number of users, clients, realms, sessions, etc.).

Regards.
Pedro Igor

--
You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/ca712d90-070a-4158-88f7-809320160102n%40googlegroups.com.

Thomas Darimont

Sep 20, 2021, 4:32:42 PM
to Pedro Igor Craveiro e Silva, Keycloak Dev
Hi Pedro,

Oh yes indeed, I'll take another look at that observability design doc. I looked at it a while ago, but at that time I had the impression
it was only focusing on pure SRE metrics (USE/RED model), not on application metrics. I'll try to add the application-level metrics as well.

I am currently experimenting with different integration variants. The example from the PR contains a simple integration with smallrye and microprofile metrics,
which would work for Keycloak on Quarkus and Wildfly.
In another example I use quarkus-micrometer / quarkus-micrometer-registry-prometheus to register the same metrics as with the smallrye / microprofile metrics,
which works well already.

If we get the abstractions right, we could support microprofile and smallrye metrics in the same codebase and decide at
build-time / runtime which backend to use. This would enable seamless usage of metrics for our old Wildfly-based world as well as Keycloak-X.

I agree, metrics collection is expensive for some application metrics, especially if there are database queries involved.
In my implementations, I collect metrics only at intervals and buffer the values in between. Yes, micrometer supports setting the value of a gauge directly,
but the same (or a similar) thing is supported in smallrye / microprofile metrics if you pass a lambda that computes or fetches a precomputed value;
see: https://github.com/keycloak/keycloak/compare/master...thomasdarimont:poc/quarkus-metrics#diff-2d2dd3daf8985dbe3679f1d5fb042ccbc7a03ae810be14366d3af70893084fdeR29
In this example, the "MetricsAccessor" interface provides access to precomputed gauge values, which are read by the lambda that serves as the gauge's compute function.
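A minimal sketch of that accessor pattern, with hypothetical names (`PrecomputedGauges`, `updateMetricValue`, `gaugeFunction`) and plain JDK types in place of the smallrye / microprofile API: the expensive computation stores a value, and the gauge's compute function only performs a map lookup. With MicroProfile Metrics, the returned supplier could presumably be adapted as `(Gauge<Double>) fn::get` when registering the gauge.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: gauges never compute anything themselves,
// they only read the latest value that an updater stored.
final class PrecomputedGauges {

    private final Map<String, Double> values = new ConcurrentHashMap<>();

    private static String key(String metricName, String realm) {
        return metricName + "|" + realm;
    }

    /** Called by the (possibly expensive) computation, e.g. after counting the users of a realm. */
    void updateMetricValue(String metricName, String realm, double value) {
        values.put(key(metricName, realm), value);
    }

    /** The compute function to register as a gauge: a cheap lookup, NaN until a value was stored. */
    Supplier<Double> gaugeFunction(String metricName, String realm) {
        String key = key(metricName, realm);
        return () -> values.getOrDefault(key, Double.NaN);
    }
}
```

The point of the split is that scraping /metrics never triggers a database query; freshness is decided entirely by whoever calls `updateMetricValue`.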

Thanks for your feedback :)

Cheers,
Thomas

Pedro Igor Craveiro e Silva

Sep 20, 2021, 4:48:44 PM
to Thomas Darimont, Keycloak Dev
On Mon, Sep 20, 2021 at 5:32 PM Thomas Darimont <thomas....@googlemail.com> wrote:
> Hi Pedro,
>
> Oh yes indeed, I'll take another look at that observability design doc. I looked at it a while ago, but at that time I had the impression
> it was only focusing on pure SRE metrics (USE/RED model), not on application metrics. I'll try to add the application-level metrics as well.

IIRC, that is because the initial scope was to provide the key metrics for monitoring the server and risks to availability. These are more important to get right and done, IMO.

> I am currently experimenting with different integration variants. The example from the PR contains a simple integration with smallrye and microprofile metrics,
> which would work for Keycloak on Quarkus and Wildfly.
> In another example I use quarkus-micrometer / quarkus-micrometer-registry-prometheus to register the same metrics as with the smallrye / microprofile metrics,
> which works well already.
>
> If we get the abstractions right, we could support microprofile and smallrye metrics in the same codebase and decide at
> build-time / runtime which backend to use. This would enable seamless usage of metrics for our old Wildfly-based world as well as Keycloak-X.

For Quarkus, I think we should consider their recommendation and use micrometer. I'm not sure it makes sense to choose between different implementations. We should try, whenever possible, to be more opinionated about server configuration to make things simpler and reduce unnecessary maintenance costs.

> I agree, metrics collection is expensive for some application metrics, especially if there are database queries involved.
> In my implementations, I collect metrics only at intervals and buffer the values in between. Yes, micrometer supports setting the value of a gauge directly,
> but the same (or a similar) thing is supported in smallrye / microprofile metrics if you pass a lambda that computes or fetches a precomputed value;
> see: https://github.com/keycloak/keycloak/compare/master...thomasdarimont:poc/quarkus-metrics#diff-2d2dd3daf8985dbe3679f1d5fb042ccbc7a03ae810be14366d3af70893084fdeR29
> In this example, the "MetricsAccessor" interface provides access to precomputed gauge values, which are read by the lambda that serves as the gauge's compute function.

I was actually wondering how to update metrics using an async model rather than blocking when calculating such metrics.

Thomas Darimont

Sep 20, 2021, 6:38:55 PM
to Pedro Igor Craveiro e Silva, Keycloak Dev
On Mon, 20 Sept 2021 at 22:48, Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:


> IIRC, that is because the initial scope was to provide the key metrics for monitoring the server and risks to availability. These are more important to get right and done, IMO.

I see. I agree that it makes sense to have good support for SRE-relevant metrics; however, I think that application / product metrics are important as well.
I work with a lot of different teams that need to operate Keycloak from an SRE perspective as well as from a product perspective. Most teams are able to figure out
the relevant USE/RED metrics on their own, e.g. by using an extension, analyzing a JVM in a container / pod via JMX, or by leveraging an agent-based monitoring
solution like Instana.
However, what most teams cannot do on their own with a stock Keycloak is uniformly expose application / business metrics that are
relevant for product management. For this, teams usually need to write an extension for some of the metrics I mentioned in the first post.
Providing those (perhaps with some configuration) out of the box would be very helpful to those teams.

> For Quarkus, I think we should consider their recommendation and use micrometer. I'm not sure it makes sense to choose between different implementations. We should try, whenever possible, to be more opinionated about server configuration to make things simpler and reduce unnecessary maintenance costs.

I've heard that Quarkus favours the micrometer library for metrics collection over microprofile-metrics. However, many examples and much of the documentation
still feature smallrye-metrics, as does the Keycloak Quarkus server itself :)

Would it be okay to remove the quarkus-smallrye-metrics dependency in the quarkus/runtime module and replace it with quarkus-micrometer / quarkus-micrometer-registry-prometheus?

> I was actually wondering how to update metrics using an async model rather than blocking when calculating such metrics.

Yeah, I thought about that too :) We could schedule a background task that periodically collects metrics whose computation is expensive, e.g. database queries.
Users could then "register" arbitrary metric computations that are executed by the background task.

I think it would be a good design goal to strive for making metrics collection non-blocking by default.
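A rough sketch of such a background task, assuming a single-threaded `ScheduledExecutorService` from the JDK; `BackgroundMetricsCollector`, `register`, and `current` are hypothetical names, not an existing Keycloak API. Registered computations run off the request path, and the gauge side only ever reads the last stored value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical background collector: expensive computations are registered once and
// executed periodically, so scraping /metrics never blocks on them.
final class BackgroundMetricsCollector implements AutoCloseable {

    private final Map<String, DoubleSupplier> computations = new ConcurrentHashMap<>();
    private final Map<String, Double> latest = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "metrics-collector");
        t.setDaemon(true); // don't keep the JVM alive just for metrics
        return t;
    });

    /** Registers an arbitrary computation, e.g. a database count query. */
    void register(String metricName, DoubleSupplier computation) {
        computations.put(metricName, computation);
    }

    /** Starts periodic collection; one thread plus a fixed delay means scheduled runs never overlap. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::collectAll, 0, period, unit);
    }

    /** Runs every registered computation once; a failing metric keeps its previous value. */
    void collectAll() {
        computations.forEach((name, computation) -> {
            try {
                latest.put(name, computation.getAsDouble());
            } catch (RuntimeException e) {
                // Keep the previous value; a real implementation would log the failure.
            }
        });
    }

    /** Non-blocking read for the gauge; NaN until the first successful collection. */
    double current(String metricName) {
        return latest.getOrDefault(metricName, Double.NaN);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Catching exceptions per computation matters here: with `scheduleWithFixedDelay`, an exception escaping the task would suppress all subsequent runs, so one broken query could silently freeze every metric.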

Pedro Igor Craveiro e Silva

Sep 21, 2021, 8:00:00 AM
to Thomas Darimont, Keycloak Dev
On Mon, Sep 20, 2021 at 7:38 PM Thomas Darimont <thomas....@googlemail.com> wrote:


> I see. I agree that it makes sense to have good support for SRE-relevant metrics; however, I think that application / product metrics are important as well.
> I work with a lot of different teams that need to operate Keycloak from an SRE perspective as well as from a product perspective. Most teams are able to figure out
> the relevant USE/RED metrics on their own, e.g. by using an extension, analyzing a JVM in a container / pod via JMX, or by leveraging an agent-based monitoring
> solution like Instana.
> However, what most teams cannot do on their own with a stock Keycloak is uniformly expose application / business metrics that are
> relevant for product management. For this, teams usually need to write an extension for some of the metrics I mentioned in the first post.
> Providing those (perhaps with some configuration) out of the box would be very helpful to those teams.

Yeah, I'm not saying we shouldn't have those, but that we need a clear picture of the USE/RED metrics we care about and of how to deliver them easily, leveraging the capabilities of the stack we are running (e.g. Quarkus) as much as possible. I think that was the initial scope of that design document: to make sure we have a clear definition of the metrics we should be tracking.

Perhaps we should resume the discussions in that document and see if we can settle at least the core points, including any feedback from you.

> I've heard that Quarkus favours the micrometer library for metrics collection over microprofile-metrics. However, many examples and much of the documentation
> still feature smallrye-metrics, as does the Keycloak Quarkus server itself :)
>
> Would it be okay to remove the quarkus-smallrye-metrics dependency in the quarkus/runtime module and replace it with quarkus-micrometer / quarkus-micrometer-registry-prometheus?

Definitely, that is something we need to review in Dist.X.

Stian Thorgersen

Sep 23, 2021, 7:38:15 AM
to Pedro Igor Craveiro e Silva, Thomas Darimont, Keycloak Dev
I like the direction this is going for sure. I have some questions:

* How can we identify what metrics we want to have built-in, and collaborate on that? Would it make sense to start a Google Sheet for example?
* How would metrics be configured? I would imagine we don't just want an enabled/disabled option
* What metrics do we want to expose from Quarkus extensions (db, http, etc.)? How are these configured?
* With regard to the above, would we want some categories that can be enabled/disabled?

Thomas Darimont

Sep 23, 2021, 9:56:52 AM
to Keycloak Dev
Hello Stian,

I compiled a list of metrics from the discussion above as a Google Sheet and shared it with you and Pedro (Red Hat accounts).

If other folks want to access it feel free to reach out to me :) 

Cheers,
Thomas

Stian Thorgersen

Sep 24, 2021, 4:04:36 AM
to Thomas Darimont, Keycloak Dev
Thanks,

Can we make that spreadsheet publicly available and allow anyone to comment on it?

Can you also get started on a design proposal around metrics, following the new approach with GitHub Discussions?

Thomas Darimont

Sep 24, 2021, 4:18:51 AM
to stho...@redhat.com, Keycloak Dev
Hi Stian,

I just made the spreadsheet available for public users with comment permissions.
Yes, I can start a small GitHub discussion about metrics.

My current example implementation (that uses smallrye metrics and works with wildfly and quarkus) can be found here:

The currently produced metrics look like this:

I'm currently working on a micrometer example for Quarkus.

Cheers,
Thomas

Ben Shaver

Sep 24, 2021, 7:42:00 AM
to Keycloak Dev
Hey,
   
I'm not a Keycloak engineer, but I have some experience with it.
I really like the idea of exposing metrics without having to implement something myself.

> * How would metrics be configured? I would imagine we don't just want an enabled/disabled option

Regarding that question, I thought the metrics could be broken down into topics like 'Client metrics', 'User metrics', and so on in the Admin console, if possible,
with an input for every topic similar to the filter on the 'EVENTS' page (where the customer can simply choose multiple types from a given list).
That would reduce the number of metrics for all customers, because they could choose exactly what they want and very easily add or remove any new metric, which would solve the problem for most use cases.
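A small sketch of that per-topic selection, assuming hypothetical topic names and metric names (`keycloak_users_disabled_total` is illustrative, not an existing metric): only metrics an admin explicitly selected would be registered and computed, so unselected and possibly expensive metrics cost nothing.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical topics, following the 'Client metrics' / 'User metrics' idea.
enum MetricTopic { CLIENT, USER, SESSION }

// Hypothetical per-topic selection, similar in spirit to picking event types from a list.
final class MetricSelection {

    private final Map<MetricTopic, Set<String>> selected = new EnumMap<>(MetricTopic.class);

    /** Stores the metric names an admin picked for one topic, replacing any earlier choice. */
    void select(MetricTopic topic, Set<String> metricNames) {
        selected.put(topic, Set.copyOf(metricNames));
    }

    /** Metrics are opt-in: anything not explicitly selected is disabled. */
    boolean isEnabled(MetricTopic topic, String metricName) {
        return selected.getOrDefault(topic, Set.of()).contains(metricName);
    }

    /** Filters a catalog (metric name -> topic) down to the metrics that should be registered. */
    List<String> enabled(Map<String, MetricTopic> catalog) {
        return catalog.entrySet().stream()
            .filter(e -> isEnabled(e.getValue(), e.getKey()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }
}
```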

One thing that should be taken into account is that sometimes fetching all of the users with a single SQL query and then filtering could be faster than running an SQL query for each type of metric, and sometimes it could be the opposite (depending on the number of different types).
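To make that trade-off concrete, here is a sketch of the "query once, filter in memory" side, using a hypothetical `UserRow` projection: several per-realm user metrics are derived in a single pass over one result set. The alternative is one `COUNT` query per metric; which one is cheaper depends on realm size and on how many user metrics are enabled.

```java
import java.util.List;

// Hypothetical projection of the user columns needed for the metrics below.
final class UserRow {
    final boolean enabled;
    final String email;          // null or empty if the user has no email
    final boolean emailVerified;

    UserRow(boolean enabled, String email, boolean emailVerified) {
        this.enabled = enabled;
        this.email = email;
        this.emailVerified = emailVerified;
    }
}

// Several per-realm user metrics derived from one result set in a single pass.
final class UserMetricsSnapshot {
    long total;
    long disabled;
    long noEmail;
    long unverifiedEmail; // only counts users that actually have an email address

    static UserMetricsSnapshot compute(List<UserRow> rows) {
        UserMetricsSnapshot s = new UserMetricsSnapshot();
        for (UserRow row : rows) {
            s.total++;
            if (!row.enabled) {
                s.disabled++;
            }
            if (row.email == null || row.email.isEmpty()) {
                s.noEmail++;
            } else if (!row.emailVerified) {
                s.unverifiedEmail++;
            }
        }
        return s;
    }
}
```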

Thanks,
Ben
On Friday, September 24, 2021 at 11:18:51 UTC+3, thomas....@googlemail.com wrote:

Thomas Darimont

Sep 28, 2021, 4:50:48 AM
to Keycloak Dev
Hello Keycloak Developers,

I started a new discussion on GitHub, https://github.com/keycloak/keycloak/discussions/8490, where I compiled a few ideas from this thread and the Google Sheet.
Looking forward to your feedback and thoughts on GitHub.
Btw, I also stumbled upon the https://github.com/micrometer-metrics/micrometer-keycloak extension, which provides an interesting baseline for built-in metrics support in Keycloak-X.

Cheers,
Thomas
