Dear Keycloak developers,
I've been using various metrics integrations with Keycloak for many years.
The most prominent one is probably the aerogear/keycloak-metrics-spi project
on GitHub [1].
Although this plugin became quite popular among Keycloak users, it has some caveats:
- It uses the Prometheus library directly to write metrics
- It does not provide a way to add custom metrics via an SPI (although it is in the name)
- It only records a limited set of metrics
- It does not integrate with the existing metrics infrastructure in Wildfly / Quarkus.
In my projects, I usually do something based on my smallrye-metrics-extension [2],
which leverages the smallrye-metrics / eclipse-metrics infrastructure and works with
Keycloak and Keycloak-X. At least it did up until Keycloak 13.0.1, where the smallrye-metrics
subsystem components were removed. With my workaround [3], however, my extension
can be used with Keycloak 15.0.2 without problems.
Over many different projects, I learned that most of the time, users are interested in a
fixed set of application-specific core metrics but occasionally need some custom metrics.
Some metrics are simple counters, but others need to be computed dynamically
and defined as gauges. Some metrics I often see in projects are the following:
Counters:
- User Login: Success, Errors
- User Logout: Success, Errors
- Client Login: Success, Errors
- OAuth Token Refresh: Success, Errors
- OAuth CodeToToken: Success, Errors
Gauges:
- Number of users per realm
- Number of clients per realm
- Number of roles per realm
- Number of groups per realm
- Number of scopes per realm
- Keycloak Server Version (!)
- Time to compute metrics (!)
- Number of user sessions per realm and client
- Number of offline sessions per realm and client
Some custom metrics:
- Number of disabled users per realm
- Number of blocked users per realm
- Number of users with unverified emails per realm
- Number of users with no email per realm
- Number of users with no phone number per realm
- Number of users by credential type (2FA) per realm
- Distribution of granted consents by clients / realms
- Number of users older than X per realm
- Number of users newer than X per realm
- Distribution of login durations per realm (computed from timestamp at auth-session start)
To support those use cases, I needed to find a way to support simple counters and custom metrics.
Simple counters can be updated by an EventListenerProvider, and custom metrics can
be computed, e.g. by calling a Keycloak API or executing a database query.
Since collecting some metrics might be expensive, I needed to defer computations
to a later time. I also needed to avoid unnecessary work due to concurrent calculations.
After some experiments, I came up with an IMHO relatively simple setup that enables all
the above use cases, which works well for Keycloak and Keycloak-X alike :)
My current implementation is based on smallrye-metrics / eclipse metrics but could
also be abstracted to support micrometer-based metrics orchestration.
I plan to create an SPI that enables users to collect metrics like the ones mentioned
above with an easy to use API that hooks into the metrics facilities of the underlying
Keycloak platform, be it Wildfly or Quarkus.
I didn't find the time to write a fully fledged SPI yet, but you can get a first impression of
this in my poc/quarkus-metrics [4] branch.
Note that for simplicity, I placed all classes in the Quarkus module. In an actual
implementation, I would probably place some of those types in the keycloak-services module.
In the poc/quarkus-metrics branch, we can declare custom smallrye metrics like this:
```
...
public static final Metadata SERVER_VERSION = Metadata.builder()
        .withName("keycloak_server_version")
        .withDescription("Keycloak Server Version")
        .withType(MetricType.GAUGE)
        .build();

public static final Metadata USERS_TOTAL = Metadata.builder()
        .withName("keycloak_users_total")
        .withDescription("Total users")
        .withType(MetricType.GAUGE)
        .build();
...
```
After that, users can create custom metrics computations with a simple interface:
```
public class DefaultMetricProvider implements MetricProvider {
    //...

    @Override
    public void updateRealmMetrics(KeycloakSession session, RealmModel realm, MetricUpdater metricUpdater) {
        // Performs the dynamic metrics collection: this is called when metrics need to be refreshed
        metricUpdater.updateMetricValue(Metrics.USERS_TOTAL, realm, session.users().getUsersCount(realm));
        // ...
    }

    @Override
    public void registerMetrics(MetricRegistry metricRegistry, MetricAccessor metricAccessor) {
        // we should only register metrics here and avoid expensive initializations!
        metricRegistry.register(Metrics.SERVER_VERSION, (Gauge<Double>) () -> 0.0, tag("version", Version.VERSION));
        // ...
    }
}
```
With this in place, metrics are recorded lazily if the /metrics endpoint is called and
buffered to avoid wasting resources.
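The lazy, buffered refresh described above could look roughly like the following minimal sketch. It is not the code from the branch: the class name `BufferedMetricsRefresher`, the `refreshIfStale` method, and the injected clock are assumptions made purely for illustration. The idea is that a scrape triggers a refresh only when the buffered values are older than a minimum interval, and that at most one thread performs the refresh at a time.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of Keycloak): refreshes expensive metrics lazily. */
final class BufferedMetricsRefresher {

    private final long minIntervalMillis;
    private final LongSupplier clock;      // injected so the timing can be tested
    private final Runnable refreshAction;  // e.g. recomputes the gauges for all realms
    private final AtomicBoolean refreshing = new AtomicBoolean(false);
    private volatile long nextRefreshAtMillis = 0L; // 0 means "never refreshed yet"

    BufferedMetricsRefresher(long minIntervalMillis, LongSupplier clock, Runnable refreshAction) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        this.refreshAction = refreshAction;
    }

    /** Intended to be called on each scrape. Returns true if this call ran a refresh. */
    boolean refreshIfStale() {
        if (!isStale()) {
            return false; // buffered values are still fresh enough
        }
        if (!refreshing.compareAndSet(false, true)) {
            return false; // another thread is already refreshing; serve buffered values
        }
        try {
            if (!isStale()) {
                return false; // a concurrent refresh finished just before we took the guard
            }
            refreshAction.run();
            nextRefreshAtMillis = clock.getAsLong() + minIntervalMillis;
            return true;
        } finally {
            refreshing.set(false);
        }
    }

    private boolean isStale() {
        return clock.getAsLong() >= nextRefreshAtMillis;
    }
}
```

A handler for the metrics endpoint might create this once, for example with `System::currentTimeMillis` as the clock, and call `refreshIfStale()` before rendering the registry. Callers that lose the race simply see the previously buffered values, which trades a little staleness for not blocking scrapes.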
Metrics that are based on counters are captured with an EventListenerProvider, as
shown in [5]. I did not add support for collecting latency distributions yet but plan to do so shortly.
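To illustrate the event-driven counters without depending on Keycloak or smallrye-metrics, here is a minimal, self-contained sketch. The `EventCounters` class and its methods are hypothetical stand-ins and not the implementation referenced in [5]; a real EventListenerProvider would extract the realm, client, and error from the Keycloak event and increment a counter in the platform's metric registry instead of a map.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for an event-driven counter registry; not Keycloak or SmallRye API. */
final class EventCounters {

    // One adder per metric name plus label set.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Maps a login event to a success or error counter; error == null means success. */
    void onLoginEvent(String realm, String clientId, String error) {
        Map<String, String> labels = new TreeMap<String, String>();
        labels.put("realm", realm);
        labels.put("client_id", clientId);
        if (error == null) {
            increment("keycloak_user_login_success_total", labels);
        } else {
            labels.put("error", error);
            increment("keycloak_user_login_error_total", labels);
        }
    }

    /** Increments the counter identified by name and labels, creating it on first use. */
    void increment(String name, Map<String, String> labels) {
        counters.computeIfAbsent(key(name, labels), k -> new LongAdder()).increment();
    }

    /** Returns the current count, or 0 if the counter was never incremented. */
    long value(String name, Map<String, String> labels) {
        LongAdder adder = counters.get(key(name, labels));
        return adder == null ? 0L : adder.sum();
    }

    // Sorting the labels makes the key independent of insertion order.
    private static String key(String name, Map<String, String> labels) {
        return name + new TreeMap<String, String>(labels);
    }
}
```

Because each increment is a cheap in-memory update, counters of this kind can be maintained on the request path, while the more expensive gauges are left to the deferred refresh.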
# Running the example
To play with the example, just check out my branch poc/quarkus-metrics and run
the `org.keycloak.quarkus._private.IDELauncher` from the keycloak-quarkus-server-app module.
Then create a realm and a user and log in to the account-console.
The /metrics endpoint then serves the Keycloak metrics in the OpenMetrics format. I added an example excerpt below.
Do you think this is something worth pursuing?
Looking forward to your thoughts :)
Cheers,
Thomas
# Example metrics output
# HELP application_keycloak_admin_event_UPDATE_total Generic KeyCloak Admin event
# TYPE application_keycloak_admin_event_UPDATE_total counter
application_keycloak_admin_event_UPDATE_total{realm="demo",resource="USER"} 2.0
# HELP application_keycloak_clients_total Total clients
# TYPE application_keycloak_clients_total gauge
application_keycloak_clients_total{realm="demo"} 6.0
application_keycloak_clients_total{realm="master"} 7.0
# HELP application_keycloak_groups_total Total groups
# TYPE application_keycloak_groups_total gauge
application_keycloak_groups_total{realm="demo"} 1.0
application_keycloak_groups_total{realm="master"} 0.0
# HELP application_keycloak_metrics_refresh_total_milliseconds Duration of Keycloak Metrics refresh in milliseconds.
# TYPE application_keycloak_metrics_refresh_total_milliseconds gauge
application_keycloak_metrics_refresh_total_milliseconds 4.0
# HELP application_keycloak_oauth_code_to_token_success_total Total code to token exchanges
# TYPE application_keycloak_oauth_code_to_token_success_total counter
application_keycloak_oauth_code_to_token_success_total{client_id="account-console",provider="keycloak",realm="demo"} 5.0
# HELP application_keycloak_oauth_token_refresh_error_total Total errors during token refreshes
# TYPE application_keycloak_oauth_token_refresh_error_total counter
application_keycloak_oauth_token_refresh_error_total{client_id="account-console",error="invalid_token",provider="keycloak",realm="demo"} 1.0
# HELP application_keycloak_oauth_token_refresh_success_total Total token refreshes
# TYPE application_keycloak_oauth_token_refresh_success_total counter
application_keycloak_oauth_token_refresh_success_total{client_id="account-console",realm="demo"} 1.0
# HELP application_keycloak_server_version Keycloak Server Version
# TYPE application_keycloak_server_version gauge
application_keycloak_server_version{version="16.0.0-SNAPSHOT"} 0.0
# HELP application_keycloak_user_login_error_total Total errors during user logins
# TYPE application_keycloak_user_login_error_total counter
application_keycloak_user_login_error_total{client_id="account-console",error="invalid_user_credentials",provider="keycloak",realm="demo"} 4.0
application_keycloak_user_login_error_total{client_id="account-console",error="user_disabled",provider="keycloak",realm="demo"} 1.0
application_keycloak_user_login_error_total{client_id="account-console",error="user_not_found",provider="keycloak",realm="demo"} 1.0
# HELP application_keycloak_user_login_success_total Total successful user logins
# TYPE application_keycloak_user_login_success_total counter
application_keycloak_user_login_success_total{client_id="account-console",provider="keycloak",realm="demo"} 5.0
# HELP application_keycloak_user_logout_success_total Total successful user logouts
# TYPE application_keycloak_user_logout_success_total counter
application_keycloak_user_logout_success_total{provider="keycloak",realm="demo"} 1.0
# HELP application_keycloak_users_total Total users
# TYPE application_keycloak_users_total gauge
application_keycloak_users_total{realm="demo"} 1.0
application_keycloak_users_total{realm="master"} 1.0