Keycloak HA env setup


Tufisi Radu

Dec 23, 2020, 6:10:12 AM
to Keycloak User
Hello,

I'm new to Keycloak and I want to set up a local environment for learning SSO and OpenID Connect flows.
I want to create an HA environment on my local PC where I can simulate instances going up and down (when an instance crashes, and when more instances are needed and are automatically scaled up). I created the following setup (based on the official Keycloak documentation and other tutorials):

startup.cli

embed-server --server-config=standalone-ha.xml --std-out=echo
batch

echo *** Update site name ***
/subsystem=transactions:write-attribute(name=node-identifier, value=${jboss.node.name})

echo *** Setting CACHE_OWNERS to "${env.CACHE_OWNERS}" in all cache-containers ***

/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})

echo "**Update data source**"

/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=pool-prefill, value=true)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=exception-sorter-class-name, value=org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLExceptionSorter)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=valid-connection-checker-class-name, value=org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLValidConnectionChecker)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=min-pool-size, value=15)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=max-pool-size, value=100)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=blocking-timeout-wait-millis, value=4000)

/subsystem=jgroups/stack=tcp:remove()
/subsystem=jgroups/stack=tcp:add()
/subsystem=jgroups/stack=tcp/transport=TCP:add(socket-binding="jgroups-tcp")
/subsystem=jgroups/stack=tcp/protocol=JDBC_PING:add()
/subsystem=jgroups/stack=tcp/protocol=JDBC_PING/property=datasource_jndi_name:add(value=java:jboss/datasources/KeycloakDS)

# Note: the DDL must match the database vendor; KeycloakDS is PostgreSQL here, so no MySQL-only syntax (ENGINE=InnoDB, varbinary, ON UPDATE CURRENT_TIMESTAMP)
/subsystem=jgroups/stack=tcp/protocol=JDBC_PING/property=initialize_sql:add(value="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, ping_data BYTEA DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name))")

/subsystem=jgroups/stack=tcp/protocol=MERGE3:add()
/subsystem=jgroups/stack=tcp/protocol=FD_SOCK:add(socket-binding="jgroups-tcp-fd")
/subsystem=jgroups/stack=tcp/protocol=FD:add()
/subsystem=jgroups/stack=tcp/protocol=VERIFY_SUSPECT:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.NAKACK2:add()
/subsystem=jgroups/stack=tcp/protocol=UNICAST3:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.STABLE:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.GMS:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.GMS/property=max_join_attempts:add(value=4)
/subsystem=jgroups/stack=tcp/protocol=MFC:add()
/subsystem=jgroups/stack=tcp/protocol=FRAG3:add()

/subsystem=jgroups/stack=udp:remove()
/subsystem=jgroups/channel=ee:write-attribute(name=stack, value=tcp)
/socket-binding-group=standard-sockets/socket-binding=jgroups-mping:remove()

run-batch
stop-embedded-server
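
As a sanity check (not part of the original post), the applied configuration can be read back with the same embedded CLI; `read-resource` and `read-attribute` are standard read-only operations:

```
embed-server --server-config=standalone-ha.xml --std-out=echo

# Confirm the rebuilt tcp stack and the JDBC_PING protocol are present
/subsystem=jgroups/stack=tcp:read-resource(recursive=true)

# Confirm the owners attribute was updated on one of the caches
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:read-attribute(name=owners)

stop-embedded-server
```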


docker-compose.yml

version: '3.3'

services:

   postgres:
    image: postgres:alpine
    volumes:
      - ./postgres:/var/lib/postgresql/data
    restart: 'always'
    ports:
       - "5432:5432"
    environment:
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: password
      POSTGRES_DB: keycloak
      POSTGRES_HOST: postgres

   nginx-keycloak:
    image: nginx:alpine
   # environment:  # I commented this out because I set it up on the Keycloak service below. Is that ok?
   #   PROXY_ADDRESS_FORWARDING: 'true'
    volumes:
      - ./nginx-kc.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "80:80"
    depends_on:
      - keycloak

   keycloak:
    build: .
    environment:
     JAVA_OPTS: -server -Xms1g -Xmx1g -XX:+UseAdaptiveSizePolicy -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true
     PROXY_ADDRESS_FORWARDING: 'true'
     MY_CUSTOM_PARAM: 'Da'
     CACHE_OWNERS: '3'  # is there any difference vs the variable below? It won't work with both set
   #  CACHE_OWNERS_AUTH_SESSIONS_COUNT: '3'
     KEYCLOAK_DISABLE_THEME_CACHING: 'true'
     KEYCLOAK_LOGLEVEL: INFO
     ROOT_LOGLEVEL: INFO
     KEYCLOAK_STATISTICS: all
     DB_VENDOR: postgres
     DB_ADDR: postgres
     DB_PORT: '5432'
     DB_DATABASE: keycloak
     DB_USER: keycloak
     DB_PASSWORD: password
     KEYCLOAK_USER: admin
     KEYCLOAK_PASSWORD: Pa55w0rd
     JGROUPS_DISCOVERY_PROTOCOL: JDBC_PING
     JGROUPS_DISCOVERY_PROPERTIES: datasource_jndi_name=java:jboss/datasources/KeycloakDS,info_writer_sleep_time=500,initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING ( own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, created timestamp default current_timestamp, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name))"
    depends_on:
     - postgres
    ports:
      - "8080"
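
For reference, the nginx-kc.conf mounted above is not shown in the thread; a minimal reverse-proxy config for this kind of setup might look like the following sketch (the upstream layout is an assumption, but the X-Forwarded-* headers are what PROXY_ADDRESS_FORWARDING=true expects):

```nginx
upstream keycloak {
    # Docker's embedded DNS resolves "keycloak" to the scaled replicas
    server keycloak:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://keycloak;
        # Forwarding headers required when PROXY_ADDRESS_FORWARDING is true
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Note that nginx resolves the upstream hostname once at startup, so replicas added after nginx starts are not picked up without a reload.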

Dockerfile
FROM jboss/keycloak:latest
COPY startup-scripts/ /opt/jboss/startup-scripts/

Based on this setup, could you please tell me:

1. Is it ok? Let's not say fully production-ready, but will it work in a dev/test environment?

2. I run it with docker-compose up -d --scale keycloak=3, and the cache owners count is also 3. What I want to simulate: run it with that command, log in as admin, stop an instance, refresh the browser, and continue walking through the application while nodes go down and come back up. I do get messages like:
 [org.infinispan.CLUSTER] (thread-19,ejb,6f3344b9db23) ISPN100001: Node b53645e0616c left the cluster
[org.infinispan.CLUSTER] (thread-34,ejb,6f3344b9db23) ISPN000094: Received new cluster view for channel ejb: [303627bb6a6d|5] (2) [303627bb6a6d, 6f3344b9db23]
What happens: if I remove an instance, the application gets stuck for a little while; I click on any configure tab and nothing happens. If I refresh the browser I get a white page, and after 10-15 seconds everything is ok and I am still logged in. Why does this happen? (Or is my PC hardware not enough? I have an i7-8750HQ and 16GB RAM [2x8GB-2666MHz].)

3. Is JDBC_PING fine? Should I opt for another variant (like KUBE_PING)?

4. Based on this setup, I thought that maybe I need partition handling enabled, so I did this in the CLI. What does it do, and do I need it?

echo "**Update cache partition handling**"
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions/component=partition-handling:write-attribute(name=enabled,value="true")

5. If you have any suggestions or recommendations, please share them.

Thank you very much for all your time!

Phil Fleischer

Dec 31, 2020, 9:18:33 AM
to Tufisi Radu, Keycloak User
I don't see any technical reason why you wouldn't be able to run an HA cluster locally. I've done it with JDBC_PING and remote Infinispan; it was a long time ago and a pretty manual setup, but it did work.

How are you handling load balancing in nginx? The latency might be related to your browser being routed to a "dead" server, which would explain why your browser is flaky but you are still logged in. In the ingress-controller land of Kubernetes, I believe destroying the container triggers the load balancer change immediately, but my experience there is limited.
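
If the upstream block in nginx-kc.conf uses plain round-robin with default timeouts, the stall described above is consistent with nginx retrying the dead node. Directives like these shorten the failover window (a sketch; the values are illustrative, not from the thread):

```nginx
upstream keycloak {
    # mark a node as down quickly after a failed attempt
    server keycloak:8080 max_fails=1 fail_timeout=10s;
}

server {
    listen 80;
    location / {
        proxy_pass http://keycloak;
        # fail over fast instead of waiting out the default 60s connect timeout
        proxy_connect_timeout 2s;
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```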

The rest looks pretty good. Something I personally did differently was to mount the configuration files explicitly, because these commands became so verbose that I gave up on writing them out. Though your setup might make it easier to isolate changes from core Keycloak.

KUBE_PING would be better because JDBC_PING can sometimes leave inconsistent node entries in the database if the service does not shut down gracefully, but to my understanding it would not work with plain Docker because it uses the Kubernetes API for service discovery.

— Phil


Tufisi Radu

Jan 2, 2021, 6:05:30 AM
to Keycloak User

Thanks a lot for your feedback Phil!
Well, I'm not sure why the problem is still there (the latency one: if a server goes down, it takes 5-10 seconds of loading screen before the application works again), but I did notice something: it happens when CACHE_OWNERS is set higher than 1. And after reading "KUBE_PING would be better because JDBC_PING can sometimes leave some inconsistent nodes in the database", I believe this might be the real issue. Maybe when I shut them down from Docker they leave some inconsistent entries there?
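
That is easy to verify directly: the JDBC_PING discovery table can be inspected in the keycloak Postgres database (the table name is taken from the initialize_sql above; only run the DELETE while all Keycloak nodes are stopped):

```sql
-- List registered cluster members; more rows than running containers
-- indicates stale entries left behind by ungraceful shutdowns
SELECT own_addr, cluster_name FROM JGROUPSPING;

-- Clean up stale discovery entries (cluster must be stopped)
DELETE FROM JGROUPSPING;
```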