Keycloak HA env setup


Tufisi Radu

Dec 23, 2020, 6:10:12 AM
to Keycloak User
Hello,

I'm new to Keycloak and I want to set up a local environment for learning SSO and OpenID Connect flows.
I want to create an HA environment on my local PC where I can simulate instances going up and down (when an instance crashes, and when more instances are needed and are automatically scaled up). I created the following setup (based on the official Keycloak documentation and other tutorials):

startup.cli

embed-server --server-config=standalone-ha.xml --std-out=echo
batch

echo *** Update site name ***
/subsystem=transactions:write-attribute(name=node-identifier, value=${jboss.node.name})

echo *** Setting CACHE_OWNERS to "${env.CACHE_OWNERS}" in all cache-containers ***

/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})

echo "**Update data source**"

/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=pool-prefill, value=true)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=exception-sorter-class-name, value=org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLExceptionSorter)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=valid-connection-checker-class-name, value=org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLValidConnectionChecker)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=min-pool-size, value=15)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=max-pool-size, value=100)
/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=blocking-timeout-wait-millis, value=4000)

/subsystem=jgroups/stack=tcp:remove()
/subsystem=jgroups/stack=tcp:add()
/subsystem=jgroups/stack=tcp/transport=TCP:add(socket-binding="jgroups-tcp")
/subsystem=jgroups/stack=tcp/protocol=JDBC_PING:add()
/subsystem=jgroups/stack=tcp/protocol=JDBC_PING/property=datasource_jndi_name:add(value=java:jboss/datasources/KeycloakDS)

# Note: the DDL must match the database vendor; KeycloakDS is PostgreSQL here, so no MySQL-only syntax (ENGINE=InnoDB, varbinary, ON UPDATE CURRENT_TIMESTAMP)
/subsystem=jgroups/stack=tcp/protocol=JDBC_PING/property=initialize_sql:add(value="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, ping_data BYTEA DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name))")

/subsystem=jgroups/stack=tcp/protocol=MERGE3:add()
/subsystem=jgroups/stack=tcp/protocol=FD_SOCK:add(socket-binding="jgroups-tcp-fd")
/subsystem=jgroups/stack=tcp/protocol=FD:add()
/subsystem=jgroups/stack=tcp/protocol=VERIFY_SUSPECT:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.NAKACK2:add()
/subsystem=jgroups/stack=tcp/protocol=UNICAST3:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.STABLE:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.GMS:add()
/subsystem=jgroups/stack=tcp/protocol=pbcast.GMS/property=max_join_attempts:add(value=4)
/subsystem=jgroups/stack=tcp/protocol=MFC:add()
/subsystem=jgroups/stack=tcp/protocol=FRAG3:add()

/subsystem=jgroups/stack=udp:remove()
/subsystem=jgroups/channel=ee:write-attribute(name=stack, value=tcp)
/socket-binding-group=standard-sockets/socket-binding=jgroups-mping:remove()

run-batch
stop-embedded-server
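
As a sanity check (not part of the original post), the applied configuration can be read back with the same embedded CLI; `read-resource` and `read-attribute` are standard read-only operations:

```
embed-server --server-config=standalone-ha.xml --std-out=echo

# Confirm the rebuilt tcp stack and the JDBC_PING protocol are present
/subsystem=jgroups/stack=tcp:read-resource(recursive=true)

# Confirm the owners attribute was updated on one of the caches
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:read-attribute(name=owners)

stop-embedded-server
```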


docker-compose.yml

version: '3.3'

services:

   postgres:
    image: postgres:alpine
    volumes:
      - ./postgres:/var/lib/postgresql/data
    restart: 'always'
    ports:
       - "5432:5432"
    environment:
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: password
      POSTGRES_DB: keycloak
      POSTGRES_HOST: postgres

   nginx-keycloak:
    image: nginx:alpine
   # environment:  # I commented this out because I set it up on the Keycloak service below. Is that ok?
   #   PROXY_ADDRESS_FORWARDING: 'true'
    volumes:
      - ./nginx-kc.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "80:80"
    depends_on:
      - keycloak

   keycloak:
    build: .
    environment:
     JAVA_OPTS: -server -Xms1g -Xmx1g -XX:+UseAdaptiveSizePolicy -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true
     PROXY_ADDRESS_FORWARDING: 'true'
     MY_CUSTOM_PARAM: 'Da'
     CACHE_OWNERS: '3'  # is there any difference vs the variable below? It won't work with both set
   #  CACHE_OWNERS_AUTH_SESSIONS_COUNT: '3'
     KEYCLOAK_DISABLE_THEME_CACHING: 'true'
     KEYCLOAK_LOGLEVEL: INFO
     ROOT_LOGLEVEL: INFO
     KEYCLOAK_STATISTICS: all
     DB_VENDOR: postgres
     DB_ADDR: postgres
     DB_PORT: '5432'
     DB_DATABASE: keycloak
     DB_USER: keycloak
     DB_PASSWORD: password
     KEYCLOAK_USER: admin
     KEYCLOAK_PASSWORD: Pa55w0rd
     JGROUPS_DISCOVERY_PROTOCOL: JDBC_PING
     JGROUPS_DISCOVERY_PROPERTIES: datasource_jndi_name=java:jboss/datasources/KeycloakDS,info_writer_sleep_time=500,initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING ( own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, created timestamp default current_timestamp, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name))"
    depends_on:
     - postgres
    ports:
      - "8080"
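
For reference, the nginx-kc.conf mounted above is not shown in the thread; a minimal reverse-proxy config for this kind of setup might look like the following sketch (the upstream layout is an assumption, but the X-Forwarded-* headers are what PROXY_ADDRESS_FORWARDING=true expects):

```nginx
upstream keycloak {
    # Docker's embedded DNS resolves "keycloak" to the scaled replicas
    server keycloak:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://keycloak;
        # Forwarding headers required when PROXY_ADDRESS_FORWARDING is true
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Note that nginx resolves the upstream hostname once at startup, so replicas added after nginx starts are not picked up without a reload.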

Dockerfile
FROM jboss/keycloak:latest
COPY startup-scripts/ /opt/jboss/startup-scripts/

Based on this setup, could you please tell me:

1. Is it ok? Let's not say fully production-ready, but will it work in a dev/test environment?

2. I run it with docker-compose up -d --scale keycloak=3, and the cache owners count is also 3. What I want to simulate: run it with that command, log in as admin, stop an instance, refresh the browser, and continue walking through the application while nodes go down and come back up. I do get messages like:
 [org.infinispan.CLUSTER] (thread-19,ejb,6f3344b9db23) ISPN100001: Node b53645e0616c left the cluster
[org.infinispan.CLUSTER] (thread-34,ejb,6f3344b9db23) ISPN000094: Received new cluster view for channel ejb: [303627bb6a6d|5] (2) [303627bb6a6d, 6f3344b9db23]
What happens: if I remove an instance, the application gets stuck for a little while; I click on any configure tab and nothing happens. If I refresh the browser I get a white page, and after 10-15 seconds everything is ok and I am still logged in. Why does this happen? (Or is my PC hardware not enough? I have an i7-8750HQ and 16GB RAM [2x8GB-2666MHz].)

3. Is JDBC_PING fine? Should I opt for another variant (like KUBE_PING)?

4. Based on this setup, I thought that maybe I need partition handling enabled, so I did this in the CLI. What does it do, and do I need it?

echo "**Update cache partition handling**"
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions/component=partition-handling:write-attribute(name=enabled,value="true")
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions/component=partition-handling:write-attribute(name=enabled,value="true")

5. If you have any suggestions or recommendations, please share them.

Thank you very much for all your time!

Phil Fleischer

Dec 31, 2020, 9:18:33 AM
to Tufisi Radu, Keycloak User
I don't see any technical reason why you wouldn't be able to run an HA cluster locally. I've done it with JDBC_PING and remote Infinispan; it was a long time ago and a pretty manual setup, but it did work.

How are you handling load balancing in nginx? The latency might be related to your browser being routed to a "dead" server, which would explain why your browser is flaky but you are still logged in. In the ingress-controller land of Kubernetes, I believe destroying the container triggers the load balancer change immediately, but my experience there is limited.
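
If the upstream block in nginx-kc.conf uses plain round-robin with default timeouts, the stall described above is consistent with nginx retrying the dead node. Directives like these shorten the failover window (a sketch; the values are illustrative, not from the thread):

```nginx
upstream keycloak {
    # mark a node as down quickly after a failed attempt
    server keycloak:8080 max_fails=1 fail_timeout=10s;
}

server {
    listen 80;
    location / {
        proxy_pass http://keycloak;
        # fail over fast instead of waiting out the default 60s connect timeout
        proxy_connect_timeout 2s;
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```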

The rest looks pretty good. Something I personally did differently was to mount the configuration files explicitly, because these commands became so verbose that I gave up on writing them out. Though your setup might make it easier to isolate changes from core Keycloak.

KUBE_PING would be better because JDBC_PING can sometimes leave inconsistent node entries in the database if the service does not shut down gracefully, but to my understanding it would not work with plain Docker because it uses the Kubernetes API for service discovery.

— Phil


Tufisi Radu

Jan 2, 2021, 6:05:30 AM
to Keycloak User

Thanks a lot for your feedback Phil!
Well, I'm not sure why the problem is still there (the latency one: if a server goes down, it takes 5-10 seconds of loading screen before the application works again), but I did notice something: it happens when CACHE_OWNERS is set higher than 1. And after reading "KUBE_PING would be better because JDBC_PING can sometimes leave some inconsistent nodes in the database", I believe this might be the real issue. Maybe when I shut them down from Docker they leave some inconsistent entries there?
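
That is easy to verify directly: the JDBC_PING discovery table can be inspected in the keycloak Postgres database (the table name is taken from the initialize_sql above; only run the DELETE while all Keycloak nodes are stopped):

```sql
-- List registered cluster members; more rows than running containers
-- indicates stale entries left behind by ungraceful shutdowns
SELECT own_addr, cluster_name FROM JGROUPSPING;

-- Clean up stale discovery entries (cluster must be stopped)
DELETE FROM JGROUPSPING;
```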