Hazelcast not working after upgrade from CAS 6.6 to CAS 7.0


Phil Hale

Aug 1, 2024, 12:23:50 PM
to CAS Community
Howdy folks,

I'm having some trouble with Hazelcast after upgrading from CAS 6.6 to 7.0.  I have CAS servers set up in an HA configuration and have been using Hazelcast for the Ticket Registry for many versions of CAS.  After upgrading to 7.0, the basic configuration I've used has stopped working.  Generally I just had to include the following two cas.properties settings:

node1:
cas.ticket.registry.hazelcast.cluster.core.instance-name=10.20.2.106
cas.ticket.registry.hazelcast.cluster.network.members=10.20.2.106,10.20.2.105

node2:
cas.ticket.registry.hazelcast.cluster.core.instance-name=10.20.2.105
cas.ticket.registry.hazelcast.cluster.network.members=10.20.2.106,10.20.2.105

But the two hosts no longer seem to communicate properly. I'm on CAS version 7.0.6. I've reviewed the required configuration settings from https://apereo.github.io/cas/7.0.x/ticketing/Hazelcast-Ticket-Registry.html, but I still can't get it working. Does anyone have a working sample configuration for a two-node setup under 7.0 that they can send me?

Thanks,

Phil

Erik Mallory

Aug 1, 2024, 10:42:06 PM
to cas-...@apereo.org
Same here with 7.0.4 in production: our test and dev environments work, but prod does not. We have some style changes (colors) to identify the different environments, and I use the WAR overlay to rebuild the WAR and copy it to each environment. Dev works. I finished work on the 7.0 branch a few months back, upgraded test two months ago, and then upgraded both environments to 7.0.4. The move to prod earlier this week did not go as well; we experienced the same thing you are seeing with 7.0.6. Moments ago I upgraded our dev system to 7.0.6 to see if I could replicate your problem, and I could not: Hazelcast is up and running and CAS works properly.

So what's different? Our prod CAS gets HAMMERED; test and dev, not so much. Through Prometheus, we noticed that the Old Gen memory for our production system was growing under load and would eventually tank the CAS service when the JVM ran out of memory.

I'm not sure where the problem is, but I don't think it's your config.

--
Erik Mallory
------------------------
"A happy man's paradise is his own good nature." - Edward Abbey

Jeremiah Garmatter

Aug 2, 2024, 10:50:52 AM
to CAS Community
Agreed,

Your configuration seems fine; it looks similar to mine, though I have more options specified:
cas.ticket.registry.hazelcast.cluster.core.instance-name=login-dev
cas.ticket.registry.hazelcast.cluster.network.members=ip1,ip2
cas.ticket.registry.hazelcast.cluster.network.port=5701
cas.ticket.registry.hazelcast.cluster.network.port-auto-increment=true
cas.ticket.registry.hazelcast.cluster.network.tcpip-enabled=true
cas.ticket.registry.hazelcast.cluster.discovery.multicast.enabled=false
cas.ticket.registry.hazelcast.cluster.core.replicated=true
cas.ticket.registry.hazelcast.cluster.core.async-backup-count=0
You've probably already done it, but I'd check that the firewall is still open on your chosen Hazelcast ports.
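
If you want to be explicit about which port each member is expected on, I believe Hazelcast also accepts host:port entries in the members list, so something along these lines ought to work (a sketch, not something I've tested with your exact addresses):

# assumes one Hazelcast instance per host, each listening on 5701
cas.ticket.registry.hazelcast.cluster.network.members=10.20.2.106:5701,10.20.2.105:5701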

Ray Bon

Aug 2, 2024, 1:47:03 PM
to cas-...@apereo.org
Erik,

Increase Tomcat memory to 16 GB or more.

Ray


Phil Hale

Aug 6, 2024, 12:08:31 AM
to CAS Community
Thanks for the responses.

I've made sure I'm running on the latest version (7.0.6).

I've made sure port 5701 is open through the firewall on both servers, using telnet from each server to verify.

I've set up the following config on both servers:
cas.ticket.registry.hazelcast.cluster.core.instance-name=login-test
cas.ticket.registry.hazelcast.cluster.network.members=10.20.2.106,10.20.2.105
cas.ticket.registry.hazelcast.cluster.network.port=5701
cas.ticket.registry.hazelcast.cluster.network.port-auto-increment=true
cas.ticket.registry.hazelcast.cluster.network.tcpip-enabled=true
cas.ticket.registry.hazelcast.cluster.discovery.multicast.enabled=false
cas.ticket.registry.hazelcast.cluster.core.replicated=true
cas.ticket.registry.hazelcast.cluster.core.async-backup-count=0

One of the two servers never seems to make it all the way "up": I never see it reach the "Ready" state or log the periodic JsonServiceRegistry message:

[org.apereo.cas.services.mgmt.AbstractServicesManager] - <Loaded [40] service(s) from [JsonServiceRegistry].>

If I check both servers today, I can see the following errors related to Hazelcast:

node1:
ERROR [com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager] - <[10.20.2.106]:5701 [dev] [5.3.6] Connecting to self! [localhost]:5701>

node2:
ERROR [com.hazelcast.internal.cluster.impl.operations.MasterResponseOp] - <[10.20.2.105]:5701 [dev] [5.3.6] Connecting to self! [localhost]:5701>

I do run Tomcat behind an on-board Apache httpd proxy, but it's the same configuration I ran under 6.6, where Hazelcast worked.

Not sure what to check next.

Phil

Erik Mallory

Aug 6, 2024, 12:08:45 AM
to cas-...@apereo.org
Ray...
The last thing you should do with an app that has a memory leak is throw more resources at it. That's not a fix; it's a band-aid, and not a very good one at that.

Ray Bon

Aug 6, 2024, 8:27:53 AM
to cas-...@apereo.org
There is no indication of a memory leak, just high memory demand.

Jeremiah Garmatter

Aug 6, 2024, 9:16:42 AM
to CAS Community, Phil Hale
Phil,

Is port 5702 also open between your CAS servers? If you use the auto-increment option, I believe the nodes can take different ports. For instance, I have four servers in prod with auto-increment enabled and port 5701 set, so I need to open ports 5701-5704 between each server in my CAS cluster. If you only want one port used, you should take out the auto-increment option.
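
For example, pinning everything to one fixed port would look roughly like this (a sketch, not something I've tested against 7.0.6):

# single fixed port per node, so only 5701 needs to be open
cas.ticket.registry.hazelcast.cluster.network.port=5701
cas.ticket.registry.hazelcast.cluster.network.port-auto-increment=false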

See https://apereo.github.io/cas/7.0.x/ticketing/Hazelcast-Ticket-Registry.html#hazelcast-cluster-networking and check both the required and optional settings that you've specified in your configuration.
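
One more thought on those "Connecting to self! [localhost]" errors: it looks like each node may be resolving its own member address to localhost, perhaps because of the Apache httpd proxy in front of Tomcat. I haven't needed this myself, but explicitly telling Hazelcast which local interface to use on each node might help:

# node1 (use 10.20.2.105 in node2's copy); untested suggestion
cas.ticket.registry.hazelcast.cluster.network.local-address=10.20.2.106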

Erik Mallory

Aug 6, 2024, 9:22:23 AM
to cas-...@apereo.org
This will be my final response regarding the memory issue I encountered. Research Old Gen memory in the JVM and garbage collection: something CAS is using is not freeing memory as it should, which is why Old Gen climbs and never goes down, and why you don't blindly throw more RAM at it. Cron up a restart every few hours until a patch comes through. I can poke the JVM GC (garbage collection) and reclaim memory from Old Gen space; this isn't my first rodeo with a misbehaving Java app. This is a side conversation anyway, a tangent to the original issue rather than central to it.

Phil Hale

Aug 15, 2024, 10:48:43 AM
to CAS Community
Just wanted to follow up with the list and let everyone know I got it working. I enabled debug logging to verify communication and adjusted a few more of the options, which seemed to help. Here is my current config for posterity:

cas.ticket.registry.hazelcast.cluster.core.instance-name=login-test
cas.ticket.registry.hazelcast.cluster.network.members=10.20.2.106,10.20.2.105
cas.ticket.registry.hazelcast.cluster.network.port=5701
cas.ticket.registry.hazelcast.cluster.network.port-auto-increment=true
cas.ticket.registry.hazelcast.cluster.network.tcpip-enabled=true
cas.ticket.registry.hazelcast.cluster.network.ipv4-enabled=true
cas.ticket.registry.hazelcast.cluster.network.local-address=10.20.2.106
cas.ticket.registry.hazelcast.cluster.discovery.multicast.enabled=false
cas.ticket.registry.hazelcast.cluster.core.replicated=true
cas.ticket.registry.hazelcast.cluster.core.async-backup-count=0
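
(The above is node1's file; node2's copy would be the same except that the local address is its own IP:)

# node2: identical settings, but bind to its own IP
cas.ticket.registry.hazelcast.cluster.network.local-address=10.20.2.105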

Phil