Good morning. Curious if anyone has seen this issue before with CAS running a pegged CPU as a result of a Hazelcast hiccup.
We run four CAS servers in each environment (QA and PROD):
The on-prem CAS servers sit behind a load balancer as do the cloud CAS servers. Between campus and on-prem we have a dedicated connection.
Over the last four months, just our cloud CAS servers (both QA and PROD at the same time) have experienced pegged CPU. We have been working with networking to see if we are experiencing a networking hiccup, and thus far we cannot find one.
Within the log, on days we have encountered the issue, I see thousands of these messages (just on the AWS QA and PROD CAS servers):
Initialized new cluster connection between /10.21.1.166:<ephemeral port number> and onePremHost.edu/10.6.55.32:5701
When I talked to the Hazelcast folks, they indicated syncing service tickets and the service registry via Hazelcast across our dedicated connection is not recommended via the community edition of Hazelcast. They recommended using the Enterprise edition which uses message queues instead of TCP/IP connections to sync the cluster.
Questions:
Thanks, Jay
________________________________
Jason Rappaport (he/him)
Identity and Access Management Analyst
Office of Information Technology
Email: jaso...@princeton.edu
Office: 609-258-8464