Hazelcast issue manifesting itself as high CPU

Jason B. Rappaport

Jun 21, 2021, 11:27:48 AM
to cas-...@apereo.org

Good morning. Has anyone seen CAS peg the CPU as a result of a Hazelcast hiccup?

 

We run four CAS servers in each environment (QA and PROD):

  • 2 x on prem
  • 2 x in the cloud

 

The on-prem CAS servers sit behind a load balancer, as do the cloud CAS servers. A dedicated connection links campus (on-prem) and the cloud.

 

Over the last four months, only our cloud CAS servers (QA and PROD at the same time) have experienced pegged CPU. We have been working with our networking team to look for a network hiccup, and so far we have not found one.

 

In the logs, on days when we hit the issue, I see thousands of these messages (only on the AWS QA and PROD CAS servers):

Initialized new cluster connection between /10.21.1.166:<ephemeral port number> and onePremHost.edu/10.6.55.32:5701
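
For anyone who wants to quantify this in their own logs, here is a minimal Python sketch that counts these messages per local/remote pair. It assumes only the message text quoted above and makes no assumption about the rest of the log layout (timestamps, levels). The function name `count_reconnects` and the log path passed on the command line are placeholders for illustration, not part of CAS or Hazelcast.

```python
import re
import sys
from collections import Counter

# Matches only the message text quoted above; the rest of the log line is ignored.
PATTERN = re.compile(
    r"Initialized new cluster connection between "
    r"/(?P<local>[\d.]+):(?P<local_port>\d+) and "
    r"(?P<remote>\S+):(?P<remote_port>\d+)"
)


def count_reconnects(lines):
    """Count connection-initialized messages per (local address, remote host) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("local"), match.group("remote"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_reconnects.py /path/to/cas.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (local, remote), total in count_reconnects(handle).most_common():
            print(f"{total:6d}  {local} -> {remote}")
```

Running it against each day's log would show whether the reconnect count spikes only on the days with pegged CPU, and whether a single on-prem member accounts for most of it.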

 

When I talked to the Hazelcast folks, they indicated that syncing service tickets and the service registry via Hazelcast across our dedicated connection is not recommended with the community edition of Hazelcast. They recommended the Enterprise edition, which they said uses message queues instead of TCP/IP connections to sync the cluster.

 

Questions:

  1. Is anyone using Hazelcast between on-prem and off-prem CAS servers?
  2. Has anyone seen these error messages about Hazelcast and come across a means to make the cluster more resilient? 

 

 

Thanks, Jay

 

 

________________________________

Jason Rappaport (he/him)

Identity and Access Management Analyst

Office of Information Technology

Email:  jaso...@princeton.edu

Office:  609-258-8464

 

 
