Hazelcast with CAS resulting in pegged CPU on some hosts but not all

Jason Rappaport

Jun 2, 2021, 1:48:12 PM
to Hazelcast
Good afternoon - newbie here looking for help.  

We run a Hazelcast cluster with CAS (https://www.apereo.org/projects/cas) to sync authentication tickets between our on-prem data center and hosts in AWS.  Our on-prem hosts sit behind an F5 and the AWS hosts sit behind an ELB.  Hazelcast traffic between on-prem and AWS travels over a private network.

What we have encountered is 100% CPU usage, typically on the AWS hosts. The process using all of the CPU is Tomcat.  However, we have also seen one on-prem host with a pegged CPU at the same time as one of the AWS hosts.

When we looked at this from a networking perspective, we found a lot of retransmissions and out-of-order packets.  Running netstat showed roughly 7.7k records like these:

TCP 10.21.1.154:5701 10.6.55.29:51872-59613 (ephemeral port) ESTABLISHED 3712
TCP 10.21.1.154:51872-59613 (ephemeral port) 10.6.55.29:5701 ESTABLISHED 3712
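
For anyone wanting to reproduce a count like this, here is a minimal sketch of one way to tally the records per remote host. It assumes the netstat output has been saved as text with whitespace-separated columns in the order shown above (protocol, local address, remote address, state, PID); the function name `count_member_connections` is made up for illustration and is not part of any tool.

```python
from collections import Counter

HAZELCAST_PORT = "5701"  # member port seen in the records above


def count_member_connections(netstat_text, port=HAZELCAST_PORT):
    """Count ESTABLISHED TCP connections that touch `port`, keyed by remote IP.

    Assumes each record looks like:
        TCP <local ip:port> <remote ip:port> ESTABLISHED <pid>
    Lines that do not fit this layout are skipped.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state != "ESTABLISHED":
            continue
        local_port = local.rsplit(":", 1)[-1]
        remote_ip, _, remote_port = remote.rpartition(":")
        # Keep the record if either end of the connection is the member port.
        if port in (local_port, remote_port):
            counts[remote_ip] += 1
    return counts
```

Passing the saved netstat text to the function and printing `counts.most_common()` lists the peers with the most open member connections first, which makes it easier to see whether the 7.7k connections are concentrated on a single pair of hosts.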

When I looked at the logs for one of the impacted hosts, I saw a lot (17k) of these entries:
2021-05-30 15:17:13,980 WARN [com.hazelcast.internal.server.tcp.TcpServerConnection] - <[107w.aws.myhost.com]:5701 [dev] [4.1] Connection[id=835, /10.21.1.154:56883->105.m/10.6.55.29:5701, qualifier=null, endpoint=[105w.myhost.com]:5701, alive=false, connectionType=MEMBER, planeIndex=0] closed. Reason: Exception in Connection[id=835, /10.21.1.154:56883->105w.myhost.com/10.6.55.29:5701, qualifier=null, endpoint=[105w.myhost.com]:5701, alive=true, connectionType=MEMBER, planeIndex=0], thread=hz.authqa-tickets.IO.thread-in-2>
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[?:?]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[?:?]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[?:?]
    at sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[?:?]
    at sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358) ~[?:?]
    at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:119) ~[hazelcast-4.1.jar:4.1]
    at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:383) ~[hazelcast-4.1.jar:4.1]
    at com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:368) ~[hazelcast-4.1.jar:4.1]
    at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:294) ~[hazelcast-4.1.jar:4.1]
    at com.hazelcast.internal.networking.nio.NioThread.executeRun(NioThread.java:249) ~[hazelcast-4.1.jar:4.1]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[hazelcast-4.1.jar:4.1]
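
To see whether the 17k entries cluster around particular peers or times, the log can be bucketed by hour and endpoint. The following is a rough sketch that assumes every entry follows the same layout as the WARN line above; the regular expression and the function name `count_connection_closes` are illustrative assumptions, not anything provided by Hazelcast.

```python
import re
from collections import Counter

# Matches the first line of a "closed. Reason:" WARN entry and captures
# the date plus hour and the endpoint host name. Stack-trace lines do not match.
CLOSE_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} WARN "
    r"\[com\.hazelcast\.internal\.server\.tcp\.TcpServerConnection\].*?"
    r"endpoint=\[(?P<endpoint>[^\]]+)\]:\d+.*?\bclosed\. Reason:"
)


def count_connection_closes(log_lines):
    """Count connection-closed warnings per (hour, endpoint) pair."""
    counts = Counter()
    for line in log_lines:
        match = CLOSE_RE.match(line)
        if match:
            counts[(match.group("hour"), match.group("endpoint"))] += 1
    return counts
```

Feeding the function the lines of the log file (for example, by iterating over the open file object) yields a counter whose largest buckets show when, and against which member, the connection resets peak.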
 
Any thoughts on where I could go from here?  Thank you in advance! Jay 
