Facing CPU Utilization issues on Kie execution server with HA & Clustering


Nizam

Nov 18, 2016, 12:01:19 PM
to Drools Setup
Hi Drools experts,
       I have a set of 10 KIE execution servers that expose rules as a RESTful service, load balanced behind an F5 load balancer. Around 55 different rules are exposed behind the RESTful endpoint. The rules are simple: each checks a series of conditions and sets a value on a specific field of the same object. The initial load tests give good throughput, but when the test is run again after some time, we see high CPU utilization and the throughput drops. Are there any best practices, from either the KIE execution server or the rule-authoring perspective, that help avoid high CPU utilization?


Nizam

Nov 18, 2016, 12:10:57 PM
to Drools Setup
Some background on the underlying infrastructure and load test settings.

Server configuration details:

Each node: Apache Tomcat, 16-core CPU with 32 GB RAM (10 nodes in total)

Load test:
150 threads with a 1-second ramp-up, 20,000 iterations each, for a total load of 3 million requests

Nizam

Nov 18, 2016, 12:21:37 PM
to Drools Setup
The whole setup (Drools Workbench controller and KIE execution server) is running on 6.4.0.Final.


Nizam

Nov 18, 2016, 3:39:17 PM
to Drools Setup

I took a thread dump of the process with high CPU utilization. A few threads are in the BLOCKED state.

"http-nio-8080-exec-83" daemon prio=10 tid=0x00007f6810331800 nid=0x4729 waiting for monitor entry [0x00007f689a5e3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Package.getSystemPackage(Package.java:511)
        - waiting to lock <0x00000007806fc4e0> (a java.util.HashMap)
        at java.lang.Package.getPackage(Package.java:334)
        at java.lang.Class.getPackage(Class.java:730)
        at org.codehaus.jackson.xc.JaxbAnnotationIntrospector.isHandled(JaxbAnnotationIntrospector.java:128)
        at org.codehaus.jackson.map.AnnotationIntrospector$Pair.isHandled(AnnotationIntrospector.java:932)
        at org.codehaus.jackson.map.introspect.AnnotatedClass._collectRelevantAnnotations(AnnotatedClass.java:857)
        at org.codehaus.jackson.map.introspect.AnnotatedClass._constructField(AnnotatedClass.java:839)
        at org.codehaus.jackson.map.introspect.AnnotatedClass._addFields(AnnotatedClass.java:713)
        at org.codehaus.jackson.map.introspect.AnnotatedClass.resolveFields(AnnotatedClass.java:457)
        at org.codehaus.jackson.map.introspect.BasicClassIntrospector.collectProperties(BasicClassIntrospector.java:159)
        at org.codehaus.jackson.map.introspect.BasicClassIntrospector.forDeserialization(BasicClassIntrospector.java:108)
        at org.codehaus.jackson.map.introspect.BasicClassIntrospector.forDeserialization(BasicClassIntrospector.java:16)
        at org.codehaus.jackson.map.DeserializationConfig.introspect(DeserializationConfig.java:868)
        at org.codehaus.jackson.map.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:587)
        at org.codehaus.jackson.map.deser.StdDeserializerProvider._createDeserializer(StdDeserializerProvider.java:401)
        at org.codehaus.jackson.map.deser.StdDeserializerProvider._createAndCache2(StdDeserializerProvider.java:310)
        at org.codehaus.jackson.map.deser.StdDeserializerProvider._createAndCacheValueDeserializer(StdDeserializerProvider.java:290)
        - locked <0x00000007dac73120> (a java.util.HashMap)
        at org.codehaus.jackson.map.deser.StdDeserializerProvider.findValueDeserializer(StdDeserializerProvider.java:159)
        at org.codehaus.jackson.map.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:132)
        - locked <0x00000007dac85750> (a java.util.HashMap)
        at org.codehaus.jackson.map.jsontype.impl.AsWrapperTypeDeserializer._deserialize(AsWrapperTypeDeserializer.java:93)
        at org.codehaus.jackson.map.jsontype.impl.AsWrapperTypeDeserializer.deserializeTypedFromObject(AsWrapperTypeDeserializer.java:45)
        at org.codehaus.jackson.map.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:52)
        at org.codehaus.jackson.map.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:219)
        at org.codehaus.jackson.map.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:194)
        at org.codehaus.jackson.map.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:30)
        at org.codehaus.jackson.map.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:299)
        at org.codehaus.jackson.map.deser.SettableBeanProperty$FieldProperty.deserializeAndSet(SettableBeanProperty.java:579)
        at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:697)
        at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
        at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
        at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
        at org.kie.server.api.marshalling.json.JSONMarshaller.unmarshall(JSONMarshaller.java:192)
        at org.kie.server.services.drools.DroolsKieContainerCommandServiceImpl.callContainer(DroolsKieContainerCommandServiceImpl.java:59)
        at org.kie.server.remote.rest.drools.CommandResource.manageContainer(CommandResource.java:72)
        at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:168)
        at org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:269)
        at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:227)
        at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:216)
        at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:541)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:523)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:125)
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.kie.server.services.impl.security.web.CaptureHttpRequestFilter.doFilter(CaptureHttpRequestFilter.java:42)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:613)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
        at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:673)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:516)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1086)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:659)
        at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1558)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1515)
        - locked <0x00000007da72c880> (a org.apache.tomcat.util.net.NioChannel)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
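
When a dump contains many BLOCKED threads, it can help to count how many of them wait on the same monitor. The following is a minimal sketch of that idea, not part of the original post: the class name `BlockedLockCounter` is hypothetical, it assumes the HotSpot "- waiting to lock <address> (a type)" line format seen in the dump above, and the second thread in the sample input (exec-84) is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups blocked threads of a text thread dump by contended monitor.
final class BlockedLockCounter {

    // Matches HotSpot thread-dump lines such as:
    //   - waiting to lock <0x00000007806fc4e0> (a java.util.HashMap)
    private static final Pattern WAITING =
            Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)> \\(a ([^)]+)\\)");

    /** Counts, per monitor, how many threads in the dump are waiting to lock it. */
    static Map<String, Integer> countWaiters(String threadDump) {
        Map<String, Integer> waiters = new LinkedHashMap<>();
        Matcher m = WAITING.matcher(threadDump);
        while (m.find()) {
            // Key is "<address> (<type>)", e.g. "0x00000007806fc4e0 (java.util.HashMap)"
            String key = m.group(1) + " (" + m.group(2) + ")";
            waiters.merge(key, 1, Integer::sum);
        }
        return waiters;
    }

    public static void main(String[] args) {
        // First entry mirrors the dump above; the second thread is an invented example.
        String dump = String.join("\n",
                "\"http-nio-8080-exec-83\" waiting for monitor entry",
                "        - waiting to lock <0x00000007806fc4e0> (a java.util.HashMap)",
                "\"http-nio-8080-exec-84\" waiting for monitor entry",
                "        - waiting to lock <0x00000007806fc4e0> (a java.util.HashMap)");
        countWaiters(dump).forEach((lock, n) ->
                System.out.println(n + " thread(s) waiting on " + lock));
    }
}
```

A monitor with many waiters is a candidate contention point, but it does not by itself explain high CPU, since BLOCKED threads are not running.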



Edson Tirelli

Nov 18, 2016, 7:39:27 PM
to drools...@googlegroups.com

   Nizam,

   There is no reason for the server to degrade performance over time, so this requires investigation.
 
   From your thread dump, it seems the locks are in the marshalling process? It seems you are using JSON for the marshalling of the REST calls?

   Can you please confirm which app server you are using? Do you have the security manager enabled?

   Can you provide a simple reproducer that demonstrates the degradation?

   Thank you,
    Edson
 


--
You received this message because you are subscribed to the Google Groups "Drools Setup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to drools-setup+unsubscribe@googlegroups.com.
To post to this group, send email to drools...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/drools-setup/c3f4f529-7f79-4664-93f6-280886584f78%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
  Edson Tirelli
  Sr Principal Software Engineer 
  Red Hat Business Systems and Intelligence Group

Nizam

Nov 19, 2016, 5:53:45 PM
to Drools Setup
Hi Edson,
                Please find my replies to your questions inline. Since this is within my organization's firewall, the environments cannot be accessed from outside. However, I will try to describe my complete scenario below.

It seems the locks are on the marshalling process? - [Nizam:] Yes, it looks like it.
It seems you are using JSON for the marshalling of the REST calls? - [Nizam:] Yes, I am using JSON as the message format for the REST calls.
Can you please confirm which app server you are using? - [Nizam:] I am using Apache Tomcat 8.2.0.
Do you have the security manager enabled? - [Nizam:] No, the security manager is not explicitly enabled when starting Tomcat.
Can you provide a simple reproducer that demonstrates the degradation? - [Nizam:] Since the environments are not accessible, let me describe the setup in more detail.



My KIE execution server has a server template, say 'test-kie-server', that hosts a container called 'TestContainer'. This container holds a series of 55 rules similar to the sample rule below.


REST URL



Sample rule

package rules;

import java.lang.Number;
import com.example.object.ExampleObject;

rule "Rule_001"
    dialect "mvel"
    no-loop true
    salience 1
when
    $obj : ExampleObject( fieldD in ( "GSC", "BDC" ) , fieldE in ( "FGHF", "FGPC", "FMGC", "FMPC", "FMPT", "FRPC", "FNDM", "FNMS", "FXMS", "FGAR", "FGHA", "FMAR", "FMHA", "FRAR", "FNAR", "FXAR", "FRST", "FMST", "FGST", "FNST", "FXST", "FFPA", "FGMO", "FGPA", "FGPT", "FGRA", "FGRM", "FMHF", "FMMO", "FMRA", "FMRM", "FRPA", "FRRA", "FRRM", "FTRA", "FTRM", "FNRA", "FDRM", "FNRM", "FQRA", "FXRA", "FXRM", "VARA", "VARM" ) )
then
    $obj.setFieldH( "ABC0061" );
end



Sample Request 

{
  "lookup" : "defaultKieSession",
  "commands" : [ {
    "insert" : {
      "object" : {"com.example.object.ExampleObject":{
        "fieldA" : null,
        "fieldB" : null,
        "fieldC" : null,
        "fieldD" : "BDC",
        "fieldE" : "FXST",
        "fieldF" : null,
        "fieldG" : null,
        "fieldH" : null,
        "fieldI" : "RE3"
      }},
      "disconnected" : false,
      "out-identifier" : "RespObj",
      "return-object" : true,
      "entry-point" : "DEFAULT"
    }
  }, {
    "fire-all-rules" : {
      "max" : -1,
      "out-identifier" : null
    }
  } ]
}




Expected Response


{
  "type" : "SUCCESS",
  "msg" : "Container TestContainer successfully called.",
  "result" : {
    "execution-results" : {
      "results" : [ {
        "key" : "RespObj",
        "value" : {"com.example.object.ExampleObject":{
          "fieldA" : null,
          "fieldB" : null,
          "fieldC" : null,
          "fieldD" : "BDC",
          "fieldE" : "FXST",
          "fieldF" : null,
          "fieldG" : null,
          "fieldH" : "ABC0061",
          "fieldI" : "RE3"
        }}
      } ],
      "facts" : [ {
        "key" : "RespObj",
        "value" : {"org.drools.core.common.DefaultFactHandle":{
          "external-form" : "0:4626979:97673493:97673493:4626979:DEFAULT:NON_TRAIT:com.example.object.ExampleObject"
        }}
      } ]
    }
  }
}


I am load testing from a JMeter script that sends HTTP requests with the payload given in the sample request, and I am getting the expected response. The load test seems fine initially, even up to 10 million hits with 150 parallel threads, and gives high throughput of about 6,000 requests/second. After a while, when additional load is applied, the throughput drops to around 400 requests/second. When each instance of the KIE execution server is inspected, it reports high CPU utilization of 85%-90%. The thread dumps, as shown earlier, have some threads in the BLOCKED state.


Let me know if you need more info.


Thanks a lot again for your time.

Nizam

Nizam

Nov 19, 2016, 6:47:04 PM
to Drools Setup
Some more troubleshooting reveals that the GC task threads are consuming a lot of CPU. I have configured a large maximum heap, -Xmx5120M (5 GB). All my RESTful requests contain an InsertObjectCommand and a FireAllRulesCommand.
  • Do we need to send a DeleteObjectCommand too?
  • Does the KIE server retain all the objects after a call is made?
  • Doesn't the KIE execution server take care of deleting the objects and the KieSession after a RESTful call completes?
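
One way to confirm that garbage collection is the CPU consumer is to read the JVM's own GC counters. The sketch below is an illustration only and not something from the thread: the class name `GcOverhead` is hypothetical, and it reports on the JVM it runs in, so it assumes the code can be executed inside the KIE server process (for example from a diagnostic endpoint). It uses the standard `java.lang.management` MXBeans.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: summarizes how much time the current JVM has spent in GC.
final class GcOverhead {

    /** Total milliseconds spent in GC so far, summed over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Approximate share of JVM uptime spent in GC (0.0 means none). */
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC share of uptime: %.1f%%%n", gcFraction() * 100);
    }
}
```

A GC share that keeps climbing under constant load would be consistent with objects being retained between requests.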

Edson Tirelli

Nov 19, 2016, 6:59:07 PM
to drools...@googlegroups.com

   Nizam, 

  • Do we need to send a DeleteObjectCommand too?
  • Does the KIE server retain all the objects after a call is made?
  • Doesn't the KIE execution server take care of deleting the objects and the KieSession after a RESTful call completes?

   If you are using a stateful session, then the server will retain all objects until you explicitly dispose the session or delete the objects. If you don't need a stateful session, just switch to a stateless session and the server will automatically dispose of all objects after the service is executed.

   "lookup" : "defaultKieSession",

   This is by default stateful. Either explicitly create your own stateless session or use:

"lookup" : "defaultStatelessKieSession",

    If this is not the problem you are facing, let me know and we will investigate the issue further this week.
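
The difference in retention can be pictured with a toy model. This is a sketch only and does not use the Drools or KIE API: the classes `StatefulToySession` and `StatelessToySession` are hypothetical stand-ins, and it assumes that successive requests with the same stateful lookup reach the same session, as the explanation above implies.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the Drools API) of fact retention in stateful vs. stateless sessions.
final class SessionRetentionModel {

    /** Stand-in for a stateful session: facts stay until deleted or the session is disposed. */
    static final class StatefulToySession {
        private final List<Object> facts = new ArrayList<>();

        void insert(Object fact) { facts.add(fact); }

        /** Pretends to evaluate every fact in working memory; returns how many were seen. */
        int fireAllRules() { return facts.size(); }

        void dispose() { facts.clear(); }

        int retained() { return facts.size(); }
    }

    /** Stand-in for a stateless session: working memory lives for one call only. */
    static final class StatelessToySession {
        int execute(Object fact) {
            StatefulToySession perCall = new StatefulToySession();
            perCall.insert(fact);
            int evaluated = perCall.fireAllRules();
            perCall.dispose(); // nothing survives the call
            return evaluated;
        }
    }

    /** Simulates `calls` insert + fire-all-rules requests against one shared stateful session. */
    static int retainedAfterStatefulCalls(int calls) {
        StatefulToySession shared = new StatefulToySession();
        for (int i = 0; i < calls; i++) {
            shared.insert(new Object());
            shared.fireAllRules();
        }
        return shared.retained();
    }

    public static void main(String[] args) {
        System.out.println("stateful, facts retained after 1000 calls: "
                + retainedAfterStatefulCalls(1000));
        System.out.println("stateless, facts evaluated in a single call: "
                + new StatelessToySession().execute(new Object()));
    }
}
```

In the model, the stateful working memory grows by one fact per request, so both heap usage and the work done per request increase over time, while the stateless variant stays constant.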

    Edson


Nizam

Nov 20, 2016, 4:37:42 PM
to Drools Setup
Hi Edson,
                  It looks like "lookup" was the culprit. I can see stable CPU utilization now. It makes sense: the objects were being retained, and the JVM was running GC frequently enough to keep the CPU busy. I will continue with a few more tests and get back to you if I need your help. Thanks a lot for your insights, Edson. I really appreciate your help.

On another note, I see a significant difference in performance when I use XML as the data format instead of JSON. Is this a known fact?



Edson Tirelli

Nov 20, 2016, 5:46:38 PM
to drools...@googlegroups.com

   Ok, that makes sense.

 I see a significant difference in performance when I use XML as the data format instead of JSON. Is this a known fact?

   To be honest, I never compared performance between different marshalling formats, but I believe they might differ. For a start, we use different implementations for marshalling (Jackson for JSON, XStream for XML). Do you have any numbers you can share? Also, did you use the JAXB or the XStream XML format?

   Edson

   
