Drools 6.2.0-Final performance issue with 100M (or more) records


Abhishek Maheshwari

Jul 20, 2015, 2:16:43 AM
to drools...@googlegroups.com
Hello,

I am facing a performance issue with Drools when processing a large number of records. As a very basic evaluation, I am running 1 rule against 100M records (the sample results below are for 20M), where every record is different from the others. I have also simplified my record (1 String and 1 Labels object).

sample.drl (very simple rule)
-----------
rule "sample"
  agenda-group "sample"
  when
    $labels:Labels()
    $key:String(equals("0") == false)
  then
    $labels.addLable("L1");
  end
-----------

I am using a stateful KieSession because performance is even worse when I use StatelessKieSession.
-------------
Labels labels = new Labels();
kSession.insert(key); // key is String
kSession.insert(labels);
kSession.fireAllRules();
deleteFacts(kSession);
-------------

Results (for 20M records):
Java If/Else - 1 sec (if(key != null && !key.equals("0") && labels!= null) labels.addLable("L1");)
Drools - 35 sec

Can any Drools experts explain what I am missing here? Or is Drools not designed for such very large Big Data problems?

Thanks a lot in advance,
Abhishek.

Abhishek Maheshwari

Jul 20, 2015, 2:22:58 AM
to drools...@googlegroups.com
To add more: all Kie* objects (KieServices, KieContainer, KieSession) are created once for the whole JVM lifetime. My kmodule.xml is also very simple, and all DRL files are packed in a jar (placed on the classpath) built with kie-maven-plugin.
-----
<?xml version="1.0" encoding="UTF-8"?>
<kmodule xmlns="http://jboss.org/kie/6.0.0/kmodule">
  <kbase name="kb">
    <ksession name="state"/>
    <ksession name="stateless" type="stateless"/>
  </kbase>
</kmodule>
----

Abhishek Maheshwari

Jul 20, 2015, 8:16:27 AM
to drools...@googlegroups.com
My further analysis showed that 85% of the time goes into kSession.insert() (no rules were given focus for this test, hence no rule execution).

Can any Drools expert help me out here? Am I missing something very basic, or does Drools inherently take that much more time than Java if/else statements?

Thanks a lot in advance,
Abhishek

Mark Proctor

Jul 20, 2015, 8:37:01 AM
to drools...@googlegroups.com
You can’t compare a Java for loop with if/else to a rule engine. You will always get vastly better performance from custom hand-coded efforts. There is a lot of method indirection and overhead for the more advanced functionality of the rule engine. Secondly, a rule engine is most effective when you have joins and large amounts of incremental data (you update fields). When I say large, you need to differentiate between batch processing of small data chunks, like you are doing, and processing of large amounts of data that is inserted at the same time.

When you have a large number of rules, with joins and large amounts of incremental data (all inserted at the same time), it will be much harder for you to hand code an engine that can perform as well and be maintainable; that last bit is key.

If you only have a few rules and no joins, and you are working in a batch-like mode (inserting one or two objects to be processed at a time), then you may find it better to just code generate a for loop and solve the problem linearly. Although the key here comes down to maintainability.
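For that few-rules, no-joins, batch case, the hand-coded alternative might look roughly like the sketch below. It is only an illustration: the `Labels` class is a hypothetical stand-in for the one in the original post (its real definition was not shown), `LinearRuleSketch`, `evaluate` and `countLabelled` are made-up names, and `addLable` is spelled as in the post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinearRuleSketch {

    /** Hypothetical stand-in for the poster's Labels class. */
    public static final class Labels {
        private final List<String> values = new ArrayList<>();

        public void addLable(String label) {
            values.add(label);
        }

        public List<String> values() {
            return values;
        }
    }

    /** Hand-coded equivalent of the "sample" rule: label every key that is not "0". */
    public static Labels evaluate(String key) {
        Labels labels = new Labels();
        if (key != null && !key.equals("0")) {
            labels.addLable("L1");
        }
        return labels;
    }

    /** Processes records one at a time; no state is carried from one record to the next. */
    public static int countLabelled(List<String> keys) {
        int labelled = 0;
        for (String key : keys) {
            if (!evaluate(key).values().isEmpty()) {
                labelled++;
            }
        }
        return labelled;
    }

    public static void main(String[] args) {
        System.out.println(countLabelled(Arrays.asList("0", "1", "2")));
    }
}
```

Each record is handled independently here, which is why nothing like a working memory is needed; the trade-off is that every new rule means editing and redeploying this code.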

A small note on your rules.
While we don’t recommend you insert Strings as objects, if you do need to do this you can write the rule as:
String( this != "0" )

Finally, rule engines perform better with equality (==) or range (<, >) operators, as these can be indexed. Although again this only really has an impact when you have joins on large amounts of data, or there is a really large fan out on your literal constraints.

Mark


Abhishek Maheshwari

Jul 20, 2015, 1:30:41 PM
to drools...@googlegroups.com
Thanks a lot, Mark, for the guidance on where to use Drools. I understand the Drools use case better now.

Why don't you recommend inserting a String as an object into a KieSession? Is this true for other data types such as Integer and Double as well? What if I have 10 String objects (s1 to s10) and I want to apply a rule like s1=="1" && s2=="2" && ... s10=="10"? Would you recommend some other way to execute this through Drools? (I know it's not a good example because it has no joins, but it has many objects to compare/check.)

Also, I did not understand what "incremental data" means in your response. Could you shed some light on it?

Abhishek.

jco

Jul 21, 2015, 2:26:21 AM
to drools...@googlegroups.com
One other thing: if at all possible, do NOT use String comparisons. Rather, make the keys int variables. If you absolutely need some kind of String representation, use a constants file and define them as something like

public final static int MY_STRING = 5;

Then in the code itself you compare them as
if Obj.Attrr == MY_STRING
then...

I have done a bit of benchmarking and I can guarantee that if you have been doing a lot of String comparisons in the rules and you switch to int or char variables, performance will increase dramatically.

Just a bit of RBS tweaking...  8)

jco
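One way to follow this advice is to translate each String key to an int code once, when the record is read, so that later checks are plain int comparisons. The sketch below is an assumption about how that could be wired up, not something from the thread: `KeyCodes`, `codeOf`, `Fact`, `attr`, `UNKNOWN` and the raw value "my-string" are all hypothetical, and only `MY_STRING = 5` comes from the post above. It does not measure anything.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyCodes {

    /** Code used for any raw value that has no constant assigned (hypothetical). */
    public static final int UNKNOWN = 0;

    /** The constant from the post above. */
    public static final int MY_STRING = 5;

    // Lookup table from the raw String form to its int code.
    private static final Map<String, Integer> CODES = new HashMap<>();

    static {
        CODES.put("my-string", MY_STRING); // hypothetical raw value
    }

    /** Translates a raw String to its int code; intended to run once per record. */
    public static int codeOf(String raw) {
        Integer code = CODES.get(raw);
        return code == null ? UNKNOWN : code;
    }

    /** Hypothetical fact type that carries the int code instead of the String. */
    public static final class Fact {
        public final int attr;

        public Fact(int attr) {
            this.attr = attr;
        }
    }

    /** The comparison itself is a primitive int check, with no String.equals call. */
    public static boolean matches(Fact fact) {
        return fact.attr == MY_STRING;
    }
}
```

A record would then be built as `new KeyCodes.Fact(KeyCodes.codeOf(rawKey))`, and the rule constraint would test `attr == MY_STRING` in the same way.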

Abhishek Maheshwari

Jul 22, 2015, 3:14:41 AM
to drools...@googlegroups.com
Thanks a lot jco

Mark Proctor

Jul 24, 2015, 12:30:08 PM
to drools...@googlegroups.com
On 20 Jul 2015, at 18:30, Abhishek Maheshwari <tech....@gmail.com> wrote:

> Thanks a lot, Mark, for the guidance on where to use Drools. I understand the Drools use case better now.
>
> Why don't you recommend inserting a String as an object into a KieSession? Is this true for other data types such as Integer and Double as well? What if I have 10 String objects (s1 to s10) and I want to apply a rule like s1=="1" && s2=="2" && ... s10=="10"? Would you recommend some other way to execute this through Drools? (I know it's not a good example because it has no joins, but it has many objects to compare/check.)

If it works for you, then that is fine. It’s just good to be aware.

It’s hard to know without understanding your domain. But Strings and Numbers have no semantic meaning. What does that String mean? When you have tables or classes you give them a name, and then you have a field or column that is a String or an integer. It would be like having a DB table called “String” with a single column of Strings; there just isn’t a lot of meaning in this. Then there is the issue of how you uniquely identify and separate your inserted objects.

There is also an issue of performance if you have a larger number of rules. The engine uses a discrimination network (google Rete or look in the docs) that reduces the amount of testing performed. If everything is a String, you lose the first level of discrimination and thus take a performance hit.
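The effect of losing that first level can be pictured with a toy model that buckets facts by their class, so that a pattern on one type only ever tests the facts in its own bucket. This is a loose analogy written for illustration, not how Drools implements its network (for one thing, it matches exact classes only and ignores supertypes); `TypeDiscriminationSketch`, `CustomerId` and `ProductCode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypeDiscriminationSketch {

    /** Hypothetical domain type wrapping a String that identifies a customer. */
    public static final class CustomerId {
        public final String value;

        public CustomerId(String value) {
            this.value = value;
        }
    }

    /** Hypothetical domain type wrapping a String that identifies a product. */
    public static final class ProductCode {
        public final String value;

        public ProductCode(String value) {
            this.value = value;
        }
    }

    // Facts bucketed by exact class: the toy "first level of discrimination".
    private final Map<Class<?>, List<Object>> byType = new HashMap<>();

    /** Files the fact under its own class. */
    public void insert(Object fact) {
        byType.computeIfAbsent(fact.getClass(), k -> new ArrayList<>()).add(fact);
    }

    /** Number of facts a pattern on the given type would have to test. */
    public int candidates(Class<?> type) {
        List<Object> bucket = byType.get(type);
        return bucket == null ? 0 : bucket.size();
    }
}
```

With three raw Strings inserted, a pattern on String has three candidates to test whatever they mean. With one `CustomerId` and two `ProductCode` facts instead, a pattern on `CustomerId` has a single candidate.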

> Also, I did not understand what "incremental data" means in your response. Could you shed some light on it?

In SQL terms a rule is a materialized view; this is all covered in the documentation. When you insert/update/delete data, there may be changes in the resulting visible materialized rows. The engine does not need to re-evaluate all data each time, as it holds the partial matches, and thus can do incremental evaluation.
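The idea of holding partial matches can be sketched with a small incremental join. Each insert is joined only against the facts already remembered under the same key, and the list of matches (the "materialized rows") is updated in place rather than recomputed. This is a simplified illustration under stated assumptions, not the engine's actual algorithm: it handles inserts only, and `IncrementalJoinSketch` and the customer/order domain are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncrementalJoinSketch {

    // Memories of everything inserted so far, indexed by the join key.
    private final Map<Integer, List<String>> customersById = new HashMap<>();
    private final Map<Integer, List<String>> ordersByCustomerId = new HashMap<>();

    // The current join result, maintained incrementally.
    private final List<String> matches = new ArrayList<>();

    /** Remembers the customer and joins it only against orders with the same id. */
    public void insertCustomer(int id, String name) {
        customersById.computeIfAbsent(id, k -> new ArrayList<>()).add(name);
        for (String item : ordersByCustomerId.getOrDefault(id, Collections.emptyList())) {
            matches.add(name + ":" + item);
        }
    }

    /** Remembers the order and joins it only against customers with the same id. */
    public void insertOrder(int customerId, String item) {
        ordersByCustomerId.computeIfAbsent(customerId, k -> new ArrayList<>()).add(item);
        for (String name : customersById.getOrDefault(customerId, Collections.emptyList())) {
            matches.add(name + ":" + item);
        }
    }

    /** Read-only view of the matches found so far. */
    public List<String> matches() {
        return Collections.unmodifiableList(matches);
    }
}
```

Inserting a customer with id 1 and then an order for id 1 adds one match immediately, while an order for id 2 simply waits in memory until a customer with id 2 arrives. Supporting update and delete would mean retracting the affected matches as well, which this sketch leaves out.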
