Accumulate performance


rao

May 19, 2016, 5:04:09 AM
to Drools Usage
Environment description 
We use Drools in an application to monitor 3 million input tuples stored as facts in working memory. We have 150 rules running on these 3 million facts, with accumulates in 60 of the rules. Many facts are kept in working memory for 8 hours a day; some are retained for 24 hours to derive statistical values.

Challenges
Accumulate takes a long time to compute because it traverses all the facts back to the beginning of the day. How can we exit from an accumulate without going back to the first fact of the day? Even if we add a 2-hour condition, the accumulate still goes to the beginning of the day. It appears to traverse all 3 million facts even though the 2-hour condition could have stopped it after about 100K facts.

Anything unique to note

How can we find the immediately previous fact for each entity (say a card, an account, etc.)? So far we have only been able to make this fast by using Java code in the THEN part of the rule. Is there any way to do this equally fast with an accumulate? We tried collect and accumulate and found both to be very slow, since they go to the beginning of the stream while searching for the previous fact of the same entity.

Will appreciate any advice or guidance,
Thanks
Rao

Mark Proctor

May 20, 2016, 12:22:59 PM
to drools...@googlegroups.com
I’m not entirely sure what the problem is here.

If you are using 6.x, the rules are lazily evaluated, so they will not evaluate unless they can potentially fire. Does that help?

Beyond that, we have no incremental housekeeping for accumulates. When an accumulate is triggered from a left input, it opens the window and accumulates facts over that window. You could use the trigger fact (left input) to make or break when accumulates can happen. You can use before/after to constrain the period for which that trigger fact is true, and incrementally update some timer/data object over time that makes or breaks the trigger fact's range.

To find the immediately previous fact for an entity: if all you want is the top, the bottom, or a range, you can use not. On more recent versions we put all matches into a range-indexed data structure, to keep matching constant time.

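I think something like this will work: for a given d0, select an earlier MyObject d1, then require that no other MyObject falls between d1 and d0, so that d1 is the most recent one before d0.
d0 : MyObject()
d1 : MyObject( seq < d0.seq )
// no fact strictly between d1 and d0 (without the upper bound, d0 itself would block the match)
not ( MyObject( seq > d1.seq, seq < d0.seq ) )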
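
Read as plain Java, the selection these patterns aim for might look like the brute-force sketch below. `SeqFact` and `PreviousFactCheck` are hypothetical names; only the `seq` field comes from the thread. It illustrates the intended semantics (d1 is earlier than d0 and nothing lies strictly between them), not how the engine evaluates or indexes the match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MyObject; only the seq field is taken from the thread.
final class SeqFact {
    final long seq;

    SeqFact(long seq) {
        this.seq = seq;
    }
}

final class PreviousFactCheck {
    // Returns the seq of every d1 with d1.seq < d0.seq such that no other
    // fact has a seq strictly between d1.seq and d0.seq.
    static List<Long> previousOf(SeqFact d0, List<SeqFact> facts) {
        List<Long> matches = new ArrayList<>();
        for (SeqFact d1 : facts) {
            if (d1.seq >= d0.seq) {
                continue; // d1 must come before d0
            }
            boolean blocked = false;
            for (SeqFact other : facts) {
                // Mirrors the "not" pattern: a fact between d1 and d0 rules d1 out.
                if (other.seq > d1.seq && other.seq < d0.seq) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                matches.add(d1.seq);
            }
        }
        return matches;
    }
}
```

With unique seq values this yields at most one match per d0. For a per-entity lookup, each pattern would presumably also need a constraint on the entity key (card, account), which is not shown here.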

Mark


rao

May 23, 2016, 11:52:53 PM
to Drools Usage
Thanks Mark. I will try what you have suggested and report back with the results. I just want to make sure that I highlight my concern about the matching logic running across all of working memory. By the way, we are still using the old 5.5.x version. I don't know if 6 will solve all the problems.

1. Assume that I have 2 transactions on card number 100 in my working memory over the last 2 hours (working memory may hold 3 million transactions across the day).

2. Any matching construct I use (collect, accumulate or sequence) appears to compare against all transactions in WM (3 million of them). I don't want to check more than the last 2 hours of WM, as per point 1 above. The logic works, in that older tuples are not allowed to participate, but the search still continues to the beginning of the day. Is this right? This is what we see in our rules.

3. For now, we have implemented a workaround for one of the rules, which needed the immediately previous transaction on the same card within the last 2 hours. We used a HashMap to store the last transaction and check it in the rule. This improved the speed by a million times.
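
A minimal sketch of the HashMap workaround described in point 3, assuming one map entry per card number. `CardTxn`, `LastTxnIndex` and their fields are hypothetical names, not taken from the actual application, and the sketch assumes transactions arrive in time order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical transaction type; field names are assumptions for illustration.
final class CardTxn {
    final String cardNumber;
    final long timestampMillis;
    final double amount;

    CardTxn(String cardNumber, long timestampMillis, double amount) {
        this.cardNumber = cardNumber;
        this.timestampMillis = timestampMillis;
        this.amount = amount;
    }
}

final class LastTxnIndex {
    private final Map<String, CardTxn> lastByCard = new HashMap<>();

    // Stores txn as the latest transaction for its card and returns the
    // previous one if it lies within windowMillis of txn, otherwise null.
    CardTxn recordAndGetPrevious(CardTxn txn, long windowMillis) {
        CardTxn previous = lastByCard.put(txn.cardNumber, txn);
        if (previous == null) {
            return null; // first transaction seen for this card
        }
        long gap = txn.timestampMillis - previous.timestampMillis;
        return gap <= windowMillis ? previous : null;
    }
}
```

The lookup cost is one hash access per incoming transaction, independent of how many facts are held for the day, which would be consistent with the large speed-up reported above.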

My question is the following:

Is it possible to stop searching further tuples in an accumulate or collect once the 2-hour filter condition is no longer satisfied?

Our tuples are time-ordered in working memory in true stream fashion. There is no need to search the rest of WM once the 2-hour filter has been checked and has returned false.
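
The early exit being asked for can be sketched in plain Java outside the engine: scan a time-ordered sequence from newest to oldest and stop at the first entry older than the window. `WindowedScan`, `Result` and the parallel-array layout are assumptions made for illustration; this shows the desired behaviour, not what Drools does internally.

```java
// Hypothetical helper; assumes timestamps are sorted in ascending order
// and amounts[i] belongs to timestamps[i].
final class WindowedScan {

    static final class Result {
        final double total;  // sum of amounts inside the window
        final int examined;  // entries looked at, including the one that ended the scan

        Result(double total, int examined) {
            this.total = total;
            this.examined = examined;
        }
    }

    static Result sumWithin(long[] timestamps, double[] amounts, long now, long windowMillis) {
        double total = 0.0;
        int examined = 0;
        for (int i = timestamps.length - 1; i >= 0; i--) {
            examined++;
            if (now - timestamps[i] > windowMillis) {
                break; // everything earlier is older still, so stop here
            }
            total += amounts[i];
        }
        return new Result(total, examined);
    }
}
```

Because the data is time-ordered, the work is proportional to the number of entries inside the window plus one, rather than to the size of the whole day's stream.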

I shall check your suggestion once more and come back to you. 

Thanks for your advice,
Rao

rao

May 26, 2016, 1:03:06 AM
to Drools Usage
Describing the problem in another way:

1. A weak filter followed by an accumulate over a large stream of 3 million transactions is slow.

      We have a weak filter (required for functional compliance) in 3 of our 145 rules over 3 million transactions.
 
2. Every transaction that passes the weak filter triggers an accumulate over a large stream of 1 to 3 million facts.

     Transactions after peak hours (say, after 1 PM) traverse the full stream of 2 million transactions for the accumulate.
     Every newly inserted fact starts a huge RETE rebalancing because the accumulate spans the full stream.

3. Time constraint in the accumulate
      The time constraint in the accumulate appears to be used only to include or exclude each tuple from the accumulate. The engine still goes back to the beginning of the full stream (or RETE network), even though I don't want to accumulate anything beyond that point.

4. Order of transactions in the stream

   Our transactions are time-ordered in the stream. Hence, we were hoping that the time constraint in the accumulate would stop the search once 3 minutes or 20 minutes had been covered, rather than searching the whole RETE network for 24 hours of data.

5. Our workaround

   In addition to the 24-hour stream, we have created smaller streams of 5, 15 and 30 minutes to make some of the rules fast. This works and is fast.

   We implemented a HashMap that stores the last transaction for each entity. This makes a few of the rules a million times faster.

Can you suggest a better, more permanent solution for working with large, time-ordered, fact-based streams where we want to accumulate values for individual entities (cards, ATMs, accounts, etc.) over 4 minutes, 1 hour or 24 hours?
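
One direction that generalises the smaller-streams and HashMap workarounds is a per-entity sliding window with a running sum, maintained outside the rules. The sketch below is only an illustration under stated assumptions: `SlidingSums` and its method names are hypothetical, events are assumed to arrive in time order, and one instance would be created per window length (for example 4 minutes, 1 hour, 24 hours).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-entity sliding-window sum; assumes time-ordered input.
final class SlidingSums {

    private static final class Entry {
        final long ts;
        final double amount;

        Entry(long ts, double amount) {
            this.ts = ts;
            this.amount = amount;
        }
    }

    private static final class Window {
        final Deque<Entry> entries = new ArrayDeque<>();
        double sum;
    }

    private final long windowMillis;
    private final Map<String, Window> byEntity = new HashMap<>();

    SlidingSums(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Adds an event for entityId and returns that entity's sum over the
    // window ending at ts. Expired entries are evicted from the front.
    double add(String entityId, long ts, double amount) {
        Window w = byEntity.computeIfAbsent(entityId, k -> new Window());
        w.entries.addLast(new Entry(ts, amount));
        w.sum += amount;
        // The entry just added is never expired, so the deque is never empty here.
        while (ts - w.entries.peekFirst().ts > windowMillis) {
            w.sum -= w.entries.pollFirst().amount;
        }
        return w.sum;
    }
}
```

Each event is added once and evicted once, so the cost per insert is amortised constant per entity instead of a scan over the day's facts. Repeatedly adding and subtracting doubles can drift slightly over a long day; a fixed-point amount type may be preferable for monetary values.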

Thanks for your help once more,
Rao