Using tuple-shaped facts with Clara?


Luke VanderHart

Feb 20, 2015, 4:25:07 PM
to clara...@googlegroups.com
I'm curious how good a fit Clara is for fundamentally tuple-shaped data (such as Datomic datoms), as opposed to the strong default of maps/Records/Objects.

It *seems* like it should be possible out of the box, doing something like this:

(defrule is-bling?
  "Is something considered bling?"
  [Datom [[e a v]] (= a :cost) (< 1000.00 v) (= ?item e)]
  [Datom [[e a v]] (= a :material) (= v :gold) (= ?item e)]
  =>
  (println "Yes! It's over $1000 and made of gold!"))


Of course, one could write wrappers to make it less verbose.

Is this at all a viable approach, or are there performance considerations that would make it impractical?

To go even further... The Doorenbos paper uses tuples as the underlying primitive data structure. If Clara does that under the hood, too, would it be at all possible to tap into the implementation at that level, for even better performance?


Ryan Brush

Feb 20, 2015, 5:17:30 PM
to clara...@googlegroups.com
Hey Luke,

Tuples should be entirely workable in Clara. The only performance consideration that comes to mind is that Clara (like other rule engines) uses a fact's type to evaluate only those conditions that match the type of the given fact. (So it wouldn't attempt to evaluate a Person-based condition against an Address fact, for instance.)

However, what Clara considers to be the "type" of a fact is entirely pluggable. By default it uses clojure.core/type, but you can provide any function that returns the "type" you want to dispatch on. In your example it looks like you probably want to match on specific attributes, so we could actually make this more efficient and shorten the syntax by using a fact-type function like this:

;; When creating a session, we'd use this so we match on the attribute.
(mk-session 'my.rules.namespace :fact-type-fn (fn [[e a v]] a))

The rule would then look like this:

(defrule is-bling?
  "Is something considered bling?"
  [:cost [[e a v]] (< 1000.00 v) (= ?item e)]
  [:material [[e a v]] (= v :gold) (= ?item e)]
  =>
  (println "Yes! It's over $1000 and made of gold!"))


Whether this makes sense for your use case or not I can't say. But this way the :cost expression will only be run on tuples with the :cost attribute, and likewise for the :material. I don't know Datomic very well, so there might be something else you can use as the logical "type" to match on. You could also use the default clojure.core/type and attach metadata to the incoming facts, which clojure.core/type will use.
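To illustrate the metadata option (the fact shape below is just an example, not anything Clara-specific): clojure.core/type returns the :type metadata when it is present and falls back to the class otherwise, so plain maps tagged this way would dispatch on :cost with no custom :fact-type-fn at all.

```clojure
;; clojure.core/type prefers :type metadata over the concrete class.
(def fact (with-meta {:e 42 :a :cost :v 1200.0} {:type :cost}))

(type fact)     ;; => :cost
(type {:e 42})  ;; => clojure.lang.PersistentArrayMap
```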

Of course, the syntax in that example isn't the most intuitive, but we can change that, too. A specialized version of the defrule macro could easily be created that works with tuples. If you look at the var that defrule creates you'll see a structure, and any macro that creates that structure could be used to define rules. (The structure created by defrule is actually defined by a Prismatic schema here.)

So we could probably write a macro that looks more like this:

(tuples/defrule is-bling?
  "Is something considered bling?"
  [[e :cost v] (< 1000.00 v) (= ?item e)]
  [[e :material v] (= v :gold) (= ?item e)]
  =>
  (println "Yes! It's over $1000 and made of gold!"))

There are lots of variations we could consider. (Of course, by itself something like core.match could also match this specific condition...so I'm assuming this would be part of a larger set of composable rules.)

Hope this helps! My use cases tend to heavily use records (and some instances of JavaBeans from an existing system), but I think the tuple pattern can work as well. 

Luke VanderHart

Feb 20, 2015, 5:39:59 PM
to clara...@googlegroups.com
Thanks for the quick reply!

Great idea on using the attribute for the type. That largely resolves my concern about performance.

I'm going to run up some load tests using generated data and generated rules, and see how things perform (both in Clojure and ClojureScript.) If those turn out well it looks like this may be a good fit.

Ryan Brush

Feb 20, 2015, 5:51:33 PM
to clara...@googlegroups.com
Sounds good! My one performance suggestion would be to insert facts as a sequence rather than individually (e.g. use clara.rules/insert-all), since Clara is optimized to work with groups of facts.
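For concreteness, here is a sketch of the two call shapes, assuming a session already built with mk-session and the made-up tuples from earlier in the thread:

```clojure
(require '[clara.rules :refer [insert insert-all fire-rules]])

;; One propagation through the rule network per call:
(-> session
    (insert [1 :cost 1200.0])
    (insert [1 :material :gold])
    fire-rules)

;; Preferred: hand Clara the whole batch so the facts
;; propagate through the network as a group.
(-> session
    (insert-all [[1 :cost 1200.0]
                 [1 :material :gold]])
    fire-rules)
```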

If anything particularly hideous emerges from your benchmarks, please post it here. Clara has been optimized to be comparable or beat Drools performance for our workloads, but if different workloads yield different performance characteristics we may be able to tune further.

Alan Moore

Feb 20, 2015, 5:59:11 PM
to Luke VanderHart, clara...@googlegroups.com
Luke,

I don't know which Datomic APIs (Clojure or Java) you are using, or whether the data you are obtaining are actually Datom Java objects or native Clojure data... but if they were Datoms, you might get away with simply inserting them directly into working memory and using normal object pattern matching as you suggested.


I think the main problem would be matching against the attribute(s) since Datom::a() only returns the attribute ID, not the actual keyword/symbol, etc. That would involve another lookup. I suppose you could, at the start of your app, retrieve the attribute datoms and insert them as well and then do pattern matching/unification against them... TBD.

I am just thinking that it would be nice to avoid converting all the returned Datoms into tuples if they aren't already in that form. If you already have actual tuples then Ryan's approach would obviously be the most direct.

The key performance issue is to avoid partial matches. The Datom/Object pattern matching approach would exacerbate this very badly... The type is the first level of filtering/matching so if all your patterns have the same type (Datoms) that would result in a significant performance penalty, AFAIK. Of course, I'm sure you are familiar with this given your experience with Datalog pattern matching and rule ordering.

I'm sure Ryan will correct me with "no problem, I've optimized that away already..." so take my comments with a grain of salt.

I am very interested to hear what ends up working best for you - I'm looking at using Datomic as the basis of a durable (storage) distributed blackboard. I have not had time to work on it yet so I have not worked out the details.

Alan


--
You received this message because you are subscribed to the Google Groups "Clara" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clara-rules...@googlegroups.com.
To post to this group, send email to clara...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clara-rules/0529a367-5e65-43b8-85d0-0794111fac10%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



-- 
"Whatever you can do, or dream you can do, begin it. Boldness has genius, power, and magic in it. Begin it now." - Goethe

Ryan Brush

Feb 22, 2015, 5:25:38 PM
to clara...@googlegroups.com, luke.va...@gmail.com
Regarding Alan's points: I had assumed in my earlier post that the inserted items would be of a tuple structure that could be directly destructured as a Clojure tuple (e.g., a fixed-length vector or sequence). Not being deeply familiar with Datomic, it's not obvious to me how a Datom is efficiently expressed as a Clojure tuple, but I assume there is an efficient way to do so.

I do think we can avoid the performance penalty of matching against all Datoms by using a :fact-type-fn that uses the attribute as the fact type, as discussed above. This actually seems intuitive to me, since the "type" of a datom does seem to be defined by its attribute.

Alan Moore

Feb 22, 2015, 5:58:34 PM
to Ryan Brush, clara...@googlegroups.com, luke.va...@gmail.com
Given that Luke was asking about ClojureScript rules, it is obvious to me in retrospect that he won't be using Datoms proper, as those are Java classes and would need some kind of conversion to get down to the browser (or into Node.js) anyway. I should have seen that... my bad.

Luke - for my edification, I see that Datomic has both Clojure and Java APIs but I don't know how Datoms are represented/implemented internally. Are Datoms a Java facade over the underlying Clojure data or is it the other way around? Or is it some third thing?

Alan

-- 
"Whatever you can do, or dream you can do, begin it. Boldness has genius, power, and magic in it. Begin it now." - Goethe



John Conti

Feb 28, 2017, 2:55:55 PM
to Clara
I have been using this tuple-shaped-fact pattern for a few weeks. It has offered a considerable amount of flexibility in processing data from widely varying sources like SQL databases, NoSQL databases, and various web services that lack a common schema or domain-level objects. The granularity of the facts adapts to these resources and is easily join-able across them.

I am new to rule-based systems, and my code is unlike many of the examples. However, I think my use case may also be driving that. My system starts from a small (unpredictable) set of input facts and then attempts to grow the fact base to the largest possible size. It is a type of search, useful for a diagnostic and analysis system. The consequence is that often, given a particular tuple, many more tuples can be read in from external data sources and added to the collection. When the new tuples being added contain a copy of the tuple that triggered the left-hand side of the activated rule, I get an infinite loop.

For example:

Given a session containing the tuple [:account :id 1]

(defrule read-account-data-from-sql
  [:account [[e a v]] (= a :id) (= ?aid v)]
  =>
  (insert-all! (read-entity-tuples! :account ?aid)))

This rule will result in an infinite loop if one of the tuples read in via read-entity-tuples! is [:account :id 1].

Now, this problem is certainly not insurmountable in simple cases like this, but I can imagine cases where such bugs would be harder to avoid. In essence, read-entity-tuples! must filter out any tuple matched on the LHS. This incidental complexity, and the fact that the failure hangs the thread, have motivated me to seek a more bulletproof solution.

Any suggestions?

Thanks!
John

Alan Moore

Feb 28, 2017, 4:07:19 PM
to Clara
John,

Some engines have a way to mark rules and/or templates (data/schema) with declarations to deal with this. For example, Jess has (declare (no-loop TRUE)) that does what you want. I don't know if there are any plans to add this kind of functionality to Clara. TBD.

One thing that has not been clear in the documentation for insert! is what happens to "duplicate" facts. Am I missing something? The docs focus on truth maintenance, so maybe they can be updated with more detail on the semantics of duplicate facts.

There is an interesting discussion on a related issue here:


Needless to say, it's complicated...

A total hack might be to write a wrapper around insert-all! that examines the LHS activation record, which is available to the RHS, to do generic filtering. Not trivial, but possible.

I'd like to hear more about your solution as I've been wanting to integrate Clara with datascript which, as you know, uses the datalog/tuple approach.

Good luck!

Alan

William Parker

Mar 2, 2017, 11:02:08 AM
to Clara
Regarding duplicate facts, Clara's intended API is to not eliminate duplicate facts. That applies to any kind of insertion or retraction. https://github.com/cerner/clara-rules/issues/171 was an edge case where retracting a fact during internal truth-maintenance operations could lead to other facts equal to that one incorrectly being retracted. For example, one could have two temperature facts {:temp 10} {:temp 10} and reach a case where the engine needed to retract one of them as part of truth maintenance. Prior to that fix, in some cases both could have been removed when the correct behavior was to remove one. https://github.com/cerner/clara-rules/issues/250 was a bug that caused similar results in a different sort of case. I'm not aware of any outstanding bugs concerning the behavior of duplicate facts. I agree that it should be documented somewhere; perhaps an example in https://github.com/cerner/clara-examples would help.

John Conti

Mar 2, 2017, 11:28:25 AM
to Clara
Thank you for the replies, I am groping my way through using forward-chaining rules for the first time.  I appreciate the background information, both on Jess and Clara.

I quickly realized the truth of the fact semantics. It was fairly easy to write mutually recursive rules that resulted in endless loops. No filtering of the LHS matches would prevent these scenarios from looping endlessly. Such rule sets would be fairly common for me (or more common than I care to worry about).

What surprised me about this semantic is that the data involved is immutable and =. Given those characteristics of the data, I expected re-insertion of the same fact to be a no-op. Though I am very inexperienced, duplicate facts as no-ops seems like the desirable behavior, since it is invariant across the execution order of fact assertion.

This behavior looks like a feature designed to support certain kinds of dynamic systems that use loops to "model" time. But I wonder if this feature takes too much away from the logical view of the rule set. It seems like rule authors now need to think in a more procedural style to avoid loops: they need to simulate in their head the state of the current set of facts, and the rules triggered by them, before asserting facts. That removes the declarative character of rules and their ability to abstract away execution order, which seems undesirable in a rules engine. Of course, I really wouldn't know, having just started.

In my case I think a simple approach will ultimately lead to good results. My current thinking is to remove all direct IO from the rules themselves (no doubt a good thing) and instead assert a fact describing the IO that needs to happen. When all activations are done, the enclosing Clojure program can query the session for the IO, perform the IO, filter the resulting tuples by removing any that already exist in the session, and then assert the new tuples into the session before firing rules again. While it seems awkward at first blush, it is a pattern that might work well for many different types of side-effect-filled IO. It may be useful for user input and output, as well as longer-running IO (remote query processing) that could result in failures. Real-life maintenance headaches often arise in these areas, and keeping those concerns away from the rules may make my life simpler.
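That driver loop could be sketched roughly as below. Everything here is hypothetical: pending-io-requests and all-tuples stand in for queries you would define, and perform-io! stands in for the effectful read that returns new tuples.

```clojure
(require '[clara.rules :refer [fire-rules insert-all query]])

(defn run-until-settled
  "Fire rules, perform any requested IO outside the session, filter out
   tuples already present, and repeat until no new tuples appear.
   pending-io-requests, all-tuples, and perform-io! are placeholders."
  [session]
  (loop [session (fire-rules session)]
    (let [requests (map :?req (query session pending-io-requests))
          existing (into #{} (map :?t) (query session all-tuples))
          fresh    (->> requests
                        (mapcat perform-io!)  ;; the only side effects live here
                        (remove existing)
                        distinct)]
      (if (empty? fresh)
        session
        (recur (fire-rules (insert-all session fresh)))))))
```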

So other than being a bit surprised that reasserting the same fact can produce different results (isn't that the definition of crazy? :-) ), I may be on a path to using Clara in a new expert diagnostic system. Having the engine embedded in Clojure is such a joy.

Thanks again for the assistance.  I'm sure I will be back soon.

John

William Parker

Mar 2, 2017, 12:03:52 PM
to Clara
Some more thoughts:

- Obviously I don't know the specifics of your use case, but in general using queries to drive IO rather than rule RHS (right-hand side) code seems like a good idea.  I generally try to make rules represent pure logical relationships; when that is the case execution order can become purely a performance concern.
- The tricky thing with duplicate facts is that in some cases the cardinality is actually meaningful.  Say you have an "Order" fact like [customer-id product-id]; the customer could have placed multiple orders for the same product.  This is of course contrived, but that is the basic idea.  For immediate use-cases, my first thought with duplicate facts is that it seems like it should be possible to use accumulators or :exists conditions, which ultimately compile down to accumulators internally as can be seen at https://github.com/cerner/clara-rules/blob/0.13.0/src/main/clojure/clara/rules/compiler.clj#L595
One could for example use a custom accumulator to write a rule like

(defrecord Temperature [temp location])

(defrule distinct-temps
  [?temps <- (acc/get-distinct-temperatures) :from [Temperature]]
  =>
  (insert! {:distinct-temps ?temps}))

So say you had temperatures of [10 "MCI"] [10 "SFO"] [0 "LGA"].  You could write an accumulator that would just look at the distinct temperature field values, and you'd insert {:distinct-temps #{10 0}}.  You could also do something similar for multiple fields and return {:distinct-temps #{[10 "MCI"] [10 "SFO"] [0 "LGA"]}}.  If you wanted separate distinct temperature facts for each location, you could do something like

(defrule distinct-temps-per-location
  [?temps <- (acc/get-distinct-temperatures) :from [Temperature (= ?loc location)]]
  =>
  (insert! {:loc ?loc
            :distinct-temps ?temps}))

which would execute once per distinct location value.  Such a rule will not fire if there are no Temperature facts.  If you use this behavior you'll want to be on at least version 0.12.0 since there were fixes for it discussed in issues 188, 189, and 102 that went in with that release.
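Note that acc/get-distinct-temperatures above is not a built-in; one plausible way to define it is with clara.rules.accumulators/accum. This is only a sketch, and the :retract-fn here is deliberately simplified: disj would drop a value even if another equal fact still supports it, so a production version would keep per-value counts.

```clojure
(require '[clara.rules.accumulators :as acc])

(defn get-distinct-temperatures
  "Accumulate the set of distinct :temp values from matching facts."
  []
  (acc/accum
   {:initial-value #{}
    :reduce-fn  (fn [temps fact] (conj temps (:temp fact)))
    ;; Simplified: does not account for duplicate supporting facts.
    :retract-fn (fn [temps fact] (disj temps (:temp fact)))}))
```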

- It might be possible to implement a version of insert that didn't accept duplicates, but it would need some thought particularly if it involved changes in Clara's engine.  I haven't thought deeply about it, but perhaps an extension could be written to do this using either Clara's existing extension points or additional ones. To give an example of this sort of thing, https://github.com/cerner/clara-rules/pull/220 was added for a use-case where we wanted to determine what facts matched any conditions in the rules so we could do some custom logic later around retracting them.

Alan Moore

Mar 2, 2017, 4:21:05 PM
to Clara
On your point about duplicate facts: my intuition is that if you have something like an order/customer/product relationship, using unique ids would distinguish the tuples and obviate the need for allowing duplicates into working memory. For your example, instead of having more than one [customer-id product-id] tuple you could use [order-id customer-id product-id], which uniquely identifies each order* and makes it nonsensical to have more than one tuple (in working memory) with the same combination of id values.

Just thinking out loud here.

Alan

* - Realistically it would be [customer-id order-id line-item-id product-id quantity price], since orders can have more than one line item and other properties. Using a fully "normalized" [entity-id attribute-id value transaction-id] tuple (e.g. Datomic/DataScript) would require a lot more unification on the LHS and could be very slow (many partial matches?).



John Conti

Mar 3, 2017, 2:05:39 PM
to Clara

I agree.  Cardinality is straightforward to handle without duplicate assertions. A unique synthetic id is the usual solution to data that shows up without a natural identifier.  Datomic's use of the commit transaction number as a clear notional idea of time is one such mechanism.  So is a twitter tweet id.  Many systems now have ridiculously granular clocks that can serve for this purpose in non-distributed environments.

However, dealing with the lack of identity created by allowing identical assertions to re-match the same rules is very difficult in general. Such a shame, given Clojure's wonderful story on immutability and value-based semantics. Worse, the logical semantics of the model are muddled by what is essentially a procedural feature. The operational cost is that lots of relatively normal conditions can create non-terminating rule sets. Since records are naturally = and immutable in Clojure, I wonder if Clara went out of its way to provide this as a feature, or if it is a consequence of the RETE tree-matching algorithm that is easier to keep than change.

Of course, I have no experience in the area, so I could be missing something dead obvious.  This has been good motivation for me to learn more about the algorithm, which should be fun.  I got a real kick the other week reading fogus's latest on a homebrew rules system: http://blog.fogus.me/2017/01/27/read-eval-print-%CE%BBove-v004-production-rules-has-landed/  If you haven't had a chance to check it out, it may be well worth your time.

John


Ryan Brush

Mar 3, 2017, 3:43:31 PM
to Clara
I'm afraid the ship has sailed on the cardinality/identity semantics, at least for the default behavior, since there is too much relying on these semantics today. If your problem space demands different behavior, it's possible to write a clara.rules.engine.ISession implementation that wraps another session, and checks a set for whether a fact had already been inserted.
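A sketch of such a wrapper is below. The ISession method names are my reading of clara.rules.engine around the 0.13 era, so verify them against the protocol in your version before relying on this; it is a starting point, not a tested implementation.

```clojure
(require '[clara.rules.engine :as eng])

;; Wraps a delegate session and silently drops facts already inserted.
;; Note: this only dedupes external inserts; RHS insertions made during
;; fire-rules bypass the wrapper entirely.
(defrecord DistinctSession [delegate seen]
  eng/ISession
  (insert [_ facts]
    (let [fresh (remove seen facts)]
      (DistinctSession. (eng/insert delegate fresh) (into seen fresh))))
  (retract [_ facts]
    (DistinctSession. (eng/retract delegate facts)
                      (reduce disj seen facts)))
  (fire-rules [_]
    (DistinctSession. (eng/fire-rules delegate) seen))
  (query [_ q params]
    (eng/query delegate q params))
  (components [_]
    (eng/components delegate)))
```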

We have analogs of the Order example Will mentioned earlier, and the current behavior is also easier to reason about in the context of rules asserting (and automatically retracting) facts. For instance, we might have multiple rules that assert (->CustomerNeedsSupport customer-id)... but if one gets retracted by truth maintenance (because the rule that inserted it becomes false), we obviously don't want to retract the others. I think insertion of facts by users and by rules should behave consistently, in line with the "least surprising behavior" for the system.

It's true that in many cases there are unique identifiers that prevent collisions, but not in all... and if a user runs into the situation where all inserted facts are retracted due to equality, that user will have a very difficult problem to debug.

I realize this probably isn't the most satisfying answer for some, but it does appear to be the right tradeoff for the several Clara users we have internally and would be disruptive to change, so I think we'll need to stay the course on this one.





Alan Moore

Mar 3, 2017, 4:39:57 PM
to Clara
Ryan: I assumed this was the case. I think your suggestion about using a session wrapper is great - I'll give that a shot to see how far I can get. Not sure how performant it will be but for my use case the number of facts isn't that large anyway.

John: Thanks for the link to Fogus' post. I'll be reading it this weekend.


Mike Rodriguez

Mar 4, 2017, 10:56:05 AM
to Clara
I'll add that there is a practical aspect involved around trying to prevent duplicate facts in memory. It relates to performance.

If duplicates are not allowed, detection is going to end up being based on equality, which will be slow if linear - so it'll end up resorting to some sort of hashing. Basically you'll end up with something like a hash map/set as the way facts are stored in memory.

Hashing and equality checking are not free and can definitely add up to significant costs. This is true for many POJOs (which Clara supports), as well as Clojure records (which still don't cache their hash code as of v1.8; see http://dev.clojure.org/jira/browse/CLJ-1224).

As an example, Drools logical insertions (i.e. those tracked by the truth maintenance system (TMS)) automatically manage duplicate logical insertions by storing a map from each fact to a count of how many times it has been supported (i.e. how many equal objects were inserted).

So it looks like:

Fact1 -> 1
Fact2 -> 4

When a fact that is a key in this memory map is removed, the number associated with it is decremented. If it reaches 0, the fact is removed entirely and the process recurses for any other logical insertions that were based on that fact (i.e. the TMS does its work).
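That bookkeeping is easy to sketch in plain Clojure (illustrative names only, not Drools or Clara code):

```clojure
;; counts maps each fact to the number of logical insertions supporting it.
(defn support [counts fact]
  (update counts fact (fnil inc 0)))

(defn unsupport
  "Decrement a fact's support count; at zero the fact disappears, which
   is the point where a TMS would cascade retractions to dependents."
  [counts fact]
  (let [n (dec (get counts fact 0))]
    (if (pos? n)
      (assoc counts fact n)
      (dissoc counts fact))))

(-> {}
    (support :fact1)
    (support :fact2) (support :fact2)
    (unsupport :fact2))
;; => {:fact1 1, :fact2 1}
```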

I've seen this be a critical hot spot in terms of performance when dealing with large sets of rules against large sets of facts.

I always really appreciated the lack of this duplicate removal feature in Clara since it allows the rule writer to decide what a duplicate object means and what even constitutes a duplicate.

It is not necessarily easy to fabricate some ID to keep facts from being equal. If you can't add it to the object, you have to create a wrapper or something, and this sweeping generation of newly ID'ed facts may be expensive (in space, perhaps) depending on the domain.

However, it could be argued that Clara could provide an alternative implementation option that behaved similarly to Drools. It'd probably be quite a bit of work to maintain, since it would seem to require at least an alternative memory implementation.

William Parker

Mar 5, 2017, 11:10:37 AM
to Clara
For the reasons elaborated on previously, I don't think changing the default treatment of multiple equal facts in Clara is going to be a viable option.  
However, speaking only for myself, I can see an option to eliminate duplicates as useful enough to be worth putting in the Clara API. The lack of duplicate elimination does seem problematic for use cases where rules loop back on themselves; we haven't run into such cases at Cerner, as far as I've seen, because our rulesets are almost always a DAG. However, I do think that such an option should be consistent, in the way the default is, between rule insertions from all sources, and should behave as expected w.r.t. truth maintenance. That is:

- Duplicate removal would be processed in the same way between logical insertions, unconditional RHS insertions, external insertions, RHS retractions, external retractions, and any other fact operations added in the future.
- Execution order would not impact the final state of the rules session when truth maintenance is consistently used.
- Externally inserting and retracting a fact would leave the session in the same logical state as before the insertion when truth maintenance is consistently used.

In a more concrete sense, in the case Ryan describes where two rules insert (->CustomerNeedsSupport customer-id) for the same customer ID, retracting the CustomerNeedsSupport from one of these rules would leave the CustomerNeedsSupport fact in the session unless the other rule also retracted the CustomerNeedsSupport fact it inserted as well.

Using a wrapper on LocalSession, or equivalently using your own implementation of the insertion functions that checks a set much as clojure.core/distinct does, would work if you're willing to control execution order to ensure correctness. However, this approach seems to me to break down in the face of the truth maintenance described above. There might be ways to enhance Clara's existing extension points to facilitate this sort of user-controlled behavior. In particular, my feeling is that if differences in behavior between different fact operations (e.g. logical vs. unconditional insertions) were desired, this would be better handled in specialized user code.

I don't think the implementation would need to involve anything as complicated as a custom memory implementation.  For example, I think you could just provide an option at mk-session time that would cause a DuplicateRemovalNode to be placed between each AlphaNode (the first point of entry for all facts into the rule network) and its children.  This DuplicateRemovalNode would only propagate the first instance of each fact to its children; subsequent insertions would increment a counter for that fact.  Retractions would decrement that counter, and if it reached 0 the fact would be retracted downstream from the children.  I think this could be done with the current memory implementation.  This approach would continue to allow facts that don't match any alpha conditions to be garbage-collected if no other references are held to them.

Mike Rodriguez

Mar 6, 2017, 10:14:59 AM
to Clara
Will, it is a good point that this could apparently be enabled at the alpha node layer in the engine as it currently stands!

The same performance concerns apply, but perhaps in special cases it could still make sense.

William Parker

Mar 6, 2017, 12:10:46 PM
to Clara
Agreed on performance: removing duplicates seems to inherently require extra checks, whether hash-based or otherwise (such as a linear sweep). My guess is that in most use cases a hash-based algorithm would be fine, particularly where the user had already accepted the cost of some duplicate removal. In principle one could offer both, but my instinct would be to start with a hash-based algorithm and revisit that if it became a problem for users of such a duplicate-elimination option.

For some background info for other readers, at Cerner we've run into cases where computing the hashcode of facts was extraordinarily expensive (in one particularly extreme case it took around 40 seconds IIRC) and, in practice, a highly optimized linear sweep in Clara's memory worked out to be more efficient.

John Conti

Mar 15, 2017, 5:46:04 PM
to Clara
Ryan, Alan, Mike, William and more,

Thanks again for the replies and useful information about the implementation.  Since I have no experience it seemed like the time for me to do a deep-ish dive on the technology and concurrently implement my requirements fully to understand what I might really be asking for.  It has been more than a little bit fun.

On Friday, March 3, 2017 at 1:43:31 PM UTC-7, Ryan Brush wrote:
I'm afraid the ship has sailed on the cardinality/identity semantics, at least for the default behavior, since there is too much relying on these semantics today. If your problem space demands different behavior, it's possible to write a clara.rules.engine.ISession implementation that wraps another session, and checks a set for whether a fact had already been inserted.

Firstly, on cardinality/identity: my current impression is that Ryan's assessment above is partly an understatement.  State modification looks axiomatic for forward-chaining algorithms like RETE, LEAPS, etc.  Born on small machines and intended for the OPS5 language, side effects and efficiency were primary goals for them, not implementation issues.  In fact, the ability to program side effects appears to me to implement a basic form of dynamic scope.

We have analogs of the Order example Will mentioned earlier, and the current behavior is also easier to reason about in the context of rules asserting (and automatically retracting) facts. For instance, we might have multiple rules that assert (->CustomerNeedsSupport customer-id)...but if one gets retracted by truth maintenance (because the rule that inserted it becomes false), we obviously don't want to retract the others. I think insertion of facts by users and rules should behave consistently, in line with following the "least surprising behavior" for the system.

It's true that in many cases there are unique identifiers that prevent collisions, but not in all...and if a user runs into the situation where all inserted facts are retracted due to equality, that user will have a very difficult problem to debug.

I realize this probably isn't the most satisfying answer for some, but it does appear to be the right tradeoff for the several Clara users we have internally and would be disruptive to change, so I think we'll need to stay the course on this one.
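For concreteness, the CustomerNeedsSupport scenario reads roughly like this in Clara (the record and rule names besides CustomerNeedsSupport are hypothetical):

```clojure
(ns support.rules
  (:require [clara.rules :refer [defrule insert!]]))

(defrecord Complaint [customer-id])
(defrecord LatePayment [customer-id])
(defrecord CustomerNeedsSupport [customer-id])

;; Two independent rules each logically insert a CustomerNeedsSupport
;; fact for the same customer.
(defrule complaint-needs-support
  [Complaint (= ?id customer-id)]
  =>
  (insert! (->CustomerNeedsSupport ?id)))

(defrule late-payment-needs-support
  [LatePayment (= ?id customer-id)]
  =>
  (insert! (->CustomerNeedsSupport ?id)))
```

If the Complaint is later retracted, truth maintenance withdraws only the support fact logically inserted by complaint-needs-support; the equal fact from late-payment-needs-support survives, which is the behavior described above.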

While unwieldy from a software engineering perspective, dynamic scope has the happy effect of making the rules easy to interpret on their own, and often in a context-free manner.  That is a huge feature and benefit of such a system.  Dynamic scope is useful in lots of scenarios, like games and editors (Emacs), where dynamic state makes sense.  I think rules constructed this way have much more utility in most real-world applications than the pure value-based semantics I was after, not to mention that Clara is exactly what a Drools or Jess user would want too.  I think I will have many uses for these side-effecting and potentially infinitely looping rule sets.  There is no doubt that my use case is a "falling off the edge" case for this technology.

That did not however remove my need to prevent the looping that killed my rules-as-search algorithm system.  So I spent some time wrapping clara from the outside.  In the end, it was not a bunch of code.  And while very usable for my application, one would not want my implementation in general:

- I asserted all external io as a record, specifying the io operation to do.  These records could be expected to have duplicates.
- In each loop of my search I detected new requests for io, and injected the records resulting from the io into the session.
- To do that I kept track of the previously done io operations, so as not to redo operations (an optimization on the next point).
- When injecting new tuples from io operations, I verified each tuple did not previously exist in the session.  Different io operations can return the exact same data.
- Since these operations were outside the session, I needed to detect when the session reached a fixed-point (no more *unique* results) by querying all the data in the system.

This is the heart of it:

(defn make-todo
  "Load requests present in the session that have not yet been performed."
  [session prev-loads]
  (let [session-loads (select [ALL MAP-VALS] (query session all-loads))]
    (remove #(contains? prev-loads %) session-loads)))

(defn remove-existing
  "Drop tuples that already exist in the session."
  [session tuplst]
  (remove #(seq (query session equal-tuple :?tup %)) tuplst))

(defn do-out
  "Perform actions from rule activations that have not yet been done."
  [sys {:keys [loads session] :as state}]
  (let [todo (make-todo session loads)
        raw-io (map #(tuple-io/perform-load! sys %) todo)
        new-tuples (select [ALL ALL] raw-io)
        to-insert (remove-existing session new-tuples)]
    (assoc state
           :loads (into loads todo)
           :session (insert-all session to-insert))))
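The outer loop around do-out might look roughly like this (a sketch using the helpers above; fire-rules is clara.rules/fire-rules, and sys is the io context from the surrounding code):

```clojure
(defn run-to-fixed-point
  "Fire rules, perform any newly requested io, and repeat until the
  session produces no more unique load requests."
  [sys state]
  (let [state (update state :session fire-rules)]
    (if (seq (make-todo (:session state) (:loads state)))
      (recur sys (do-out sys state))
      state)))
```

Termination depends on remove-existing: once every io result already exists in the session, no new tuples are inserted, no new load requests fire, and the loop stops.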

The code eventually became nice to look at.  A data-directed, heuristic search that users can alter has much value in the system, and Clara's normal semantics will have lots of use elsewhere in the same system, so I view this all as a very happy ending.  The only thing I might have liked, but will not attempt, is some simulation of truth-maintenance effects on this externally inserted data.

Reflecting on the experience, I think what I actually need here is more of a constraint solver, or logic programming type system.  Looking at those however, I think the results of actually implementing this with them would not be very satisfying.  This is mostly because the behavior of the system would be fairly opaque to anyone working on the code.  Interpreting rules with a logic programming system is simple (there are some beautiful examples), but not clear.  Debugging a real system might be maddening, IMO.

The practicalities of the code I wrote involved only a few implementation gotchas:
- I needed to use compiler/mk-session, as one of the examples does, because I could not write code that applied a vector of namespaces to the mk-session macro.  I burned a bunch of time trying before I saw the example's reference to the function.
- Using inspect has the virtue of truly complete data, but the vice of deeply nested data structures.  The code I wrote to dig what I wanted out of those structures was hideous (more me than the data structures, I am sure).  In the end, Nathan Marz's Specter library turned into a real asset; I think it is worth the learning curve for working with deeply nested data structures.  Here's some sample code:

(def INSPECTION (path [:session (parser ci/inspect identity)]))

(defn matches
  "Find matching rules, and the tuples that match them."
  [state]
  (select-one
   [INSPECTION :rule-matches (selected? [MAP-VALS seq])]
   state))

(defn activations
  "Find the names of matching rules and the data the matches them."
  [state]
  (select
   [ALL (collect-one [FIRST :name]) LAST ALL :matches ALL FIRST]
   (matches state)))
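On the first gotcha above: clara.rules/mk-session is a macro that consumes its sources at expansion time, so a runtime vector of namespaces has to go through the function form in clara.rules.compiler, roughly like this (the namespaces are hypothetical; check the exact signature in your Clara version):

```clojure
(require '[clara.rules.compiler :as com])

;; Any sequence of rule sources (quoted namespace symbols,
;; rule structures) can be built at runtime and passed along.
(def rule-namespaces '[my.rules.core my.rules.io])

(def session (com/mk-session rule-namespaces))
```
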


 Thanks again for all the friendly help from the group.  Like I said, I am quite satisfied with the results, and very much looking forward to more data-driven logic in my programs courtesy of clara-rules.

Cheers,
John

Ryan Brush

unread,
Mar 17, 2017, 12:32:35 PM3/17/17
to Clara
Cool approach, John. I don't claim to have a great understanding of your use case, but it sounds like you've arrived at some interesting patterns. I'm glad that the transparency into Clara's sessions is helpful here...it's a challenging feature to get right (and we're still evolving it), but can be powerful.