On Wed, Mar 9, 2011 at 6:24 AM, iamniche <nichola...@yahoo.co.uk> wrote:
> Hello Jens,
>
> I am currently researching possible solutions for inferencing on very
> large data sets and came across Mandarax whilst surveying the current
> situation with regards to inference engines, etc.
>
> I note that Mandarax has a completely unique approach to this problem
> and so have a few questions to aid my research (I will definitely
> download it and evaluate)
>
> Firstly, I am looking to integrate a distributed factbase and have
> been looking at various graph databases e.g. Neo4J. Would it be fairly
> involved to integrate Mandarax
> with a graph database? Would I be better integrating Mandarax with a
> relational database first?
I haven't used Neo4J but I don't think integration would be difficult.
You will probably extract the facts from edges and from edge and vertex
properties. The key to scalability would be to have lazy, iterator-based
access to these graph elements (I assume that Neo4J has an API for
this; it might be called a traverser or graph walker). Then you can
integrate the DB into the rules using includes (see the manual for an
example).
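
To illustrate what I mean by lazy, iterator-based access, here is a
minimal sketch in plain Java. It uses neither the Neo4J nor the Mandarax
API; the class name LazyFacts, the method factsFrom and the conversion
function are made up for this example. The point is only that a graph
element is converted into a fact when the consumer asks for it, so the
full set of facts is never materialised in memory.

```java
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical helper, not part of Mandarax or Neo4J.
public final class LazyFacts {
    private LazyFacts() {}

    // Wraps an iterator over graph elements (edges, vertices, ...) and
    // converts each element to a fact only when next() is called.
    public static <E, F> Iterator<F> factsFrom(
            Iterator<E> elements,
            Function<? super E, ? extends F> toFact) {
        return new Iterator<F>() {
            @Override
            public boolean hasNext() {
                return elements.hasNext();
            }

            @Override
            public F next() {
                // The conversion runs here, one element at a time.
                return toFact.apply(elements.next());
            }
        };
    }
}
```

The iterator passed in would come from whatever traversal API the
database offers; that part is an assumption on my side.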
>
> Secondly, I have a ruleset that is currently deployed in Drools. Will
> it be fairly trivial to translate those rules into Mandarax rules for
> compilation? I will have to investigate this further.
>
This depends. In Drools you can modify objects in rule heads.
Mandarax on the other hand just queries referenced objects, it does
not mutate them. It is therefore "side-effect free". The idea is that
the manipulation of objects (if necessary) is done outside the rule
engine.
Another important difference is that Mandarax computes the derivation
(proof tree) and makes it available (via an API). As far as I know,
forward reasoning systems don't do this.
> Thirdly, please could you explain the relative performance hit of
> performing backward chaining rules instead of the assertion-based
> forward chaining as per Drools?
Forward chaining engines create massive caches. The so-called agenda
or working memory matches rules against facts. This is very fast as
long as all facts fit into memory. Mandarax on the other hand follows
more of a DB approach: it computes against the original data (for
instance, from databases) but does lazy initialization. It is
therefore good in Google-like applications, where computing the first
10 results is sufficient.
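
The "first 10 results" idea can be sketched with a standard Java stream.
This is not Mandarax code; FirstResults, firstMatches and the counter
are invented for the illustration. Candidates are pulled from the source
one at a time, and evaluation stops once enough matches have been found,
even if the source is unbounded.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mandarax.
public final class FirstResults {
    private FirstResults() {}

    // Returns the first `limit` candidates that satisfy `matches`.
    // `examined` counts how many candidates were actually pulled
    // from the source, to show that evaluation stops early.
    public static <T> List<T> firstMatches(
            Stream<T> candidates,
            Predicate<? super T> matches,
            int limit,
            AtomicInteger examined) {
        return candidates
                .peek(c -> examined.incrementAndGet())
                .filter(matches)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

For example, calling firstMatches with Stream.iterate(1, n -> n + 1), a
predicate that accepts multiples of 3 and a limit of 10 returns 3, 6,
..., 30 and then stops, although the source stream never ends.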
For an example, check this discussion:
http://drools-java-rules-engine.46999.n3.nabble.com/Loading-facts-and-memory-size-limits-td1440236.html
Note that caching can easily be added to Mandarax, again using the
include feature: your include would reference a set of recently
computed records for the respective relationship. I am planning to add
some classes to Mandarax to make this more convenient, but haven't had
the time to do this.
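
Until such classes exist, a cache of recently computed records could
look roughly like the sketch below. It is plain Java and not part of
Mandarax; RecentResultsCache, the loader function and the
least-recently-used eviction policy are my assumptions for the example.
An include would then reference the cache instead of hitting the
database on every query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Mandarax: keeps the most recently
// used result lists, keyed by whatever identifies the query.
public final class RecentResultsCache<K, V> {
    private final Function<K, List<V>> loader;
    private final Map<K, List<V>> cache;

    public RecentResultsCache(int capacity, Function<K, List<V>> loader) {
        this.loader = loader;
        // Access-ordered LinkedHashMap that evicts the least recently
        // used entry once the capacity is exceeded.
        this.cache = new LinkedHashMap<K, List<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, List<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached records for the key, computing them with the
    // loader (e.g. a database query) only on a cache miss.
    public synchronized List<V> get(K key) {
        List<V> records = cache.get(key);
        if (records == null) {
            records = loader.apply(key);
            cache.put(key, records);
        }
        return records;
    }
}
```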
>
> I look forward to discussing these topics with you.
>
> Many thanks.
>
> Regards,
> Nic Hemley
Hope this helps, Jens