Inference Engine for very large data sets


iamniche

Mar 8, 2011, 12:24:19 PM
to mandarax
Hello Jens,

I am currently researching possible solutions for inferencing on very
large data sets and came across Mandarax whilst surveying the current
situation with regards to inference engines, etc.

I note that Mandarax has a completely unique approach to this problem
and so have a few questions to aid my research (I will definitely
download it and evaluate)

Firstly, I am looking to integrate a distributed factbase and have
been looking at various graph databases e.g. Neo4J. Would it be fairly
involved to integrate Mandarax with a graph database? Would I be
better off integrating Mandarax with a relational database first?

Secondly, I have a ruleset that is currently deployed in Drools. Will
it be fairly trivial to translate those rules into Mandarax rules for
compilation? I will have to investigate this further.

Thirdly, please could you explain the relative performance hit of
performing backward chaining rules instead of the assertion-based
forward chaining as per Drools?

I look forward to discussing these topics with you.

Many thanks.

Regards,
Nic Hemley

Jens Dietrich

Mar 8, 2011, 7:40:49 PM
to mand...@googlegroups.com
Hi Nic,

On Wed, Mar 9, 2011 at 6:24 AM, iamniche <nichola...@yahoo.co.uk> wrote:
> Hello Jens,
>
> I am currently researching possible solutions for inferencing on very
> large data sets and came across Mandarax whilst surveying the current
> situation with regards to inference engines, etc.
>
> I note that Mandarax has a completely unique approach to this problem
> and so have a few questions to aid my research (I will definitely
> download it and evaluate)
>
> Firstly, I am looking to integrate a distributed factbase and have
> been looking at various graph databases e.g. Neo4J. Would it be fairly
> involved to integrate Mandarax
> with a graph database? Would I be better integrating Madarax with a
> relational database first?

I haven't used Neo4J, but I don't think integration would be
difficult. You would probably extract the facts from edges and from
edge and vertex properties. The key to scalability is lazy,
iterator-based access to these graph elements (I assume that Neo4J has
an API for this; it might be called a traverser or graph walker). You
can then integrate the DB into the rules using includes (see the
manual for an example).
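
As a rough illustration of lazy, iterator-based access, here is a
minimal sketch in plain Java. The Edge, Fact and LazyEdgeFacts types
are hypothetical and are not part of the Neo4J or Mandarax APIs; a
real integration would wrap whatever traversal iterator the graph
database provides.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical edge type standing in for a graph database relationship.
final class Edge {
    final String source, label, target;
    Edge(String source, String label, String target) {
        this.source = source; this.label = label; this.target = target;
    }
}

// Hypothetical fact of the form label(source, target).
final class Fact {
    final String predicate, subject, object;
    Fact(String predicate, String subject, String object) {
        this.predicate = predicate; this.subject = subject; this.object = object;
    }
    @Override public String toString() {
        return predicate + "(" + subject + ", " + object + ")";
    }
}

// Turns edges into facts one at a time; nothing is read until asked for.
final class LazyEdgeFacts implements Iterator<Fact> {
    private final Iterator<Edge> edges;
    private final String label;
    private Fact next;   // look-ahead slot, filled by hasNext()
    int edgesRead = 0;   // demonstration only: how many edges were pulled

    LazyEdgeFacts(Iterator<Edge> edges, String label) {
        this.edges = edges; this.label = label;
    }

    @Override public boolean hasNext() {
        // Pull edges only until the next matching one is found.
        while (next == null && edges.hasNext()) {
            Edge e = edges.next();
            edgesRead++;
            if (e.label.equals(label)) {
                next = new Fact(e.label, e.source, e.target);
            }
        }
        return next != null;
    }

    @Override public Fact next() {
        if (!hasNext()) throw new NoSuchElementException();
        Fact f = next;
        next = null;
        return f;
    }
}
```

A consumer that stops after the first few facts never touches the
remaining edges, which is the property that matters for large graphs.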

>
> Secondly, I have a ruleset that is currently deployed in Drools. Will
> it be fairly trivial to translate those rules into Mandarax rules for
> compilation? I will have to investigate this further.
>

This depends. In Drools you can modify objects in rule heads.
Mandarax, on the other hand, only queries referenced objects; it does
not mutate them. It is therefore "side-effect free". The idea is that
any manipulation of objects (if necessary) is done outside the rule
engine.
Another important difference is that Mandarax computes the derivation
(proof tree) and makes it available via an API. As far as I know,
forward reasoning systems don't do this.


> Thirdly, please could you explain the relative performance hit of
> performing backward chaining rules instead of the assertion-based
> forward chaining as per Drools?

Forward chaining engines create massive caches. The so-called agenda
or working memory matches rules against facts. This is very fast as
long as all facts fit into memory. Mandarax, on the other hand,
follows a more database-like approach: it computes against the
original data (for instance, from databases) but uses lazy
initialization. It is therefore a good fit for Google-like
applications, where computing the first 10 results is sufficient.
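
The difference can be sketched with plain Java streams. This is only
an analogy, not how either engine is implemented: the FirstResults
class, the matches rule and the recordsRead counter are made up for
illustration. The eager variant derives every result before returning
the first few; the lazy variant stops reading once enough results
have been produced.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class FirstResults {
    // Demonstration only: counts how many source records were read.
    static final AtomicInteger recordsRead = new AtomicInteger();

    // Hypothetical rule: a record qualifies if its id is even.
    static boolean matches(int id) {
        return id % 2 == 0;
    }

    // Eager: derive all results first, then hand back the first few.
    static List<Integer> eager(int total, int wanted) {
        List<Integer> all = IntStream.rangeClosed(1, total)
                .peek(i -> recordsRead.incrementAndGet())
                .filter(FirstResults::matches)
                .boxed()
                .collect(Collectors.toList());
        return all.subList(0, Math.min(wanted, all.size()));
    }

    // Lazy: limit() short-circuits, so reading stops early.
    static List<Integer> lazy(int total, int wanted) {
        return IntStream.rangeClosed(1, total)
                .peek(i -> recordsRead.incrementAndGet())
                .filter(FirstResults::matches)
                .limit(wanted)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

Both variants return the same first results; they differ only in how
much of the source they have to read to get there.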

For an example, check this discussion:
http://drools-java-rules-engine.46999.n3.nabble.com/Loading-facts-and-memory-size-limits-td1440236.html

Note that caching can easily be added to Mandarax, again using the
includes feature: your include would reference a set of recently
computed records for the respective relationship. I am planning to add
some classes to Mandarax to make this more convenient, but haven't had
the time to do this.
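
One way such a cache of recently computed records could look is
sketched below. CachedRelation is a hypothetical helper, not an
existing Mandarax class; it assumes the underlying relationship can
be queried by key and keeps only the most recently used entries.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper that remembers recently computed records.
final class CachedRelation<K, V> {
    private final Function<K, List<V>> source;  // the expensive lookup
    private final Map<K, List<V>> cache;
    int sourceCalls = 0;  // demonstration only: lookups that missed the cache

    CachedRelation(Function<K, List<V>> source, final int capacity) {
        this.source = source;
        // Access-ordered LinkedHashMap gives least-recently-used eviction.
        this.cache = new LinkedHashMap<K, List<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, List<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    List<V> lookup(K key) {
        List<V> records = cache.get(key);
        if (records == null) {
            sourceCalls++;
            records = source.apply(key);
            cache.put(key, records);
        }
        return records;
    }
}
```

The bounded capacity keeps memory use predictable, at the cost of
recomputing records that have dropped out of the cache.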

>
> I look forward to discussing these topics with you.
>
> Many thanks.
>
> Regards,
> Nic Hemley

Hope this helps, Jens

iamniche

Mar 15, 2011, 12:03:51 PM
to mandarax
Jens,

Many thanks for your swift reply. My responses are below.

> In drools you can modify objects in rule heads.
I assume that the equivalent in Mandarax is to iterate a ResultSet and
perform operations on the returned objects?
In this pull model, the rule evaluation is left to the programmer, so
we lose the efficiency of the rule-firing RETE algorithm.

> lazy initialization
The equivalent in Drools is lazy evaluation, i.e. using the 'from'
keyword to query for more data, but this is an expensive operation.

I am considering whether we could potentially utilise both approaches
to gain the best of both worlds.
We can forward-chain certain conditions, which act as 'guards' to
backward-chaining queries from the database.
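
A very rough sketch of that idea, in plain Java and independent of
both Drools and Mandarax: a cheap in-memory condition decides whether
the expensive on-demand query runs at all. GuardedQuery and its
members are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical combination of a forward-style guard and an on-demand query.
final class GuardedQuery<E, R> {
    private final Predicate<E> guard;          // cheap check on each event
    private final Function<E, List<R>> query;  // expensive lookup, run on demand
    int queriesRun = 0;  // demonstration only: how often the query ran

    GuardedQuery(Predicate<E> guard, Function<E, List<R>> query) {
        this.guard = guard;
        this.query = query;
    }

    List<R> onEvent(E event) {
        if (!guard.test(event)) {
            return Collections.emptyList();  // guard failed: skip the query
        }
        queriesRun++;
        return query.apply(event);
    }
}
```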

So I guess the key question is when to use each approach.

Cheerio,
Nic Hemley