minikanren + triple store = ❤

125 views
Skip to first unread message

Amirouche Boubekki

unread,
Feb 26, 2019, 12:41:49 PM2/26/19
to minikanren
Hello,


Following on my previous work with mini/micro kanren with wiredtiger storage engine.

I made a in-memory triple store with minikanren querying.

The source is attached to this mail. It can also be found at source hut.
There is a few tests in yiwen.test.scm. The full minikanren language is
supported even recursive queries like appendo. There is no tests for that,
tho, because I did not find a good example of recursive queries in triple store
setting.

In the code 'azul logic' is minikanren from the reasoned schemer second edition.
It rely on r7rs scheme mapping module that is actually srfi 146 which is
(like wiredtiger) an ordered mapping. It is less that 200 sloc.

This should be easy to port to any scheme that has some kind of ordered mapping.
The current implementation rely on a red-black tree.

Also, it is not RDF. It is not datomic either because I does not force to declare a
schema upfront. Garbage in, garbage out!

All scheme base data types (string, fixnum, floatnum, array, cons/lists, symbol) are
supported as subject, predicate and value. They are compared using their respective
equal predicate and comparator (see SRFI 128 comparators). It seems to work with
records like srfi-9 too.

I am currently experimenting with that triple store as an application tree to build an
editor. An editor has more or less deeply nested structures think about the rendering
tree of horizontal and vertical panes. You might think zippers or cursors.

The triple store allows to easily represent recursive data structures in a single flat
namespace by introducing indirection between a key-value pair (predicate-object)
and its 'subject'. Also, the fact that the triple store is a sort-of global unique data
structure that stores everything, it makes it easy to debug state. Not introducing
zipper or cursors is easier to my mind because at the end of the day a triple store
is a mapping that is ordered.
yiwen.scm
yiwen.test.scm
logic.scm
logic.test.scm

Amirouche Boubekki

unread,
Feb 26, 2019, 5:24:55 PM2/26/19
to minikanren

Rick Moynihan

unread,
Feb 26, 2019, 6:25:48 PM2/26/19
to minik...@googlegroups.com
This sounds very similar to a library I knocked together about 6 months ago:


It's essentially an in memory triple store for Clojure (written as a thin layer of macros over core.logic).

Like with your library, Matcha also uses the concept of a triple without being directly tied to RDF types.  Essentially any clojure or java value can be used in any ?s ?p or ?o position of a triple.  Matcha also has seamless optional support for Grafter (a clojure RDF library) without having any direct dependency on it.

It's intended primarily to help working with RDF, and consequently recreates some of SPARQLs semantics.  A colleage and I have recently been working to add support for SPARQL OPTIONALs and VALUES bindings, this work is expected to be finished and merged in the next few days.

Matcha's approach seems slightly different to yours; in that we aim to hide core.logic, and instead present an interface that is familar to clojure developers who know SPARQL.  In particular I believe Matcha programs should always terminate, not allow recursion; and provide ceremony free interop with clojure.

We also have some "cute" features, for example Matcha's construct does not construct graphs of triples; but clojure data structures.  e.g. a query like this:

(def db [[:s1 :rdfs/label "subject"]])

(construct {:s ?s
                  :rdfs/label ?label }
   [[?s :rdfs/label ?label]] db)

constructs the solutions in the shape of the supplied map template e.g.

; => [{:s :s1 :rdfs/label "subject"}]

You can do this to arbitrary depth e.g.

(construct {:s ?s
                  :nested-map {:label ?label } }
   [[?s :rdfs/label ?label]] db)

;; => [{:s :s1 :nested-map {:label "subject"}}]

We also facilitate grouping of resources with the special keyword :grafter.rdf/uri like so:

(def db [[:s1 :rdfs/label "subject"]
             [:s1 :rdfs/label "another subject"]
             [:s1 :foo "bar"]])


(construct {:grafter.rdf/uri ?s
                   ?p ?o }
                 [[?s ?p ?o]] db2)

;; => [{:grafter.rdf/uri :s1
          :rdfs/label #{"subject" "another subject"}
          :foo "bar}]

This later usage groups solutions into maps by ?s and ?p, lifting multiple values with the same predicate into a set.  This resource object form is particularly useful for post processing in clojure.

Performance hasn't been investigated deeply; but it's good enough for us right now and we use it for production workloads.  It's certainly nowhere near as fast as a proper commercial triple store; but some systems we have may query with modestly complex patterns Matchagraphs with up to perhaps a million edges, and return all solutions within a few seconds.  Though I'd say it's much more typical for us to use it on small graphs of dozens to thousands of edges.

Anyway I spotted your email and thought I'd share incase you find it relevant or useful.

--
You received this message because you are subscribed to the Google Groups "minikanren" group.
To unsubscribe from this group and stop receiving emails from it, send an email to minikanren+...@googlegroups.com.
To post to this group, send email to minik...@googlegroups.com.
Visit this group at https://groups.google.com/group/minikanren.
For more options, visit https://groups.google.com/d/optout.

Amirouche Boubekki

unread,
Jun 20, 2019, 5:52:27 AM6/20/19
to minikanren
re: minikanren + triple store + on-disk persistence

I did benchmarks on trivial queries over a dataset that is smaller than memory.
The results were disappointing, minikanren queries were around 100 times slower
than the equivalent query using only SRFI-168 primitives (which are 2 to 3 times
faster than a Java-based triple store using Chez Scheme).

Anyway, SRFI-168 is still a work-in-progress don't hesitate to chime in.

Amirouche Boubekki

unread,
Jun 20, 2019, 6:19:45 AM6/20/19
to minikanren
Hello Rick!

Very interesting work. Sorry, for not responding earlier, I was busy and I was not sure what to reply except +1.

Find below, some comments (and brag about my project):

On Wednesday, February 27, 2019 at 12:25:48 AM UTC+1, Rick Moynihan wrote:
This sounds very similar to a library I knocked together about 6 months ago:


It's essentially an in memory triple store for Clojure (written as a thin layer of macros over core.logic).

I warmly recommend you have look at both SRFI-167 (ordered key-value store) and SRFI-168 (generic tuple store). The nstore is a generic approach to triple store while the rationale is light on the reasons I built that.
The article "Evaluation of Metadata Representation in RDF stores" somewhat gives a better rational: performance.

Long story short, when you need to represent metadata (provenance, license, history significance etc...)
in every tuple, it is more performant to add an item to every tuple that reify using the tools described
in that paper. I needed a nstore because of a functional store I am building where I need n+3 items tuple store where n is the size of the exposed tuples. Yes, the versioned tuple store is also generic.

Anyway, the math behind it is beyond me, but the algorithm is simple.
 
Like with your library, Matcha also uses the concept of a triple without being directly tied to RDF types.
Essentially any clojure or java value can be used in any ?s ?p or ?o position of a triple. 

Similar to nstore. The accepted data types as tuple items are defined by the lexicographic packing procedure...
It can pack bigint, symbol, strings, bytevectors but float and nested types are not implemented, yet.
 
Matcha also has seamless optional support for Grafter (a clojure RDF library) without having any direct dependency on it.

I am not sure what grafter does. Not sure it is related: In the case of nstore, the sample in-memory implementation
rely on okvs, that is the in-memory triple store only support scheme objects that can packed.
 
It's intended primarily to help working with RDF, and consequently recreates some of SPARQLs semantics.  A colleage and I have recently been working to add support for SPARQL OPTIONALs and VALUES bindings, this work is expected to be finished and merged in the next few days.

I am still pondering the idea of extending nstore via another specification to support SPARQL... SPARQL 1.2 is in the work... So, I am not sure it is a good investment to do right now. Also some parts of SPARQL are tricky. What do you do with nested queries? How to you implement aggregation?
 
Matcha's approach seems slightly different to yours; in that we aim to hide core.logic, and instead present an interface that is familar to clojure developers who know SPARQL.

That is what I intend to do with another SRFI proposal. I will not rely on a minikanren engine to execute queries, like I said in the other thread it is / was slow on my benchmarks. I would gladly be challenged on that.
 
  In particular I believe Matcha programs should always terminate, not allow recursion; and provide ceremony free interop with clojure.

In nstore, queries are composition of streams.
 

We also have some "cute" features, for example Matcha's construct does not construct graphs of triples; but clojure data structures.  e.g. a query like this:

(def db [[:s1 :rdfs/label "subject"]])

(construct {:s ?s
                  :rdfs/label ?label }
   [[?s :rdfs/label ?label]] db)

constructs the solutions in the shape of the supplied map template e.g.

; => [{:s :s1 :rdfs/label "subject"}]

You can do this to arbitrary depth e.g.

(construct {:s ?s
                  :nested-map {:label ?label } }
   [[?s :rdfs/label ?label]] db)

;; => [{:s :s1 :nested-map {:label "subject"}}]

We also facilitate grouping of resources with the special keyword :grafter.rdf/uri like so:

(def db [[:s1 :rdfs/label "subject"]
             [:s1 :rdfs/label "another subject"]
             [:s1 :foo "bar"]])


(construct {:grafter.rdf/uri ?s
                   ?p ?o }
                 [[?s ?p ?o]] db2)

;; => [{:grafter.rdf/uri :s1
          :rdfs/label #{"subject" "another subject"}
          :foo "bar}]

This later usage groups solutions into maps by ?s and ?p, lifting multiple values with the same predicate into a set.  This resource object form is particularly useful for post processing in clojure.

Can you compose CONSTRUCT inside another query?
 

Performance hasn't been investigated deeply; but it's good enough for us right now and we use it for production workloads.  It's certainly nowhere near as fast as a proper commercial triple store; but some systems we have may query with modestly complex patterns Matchagraphs with up to perhaps a million edges, and return all solutions within a few seconds.  Though I'd say it's much more typical for us to use it on small graphs of dozens to thousands of edges.

Yeah, it might work on small-ish datasets. That said, it seems to me processing a stream of bindings that are hashmap is as easy if not more easy than dealing with SPARQL (except if you already know SPARQL)...

 
Anyway I spotted your email and thought I'd share incase you find it relevant or useful.


Thanks for sharing :)
Reply all
Reply to author
Forward
0 new messages