Querying a tuple database with minikanren

188 views
Skip to first unread message

Amirouche Boubekki

unread,
Nov 29, 2015, 9:27:55 AM11/29/15
to minikanren
Héllo,


I am interested in building database both for the purpose of exploring
datasets like conceptnet or wikibase and building webapps.

You might be wondering how this relates to minikanren. Well, there's
two sides:

* First, you might want to store logic facts inside database persisted
  to disk because your dataset doesn't fit into memory

* Second, you might want to take advantage of the declarative nature
  of minikanren programs

For this exercise I've chosen to work on tuple database so called
(uid, attribute, identifier) aka. uav database, because it's the
database that is the most flexible and can span various usecases from
relational, graph or hierarchical problems to key/value or document
based kind of problems. And it can be loosely typed. It's similar in
principle to datomic EAV except that it doesn't have an audit trail
ie. it is not immutable.

The backend storage is handled by wiredtiger [0], it's a ordered
key/value store, a hashmap similar to leveldb or Oracle BerkeleyDB.

To picture how it looks like imagine a giant list of (u a v) tuples
like the following, that stores messages from a blogging application:

(define database '(("uid1" type post)
                   ("uid1" blog/post "uav db is a tuple db")
                   ("uid1" blog/create-at 0)

                   ("uid2" type tag)
                   ("uid2" tag/name "scheme")

                   ("uid3" type taglink)
                   ("uid3" taglink/tag "uid2")
                   ("uid3" taglink/post "uid1")))

In wiredtiger, `uid` and `attribute` compose the *key*, and the
`value` unique *value* column of the entry.

There is only three primitives that can be run against such database:

* `(uav-ref uid attribute)` retrieve the value associated with
  `(UID . ATTRIBUTE)` pair.

* `(uav-ref* uid)` retrieve all the `(attribute . value)` pairs
  associated with `UID` aka. retrieve the document with `UID` as
  unique identifier.

* `(uav-index-ref attribute value)` retrieve all `uid` for which a
  `(uid ATTRIBUTE VALUE)` tuple exists in the database. This is kind
  of a reverse lookup. I don't call it as such because it's not the
  only "reverse" lookup that can be made.

Next, I translate `uav-ref` and`uav-index-ref` procedures into
minikanren procedures that makes it possible to solve logic problems
backed by a database and then explain the higher level interface
that use a pattern matching-like interface to query the tuples.

To simplify let's use on single procedure, `(queryo uid attribute
value)` that calls the correct underlying `uav-*` procedure and does
the correct *substitutions* mangling with `disj` and `conj`. To
query all blog posts that have the a given tag, one can use the
following minikanren query:

(define (query:posts-with-tags name)
 
(run* (uid?)
   
(fresh (tag?? taglink??)
     
(queryo tag?? 'tagname name)
      (queryo taglink?? '
tag/link tag??)
     
(queryo taglink?? 'tag/post uid?))))

The above query translates in plain english line by line as:

* Given a tag with uid `tag??` and `tag/name` equal to the value `name`
* Find a tuple with `taglink??` uid where `'tag/link` is associated with `tag??`
* Such as there is also a tuple with `taglink??` uid where `'tag/post`
  is associated with `uid?`

And return `uid?`.

`queryo` is implemented in terms of `uav-refo` and `uav-index-refo`:

(define queryo
 
(lambda (u a v)
   
(lambda (s/c)
     
(let ((u (walk u (car s/c)))
           
(v (walk v (car s/c))))
       
(cond
         
((var? u) ((uav-index-refo u a v) s/c))
         
((var? v) ((uav-refo u a v) s/c))
         
((throw 'uav "unsupported query type")))))))

This doesn't support at least, the case where both `u` and `v` are
values. What it does is lookup the value associated with the mk
variables `u` and `v` and return themself if their are no mk variables
thanks to `walk`. It's similar to how `==` works through
`unify`.

`queryo` is really a syntax helper to make it possible to use a single
procedure which has the look of a UAV tuple.

The difference with `==` is that `==` is the end of the story,
there is no further processing. In `queryo` down the flow calls
to `disj` and `conj` happen.

Let's start simple, `(uav-refo u a v?)` expects only `v?` to be a
variable and is defined in terms of `==`:

(define uav-refo
 
(lambda (uid attribute value?)
   
(== value? (uav-ref uid attribute))))

It says that `value?` must be equal to the output of `uav-ref`.

This is the first step into making minikanren work with the database,
but this is not enough. `uav-refo` is useful to check *constraints*
where usually in a database query one need to "create" values and then
check constraint against.

To create values we use `uav-index-refo`, remember `uav-index-ref` is
the kind of reverse lookup that allows to retrieve uids that we can later
constraint using `uav-refo`.

`(uav-index-ref attribute value)` will return several uids for the
same *constraint* so this is a hint that a `disj` is happening. The `disj`
procedure defined in microkanren [1] takes only two arguments and `disj+`
is a macro. So I defined the following `disj*`:

(define (disj* . gs)
 
(if (null? gs) (lambda (s/c) (list))
     
(disj (car gs) (apply disj* (cdr gs)))))

Now we can define `uav-index-refo`:

(define findo
 
(lambda (uid? attribute value)
   
(let ((uids (uav-index-ref attribute value)))
     
(apply disj* (map (lambda (uid) (== uid uid?)) uids)))))

This implement that every result of `uav-index-ref` creates its own
constraint space with a different uid.

Now everything is in place to do queries using minikaren, it also
works with recursive queries without having to add anything else.

To sum up, I translated the database primitives to minikanren.

On top of this I defined a macro that allows to have tuple pattern matching.

The pattern matching interface is less powerful that the raw
minikanren interface in the sens that it doesn't allow so far to
create recursive queries but allows to express SQL queries in a terse
and nice syntax.

For instance, to do the same query as above, one can use the following query:

(define (query:posts-with-tag* name)
 
(query* uid? where ((tag?? 'tag/name name)
                     (taglink?? '
tag/link tag??)
                     
(taglink?? 'tag/post uid?))))

The where clause declare the pattern that should be looked for. The
symbols that ends with a single question mark e.g. `tag?` are return
variables. symbols that ends with two question marks e.g. `taglink??`
are internal variables matched during querying.  (In the underlying
implementation `taglink??` is a minikanren variable instantiated with
`fresh`)

Here is the macro:

(define-syntax query*
 
(lambda (x)
   
(define (is-not-free-var? sym)
     
(let ((sym (symbol->string sym)))
       
(eq? (string-rindex sym #\?) (- (string-length sym) 1))))
   
(define (make-vars ctx names tuples)
     
(let* ((names (map syntax->datum names))
             
(vals (map syntax->datum tuples))
             
(syms (filter symbol? (concatenate vals)))
             
(syms (filter is-not-free-var? syms))
             
(syms (delete-duplicates syms))
             
(vars (fold (lambda (x out)
                           
(remove (cut equal? x <>) out))
                         syms
                         names
)))
       
(datum->syntax ctx vars)))
   
   
(syntax-case x (where)
     
((query* names ... where ((u a v) ...))
       
(with-syntax ((vars (make-vars x #'(names ...) #'((u a v) ...))))
         
#'(reify-all (lambda (names ...)
                       
(fresh vars
                           
((queryo *context*) u a v) ...))
                     
'names ...))))))

A few notes:

* Here `queryo` takes a `*context*` argument because in my code, database primitives
  all take a database context argument. I'm not sure how to deal with that I think the
  best interface for this would be something like `(query* *context* query arguments)`
  but I couldn't achieve that. Using a fluid is another solution maybe.

* I also rely on a `reify-all` procedure not defined in microkanren, here is it:

(use-modules (srfi srfi-1))
(use-modules (srfi srfi-26))


(define (run goal . vars)
 
(take-all
   
(call/goal (apply goal vars))))

(define-public (reify-all goal . names)
 
(let* ((vars (map var names))
         
(s/c (apply run (cons goal vars))))
   
(map (lambda (s) (map (cut reify <> s) vars)) s/c)))

The above queries can be written using the database primitives like that:

(define (query:posts-with-tag name)
 
(let* ((tag?? (uav-index-ref dbc 'tag/name name))
         (taglink?? (uav-index-ref dbc '
tag/link tag??)))
   
(map (cut uav-ref dbc <> 'tag/post) taglink??)))

There is small webapp that make use of this, that can be found over
git [2]. It only requires guile and wiredtiger built from sources (develop
branch, which is easy to build from source) [3].


All the best!


[0] https://github.com/wiredtiger/wiredtiger
[1] https://github.com/jasonhemann/microKanren
[2] https://git.framasoft.org/a-guile-mind/nanoblog.git
[3] git clone https://github.com/wiredtiger/wiredtiger.git

Amirouche Boubekki

unread,
Dec 23, 2016, 6:02:48 AM12/23/16
to minikanren
I will give a talk at fosdem about my work on wiredtiger and talk a little
about this microkanren integration. If you wish to discuss stuff related
to mini/micro kanren at FOSDEM make sure to drop by Guile dev room :)

Best!

Amirouche aka. amz3
Reply all
Reply to author
Forward
0 new messages