SPARQL DSL - a humble request for review and guidance


Adam Harrison (Clojure)

Dec 21, 2008, 12:03:46 PM
to clo...@googlegroups.com

Hi folks,

First let me say 'Thank you very much' for Clojure - it has enabled me to
finally take the plunge into learning a Lisp without feeling like I'm
abandoning a ten year investment in the Java platform and its libraries.
I bought 'Practical Common Lisp' about eighteen months ago and read it
half-heartedly, never making it to a REPL; however I now have the luxury
of working on a project of my own and decided the time was right to
revisit the promised land of Lisp. I have been aware of Clojure for
about six months, and, given my experience of Java, it seemed like the
obvious place to start. I've spent the past couple of weeks devouring
lots of general Lisp related documentation, Rich's Clojure webcasts and
Stuart's 'Programming Clojure' book. I am pleased to report that my Lisp
epiphany occurred yesterday when I found I could understand this macro
pasted to lisp.org by Rich:

(defmacro defnk [sym args & body]
  (let [[ps keys] (split-with (complement keyword?) args)
        ks (apply array-map keys)
        gkeys (gensym "gkeys__")
        letk (fn [[k v]]
               (let [kname (symbol (name k))]
                 `(~kname (~gkeys ~k ~v))))]
    `(defn ~sym [~@ps & k#]
       (let [~gkeys (apply hash-map k#)
             ~@(mapcat letk ks)]
         ~@body))))

I am now spoiled forever, and although my powers are weak, I realise I
have become one of those smug Lisp types condemned to look down on all
other programming languages for the rest of their lives. Thank-yous and
cliched tales of enlightenment dispensed with, I can now move on to the
plea for aid :)

My current project involves semantic web technologies, at this point
specifically SPARQL queries over RDF triple stores (for those unfamiliar
with SPARQL it will suffice to say that it's a straightforward query
language modelled after SQL, but operating over graphs rather than
tables). Like their SQL counterpart, the Java APIs for performing these
queries are verbose and cumbersome, and I am hoping that they can be
hidden behind some cunning Lisp macros to provide an expressive and
tightly integrated semantic query capability (similar to Microsoft's
LINQ). Unfortunately my ambitions far exceed my skills at this point,
and I am hoping to garner some gentle mentoring to steer me in the right
direction from the outset.

My first inclination is to start with the simplest thing which will
work, which is to create a function which takes a SPARQL query string as
an argument and returns a list of results:

(sparql "
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?name
  WHERE
  {
    ?person foaf:mbox \"mailto:adam-c...@antispin.org\" .
    ?person foaf:name ?name
  }")

However this style is poor for several reasons: it looks ugly, quotation
marks have to be escaped manually, and interpolation of variables into
the query string is a chore which further decreases readability. What I
really want is a nice DSL:

(let [mbox "mailto:adam-c...@antispin.org"]
  (sparql
    (with-prefixes [foaf "http://xmlns.com/foaf/0.1/"]
      (select ?name
        (where
          (?person foaf:mbox mbox)
          (?person foaf:name ?name))))))

Clearly this is going to require a macro, because for a start I don't
want the symbols representing SPARQL capture variables (the ones
starting with '?') to be evaluated - I want to take the name of the
symbol, '?' and all, and embed it into the query string which this DSL
will ultimately generate before calling into the Java API. On the other
hand, I do want some things evaluated - I want to embed the value
('mailto:adam-c...@antispin.org') bound to the symbol 'mbox' in the
query string, not the name of the symbol.

From my position of total ignorance, I can see two broad approaches to
tackling this. The first is to implement (sparql ...) as a macro which
is responsible for interpreting the entire subtree of forms below it,
building a query string by selectively evaluating some forms whilst
using others as navigational markers which give context. It would honour
the grammar which defines SPARQL queries, and either signal an error or
be guaranteed to generate syntactically correct queries. The macro would
also have insight into the format of the data which would be returned
(gleaned from the 'select' part) and so could return something useful
like a list of maps where the keys are the names of the capture
variables that appear in the select clause. I have no idea how to do
this, but it feels like the 'right' way.
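
One small piece of this first approach can be sketched without committing to the whole macro: reading the capture variables out of the 'select' form and using them as map keys for the results. This is only a sketch. The names sparql-var?, selected-vars and rows->maps are hypothetical, and it assumes each result row arrives as a vector of values in select order, which is not how any particular Java API hands results back.

```clojure
(defn sparql-var?
  "True for symbols such as ?name. Hypothetical helper."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn selected-vars
  "Returns the capture variables named directly in a (select ...) form."
  [[head & args :as form]]
  (when-not (= 'select head)
    (throw (IllegalArgumentException.
            (str "Expected (select ...), got: " (pr-str form)))))
  ;; Nested forms such as (where ...) are not symbols, so they are skipped.
  (vec (filter sparql-var? args)))

(defn rows->maps
  "Pairs each row with the selected variables.
  Assumes a row is a vector of values in select order."
  [vars rows]
  (map #(zipmap vars %) rows))
```

A macro version would call selected-vars on its unevaluated argument at expansion time, so the shape of the result is known before the query ever runs.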

The other approach, which is IMO a bit hacky, but within my reach, is to
define 'with-prefixes', 'select' and 'where' as individual macros whose
first arguments are expanded into the relevant subcomponent of the query
string and whose final argument is a string to be appended to the end.
You then compose these together into the right order to get the complete
query string:

(with-prefix [foaf "http://xmlns.com/foaf/0.1/"] "...the select
statement") would evaluate to "PREFIX foaf: <http://xmlns.com/foaf/0.1/>
...the select statement"

(select ?name "... the where clause") would evaluate to "SELECT ?name
WHERE ...the where clause"

and so on. In this case 'sparql' would remain a function which takes a
string argument, and you build that query string recursively by nesting
calls to with-prefix, select & where in the correct order. Whilst this is
much easier to implement, it has some serious drawbacks - it is up to
the user to compose these things in the correct order (since everything
gets flattened to a string at each level of recursion, we can't check
for errors), and it's no longer possible for the 'sparql' function to
glean the structure of its return value without reparsing the query
string it is passed.
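
For concreteness, the string-composing approach might look roughly like the sketch below. It assumes a single prefix and a single selected variable, and it wraps the where clause in braces, which the description above does not spell out.

```clojure
(defmacro with-prefix
  "Prepends one PREFIX declaration to body, which must evaluate to a string."
  [[prefix uri] body]
  `(str "PREFIX " ~(name prefix) ": <" ~uri ">\n" ~body))

(defmacro select
  "Emits SELECT for one capture variable, followed by the where clause string."
  [v body]
  ;; The braces around the clause are an assumption of this sketch.
  `(str "SELECT " ~(name v) "\nWHERE {\n" ~body "\n}"))

(println
 (with-prefix [foaf "http://xmlns.com/foaf/0.1/"]
   (select ?name "?person foaf:name ?name")))
```

Nothing here stops (select ?name (with-prefix ...)) from compiling and producing a malformed query, which is exactly the ordering drawback described above.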

These problems aside, I decided to take a stab at implementing the
'where' macro from the second approach because I have to start somewhere
:) Here is what I have so far:

(defmacro where [& triples]
  `(let [encode# (fn [x#]
                   (cond (and (symbol? x#) (= (first (name x#)) \?)) (name x#)
                         (integer? x#) (str "\"" x# "\"^^xsd:integer")
                         (float? x#) (str "\"" x# "\"^^xsd:decimal")
                         (string? x#) (str "\"" x# "\"")))]
     (apply str
            (interpose " .\n"
                       (for [triple# '~triples]
                         (apply str
                                (interpose " "
                                           (map encode# triple#))))))))

As you can see, so far it correctly encodes SPARQL capture variables,
and literal strings, integers and floats:

user=> (print (where (?a ?b 1) (?a ?b 2.0) (?a ?b "string")))
?a ?b "1"^^xsd:integer .
?a ?b "2.0"^^xsd:decimal .
?a ?b "string"

I tried adding '(list? x#) (eval x#)' to the encode cond to make it cope
with expressions like this:

(where (?a ?b (+ 1 2)))

Unfortunately that results in an unencoded literal '3' in the query
string instead of the '"3"^^xsd:integer' I was looking for. I tried
calling encode recursively (despite the obvious infinite recursion issue
if the eval returns a list) '(list? x#) (encode# (eval x#))' but I got an
error at runtime:

user=> (where ((+ 1 2)))
java.lang.Exception: Unable to resolve symbol: encode__2605 in this context
clojure.lang.Compiler$CompilerException: NO_SOURCE_FILE:153: Unable to
resolve symbol: encode__2605 in this context

Clearly this is due to encode# not being bound until after the function
definition completes, but I have no idea how to fix it yet (is there a
way to refer to a function during its definition?). I am also struggling
to get access to variables bound outside the macro:

(let [v "a value"]
  (where (?a ?b v)))

I tried adding '(symbol? x#) (eval x#)' to the encode cond but that gets
me a complaint about 'v' being unresolvable in this context.
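
On the earlier question of referring to a function during its own definition: a fn form can be given a local name, and that name is in scope inside the body. The sketch below is standalone rather than a patch to the macro, and encode-term is a made-up name. Inside a syntax-quoted template the same idea would presumably be written (fn encode# [x#] ...). Note that eval only resolves vars and never sees let-bound locals, which would explain the complaint about 'v'.

```clojure
(def encoded
  (let [encode (fn encode-term [x]
                 (cond
                   ;; Recurse through the fn's own name, not the let binding.
                   (seq? x)     (encode-term (eval x))
                   (integer? x) (str "\"" x "\"^^xsd:integer")
                   (float? x)   (str "\"" x "\"^^xsd:decimal")
                   (string? x)  (str "\"" x "\"")))]
    ;; The quoted list is evaluated at runtime, then its result is encoded.
    (encode '(+ 1 2))))

(println encoded)
```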

Another problem I face is that there is no enforcement that triples are
passed - the macro just maps all the values through encode and
interposes them with spaces, so no error is raised if you have too many
or too few values to create a valid where clause. I have no idea what
the proper way of dealing with things like this is in a functional language.
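
One possible shape for this, offered as a sketch rather than the idiomatic answer: let the macro inspect the forms at expansion time, reject anything that is not a three-element list, turn ?-symbols into strings, and leave every other term as code that is evaluated and encoded at runtime. That also lets let-bound locals and computed expressions through without eval. The names where*, sparql-var? and encode-value are hypothetical, and prefixed names such as foaf:mbox are not handled.

```clojure
(defn sparql-var?
  "True for symbols such as ?name. Hypothetical helper."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn encode-value
  "Encodes an already-evaluated value as a SPARQL literal."
  [v]
  (cond (integer? v) (str "\"" v "\"^^xsd:integer")
        (float? v)   (str "\"" v "\"^^xsd:decimal")
        (string? v)  (str "\"" v "\"")
        :else (throw (IllegalArgumentException.
                      (str "Cannot encode: " (pr-str v))))))

(defmacro where* [& triples]
  ;; Arity check runs when the macro is expanded, before any query is built.
  (doseq [t triples]
    (when-not (and (seq? t) (= 3 (count t)))
      (throw (IllegalArgumentException.
              (str "Expected a triple, got: " (pr-str t))))))
  (let [term   (fn [x] (if (sparql-var? x) (name x) `(encode-value ~x)))
        triple (fn [t] `(str ~@(interpose " " (map term t))))]
    `(str ~@(interpose " .\n" (map triple triples)))))

(let [v "a value"]
  (println (where* (?a ?b v) (?a ?b (+ 1 2)))))
```

A malformed clause such as (where* (?a ?b)) then fails when the code is compiled instead of producing a bad query string.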

So as you can see, if you have made it heroically to the end of this
email, I am keen but facing a steep learning curve :) I am aware that
most of my troubles are well-trodden issues that no doubt have
thirty-year-old idiomatic Lisp solutions, and for bothering you with them
on a Clojure-specific mailing list I apologise. On the other hand, if you
feel like sharing some wisdom with me, either directly or through
pointers to resources I would be most grateful. I would also be most
appreciative to receive comments on the design of the DSL itself, and
suggestions for the most Lispish/Clojurish implementation thereof. In
the meantime, it's back to the REPL for me :)

Best Regards,

Adam Harrison

.Bill Smith

Dec 21, 2008, 1:49:06 PM
to Clojure
> I am now spoiled forever, and although my powers are weak, I realise I
> have become one of those smug Lisp types condemned to look down on all
> other programming languages for the rest of their lives.

Beware, Grasshopper. The world is more complicated than you know.

Daniel E. Renfer

Dec 21, 2008, 4:35:25 PM
to clo...@googlegroups.com

Hello,

I have also been doing work with Jena+Clojure+Sparql.

I'm not sure if you're aware of it or not, but you might want to look
into binding result sets. [1]

I would love to compare notes with you some time. I am duck1123 on
freenode, and my jabber address is the same as my email.

[1]: http://www.ldodds.com/blog/archives/000251.html



Brian Sletten

Dec 21, 2008, 5:45:42 PM
to clo...@googlegroups.com
Adam, I just joined the list, but I am very interested in working with
you on the SPARQL DSL. Let me catch up on what you've written and
we'll muddle through it together.

Glad to see there are other Clojure-loving SemWeb nerds around here. :)

Michael Wood

Dec 21, 2008, 6:30:40 PM
to clo...@googlegroups.com
On Sun, Dec 21, 2008 at 7:03 PM, Adam Harrison (Clojure)
<adam-c...@antispin.org> wrote:
[...]

> (defmacro where [& triples]
> `(let [encode# (fn [x#] (cond (and (symbol? x#) (= (first (name x#))
> \?)) (name x#)
> (integer? x#) (str "\"" x# "\"^^xsd:integer")
> (float? x#) (str "\"" x# "\"^^xsd:decimal")
> (string? x#) (str "\"" x# "\"")))]
> (apply str
> (interpose " .\n"
> (for [triple# '~triples]
> (apply str
> (interpose " "
> (map encode# triple#))))))))
>
> As you can see, so far it correctly encodes SPARQL capture variables,
> and literal strings, integers and floats:
>
> user=> (print (where (?a ?b 1) (?a ?b 2.0) (?a ?b "string")))
> ?a ?b "1"^^xsd:integer .
> ?a ?b "2.0"^^xsd:decimal .
> ?a ?b "string"
>
> I tried adding '(list? x#) (eval x#)' to the encode cond to make it cope
> with expressions like this:
>
> (where (?a ?b (+ 1 2)))
>
> Unfortunately that results in an unencoded literal '3' in the query
> string instead of the '"3"^^xsd:integer' I was looking for. I tried

Here's my poor excuse for a nudge in what may or may not be the right direction:

user=> (defn encode-symbol [x] (if (= (first (name x)) \?) (name x)))
#'user/encode-symbol
user=> (defn encode-other [x]
         (cond (integer? x) (str \" x \" "^^xsd:integer")
               (float? x) (str \" x \" "^^xsd:decimal")
               (string? x) (str \" x \")))
#'user/encode-other
user=> (defmacro encode [x]
         (if (symbol? x)
           (encode-symbol x)
           `(encode-other ~x)))
nil
user=> (encode ?a)
"?a"
user=> (encode 1)
"\"1\"^^xsd:integer"
user=> (encode 2.0)
"\"2.0\"^^xsd:decimal"
user=> (encode "string")
"\"string\""
user=> (encode (+ 1 2))
"\"3\"^^xsd:integer"
user=>

(Disclaimer: I don't know what I am doing, but this at least seems to work.)

--
Michael Wood <esio...@gmail.com>

Michael Wood

Dec 22, 2008, 10:58:54 AM
to clo...@googlegroups.com

This seems to be a lot simpler if you use keywords instead of ?something:

user=> (defmacro encode [x]
         `(let [x# ~x]
            (cond (keyword? x#) (str \? (name x#))
                  (integer? x#) (str \" x# \" "^^xsd:integer")
                  (float? x#) (str \" x# \" "^^xsd:decimal")
                  (string? x#) (str \" x# \")
                  :else (throw (new Exception "Invalid SPARQL atom")))))
nil
user=> (println (encode :a))
?a
nil
user=> (println (encode 1))
"1"^^xsd:integer
nil
user=> (println (encode 2.0))
"2.0"^^xsd:decimal
nil
user=> (println (encode "string"))
"string"
nil
user=> (println (encode (+ 1 2)))
"3"^^xsd:integer
nil
user=> (println (encode (keyword "b")))
?b
nil
user=> (println (encode 'invalid))
java.lang.Exception: Invalid SPARQL atom (NO_SOURCE_FILE:0)
user=> (println (encode 1/2))
java.lang.Exception: Invalid SPARQL atom (NO_SOURCE_FILE:0)
user=>

--
Michael Wood <esio...@gmail.com>

Randall R Schulz

Dec 22, 2008, 11:41:47 AM
to clo...@googlegroups.com
On Monday 22 December 2008 07:58, Michael Wood wrote:
> ...

>
> This seems to be a lot simpler if you use keywords instead of
> ?something:

The tradition of using a leading question mark to designate a (logical)
variable is pretty widespread, and many practitioners from the realms
of automated reasoning and related areas are inured to it. You'll find
it in many texts on ATP and related topics. (The Common Logic
specification abandoned it, and along with that naming convention went
the possibility of writing formulas with free variables. Many of us
still use the question marks even though they're not required.) Prolog
does something similar: An initial capital letter makes a name a
variable. Prover9 (and its forebear Otter) deems a name to be that of a
variable when its first letter is 'x' through 'z' (lower-case only). It
can be switched to the Prolog convention, as well. Cyc names are
signified by an initial #$ (you can guess why, given its Lisp heritage)
and its variables by the question mark.

In any event, the approach of assigning syntactic categories based on
the first character is widespread. Furthermore, many of the systems
that do so are written in (some dialect of) Lisp, and hence already
support and use keywords designated by initial colons (typically to
specify various options). I believe the development of such systems was
a big motivation for programmable readers in Common Lisp and its
ancestors.


I bring up all this history, which may seem irrelevant to many, because
I continue to believe that Clojure would benefit from read-table
control.


> ...


Randall Schulz

Adam Harrison (Clojure)

Dec 23, 2008, 10:27:14 AM
to clo...@googlegroups.com
Hi Randall,

How do you envision read-table control to be useful in the context of
this particular problem - perhaps to redefine tokens beginning with '?'
as keywords? And is read-table control the same thing as reader macros?

Best Regards,

Adam.

Randall R Schulz

Dec 23, 2008, 11:00:09 AM
to clo...@googlegroups.com
On Tuesday 23 December 2008 07:27, Adam Harrison (Clojure) wrote:
> Randall R Schulz wrote:
> > ...

>
> Hi Randall,
>
> How do you envision read-table control to be useful in the context of
> this particular problem - perhaps to redefine tokens beginning with
> '?' as keywords? And is read-table control the same thing as reader
> macros?

Loosely speaking, yes, read-table control is about attaching your own
code to particular input signifiers, usually (though in CL not
necessarily) a specific character that appears following
the "dispatching" macro character '#'.

And I would strongly recommend _not_ trying to represent your query
variables as keywords, but rather as entities that have the requisite
properties or types of SPARQL query variables (as needed in your own
design). That could be as simple as a metadata tag designating the
symbol as a variable name, or shunting them to an alternative namespace
reserved for SPARQL query variables, for example. Additionally or
alternatively, it could also handle entering the name into whatever
indexing or table of contents is required. Generally speaking, variables
(whether the kind used in C / C++ / Java / etc., the kind used in logic,
or the kind used in FP, the latter two being more like each other than
either is to the former) require lots of special treatment beyond their
simple existence as the name of something.

As far as keywords relate to your problem, I recommend staying away from
them for this purpose. The initial colon at the beginning of their
names is ineluctable in Clojure and you shouldn't try to use them at
cross purposes to their intended uses. Furthermore, you cannot attach
metadata to keywords as you can to symbols.
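
The metadata point can be tried at a REPL. This is a minimal sketch: the :sparql-var key and the helper names are invented for illustration and are not part of any existing library.

```clojure
(defn tag-var
  "Marks ?-prefixed symbols with :sparql-var metadata; other values pass through."
  [x]
  (if (and (symbol? x) (= \? (first (name x))))
    (vary-meta x assoc :sparql-var true)
    x))

(defn tagged-var?
  "True when x carries the :sparql-var tag. meta returns nil for keywords."
  [x]
  (boolean (:sparql-var (meta x))))

(println (tagged-var? (tag-var '?person)))    ; tagged symbol
(println (tagged-var? (tag-var 'person)))     ; ordinary symbol, untouched
(println (instance? clojure.lang.IObj :person)) ; keywords cannot carry metadata
```

The tag does not change symbol equality, so a tagged ?person still compares equal to a plain ?person.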

No, your logical variables should be represented by some other entity.
Keep in mind that if you want to process SPARQL as a DSL based on the
Clojure reader (assuming it's even readable by the Clojure reader (*) —
not all languages are; even many Lisp-like languages are not), then
you're going to have to control evaluation when you process it. That's
true, anyway, if you're going to use an embedded DSL approach wherein
the Clojure compiler will be the first thing to get its hands on your
input S-Expressions after they're returned from the Reader. That's what
macros are for.

On the other hand, if you use an external DSL approach where you get the
S-Expressions encoding SPARQL queries, then the whole issue of
evaluation control and macros goes away. You just process those S-Exprs
as data that encodes the queries. Evaluation and name resolution never
enter the picture this way and you needn't worry about them (nor can
you take advantage of them, whichever perspective suits you best...).
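
As a rough illustration of the external approach, the triples can be handed over as quoted data and walked by ordinary functions. Since nothing is evaluated, values have to be supplied explicitly. Here that is done with a map from symbols to strings, which is purely an assumption of this sketch, as are the names render-term and render-where.

```clojure
(require '[clojure.string :as string])

(defn render-term
  "Renders one term of a triple. env maps symbols to string values."
  [env x]
  (cond
    (and (symbol? x) (= \? (first (name x)))) (name x)   ; query variable
    (and (symbol? x) (contains? env x)) (pr-str (get env x)) ; substituted value
    (symbol? x) (name x)                                 ; e.g. foaf:name, as-is
    (string? x) (pr-str x)
    :else (str x)))

(defn render-where
  "Renders a collection of triples (plain data) as a where-clause body."
  [env triples]
  (string/join " .\n"
               (map (fn [t] (string/join " " (map #(render-term env %) t)))
                    triples)))

(println
 (render-where '{mbox "mailto:someone@example.org"}
               '[(?person foaf:mbox mbox)
                 (?person foaf:name ?name)]))
```

Because the query is just data, it can be validated, rewritten or assembled at runtime with the usual sequence functions, at the cost of losing direct access to surrounding locals.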

Anyway, if SPARQL uses a leading question mark to signify a variable
name, then you're going to have to check every symbol in your SPARQL
input for that signifier and handle it appropriately. And I don't know
if SPARQL has nested namespaces and rules about reusing variables
(shadowing, e.g.), but if it does, you're going to need a symbol table
design that accommodates that requirement (to state the obvious, I
hope).

Just don't try to press Clojure keywords into service for something
they're not meant to be used for.


> Best Regards,
>
> Adam.


Randall Schulz
--
Did I mention I'm wordy?
