Deparsing a parsed SPARQL query

devoni...@gmail.com

Oct 9, 2017, 11:49:37 AM
to rdflib-dev
I brought this up about a month ago on another thread, but I am ready to revisit it now.

I am working on this library

https://github.com/paulhoule/gastrodon

and one of the goals for the library is that it should know more about SPARQL than most users.

Here are two bits of SPARQL intelligence that the library already has:

(1) It uses the SPARQL parser in rdflib to look for a GROUP BY clause in a query, and if the grouping variables appear in the SELECT clause, it automatically makes them the index of the pandas DataFrame built from the SELECT output.

(2) The library also substitutes bound values for variables in SPARQL queries, so that variable bindings can be used with queries sent to remote SPARQL endpoints.

I've been handling (2) with a rather stupid approach based on str.replace(), which worked for a while, but then I found an easy way to break it. I can see a hack that would get me through the day, but there is a way to break that too. I think I see a way to make an unbreakable version, but it is enough work that I might instead write something that turns SPARQL parse trees back into text, because that, in principle, is the way to complete SPARQL intelligence.
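
The str.replace() failure is easy to reproduce: replacing "?o" also rewrites "?other" and the "?o" inside a string literal. A minimal text-level sketch of a sturdier substitution, assuming bound values are Python bools, ints or strs, scans the query once with a single regex that consumes string literals, IRIs in angle brackets and comments whole before it ever looks for ?var or $var tokens. The names TOKEN, to_sparql and bind are hypothetical, not part of rdflib or gastrodon, and the sketch has known gaps: it does not recognize triple-quoted literals, and a < comparison written without surrounding spaces can be mistaken for an IRI.

```python
import re

# One pass over the query text. At each position the alternatives are tried in
# order, so string literals, <IRIs> and # comments are consumed whole and any
# "?o" inside them is never treated as a variable.
TOKEN = re.compile(r'''
    (?P<literal>"(?:[^"\\\n\r]|\\.)*"|'(?:[^'\\\n\r]|\\.)*')
  | (?P<iri><[^<>"{}|^`\\\x00-\x20]*>)
  | (?P<comment>\#[^\n]*)
  | [?$](?P<var>[A-Za-z0-9_]+)
''', re.VERBOSE)


def to_sparql(value):
    """Render a Python value as a SPARQL term (sketch: bool, int and str only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Escape the backslash first so the escapes added afterwards are not doubled.
        for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
            value = value.replace(raw, esc)
        return '"' + value + '"'
    raise TypeError(f"no SPARQL rendering for {type(value).__name__}")


def bind(query, bindings):
    """Substitute bindings (variable name -> Python value) into query text."""
    def replace(match):
        name = match.group("var")
        if name is None or name not in bindings:
            return match.group(0)  # literal, IRI, comment or unbound variable: keep as is
        return to_sparql(bindings[name])
    return TOKEN.sub(replace, query)


q = 'SELECT ?s WHERE { ?s ?p ?other . ?s rdfs:label ?l . FILTER(?l = ?o || ?l = "?o") }'
print(bind(q, {"o": "Cobra"}))
# SELECT ?s WHERE { ?s ?p ?other . ?s rdfs:label ?l . FILTER(?l = "Cobra" || ?l = "?o") }
```

Here q.replace("?o", '"Cobra"') would produce "Cobra"ther and ""Cobra"", while bind() touches only the real ?o.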

So I am thinking about the right way to do it. For a while I was puzzled by the large parse trees created by simple expressions. For instance,

parseQuery("SELECT (5 as ?o) {}")

parses to

[([], {}), SelectQuery_{'projection': [vars_{'expr': ConditionalOrExpression_{'expr': ConditionalAndExpression_{'expr': RelationalExpression_{'expr': AdditiveExpression_{'expr': MultiplicativeExpression_{'expr': rdflib.term.Literal('5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))}}}}}, 'evar': rdflib.term.Variable('o')}], 'where': GroupGraphPatternSub_{}}], {})

But the more I think about it, this is just how parse trees produced by something like pyparsing work: the precedence structure is embedded in the chain of nested nodes. I can imagine this might make query understanding a little harder than it has to be, but it shouldn't be a barrier to unparsing, because if a ConditionalAndExpression has only one leg, its string form is just the string form of that single leg.
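
Collapsing single-leg nodes when unparsing can be sketched in a few lines. This does not operate on rdflib's actual parse results: Node is a hypothetical stand-in shaped like the dump above, leaves are assumed to be already-rendered SPARQL text, and extra legs are stored as (operator, leg) pairs, which is an assumption for illustration rather than how rdflib stores them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one precedence-level node such as
    ConditionalAndExpression_{'expr': ...}: a first leg plus zero or more
    (operator, leg) pairs for any further legs."""
    name: str
    expr: object
    rest: list = field(default_factory=list)


def unparse(term):
    """Turn a Node tree back into SPARQL expression text."""
    if not isinstance(term, Node):
        return str(term)  # leaves are assumed to already be SPARQL text, e.g. "5" or "?o"
    if not term.rest:
        return unparse(term.expr)  # one leg: this precedence level adds nothing to the text
    parts = [unparse(term.expr)]
    for op, leg in term.rest:
        parts += [op, unparse(leg)]
    # Parenthesize multi-leg nodes so the grouping survives re-parsing.
    return "(" + " ".join(parts) + ")"


# The five-level chain from the parseQuery dump above collapses to its leaf.
tree = Node("ConditionalOrExpression",
            Node("ConditionalAndExpression",
                 Node("RelationalExpression",
                      Node("AdditiveExpression",
                           Node("MultiplicativeExpression", "5")))))
print(unparse(tree))  # 5
```

Wrapping every multi-leg node in parentheses is redundant but always safe; a smarter printer would omit them where precedence already implies the grouping.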

Another thing I am thinking about is a way to substitute Python lists into the ExpressionList of an IN clause. It seems I could add an XVAR term, written as ?? followed by the variable name, and then add the appropriate production, but I hate the idea of having to copy that whole parse tree when I only want to make a small change.
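
At the text level, the ??NAME idea can be sketched as a regex expansion of each placeholder into a comma-separated list of escaped terms. ??NAME is the proposal from the post, not standard SPARQL, and XVAR, to_sparql and expand_lists are hypothetical names; because this works on raw text, a "??" inside a string literal would also be expanded, which a grammar-level XVAR production would avoid.

```python
import re

# Proposed ??NAME placeholder for a Python list (not standard SPARQL syntax).
XVAR = re.compile(r"\?\?([A-Za-z0-9_]+)")


def to_sparql(value):
    """Render an int or str as a SPARQL term (sketch; other types are rejected)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    if isinstance(value, str):
        # Escape the backslash first so the escapes added afterwards are not doubled.
        for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
            value = value.replace(raw, esc)
        return '"' + value + '"'
    raise TypeError(f"no SPARQL rendering for {type(value).__name__}")


def expand_lists(query, lists):
    """Replace each ??name with the comma-separated members of lists[name]."""
    return XVAR.sub(lambda m: ", ".join(to_sparql(v) for v in lists[m.group(1)]), query)


q = "SELECT ?s { ?s ?p ?o FILTER(?o IN (??xs)) }"
print(expand_lists(q, {"xs": [1, 'a"b']}))
# SELECT ?s { ?s ?p ?o FILTER(?o IN (1, "a\"b")) }
```

An empty list yields IN (), which the SPARQL 1.1 grammar accepts as an empty ExpressionList.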

Any ideas?

Dan Davis

Nov 18, 2017, 10:01:17 AM
to rdflib-dev
This is related to my own topic question about how to do binding. I'm working with a remote SPARQL HTTP endpoint, and I want to create code examples that are not vulnerable to "SPARQL injection". That requires a bit more than the str.replace() method you mention.

My thought was to construct a graph as follows:

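from rdflib import Graph
from rdflib.plugins.sparql import prepareQuery
from rdflib.plugins.stores.sparqlstore import SPARQLStore

sparqlstore = SPARQLStore(endpoint_url, sparql11=True)
g = Graph(store=sparqlstore)
q = prepareQuery(querystring, initNs=prefixes_dict)
g.query(q, initBindings=bindings_dict)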

For your case, an approach like this may work, because prepareQuery returns a Query object with a query.algebra attribute, so you can walk that structure to do your binding.
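
Walking such a structure to apply bindings can be sketched generically. The sketch is self-contained and does not use rdflib: Variable and Literal are hypothetical stand-ins for rdflib.term.Variable and rdflib.term.Literal, and the tree is plain nested dicts, lists and tuples. rdflib's algebra nodes also carry a node name, so a real walker would need to rebuild them with that preserved, which this sketch does not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for rdflib.term.Variable."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for rdflib.term.Literal."""
    value: object


def substitute(node, bindings):
    """Return a copy of a nested dict/list/tuple tree with bound Variables replaced.

    The input tree is not modified; unbound variables and other leaves are kept.
    """
    if isinstance(node, Variable):
        return bindings.get(node, node)
    if isinstance(node, dict):
        return {key: substitute(child, bindings) for key, child in node.items()}
    if isinstance(node, list):
        return [substitute(child, bindings) for child in node]
    if isinstance(node, tuple):
        return tuple(substitute(child, bindings) for child in node)
    return node


# A toy tree loosely shaped like FILTER(REGEX(?l, ?expr)).
tree = {"name": "filter",
        "expr": {"name": "regex", "text": Variable("l"), "pattern": Variable("expr")}}
bound = substitute(tree, {Variable("expr"): Literal("Cobra")})
print(bound["expr"]["pattern"])  # Literal(value='Cobra')
```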

And if you do this, it may finally let me avoid the somewhat unreal dangers of SPARQL injection...

Dan Davis

Nov 18, 2017, 10:07:06 AM
to rdflib-dev
Going one step further: my plan now is to walk query.algebra for a query returned from prepareQuery and, when I do bindings, replace any matching rdflib.term.Variable with a Literal. So I intend to write a new BindingSparqlStore to accomplish this.

Dan Davis

Nov 18, 2017, 11:45:27 AM
to rdflib-dev

After parseQuery returns its pyparsing results, rdflib.plugins.sparql calls translateQuery to turn them into SPARQL algebra. So, for what we want to do, it seems we need two pieces:

* Code that mimics translateQuery in https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/algebra.py but converts rdflib.term.Variable into rdflib.term.Literal without building the algebra.

* Code to unparse the pyparsing result. I wonder whether pyparsing itself provides this, but I do not know.

I'll let you know if I make more progress.

Dan Davis

Nov 18, 2017, 12:12:13 PM
to rdflib-dev
Paul,

I think that was it: there is a very sophisticated traverse function in rdflib.plugins.sparql.algebra that can do what I want, and I think it can also do what you want with an io.StringIO. I need to go further to turn the result back into a SPARQL query, but it is beginning to seem like ordinary work rather than discovery.

Here's what I was able to do:

from rdflib import Variable, Literal
from rdflib.namespace import Namespace, RDFS
from rdflib.plugins.sparql.parser import parseQuery
from rdflib.plugins.sparql.algebra import traverse

query_string = """
SELECT ?s ?l
WHERE {
    ?s a meshv:Descriptor .
    ?s rdfs:label ?l .
    FILTER(REGEX(?l, ?expr)) .
}
"""

query = parseQuery(query_string)

# I now want to bind ?expr to "Cobra" and search for descriptors whose labels contain "Cobra"
bindings = { Variable('expr'): Literal('Cobra') }

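def convertBoundVar(v):
    # traverse() calls this on each node after visiting its children;
    # returning a non-None value replaces the node, None leaves it unchanged.
    if isinstance(v, Variable) and v in bindings:
        return bindings[v]
    return None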

new_query = traverse(query, visitPost=convertBoundVar)

So, if we keep an io buffer in a closure, we can write the query back out as SPARQL text while we traverse it.

from io import StringIO
qbuf = StringIO()

and so on...
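
The "io in a closure" idea can be sketched as follows, assuming a hypothetical representation of a WHERE block as a list of (subject, predicate, object) tuples. Var and IRI are stand-in string subclasses rather than rdflib's term classes, and a plain str is treated as a string literal. The nested emit function closes over one StringIO buffer and appends each term's SPARQL text as the loop visits it.

```python
from io import StringIO


class Var(str):
    """Hypothetical stand-in for a SPARQL variable; the string is its name."""


class IRI(str):
    """Hypothetical stand-in for an IRI term; the string is the full IRI."""


def serialize_where(patterns):
    """Write a list of (s, p, o) triple patterns back out as a WHERE block."""
    buf = StringIO()

    def emit(term):
        # Closure over buf: each visited term appends its SPARQL form.
        if isinstance(term, Var):
            buf.write("?" + term)
        elif isinstance(term, IRI):
            buf.write("<" + term + ">")
        elif isinstance(term, str):
            for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
                term = term.replace(raw, esc)
            buf.write('"' + term + '"')
        else:
            raise TypeError(f"unsupported term {term!r}")

    buf.write("WHERE {\n")
    for triple in patterns:
        buf.write("    ")
        for i, term in enumerate(triple):
            if i:
                buf.write(" ")
            emit(term)
        buf.write(" .\n")
    buf.write("}")
    return buf.getvalue()


label = IRI("http://www.w3.org/2000/01/rdf-schema#label")
print(serialize_where([(Var("s"), label, "Cobra")]))
```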

mar...@gmail.com

Dec 28, 2017, 10:24:37 AM
to rdflib-dev
Hello

I don't know how relevant this is for you, but I am working on the SERVICE implementation for federated queries within rdflib:

https://github.com/RDFLib/rdflib/issues/769#issuecomment-344939011

I am currently on a personal branch, investigating options; obtaining the query string from a parsed result is a useful piece of functionality.

I have been looking into the evaluation process. If, in class Comp(TokenConverter)
https://github.com/marqh/rdflib/commit/aef5726ef02347714be5e7f1e0c4342357c9d90f#diff-7839f066926df815492a71be68544a7dR214
the 'expr' is kept, then it can be used later, e.g. for the SERVICE clause
https://github.com/marqh/rdflib/commit/aef5726ef02347714be5e7f1e0c4342357c9d90f#diff-7839f066926df815492a71be68544a7dR225
by making use of pyparsing's originalTextFor and searchString methods (https://pythonhosted.org/pyparsing/pyparsing-module.html#originalTextFor).

Perhaps there is a useful pattern here for you.

all the best
mark

scossu

Jan 13, 2018, 9:13:45 PM
to rdflib-dev
Hello,
I am also trying to construct a SPARQL update string from a parsed query structure.

E.g. I want to parse a query, validate and possibly alter some elements, and then submit the modified query to a graph.

Is Mark's approach the only one available to do that, i.e. do I have to use separate code?

I would expect RDFLib to have methods somewhere that transform, e.g., a Python set operation on a Graph into a SPARQL query, and I wonder whether I can use those methods directly for my custom operations.

Thanks.