I am working on this library
https://github.com/paulhoule/gastrodon
and one of the goals for the library is that it should know more about SPARQL than most users.
Here are two bits of SPARQL intelligence that the library already has:
(1) It uses the SPARQL parser in rdflib to look for a GROUP BY clause in a query. If the grouping variables appear in the SELECT clause, they automatically become the index of the pandas DataFrame built from the SELECT output.
(2) The library also substitutes binding variables into the text of SPARQL queries, so that bindings can be used with queries sent to remote SPARQL endpoints.
I've been going at (2) with a rather stupid approach based on str.replace(), which worked for a while, but then I found an easy way to break it. I can see a hack that will get me through the day, but there is a way to break that too. I think I see a way to make a version that is unbreakable, but it is enough work that I might instead want to write something that can turn SPARQL parse trees back into text, because that, in principle, is the path to complete SPARQL intelligence.
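To make the failure mode concrete: the post does not say which query broke the str.replace() approach, so the case below is an assumption, chosen because it is a common one. A variable whose name is a prefix of another variable's name gets clobbered. The function names are hypothetical and are not gastrodon's API, and the name pattern is simplified to ASCII. The regex version is the sort of stopgap that fixes this case but can still be fooled, for example by a ?s that appears inside a string literal.

```python
import re

def naive_substitute(query, bindings):
    """Replace each ?name with its value using plain str.replace()."""
    for name, value in bindings.items():
        query = query.replace("?" + name, value)
    return query

def boundary_substitute(query, bindings):
    """Replace ?name only when it is not followed by another name character."""
    for name, value in bindings.items():
        pattern = r"\?" + re.escape(name) + r"(?![A-Za-z0-9_])"
        # A function replacement keeps backslashes in the value literal.
        query = re.sub(pattern, lambda match: value, query)
    return query

query = "SELECT ?subject { ?s <http://example.com/p> ?subject }"
bindings = {"s": "<http://example.com/a>"}

broken = naive_substitute(query, bindings)    # ?subject is damaged too
fixed = boundary_substitute(query, bindings)  # only ?s is replaced

# Still breakable: the ?s inside the string literal is rewritten as well.
fooled = boundary_substitute('SELECT * { ?x <http://example.com/q> "is ?s ok" }', bindings)
```

Here broken contains `<http://example.com/a>ubject` where ?subject used to be, while fixed leaves ?subject alone. The fooled case is the argument for working from the parse tree rather than from text.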
So I am thinking about the right way to do it. For a while I was puzzled by the large parse trees created by simple expressions. For instance,
parseQuery("SELECT (5 as ?o) {}")
parses to
[([], {}), SelectQuery_{'projection': [vars_{'expr': ConditionalOrExpression_{'expr': ConditionalAndExpression_{'expr': RelationalExpression_{'expr': AdditiveExpression_{'expr': MultiplicativeExpression_{'expr': rdflib.term.Literal('5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))}}}}}, 'evar': rdflib.term.Variable('o')}], 'where': GroupGraphPatternSub_{}}], {})
But the more I think about it, this is just how parse trees produced by something like pyparsing work: the precedence structure is embedded in the nesting of the nodes. I can imagine this makes query understanding a little harder than it has to be, but it shouldn't be a barrier to unparsing, because if a ConditionalAndExpression has only one leg, its string form is just the string form of that single leg.
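The single-leg rule can be sketched as a small recursive function. This is a minimal sketch and not rdflib code: Node is a hypothetical stand-in for the parser's node type, and the layout of the "other" field (a list of operator/operand pairs) is an assumption made to keep the example short. Nodes with more than one leg are always wrapped in parentheses, so the precedence carried by the nesting survives the round trip.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical parse tree node: a production name plus named fields."""
    name: str
    fields: dict = field(default_factory=dict)

def unparse(node):
    """Turn an expression node back into SPARQL text."""
    if not isinstance(node, Node):
        return str(node)  # leaf: assumed to be in text form already
    text = unparse(node.fields["expr"])
    others = node.fields.get("other", [])  # assumed: list of (op, operand)
    if not others:
        return text  # one leg: the string form is that of the single leg
    for op, operand in others:
        text += f" {op} {unparse(operand)}"
    return f"({text})"  # parenthesize so nesting keeps its meaning

# The chain of single-leg nodes that wraps the literal 5 in the example above
chain = "5"
for production in ["MultiplicativeExpression", "AdditiveExpression",
                   "RelationalExpression", "ConditionalAndExpression",
                   "ConditionalOrExpression"]:
    chain = Node(production, {"expr": chain})

# 1 + 2 * 3, where the multiplication sits one level deeper in the tree
product = Node("MultiplicativeExpression", {"expr": "2", "other": [("*", "3")]})
total = Node("AdditiveExpression", {
    "expr": Node("MultiplicativeExpression", {"expr": "1"}),
    "other": [("+", product)],
})

collapsed = unparse(chain)
nested = unparse(total)
```

The five nested nodes collapse to the text `5`, and the second tree comes back as `(1 + (2 * 3))`. The extra parentheses are redundant here, but they are what keeps a bracketed sub-expression such as (1 + 2) * 3 from changing meaning.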
Another thing I am thinking about is that I'd like to have some way to substitute list variables from Python into the ExpressionList of an IN clause. It seems I could add an XVAR term, written as ?? followed by VARNAME, and then add the appropriate production, but I hate the idea of having to copy that whole parse tree when I only want to make a small change.
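One alternative to touching the grammar is to expand the list marker in the query text before it reaches the parser. The sketch below assumes the ??VARNAME spelling suggested above, which is not valid SPARQL, so the expansion has to run first. The function names are hypothetical, only bool, int and str values are handled, and IRIs and typed literals are left out. It is still a text-level approach, so it shares the weaknesses of any textual substitution.

```python
import re

def to_sparql_literal(value):
    """Format a Python value as a SPARQL literal (bool, int and str only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        return '"' + escaped + '"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")

def expand_lists(query, lists):
    """Replace each ??name marker with a comma-separated list of literals."""
    def replace(match):
        values = lists[match.group(1)]  # KeyError if the list is not supplied
        return ", ".join(to_sparql_literal(v) for v in values)
    return re.sub(r"\?\?([A-Za-z_][A-Za-z0-9_]*)", replace, query)

query = "SELECT ?s { ?s <http://example.com/size> ?n FILTER(?n IN (??sizes)) }"
expanded = expand_lists(query, {"sizes": [1, 2, 3]})
```

With the list [1, 2, 3] the filter becomes `FILTER(?n IN (1, 2, 3))`, which can then go through the ordinary parser or straight to a remote endpoint.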
Any ideas?
I don't know how relevant this is for you, but I am working on the SERVICE implementation for federated queries within rdflib
https://github.com/RDFLib/rdflib/issues/769#issuecomment-344939011
I am currently on a personal branch, investigating options; obtaining the query string from a parsed result is a useful piece of functionality.
I have been looking into the evaluation process. If, in
https://github.com/marqh/rdflib/commit/aef5726ef02347714be5e7f1e0c4342357c9d90f#diff-7839f066926df815492a71be68544a7dR214
class Comp(TokenConverter)
the 'expr' is kept, then it can be used later, e.g. for the SERVICE clause
https://github.com/marqh/rdflib/commit/aef5726ef02347714be5e7f1e0c4342357c9d90f#diff-7839f066926df815492a71be68544a7dR225
making use of pyparsing's originalTextFor and searchString methods
(https://pythonhosted.org/pyparsing/pyparsing-module.html#originalTextFor)
Perhaps there is a useful pattern here for you.
all the best
mark