I'm having trouble with the Talis SPARQL endpoint, even for some
pretty simple queries, on a largish store (roughly a megatriple?).
I'm asking for
which corresponds to the simple query
SELECT DISTINCT ?p
WHERE { ?s ?p ?o
}
LIMIT 10
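The SPARQL Protocol lets a client send a query like this as the URL-encoded `query` parameter of an HTTP GET request. A minimal Python sketch of building such a request URL, assuming a placeholder endpoint (`http://example.org/sparql` is hypothetical, not the Talis store's address, and `sparql_get_url` is an illustrative helper, not part of any Talis API):

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint, for illustration only.
ENDPOINT = "http://example.org/sparql"

QUERY = """SELECT DISTINCT ?p
WHERE { ?s ?p ?o }
LIMIT 10"""

def sparql_get_url(endpoint: str, query: str) -> str:
    """Put the query text, form-encoded, into the `query` parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the parameter again should give back exactly the original query text.
assert parse_qs(urlparse(url).query)["query"] == [QUERY]
```

Spaces become `+` and characters such as `?`, `{` and newlines are percent-encoded, so the query survives the trip through the URL unchanged.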
I get the error message "class
com.hp.hpl.jena.graph.query.BufferPipe$Finished", which presumably
anyone else can reproduce! Is this just that the SPARQL engine doesn't
know how to run this query efficiently and is running into a wall? Or
is something else going on?
Any advice appreciated!
thanks,
Scott Morrison
Adding "ORDER BY ?p" to the end of that query instead returns:
Query exceeded the maximum number of solutions allowed for sorting. To
preserve a minimum level of service for all platform users we
currently limit the number of solutions that may be sorted to 50000.
You may be able to reformulate your query to return less solutions or
remove the ordering.
It seems that this indicates the SPARQL engine is doing the
inefficient thing: ORDERing first, then looking for DISTINCT
values. Surely for this sort of query the engine should just be
looking up all the key values in some hashtable indexed by
predicate? This sort of query seems fairly fundamental for
discovering whether a SPARQL endpoint is actually interesting for a
given use. Perhaps I'm misunderstanding some obstacle to making the
SPARQL engine handle such queries efficiently, though, because I
know almost nothing about how such engines work!
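The hashtable idea can be sketched in a few lines. The Python below is a toy model only, not how Jena or the Talis Platform actually evaluates queries: `toy_store`, the two strategy functions and the synthetic data are all hypothetical, and the 50,000 cap simply mirrors the error message above. The hash-based version streams the triples once, remembers only the predicates seen so far, and stops as soon as the LIMIT is satisfied; the sort-based version has to collect every solution before it can deduplicate, so on a large store it hits the cap.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

def toy_store(n: int, n_predicates: int) -> Iterator[Triple]:
    """Hypothetical synthetic store: n triples cycling through n_predicates predicates."""
    for i in range(n):
        yield (f"ex:s{i}", f"ex:p{i % n_predicates}", f"ex:o{i}")

def distinct_predicates_hashed(triples: Iterable[Triple], limit: int) -> List[str]:
    """Hash-based DISTINCT + LIMIT: one streaming pass, a set of predicates seen so far,
    and an early exit once `limit` distinct values have been produced."""
    seen = set()
    out: List[str] = []
    for _s, p, _o in triples:
        if p not in seen:
            seen.add(p)
            out.append(p)
            if len(out) == limit:
                break  # LIMIT satisfied: stop reading the store
    return out

def distinct_predicates_sorted(triples: Iterable[Triple], limit: int,
                               max_sort: int = 50_000) -> List[str]:
    """Sort-based DISTINCT + LIMIT: every solution is collected before sorting,
    so a cap on sortable solutions is hit long before LIMIT can help."""
    rows: List[str] = []
    for _s, p, _o in triples:
        rows.append(p)
        if len(rows) > max_sort:
            raise RuntimeError(f"more than {max_sort} solutions to sort")
    rows.sort()
    out: List[str] = []
    for p in rows:
        if not out or out[-1] != p:  # duplicates are adjacent after sorting
            out.append(p)
    return out[:limit]

if __name__ == "__main__":
    print(distinct_predicates_hashed(toy_store(1_000_000, 20), 10))
    try:
        distinct_predicates_sorted(toy_store(1_000_000, 20), 10)
    except RuntimeError as err:
        print("sort-based strategy failed:", err)
```

The early exit only helps when the store has at least LIMIT distinct predicates; otherwise the hash-based pass still reads every triple, but its memory grows with the number of distinct predicates rather than the number of triples. A store that indexes triples by predicate could, in principle, answer the query from that index without scanning the data at all.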
thanks,
scott
Yes, you're absolutely right: the SPARQL engine is doing the
inefficient thing here. The SPARQL endpoints in the Platform are
currently based on Jena (as you can see from the ugly error responses
you got). We've made some tweaks to the core engine to halt queries
which consume excessive system resources, and this is one area which is
undergoing particularly rapid evolution as we identify and iron out
these issues.
I've raised this as a critical issue and we've begun investigating;
you can track progress at http://jira.talis.com/browse/API-153
Providing a reliable SPARQL endpoint over potentially thousands of
graphs, each containing potentially millions of triples, is one of our
biggest challenges, and the more usage and feedback we get, the better
we can make the service. So please do keep experimenting and helping us
improve the Platform.
Cheers,
Sam
Just a quick update to let you know we're still investigating. We've
created a duplicate of your store and pulled it back into our scaling
environment so we can isolate what's causing the problem.
We'll keep you posted as we go.
Thanks,
Sam
Hi Scott,
The good news is that we've fixed the problem with your store, and it's available for use again. The bad news is that the SPARQL engine is still doing the inefficient thing, so the query you're trying to run (select distinct predicates) exceeds the maximum allowed execution time when run against your store. This currently results in a 502 proxy error, but the next release will return a more informative response. We're also continually working to improve performance, so you can expect the SPARQL service to become more efficient as the Platform matures.
Sorry for the disruption, and thanks again for the feedback; please keep it coming.
Cheers,
Sam