Slow query

Håvard Ottestad

Oct 13, 2016, 3:19:31 PM
to Stardog
Hi,

I have a query that gets slower the more data I add to Stardog, even though it doesn't return any more data.

construct{
    <http://example.com/1> ?b ?c.
    ?c ?d ?e.
} where {
    <http://example.com/1> ?b ?c.
    OPTIONAL{
        ?c ?d ?e.
    }
}

When I rewrite the query to specify all the fields in the data it runs much faster.


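construct{
    <http://example.com/1> ?b ?c.
    <http://example.com/1> <http://example.com/friend> ?friend.
    ?friend <http://example.com/name> ?name.
} where {
    <http://example.com/1> ?b ?c.
    OPTIONAL{
        <http://example.com/1> <http://example.com/friend> ?friend.
        OPTIONAL{
            ?friend <http://example.com/name> ?name.
        }
    }
}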


It seems that the original ?c ?d ?e pattern is matched against the whole database, instead of first calculating ?c from <http://example.com/1> ?b ?c and then finding all the predicates of ?c.

I've also tried using the Getter in the Java API: starting at <http://example.com/1>, listing all its statements, then listing all the statements for each object found. This is considerably faster than the first query, but not as fast as the second query.

Regards,
Håvard M. Ottestad


Zachary Whitley

Oct 13, 2016, 3:23:43 PM
to Stardog
I don't really have an answer, but you can take a look at the query plan with "stardog query explain" [1] and check stardog.log to make sure it isn't logging a warning that might explain it.

[1] http://docs.stardog.com/man/query-explain.html

--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+unsubscribe@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en
---
You received this message because you are subscribed to the Google Groups "Stardog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stardog+unsubscribe@clarkparsia.com.

Håvard Ottestad

Oct 13, 2016, 3:59:44 PM
to Stardog

I've enabled reasoning for both queries.


This is the plan for the first query, which matches <http://example.com/1> ?b ?c. and then ?c ?d ?e. It runs in around 500 ms.


Distinct [cardinality=1]
  Projection(?sgrlcogn AS ?subject, ?b AS ?predicate, ?c AS ?object;
             ?c AS ?subject, ?d AS ?predicate, ?e AS ?object) [cardinality=1]
    Bind((<http://example.com/1> AS ?sgrlcogn)) [cardinality=1]
      HashJoinOuter[?c] [cardinality=1]
        Property(<http://example.com/1>, ?b, ?c)
        Property(?c, ?d, ?e)




And this is the plan for the other query, where I specify each predicate. It runs in around 50 ms (10x faster than the one above).


Distinct [cardinality=1]
  Projection(?jbnpbxlx AS ?subject, ?b AS ?predicate, ?c AS ?object;
             ?jbnpbxlx AS ?subject, ?knjyjcst AS ?predicate, ?friend AS ?object;
             ?friend AS ?subject, ?yvoyruhb AS ?predicate, ?name AS ?object) [cardinality=1]
    Bind((<http://example.com/friend> AS ?knjyjcst) (<http://example.com/1> AS ?jbnpbxlx) (<http://example.com/name> AS ?yvoyruhb)) [cardinality=1]
      LoopJoinOuter[_] [cardinality=1]
        Property(<http://example.com/1>, ?b, ?c)
        MergeJoinOuter[?friend] [cardinality=1]
          Scan[SPOC](<http://example.com/1>, <http://example.com/friend>, ?friend) [cardinality=1]
          Scan[PSOC](?friend, <http://example.com/name>, ?name) [cardinality=100K]



Håvard


Håvard Ottestad

Oct 13, 2016, 4:12:49 PM
to sta...@clarkparsia.com
I've doubled the data. The slow query now runs in around 900 ms, while the fast one still runs in under 50 ms, the same as before.

Doubling the amount of data again (to 1,200,000 triples), they run in around 2000 ms and under 80 ms respectively.
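
A quick check of these numbers suggests the slow query's time grows roughly in proportion to the size of the database. The Python sketch below only does that arithmetic. The first database size (300,000 triples) is an assumption derived from 1,200,000 triples after two doublings, and the timings are the approximate figures quoted in this thread.

```python
# Approximate timings quoted in this thread (slow query, reasoning enabled).
# The first size is an assumption: 1,200,000 triples after two doublings.
measurements = [
    (300_000, 500),      # (triples, milliseconds)
    (600_000, 900),
    (1_200_000, 2000),
]


def microseconds_per_triple(triples, millis):
    """Return query time divided by database size, in microseconds per triple."""
    return millis * 1000 / triples


costs = [microseconds_per_triple(n, ms) for n, ms in measurements]

for (n, ms), cost in zip(measurements, costs):
    print(f"{n:>9} triples: {ms:>4} ms, {cost:.2f} us per triple")
```

The cost per triple stays roughly constant across the three sizes, which is consistent with the slow query doing work proportional to the whole database rather than to the size of its result.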

I've attached some sample data. I tried uploading a file with 250,000 triples, but it was too large. My data is just a repetition of the sample data with an incrementing index number.


PS: Yes. I'm getting a warning in the logs: 
WARN  2016-10-13 22:07:37,500 [StardogServer.WorkerGroup-6] com.complexible.stardog.reasoning.blackout.TypeOracle:inferTypes(234): The type of variable 0 is ambiguous; it will be assumed to be INDIVIDUAL.
data.ttl

Evren Sirin

Oct 13, 2016, 4:21:09 PM
to Stardog
Looks like you are using reasoning, and queries with variables in the
predicate position are problematic with reasoning. When you use an
explicit property, reasoning becomes easier. Without reasoning,
both versions of the query are trivial to answer with any amount of
data. Does your use case require running this query with reasoning?

Best,
Evren


Håvard Ottestad

Oct 13, 2016, 4:41:55 PM
to Stardog
Hi, Evren,

I had a look at the query plan for the slow query without reasoning. You are correct: it is trivial and very similar to the fast one.

Reduced [cardinality=1]
  Projection(?qxohfjan AS ?subject, ?b AS ?predicate, ?c AS ?object;
             ?c AS ?subject, ?d AS ?predicate, ?e AS ?object) [cardinality=1]
    Bind((<http://example.com/1> AS ?qxohfjan)) [cardinality=1]
      MergeJoinOuter[?c] [cardinality=1]
        Sort(?c) [cardinality=1]
          Scan[SPOC](<http://example.com/1>, ?b, ?c) [cardinality=1]
        Scan[SPOC](?c, ?d, ?e) [cardinality=600K]



The reason we use such a broad query is to keep our queries simple: we return n levels of data from Stardog with nested OPTIONALs. This way we can return all the relevant data to the frontend, and the frontend developers can pick and choose what they need while they develop the frontend. At some point in the future we might lock the query down to a minimal set of fields, but until then I was hoping to keep the query simple and broad.

For reasoning we are using a couple of Stardog rules, simple RDFS subclassing, and some sub-property chains with inverses.

However, if this approach is not possible, we can write a larger and more specific query without variables in the predicate position.
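
A larger, more specific query of this kind does not have to be written by hand. The Python sketch below generates a CONSTRUCT query that follows one chain of explicit predicates from a starting resource, with one nested OPTIONAL per level and no variables in the predicate position. The function names `iri` and `build_nested_construct` and the variable names `?v1`, `?v2` are hypothetical, chosen for illustration. It covers a single path only, and it drops the broad `<http://example.com/1> ?b ?c` pattern used in the queries above.

```python
def iri(value):
    """Wrap an IRI in angle brackets, rejecting characters that would break the query."""
    if any(ch in value for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def build_nested_construct(root, predicates):
    """Build a CONSTRUCT query following a chain of explicit predicates from root.

    Each level is wrapped in its own nested OPTIONAL, so a missing level
    does not discard the levels above it.
    """
    if not predicates:
        raise ValueError("need at least one predicate")

    triples = []
    subject = iri(root)
    for level, predicate in enumerate(predicates, start=1):
        obj = f"?v{level}"
        triples.append(f"{subject} {iri(predicate)} {obj} .")
        subject = obj  # the next level starts from this level's object

    lines = ["CONSTRUCT {"]
    lines += [f"  {t}" for t in triples]
    lines.append("} WHERE {")
    # Open one OPTIONAL per level, each nested inside the previous one.
    for depth, triple in enumerate(triples, start=1):
        pad = "  " * depth
        lines.append(f"{pad}OPTIONAL {{")
        lines.append(f"{pad}  {triple}")
    # Close the OPTIONAL blocks from the innermost outwards.
    for depth in range(len(triples), 0, -1):
        lines.append("  " * depth + "}")
    lines.append("}")
    return "\n".join(lines)


print(build_nested_construct(
    "http://example.com/1",
    ["http://example.com/friend", "http://example.com/name"],
))
```

Adding a level then means adding one predicate to the list, which keeps the calling code almost as simple as the broad query while giving the reasoner explicit properties to work with.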

Evren Sirin

Oct 13, 2016, 4:55:24 PM
to Stardog
Avoiding variables in the predicate position will give the best performance
with reasoning. Alternatively, you can run such queries without reasoning first
and then get inferences for the retrieved instances in a separate
query.
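
One possible shape for the second step of that two-query approach is sketched below in Python. It assumes the first, broad query has already been run without reasoning and that the client has collected the IRIs it returned. The helper then builds a follow-up CONSTRUCT query restricted to those instances with a VALUES clause, using only explicit predicates. The function names and the variable names `?s`, `?o1`, `?o2` are hypothetical, and how reasoning is switched on for the follow-up query is left to the client and not shown here.

```python
def iri(value):
    """Wrap an IRI in angle brackets, rejecting characters that would break the query."""
    if any(ch in value for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def build_inference_query(instances, predicates):
    """Build a CONSTRUCT query for known instances using explicit predicates only."""
    if not instances or not predicates:
        raise ValueError("need at least one instance and one predicate")

    values = " ".join(iri(i) for i in instances)
    template = []
    branches = []
    for n, predicate in enumerate(predicates, start=1):
        triple = f"?s {iri(predicate)} ?o{n} ."
        template.append(f"  {triple}")
        branches.append(f"{{ {triple} }}")  # one UNION branch per predicate

    lines = ["CONSTRUCT {"]
    lines += template
    lines += ["} WHERE {", f"  VALUES ?s {{ {values} }}"]
    lines.append("  " + "\n  UNION\n  ".join(branches))
    lines.append("}")
    return "\n".join(lines)


# Example: instances as they might come back from the first query.
print(build_inference_query(
    ["http://example.com/2", "http://example.com/3"],
    ["http://example.com/friend", "http://example.com/name"],
))
```

Whether this is faster than a single reasoning query with explicit predicates would need to be measured on the actual data.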

Let me also mention that the OPTIONAL in your reasoning query also
has an impact. If both triple patterns were in the same graph pattern,
without the OPTIONAL, the query would be faster even with reasoning,
but it might still not be fast enough:

construct{
    <http://example.com/1> ?b ?c.
    ?c ?d ?e.
} where {
    <http://example.com/1> ?b ?c.
    ?c ?d ?e.
}

Best,
Evren

Håvard Ottestad

Oct 14, 2016, 2:36:19 AM
to Stardog
Thank you very much, Evren and Zachary, for your quick and useful help. I will rewrite my queries to specify all my predicates and try to reduce the use of OPTIONAL.