How to optimize query with finite VALUES?

Ajay Kamble

unread,

Feb 5, 2016, 10:36:47 AM2/5/16

to sta...@clarkparsia.com

I am trying to optimize following query. It takes about 800 ms. I want it to run much faster (200-300 ms) as exact ids are given (contrary to a search).

Here is the query,

PREFIX co: <http://abc.com/ns/concepts#>
PREFIX d: <http://abc.com/ns/data#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>


SELECT ?p1
       (GROUP_CONCAT(?y; SEPARATOR=',') AS ?ys)
       ?z
       ?c1
       ?e1
WHERE {
  VALUES ?p1 { d:001 d:002 d:003 d:004 d:005 d:006 d:007 d:008 d:009 d:010 d:011 d:012 d:013 d:014 d:015 d:016 d:017 d:018 d:019 d:020 }
  ?p1 co:f1 ?y ;
       co:g1 ?m1 ;
       co:h1 ?n1 ;
       co:i1 ?b1 ;
  OPTIONAL { ?p1 co:j1 ?d1 . ?d1 skos:prefLabel ?e1 .}
  OPTIONAL { ?p1 co:k1 a1 . a1 co:l1 ?z . }
  ?b1 skos:prefLabel ?c1 .
}
GROUP BY ?p1 ?c1 ?e1 ?z
ORDER BY ASC(?m1) DESC(?n1)

Here is the query plan,



prefix co: <http://abc.com/ns/concepts#>
prefix d: <http://abc.com/ns/data#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>


Projection(?p1, ?ys, ?z, ?c1, ?e1) [cardinality=0]
  OrderBy(ASC(?m1), DESC(?n1)) [cardinality=0]
    Group(by=[?p1, ?e1, ?z, ?c1] aggregates=[(GROUP_CONCAT(?y, ",") AS ?ys)]) [cardinality=0]
      HashJoin[?b1] [cardinality=0]
        MergeJoinOuter[?p1] [cardinality=0]
          MergeJoinOuter[?p1] [cardinality=0]
            MergeJoin[?p1] [cardinality=0]
              Sort(?p1) [cardinality=20]
                VALUES (?p1) {
                  (http://abc.com/ns/data#015 )
                  (http://abc.com/ns/data#008 )
                  (http://abc.com/ns/data#004 )
                  (http://abc.com/ns/data#018 )
                  (http://abc.com/ns/data#020 )
                  (http://abc.com/ns/data#006 )
                  (http://abc.com/ns/data#019 )
                  (http://abc.com/ns/data#005 )
                  (http://abc.com/ns/data#009 )
                  (http://abc.com/ns/data#014 )
                  (http://abc.com/ns/data#010 )
                  (http://abc.com/ns/data#007 )
                  (http://abc.com/ns/data#017 )
                  (http://abc.com/ns/data#013 )
                  (http://abc.com/ns/data#001 )
                  (http://abc.com/ns/data#002 )
                  (http://abc.com/ns/data#003 )
                  (http://abc.com/ns/data#011 )
                  (http://abc.com/ns/data#016 )
                  (http://abc.com/ns/data#012 )
                }
              MergeJoin[?p1] [cardinality=731K]
                Scan[PSOC](?p1, co:f1, ?y) [cardinality=806K]
                MergeJoin[?p1] [cardinality=445K]
                  Scan[PSOC](?p1, co:h1, ?n1) [cardinality=494K]
                  MergeJoin[?p1] [cardinality=445K]
                    Scan[PSOC](?p1, co:i1, ?b1) [cardinality=447K]
                    Scan[PSOC](?p1, co:g1, ?m1) [cardinality=494K]
            Sort(?p1) [cardinality=108K]
              MergeJoin[?d1] [cardinality=108K]
                Scan[POSC](?p1, co:j1, ?d1) [cardinality=108K]
                Scan[PSOC](?d1, skos:prefLabel, ?e1) [cardinality=64K]
          Sort(?p1) [cardinality=316K]
            MergeJoin[a1] [cardinality=316K]
              Scan[POSC](?p1, co:k1, a1) [cardinality=316K]
              Scan[PSOC](a1, co:l1, ?z) [cardinality=316K]
        Scan[PSOC](?b1, skos:prefLabel, ?c1) [cardinality=64K]

What can be done to fix this?

Also, is there any documentation that explains how to read query plan/use it for optimization.

Pavel Klinov

unread,

Feb 5, 2016, 11:11:51 AM2/5/16

to sta...@clarkparsia.com

Ajay,

How did you get 800ms, e.g. how many times did you run the query and did the average include the first run or not?

Also, I don't think the optimizer can do a better job with VALUES here. I suspect that the only issue here is the OPTIONALs. Can you run two additional queries to report how many results the following two BGPs return:

select (count(*) as ?c) where { ?p1 co:j1 ?d1 . ?d1 skos:prefLabel ?e1 . }

and

select (count(*) as ?c) where { ?p1 co:j1 ?d1 . ?d1 skos:prefLabel ?e1 . }

Best,

Pavel

--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en

Pavel Klinov

unread,

Feb 5, 2016, 2:05:53 PM2/5/16

to sta...@clarkparsia.com

On Fri, Feb 5, 2016 at 5:11 PM, Pavel Klinov <pa...@clarkparsia.com> wrote:

Ajay,

How did you get 800ms, e.g. how many times did you run the query and did the average include the first run or not?

Also, I don't think the optimizer can do a better job with VALUES here. I suspect that the only issue here is the OPTIONALs. Can you run two additional queries to report how many results the following two BGPs return:

select (count(*) as ?c) where { ?p1 co:j1 ?d1 . ?d1 skos:prefLabel ?e1 . }
and
select (count(*) as ?c) where { ?p1 co:j1 ?d1 . ?d1 skos:prefLabel ?e1 . }

My apologies for the copy'n'paste error, I meant

select (count(*) as ?c) where { ?p1 co:k1 a1 . a1 co:l1 ?z . }

for the second query.

Cheers,

Pavel

Ajay Kamble

unread,

Feb 6, 2016, 10:53:12 AM2/6/16

to sta...@clarkparsia.com

Hi Pavel,

1. 800 ms was based on a number of requests except the first one. I have observed that the first query takes a long time (true of most of the queries that I have tried).

2. Count 1: select (count(*) as ?c) where { ?p1 co:j1 ?d1 . ?d1 skos:prefLabel ?e1 . } is 107082.

3. Count 2: select (count(*) as ?c) where { ?p1 co:k1 a1 . a1 co:l1 ?z . } is 316405.

Ajay Kamble

unread,

Feb 8, 2016, 1:01:43 AM2/8/16

to Stardog

What is a good strategy to handle optional values. What would you suggest?

Is it recommended to keep some default value like null/nil maybe (so that nothing is optional in data)?

-Ajay

Pavel Klinov

unread,

Feb 8, 2016, 6:06:21 AM2/8/16

to sta...@clarkparsia.com

On Mon, Feb 8, 2016 at 7:01 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:

What is a good strategy to handle optional values. What would you suggest?

Is it recommended to keep some default value like null/nil maybe (so that nothing is optional in data)?

There's nothing special about implementation of outer joins in Stardog that would make them considerably slower than inner joins. You just need to keep in mind the basic things, e.g. they tend to return more results than inner joins (i.e. some with unbound values).

One thing you can try is to copy your VALUES into the OPTIONALs. This may help the optimizer effectively filter values of ?p1 before evaluating the outer joins.

Another thing is that ?m1 and ?n1 are not group variables in your query thus they are unbound at the point when ORDER BY is applied. As such, ordering is meaningless. You may see this by looking at the SPARQL algebra of your query using the SPARQL query validator at http://sparql.org/query-validator.html

Cheers,

Pavel

Ajay Kamble

unread,

Feb 22, 2016, 2:01:23 AM2/22/16

to Stardog

Thank you Pavel.

I put VALUES inside both OPTIONALs and the issue is solved. Now I get results within 200 ms.

Also, I found Sparql Algebra (which I was not aware of) good tool for learning. Are there any documents (general/stardog specific) that explain this. I found some but are difficult read (for example W3C specification).

-Regards

Ajay

Pavel Klinov

unread,

Feb 22, 2016, 2:37:47 AM2/22/16

to sta...@clarkparsia.com

On Mon, Feb 22, 2016 at 8:01 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:

Thank you Pavel.

I put VALUES inside both OPTIONALs and the issue is solved. Now I get results within 200 ms.

I'm glad it worked.

Also, I found Sparql Algebra (which I was not aware of) good tool for learning. Are there any documents (general/stardog specific) that explain this. I found some but are difficult read (for example W3C specification).

There's nothing Stardog specific regarding SPARQL Algebra. You're right that W3C docs aren't aimed at helping people learn, they're specifications to describe the correct behavior. You may find papers or presentations on SPARQL semantics more useful for learning, e.g.:

https://ai.wu.ac.at/~polleres/sparqltutorial/ESWC2007_SPARQL_Tutorial_unit2b.pdf

http://www.inf.tu-dresden.de/content/institutes/ki/cl/study/summer14/fswt/slides/FSWT2014-L08-SPARQL-Algebra.pdf

Cheers,

Pavel

--

Ajay Kamble

unread,

Feb 22, 2016, 3:48:14 AM2/22/16

to Stardog

Thank you again Pavel. These documents are helpful!

-Regards

Ajay

On Friday, February 5, 2016 at 9:06:47 PM UTC+5:30, Ajay Kamble wrote:

Reply all

Reply to author

Forward