How to optimize a SPARQL query over a large number of triples

ajay.ri...@gmail.com

Jan 13, 2016, 6:33:19 AM
to Stardog
Hi,

We are evaluating Stardog and we are stuck at a query because of its performance.

Here is the query,

PREFIX concept: <tag:stardog:api:obf:fd612ca55e21f2836462e13e3d8811034b7da1fc4b238d45e4ab44afcac00752>
SELECT ?x0 (group_concat(?x1; separator="d03502c43d74a30b936740a9517dc4ea2b2ad7168caa0a774cefe793ce0b33e7") AS ?x2)
WHERE {
   {
      {
         ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <tag:stardog:api:obf:2f98728525480b33ebfcbc9cc58442ce11e7c261ac35451e7338a0e7f89c45b2> ;
             <tag:stardog:api:obf:292117b50461b8d23095c6c56656d5c0af4647221eb78c67e7f9afe6b8d59784> ?x1 .
         FILTER (regex(?x1, "24d7f03d8dc3c3666969e6fa5bb1fac4736d3f1353c28307ed51b320f9dc42d3", "de7d1b721a1e0632b7cf04edf5032c8ecffa9f9a08492152b926f1a5a7e765d7"))
      }
   }
   UNION
   {
      {
         ?x3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <tag:stardog:api:obf:b67172c1bfc12e719df784886983eda5888a03f0c144f18fd7844e10233d2216> ;
             <tag:stardog:api:obf:292117b50461b8d23095c6c56656d5c0af4647221eb78c67e7f9afe6b8d59784> ?x4 .
         ?x0 <tag:stardog:api:obf:b67172c1bfc12e719df784886983eda5888a03f0c144f18fd7844e10233d2216> ?x3 ;
             <tag:stardog:api:obf:292117b50461b8d23095c6c56656d5c0af4647221eb78c67e7f9afe6b8d59784> ?x1 .
         FILTER (regex(?x4, "24d7f03d8dc3c3666969e6fa5bb1fac4736d3f1353c28307ed51b320f9dc42d3", "de7d1b721a1e0632b7cf04edf5032c8ecffa9f9a08492152b926f1a5a7e765d7"))
      }
   }
}
GROUP BY (?x0)
OFFSET 0
LIMIT 20

We think the problem is that the pattern
  ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <tag:stardog:api:obf:2f98728525480b33ebfcbc9cc58442ce11e7c261ac35451e7338a0e7f89c45b2>
matches 493932 entities, and filtering that many entities by a property value is slow.

The query above currently takes close to 2 seconds. Is there any way to optimize it?

Michael Grove

Jan 13, 2016, 7:45:27 AM
to stardog
Could you send us the query plan?

Also, are those case insensitive regexes?  Have you considered using full-text search [1] instead?

Cheers,

Mike

Ajay Kamble

Jan 13, 2016, 9:46:20 AM
to Stardog
Hi Mike,

Here is the query plan,

Slice(offset=0, limit=20) [cardinality=20]
  Projection(?obj1, ?titles) [cardinality=1.0M]
    Group(by=[?obj1] aggregates=[(GROUP_CONCAT(DISTINCT ?title, ",") AS ?titles)]) [cardinality=1.0M]
      Union [cardinality=1.0M]
        MergeJoin[?obj1] [cardinality=804K]
          Filter(Regex(?title, "foo", "i")) [cardinality=571K]
            Scan[PSOC](?obj1, concept:title, ?title) [cardinality=1.1M]
          Scan[POSC](?obj1, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, concept:object1) [cardinality=494K]
        MergeJoin[?obj1] [cardinality=205K]
          Sort(?obj1) [cardinality=131K]
            MergeJoin[?com] [cardinality=131K]
              MergeJoin[?com] [cardinality=290K]
                Filter(Regex(?comName, "foo", "i")) [cardinality=571K]
                  Scan[PSOC](?com, concept:title, ?comName) [cardinality=1.1M]
                Scan[POSC](?com, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, concept:object2) [cardinality=191K]
              Scan[POSC](?obj1, concept:object2, ?com) [cardinality=1.3M]
          Scan[PSOC](?obj1, concept:title, ?title) [cardinality=1.1M]

I am not sure about full-text search. Shouldn't a query that checks only a single property run faster than a full-text search that might check multiple properties? I am new to Stardog, though, so I would be glad to try any suggestions you have.

Michael Grove

Jan 13, 2016, 9:52:18 AM
to stardog
This is a reasonable query plan.  Most likely the performance suffers from the two case-insensitive regexes that you're applying to scans that produce over 1M entries.

With the full-text search index, you would replace the regex with a pattern that scans the index, something like `?title stardog:textMatch "foo"`.  You can use Lucene search syntax as the value in that pattern.  That will likely be faster than running the regex over potentially millions of literals.
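
As a rough sketch of that suggestion, the whole query might look like the following once both regex filters are swapped for full-text patterns. The names `concept:title`, `concept:object1` and `concept:object2` are taken from the query plan above. The `concept:` namespace is a placeholder (the real one is obfuscated in this thread), and the IRI behind the `stardog:` prefix is an assumption that should be checked against the documentation for your Stardog version. It also assumes the full-text index is enabled for the database.

```sparql
# Placeholder namespaces: the real concept: IRI is obfuscated in this thread,
# and the stardog: IRI is an assumption to verify against the Stardog docs.
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX concept: <http://example.org/concept/>
PREFIX stardog: <tag:stardog:api:property:>

SELECT ?obj1 (GROUP_CONCAT(DISTINCT ?title; separator=",") AS ?titles)
WHERE {
  {
    # Branch 1: object1 instances whose own title matches the search term.
    ?obj1 rdf:type concept:object1 ;
          concept:title ?title .
    ?title stardog:textMatch "foo" .
  }
  UNION
  {
    # Branch 2: resources linked to an object2 whose title matches.
    ?com rdf:type concept:object2 ;
         concept:title ?comName .
    ?comName stardog:textMatch "foo" .
    ?obj1 concept:object2 ?com ;
          concept:title ?title .
  }
}
GROUP BY ?obj1
OFFSET 0
LIMIT 20
```

Keep in mind that a full-text match is token-based, so it is not an exact substitute for the regex: `regex(?title, "foo", "i")` also matches a title containing `foobar`, whereas a search for `foo` would typically need a wildcard such as `foo*` to do the same.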

Cheers,

Mike

Ajay Kamble

Jan 14, 2016, 1:54:32 AM
to Stardog
Hello Mike,

I tried to write a full-text query but could not get it working. I tried the following query, but it just hangs and has to be killed:

PREFIX concept: <tag:stardog:api:obf:fd612ca55e21f2836462e13e3d8811034b7da1fc4b238d45e4ab44afcac00752>
SELECT ?x0 (group_concat(?x1; separator="d03502c43d74a30b936740a9517dc4ea2b2ad7168caa0a774cefe793ce0b33e7") AS ?x2)
WHERE {
   {
      {
         ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <tag:stardog:api:obf:2f98728525480b33ebfcbc9cc58442ce11e7c261ac35451e7338a0e7f89c45b2> ;
             <tag:stardog:api:obf:292117b50461b8d23095c6c56656d5c0af4647221eb78c67e7f9afe6b8d59784> ?x1 .
         (?x1 ?x3) <tag:stardog:api:obf:edf800b5cef9b0be79b0419675297216c38c676545ecc9c1a0ab7f710c4c3fe3> ("24d7f03d8dc3c3666969e6fa5bb1fac4736d3f1353c28307ed51b320f9dc42d3" 0.5) .
      }
   }
   UNION
   {
      {
         ?x4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <tag:stardog:api:obf:b67172c1bfc12e719df784886983eda5888a03f0c144f18fd7844e10233d2216> ;
             <tag:stardog:api:obf:292117b50461b8d23095c6c56656d5c0af4647221eb78c67e7f9afe6b8d59784> ?x5 .
         ?x0 <tag:stardog:api:obf:b67172c1bfc12e719df784886983eda5888a03f0c144f18fd7844e10233d2216> ?x4 ;
             <tag:stardog:api:obf:292117b50461b8d23095c6c56656d5c0af4647221eb78c67e7f9afe6b8d59784> ?x1 .
         (?x5 ?x3) <tag:stardog:api:obf:edf800b5cef9b0be79b0419675297216c38c676545ecc9c1a0ab7f710c4c3fe3> ("24d7f03d8dc3c3666969e6fa5bb1fac4736d3f1353c28307ed51b320f9dc42d3" 0.5) .
      }
   }
}
GROUP BY (?x0)
OFFSET 0
LIMIT 20

Here is the triple pattern that specifies the full-text search:

(?title ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('foo' 0.5) .

Michael Grove

Jan 14, 2016, 8:09:55 AM
to stardog
What is the query plan?