Performance problem with Stardog


Lucas Wagner

Nov 18, 2014, 10:59:30 AM
to sta...@clarkparsia.com
Hi,

I'm trying to execute a SPARQL query against a Stardog server using the SNARL protocol in Java. The problem is that the query takes about 40 seconds to execute, whereas Virtuoso OpenSource takes only a few milliseconds on the same data.

Here is the query:

SELECT *
FROM <graph>
{
  ?a ?b ?c .
  FILTER (?c IN (wd:SF, wd:N, wd:E))
}

There are approximately 8,000,000 triples in the database.

Does Stardog have difficulty dealing with a large database, or could it be something else?

Thanks.

Lucas WAGNER

Mike Grove

Nov 18, 2014, 11:06:01 AM
to stardog
8M triples is not a huge database; Stardog will handle tens of billions of statements efficiently.

How are you measuring the 40s? A single execution against a cold server/JVM is not a meaningful result. To get a true measure of the evaluation time for that query, first run it for a number of warm-up runs, then execute it many more times and take the average response time.
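
The measurement procedure described here can be sketched as follows. This is a minimal illustration in Python rather than a Stardog client: `fake_query` is a hypothetical stand-in for "execute the query and consume every result", and the warm-up and run counts are arbitrary assumptions, not recommended values.

```python
import statistics
import time


def benchmark(run_query, warmup_runs=5, measured_runs=20):
    """Time run_query() after discarding warm-up runs; values are in seconds."""
    # Warm-up: timings are thrown away so that one-time costs
    # (JIT compilation, cache population) do not distort the result.
    for _ in range(warmup_runs):
        run_query()

    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)

    return {
        "runs": len(timings),
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }


def fake_query():
    # Hypothetical placeholder for executing a query and reading its results.
    return sum(i * i for i in range(10_000))


stats = benchmark(fake_query, warmup_runs=3, measured_runs=10)
print(stats)
```

Reporting the median and the min/max spread next to the mean makes it easier to spot a single slow outlier run.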

Cheers,

Mike
 


Lucas Wagner

Nov 19, 2014, 4:10:32 AM
to sta...@clarkparsia.com
Hi,

To measure these 40s, I first ran the query 5 times in a Java application, then ran it 5 more times in the console using stardog query.

I didn't catch what you mean by warming up the server. I thought the server would be OK once it had started. How many warm-up runs would you recommend?

After further tests, it seems that it's not the query execution that takes so much time but the reading of the results. I ran a test class and printed the durations (in ms).
 
query execution 0 : 728
query execution + result reading : 35308
query execution 1 : 16
query execution + result reading : 34602
query execution 2 : 15
query execution + result reading : 34285
query execution 3 : 33
query execution + result reading : 34496
query execution 4 : 32
query execution + result reading : 34950
query execution 5 : 31
query execution + result reading : 34209
query execution 6 : 16
query execution + result reading : 34344
query execution 7 : 8
query execution + result reading : 34295
query execution 8 : 48
query execution + result reading : 34890
query execution 9 : 15
query execution + result reading : 34797
query execution 10 : 18
query execution + result reading : 34508
query execution 11 : 16
query execution + result reading : 34865
query execution 12 : 32
query execution + result reading : 34574
query execution 13 : 15
query execution + result reading : 34232
query execution 14 : 15
query execution + result reading : 34052
query execution 15 : 16
query execution + result reading : 34146
query execution 16 : 15
query execution + result reading : 34253
query execution 17 : 21
query execution + result reading : 34434
query execution 18 : 16
query execution + result reading : 34348
query execution 19 : 32
query execution + result reading : 34476
total duration query execution : 1138
total duration query execution + result reading : 690064
average duration query execution : 56
average duration query execution + result reading : 34503

The query returns about 1000 results.

Regards, 

Lucas WAGNER

Evren Sirin

Nov 19, 2014, 8:54:43 AM
to Stardog
The query results are computed lazily so when you call Query.execute()
it will return immediately and the real work will be done when you
iterate over the results.
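
The effect of lazy evaluation on timings can be illustrated with a small Python generator. This is only an analogy, not Stardog's implementation: the `execute` function and the simulated 1 ms per-row cost are assumptions made for the sketch.

```python
import time


def execute(rows):
    """Return a lazy result: no row is produced until iteration starts."""
    def results():
        for row in rows:
            time.sleep(0.001)  # simulated per-row evaluation cost
            yield row
    return results()


start = time.perf_counter()
result = execute(range(50))        # returns almost immediately
execute_time = time.perf_counter() - start

start = time.perf_counter()
rows_read = list(result)           # the real work happens during iteration
read_time = time.perf_counter() - start

print(f"execute: {execute_time * 1000:.3f} ms")
print(f"read:    {read_time * 1000:.3f} ms")
```

Timing only the `execute` call therefore says little about the cost of the query; the measurement has to include reading the results to the end.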

One issue with your query is the use of filter. Filters are evaluated
after their variables are bound so they don't make use of the indexes
in the database. Equality filters in queries are rewritten
automatically by Stardog to use indexes but IN expressions are not. If
you rewrite your query to use UNIONs instead it should speed up your
query:

{ ?a ?b wd:SF } UNION { ?a ?b wd:N } UNION { ?a ?b wd:E }
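
Why the rewrite helps can be sketched with a toy in-memory model in Python. It is a deliberately simplified assumption about how a filter-after-binding plan differs from indexed lookups, not a description of Stardog's actual indexes or planner, and the data is made up.

```python
from collections import defaultdict

# Toy data: hypothetical triples, not the poster's dataset.
triples = [(f"s{i}", "p", f"o{i % 1000}") for i in range(100_000)]
triples += [("x1", "p", "SF"), ("x2", "p", "N"), ("x3", "p", "E")]
wanted = {"SF", "N", "E"}

# Index from object value to the triples that carry it.
by_object = defaultdict(list)
for t in triples:
    by_object[t[2]].append(t)


def filter_scan():
    """Like FILTER (?c IN ...): bind every triple, then test it."""
    examined = 0
    matches = []
    for t in triples:
        examined += 1
        if t[2] in wanted:
            matches.append(t)
    return matches, examined


def union_lookup():
    """Like the UNION rewrite: one indexed lookup per constant."""
    matches = []
    for obj in ("SF", "N", "E"):
        matches.extend(by_object.get(obj, []))
    return matches, len(matches)


scan_matches, scan_examined = filter_scan()
union_matches, union_examined = union_lookup()
print(scan_examined, union_examined)
```

Both strategies return the same three triples, but the scan touches every triple in the store while the lookups touch only the matching ones.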

Best,
Evren

Lucas Wagner

Nov 19, 2014, 9:55:16 AM
to sta...@clarkparsia.com
Thanks Evren, I didn't know that Stardog works like that (I'm not a native English speaker, so I sometimes miss information like that when I read English documentation).

I tried your solution and it's nearly instant now.

Thanks.

Lucas WAGNER

Lucas Wagner

Nov 20, 2014, 4:42:28 AM
to sta...@clarkparsia.com
Hello,

Sorry to bother you again, but I have another performance problem with Stardog.

I'm trying to run a count query that contains FILTER NOT EXISTS and MINUS. It takes a long time and I can't find a way to avoid the problem.

Here is the complete query:

SELECT (COUNT(?id) AS ?nb)
WHERE
{
  ?id rdf:type wd:P .
  FILTER NOT EXISTS { ?id wd:IST ?IST . ?id rdf:type wd:P . } .
  FILTER NOT EXISTS { ?id wd:D ?D . ?id rdf:type wd:P . } .
  MINUS
  {
    SELECT DISTINCT ?id {
      GRAPH ?a {
        ?id rdf:type wd:P .
        ?b rdf:type wd:AR .
        ?b wdAS ?id .
      }
    }
  }
}


I tried running the query with only one FILTER NOT EXISTS:

SELECT (COUNT(?id) AS ?nb)
WHERE
{
  ?id rdf:type wd:P .
  FILTER NOT EXISTS { ?id wd:IST ?IST . ?id rdf:type wd:P . } .
}


and the problem is the same.

I also tried substituting a plain FILTER for the FILTER NOT EXISTS:

SELECT (COUNT(?id) AS ?nb)
WHERE
{
  ?id rdf:type wd:P .
  ?id ?rel ?o .
  FILTER (?rel != wd:IST) .
}


but the problem remains unsolved. 

Does anyone have an idea for replacing FILTER NOT EXISTS in a way that would speed up this query?

Thanks.

Lucas WAGNER

Lucas Wagner

Nov 24, 2014, 2:44:47 AM
to sta...@clarkparsia.com
Hi,

Any ideas on how to solve this problem? I'm really stuck on it.

Thanks in advance,

Lucas WAGNER

Evren Sirin

Nov 24, 2014, 9:46:19 AM
to Stardog
On Thu, Nov 20, 2014 at 4:42 AM, Lucas Wagner <lucas....@webdrone.fr> wrote:
> Hello,
>
> Sorry to bother you again with that but i have another performance problem
> with stardog.
>
> I'm trying to run a count query which contains FILTER NOT EXISTS and MINUS.
> It takes long time and i can't find how to avoid the problem.
>
> here is the complete request :
>
> SELECT (COUNT(?id) AS ?nb)
> WHERE {
> ?id rdf:type wd:P .
> FILTER NOT EXISTS {?id wd:IST ?IST . ?id rdf:type wd:P . } .
> FILTER NOT EXISTS {?id wd:D ?D . ?id rdf:type wd:P . } .
> MINUS {
> SELECT DISTINCT ?id {
> GRAPH ?a {
> ?id rdf:type wd:P .
> ?b rdf:type wd:AR .
> ?b wdAS ?id .
> }
> }
> }
> }
>
>
> I tried to run the query only with one FILTER NOT EXISTS :
>
> SELECT (COUNT(?id) AS ?nb)
> WHERE {
> ?id rdf:type wd:P .
> FILTER NOT EXISTS {?id wd:IST ?IST . ?id rdf:type wd:P . } .
> }

You don't need the rdf:type triple in the FILTER. That triple will
obviously be satisfied since it also exists in the query body. So try
the simpler query:

SELECT (COUNT(?id) AS ?nb)
WHERE {
?id rdf:type wd:P .
FILTER NOT EXISTS {?id wd:IST ?IST } .
}

>
>
> and it's the same problem.
>
> I also tried to substitute a FILTER to the FILTER NOT EXISTS :
>
> SELECT (COUNT(?id) AS ?nb )
> WHERE {
> ?id rdf:type wd:P .
> ?id ?rel ?o .
> FILTER (?rel != wd:IST) .
> }

This query is not equivalent to the previous query. The filter in this
query is useless since ?rel can bind to rdf:type and any resource that
satisfies the type triple will be returned.
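
The difference between the two queries can be checked on a tiny hand-made graph. The Python sketch uses hypothetical data (resource "a" has a wd:IST value, resource "b" does not) and plain set operations in place of a SPARQL engine.

```python
# Hypothetical toy graph: "a" has wd:IST, "b" does not.
triples = [
    ("a", "rdf:type", "wd:P"),
    ("a", "wd:IST", "v1"),
    ("b", "rdf:type", "wd:P"),
]

typed = {s for s, p, o in triples if p == "rdf:type" and o == "wd:P"}

# FILTER NOT EXISTS { ?id wd:IST ?IST }
has_ist = {s for s, p, o in triples if p == "wd:IST"}
not_exists = typed - has_ist

# ?id ?rel ?o . FILTER (?rel != wd:IST)
# Keeps every (id, rel, o) row whose predicate is not wd:IST,
# so the rdf:type row of "a" survives the filter.
not_equal_rows = [
    (s, p, o) for s, p, o in triples if s in typed and p != "wd:IST"
]

print(sorted(not_exists))
print(not_equal_rows)
```

Only "b" passes the NOT EXISTS version, while the inequality filter still keeps a row for "a" through its rdf:type triple, and its COUNT grows with the number of other triples each resource has.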

Best,
Evren

Lucas Wagner

Nov 24, 2014, 10:25:05 AM
to sta...@clarkparsia.com
Hi Evren,

Thanks for your answer. As you said, the rdf:type in the FILTER wasn't needed. I removed it and the query still works.
However, I still have the same problem. This query takes about 40 seconds to execute, and I can't leave it like that since this type of query runs twice in a row to render a single screen.

It seems to be the combination of COUNT and FILTER NOT EXISTS that is problematic. That's why I asked for a different way to do it, but if anyone has a solution that allows this combination, that would be perfect.
For information, I also recently encountered this kind of problem with a DISTINCT clause and FILTER NOT EXISTS.

Thanks.

Lucas WAGNER

Evren Sirin

Nov 24, 2014, 11:02:46 AM
to Stardog
On Mon, Nov 24, 2014 at 10:25 AM, Lucas Wagner <lucas....@webdrone.fr> wrote:
> Hi Evren,
>
> Thanks for your answer. As you said, the rdf:type in the FILTER was'nt
> needed. I removed it and the query still works.
> However, I still have the same problem. This request takes about 40 seconds
> to be executed and I can't simply leave it like that since there is this
> type of query twice in a row to print a screen.

What is the count this query returns? What is the count without the filter?

>
> It seems to be the association of COUNT and FILTER NOT EXISTS which is
> problematic. That's why I asked for a different way to do that but if anyone
> has a solution which allow this association, it would be perfect.
> For information, I also encountered this kind of problem with a DISTINCT
> clause and FILTER NOT EXISTS recently.

In some cases, using the FILTER/OPTIONAL/!BOUND template might be faster.

SELECT (COUNT(?id) AS ?nb)
WHERE {
?id rdf:type wd:P .
OPTIONAL {?id wd:IST ?IST } .
FILTER (!bound(?IST))
}
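
For this simple pattern the two formulations should give the same count, which can be checked on a small hand-made graph. The Python sketch is an assumption-laden model of the semantics (a left join followed by a test for an unbound variable) on hypothetical data; it says nothing about how either form is executed or how fast it is.

```python
# Hypothetical toy graph: only "b" is a wd:P without any wd:IST value.
triples = [
    ("a", "rdf:type", "wd:P"),
    ("a", "wd:IST", "v1"),
    ("b", "rdf:type", "wd:P"),
    ("c", "rdf:type", "wd:P"),
    ("c", "wd:IST", "v2"),
    ("c", "wd:IST", "v3"),
    ("d", "rdf:type", "wd:Other"),
]

ids = [s for s, p, o in triples if p == "rdf:type" and o == "wd:P"]


def count_not_exists():
    """FILTER NOT EXISTS: keep ?id when no wd:IST triple exists for it."""
    return sum(
        1 for i in ids
        if not any(s == i and p == "wd:IST" for s, p, o in triples)
    )


def count_optional_not_bound():
    """OPTIONAL + FILTER(!bound): left join, keep rows with ?IST unbound."""
    rows = []
    for i in ids:
        values = [o for s, p, o in triples if s == i and p == "wd:IST"]
        if values:
            rows.extend((i, v) for v in values)  # ?IST is bound
        else:
            rows.append((i, None))               # ?IST stays unbound
    return sum(1 for _, ist in rows if ist is None)


print(count_not_exists(), count_optional_not_bound())
```

Comparing the two counts on real data is a cheap sanity check before replacing one form with the other.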

Best,
Evren

Lucas Wagner

Nov 24, 2014, 11:22:36 AM
to sta...@clarkparsia.com
Thanks Evren, 
I tried this template and the query execution time dropped to 3 seconds, which is perfect.

Do you think the slowness could come from the number of rows that are filtered out? The query returns a count of 224,000 with the filter, and without it the count increases to 4,000,000.

Regards,

Lucas WAGNER