Finding rdf:type is slow


Håvard Ottestad

Oct 15, 2016, 5:35:49 AM
to Stardog
Hi,

I'm trying to get the types of child nodes by doing an inverse traversal of their parent relation. The query is quite slow: around 600 ms to get the types of 29 children. I can get labels and many other properties for the same children in < 200 ms.

Is this normal, or is there some way to optimize it? I'm using SL reasoning.

prefix arkiv: <http://www.arkivverket.no/standarder/noark5/arkivstruktur/>

CONSTRUCT {
?mappe ^arkiv:parent ?jp3.
?jp3 a ?jp3Type.
}
WHERE{
?mappe ^arkiv:parent ?jp3.
?jp3 a ?jp3Type.
}

with the query plan:

From all
Distinct [cardinality=58]
  Projection(?jp3 AS ?subject, ?nyyalomg AS ?predicate, ?mappe AS ?object; ?jp3 AS ?subject, ?qdgrmryi AS ?predicate, ?jp3Type AS ?object) [cardinality=58]
    Bind((<http://www.arkivverket.no/standarder/noark5/arkivstruktur/parent> AS ?nyyalomg) (<http://www.arkivverket.no/standarder/noark5/arkivstruktur/Saksmappe--test--827--2015> AS ?mappe) (rdf:type AS ?qdgrmryi)) [cardinality=58]
      Type(?jp3, ?jp3Type)
        Scan[POSC](?jp3, arkiv:parent, arkiv:Saksmappe--test--827--2015) [cardinality=29]


Regards,
Håvard M. Ottestad

Pavel Klinov

Oct 15, 2016, 9:38:59 AM
to sta...@clarkparsia.com
Hi Håvard,

Currently this is the expected behavior. Patterns that have a variable in the predicate position, or in the object position of rdf:type, are not supported by OWL 2 Direct Semantics in the SPARQL Entailment Regimes spec, which is what Stardog implements (see Table 7.2, line Legal Queries in [1], esp. the part regarding a mapping from BGPs to OWL). Very informally, such patterns do not map to first-order query atoms, which is known to cause computational problems, in particular for query rewriting.

Stardog goes beyond the spec in this regard and integrates such patterns into the query rewriting approach. But it has a cost of running aux queries inside the Type operator, which is what you're observing here. 

We're considering various possibilities to improve the situation. Meanwhile, the best you can do (assuming you have to query for types) is to make the other patterns in the same BGP, e.g. ?mappe ^arkiv:parent ?jp3, as selective as possible, so that the Type operator has fewer individuals to process.
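
As a rough illustration of keeping the pattern that binds ?jp3 selective, a client could pin the parent to a single IRI before asking for types. This is only a sketch: the helper name build_child_types_query and the template are made up for this example, and the query plan above suggests ?mappe was already fixed to one Saksmappe in the original run, so it shows the shape of the idea rather than a guaranteed speed-up.

```python
import re

# Characters that SPARQL does not allow inside an <IRI> token.
_FORBIDDEN_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')

# Hypothetical template: the parent is pinned with VALUES so that the scan
# feeding the type lookup only returns children of that one parent.
# The CONSTRUCT template uses plain triples instead of the inverse path.
QUERY_TEMPLATE = """\
PREFIX arkiv: <http://www.arkivverket.no/standarder/noark5/arkivstruktur/>

CONSTRUCT {
  ?jp3 arkiv:parent ?mappe .
  ?jp3 a ?jp3Type .
}
WHERE {
  VALUES ?mappe { <PARENT_IRI> }
  ?jp3 arkiv:parent ?mappe .
  ?jp3 a ?jp3Type .
}
"""


def build_child_types_query(parent_iri: str) -> str:
    """Return a CONSTRUCT query for the types of one parent's children."""
    # Reject anything that could break out of the <...> IRI token.
    if _FORBIDDEN_IN_IRI.search(parent_iri):
        raise ValueError(f"not a usable IRI: {parent_iri!r}")
    return QUERY_TEMPLATE.replace("PARENT_IRI", parent_iri)


if __name__ == "__main__":
    print(build_child_types_query(
        "http://www.arkivverket.no/standarder/noark5/arkivstruktur/"
        "Saksmappe--test--827--2015"
    ))
```

The resulting string can be sent through whatever SPARQL client is already in use; nothing here depends on a Stardog-specific API.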

HTH,
Pavel


Håvard Ottestad

Oct 15, 2016, 5:24:33 PM
to Stardog
Hi Pavel,

Thank you for the very informative reply.

I've found that turning on automatic consistency checking, together with making most of the classes in my ontology disjoint, has improved performance: it halved query time in most cases. Also, pulling the type pattern out into separate UNION blocks seems to have helped a bit.

I'll have to consider another approach, though: maybe running two separate queries, one for the data and another for the type info (without reasoning), and then forward chaining our data model, which is implemented in SHACL, instead. Currently we use SHACL on the frontend to constrain our data model, so that predicates with a max cardinality of 1 become objects or literals, while those without a max cardinality of 1 become arrays. The type info is then used to look up the applicable constraints.
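
The frontend shaping rule described above could look roughly like the following sketch. It assumes the set of max-1 predicates has already been extracted from the SHACL shapes (for example from sh:maxCount 1); the function name shape_resources and the plain-tuple triple format are made up for this illustration and are not part of any library.

```python
from collections import defaultdict


def shape_resources(triples, max_one_predicates):
    """Group (subject, predicate, object) triples into one dict per subject.

    Predicates listed in max_one_predicates map to a single value;
    all other predicates map to a list of values.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)

    shaped = {}
    for subject, properties in grouped.items():
        node = {}
        for predicate, values in properties.items():
            if predicate in max_one_predicates:
                # The data contradicts the constraint; surface it instead of
                # silently dropping values.
                if len(values) > 1:
                    raise ValueError(
                        f"{subject} has {len(values)} values for {predicate}"
                    )
                node[predicate] = values[0]
            else:
                node[predicate] = values
        shaped[subject] = node
    return shaped


if __name__ == "__main__":
    example = [
        ("ex:doc1", "ex:title", "Report"),
        ("ex:doc1", "ex:keyword", "archive"),
        ("ex:doc1", "ex:keyword", "test"),
    ]
    print(shape_resources(example, {"ex:title"}))
```

In this sketch a violated max-1 constraint raises an error; a real frontend might prefer to log it and fall back to an array.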

Is there a way to write a single query that extracts the type info without reasoning while the rest of the query uses reasoning? Would sp:directType help?

Regards,
Håvard



Pavel Klinov

Oct 17, 2016, 7:38:58 AM
to sta...@clarkparsia.com
No, I don't think there's a way to get type info without reasoning. What you're doing here is called instance realization (in OWL/DL parlance -- computing all atomic classes for an individual or set of individuals), which is a core reasoning task.

For most queries, sp:directType can't help performance. It obviously reduces the number of results, which could be important in some cases, but here it just means the reasoner has to do extra work to filter out indirect types.

For your query, performance depends on two factors:

1) The number of individuals for which the reasoner needs to figure out types. That's why the pattern which binds the individual variable should be selective.
2) The size and complexity of your schema (ontology). We have some information on that in the docs [1].

Cheers,
Pavel

