Reading query plans

Ajay Kamble

Mar 3, 2016, 2:55:38 AM

to Stardog

Is there any document/resource that explains how to understand query plans?

Some questions that I have,

1. What are the types of indexes mentioned in Scan, for example Scan[POSC]?

2. How important is cardinality?

3. How to identify problems from a query plan?

-Ajay

Pavel Klinov

Mar 3, 2016, 3:23:45 AM

to sta...@clarkparsia.com

On Thu, Mar 3, 2016 at 8:55 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:

Is there any document/resource that explains how to understand query plans?

Not at the moment but we're considering adding a section for advanced users.

Some questions that I have,

1. What are the types of indexes mentioned in Scan, for example Scan[POSC]?

POSC means that the index supports iteration over quads in the order of predicate/object/subject/context. Other index abbreviations have the same semantics.
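To make the ordering concrete, here is a minimal Python sketch of what "iteration in predicate/object/subject/context order" buys you. It is an illustration only and says nothing about how Stardog stores its indexes; the quads, `build_index`, and `prefix_scan` are all hypothetical names made up for this example.

```python
from bisect import bisect_left

# Hypothetical quads in (subject, predicate, object, context) form.
QUADS = [
    ("alice", "knows", "bob", "g1"),
    ("bob", "knows", "carol", "g1"),
    ("alice", "name", "Alice", "g2"),
    ("carol", "knows", "bob", "g2"),
]

def build_index(quads, order):
    """Return the quads re-keyed and sorted in the given order, e.g. "POSC"."""
    pos = {"S": 0, "P": 1, "O": 2, "C": 3}
    return sorted(tuple(q[pos[c]] for c in order) for q in quads)

def prefix_scan(index, prefix):
    """Yield every key starting with `prefix`: one binary search, then a linear walk."""
    i = bisect_left(index, prefix)
    while i < len(index) and index[i][:len(prefix)] == prefix:
        yield index[i]
        i += 1

posc = build_index(QUADS, "POSC")
# With predicate and object bound, all matches are contiguous in POSC order.
matches = list(prefix_scan(posc, ("knows", "bob")))
print(matches)
```

The point of the sketch is that a pattern with the predicate and object bound maps to one contiguous range of a POSC-ordered index, and the results come out sorted by subject.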

2. How important is cardinality?

It depends. Estimates for some nodes in the plan tree are more accurate than for others, but this changes constantly and transparently to the user (as we improve the internals). The query optimizer makes decisions based on estimated cardinalities, but it also reasons under uncertainty, knowing that some estimates are inherently inaccurate.

We could have dropped the estimates from plans shown to the user, but plans are often the only piece of information that users can share with us, and the estimates give us some insight.

3. How to identify problems from a query plan?

This is a tricky question. Generally there are two kinds of nodes in the plan: i) those which process streaming data and ii) those which accumulate intermediate results before processing. It is often the case that ii) hurt latency and cause higher memory consumption (or even disk IO if the intermediate results are flushed to disk). The prime examples of ii) are sort operators (which typically pre-process data for merge joins), hash joins, aggregation, etc. Examples of i) are merge joins, filters (other than FILTER EXISTS), projections, etc.
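The difference between the two kinds of nodes can be sketched with Python generators. This is a toy model under my own assumptions, not Stardog's operators: `scan`, `streaming_filter`, and `blocking_sort` are hypothetical names, and the counter only records how many rows the consumer forced the scan to produce.

```python
def scan(n, stats):
    """Toy index scan: yields n rows lazily and counts how many were produced."""
    for i in range(n):
        stats["scanned"] += 1
        yield i

def streaming_filter(rows, predicate):
    """Kind i): passes each row through as soon as it arrives."""
    for row in rows:
        if predicate(row):
            yield row

def blocking_sort(rows, key):
    """Kind ii): must buffer the whole input before emitting the first row."""
    buffered = list(rows)
    buffered.sort(key=key)
    yield from buffered

stats_stream = {"scanned": 0}
first_even = next(streaming_filter(scan(1000, stats_stream), lambda x: x % 2 == 0))

stats_sort = {"scanned": 0}
smallest = next(blocking_sort(scan(1000, stats_sort), key=lambda x: x))

print(stats_stream["scanned"], stats_sort["scanned"])
```

Asking the filter for its first result pulls a single row from the scan, while asking the sort for its first result forces all 1000 rows to be read and held in memory. That is the latency and memory cost described above.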

Now, I've said it's tricky, and everything said above is a very simplified picture. For example, some of our hash joins depart from the classical hash join algorithm described in the database literature and are stateless. They are not currently marked in the plan in any special way. Similar caveats exist for other plan nodes. This makes it difficult to describe in the docs in a way that is both conceptually simple and useful to the user.

But we might still do it when we have time.

Cheers,

Pavel


Ajay Kamble

Mar 3, 2016, 4:42:10 AM

to Stardog

Does context mean a graph (default or named)? Could you tell me what indexes are available? Which indexes are considered fast, and which ones are considered slow? This will help me find the bottleneck.

Pavel Klinov

Mar 3, 2016, 4:52:57 AM

to sta...@clarkparsia.com

On Thu, Mar 3, 2016 at 10:42 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:

Does context mean a graph (default or named)?

Yes.

Could you tell me what indexes are available? Which indexes are considered fast, and which ones are considered slow? This will help me find the bottleneck.

All indexes are equally fast. In 99% of situations the performance of index scans is not the problem. If you have a performance problem, you should look at how the results of scans are processed (e.g. joined, filtered, etc.). Of course, a sub-optimal plan may mean that scans fetch more data from the index than necessary, but again the root cause isn't the scans or the indexes.

Cheers,

Pavel

Ajay Kamble

Mar 8, 2016, 4:42:14 AM

to Stardog

Could you give some clues/examples to find which joins/filters are causing problems?

What kinds of joins are available (I've seen MergeJoin, HashJoin, N-ary Join)? Are there any rules of thumb, like some joins being good and others not?

-Ajay

Pavel Klinov

Mar 8, 2016, 4:53:31 AM

to sta...@clarkparsia.com

Merge joins are generally faster than hash joins and have a lower memory footprint (but watch for sort nodes as arguments to a merge join). The performance of filters or sort operations largely depends on the number of binding sets (aka solutions) that the filter or the sort operation is applied to.
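For readers who have not seen the two algorithms side by side, here is a textbook-style Python sketch. These are the classical versions from the database literature, not Stardog's implementations; `merge_join`, `hash_join`, and the sample rows are hypothetical.

```python
from collections import defaultdict

def merge_join(left, right):
    """Join two lists of (key, value) pairs that are ALREADY sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], right[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out

def hash_join(left, right):
    """Classical hash join: build a table over one input, probe with the other."""
    table = defaultdict(list)
    for key, value in left:  # build phase holds the whole left input in memory
        table[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]

left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
print(sorted(hash_join(left, right)))
```

The merge join only keeps cursors into its inputs, which is where the lower memory footprint comes from, but it relies on both inputs arriving sorted on the join key. When an input is not already in that order, something has to sort it first, and that sort is the accumulating step to watch for.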

You can check how many solutions the filter is applied to by running a sub-query which corresponds to the argument of the filter node in the plan. The same works for sort operations.
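One way to do that check is to wrap the graph pattern in a standard SPARQL 1.1 COUNT query. The small Python helper below just builds the query text; `count_query` and the example pattern are hypothetical, and you would substitute the pattern that corresponds to the filter's argument in your own plan.

```python
def count_query(graph_pattern: str) -> str:
    """Wrap a SPARQL graph pattern in a query that counts its solutions."""
    return "SELECT (COUNT(*) AS ?count) WHERE {\n  " + graph_pattern.strip() + "\n}"

# Hypothetical pattern standing in for the argument of a filter node.
pattern = "?person <http://example.org/knows> ?friend ."
query = count_query(pattern)
print(query)
```

If the pattern uses prefixed names, the matching PREFIX declarations need to be prepended to the generated query before running it.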

Cheers,

Pavel
