No index use when using sub query

93 views
Skip to first unread message

Jamie Blair

unread,
Mar 2, 2015, 11:26:18 AM3/2/15
to orient-...@googlegroups.com

The following query returns a set of `@rid` and completes in about `0.012sec`   

SELECT in FROM SomeEdge WHERE out IN #27:819

Now if I were to select from another Vertex using those `@rid`s in there literal form, this would take a very long time and I get a timeout (above 4seconds). I'm presuming its scanning over all the entries

SELECT FROM SomeVertex WHERE @rid in [#23:83354, #23:366, #23:99933, #23:80708, #23:70291]

Interestingly enough if I were to use a single `@rid` rather than a set it would be fast. So I'm assuming the query optimizer has the scope to go a little further and also optimize multiple results

SELECT FROM SomeVertex WHERE @rid in [#23:83354]

But not to worry because I can make this faster by adding an index to `SomeVertex.@rid` so now this is fast again

CREATE INDEX foo on SomeVertex (@rid) unique
SELECT FROM
SomeVertex WHERE @rid in [#23:83354,#23:366,#23:99933,#23:80708,#23:70291]

But if I compose the 2 queries, I'd expect this to be fast, but it's still slow and causes timeout (above 4 seconds)

SELECT FROM SomeVertex WHERE @rid in (SELECT in FROM SomeEdge WHERE out IN #27:819)

I'm assuming I could write this in another way, but I'm more interested in why this is slow. Is this a bug or if not are there any details on how/why this is slow?

Luigi Dell'Aquila

unread,
Mar 2, 2015, 11:55:51 AM3/2/15
to orient-...@googlegroups.com
Hi Jamie,

this is a known issue, we are working hard on the new query parser and executor and one of the main goals of all this is query optimization.

Thanks

Luigi


--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jamie Blair

unread,
Mar 2, 2015, 12:11:41 PM3/2/15
to orient-...@googlegroups.com
Luigi,

Have you any idea how I would make this fast, or is there a work around for the present query optimizer? Also is there a ticket I can track in github for this?

Thanks,
Jamie

Luigi Dell'Aquila

unread,
Mar 3, 2015, 2:21:34 AM3/3/15
to orient-...@googlegroups.com
Hi Jamie,

yes, the right thing to make it faster is this:

select expand(inV()) FROM SomeEdge WHERE out IN #27:819

Currently there are no work arounds for that at parser level. 
The problem is not tracked as a single issue because it's a wide topic, and it's being addressed as a full development process, but if you want you can create a specific issue for this

Regards

Luigi

Jamie Blair

unread,
Mar 3, 2015, 5:43:00 AM3/3/15
to orient-...@googlegroups.com
Luigi,

Thanks, but there is an extra part to our problem as soon as we want to put conditions on the Vertex we can't. So for our use case we have a lucene index on the Vertex:name, I've found the expand to not be very useful because it kind of backs me into a corner with adding conditions to my query.

Is there a list of the current issues with the query optimizer anywhere? Also if anybody could give me an example of how I would also add a lucene condition to the above query that'd be greatly appreciated  

Thanks,
Jamie

Luca Garulli

unread,
Mar 5, 2015, 2:48:46 PM3/5/15
to orient-database
Hi Jamie,
First, OrientDB supports direct loading by RID at O(1) cost. So this:


SELECT FROM SomeVertex WHERE @rid in [#23:83354, #23:366, #23:99933, #23:80708, #23:70291]

Can be translated to:

SELECT FROM [#23:83354, #23:366, #23:99933, #23:80708, #23:70291]

That is MUCH faster. Don't put indexes on RIDs: RIDs are physical positions and are the reason why OrientDB is so fast on traversing and direct loading by RID.

Then, in this query you're thinking relational:

SELECT FROM SomeVertex WHERE @rid in (SELECT in FROM SomeEdge WHERE out IN #27:819)

Try this (it should take few ms):

SELECT expand(in()) FROM #27:819

If you want to filter by vertex's properties you can do:

SELECT FROM (
  SELECT expand(in()) FROM #27:819
) WHERE age >= 18

Lvc@

Jamie Blair

unread,
Mar 5, 2015, 4:29:10 PM3/5/15
to orient-...@googlegroups.com

Luca,

So I had problems with this approach if I needed to select on a larger collection rather that a single rid. So something along the lines of

   select from (select expand(in()) from User where status = 'active') where foo = "bar" LIMIT 10

I ran into problems I'm assuming I'm not quite think in the correct way. Also I know I could put a limit on in inner query also but that would mean that I could potentially end up with less than the required number of results.

Hope that makes sense

Thanks for the help,
Jamie

You received this message because you are subscribed to a topic in the Google Groups "OrientDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/orient-database/-U6IZNLAtSQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to orient-databa...@googlegroups.com.

Luca Garulli

unread,
Mar 6, 2015, 11:38:32 AM3/6/15
to orient-database
Hi Jamie,
To speed up the query like:

   select from (select expand(in()) from User where status = 'active') where foo = "bar" LIMIT 10

You should have an index on User.status to avoid scanning all User. Can you elaborate more the query that is slow?

Lvc@

Jamie Blair

unread,
Mar 6, 2015, 12:02:41 PM3/6/15
to orient-...@googlegroups.com

Luca,

So I'll try and give you a full test case early next week. But basically what I'm having trouble with is that its fine to put a index on the sub query and that will make that part faster. But if the sub query were to return 3000 rows the outer part of the query (where foo = "bar") will never use the index on 'foo', and becomes very slow +1second.

I'll give you a full test case next week, however given an index on 'foo' and 'status' should that query be fast? Or am I misunderstanding something?

Thanks,
Jamie

Reply all
Reply to author
Forward
0 new messages