ORDER BY even on indexed property is very slow

1,009 views
Skip to first unread message

Hendy Irawan

unread,
Jan 1, 2012, 1:01:17 PM1/1/12
to Neo4j
Dear Neo4j,

This query:

    @Query("START u=node({userId}) MATCH u-[:LIKE]->y RETURN y")
    public Page<Interest> findUserLikes(@Param("userId") long userId, Pageable pageable);

On about 650 relationships, paged by 20 at a time, takes 70-80ms on a Core i7-2630QM.

However, adding an ORDER BY either on the query or the Pageable parameter :

    @Query("START u=node({userId}) MATCH u-[:LIKE]->y RETURN y ORDER BY y.name")
    public Page<Interest> findUserLikes(@Param("userId") long userId, Pageable pageable);

Makes it execute around 1000-1200ms. That's about 15x slower!

Note that the "name" property is @Indexed.

Is this by design ? Or am I doing something wrong ?

Environment: Neo4j 1.6.M2 accessed via REST, OpenJDK 7 on Ubuntu 11.10 64-bit

Hendy

Michael Hunger

unread,
Jan 2, 2012, 4:49:21 PM1/2/12
to ne...@googlegroups.com
Hendy,

would it be possible to share your dataset/sample code for us to investigate?

Is this the first run? Or a later one?
What about your dataset? Is it 650 rels in total or just the returned subset?

1) if you just page it takes the first 20 entries.
2) if you sort it has to fetch _all_ the data, order it and then fetch the first 20 entries which is much more effort.
3) Indexing, does make no difference (YET) wrt ordering.
4) It might be that there is an issue with the REST execution, that's why having your project to check/verify would be great.

Please note that right now the cypher execution is not yet optimized this is in our plans for 2012.


Cheers

Michael

Hendy Irawan

unread,
Jan 2, 2012, 10:38:12 PM1/2/12
to ne...@googlegroups.com

Hi Michael,

I will share the data privately. It's not only 1 mb compressed.

Times are after a few runs. 650 rels is the returned subset.

#2-3. I understand it's slower, but I expect the index should help. Or maybe make another/additional type of index for sorting purposes.

#4. Right now just running the query via webadmin is sufficient to notice the difference.

I truly hope you continue on improving the (auto)indexing and query optimization. I've much less performance issues with graph traversals (though there is one, like finding mutual likes as I discussed in previous thread, it turns out a careful gremlin script is much faster than cypher..rather unexpected). However I hope neo4j can also support some common "relational" use cases.

Forgive me for replying via mobile.

Hendy

Andres Taylor

unread,
Jan 3, 2012, 3:15:57 AM1/3/12
to ne...@googlegroups.com
On Tue, Jan 3, 2012 at 4:38 AM, Hendy Irawan <ceefo...@gmail.com> wrote:

Hi Michael,

I will share the data privately. It's not only 1 mb compressed.

Times are after a few runs. 650 rels is the returned subset.

#2-3. I understand it's slower, but I expect the index should help. Or maybe make another/additional type of index for sorting purposes.

The index can't be used for sorting since it is not the index Cypher uses to find the matching nodes. Maybe it makes more sense in SQL-lingo: You are joining on one index. You can't then use another index for ordering.  What you are asking for is not possible in any database I know of. 

#4. Right now just running the query via webadmin is sufficient to notice the difference.

I truly hope you continue on improving the (auto)indexing and query optimization. I've much less performance issues with graph traversals (though there is one, like finding mutual likes as I discussed in previous thread, it turns out a careful gremlin script is much faster than cypher..rather unexpected).

Gremlin is a very good piece of code written by very smart guys. I'm not surprised at all that it performs well. Also, Gremlin allows you to choose how to actually run your query - Cypher will make that decision for you. Our hope is that Cypher gets about even with Gremlin most of the time, but - there will always be instances where it's much faster to handcraft your queries. And we are not there yet.

However I hope neo4j can also support some common "relational" use cases.

I don't quite understand. What is it you are looking for?

Andrés 

Peter Hunsberger

unread,
Jan 3, 2012, 11:09:33 AM1/3/12
to ne...@googlegroups.com
On Tue, Jan 3, 2012 at 2:15 AM, Andres Taylor <and...@neotechnology.com> wrote:

The index can't be used for sorting since it is not the index Cypher uses to find the matching nodes. Maybe it makes more sense in SQL-lingo: You are joining on one index. You can't then use another index for ordering.  What you are asking for is not possible in any database I know of. 
 
Not sure quite what you're saying here?   ANSI standard SQL supports joining on any column and sorting on any column.  If any of those are indexed then the indexes may or may not be used for the respective operations, depending on the guts of the database in question and what the result set looks like. Eg, Oracle will sort pre-join or post-join depending on whet it thinks will be most efficient (and depending on hints if you use them) and sometimes it can avoid a table reference completely if the relevant data is held within the index.

Andres Taylor

unread,
Jan 3, 2012, 3:12:15 PM1/3/12
to ne...@googlegroups.com


On Tue, Jan 3, 2012 at 5:09 PM, Peter Hunsberger <peter.hu...@gmail.com> wrote:
Not sure quite what you're saying here?   ANSI standard SQL supports joining on any column and sorting on any column.  If any of those are indexed then the indexes may or may not be used for the respective operations, depending on the guts of the database in question and what the result set looks like. Eg, Oracle will sort pre-join or post-join depending on whet it thinks will be most efficient (and depending on hints if you use them) and sometimes it can avoid a table reference completely if the relevant data is held within the index.

Right. I agree with all the above. But, to be able to use indexes to do the ORDER BY, we have to scan the index-leafs in the index trees. And since we are finding 'y' by the relationships coming out from 'u', and not by scanning an index. Maybe a covering index on the ordered column could be used, but that still requires a full leaf scan on that index, so for a table of any size to talk about, it will be prohibitively expensive.

Said in an other way - if we find the rows in the table 'A' by joining using an index on that table,
you can't in the same query use another index on the same table to do the ordering.

Ah... That was nice. I used to spend a lot of time thinking about this stuff... Nice to trott down memory lane like that... :)

Anyway - Cypher sure as hell can't do it now. We're planning on integrating Cypher and the new automatic indexes much tighter, but it's still only in the planning stages.

Andrés

Peter Hunsberger

unread,
Jan 3, 2012, 3:48:50 PM1/3/12
to ne...@googlegroups.com
Fair enough, guess I'm still a little confused as to why you wrote that you couldn't join on one index and sort on another even in SQL, but it doesn't really matter...

Peter Hunsberger
Reply all
Reply to author
Forward
0 new messages