Re: Graph Indexes: Lucene vs in-graph (node-based)

Niels Hoogeveen

Jun 1, 2012, 4:56:01 PM
to Neo4j
I have done some work on in-graph indexes in the past, and my
experience is that they are not always worth the effort. It depends
on the context, however. If, for example, you want to expose the index as
part of your application, an in-graph index is a great solution.

In my experience, in-graph indexes become less attractive when indexing
large numbers of nodes. Rebalancing index trees can become
prohibitively slow when indexes grow large. In "normal" B-trees, e.g.,
the index consists of blocks that can be swapped in and out of memory
as a unit. In-graph indexes use relationships to build the tree, but
those relationships are not grouped together on disk, so rebalancing
an index tree may require disk reads from many different places in the
relationship file.
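To make the locality argument concrete, here is a toy simulation. It assumes, purely for illustration, that relationship records are appended to a single store in creation order and read from disk in fixed-size pages; `build_store`, `pages_touched` and `PAGE_SIZE` are hypothetical names, and this is a simplified model, not a description of Neo4j's actual store format.

```python
"""Toy model of on-disk locality for in-graph index relationships.

Hypothetical assumption (not Neo4j's real storage engine): relationship
records are appended to one store in creation order, one record per slot,
and a "page" is PAGE_SIZE consecutive slots that are read from disk together.
"""
import random

PAGE_SIZE = 64  # hypothetical number of relationship records per page


def build_store(num_index_children, max_other_writes, seed=0):
    """Append an index node's child relationships to a simulated store.

    Between index updates, up to max_other_writes unrelated (domain)
    relationships are appended, as happens when the index grows alongside
    the rest of the graph. Returns the slots the index relationships occupy.
    """
    rng = random.Random(seed)
    next_slot = 0        # next free position in the append-only store
    index_slots = []     # positions of the index relationships
    for _ in range(num_index_children):
        next_slot += rng.randint(0, max_other_writes)  # unrelated writes
        index_slots.append(next_slot)
        next_slot += 1                                  # the index relationship itself
    return index_slots


def pages_touched(slots):
    """Distinct pages that must be loaded to visit every given slot."""
    return len({slot // PAGE_SIZE for slot in slots})


if __name__ == "__main__":
    grouped = build_store(100, max_other_writes=0)      # written back to back
    interleaved = build_store(100, max_other_writes=200)
    print("grouped:", pages_touched(grouped))           # 100 slots -> 2 pages
    print("interleaved:", pages_touched(interleaved))
```

The result is only qualitative: a B-tree keeps a node's keys together in one block, whereas relationships created over time land wherever the next free record happens to be, so visiting or rebalancing a single index level can touch many pages.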

In my experience (running on my development machine, without any
additional tuning), an index of up to approximately 100,000 entries still
performs reasonably well; above that number of entries, performance
becomes progressively slower. Of course, tuning can make the approach
work well for larger numbers of entries, but I would expect the
basic pattern to remain.


On Jun 1, 4:39 pm, SimonH <simon.ha...@gmail.com> wrote:
> Hi, I've got a graph in which I want to index different nodes (Entities and
> Events) using properties such as time range, location, domain ontology,
> etc. The two obvious options I have for doing this are: 1) a Lucene
> index; or 2) an in-graph index, where I use a node to index the nodes I
> seek. One main advantage of the in-graph index is the versatility it
> provides, by supporting a multilevel index (as shown in http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html),
> reverse index lookup, and other possibilities in traversals... However, it
> is a bit more complex to maintain, and it "pollutes" the graph with "system
> nodes". Moreover, I'm not sure how the in-graph index compares in terms of
> efficiency to the Lucene index. More specifically, for time/date
> indexing, how would the multilevel index above compare to a Lucene
> "YYYYMMDD" string field index? The in-graph index seems to offer some
> advantage when indexing a start and end date and searching for date
> ranges...
>
> I would appreciate any insights into these two indexing approaches.
> Thanks!
>
> Simon
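For the date question, a rough sketch of the two shapes being compared may help. It is plain Python, not the Lucene or Neo4j API: the hypothetical `SortedStringIndex` stands in for a sorted "YYYYMMDD" string field, and the hypothetical `TimeTree` for a year/month/day multilevel index. It illustrates how each answers a range query, not their relative performance.

```python
"""Two ways to answer "which events fall between two dates?"

Hypothetical sketch; neither class is Lucene's or Neo4j's API.
1) Sorted "YYYYMMDD" string keys (the idea behind a string-field range query).
2) A year -> month -> day tree, the shape of a multilevel in-graph index.
"""
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date


def key(d: date) -> str:
    # Zero-padded YYYYMMDD strings (4-digit years) sort in date order.
    return d.strftime("%Y%m%d")


class SortedStringIndex:
    def __init__(self):
        self.keys = []    # sorted "YYYYMMDD" keys
        self.values = []  # event ids, parallel to self.keys

    def add(self, d: date, event_id: str) -> None:
        k = key(d)
        i = bisect_right(self.keys, k)  # keep keys sorted on insert
        self.keys.insert(i, k)
        self.values.insert(i, event_id)

    def range(self, start: date, end: date) -> list:
        # Two binary searches bound the inclusive key range [start, end].
        lo = bisect_left(self.keys, key(start))
        hi = bisect_right(self.keys, key(end))
        return self.values[lo:hi]


class TimeTree:
    """year -> month -> day -> [event ids], mimicking year/month/day nodes."""

    def __init__(self):
        self.root = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def add(self, d: date, event_id: str) -> None:
        self.root[d.year][d.month][d.day].append(event_id)

    def range(self, start: date, end: date) -> list:
        result = []
        # Walk in order, descending only into subtrees that can overlap.
        for y in sorted(self.root):
            if y < start.year or y > end.year:
                continue
            for m in sorted(self.root[y]):
                if not (start.year, start.month) <= (y, m) <= (end.year, end.month):
                    continue
                for dd in sorted(self.root[y][m]):
                    if start <= date(y, m, dd) <= end:
                        result.extend(self.root[y][m][dd])
        return result


if __name__ == "__main__":
    events = [(date(2012, 5, 30), "e1"), (date(2012, 6, 1), "e2"),
              (date(2012, 6, 6), "e3"), (date(2013, 1, 1), "e4")]
    flat, tree = SortedStringIndex(), TimeTree()
    for d, eid in events:
        flat.add(d, eid)
        tree.add(d, eid)
    print(flat.range(date(2012, 6, 1), date(2012, 12, 31)))  # ['e2', 'e3']
    print(tree.range(date(2012, 6, 1), date(2012, 12, 31)))  # ['e2', 'e3']
```

Both return the same events: the flat sorted keys answer a range with two binary searches, while the tree answers it by descending only into the year and month subtrees that overlap the query, which is the traversal pattern a multilevel in-graph index relies on.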

SimonH

Jun 6, 2012, 8:50:47 AM
to ne...@googlegroups.com
Thanks for your feedback, Niels!