access IndexWriter and/or IndexSearcher (was: get to Lucene directly when that is used as the indexer?)


Eelco Hillenius

Apr 13, 2012, 6:55:39 PM
to ne...@googlegroups.com
Hi,

I'm still trying to find a reasonable way to get at some of the internals of Neo4j indexing when it uses Lucene. Most of the time I like Neo4j's indexing just as it is, with its hooks into transaction handling and so on. However, I have a case where I'd like direct access to the IndexWriter(s) (or IndexSearcher(s)) that Neo4j manages, because I'm trying to implement faceted search. Whether I use the Lucene facet extension for this or do it myself, I need direct access to the index to gather facets while going through the search results. These facets are stored in fields that aren't properties of the nodes they relate to, so getting the node references doesn't help me here (and I don't need the overhead involved in creating them). Opening an index reader directly on the directory doesn't work either, because the reader is then out of sync with the writer (I get errors like 'org.apache.lucene.index.IndexNotFoundException: no segments* file found').

I guess one option is to write my own LuceneIndexProvider and LuceneDataSource, which would basically be copies of the regular ones but would expose the readers/writers. There's quite a bit of code in LuceneDataSource though, and getIndexSearcher and getIndexWriter are package-private (I could of course expose them through reflection, but that's rather nasty too), so I'm still wondering whether there is a better way that I have overlooked. Or tell me I'm out of my mind and should do things entirely differently...

Thanks!

Eelco

Lawrence Stone

Apr 13, 2012, 7:07:09 PM
to ne...@googlegroups.com
Bump!

Peter Neubauer

Apr 14, 2012, 9:48:42 AM
to ne...@googlegroups.com
Guys,
trying to get hold of Mattias to shed some more expert light on this :)

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j                                - Graphs rule.
Program or be programmed - Computer Literacy for kids.
http://foocafe.org/#CoderDojo

Eelco Hillenius

Apr 14, 2012, 12:42:19 PM
to ne...@googlegroups.com
The workaround I'm chipping away at now (between getting the first sunshine on my skin and dealing with whiny kids) is to create a custom index provider in my own org.neo4j.index.impl.lucene package. That at least gives me access to most of the things I'm interested in. I can probably make this work for me, though I'd prefer a solution where I don't have to rely on Neo4j's internals :-)

Cheers,

Eelco

Mattias Persson

Apr 15, 2012, 6:03:00 AM
to ne...@googlegroups.com
Hi,

IndexWriter and IndexSearcher instances aren't exposed, so apart from reflection there's no way to reach them, unless (as you did) you write your own index provider.

May I ask you what you need from that low level access?

Best,
Mattias

--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

Eelco Hillenius

Apr 15, 2012, 1:10:38 PM
to ne...@googlegroups.com
Hi Mattias,

I'm trying to implement faceted search, either using https://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-facet/userguide.html or rolling my own. In the former case I'll need to use writers/readers directly; in the latter case I'd need at least direct access to the search documents so that I can gather facet fields (which wouldn't be node properties).

The problem with opening readers/writers myself on the directories where the indexes are stored is that it may take a while before the writer(s) that Neo4j uses flush their changes. When I tried, I got exceptions like 'org.apache.lucene.index.IndexNotFoundException: no segments* file found', which seems to be exactly because of that.

So ideally, I'd have direct access to a writer and searcher of the current transaction. Direct access to the search documents would make me happy too. Or tell me I'm smoking crack and I should just approach this differently (like for instance make these facets node properties).

Thanks,

Eelco

Eelco Hillenius

Apr 15, 2012, 2:02:59 PM
to ne...@googlegroups.com
I guess using a node property/ properties to store facets isn't so bad... better than having to rely on neo4j's internals. I'll try that approach. Thanks for your help so far!

Eelco

Mattias Persson

Apr 16, 2012, 4:10:12 AM
to ne...@googlegroups.com
You don't need to set a node property to index it in Lucene. You could just do:

index.add( myNode, "my facet key", "my facet value" );

without having that property on the node itself. If that helps.

Rick Bullotta

Apr 16, 2012, 7:53:40 AM
to Neo4j
Yes, but you can't *read* the document properties from Lucene, can
you? If I indexed a document in Lucene using A=123, B=XXXXX, and
C=March 12, 1993, and I retrieve it using a query where A=123, I
cannot access B or C unless they are node (or relationship) properties
also. I think that's what Eelco is trying to achieve (access to all
of a Lucene document's properties).

Eelco Hillenius

Apr 16, 2012, 12:25:34 PM
to ne...@googlegroups.com
Hi,

> Yes, but you can't *read* the document properties from Lucene, can
> you?  If I indexed a document in Lucene using A=123, B=XXXXX, and
> C=March 12, 1993, and I retrieve it using a query where A=123, I
> cannot access B or C unless they are node (or relationship) properties
> also.  I think that's what Eelco is trying to achieve (access to all
> of a Lucene document's properties).

Yep, that's exactly it :-)

I now store my facets as node properties, and this works fine. The unfortunate bit is that the nodes have to be instantiated while I gather my facets (instead of reading fields directly from the documents), but for my case this is acceptable.
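
For anyone reading along later, the gathering step can be sketched roughly like this. It is only a minimal sketch in plain Java, not Neo4j or Lucene API: each Map stands in for the properties of one node from the search results, and FacetCounter, countFacets and hit are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Neo4j or Lucene.
class FacetCounter {

    // Counts how often each value of facetKey occurs across the hits.
    // Each Map is an assumed stand-in for the properties of one node in the results.
    static Map<String, Integer> countFacets(List<Map<String, Object>> hits, String facetKey) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps facet values sorted
        for (Map<String, Object> properties : hits) {
            Object value = properties.get(facetKey);
            if (value == null) {
                continue; // this hit has no value for the facet, so skip it
            }
            counts.merge(value.toString(), 1, Integer::sum);
        }
        return counts;
    }

    // Builds a single-property stand-in for a node.
    static Map<String, Object> hit(String key, Object value) {
        Map<String, Object> properties = new HashMap<>();
        properties.put(key, value);
        return properties;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> hits = new ArrayList<>();
        hits.add(hit("color", "red"));
        hits.add(hit("color", "blue"));
        hits.add(hit("color", "red"));
        hits.add(hit("size", "large")); // no "color" property, so it is not counted

        System.out.println(countFacets(hits, "color")); // prints {blue=1, red=2}
    }
}
```

With real nodes, the lookup inside the loop would presumably read the facet property from each node in the hits instead of from a Map, which is where the cost of instantiating nodes comes from.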

Thanks,

Eelco
 