Nodes with id > 2500000 not found in index

Sean Timm

May 14, 2012, 3:49:19 PM
to ne...@googlegroups.com
I used the BatchInserter to create my DB with an index on the nodes, flushing the index every 50K nodes and once more at the end. After insertion (and the final flush), I print the count of nodes added and run some sample queries against the index. Before the process shuts down, I can find all of the nodes in both the DB and the Lucene index. When I bring the created DB up in server mode, however, only the first 2.5MM nodes (exactly) are in the Lucene index, even though all of them are in the DB and can be fetched directly by id (or by traversal). As a test, I also loaded the node index directly in Solr, and there too I can only see the first 2.5MM nodes. I'm at a loss to explain this behavior. Any ideas?

Thanks,
Sean

The testIndex method is as follows:
  // Queries the batch index on the "userName" key and prints each matching node id.
  public void testIndex(String user)
  {
    IndexHits<Long> hits = index.query("userName", user);
    System.out.println( "Test index search." );
    System.out.println( "Found " + hits.size() + " matches." );
    for( Long id : hits ) {
      System.out.println( "id: " + id );
    }
  }

output of batch index run:

2797568 users imported in 86 seconds.
Testing...
Test index search.
Found 1 matches.
id: 5203
Test index search.
Found 1 matches.
id: 2797559
Test index search.
Found 1 matches.
id: 2797568

output of standalone test using GraphDatabaseService

Testing...
Test index search.
Found 1 matches.
id: 5203
Test index search.
Found 0 matches.
Test index search.
Found 0 matches.

Michael Hunger

May 14, 2012, 3:58:57 PM
to ne...@googlegroups.com
Sean,

That is weird.

A single Lucene index can hold only about 2.1 billion documents (Integer.MAX_VALUE, 2,147,483,647), but we haven't seen a 2.5M boundary so far.
Could you have a look at the indices with a tool like Luke (http://code.google.com/p/luke/) to see whether something is amiss?

Did you cleanly shut down the BatchIndex AND the batch inserter?

Michael

Sean Timm

May 14, 2012, 4:08:33 PM
to ne...@googlegroups.com
Ah, while I explicitly called flush() and shut down the BatchInserter, I didn't explicitly call shutdown() on the BatchInserterIndexProvider. Let me try that.

Thanks,
Sean

Sean Timm

May 14, 2012, 5:21:47 PM
to ne...@googlegroups.com
That worked.  It is even explicitly called out in the Javadocs.  :-)

http://api.neo4j.org/1.7/org/neo4j/unsafe/batchinsert/BatchInserterIndex.html
"Additions/updates to a BatchInserterIndex doesn't necessarily gets added to the actual index immediately, but are instead forced to be written when the index is shut down, BatchInserterIndexProvider.shutdown()."

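For anyone who hits the same symptom, here is a toy model of what seems to have happened. It is not Neo4j or Lucene code: the names ShutdownOrderSketch, ToyBatchIndex, DiskIndex and foundAfterRestart are all made up for illustration. It assumes that flush() only makes additions queryable inside the inserting process, and that shutdown() is what forces them out to the on-disk index, as the Javadoc quote above suggests. It does not try to reproduce the partial result where the first 2.5MM entries did reach disk.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are part of Neo4j or Lucene.
public class ShutdownOrderSketch {

    /** Stand-in for the index files that a later process (e.g. the server) opens. */
    static final class DiskIndex {
        final Map<String, Long> committed = new HashMap<>();
    }

    /** Hypothetical batch index that buffers additions in memory. */
    static final class ToyBatchIndex {
        private final DiskIndex disk;
        private final Map<String, Long> pending = new HashMap<>(); // added, not yet searchable
        private final Map<String, Long> visible = new HashMap<>(); // searchable in this process only

        ToyBatchIndex(DiskIndex disk) {
            this.disk = disk;
        }

        void add(String userName, long nodeId) {
            pending.put(userName, nodeId);
        }

        /** Assumption: flush() makes additions queryable in-process but does not commit them. */
        void flush() {
            visible.putAll(pending);
            pending.clear();
        }

        /** Query as seen by the inserting process. */
        Long query(String userName) {
            return visible.get(userName);
        }

        /** Assumption: only shutdown() forces everything out to the on-disk index. */
        void shutdown() {
            flush();
            disk.committed.putAll(visible);
        }
    }

    /** Returns whether a reader that opens the "disk" afterwards finds the entry. */
    static boolean foundAfterRestart(boolean shutDownIndex) {
        DiskIndex disk = new DiskIndex();
        ToyBatchIndex index = new ToyBatchIndex(disk);
        index.add("someUser", 2797568L);
        index.flush();
        // Inside the inserting process the entry is found either way.
        if (index.query("someUser") == null) {
            throw new IllegalStateException("flushed entry should be visible in-process");
        }
        if (shutDownIndex) {
            index.shutdown();
        }
        return disk.committed.containsKey("someUser");
    }

    public static void main(String[] args) {
        System.out.println("found without index shutdown: " + foundAfterRestart(false));
        System.out.println("found with index shutdown:    " + foundAfterRestart(true));
    }
}
```

In the real API, the step that corresponds to ToyBatchIndex.shutdown() is BatchInserterIndexProvider.shutdown(), called in addition to shutting down the BatchInserter itself.
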
Michael Hunger

May 14, 2012, 5:26:34 PM
to ne...@googlegroups.com
Great that it worked out.

Michael