BatchInserterIndexProviders for Memcached, Redis and MongoDB
From: Tero Paananen <teropaana...@gmail.com>
Date: Mon, 4 Jun 2012 09:04:07 -0700 (PDT)
Local: Mon, Jun 4 2012 12:04 pm
Subject: BatchInserterIndexProviders for Memcached, Redis and MongoDB
https://github.com/gorbachev/neo4j-batchinserterprovider-contrib

Michael Hunger asked me to open source the BatchInserterIndexProvider
implementations I experimented with during the time I was implementing
the batch import process for the application I've been working on.

Our graph data model has quite a few unique nodes, so the data import
process had to do a LOT of index lookups. The Lucene-based index
provider was slow, and performance started degrading once the indexes
grew larger. This is probably mostly due to an undersized server (I had
too little memory on it).

As a replacement I tried Redis and Memcached first. While they were
extremely quick, they failed because I simply didn't have a server that
could hold the entire index in memory, as Redis and Memcached require.
YMMV.
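
To make the "does the index fit in memory" question concrete, here is a rough back-of-envelope sketch. Every number in it is a placeholder assumption rather than a measurement from this import: the 90M entry count simply reuses the node count quoted later in the post as an upper bound, and the per-entry byte sizes and the 8 GiB server are invented for illustration. The class and method names are hypothetical.

```java
// Back-of-envelope check: can an in-memory store hold the whole lookup index?
class IndexMemoryEstimate {

    // Bytes needed if every entry costs key + value + bookkeeping overhead.
    static long estimateBytes(long entries, long keyBytes, long valueBytes, long overheadBytes) {
        return entries * (keyBytes + valueBytes + overheadBytes);
    }

    // True when the estimate fits into the memory the server can spare.
    static boolean fitsInMemory(long requiredBytes, long availableBytes) {
        return requiredBytes <= availableBytes;
    }

    public static void main(String[] args) {
        long entries = 90_000_000L;                  // assumption: one entry per node
        long required = estimateBytes(entries, 40, 8, 52); // assumed 100 bytes per entry
        long available = 8L * 1024 * 1024 * 1024;    // assumption: 8 GiB free for the index

        System.out.println("required bytes:  " + required);
        System.out.println("available bytes: " + available);
        System.out.println("fits in memory:  " + fitsInMemory(required, available));
    }
}
```

Measuring the real per-entry cost on a sample of your own data is the only way to replace the placeholder sizes with something trustworthy.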

The MongoDB BatchInserterIndexProvider, however, gave me good,
consistent performance. It wasn't as fast as Redis/Memcached, but it
didn't degrade the way the Lucene-based one did.

So I was using these to speed up the lookups for unique nodes during
the batch import. I'm still using Lucene indexes with Neo4j.

In the batch import process I was essentially adding properties to two
indexes, one using Mongo and one using Lucene. I was then doing lookups
using only the Mongo index:

BatchInserter inserter = new BatchInserterImpl("/data/graph.db");
// Lucene index: the one Neo4j itself uses once the import is done
BatchInserterIndexProvider indexProvider =
    new LuceneBatchInserterIndexProvider(inserter);
// Mongo index: used only for uniqueness lookups during the import
BatchInserterIndexProvider lookupIndexProvider =
    new MongoBatchInserterIndexProvider();

BatchInserterIndex nodeIndex = indexProvider.nodeIndex(...);
BatchInserterIndex lookupIndex = lookupIndexProvider.nodeIndex(...);

if (!lookupIndex.get("property", "value").hasNext()) {
    // Not seen before: create the node and add it to both indexes
    long node = inserter.createNode(...);
    lookupIndex.add(node, ...);
    nodeIndex.add(node, ...);
} else {
    // ... node already exists ... update or ignore
}
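
For readers who want to try the pattern without a Neo4j or MongoDB setup, the following is a minimal, self-contained sketch of the same idea: check a fast lookup index first, and only on a miss create the node and write it to both indexes. It is not the code from the GitHub project. `SimpleIndex`, `MapIndex` and `DualIndexImport` are hypothetical names, the in-memory maps stand in for the Mongo and Lucene indexes, and a counter stands in for `inserter.createNode(...)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a batch-inserter index. Not the Neo4j API.
interface SimpleIndex {
    Long get(String key, Object value);              // node id, or null if absent
    void add(long nodeId, String key, Object value);
}

// In-memory implementation, used here in place of MongoDB or Lucene.
class MapIndex implements SimpleIndex {
    private final Map<String, Long> entries = new HashMap<>();

    public Long get(String key, Object value) {
        return entries.get(key + "=" + value);
    }

    public void add(long nodeId, String key, Object value) {
        entries.put(key + "=" + value, nodeId);
    }

    int size() {
        return entries.size();
    }
}

class DualIndexImport {
    private final SimpleIndex lookupIndex; // consulted for every uniqueness check
    private final SimpleIndex nodeIndex;   // written to, never read, during the import
    private long nextNodeId = 0;           // stands in for inserter.createNode(...)

    DualIndexImport(SimpleIndex lookupIndex, SimpleIndex nodeIndex) {
        this.lookupIndex = lookupIndex;
        this.nodeIndex = nodeIndex;
    }

    // Returns the existing node id for key/value, or creates and indexes a new one.
    long getOrCreate(String key, Object value) {
        Long existing = lookupIndex.get(key, value);
        if (existing != null) {
            return existing;
        }
        long node = nextNodeId++;
        lookupIndex.add(node, key, value);
        nodeIndex.add(node, key, value);
        return node;
    }

    public static void main(String[] args) {
        DualIndexImport importer = new DualIndexImport(new MapIndex(), new MapIndex());
        long first = importer.getOrCreate("email", "a@example.com");
        long second = importer.getOrCreate("email", "b@example.com");
        long repeat = importer.getOrCreate("email", "a@example.com");
        System.out.println(first + " " + second + " " + repeat); // prints "0 1 0"
    }
}
```

The point of the split is that only the lookup index sits on the hot path, so it can be swapped for whichever store performs best on your hardware while the second index is still populated for later use.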

Given the amount of data we had (90M nodes, 240M relationships, and
growing), the time savings with faster index lookups were definitely
worth it.

I don't have benchmarking numbers, because they would depend heavily on
your particular use case. For my circumstances, the MongoDB-based index
was a good solution.
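
Since the numbers depend on the use case, one way to get your own is to time a batch of lookups against each candidate index using a sample of your real keys. The harness below is a minimal sketch under that assumption. `LookupBenchmark` and `Result` are hypothetical names, and the `HashSet` in `main` is only a placeholder for a real index lookup such as the Mongo or Lucene one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Times a sequence of existence checks against any index-like lookup.
class LookupBenchmark {

    static final class Result {
        final long hits;   // how many keys were found
        final long nanos;  // wall-clock time for the whole batch

        Result(long hits, long nanos) {
            this.hits = hits;
            this.nanos = nanos;
        }
    }

    // Runs every key through the lookup once and records hits and elapsed time.
    static Result run(Predicate<String> exists, List<String> keys) {
        long hits = 0;
        long start = System.nanoTime();
        for (String key : keys) {
            if (exists.test(key)) {
                hits++;
            }
        }
        return new Result(hits, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder index: swap in a lambda that queries the real index.
        Set<String> index = new HashSet<>(Arrays.asList("a", "b"));
        List<String> keys = Arrays.asList("a", "c", "b", "a");

        Result result = run(index::contains, keys);
        System.out.println("hits=" + result.hits + " nanos=" + result.nanos);
    }
}
```

A single pass like this ignores warm-up and caching effects, so repeat it several times and at several index sizes before comparing providers.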

If you have any questions, let me know. There are basic unit tests
included in the project, but it is entirely possible there are bugs left
in the code that I didn't cover during my use of these classes.

Please feel free to fork the GitHub project and make whatever changes
you need.

-TPP


 
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Mon, 4 Jun 2012 16:03:25 -0700
Local: Mon, Jun 4 2012 7:03 pm
Subject: Re: [Neo4j] BatchInserterIndexProviders for Memcached, Redis and MongoDB
That is very cool, Tero,
thank you for that contribution! I would love to take the next lab day
and try out the MongoDB or Memcached index for inserting spatial data.
That will be fun!

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j


 