Graph DB first impressions from a noob

Paul

Feb 7, 2012, 2:35:19 PM
to Gremlin-users
I've been dipping my toes into the world of graph databases over the
past week or so, and thought I'd share some of the impressions I've
had so far as a noob/outsider...

We are evaluating a graph database for an application that
currently runs against a MySQL slave. It's an ideal fit for traversals
on a graph database - e.g. calculating "personalized" PageRank against
about 50 million edges. The current MySQL implementation works fine,
as load is low and speed isn't critical, but an RDBMS is not a great
fit for the problem.
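
For readers who haven't met the term, here is a rough sketch of what a "personalized" PageRank computes. It is a plain power iteration in Python over an in-memory edge list, not the MySQL implementation described above and not tuned for 50 million edges. The function name, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, source, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank over a directed edge list.

    Every random-walk restart jumps back to `source`, which is what makes
    the ranking "personalized" rather than global.
    """
    out_links = defaultdict(list)
    nodes = {source}
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))

    # Start with all probability mass on the source vertex.
    rank = {node: 0.0 for node in nodes}
    rank[source] = 1.0

    for _ in range(iterations):
        next_rank = {node: 0.0 for node in nodes}
        for node, score in rank.items():
            targets = out_links.get(node)
            if targets:
                # Follow an out-edge with probability `damping` ...
                share = damping * score / len(targets)
                for dst in targets:
                    next_rank[dst] += share
                # ... otherwise restart at the source.
                next_rank[source] += (1.0 - damping) * score
            else:
                # Dangling vertex: send all of its mass back to the source.
                next_rank[source] += score
        rank = next_rank
    return rank
```

Calling `personalized_pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")], "a")` returns a dict of scores that sum to 1, with vertices close to "a" scoring highest. On a graph database the inner loop becomes a neighbour traversal per vertex instead of a join per step.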

I've spent a bit of time trying to get up and running with both Neo4j
and OrientDB using the basic TinkerPop graph stack
(Blueprints/Pipes/Gremlin), and here are my current feelings:

TinkerPop stack
+ very nice clean API
+ clear focus on its intended purpose
+ additional functionality well separated out into packages: Rexster,
Frames etc.
+ Gremlin's pretty nice, but I'm curious how much
traversals/Pipelines can be optimized against a backend
- documentation well-written but a bit "fragmented" - hard to find
the wiki page/slideset you want

Neo4j
+ seems mature, quick and stable
+ able to handle bulk imports (with index) quite quickly
(10 million/hr on a desktop PC)
+ works well with TinkerPop stack
- licensing - community edition ok for basic apps, but $$$ if you
want, for example, basic master/slave replication

OrientDB
+ promising features and active development
+ liberal licensing
- "local" protocol seems very slow at bulk import
- "remote" protocol doesn't seem to work well with TinkerPop,
particularly automatic indexing
- doesn't seem stable yet - many changes between release candidates

I've been trying OrientDB 1.0RC7, the version linked to the Blueprints
1.1 release. I know a lot of fixes have gone into 1.0RC8,
but we need to develop against a stable platform rather than
continually chasing the latest release for fixes/performance.

I haven't had a chance to try the other backends, or the
spring-data-graph approach.

I really like the Blueprints abstraction - the ability to develop
against that API, and have the freedom to choose the most appropriate
backend, and change backend if need be. But I've yet to find the
perfect backend for our needs - our current options:
- stick with the existing MySQL implementation
- use Neo4j community edition, and perform nightly offline backups
for our disaster recovery needs (we can get away with this as the app is
non-critical)
- implement our own quick'n'dirty Blueprints backend in e.g.
Mongo/Redis/MySQL - I know a Mongo implementation was abandoned for speed
reasons, but I can't imagine it being any slower than our current
MySQL implementation
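
To make the "develop against one API, swap the backend" idea concrete, here is a minimal sketch in Python. It is not the Blueprints API: `GraphBackend`, `InMemoryGraph`, `SqliteGraph` and `two_hop` are hypothetical names made up for illustration, and SQLite simply stands in for a MySQL-style relational store.

```python
import sqlite3
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical minimal graph interface (NOT the Blueprints API)."""

    @abstractmethod
    def add_edge(self, src, dst):
        """Store a directed edge src -> dst."""

    @abstractmethod
    def out_neighbors(self, vertex):
        """Return the list of vertices reachable over one out-edge."""


class InMemoryGraph(GraphBackend):
    """Adjacency lists held in a plain dict."""

    def __init__(self):
        self._out = {}

    def add_edge(self, src, dst):
        self._out.setdefault(src, []).append(dst)

    def out_neighbors(self, vertex):
        return list(self._out.get(vertex, []))


class SqliteGraph(GraphBackend):
    """The same interface backed by a relational edge table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS edges (src TEXT, dst TEXT)"
        )
        self._conn.execute(
            "CREATE INDEX IF NOT EXISTS edges_src ON edges (src)"
        )

    def add_edge(self, src, dst):
        self._conn.execute(
            "INSERT INTO edges (src, dst) VALUES (?, ?)", (src, dst)
        )

    def out_neighbors(self, vertex):
        rows = self._conn.execute(
            "SELECT dst FROM edges WHERE src = ? ORDER BY rowid", (vertex,)
        )
        return [dst for (dst,) in rows]


def two_hop(graph, start):
    """Traversal written only against GraphBackend, so any backend works."""
    reached = set()
    for mid in graph.out_neighbors(start):
        reached.update(graph.out_neighbors(mid))
    return reached
```

Application code such as `two_hop` never names a concrete backend, so moving from one store to another only changes the line that constructs the graph. A real Blueprints backend would also have to cover vertices, properties, indexes and transactions, which this sketch leaves out.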

I would appreciate any suggestions for avenues I may have missed :)

Luca Garulli

Feb 7, 2012, 2:54:36 PM
to gremli...@googlegroups.com
Hi,
Just two things from the OrientDB side:
1) About the bulk import: have you tried the suggestions for massive insertion? http://code.google.com/p/orient/wiki/PerformanceTuning#Massive_Insertion
2) Changes between releases don't affect the API, which stays the same, but they usually introduce new features. Fortunately, every release brings many improvements. To me this is a plus, not a con!

Lvc@

Pierre De Wilde

Feb 7, 2012, 2:58:46 PM
to gremli...@googlegroups.com
Hi,

Welcome and thank you for your feedback. We appreciate it.

We're expecting a stable release 1.0 of OrientDB shortly, probably this month.

Neo4j is stable. Backup/restore is a critical missing feature of the community edition. Peter, comments on this point?

Thanks again,
Pierre

Peter Neubauer

Feb 7, 2012, 3:11:09 PM
to gremli...@googlegroups.com
Guys,
Yes, we are working on two fronts with Neo4j: putting community and startup pricing in place, and providing backup for the Neo4j Server. If you want to run Neo4j Enterprise as a startup, contact me - we will work it out along the lines of contributions other than code to the involved projects in exchange for closed-source usage.

Thanks for the feedback to the community - it is greatly appreciated!

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.6 released                 - dzone.com/6S4K
The Neo4j Heroku Challenge   - http://neo4j-challenge.herokuapp.com/

Paul

Feb 7, 2012, 4:00:45 PM
to Gremlin-users
Many thanks for your fast responses!

I suspect some or all of my problems with OrientDB come from trying to
use it through Blueprints rather than accessing its native API - I
understand that the Blueprints implementation will necessarily lag
behind the main API.

I tried a few suggestions about manual transactions and intents from
the wiki/newsgroup, and I had the following experience:

- using "local" - import started quickly, and automatic indexes were
created, but speed rapidly deteriorated

- using "remote" - import was fast - comparable to Neo4j, but I
couldn't manage to set up the automatic indexes through TinkerPop. (I
know the "remote" protocol isn't properly supported by Blueprints
yet).
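
As a rough illustration of the manual-transaction idea, here is a batched import sketched in Python. It is not OrientDB's or Blueprints' API: sqlite3 stands in for the store, and `bulk_import`, `_flush`, the `edges` table and the batch size are assumptions for the example. Building the index once after the load is a common general tactic, not a diagnosis of the slowdown described above.

```python
import sqlite3


def _flush(conn, batch):
    """Write one batch inside a single transaction, then empty it."""
    if not batch:
        return 0
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany(
            "INSERT INTO edges (src, dst) VALUES (?, ?)", batch
        )
    written = len(batch)
    batch.clear()
    return written


def bulk_import(conn, edges, batch_size=10_000):
    """Insert edges, committing once per batch instead of once per row."""
    conn.execute("CREATE TABLE IF NOT EXISTS edges (src TEXT, dst TEXT)")
    batch = []
    total = 0
    for edge in edges:
        batch.append(edge)
        if len(batch) >= batch_size:
            total += _flush(conn, batch)
    total += _flush(conn, batch)  # write the final partial batch
    # Build the lookup index once, after all rows are loaded.
    conn.execute("CREATE INDEX IF NOT EXISTS edges_src ON edges (src)")
    return total
```

With a file-backed database the batch size trades memory against the number of commits; the right value for a given store has to be measured.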

So I imagine I could probably sit down with the OrientDB API for
another week and get to a nice place with it, bypassing Blueprints,
but I'd then be locked in somewhat to OrientDB.

But I look forward to the 1.0 release - if you can iron out the few
remaining wrinkles then you will have a very nice system! And I do
appreciate the amount of effort required to develop a system of this
scope - I know these things are never "finished". (Same goes for the
Neo4j team - excellent work!)

Paul