Native triple store vs Neo4j


Ganesh kumar

Sep 4, 2013, 7:50:03 PM
to ne...@googlegroups.com
Hi All,

I would like to know the differences and similarities between native triple stores and Neo4j.

There are several links describing these, but none of them explains why Neo4j was created when RDF graphs already existed.

Your help is much appreciated.

Thanks
Ganesh

Wes Freeman

Sep 4, 2013, 9:22:33 PM
to ne...@googlegroups.com
Maybe I'm not fully qualified to answer this, as I don't have a deep understanding of triple store implementations, but here are a few discussion points:

a) neo4j aims to capture the 95% use case of very fast localized traversals
b) neo4j is a pragmatic graph database that implements a "property graph" model, with nodes and relationships that can have properties on each, along with labels on nodes in 2.0--I understand that this is all possible in RDF, but it is a bit harder to grasp how properties on predicates work; the concept of RDF reification is a bit unfriendly, at least to newcomers
c) under the hood, neo4j stores nodes and relationships separately from their properties (with pointers between them), so traversals don't need to be bogged down by the properties if you don't need to inspect them
d) cypher is a compelling new query language designed to match graph patterns from start points, designed by the neo team for neo4j

As for similarity, they both store connected data... I recently worked on building a converter from RDF turtle to Neo4j, building nodes/relationships with properties out of the triples. The main thing it requires is some understanding of the types of predicates you have, to determine which triples indicate nodes/relationships and which indicate properties.
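The classification step Wes describes can be sketched roughly as follows. This is a minimal, hypothetical heuristic (URI objects become relationships, literal objects become properties); as Wes notes, a real converter would rely on configured predicate types, and the example data here is made up:

```python
# Minimal sketch of classifying RDF triples (subject, predicate, object)
# into property-graph parts. Heuristic assumption: a URI object marks a
# relationship, a literal object marks a property.

triples = [
    ("http://me.com/peter", "rdfs:comment", "Peter"),
    ("http://me.com/peter", "knows", "http://me.com/mary"),
]

def is_resource(value):
    """Treat http(s) URIs as resources, i.e. graph nodes."""
    return value.startswith(("http://", "https://"))

nodes, relationships, properties = set(), [], []
for s, p, o in triples:
    nodes.add(s)
    if is_resource(o):
        nodes.add(o)
        relationships.append((s, p, o))   # node -> node edge
    else:
        properties.append((s, p, o))      # literal becomes a node property

print(relationships)  # [('http://me.com/peter', 'knows', 'http://me.com/mary')]
print(properties)     # [('http://me.com/peter', 'rdfs:comment', 'Peter')]
```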

Wes

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Michael Hunger

Sep 5, 2013, 2:20:56 AM
to ne...@googlegroups.com
Thanks Wes,

Yes, a property graph is more pragmatic than RDF or linked data in general.
Neo4j, like other graph databases, was developed as part of a real-world application, not a scientific research project.
So property graphs are closer to object models, but with rich relationship entities. And as Wes said, properties are not kept a non-semantic relationship away but are merged into the node or relationship.

Most of our users and customers found it much easier to work with the property graph model than with RDF or linked data structures.

That said, the two models are isomorphic, so you can represent one within the other.
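The isomorphism Michael mentions can be illustrated with a tiny sketch (hypothetical data structures, not any Neo4j API): the same facts expressed as RDF-style triples and as a property graph, with a round trip back to triples.

```python
# Hypothetical illustration: the same facts as triples and as a
# property graph, where the literal is folded into the node itself
# rather than kept "a non-semantic relationship away".

triples = [
    ("peter", "knows", "mary"),
    ("peter", "name", "Peter"),
]

property_graph = {
    "nodes": {"peter": {"name": "Peter"}, "mary": {}},
    "relationships": [("peter", "knows", "mary")],
}

# Round-trip: rebuild the triple set from the property graph.
rebuilt = list(property_graph["relationships"])
rebuilt += [(n, k, v) for n, props in property_graph["nodes"].items()
            for k, v in props.items()]

print(sorted(rebuilt) == sorted(triples))  # True
```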

Michael

Peter Vandenabeele

Sep 5, 2013, 8:08:55 AM
to ne...@googlegroups.com
On Thu, Sep 5, 2013 at 3:22 AM, Wes Freeman <freem...@gmail.com> wrote:
As for similarity, they both store connected data... I recently worked on building a converter from RDF turtle to Neo4j, building nodes/relationships with properties out of the triples. The main thing it requires is some understanding of the types of predicates you have, to determine which triples indicate nodes/relationships and which indicate properties.

Could you talk a bit more about this conversion? I am quite curious how you decide how to map an RDF data property.

E.g. for the triple

<http://me.com/peter> rdfs:comment "Peter" .

mapping it to
[1] a "comment" property (with value "Peter") (on a node that represents the resource <http://me.com/peter>)
or
[2] a relationship -[comment]-> from that node to a node that represents the string "Peter"

While [1] seems more obvious, maybe in the Neo4j design using relationships may be more beneficial
to query performance than looking up properties.
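For concreteness, the two options could be represented like this (plain made-up data structures, not any Neo4j API):

```python
# Option [1]: the literal becomes a property on the subject's node.
option1 = {
    "nodes": {"http://me.com/peter": {"comment": "Peter"}},
    "relationships": [],
}

# Option [2]: the literal becomes its own node, reached via a
# -[comment]-> relationship (one extra hop per property lookup).
option2 = {
    "nodes": {"http://me.com/peter": {}, "literal:Peter": {"value": "Peter"}},
    "relationships": [("http://me.com/peter", "comment", "literal:Peter")],
}

print(option1["nodes"]["http://me.com/peter"]["comment"])  # Peter
```

Since Neo4j keeps a node's properties reachable via a pointer from the node record, option [1] avoids an extra traversal per literal lookup, which is presumably why it is the common choice; option [2] only pays off when the literal itself needs to be shared or traversed.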

Thanks,

Peter

Peter Neubauer

Sep 5, 2013, 8:14:13 AM
to Neo4j User
Peter,
there are different strategies depending on what you are aiming for. Look at http://www.neo4j.org/develop/linked_data for some of the different mapping approaches.


/peter


G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Meet you at GraphConnect in SF, NYC or London? - http://www.graphconnect.com/
Neo4j questions? Use GraphGist. - http://gist.neo4j.org



Wes Freeman

Sep 5, 2013, 10:58:22 AM
to ne...@googlegroups.com
The way I set it up is you configure which predicates indicate type instances, so in theory you'd have another triple like so:

<http://me.com/peter> rdfs:type <http://me.com/Person>

And in the configuration you would declare that rdfs:type indicates a type instance, so you can configure a label of Person for the node Peter.

My implementation actually does two passes, because of ordering problems: in the first pass I collect nodes and labels (and create them), and in the second pass I collect relationships between nodes and properties. Once I know that <http://me.com/peter> is a node (and I know all of the other nodes by their URI), it's a matter of checking whether the object is a node or not; if it's not a node, it's a property. If it's a property, the predicate is the property name and the object is the value.

It's a rather simple script. Right now it uses a Trove map to hold the URI -> node id mapping, but I need to rework that because my Freebase import runs out of RAM on my 16GB laptop (or I need to just try it on my server). Michael Hunger had some good ideas for using less RAM with an array, but I haven't had a chance to try them out yet.

If you don't have the concept of types/instances of types in your RDF (although that seems common), you could also specify the predicates that indicate relationship types in a configuration, and assume that the subject is a node, and the object is another node. All other predicate types would be properties, in that case.
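The two-pass scheme Wes describes might look roughly like this. This is a hypothetical sketch with made-up predicate configuration and example data, not his actual Scala code:

```python
# Sketch of a two-pass RDF-to-property-graph conversion.
# Predicate roles come from configuration, not inference.

TYPE_PREDICATES = {"rdfs:type"}   # configured: these assign node labels
REL_PREDICATES = {"knows"}        # configured: these become relationships

triples = [
    ("http://me.com/peter", "rdfs:type", "http://me.com/Person"),
    ("http://me.com/peter", "rdfs:comment", "Peter"),
    ("http://me.com/peter", "knows", "http://me.com/mary"),
    ("http://me.com/mary", "rdfs:type", "http://me.com/Person"),
]

# Pass 1: collect nodes and their labels from type predicates.
labels = {}
for s, p, o in triples:
    if p in TYPE_PREDICATES:
        labels.setdefault(s, set()).add(o.rsplit("/", 1)[-1])

# Pass 2: remaining triples become relationships (configured predicate,
# or object is a known node) or properties otherwise.
relationships, properties = [], {}
for s, p, o in triples:
    if p in TYPE_PREDICATES:
        continue
    if p in REL_PREDICATES or o in labels:
        relationships.append((s, p, o))
    else:
        properties.setdefault(s, {})[p] = o

print(labels["http://me.com/peter"])      # {'Person'}
print(properties["http://me.com/peter"])  # {'rdfs:comment': 'Peter'}
```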

Feel free to look at my Scala implementation here--it's aimed at Freebase and probably won't work with anything else at the moment, but I am hoping to make it more general:

Wes

Patrick Logan

Sep 6, 2013, 4:42:19 PM
to ne...@googlegroups.com
See inline...


On Wednesday, September 4, 2013 6:22:33 PM UTC-7, Wes Freeman wrote:
Maybe I'm not fully qualified to answer this, as I don't have a deep understanding of triple store implementations, but here are a few discussion points:

a) neo4j aims to capture the 95% use case of very fast localized traversals

Where did the figure of 95% use case come from? Do you have any comparisons for how neo4j accomplishes these relative to one of the standard graph notations?

Also I am not sure how "very fast localized traversals" compares to other graph databases. I think it is fair to say the programming style of traversing a small part of a large graph is *different* in neo4j than it is using, say, the sparql query language. I am not sure you can say it is faster, especially given that neo4j is an implementation, while sparql is a standard query language that may be running in a wide variety of implementations from browsers/javascript to various server implementations.
 
b) neo4j is a pragmatic graph database that implements a "property graph" model, with nodes and relationships that can have properties on each, along with labels on nodes in 2.0--I understand that this is all possible in RDF, but it is a bit harder to grasp how properties on predicates work; the concept of RDF reification is a bit unfriendly, at least to newcomers

How are you measuring "pragmatic" in this case? I would say that neo4j will likely be more immediately familiar to a java (or ruby or... OO) programmer than would a logic-based language like sparql, owl2, etc. Although people with SQL experience may be more immediately comfortable with sparql.

I have not seen any anecdotes about pragmatism once people have gone through the learning curves of either neo4j or other graph systems. I can only speak for myself, and I have relatively little neo4j experience.
 
c) under the hood, neo4j stores nodes and relationships separately from their properties (with pointers between them), so traversals don't need to be bogged down by the properties if you don't need to inspect them

RDF per se is a conceptual data model and so does not have anything under the hood. Specific implementations of graph databases that read/write RDF serializations and that provide the sparql query language can employ a wide variety of in-memory, disk-based, distributed, etc. strategies. These are implementation distinctions and there are a variety of them in the wild.
 
d) cypher is a compelling new query language designed to match graph patterns from start points, designed by the neo team for neo4j

Yes, I would summarize the differences as neo4j is an implementation of a graph database. RDF, sparql, owl2, etc. are specifications of a fairly vast array of capabilities (inference, logic, schema, serialization, universal identifiers, etc.) that have many implementations and capabilities that go well beyond what neo4j per se provides. But there's no reason these standards and capabilities could not be fully implemented with neo4j at the core.

Wes Freeman

Sep 6, 2013, 10:40:49 PM
to ne...@googlegroups.com
On Fri, Sep 6, 2013 at 4:42 PM, Patrick Logan <patric...@gmail.com> wrote:
See inline...


On Wednesday, September 4, 2013 6:22:33 PM UTC-7, Wes Freeman wrote:
Maybe I'm not fully qualified to answer this, as I don't have a deep understanding of triple store implementations, but here are a few discussion points:

a) neo4j aims to capture the 95% use case of very fast localized traversals

Where did the figure of 95% use case come from? Do you have any comparisons for how neo4j accomplishes these relative to one of the standard graph notations?

I may have made that number up. I guess I meant the most common use cases for graph databases, rather than being a 100% general solution.
Also I am not sure how "very fast localized traversals" compares to other graph databases. I think it is fair to say the programming style of traversing a small part of a large graph is *different* in neo4j than it is using, say, the sparql query language. I am not sure you can say it is faster, especially given that neo4j is an implementation, while sparql is a standard query language that may be running in a wide variety of implementations from browsers/javascript to various server implementations.
 
It would be cool to back my claim up with benchmarks. Eventually I may be able to. I haven't actually tested any triple stores myself, so unfortunately I'm relying on things I've heard/read. Knowing the internals of Neo, I think it would be hard for a triple store to have superior optimizations for this use case, but that's just speculation.
b) neo4j is a pragmatic graph database that implements a "property graph" model, with nodes and relationships that can have properties on each, along with labels on nodes in 2.0--I understand that this is all possible in RDF, but it is a bit harder to grasp how properties on predicates work; the concept of RDF reification is a bit unfriendly, at least to newcomers

How are you measuring "pragmatic" in this case? I would say that neo4j will likely be more immediately familiar to a java (or ruby or... OO) programmer than would a logic-based language like sparql, owl2, etc. Although people with SQL experience may be more immediately comfortable with sparql.

I have not seen any anecdotes about pragmatism once people have gone through the learning curves of either neo4j or other graph systems. I can only speak for myself, and I have relatively little neo4j experience.
 
It's subjective, sure. But *I think* the property graph model is more pragmatic than a general triple store. Easier to understand how properties on nodes and relationships work. 
c) under the hood, neo4j stores nodes and relationships separately from their properties (with pointers between them), so traversals don't need to be bogged down by the properties if you don't need to inspect them

RDF per se is a conceptual data model and so does not have anything under the hood. Specific implementations of graph databases that read/write RDF serializations and that provide the sparql query language can employ a wide variety of in-memory, disk-based, distributed, etc. strategies. These are implementation distinctions and there are a variety of them in the wild.
 
d) cypher is a compelling new query language designed to match graph patterns from start points, designed by the neo team for neo4j

Yes, I would summarize the differences as neo4j is an implementation of a graph database. RDF, sparql, owl2, etc. are specifications of a fairly vast array of capabilities (inference, logic, schema, serialization, universal identifiers, etc.) that have many implementations and capabilities that go well beyond what neo4j per se provides. But there's no reason these standards and capabilities could not be fully implemented with neo4j at the core.
Agree.