Re: [TinkerPop] Neo4J, OrientDB: Relative merits and demerits

daniel...@gmail.com

Jul 16, 2012, 11:15:13 PM

to gremli...@googlegroups.com

Hey Kapali,

Honestly, the requirements as stated are too vague to give a good answer. Having tried all 3, I have scaled Neo to 30-60 million relations. I believe Orient also works at about that scale (Luca?). They are both great choices in production. Titan is still in beta. I am personally interested to see how it develops, and having used Cassandra in the past, I would say Titan looks very promising, but it is not something I would roll out to production for a couple of releases. Neo has good scaling options if big data is not your problem and you want high availability on a modest graph (say millions of relations but not billions). Pipes is really what you want to use RIGHT NOW. It is robust, scalable, and very high performance. Even with big data, Pipes can be great in some impressive use cases. Also, the guys on this list are wonderful in terms of answering questions.

Please correct me if I am wrong, I don't want to start something... Just relating some personal experiences.
HTH
Dan

On Jul 16, 2012, at 8:40 AM, Kapali Viswanathan <kapa...@gmail.com> wrote:

> Hello
>
> I need to choose the best graph database for my project: Neo4J, OrientDB, or Titan. What database is best suited for which application?
>
> I am interested in technological (functionality, correctness, quality, performance, ease-of-programming, scaling, and reliability) and commercial (licensing, cost, support, software evolution frequency) aspects of the database technologies.
>
> Can anyone help me with a pointer to a reliable document on the web?
>
> Thanks in advance.
>
> Regards
> Kapali

Kapali Viswanathan

Jul 17, 2012, 1:35:15 AM

to gremli...@googlegroups.com

Hello Dan:

Many thanks for sharing your experience on this subject. Although I am aware of Pipes, I did not know about its value. Thanks for the tip -- I shall investigate Pipes for regular use in my project.

My project requirements are similar to the application that Marko has been describing in his lectures about Twitter followers and web page references, but in a completely different setting -- the number of web pages can be expected to grow faster with time than the number of Twitter users (theoretically capped by the human population at 6-7 billion). There will be in-line and off-line queries on the graph in my project. Since my posting yesterday, I am starting to like Titan for at least one reason: its objective of being an infinite graph. Although I am not able to get any information about OrientDB, I understand that Neo4J has a limitation of around 30-50 billion vertices, edges, and properties, which is limiting for my purpose. Such a limit is a shortcoming when considering recording web pages as nodes in a graph. Since my project uses HBase independently of the graph database, Titan may again be a natural fit.

But, as you mentioned, my only concern is that Titan is still in its early stages. I see my choice as either to grow along with Titan or to go with a mature graph DB and then eventually move to Titan when it is ready. The latter choice will work only if there is a migration tool that migrates data from the mature graph DB to Titan. I am not aware of any tool support for that at this stage -- especially because Titan is in its inception.

If there is any more advice, I would be happy to receive it. I believe that early advice is the best advice.

Best regards

Kapali

Matthias Broecheler

Jul 17, 2012, 1:42:11 AM

to gremli...@googlegroups.com

If you implement your application at the TinkerPop/Blueprints level, then you can migrate between implementations relatively easily.
That would allow you to use Neo4J/OrientDB initially until you are comfortable with Titan. While data migration is never trivial, TinkerPop has export/import tools to make it easier.
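
As a rough illustration of the idea (coding against a backend-neutral interface so that the store underneath can be swapped and the data copied across), here is a minimal Python sketch. It is not the Blueprints API: `GraphStore`, `InMemoryGraph`, and `copy_graph` are hypothetical names made up for this example, and a real migration would go through the actual Blueprints implementations and TinkerPop's own export/import tools.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Edge = Tuple[str, str, str]  # (source id, label, target id)
Vertex = Tuple[str, Dict[str, object]]  # (vertex id, properties)


class GraphStore(Protocol):
    """Hypothetical backend-neutral interface (NOT the Blueprints API)."""

    def add_vertex(self, vertex_id: str, properties: Dict[str, object]) -> None: ...
    def add_edge(self, source: str, label: str, target: str) -> None: ...
    def vertices(self) -> Iterable[Vertex]: ...
    def edges(self) -> Iterable[Edge]: ...


class InMemoryGraph:
    """Toy backend standing in for any concrete graph database."""

    def __init__(self) -> None:
        self._vertices: Dict[str, Dict[str, object]] = {}
        self._edges: List[Edge] = []

    def add_vertex(self, vertex_id: str, properties: Dict[str, object]) -> None:
        # Copy the properties so the two stores never share mutable state.
        self._vertices[vertex_id] = dict(properties)

    def add_edge(self, source: str, label: str, target: str) -> None:
        if source not in self._vertices or target not in self._vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self._edges.append((source, label, target))

    def vertices(self) -> List[Vertex]:
        return list(self._vertices.items())

    def edges(self) -> List[Edge]:
        return list(self._edges)


def copy_graph(source: GraphStore, target: GraphStore) -> None:
    """Export everything from `source` and import it into `target`.

    Vertices are copied first so that every edge finds both endpoints.
    """
    for vertex_id, properties in source.vertices():
        target.add_vertex(vertex_id, properties)
    for src, label, dst in source.edges():
        target.add_edge(src, label, dst)


# Application code only ever talks to the GraphStore interface.
old_db = InMemoryGraph()
old_db.add_vertex("page:a", {"url": "http://example.org/a"})
old_db.add_vertex("page:b", {"url": "http://example.org/b"})
old_db.add_edge("page:a", "links_to", "page:b")

new_db = InMemoryGraph()
copy_graph(old_db, new_db)
print(new_db.edges())
```

Because `copy_graph` depends only on the interface, the same function would work unchanged for any pair of backends that implement those four methods.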

Kapali Viswanathan

Jul 17, 2012, 2:17:22 AM

to gremli...@googlegroups.com

Hi Matthias:

Thanks for the advice. For the reasons you state, I have been contemplating working at the TinkerPop/Blueprints level and using the imperative query language (Gremlin) for graph queries, although I really like the functional query language in Neo4J: Cypher. Since I am keen on using as much Scala as possible in my project, Cypher is conceptually compatible. Cypher also appears to closely resemble the SQL "Select From Where" syntax with its "Start Match Where Return" syntax.

Since a majority of my thoughts are voting for growing with Titan, please can you share any tentative or even sketchy road-map for Titan's development?

PS: I am aware that summer 2012 is/was the first major milestone for release of a version of Titan. Indian summer is drawing to a close but the European summer may just be starting.

Best regards

Kapali

Luca Garulli

Jul 17, 2012, 3:39:33 AM

to gremli...@googlegroups.com

Hi,

About OrientDB: the home page (http://code.google.com/p/orient/) states a limit of 9.223.372.036 billion records and a maximum of 19.807.040.628.566.084 terabytes. In OrientDB each vertex and each edge is a record, so the theoretical limit is so large that I think no one can reach it today.

AFAIK there are users working with a few tens of billions of records. In OrientDB we're working hard to provide a super scalable solution with a multi-master architecture, and the upcoming release will be focused on this. Version 1.1.0 will be released on July 23rd, with the ability to split the graph among multiple servers.

In the end, Matthias's suggestion seems reasonable: Blueprints is the common API across all of them, so just try them and see which technology is best for your use case.

Lvc@

Kapali Viswanathan

Jul 17, 2012, 5:24:38 AM

to gremli...@googlegroups.com

Hi Luca:

Thanks for the inputs and advice. I am in general agreement with exploring and using the TinkerPop API so as to keep the source code independent of the graph database instance. I am starting to think that the TinkerPop API is to graph databases what ODBC (or JDBC) is to an RDBMS.

In the past week, I have installed OrientDBGraph and I am experimenting with it as well as with Neo4J and Titan. I did read the numbers that you have mentioned, but I was unable to translate them into numbers of vertices, edges, and properties because I do not have insight into graph databases and their internal workings.

Can I safely assume that OrientDB can store and work on a graph with (say) 3 Billion Billion Vertices, 3 Billion Billion Edges and 3 Billion Billion Properties?

Also, for the sake of clarity, please can you express 19.807.040.628.566.084 Terabytes in a different order of magnitude, as described at the following link: http://en.wikipedia.org/wiki/Orders_of_magnitude_(data)? Is the number approximately equal to 19,807,040,628 exabytes (pardon the use of commas instead of dots to separate groups of thousands)? This will help me understand the on-disk storage implications better.
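
As a rough check of the conversion asked about above, here is a small Python sketch. It assumes decimal (SI) prefixes, i.e. 1 TB = 10^12 bytes and 1 EB = 10^18 bytes; the variable names are made up for the example. The last line is only an observation about the arithmetic (the quoted figure coincides with 2^94 bytes expressed in whole terabytes), not a statement about how OrientDB derives the number.

```python
# Assumption: decimal (SI) prefixes, not binary ones (TiB/EiB).
TB = 10**12  # bytes in one terabyte
EB = 10**18  # bytes in one exabyte
YB = 10**24  # bytes in one yottabyte

# Figure quoted from the OrientDB home page, read as terabytes.
quoted_terabytes = 19_807_040_628_566_084

total_bytes = quoted_terabytes * TB
whole_exabytes = total_bytes // EB
whole_yottabytes = total_bytes // YB

print(f"{whole_exabytes:,} EB")    # prints "19,807,040,628 EB"
print(f"{whole_yottabytes:,} YB")  # prints "19,807 YB"

# Observation only: 2**94 bytes, rounded down to whole terabytes,
# gives exactly the quoted figure.
matches_two_pow_94 = (2**94) // TB == quoted_terabytes
print(matches_two_pow_94)          # prints "True"
```

So, under these assumptions, 19,807,040,628 exabytes is the right order of magnitude, or roughly 19.8 thousand yottabytes.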

Best regards

Kapali

Luca Garulli

Jul 17, 2012, 6:04:07 AM

to gremli...@googlegroups.com

On 17 July 2012 11:24, Kapali Viswanathan <kapa...@gmail.com> wrote:

> Hi Luca:
>
> Can I safely assume that OrientDB can store and work on a graph with (say) 3 Billion Billion Vertices, 3 Billion Billion Edges and 3 Billion Billion Properties?

I've updated the wiki. The actual limit of OrientDB is higher: about 302,231 x 10^18 records. The count works out as 2^63 (about 9.223.372.036 billion) records per cluster x 32,768 (2^15) clusters = 2^78 (about 302,231,454,903,657 billion) records in total for a database. You can have an unlimited number of databases on a server. Cross-references between databases are not supported. Please note that these are theoretical numbers that have never been tested, but the internal representation of the record is 78 bits.

So you could have, for example, 100,000 billion billion vertices and 200,000 billion billion edges. What about properties? In OrientDB, properties are stored inside the record, so the number of properties does not matter; the limit is the record size, which is currently about 2^30 bytes = 1 GB.
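
For anyone who wants to check the arithmetic above, here is a short Python sketch. It only reproduces the multiplication from the figures quoted in this message; the constant names are made up for the example, and the numbers remain theoretical limits, not tested capacities.

```python
BILLION = 10**9

# Figures as quoted in this message.
RECORDS_PER_CLUSTER = 2**63
CLUSTERS = 2**15
MAX_RECORDS = RECORDS_PER_CLUSTER * CLUSTERS  # 63 + 15 = 78 bits

print(f"{RECORDS_PER_CLUSTER // BILLION:,} billion per cluster")
# prints "9,223,372,036 billion per cluster"
print(f"{MAX_RECORDS // BILLION:,} billion in total")
# prints "302,231,454,903,657 billion in total"

# The example split: every vertex and every edge is one record.
vertices = 100_000 * BILLION * BILLION  # 1e23
edges = 200_000 * BILLION * BILLION     # 2e23
fits = vertices + edges <= MAX_RECORDS
print(fits)  # prints "True"

# Properties live inside the record, so the relevant bound is the record size.
MAX_RECORD_BYTES = 2**30
print(f"{MAX_RECORD_BYTES:,} bytes per record")
# prints "1,073,741,824 bytes per record"
```

The example split of 1e23 vertices plus 2e23 edges comes to 3e23 records, just under 2^78 (about 3.02e23).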

Lvc@

Kapali Viswanathan

Jul 17, 2012, 9:35:00 AM

to gremli...@googlegroups.com

Hi Luca:

Many thanks for these inputs. Now, I can make a comfortable choice.

PS: In order to advertise these properties of OrientDB better, I feel that you could consider having a wiki topic for contributions by OrientDB users, where they can share their usage patterns. Having an open commenting option on the wiki topic (possibly using Disqus?) can help in generating awareness of such otherwise hidden properties of products. That way your users can also contribute to OrientDB, just as the OrientDB committers do.

Have a nice day.

Best regards

Kapali

Luca Garulli

Jul 17, 2012, 9:50:14 AM

to gremli...@googlegroups.com

Hi,

Someone once told us that "OrientDB is 99% substance and 1% marketing". Indeed, marketing, advertising, and good presentation of content are a weak point that we are aware of and are working to improve.

Thank you for the suggestions,

Lvc@

daniel...@gmail.com

Jul 18, 2012, 9:31:58 AM

to gremli...@googlegroups.com

Glad I could help :)

Dan
