[GraphDB-Bench] Introducing GraphDB-Bench


Alex Averbuch

Oct 16, 2010, 3:03:39 PM
to gremlin-users
Fellow TinkerPop'ers, 
Over the past month or two, with Marko's help, I've been trying to get the GraphDB-Bench project started. If you're not familiar with this project, here's an excerpt from the wiki:

GraphDB-Bench is an easily extensible graph database benchmarking tool. Its goal is to provide an easy-to-use library for defining and running application/domain-specific benchmarks against different graph database implementations. 
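To make "defining benchmarks and running them against different implementations" concrete, here is a minimal Java sketch of the pattern. Everything in it is hypothetical: `SimpleGraph` stands in for a Blueprints `Graph`, and `Operation` and `BenchmarkRunner` are illustrative names, not GraphDB-Bench's actual classes. The idea is that operations are written once against a common interface and timed against whichever backend is plugged in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Blueprints Graph; each database would get its own implementation.
interface SimpleGraph {
    void addEdge(long from, long to);
    List<Long> outNeighbors(long vertex);
}

// In-memory backend, used here only so the sketch runs without a database.
class InMemoryGraph implements SimpleGraph {
    private final Map<Long, List<Long>> adjacency = new HashMap<>();

    @Override
    public void addEdge(long from, long to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new ArrayList<>());
    }

    @Override
    public List<Long> outNeighbors(long vertex) {
        return adjacency.getOrDefault(vertex, Collections.emptyList());
    }
}

// A domain-specific operation; the returned count lets results be sanity-checked across backends.
@FunctionalInterface
interface Operation {
    long execute(SimpleGraph graph);
}

class BenchmarkRunner {
    static final class Result {
        final long count;
        final long nanos;

        Result(long count, long nanos) {
            this.count = count;
            this.nanos = nanos;
        }
    }

    // Runs the named operations in insertion order and records each one's wall-clock time.
    static Map<String, Result> run(SimpleGraph graph, Map<String, Operation> operations) {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Operation> entry : operations.entrySet()) {
            long start = System.nanoTime();
            long count = entry.getValue().execute(graph);
            results.put(entry.getKey(), new Result(count, System.nanoTime() - start));
        }
        return results;
    }
}
```

In use, the same operation map would be passed to `BenchmarkRunner.run` once per `SimpleGraph` implementation, so only the backend changes between runs.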

At this stage the library is "usable" and "functional", but not extensively tested. More importantly, there are still a lot of open questions and things to do:
  • Who should run the benchmarks, and how? One of Marko's ideas is to run benchmarks at regular intervals (e.g. every major release of the TinkerPop stack) and then publish the results on the wiki. I think that's a cool idea, but we need to make the process transparent to remove as much bias as possible. Any ideas about this?
  • As an extension of the last point, how often should official benchmarks be run? Personally, I think that if we want to make results available, it makes little sense to use SNAPSHOT builds, as that's not a repeatable (read: not scientific) process. Thoughts?
  • Currently the library is functional, but it's lacking in content: there is only one benchmark definition, and the operations that comprise it are basic (please see the wiki if you don't understand what I mean). Is anyone keen on helping to flesh out the library with interesting operations/algorithms?
  • Configuration plays a significant role in the performance of different Blueprints Graph implementations (Neo4j, OrientDB, etc.). To avoid claims of bias, we need to ensure that all Graphs are either optimized by experienced people or completely unoptimized. Any thoughts on how, and by whom, this should be handled? In any case, the process should always be transparent and all benchmark parameters should be published.
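One way to keep runs repeatable and transparent, sketched in Java under the assumption that each run is described by a plain config object: pin every dependency version, fix the random seed, record every implementation setting, and refuse to publish when any dependency is a SNAPSHOT. `BenchmarkConfig` and its fields are hypothetical, not part of GraphDB-Bench.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical description of one benchmark run, meant to be published alongside its results.
final class BenchmarkConfig {
    final Map<String, String> dependencyVersions; // artifact name -> exact version used
    final Map<String, String> graphSettings;      // every per-implementation tuning knob, verbatim
    final long randomSeed;                        // fixes random choices such as start vertices

    BenchmarkConfig(Map<String, String> dependencyVersions, Map<String, String> graphSettings, long randomSeed) {
        // TreeMap gives a stable, sorted order so published descriptions are easy to diff.
        this.dependencyVersions = new TreeMap<>(dependencyVersions);
        this.graphSettings = new TreeMap<>(graphSettings);
        this.randomSeed = randomSeed;
    }

    // A SNAPSHOT version can resolve to different builds over time, so the run is not repeatable.
    List<String> snapshotDependencies() {
        List<String> snapshots = new ArrayList<>();
        for (Map.Entry<String, String> e : dependencyVersions.entrySet()) {
            if (e.getValue().endsWith("-SNAPSHOT")) {
                snapshots.add(e.getKey());
            }
        }
        return snapshots;
    }

    boolean isPublishable() {
        return snapshotDependencies().isEmpty();
    }

    // Plain-text dump of every parameter, one key=value per line.
    String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("seed=").append(randomSeed).append('\n');
        for (Map.Entry<String, String> e : dependencyVersions.entrySet()) {
            sb.append("dep.").append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        for (Map.Entry<String, String> e : graphSettings.entrySet()) {
            sb.append("setting.").append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Publishing the output of `describe()` next to the results would let anyone rerun a benchmark with exactly the same versions and settings.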
Ultimately, it would be great if this becomes an open community project like the rest of the TinkerPop stack. I think the graph database community has a lot to gain from a vendor-agnostic graph database evaluation tool.

Looking forward to hearing everyone's thoughts/suggestions/questions/ideas.

Cheers,
Alex

Luca Garulli

Oct 16, 2010, 4:45:33 PM
to gremlin-users
Hi,
this is really good news!

I'm sure that benchmarks will help improve each product even further.

bye,
Luca Garulli

Alex Averbuch

Dec 28, 2010, 8:51:11 AM
to gremli...@googlegroups.com
Hey again,
So it's been quite a while, as I was busy with a bunch of other things, but I've now written some more detailed examples. If you haven't read the documentation, please look there first. Some of it may be slightly outdated, but it's generally correct and by far the best place to start.

The docs are here (refer to Sections 3 & 4):

You can find the new example Benchmarks here:

And the new Operation definitions here:

I've also added a bit more to the library, making it possible to open/close/delete a Graph instance during a Benchmark run. Previously this wasn't possible, so we couldn't do things like:
  • Flush a Graph's cache between sequences of operations (now happens by default)
  • Test the write performance by loading multiple different graphs into a Graph instance (there was no way to delete the instance... no, Graph.clear() isn't good enough for this purpose).
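The lifecycle described above (open, run a sequence, close to flush caches, and finally delete the instance) might look roughly like this. `GraphLifecycle` and `LifecycleRunner` are hypothetical names for illustration, not the library's API; `RecordingGraph` is a test double that logs calls instead of touching a database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycle hooks for a graph database instance (illustrative only).
interface GraphLifecycle {
    void open();
    void close();  // shutting down releases in-memory caches
    void delete(); // removes the instance itself; clearing its contents is not equivalent
}

class LifecycleRunner {
    // Opens the instance before each sequence and closes it afterwards, so every sequence
    // starts with cold caches; optionally deletes the instance once all sequences are done.
    static void runSequences(GraphLifecycle graph, List<Runnable> sequences, boolean deleteAtEnd) {
        for (Runnable sequence : sequences) {
            graph.open();
            try {
                sequence.run();
            } finally {
                graph.close();
            }
        }
        if (deleteAtEnd) {
            graph.delete();
        }
    }
}

// Test double that records lifecycle calls instead of touching a real database.
class RecordingGraph implements GraphLifecycle {
    final List<String> events = new ArrayList<>();

    @Override public void open() { events.add("open"); }
    @Override public void close() { events.add("close"); }
    @Override public void delete() { events.add("delete"); }
}
```

Closing in a `finally` block means the next sequence still starts from a cold cache even if an operation throws.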
I believe the library is now robust and expressive enough to create meaningful benchmarks and build a benchmark suite. Luca, Marko, Peter: it would be AWESOME if you guys could submit some operations. Gremlin/Pipes/Blueprints code is all welcome... even plain English descriptions if that's all you can offer, as long as they're clear and detailed enough.

As an incentive, Marko recently let me know that the guys from DAMA-UPC (home of the DEX GraphDB) have published their benchmark results (Neo4j vs Jena vs HypergraphDB vs DEX). 
Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

I'd love to do that too!

Thanks in advance for all contributions!

Alex

Luca Garulli

Dec 28, 2010, 10:17:32 AM
to gremlin-users
Hi Alex,
well done! I'm trying to run some benchmarks, but "mvn install" fails with an error:

[INFO] Unable to find resource 'com.tinkerpop:blueprints:pom:0.4-SNAPSHOT' in repository maven repository (http://mvnrepository.com)
...

Are the pom files correctly published?

Lvc@

Marko Rodriguez

Dec 28, 2010, 11:07:41 AM
to gremli...@googlegroups.com
Hi,

[INFO] Unable to find resource 'com.tinkerpop:blueprints:pom:0.4-SNAPSHOT' in repository maven repository (http://mvnrepository.com)
...

Are the pom files correctly published?

No, they were not. I just deployed the latest SNAPSHOT build. Should work now.

See ya,
Marko.

Luca Garulli

Dec 28, 2010, 12:02:36 PM
to gremlin-users
The problem still persists: the pom is unavailable (404).

Lvc@

Marko Rodriguez

unread,
Dec 28, 2010, 12:35:00 PM12/28/10
to gremli...@googlegroups.com
Hi,

Doh---the jar was corrupt?! ... Never had that happen before. I just redeployed. Cleared out my .m2/blueprints and built GraphDB-Bench. Should work now.... 

See ya,
Marko.

Luca Garulli

Dec 28, 2010, 12:56:29 PM
to gremlin-users
Still no luck:

C:\work\dev\os\tinkerpop\graphdb-bench>mvn clean install
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building GraphDB-Bench: A Benchmark Suite for GraphDBs
[INFO]    task-segment: [clean, install]
[INFO] ------------------------------------------------------------------------
[INFO] [clean:clean {execution: default-clean}]
[INFO] Deleting directory C:\work\dev\os\tinkerpop\graphdb-bench\target
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Unable to find resource 'com.tinkerpop:blueprints:pom:0.4-SNAPSHOT' in repository maven repository (http://mvnrepository.com)
[INFO] Unable to find resource 'com.tinkerpop:blueprints:pom:0.4-SNAPSHOT' in repository tinkerpop-repository (http://tinkerpop.com/maven2)
[INFO] snapshot com.orientechnologies:orientdb-core:0.9.25-SNAPSHOT: checking for updates from orientechnologies-repository
[INFO] snapshot com.orientechnologies:orientdb-parent:0.9.25-SNAPSHOT: checking for updates from orientechnologies-repository
[INFO] snapshot com.orientechnologies:orient-commons:0.9.25-SNAPSHOT: checking for updates from orientechnologies-repository
[INFO] Unable to find resource 'com.tinkerpop:blueprints:jar:0.4-SNAPSHOT' in repository maven repository (http://mvnrepository.com)
[INFO] Unable to find resource 'com.tinkerpop:blueprints:jar:0.4-SNAPSHOT' in repository tinkerpop-repository (http://tinkerpop.com/maven2)
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:
----------
1) com.tinkerpop:blueprints:jar:0.4-SNAPSHOT

  Try downloading the file manually from the project website.

  Then, install it using the command:
      mvn install:install-file -DgroupId=com.tinkerpop -DartifactId=blueprints -Dversion=0.4-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file

  Alternatively, if you host your own repository you can deploy the file there:
      mvn deploy:deploy-file -DgroupId=com.tinkerpop -DartifactId=blueprints -Dversion=0.4-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

  Path to dependency:
        1) com.tinkerpop:graphdb-bench:jar:0.1-SNAPSHOT
        2) com.tinkerpop:blueprints:jar:0.4-SNAPSHOT

----------
1 required artifact is missing.

for artifact:
  com.tinkerpop:graphdb-bench:jar:0.1-SNAPSHOT

Lvc@

Marko Rodriguez

Dec 28, 2010, 3:20:22 PM
to gremli...@googlegroups.com

Marko Rodriguez

Dec 28, 2010, 3:22:50 PM
to gremlin-users
Sorry--I mean: 
http://tinkerpop.com/maven2/com/tinkerpop/blueprints/0.4-SNAPSHOT/blueprints-0.4-20101228.172524-2.pom

and..

marko:~/.m2/repository/com/tinkerpop$ rm -rf blueprints/
marko:~/.m2/repository/com/tinkerpop$ 

marko:~$ cd software/graphdb-bench/
marko:~/software/graphdb-bench$ mvn clean install
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building GraphDB-Bench: A Benchmark Suite for GraphDBs
[INFO]    task-segment: [clean, install]
[INFO] ------------------------------------------------------------------------
[INFO] [clean:clean {execution: default-clean}]
[INFO] Deleting directory /Users/marko/software/graphdb-bench/target
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] snapshot com.tinkerpop:blueprints:0.4-SNAPSHOT: checking for updates from maven repository
[INFO] snapshot com.tinkerpop:blueprints:0.4-SNAPSHOT: checking for updates from tinkerpop-repository
8K downloaded  (blueprints-0.4-20101228.172524-2.pom)
[INFO] snapshot com.tinkerpop:blueprints:0.2.1-SNAPSHOT: checking for updates from maven repository
[INFO] snapshot com.tinkerpop:blueprints:0.2.1-SNAPSHOT: checking for updates from tinkerpop-repository
7K downloaded  (blueprints-0.2.1-20101210.222409-23.pom)
....
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30 seconds
[INFO] Finished at: Tue Dec 28 13:22:29 MST 2010
[INFO] Final Memory: 37M/83M
[INFO] ------------------------------------------------------------------------
marko:~/software/graphdb-bench$ 

Alex Averbuch

Dec 28, 2010, 4:05:47 PM
to gremli...@googlegroups.com
Hey Luca,
I just deleted ~/.m2/repository/com/tinkerpop/ completely, then ran mvn clean install and it was successful.

Have you been able to solve the problem yet?

Luca Garulli

Dec 29, 2010, 12:23:22 PM
to gremlin-users
Hi,
yes, this time it worked. I'm going to play with it.

Thx,
Lvc@

Alex Averbuch

Jan 3, 2011, 5:55:57 AM
to gremli...@googlegroups.com
Hey everyone,

The first round of results is here:
Note, though, that these benchmarks were run on relatively/extremely (depending on who you ask) small graphs due to time constraints. This is because loading graphs containing millions of elements takes a considerable amount of time. We'll need to find a way of working around this in future, but at the moment the graphs are created (loaded from GraphML) every time a benchmark is run.

For those that have seen this page already, note that the Echo benchmark is now run against a 100,000v 500,000e graph (10x bigger than before).

Lastly, I've just started a new run of the benchmark suite and have included one more graph size: 1,000,000v 5,000,000e.
This will take forever and a day to complete because at present the write performance of OrientDB is very limited (it's in the process of being overhauled), but I'll let everyone know once it's finished.

Cheers,
Alex

Gian Luca Farina Perseu

Jan 3, 2011, 5:58:48 AM
to gremli...@googlegroups.com
Very interesting document.

Thank you Alex !

Gian Luca Farina Perseu
www.21-style.com

Alex Averbuch

Jan 3, 2011, 6:08:11 AM
to gremli...@googlegroups.com
Hi Gian,
Just one more thing, please take the current results with a big grain of salt. 
At present they don't tell us much about real-world performance because the graphs are so small and we have not yet written complex traversals to benchmark against.

We're working on making this much more extensive though!

Gian Luca Farina Perseu

Jan 3, 2011, 6:09:37 AM
to gremli...@googlegroups.com
ACK ;-)

Marko Rodriguez

Jan 3, 2011, 9:51:31 AM
to gremli...@googlegroups.com
Hi,

> Just one more thing, please take the current results with a big grain of salt.
> At present they don't tell us much about real-world performance because the graphs are so small and we have not yet written complex traversals to benchmark against.

While the graph is small, note how many elements are being traversed. EchoTraversal is a nasty traversal that reverberates on every iteration. On a 1 million vertex/4 million edge graph, a depth 5 traversal *returned* (WAY LESS than *touched*): 358,765,631 vertices. The touch is in the billions.
http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html

So while the graphs may be small, note the amount of processing being done on them: lots and lots of reads. However, if you want to test at a scale where the whole graph can't simply be cached, then yes, a larger graph would be great.
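A rough illustration of why elements *touched* far exceed elements *returned*: the Java sketch below counts, for a depth-limited traversal along out-edges that does not deduplicate paths, how many results come back at the final depth and how many edge steps are taken across all depths. This is a generic traversal, not the EchoTraversal itself (its definition isn't given in this thread), and `WalkCounter` is a hypothetical name. It tracks path counts per vertex rather than enumerating paths, so it runs quickly; a real traversal that emits paths one at a time would do work proportional to the "touched" count.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WalkCounter {
    // For a depth-limited traversal that follows out-edges without deduplicating paths
    // (vertices may be revisited), returns {returned, touched}:
    //   returned = number of paths of exactly `depth` steps starting at `start`
    //   touched  = number of edge steps such a traversal takes across all depths
    static long[] count(Map<Integer, List<Integer>> outEdges, int start, int depth) {
        // frontier: vertex -> number of distinct paths currently ending at that vertex
        Map<Integer, Long> frontier = new HashMap<>();
        frontier.put(start, 1L);
        long touched = 0;
        for (int step = 0; step < depth; step++) {
            Map<Integer, Long> next = new HashMap<>();
            for (Map.Entry<Integer, Long> entry : frontier.entrySet()) {
                long paths = entry.getValue();
                for (int neighbor : outEdges.getOrDefault(entry.getKey(), Collections.emptyList())) {
                    next.merge(neighbor, paths, Long::sum);
                    touched += paths; // every path ending here is extended along this edge
                }
            }
            frontier = next;
        }
        long returned = 0;
        for (long paths : frontier.values()) {
            returned += paths;
        }
        return new long[] {returned, touched};
    }
}
```

On a graph where every vertex has out-degree k, depth d returns k^d results and touches k + k^2 + ... + k^d edges. The counters are `long` on purpose: 1,216,390,275 still fits in a Java `int` (maximum 2,147,483,647), but touched counts in the billions would overflow it.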

See ya,
Marko

http://markorodriguez.com

Alex Averbuch

Jan 3, 2011, 10:01:07 AM
to gremli...@googlegroups.com
True, for these benchmark runs I can confirm that the maximum number of RETURNED (not touched) vertices at Depth=7 is 1,216,390,275.

Alfredo Serafini

Jan 27, 2012, 10:48:36 AM
to gremli...@googlegroups.com
Hi, I can't find the benchmark suite on GitHub anymore.
Is it still in active development?

thanks,
Alfredo

Marko Rodriguez

Jan 27, 2012, 10:51:19 AM
to gremli...@googlegroups.com
Hello,

The primary developer of GraphDB-Bench stopped working on the project and thus, the project was moved to the Tinkubator.


In short, it is a nearly dead project.

Marko.

Alfredo Serafini

Jan 27, 2012, 11:27:59 AM
to gremli...@googlegroups.com
Thanks Marko,

it could help me as a base for some specific tests.

(BTW, sorry to hear the project is nearing its end.)

Alex Averbuch

Jan 28, 2012, 2:52:46 PM
to gremli...@googlegroups.com
Hi Alfredo,
I was the initial developer of the library, but as Marko said I haven't worked on it in a while.

If you can share what you would like to use it for, I'd love to hear.

FYI, although it's been dormant for a while, a number of others, from academia, have used the project recently, and two graph database benchmarking papers have-been/are-being published as a result...
but their contributions to the library have not made it back to TinkerPop yet.

Cheers,
Alex

Alfredo Serafini

Feb 4, 2012, 8:09:49 AM
to gremli...@googlegroups.com
Hi Alex

Thanks for the answer.

At my company we are experimenting with TinkerPop as the stack for a new project, and in this context I'd like to test the performance of massive RDF loads via the Sail interface over different triplestores, as well as atomic insert/update/delete operations in SPARQL 1.1.
In short: I'd like to run some tests/benchmarks against Virtuoso via the Sail interface, or against a graph implementation as well (most likely Neo4j or OrientDB), so the idea of a test "suite" really interests me. (Ideally, the repository/SPARQL endpoint and the RDF data to import would be parameters.)
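A parameterized load benchmark of this kind could take the target store and the data as inputs. In this Java sketch, `TripleSink` is a hypothetical abstraction (not the Sesame Sail API) that a Sail-backed or SPARQL-endpoint-backed adapter would implement; triples are passed as plain string arrays, and the batch size is an assumed parameter controlling how often the load commits.

```java
import java.util.List;

// Hypothetical adapter interface; a real one would wrap a Sail connection or a SPARQL endpoint.
interface TripleSink {
    void add(String subject, String predicate, String object);
    void commit();
}

final class LoadResult {
    final long triples;
    final long commits;
    final long nanos;

    LoadResult(long triples, long commits, long nanos) {
        this.triples = triples;
        this.commits = commits;
        this.nanos = nanos;
    }
}

class LoadBenchmark {
    // Adds triples in batches of batchSize, committing after each full batch and once for any remainder.
    static LoadResult load(TripleSink sink, List<String[]> triples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long start = System.nanoTime();
        long commits = 0;
        int inBatch = 0;
        for (String[] t : triples) {
            sink.add(t[0], t[1], t[2]);
            inBatch++;
            if (inBatch == batchSize) {
                sink.commit();
                commits++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            sink.commit();
            commits++;
        }
        return new LoadResult(triples.size(), commits, System.nanoTime() - start);
    }
}

// Counting sink so the harness can be exercised without a real triplestore.
class CountingSink implements TripleSink {
    int added;
    int committed;

    @Override
    public void add(String subject, String predicate, String object) {
        added++;
    }

    @Override
    public void commit() {
        committed++;
    }
}
```

Commit frequency is likely to affect bulk-load throughput, so the batch size would be worth publishing along with the timings.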

Are these papers readable online?

Thanks for any help or suggestions.

Alfredo

Pierre De Wilde

Feb 4, 2012, 10:09:04 AM
to gremli...@googlegroups.com
Hey,

Cool! When done, don't forget to publish your benchmarks via this mailing list.

Thanks,
Pierre

Peter Neubauer

Feb 4, 2012, 10:13:28 AM
to gremli...@googlegroups.com

Yes,
That could even be relevant if and when the Linked Data Benchmarking Council gets off the ground through EU funding later this year.

Sent from a device with a crappy keyboard and autocorrection.

/peter
