Hello,
For those who like Gremlin and have Big Graph Data, please direct your attention to Faunus.
Faunus is a Hadoop-based graph computing framework that provides a breadth-first implementation of Gremlin. With Faunus, you simply use the Gremlin REPL and Gremlin expressions to rip out a series of MapReduce jobs over graphs distributed across a machine cluster.
[Titan]
gremlin> g.V.out.out.count()
==>28
[Faunus]
gremlin> g.V.out.out.count()
12/09/21 01:52:42 INFO mapreduce.FaunusCompiler: Compiled to 3 MapReduce job(s)
12/09/21 01:52:42 INFO mapreduce.FaunusCompiler: Executing job 1 out of 3: MapSequence[com.thinkaurelius.faunus...
...
==>28
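For intuition on what "breadth-first" means here, the following is a minimal single-process Python sketch, not Faunus code. It assumes the engine keeps a path count per vertex and pushes those counts along out-edges one step at a time, rather than enumerating individual paths. The graph `EDGES` and the helpers `out_step` and `count_paths` are hypothetical names made up for illustration, and the graph is not the Graph of the Gods.

```python
from collections import defaultdict

# Hypothetical toy graph as an adjacency list of out-edges.
EDGES = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}


def out_step(counts, edges):
    """One breadth-first 'out' step: push each vertex's path count to its out-neighbors."""
    next_counts = defaultdict(int)
    for vertex, paths in counts.items():
        for neighbor in edges.get(vertex, []):
            next_counts[neighbor] += paths
    return dict(next_counts)


def count_paths(edges, steps):
    """Analogue of g.V followed by `steps` out steps and a final count()."""
    # g.V: every vertex starts with exactly one path (itself).
    counts = {vertex: 1 for vertex in edges}
    for _ in range(steps):
        counts = out_step(counts, edges)
    # count(): sum the path counts left on the vertices.
    return sum(counts.values())


two_step_paths = count_paths(EDGES, 2)
```

Each call to `out_step` touches every edge once, so the work per step grows with the number of edges rather than with the number of paths. Under that assumption, a path count can become very large without any single path being materialized.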
.......what about a degree distribution?
gremlin> g.V.sideEffect('{it.degree = it.outE.count()}').degree.groupCount
12/09/21 01:56:05 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
...
==>0 7
==>1 1
==>3 1
==>4 2
==>5 1
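The degree distribution above is a group count over out-degrees: compute each vertex's out-degree, then count how many vertices share each value. A minimal Python sketch of the same idea on a hypothetical adjacency list (again not the Graph of the Gods, and not how Faunus executes it) could look like this; `EDGES` and `out_degree_distribution` are illustrative names only.

```python
from collections import Counter

# Hypothetical toy graph as an adjacency list of out-edges.
EDGES = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}


def out_degree_distribution(edges):
    """Map each out-degree to the number of vertices that have it."""
    # Analogue of the sideEffect: compute a degree value per vertex.
    degrees = {vertex: len(outs) for vertex, outs in edges.items()}
    # Analogue of groupCount: count vertices per degree value.
    return Counter(degrees.values())


distribution = out_degree_distribution(EDGES)
for degree in sorted(distribution):
    print(degree, distribution[degree])
```

The printed rows follow the same layout as the REPL output: the out-degree on the left and the number of vertices with that out-degree on the right.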
......but that is just over the toy Graph of the Gods dataset :|.
The most fun number to date with Faunus came yesterday: we ran a 4-step traversal off g.V on a social network dataset (Orkut) that yielded 308 trillion paths in 30 minutes on an 8-machine m1.xlarge cluster in EC2.
Finally, note that Faunus currently works over any Rexster-fronted, Blueprints-enabled graph database, natively with Titan (via HBase and Cassandra Hadoop connectivity), and with binary/text representations of graphs in HDFS.
Please enjoy... and if you are good with Hadoop, please feel free to contribute to the effort,
Marko.