Gremlin Mashups

James Thornton

Oct 14, 2011, 1:56:45 PM

to gremli...@googlegroups.com

Nathan Marz is the guy who built Storm, the real-time computation system that Twitter open-sourced.

Yesterday he posted an article "How to Beat the CAP Theorem" (http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html, http://news.ycombinator.com/item?id=3108087) where he proposes a batch + real-time mashup model, and uses Hadoop + ElephantDB for batch and Storm + Cassandra for real-time.
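A minimal sketch of the batch + real-time merge idea, assuming the data is a stream of timestamped edge events and the query is a vertex's in-degree. The event shape and the names `batch_view`, `realtime_view`, and `query_in_degree` are hypothetical; the batch function only plays the role Hadoop + ElephantDB play in the article, and the real-time one the role of Storm + Cassandra. The batch view covers everything up to a cutoff, the real-time view covers only what came after it, and a query merges the two.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event shape: (timestamp, source_vertex, target_vertex),
# e.g. one "follows" edge per event.
Edge = Tuple[int, str, str]


def batch_view(events: Iterable[Edge], cutoff: int) -> Counter:
    """Recompute in-degree from scratch over every event up to `cutoff`
    (the job Hadoop + ElephantDB do in the article)."""
    return Counter(dst for ts, _src, dst in events if ts <= cutoff)


def realtime_view(events: Iterable[Edge], cutoff: int) -> Counter:
    """Count only events newer than `cutoff`, one event at a time
    (the job Storm + Cassandra do)."""
    view: Counter = Counter()
    for ts, _src, dst in events:
        if ts > cutoff:
            view[dst] += 1
    return view


def query_in_degree(vertex: str, batch: Counter, realtime: Counter) -> int:
    """Answer a query by merging both views; for counts the merge is a sum."""
    return batch[vertex] + realtime[vertex]


events: List[Edge] = [(1, "a", "b"), (2, "c", "b"), (3, "a", "c"), (4, "d", "b")]
cutoff = 2  # the last timestamp the most recent batch run has absorbed
batch = batch_view(events, cutoff)
realtime = realtime_view(events, cutoff)
print(query_in_degree("b", batch, realtime))  # → 3
print(query_in_degree("c", batch, realtime))  # → 1
```

For counts the merge is a plain sum because the two views cover disjoint time ranges; other queries would need their own merge function.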

Here is another article by the same guy on Hadoop graph schemas (http://nathanmarz.com/blog/thrift-graphs-strong-flexible-schemas-on-hadoop.html).

It would be interesting to see how to use Neo4j for the real-time piece and a Hadoop graph or GoldenOrb/Pregel (http://www.goldenorbos.org/) implementation for the batch-processing piece, and then use Gremlin to run a unified query that mashes up the two.

This might involve creating an ElephantDB Blueprints implementation and then providing a way for Gremlin to do a mashup using two Blueprints sources.
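A rough sketch of what "a mashup using two graph sources" could look like underneath a traversal, assuming each source only needs to answer "what are the out-neighbors of this vertex?". `GraphSource`, `DictGraph`, `UnionGraph`, and `two_hop` are hypothetical names for illustration, not Blueprints or Gremlin API. The union view queries every source and yields each neighbor once, so an edge present on both the real-time and batch sides is not double-counted.

```python
from typing import Dict, Iterable, Iterator, List, Protocol, Set


class GraphSource(Protocol):
    """Hypothetical minimal read interface standing in for a Blueprints graph."""

    def out_neighbors(self, vertex: str) -> Iterable[str]: ...


class DictGraph:
    """In-memory adjacency lists; stands in for either side of the mashup
    (a Neo4j real-time graph or a batch-built graph)."""

    def __init__(self, adjacency: Dict[str, List[str]]) -> None:
        self.adjacency = adjacency

    def out_neighbors(self, vertex: str) -> Iterable[str]:
        return self.adjacency.get(vertex, [])


class UnionGraph:
    """Read-only view over several sources. A neighbor reported by more than
    one source is yielded once, e.g. after the batch side has caught up with
    an edge the real-time side already had."""

    def __init__(self, *sources: GraphSource) -> None:
        self.sources = sources

    def out_neighbors(self, vertex: str) -> Iterator[str]:
        seen: Set[str] = set()
        for source in self.sources:
            for neighbor in source.out_neighbors(vertex):
                if neighbor not in seen:
                    seen.add(neighbor)
                    yield neighbor


def two_hop(graph: GraphSource, start: str) -> Set[str]:
    """Distinct vertices two out-steps away, similar in spirit to chaining
    two `out` steps in a Gremlin traversal."""
    return {n2 for n1 in graph.out_neighbors(start) for n2 in graph.out_neighbors(n1)}


batch_graph = DictGraph({"a": ["b"], "b": ["c"]})
realtime_graph = DictGraph({"a": ["b", "d"], "d": ["e"]})
merged = UnionGraph(batch_graph, realtime_graph)
print(list(merged.out_neighbors("a")))  # → ['b', 'd']
print(sorted(two_hop(merged, "a")))     # → ['c', 'e']
```

Because the union view exposes the same read interface as each source, the traversal code does not need to know how many sources sit behind it. Note that de-duplicating by neighbor collapses parallel edges, which is an assumption a real implementation would have to revisit.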

In such a model, disk persistence on the Neo4j/graph DB side may not even be needed: everything is also persisted on the batch side, so the batch view eventually "catches up" after some processing delay.
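A minimal sketch of why the real-time side could skip disk persistence, assuming every event is written both to an append-only master dataset on the batch side and to an in-memory real-time layer. `VolatileRealtimeLayer`, `visible_edges`, and the event shape are hypothetical. After a simulated crash the newest event drops out of query results, then reappears once the next batch run covers it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical event shape: (timestamp, source_vertex, target_vertex).
Edge = Tuple[int, str, str]


@dataclass
class VolatileRealtimeLayer:
    """Real-time store kept in memory only; nothing is written to disk.
    Durability comes from the batch side, which receives every event too."""

    edges: List[Edge] = field(default_factory=list)

    def add(self, edge: Edge) -> None:
        self.edges.append(edge)

    def expire_up_to(self, batch_cutoff: int) -> None:
        """Drop events a finished batch run has absorbed; the batch view
        answers for them from now on."""
        self.edges = [e for e in self.edges if e[0] > batch_cutoff]

    def crash(self) -> None:
        """Simulate losing the process (and its memory)."""
        self.edges.clear()


def visible_edges(master: List[Edge], batch_cutoff: int,
                  realtime: VolatileRealtimeLayer) -> List[Edge]:
    """What a merged query sees: the batch part plus the real-time part."""
    return [e for e in master if e[0] <= batch_cutoff] + realtime.edges


master: List[Edge] = []  # append-only master dataset on the batch side
rt = VolatileRealtimeLayer()
for edge in [(1, "a", "b"), (2, "c", "b"), (3, "a", "c")]:
    master.append(edge)  # every event is persisted on the batch side...
    rt.add(edge)         # ...and also applied to the volatile layer

batch_cutoff = 2         # the last batch run absorbed events up to t=2
rt.expire_up_to(batch_cutoff)
before_crash = visible_edges(master, batch_cutoff, rt)

rt.crash()               # the t=3 event is temporarily invisible
after_crash = visible_edges(master, batch_cutoff, rt)

batch_cutoff = 3         # the next batch run catches up
after_catch_up = visible_edges(master, batch_cutoff, rt)
print(len(before_crash), len(after_crash), len(after_catch_up))  # → 3 2 3
```

The trade-off is a window of staleness rather than data loss: only events newer than the last batch cutoff are at risk, and only until the batch side recomputes.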