Gremlin Mashups


James Thornton

Oct 14, 2011, 1:56:45 PM
to gremli...@googlegroups.com
Nathan Marz is the guy who built Storm for Twitter. 

Yesterday he posted an article, "How to Beat the CAP Theorem" (http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html; discussion at http://news.ycombinator.com/item?id=3108087), where he proposes a batch + real-time mashup model that uses Hadoop + ElephantDB for batch and Storm + Cassandra for real-time.

Here is another article by the same guy on Hadoop graph schemas (http://nathanmarz.com/blog/thrift-graphs-strong-flexible-schemas-on-hadoop.html). 

It would be interesting to see how to use Neo4j for the real-time piece and a Hadoop graph or GoldenOrb/Pregel (http://www.goldenorbos.org/) implementation for the batch-processing piece, and then use Gremlin to run a unified query that does a mashup of the two.

This might involve creating an ElephantDB Blueprints implementation and then providing a way for Gremlin to do a mashup using two Blueprints sources.
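
To make the two-source idea a bit more concrete, here is a rough Python sketch of a query that unions results from a batch source and a real-time source. It does not use Blueprints, Gremlin, Neo4j, or ElephantDB; `DictGraph` and `MashupGraph` are hypothetical in-memory stand-ins I made up for illustration, and a real version would wrap two actual graph implementations instead.

```python
from typing import Dict, Set


class DictGraph:
    """Hypothetical in-memory stand-in for one graph source (adjacency sets)."""

    def __init__(self) -> None:
        self._out: Dict[str, Set[str]] = {}

    def add_edge(self, src: str, dst: str) -> None:
        self._out.setdefault(src, set()).add(dst)

    def out_neighbors(self, vertex: str) -> Set[str]:
        # Return a copy so callers cannot mutate the stored adjacency set.
        return set(self._out.get(vertex, set()))


class MashupGraph:
    """Hypothetical wrapper that answers a query from the union of its sources."""

    def __init__(self, *sources: DictGraph) -> None:
        self._sources = sources

    def out_neighbors(self, vertex: str) -> Set[str]:
        result: Set[str] = set()
        for source in self._sources:
            result |= source.out_neighbors(vertex)
        return result


batch = DictGraph()       # stands in for the batch-computed graph
batch.add_edge("a", "b")

realtime = DictGraph()    # stands in for the real-time graph
realtime.add_edge("a", "c")

unified = MashupGraph(batch, realtime)
print(sorted(unified.out_neighbors("a")))  # ['b', 'c']
```

The caller only sees `unified`, so the query itself does not need to know which side an edge came from.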

In such a model, disk persistence on the Neo4j/graph DB side may not even be needed, because everything is also persisted to the batch side, so it eventually "catches up" after some processing delay.
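
Here is a small Python sketch of that "catches up" behavior, under the assumption that every edge is timestamped and written both to a durable log (the batch input) and to a volatile real-time view. `BatchView`, `RealtimeView`, and `all_edges` are hypothetical names for illustration only, not part of any of the systems mentioned above.

```python
from typing import List, Set, Tuple

Event = Tuple[int, str, str]  # (timestamp, source vertex, target vertex)
Edge = Tuple[str, str]


class BatchView:
    """Hypothetical batch side: recomputed from the full durable log."""

    def __init__(self) -> None:
        self.edges: Set[Edge] = set()

    def recompute(self, log: List[Event], up_to: int) -> None:
        self.edges = {(src, dst) for ts, src, dst in log if ts <= up_to}


class RealtimeView:
    """Hypothetical real-time side: memory only, holds recent events."""

    def __init__(self) -> None:
        self.recent: List[Event] = []

    def add(self, event: Event) -> None:
        self.recent.append(event)

    def expire(self, up_to: int) -> None:
        # Drop everything the batch side has already absorbed.
        self.recent = [e for e in self.recent if e[0] > up_to]

    def edges(self) -> Set[Edge]:
        return {(src, dst) for _, src, dst in self.recent}


def all_edges(batch: BatchView, realtime: RealtimeView) -> Set[Edge]:
    return batch.edges | realtime.edges()


log: List[Event] = []
batch, realtime = BatchView(), RealtimeView()
for event in [(1, "a", "b"), (2, "a", "c")]:
    log.append(event)     # durable copy for the batch side
    realtime.add(event)   # volatile copy for fresh queries

before = all_edges(batch, realtime)
batch.recompute(log, up_to=1)   # batch run finishes for timestamps <= 1
realtime.expire(up_to=1)        # real-time side forgets what batch now covers
after = all_edges(batch, realtime)

print(before == after)       # True
print(len(realtime.recent))  # 1
```

The unified answer stays the same across the batch run, while the real-time side only ever holds the events newer than the last batch pass.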

- James