Note that the DSLs these systems use are all analogous to Gremlin: each one is written in the "functional-fluent" style. We need to stress to people that if they use Spark/Storm/Flink/Samza/Scala/Java8/Clojure, then Gremlin fits their existing mental model of data flows and aggregations. When people say a query language needs to be "like SQL," point them to the fact that most modern data processing frameworks don't use that style. When people say that SQL is declarative and thus can be optimized, tell them that these functional-fluent languages also build a query plan, and that plan is optimized for the underlying execution engine.

By making an "SQL language," all you are doing is adding another layer of indirection: now you have to compile a String down to the underlying execution language (e.g. Java). Modern data processing languages don't waste that effort, because the constructs in modern programming languages already provide enough expressivity. Moreover, these languages lead (I believe) to execution engine designs that naturally support both single-machine and compute-cluster execution, since a map/reduce foundation is inherent in their representation.
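To make the "fluent chains still build a plan" point concrete, here is a toy sketch in Java. Everything in it (FluentPlan, Step, optimize, execute) is a hypothetical name made up for illustration, not the API of Gremlin or of any of the frameworks named here. The fluent calls only record steps as data; a very simple optimizer then fuses adjacent map steps before anything runs. Real engines apply far richer rewrites, so treat this as an assumption-laden sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical mini-DSL: fluent calls record steps; nothing runs until execute().
class FluentPlan {
    // A step is plain data that the engine can inspect and rewrite.
    static final class Step {
        final String kind;                      // "map" or "filter"
        final Function<Object, Object> mapper;  // set for "map" steps
        final Predicate<Object> test;           // set for "filter" steps

        Step(String kind, Function<Object, Object> mapper, Predicate<Object> test) {
            this.kind = kind;
            this.mapper = mapper;
            this.test = test;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    FluentPlan map(Function<Object, Object> f) {
        steps.add(new Step("map", f, null));
        return this;
    }

    FluentPlan filter(Predicate<Object> p) {
        steps.add(new Step("filter", null, p));
        return this;
    }

    // Toy optimizer: fuse each run of adjacent map steps into one composed map.
    List<Step> optimize() {
        List<Step> out = new ArrayList<>();
        for (Step s : steps) {
            Step last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.kind.equals("map") && s.kind.equals("map")) {
                out.set(out.size() - 1, new Step("map", last.mapper.andThen(s.mapper), null));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    // Run the optimized plan over an in-memory list, one element at a time.
    List<Object> execute(List<?> input) {
        List<Step> plan = optimize();
        List<Object> result = new ArrayList<>();
        for (Object item : input) {
            Object cur = item;
            boolean keep = true;
            for (Step s : plan) {
                if (s.kind.equals("map")) {
                    cur = s.mapper.apply(cur);
                } else if (!s.test.test(cur)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                result.add(cur);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FluentPlan plan = new FluentPlan()
            .map(x -> (Integer) x + 1)
            .map(x -> (Integer) x * 2)
            .filter(x -> (Integer) x > 4);
        System.out.println(plan.optimize().size());              // prints 2 (two maps fused, plus the filter)
        System.out.println(plan.execute(Arrays.asList(1, 2, 3))); // prints [6, 8]
    }
}
```

The user never writes a String that has to be parsed: the host language's own method calls and lambdas are the query, and the recorded step list is what the engine gets to rewrite.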
GREMLIN

  text.map(line -> line.split(" "))
      .unfold()
      .groupCount()

STORM

  topology.newStream("spout1", spout)
      .each(new Fields("sentence"), new Split(), new Fields("word"))
      .groupBy(new Fields("word"))
      .persistentAggregate(new MemoryMapState.Factory(),
                           new Count(), new Fields("count"));

SPARK

  text.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

SAMZA

  text.split(" ").foldLeft(Map.empty[String, Int]) {
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  }

FLINK

  text.flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(0)
      .sum(1)

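
Since Java8 is on the list above but has no example, here is a rough sketch of the same word count using the standard java.util.stream API. The class and method names (StreamWordCount, wordCount) are made up for illustration, and the input is assumed to be a handful of in-memory lines split on single spaces, just like the snippets above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; only the java.util.stream calls are real API.
class StreamWordCount {
    // Split each line into words, then group identical words and count them.
    static Map<String, Long> wordCount(String... lines) {
        return Arrays.stream(lines)
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Counts here are a=2, b=2, c=1; the map's print order is not guaranteed.
        System.out.println(wordCount("a b a", "b c"));
    }
}
```

The shape is the same as the others: a flatMap-style split followed by a group-and-count, expressed as chained method calls in the host language.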
Take care,
Marko.