XXXGraphComputer should evolve to XXXGraphActors.

Marko Rodriguez

Jan 13, 2017, 1:23:21 PM
to gremli...@googlegroups.com, d...@tinkerpop.apache.org
Hi,

CURRENT: The GraphComputer framework assumes “vertex-centric” computing. That is, a vertex receives a message and does something with it. Moreover, it can send messages to other vertices.

We got this wrong and I think we should do it right with GraphActors.

FUTURE: The GraphActors framework assumes “partition-centric” computing. That is, a partition receives a message and does something with it. Moreover, it can send messages to other partitions.

——

VertexProgram.execute(final Vertex vertex, Iterator<M> messages)

should have been:

PartitionProgram.execute(Partition partition, Iterator<M> messages)

In fact, ActorProgram’s execute() method is defined as:

ActorProgram.execute(M message).

1. Every Actor owns a Partition and thus, you don’t need to pass in the Partition.
2. To support ASP (asynchronous) and BSP (synchronous) computing, you don’t provide an Iterator<M>, just an M as each message comes through (event-driven).
3. All partitions are assumed to have random access capabilities. All the data in the partition is randomly accessible.
4. A partition is a generalization of GraphComputer’s Vertex, where at the micro-limit, every Vertex is in its own Partition. This is how we think about SparkGraphComputer, GiraphGraphComputer, etc.: the “star graph.” However, by generalizing to subgraphs larger than a single Vertex, we can have more work being done per iteration in SparkGraphComputer, etc. Moreover, by generalizing to partitions, we don’t have to have all edges of a vertex co-located and thus can support edge-cut systems (like DSEGraph).
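
To make the four points above concrete, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: Walker, WalkActor and Cluster are hypothetical names, the ActorProgram interface is reduced to the single execute(M) method mentioned above, and the "cluster" is a single-threaded mailbox rather than Akka or Spark. It only shows the shape of the idea: an actor owns its partition, handles one message at a time, uses random access for hops that stay inside the partition, and sends a message only when a hop crosses a partition boundary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative only: these types are hypothetical, not TinkerPop's interfaces.
class PartitionActorsSketch {

    /** Message: a walker sitting at a vertex with some hops left to take. */
    static final class Walker {
        final int vertex;
        final int hopsLeft;
        Walker(int vertex, int hopsLeft) { this.vertex = vertex; this.hopsLeft = hopsLeft; }
    }

    /** Event-driven program: one message per call, no Iterator<M>. */
    interface ActorProgram<M> {
        void execute(M message);
    }

    /** An actor owns its partition (vertex -> out-neighbors), so execute() takes only the message. */
    static final class WalkActor implements ActorProgram<Walker> {
        private final Map<Integer, List<Integer>> partition;
        private final Cluster cluster;
        final List<Integer> results = new ArrayList<>();

        WalkActor(Map<Integer, List<Integer>> partition, Cluster cluster) {
            this.partition = partition;
            this.cluster = cluster;
        }

        boolean owns(int vertex) { return partition.containsKey(vertex); }

        @Override
        public void execute(Walker walker) {
            if (walker.hopsLeft == 0) {
                results.add(walker.vertex);
                return;
            }
            for (int neighbor : partition.get(walker.vertex)) {
                Walker next = new Walker(neighbor, walker.hopsLeft - 1);
                if (owns(neighbor)) {
                    execute(next);       // same partition: random access, no message
                } else {
                    cluster.send(next);  // other partition: send a message
                }
            }
        }
    }

    /** Toy single-threaded "cluster": a shared mailbox plus a remote-message counter. */
    static final class Cluster {
        final List<WalkActor> actors = new ArrayList<>();
        final Queue<Walker> mailbox = new ArrayDeque<>();
        int remoteMessages = 0;

        void addPartition(Map<Integer, List<Integer>> partition) {
            actors.add(new WalkActor(partition, this));
        }

        void send(Walker walker) {
            remoteMessages++;
            mailbox.add(walker);
        }

        /** Walks 'hops' out-edges from 'start' and returns the sorted end vertices. */
        List<Integer> walk(int start, int hops) {
            mailbox.add(new Walker(start, hops));
            while (!mailbox.isEmpty()) {
                Walker walker = mailbox.poll();
                for (WalkActor actor : actors) {
                    if (actor.owns(walker.vertex)) actor.execute(walker);
                }
            }
            List<Integer> ends = new ArrayList<>();
            for (WalkActor actor : actors) ends.addAll(actor.results);
            Collections.sort(ends);
            return ends;
        }
    }

    /** Two partitions: {1, 2} and {3, 4}; edges 1->2, 1->3, 2->3, 3->4. */
    static Cluster demoCluster() {
        Cluster cluster = new Cluster();
        cluster.addPartition(Map.of(1, List.of(2, 3), 2, List.of(3)));
        cluster.addPartition(Map.of(3, List.of(4), 4, List.<Integer>of()));
        return cluster;
    }

    public static void main(String[] args) {
        Cluster cluster = demoCluster();
        System.out.println("ends=" + cluster.walk(1, 2) + " remoteMessages=" + cluster.remoteMessages);
    }
}
```

In this toy graph a two-hop walk from vertex 1 crosses the partition boundary twice (1->3 and 2->3), while the hops 1->2 and 3->4 stay local and cost no message. With one vertex per partition, every hop would be a message.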

So, what does this mean for the future? This injection from “vertex-centric” to “partition-centric” allows us to easily create SparkGraphActors. Next, how do you verify if a traversal will be able to legally execute against the underlying GraphActors system? It depends on “the rules” of the Partitioner. A Partitioner should have Features which define the boundaries of its data sphere. By looking at those Features and looking at the semantics of the Traversal, it is possible to ensure that the Traversal will work against the Features. If not, ActorVerificationException. If so, execute it.
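
As a rough sketch of that verification step, assuming a Partitioner advertises its Features as a simple set and a Traversal can be reduced to the set of features it needs: the Feature values below are invented for illustration, and ActorVerificationException borrows only its name from the paragraph above (its shape is an assumption, not an existing TinkerPop class).

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: the Feature names and the exception's shape are assumptions.
class PartitionerVerificationSketch {

    /** Hypothetical capabilities a Partitioner might advertise. */
    enum Feature { RANDOM_ACCESS, EDGES_CO_LOCATED_WITH_VERTEX, CROSS_PARTITION_MESSAGING }

    /** Named after the exception mentioned in the post. */
    static final class ActorVerificationException extends RuntimeException {
        ActorVerificationException(String message) { super(message); }
    }

    /** Throws if the traversal requires a feature that the partitioner does not provide. */
    static void verify(Set<Feature> required, Set<Feature> provided) {
        Set<Feature> missing = EnumSet.noneOf(Feature.class);
        missing.addAll(required);
        missing.removeAll(provided);
        if (!missing.isEmpty()) {
            throw new ActorVerificationException("Partitioner lacks features: " + missing);
        }
    }

    public static void main(String[] args) {
        // An edge-cut partitioner: random access and messaging, but edges are not co-located.
        Set<Feature> edgeCut = EnumSet.of(Feature.RANDOM_ACCESS, Feature.CROSS_PARTITION_MESSAGING);

        verify(EnumSet.of(Feature.RANDOM_ACCESS), edgeCut); // passes, so the traversal may execute
        try {
            verify(EnumSet.of(Feature.EDGES_CO_LOCATED_WITH_VERTEX), edgeCut);
        } catch (ActorVerificationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The hard part, deriving the required set from the semantics of a real Traversal, is deliberately left out here.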

In conclusion — I’m starting to see GraphComputer as our OLAP 1.0 and GraphActors as our OLAP/OLTP 2.0. I put in there OLTP because with systems like Akka that don’t require big bulk data migrations, you can execute against the Graph connection object…. Even with SparkGraphActors, you could just have workers that work against Graph connection objects (the only RDD data is messages!!!). Thus, with GraphActors, we start to smear the concept of OLAP and OLTP.

Anywho — I think if we get GraphActors right, we will solve many of the shortcomings of GraphComputer while, at the same time, providing a powerful distributed graph computing framework. 

Take care,
Marko.

Luca Garulli

Jan 14, 2017, 3:08:41 PM
to gremlin-users, d...@tinkerpop.apache.org
Hi Marko,

I definitely like this and it would be closer to what we are doing with OrientDB distributed architecture. How quick do you think you could create a draft of this in TP 3.x?


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/49B70143-DB77-429D-956F-869B20A554C4%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jack Park

Jan 14, 2017, 7:24:08 PM
to gremlin-users
This is, IMHO, seriously cool.

I'd like to add another aspect where I see a need:

Edges are about relations; in many of those relations, especially relations that matter (e.g. causal relations) there are typically large and always evolving biographies associated with them.

All that to suggest that I think it's an oversight to treat edges as second-class citizens; in my view, edges have every reason to be treated like actors.

I get this from the migration in biology away from components to relations.  Counting components in a living (now dead because you opened it to count the parts) cell doesn't get you close enough to understand how it works; you end up studying, instead, the relations among the components, and between them and the cell's environment (context).

Just a tenth Yen.
Greetings from Sunny Baja Tokyo

Jack

Marko Rodriguez

Jan 18, 2017, 6:35:53 AM
to gremli...@googlegroups.com, d...@tinkerpop.apache.org
Hello,

I definitely like this and it would be closer to what we are doing with OrientDB distributed architecture. How quick do you think you could create a draft of this in TP 3.x?

We already have a working version with the interfaces in gremlin-core/ and an akka-gremlin/ implementation.


You can see TraversalActorsProgram which is able to execute a Traversal over the Actors framework:
- a few hardcoded constants in there right now as I’m still developing/playing.

And akka-gremlin/ passes the traversal test suite (save for a few issues I’m still sorting out):

Hope that is clear. Please provide any feedback as you have it.

Question: Can OrientDB do a “distributed transaction”? That is, if you have X number of connections to OrientDB, can you say that you want all connections to be tied to a single TX?

Thanks Luca,
Marko.


Cecil New

Jan 18, 2017, 8:48:52 AM
to Gremlin-users
When I spent time researching Gremlin last year, I found the tight binding to Java to be a hindrance to understanding. I'm much more comfortable with a normal language that I can embed inside the language of my choice.

So if you are beginning to think about TinkerPop 4, I encourage you to divorce the graph language from the implementation of the graph engine. This need not be entirely from scratch. Perhaps you could adopt Neo4j's Cypher or XQuery's FLWOR or, even better, an extended Spark SQL.

And to keep things even simpler why not a "driverless" approach, where the client only has two requirements:
- web socket support
- JSON support

The client transmits over a web socket connection a JSON payload with the graph language statement and any other needed information. Then it receives a JSON payload back with the results or errors.

Just my two cents...

Marko Rodriguez

Jan 18, 2017, 9:29:19 AM
to gremli...@googlegroups.com
Hello,

When I spent time researching Gremlin last year, I found the tight binding to Java to be a hindrance to understanding. I'm much more comfortable with a normal language that I can embed inside the language of my choice.

So Gremlin (the language) leverages a host language’s parser to compile it. The host language must support function composition and function nesting (which every modern/popular language does).
So Gremlin (the machine) accepts bytecode (which has a JSON and a binary serialization format) and executes it regardless of which language created it.



So if you are beginning to think about TinkerPop 4, I encourage you to divorce the graph language from the implementation of the graph engine. This need not be entirely from scratch. Perhaps you could adopt Neo4j's Cypher or XQuery's FLWOR or, even better, an extended Spark SQL.

The problem is that the Gremlin virtual machine isn’t tied to a language. It’s tied to bytecode. Thus, any language that generates bytecode can execute against the Gremlin traversal machine. Therefore, we don’t want to send a particular language string over the wire, for the following reasons:

1. The Gremlin machine doesn’t care about language, it cares about bytecode.
http://tinkerpop.apache.org/providers.html (see the last section on query language providers)
2. This allows users to use any language client side — e.g. SPARQL, SQL, Gremlin, …


And to keep things even simpler why not a "driverless" approach, where the client only has two requirements:
- web socket support
- JSON support

The client transmits over a web socket connection a JSON payload with the graph language statement and any other needed information. Then it receives a JSON payload back with the results or errors.

This is currently what we do. You send Gremlin bytecode (e.g. as JSON) over the wire and get back an iterator of traversers.
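
To illustrate why the machine does not care about the surface language, here is a toy sketch, not TinkerPop's Bytecode class and not the actual wire format. Bytecode here is just an ordered list of [operator, args...] instructions, and two made-up front ends (a fluent Java API and a pipe-separated string syntax) compile to the same JSON-like payload. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a toy bytecode model, not TinkerPop's Bytecode or its JSON format.
class BytecodeSketch {

    /** An ordered list of instructions; each instruction is [operator, arg1, arg2, ...]. */
    static final class Bytecode {
        private final List<List<String>> instructions = new ArrayList<>();

        Bytecode add(String operator, String... args) {
            List<String> instruction = new ArrayList<>();
            instruction.add(operator);
            instruction.addAll(Arrays.asList(args));
            instructions.add(instruction);
            return this;
        }

        /** Simplified JSON rendering (assumes no characters that need escaping). */
        String toJson() {
            return instructions.stream()
                .map(i -> i.stream().map(s -> "\"" + s + "\"").collect(Collectors.joining(",", "[", "]")))
                .collect(Collectors.joining(",", "[", "]"));
        }
    }

    /** Front end 1: a fluent, Gremlin-like host-language API. */
    static final class Fluent {
        private final Bytecode bytecode = new Bytecode();
        Fluent V() { bytecode.add("V"); return this; }
        Fluent out(String label) { bytecode.add("out", label); return this; }
        Fluent count() { bytecode.add("count"); return this; }
        Bytecode bytecode() { return bytecode; }
    }

    /** Front end 2: a made-up pipe syntax, e.g. "V | out knows | count". */
    static Bytecode compilePipes(String query) {
        Bytecode bytecode = new Bytecode();
        for (String step : query.split("\\|")) {
            String[] tokens = step.trim().split("\\s+");
            bytecode.add(tokens[0], Arrays.copyOfRange(tokens, 1, tokens.length));
        }
        return bytecode;
    }

    public static void main(String[] args) {
        String fromFluent = new Fluent().V().out("knows").count().bytecode().toJson();
        String fromPipes = compilePipes("V | out knows | count").toJson();
        System.out.println(fromFluent);
        System.out.println(fromFluent.equals(fromPipes)); // both front ends yield the same payload
    }
}
```

Whatever receives the payload never sees which front end produced it, which is the point being made about the Gremlin machine.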



Thoughts?,
Marko.

Luca Garulli

Jan 19, 2017, 1:49:09 AM
to gremlin-users, d...@tinkerpop.apache.org
On 18 January 2017 at 03:35, Marko Rodriguez <okram...@gmail.com> wrote:

Question: Can OrientDB do a “distributed transaction”? That is, if you have X number of connections to OrientDB, can you say that you want all connections to be tied to a single TX?

OrientDB can do distributed transactions across nodes. What do you mean by connections?
 

Luca

Marko Rodriguez

Jan 19, 2017, 6:24:30 AM
to gremli...@googlegroups.com, d...@tinkerpop.apache.org
Hi,

OrientDB can do distributed transactions across nodes. What do you mean by connections?

So, I’m not so skilled when it comes to transactions and the like so bear with me…

One of the models I want to push is that each worker in GraphActors will have a Graph “connection.” That is:

configuration.setProperty("host", worker.address());
Graph graph = GraphFactory.open(configuration); // Graph = "connection"

This way, the worker is always talking directly to the node it is physically executing at. Moreover, it means that its Partition is the data contained at the node in the cluster. Everything is processed locally with GraphActors.

Now, let’s say we do a mutating traversal such as:

g.V().as('a').out('knows').as('b').
  addE('likes').from('a').to('b')


So, with this, each Graph “connection” will have writes made to it. Now let’s say we want to “globally commit” such that each Graph “connection” commits its transaction, but if any particular one fails, they all fail …. or something like that. That is, how can we (OrientDB and/or TinkerPop) enable transaction guarantees across multiple Graph “connections”?

Thanks,
Marko.



 


Luca Garulli

Jan 19, 2017, 12:44:25 PM
to gremlin-users, d...@tinkerpop.apache.org
In OrientDB, a distributed transaction starts from any node (multi-master), and that node acts as the coordinator for the transaction. So the client has to execute all the operations inside the same connection to the coordinator server. At that point, the coordinator divides the operations based on sharding/replication and sends them to all the involved nodes.

Once the write quorum is satisfied (by default, a "majority" of the servers), the coordinator sends a final commit (2-phase commit) to the servers. If any server doesn't agree with the quorum, it's forced to have the same state.

If the coordinator crashes before sending the final OK, each server has a per-transaction timeout, after which it rolls back autonomously.

So if all the mutations are sent through the same connection, then OrientDB can execute all of them as an ACID distributed transaction.
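
The flow described above can be sketched as a toy model. This is not OrientDB code: Node, State and execute() are hypothetical names, the "healthy" flag stands in for whether a server acknowledges the write, and the coordinator-crash timeout is not modeled at all.

```java
import java.util.List;

// Illustrative only: a toy quorum-based two-phase commit, not OrientDB's implementation.
class QuorumCommitSketch {

    enum State { IDLE, PREPARED, COMMITTED, ROLLED_BACK }

    /** A toy server; 'healthy' decides whether it acknowledges the write. */
    static final class Node {
        final boolean healthy;
        State state = State.IDLE;

        Node(boolean healthy) { this.healthy = healthy; }

        boolean prepare() {
            if (healthy) state = State.PREPARED;
            return healthy;
        }

        void commit() { state = State.COMMITTED; }      // also realigns a node that did not ack
        void rollback() { state = State.ROLLED_BACK; }
    }

    /** Phase 1: collect acks. Phase 2: commit everywhere if a majority acked, else roll back. */
    static boolean execute(List<Node> nodes) {
        int acks = 0;
        for (Node node : nodes) {
            if (node.prepare()) acks++;
        }
        boolean quorumReached = acks > nodes.size() / 2; // strict majority
        for (Node node : nodes) {
            if (quorumReached) node.commit(); else node.rollback();
        }
        return quorumReached;
    }

    public static void main(String[] args) {
        List<Node> twoOfThree = List.of(new Node(true), new Node(true), new Node(false));
        System.out.println("2 of 3 ack -> committed: " + execute(twoOfThree));

        List<Node> oneOfThree = List.of(new Node(true), new Node(false), new Node(false));
        System.out.println("1 of 3 ack -> committed: " + execute(oneOfThree));
    }
}
```

Note that the single coordinator is what ties the writes together here; with one independent Graph "connection" per worker, something would have to play that coordinator role across the connections.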
 

Luca


Cecil New

Feb 1, 2017, 12:04:53 PM
to Gremlin-users
Sorry I took so long to respond. But I'm glad I did, since I discovered that there are efforts where what I described has been attempted. And one appears to be active! I'm referring to the thread on SPARQL.

Marko Rodriguez

Feb 1, 2017, 12:14:14 PM
to gremli...@googlegroups.com
Hello Cecil,

So Gremlin OLTP, GraphComputer (OLAP), GraphActors (OLTP/OLAP) are all able to execute Gremlin bytecode. 

What is being discussed in this thread is not about Gremlin the language, but about Gremlin the virtual machine. Any language that compiles to Gremlin bytecode (including Gremlin the language) can be executed by the Gremlin virtual machine. Thus, because SPARQL-Gremlin compiles SPARQL to Gremlin bytecode, you are able to execute it OLTP/OLAP/etc.

I hope that is clear,
Marko.
