How to connect Gremlin to Spark GraphX in TinkerPop 3.1.1?


Kyle Zhu

Nov 21, 2015, 12:29:40 PM
to Gremlin-users

Hello everyone,

I want to connect Gremlin to Spark GraphX as shown in the following image. In step 1, I would use the Java API to send a WebSocket request to Gremlin and have Spark GraphX perform the graph computation.



Here is what I have now:
Because TinkerPop 3.0.2 through 3.1.0 do not support the Spark plugin, I downloaded the TinkerPop 3.1.1 source code (a Maven project) from GitHub and built it to get the TinkerPop 3.1.1 snapshot.
The build succeeded, and I can run TinkerPop 3.1.1 on a Linux box. I can see the Spark plugin when I start the TinkerPop server.


Here are my questions:

1. I can see tinkerpop.spark after I start gremlin.sh. How do I connect to tinkerpop.spark, and how can I tell whether it is running?

2. How do I write graph data to tinkerpop.spark through Gremlin?

3. I want to use Java to send WebSocket requests to Gremlin. The first step is to establish a connection to the server with the code below, which connects to localhost by default.
    How do I connect to a remote machine instead? I looked at the API "public static Cluster open(String configurationFile)". Where can I find a sample of this
    configuration file, and a sample of the address string for "public static Cluster.Builder build(String address)"?

Cluster cluster = Cluster.open();
Client client = cluster.connect();
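
For question 3, a minimal sketch of a driver configuration file may help. It assumes the YAML format that gremlin-driver reads for Cluster.open(String configurationFile); the host name below is a placeholder, not a real server, and 8182 is Gremlin Server's usual default port, so both should be checked against the actual deployment.

```yaml
# Hypothetical remote.yaml for the Gremlin driver (values are placeholders).
# hosts: one or more Gremlin Server host names or IP addresses
hosts: [gremlin.example.com]
# port: the port Gremlin Server listens on (8182 is the usual default)
port: 8182
```

With a file like this, Cluster.open("remote.yaml") would target the remote host instead of localhost. The address argument of Cluster.build(String address) is likewise a host name or IP address, so the programmatic equivalent should be along the lines of Cluster.build("gremlin.example.com").port(8182).create().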


I would really appreciate it if someone could help me with this.

Thanks,
Kyle

Marko Rodriguez

Nov 23, 2015, 10:42:19 AM
to gremli...@googlegroups.com
Hello Kyle,

Just to be clear, Spark-Gremlin doesn't use GraphX; it has its own distributed "message passing" execution engine built atop Spark. If you want to use GraphX, then you will need to get the graphRDD into the structure that GraphX requires. To do this, you would have to write your own OutputRDD. Please say what you are trying to accomplish and I can point you in the right direction.

Thanks,
Marko.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/73e87862-263b-4246-aee6-a12b6f2f1be6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kyle Zhu

Nov 23, 2015, 9:23:47 PM
to Gremlin-users
Hi Marko,

Thank you so much for your reply.

I am looking to do the following.

1. Build a graph of interconnected entities that have different kinds of relationships among them. The graph exhibits OLAP-like features, i.e. it is immutable. It needs to scale out.

2. Compute various metrics such as degree, PageRank, etc.

3. From a web-based interface, allow users to pick a starting node and a number of degrees of separation from it, then bring back the surrounding subnet of nodes and edges. A near-real-time response would be desirable from a user's perspective.

4. Allow users to navigate around this subnet and traverse around it progressively as desired.

5. Store user selection of subnet of choice for retrieval later on.

6. At some point in the future, potentially use clustering algorithms to find groups.

7. Be able to invoke the graph from a Java app server layer which in turn is hooked up to the web layer.
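
Requirement 3 above amounts to a breadth-first expansion from the chosen node that stops after a fixed number of hops. The following is a plain-Java sketch of that idea only, independent of TinkerPop and GraphX; the class name, method name, and the in-memory adjacency map are hypothetical and chosen purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of TinkerPop or GraphX.
public class Neighborhood {

    // Returns every vertex reachable from start in at most maxHops edges,
    // mapped to its hop distance. The adjacency map is an undirected
    // adjacency list: vertex id -> ids of neighbouring vertices.
    public static Map<String, Integer> withinHops(Map<String, List<String>> adjacency,
                                                  String start, int maxHops) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int hops = distance.get(current);
            if (hops == maxHops) {
                continue; // do not expand past the requested degrees of separation
            }
            List<String> neighbours =
                    adjacency.getOrDefault(current, Collections.<String>emptyList());
            for (String neighbour : neighbours) {
                if (!distance.containsKey(neighbour)) {
                    distance.put(neighbour, hops + 1); // first visit is the shortest path
                    queue.add(neighbour);
                }
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        // A small chain a - b - c - d.
        Map<String, List<String>> adjacency = new HashMap<>();
        adjacency.put("a", Arrays.asList("b"));
        adjacency.put("b", Arrays.asList("a", "c"));
        adjacency.put("c", Arrays.asList("b", "d"));
        adjacency.put("d", Arrays.asList("c"));
        // Two degrees of separation from "a" reaches b and c, but not d.
        System.out.println(withinHops(adjacency, "a", 2)); // prints {a=0, b=1, c=2}
    }
}
```

The edges of the subnet are then those whose endpoints both appear in the returned map. On a large graph, this expansion would be left to the graph engine instead of being done in application memory.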


I was wondering whether Gremlin-Java could be used to connect to GraphX and traverse the graph through its API, or whether Kafka could be used to connect to Spark and GraphX and communicate
asynchronously with the Java mid-tier layer.

Thanks,
Kyle

Marko Rodriguez

Nov 24, 2015, 3:15:16 PM
to gremli...@googlegroups.com
Hi,

What you are trying to do seems best solved by using SparkGraphComputer (OLAP) and then using GraphTraversal.subgraph() to grab the local subgraph around the vertex for your OLTP situation.

You can try and use GraphX, but I don't know how that would work. If you do try, please report back your experience as it would be good to know how it goes.

Good luck,
Marko.

Diana Du

Mar 12, 2019, 11:19:00 AM
to Gremlin-users

Hi Kyle,

Any updates or thoughts on this topic?


Thanks

Stephen Mallette

Mar 12, 2019, 2:07:17 PM
to gremli...@googlegroups.com
I'm not sure what you're asking here, but a fair bit has changed since the early days of 3.0.x. You might want to read up on the latest information about SparkGraphComputer:


If you're just asking how to remotely execute gremlin-spark jobs that would be hosted in Gremlin Server, then I think you should take a look at this Gremlin Server config:


and the related Gremlin Server init script:



