JanusGraph + Spark standalone without Hadoop

Wei Ding

Aug 22, 2018, 11:50:42 AM

to JanusGraph developers

Hi All,

I am pretty new to JanusGraph and would like some suggestions from you. Previously I posted a question about using ES as backend storage and got some good feedback from Jason (thanks!). Now here comes another question: if I want to use JanusGraph with Spark standalone, without Hadoop, for OLAP, can someone point me in the right direction? Basically, I have Spark standalone deployed on Kubernetes; how could that be used for OLAP?

Thanks a lot!

Wei

Jerry He

Aug 23, 2018, 8:56:58 PM

to Wei Ding, JanusGraph developers

I don't think it will work. Spark needs input (to read graph data) and output (to write graph data). JanusGraph currently only provides Hadoop InputFormat based reading from JanusGraph for OLAP.

In TinkerPop, there are InputRDD and OutputRDD interfaces, which are used by Spark (SparkGraphComputer). (Search for TinkerPop InputRDD.)

Unfortunately, JanusGraph provides no implementations other than the InputFormat-based one at the moment.

Thanks,

Jerry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/512224e3-5b20-4e31-afa5-e2fd74591182%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jerry He

Aug 23, 2018, 9:23:20 PM

to Wei Ding, JanusGraph developers

That being said, to be clear, you don't need a Hadoop cluster of any kind, if that is what you mean. JanusGraph packages the Hadoop jars it needs. That is all you need to run SparkGraphComputer on JanusGraph.
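
To make that concrete, here is a minimal sketch of the kind of HadoopGraph properties such a setup would use, written as a small Python helper that renders the file. The key names are modeled on the sample read-cassandra-3.properties file shipped with JanusGraph releases of that period, so please check them against your version. The Spark master URL, the Cassandra host name and the keyspace are placeholder assumptions, not values from this thread, and the helper functions themselves are hypothetical.

```python
# Sketch only: renders a HadoopGraph properties file for SparkGraphComputer
# reading from Cassandra through JanusGraph's InputFormat, with Spark running
# in standalone mode. Hosts and keyspace are placeholders (assumptions).


def build_olap_properties(spark_master, cassandra_host, keyspace="janusgraph"):
    """Return the OLAP settings as a dict of property name -> value."""
    return {
        # HadoopGraph wraps the InputFormat; it relies on the Hadoop jars
        # bundled with JanusGraph rather than on a running Hadoop cluster.
        "gremlin.graph": "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph",
        "gremlin.hadoop.graphReader": "org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat",
        "gremlin.hadoop.graphWriter": "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat",
        "gremlin.hadoop.jarsInDistributedCache": "true",
        # The graph is read from the storage backend, not from files.
        "gremlin.hadoop.inputLocation": "none",
        "gremlin.hadoop.outputLocation": "output",
        # Settings passed through to the JanusGraph storage backend.
        "janusgraphmr.ioformat.conf.storage.backend": "cassandra",
        "janusgraphmr.ioformat.conf.storage.hostname": cassandra_host,
        "janusgraphmr.ioformat.conf.storage.cassandra.keyspace": keyspace,
        "cassandra.input.partitioner.class": "org.apache.cassandra.dht.Murmur3Partitioner",
        # Point Spark at the standalone master instead of local[*] or YARN.
        "spark.master": spark_master,
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }


def render_properties(props):
    """Render the settings in Java .properties syntax, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # "spark-master" and "cassandra" are hypothetical Kubernetes service names.
    print(render_properties(
        build_olap_properties("spark://spark-master:7077", "cassandra")))
```

The rendered file would then be opened from the Gremlin Console with GraphFactory.open(...) and traversed with graph.traversal().withComputer(SparkGraphComputer). Whether HADOOP_CONF_DIR is also required in practice is not something this sketch settles.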

Thanks

Jerry


Debasish Kanhar

Aug 24, 2018, 8:05:06 AM

to JanusGraph developers

@Jerry:

Yes, JanusGraph doesn't need a Hadoop cluster to run OLAP, but doesn't JanusGraph need to point to a live Hadoop cluster by putting HADOOP_CONF_DIR on the CLASSPATH? That was my understanding, and it was the missing piece in the docs, which is why it took me a really long time to get OLAP working on a Spark cluster.

Jerry He

Aug 24, 2018, 11:34:24 AM

to Debasish Kanhar, JanusGraph developers

The Hadoop conf (only the HDFS conf) is needed to read graph data files from, or write them to, HDFS. Interacting directly with the JanusGraph backend, without involving any graph data files on HDFS, won't need that, I think.

Thanks

Jerry


Debasish Kanhar

Aug 24, 2018, 11:41:49 AM

to JanusGraph developers

Ah. I think we might be wrong in our understanding there. I was trying to read the graph data from my underlying backend (Cassandra), not from a graph stored on HDFS, using JanusGraph's Cassandra3InputFormat class. That was also failing when I tried to run OLAP on a Spark cluster without setting the HADOOP_CONF_DIR variable.

Ideally that should not have been the case, as TP > 3.3 doesn't need intermediate HDFS storage, but it doesn't look like that's what is happening. Well, we can track this if needed. :-)

Jerry He

Aug 24, 2018, 1:13:38 PM

to Debasish Kanhar, JanusGraph developers

Yeah, a stack trace from Gremlin will help us see what is going on. That should not be a dependency in that case.

Thanks,

Jerry

