hadoop-gremlin with cassandra

Lisa Fiedler

Jul 5, 2019, 9:11:59 AM

to Gremlin-users

Hi everyone,

I am using JanusGraph with Cassandra as the storage backend and a Hadoop cluster in pseudo-distributed mode.

In the Gremlin Console I tried the following:

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cassandra-3.properties')
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
gremlin> g.V().count()

This produced the following output:

15:02:17 WARN org.apache.spark.util.Utils - Your hostname, Woodstock resolves to a loopback address: 127.0.1.1; using 192.168.178.32 instead (on interface wlp3s0)
15:02:17 WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 0:=======================================>1209 (0 + 4) / 5]

The properties file ships with JanusGraph and looks like this:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cassandra
janusgraphmr.ioformat.conf.storage.hostname=localhost
janusgraphmr.ioformat.conf.storage.port=9160
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer

Now where is my output? The properties file apparently sets it to "output", but where is that folder?

I also tried creating it first with

gremlin> hdfs.mkdir('output')

But after running the query, I still got nothing from

gremlin> hdfs.ls('output')

What am I doing wrong?

Thanks!!

Antriksh Shah

Jul 5, 2019, 11:02:34 AM

to Gremlin-users

Hey Lisa,

g.V().count() does not persist any output to HDFS; it only returns the count.

To persist the graph, you can execute

g.V().program(BulkDumperVertexProgram.build().create(graph)).limit(1)

You should then find the entire graph persisted in the output location.

The output location should be relative to the directory from which you execute gremlin.sh.
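A minimal sketch, assuming you would rather not depend on the directory gremlin.sh is launched from: give gremlin.hadoop.outputLocation an absolute path in the properties file. The path /user/lisa/output below is hypothetical. A path without a scheme is still resolved against the default file system (fs.defaultFS) of the Hadoop configuration the console sees, so it typically lands in HDFS only when the cluster's core-site.xml (for example via HADOOP_CONF_DIR) is on the console's classpath; otherwise it is a directory on the local disk.

```properties
# Hypothetical absolute output path; replace "lisa" with your own user name.
# No scheme is given, so Hadoop resolves it against fs.defaultFS.
gremlin.hadoop.outputLocation=/user/lisa/output
```

After re-running the BulkDumperVertexProgram traversal, hdfs.ls('/user/lisa/output') should list the result; the graph data itself is typically written to a ~g subdirectory beneath it.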

Antriksh Shah

Jul 5, 2019, 11:03:17 AM

to Gremlin-users

The output of g.V().count() is 1209 (the number printed right after the Spark progress bar).

Lisa Fiedler

Jul 5, 2019, 11:41:02 AM

to Gremlin-users

Hey Antriksh,

Thanks so much!!
