hadoop-gremlin with cassandra - outputLocation


Lisa Fiedler

Jul 5, 2019, 9:11:59 AM
to Gremlin-users
Hi everyone,

I am using JanusGraph with Cassandra as the storage backend and a Hadoop cluster running in pseudo-distributed mode.
In the Gremlin console I tried the following:


gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cassandra-3.properties')
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
gremlin> g.V().count()

This produced the following output:
15:02:17 WARN  org.apache.spark.util.Utils  - Your hostname, Woodstock resolves to a loopback address: 127.0.1.1; using 192.168.178.32 instead (on interface wlp3s0)
15:02:17 WARN  org.apache.spark.util.Utils  - Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 0:=======================================>1209               (0 + 4) / 5]

The properties file ships with JanusGraph and looks like this:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cassandra
janusgraphmr.ioformat.conf.storage.hostname=localhost
janusgraphmr.ioformat.conf.storage.port=9160
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer

Now where is my output? The properties file apparently sends it to a location named "output", but where is that folder?
I also tried creating it first via
gremlin> hdfs.mkdir('output')

But after running the query, the following still returned nothing:
gremlin> hdfs.ls('output')

What am I doing wrong?

Thanks!!

Antriksh Shah

Jul 5, 2019, 11:02:34 AM
to Gremlin-users
Hey Lisa,

g.V().count() does not persist its output to HDFS.
To write the graph out, you can execute
g.V().program(BulkDumperVertexProgram.build().create(graph)).limit(1)
Afterwards you should find the entire graph persisted.
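
A minimal console sketch of the same idea, written with the explicit graph.compute(...) form instead of the traversal step. It assumes the Hadoop and Spark plugins are active in the Gremlin console, that BulkDumperVertexProgram is available in your TinkerPop version (as far as I know, newer releases offer CloneVertexProgram as its replacement), and that the console exposes an fs binding for the local filesystem alongside hdfs. Checking both is deliberate, since which filesystem receives the output depends on your Hadoop configuration.

```groovy
// Open the same HadoopGraph configuration used in the question.
graph = GraphFactory.open('conf/hadoop-graph/read-cassandra-3.properties')

// Submit the vertex program as an OLAP job. The configured graphWriter
// (GryoOutputFormat in this properties file) writes to
// gremlin.hadoop.outputLocation, which is 'output' here.
graph.compute(SparkGraphComputer).program(BulkDumperVertexProgram.build().create(graph)).submit().get()

// Look for the output directory on both filesystems.
// 'fs' (local filesystem) is an assumed console binding; 'hdfs' is the one used above.
fs.ls('output')
hdfs.ls('output')
```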

The output location should be relative to the directory from which you launch gremlin.sh.
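
To take the guesswork out of where a relative path ends up, the output location can also be given as an absolute path. Hadoop resolves a relative path against the working directory of the active filesystem: the launch directory on the local filesystem, or the user's home directory on HDFS. The path below is a hypothetical placeholder, not something taken from the JanusGraph distribution.

```properties
# Hypothetical absolute path; replace with a directory you can write to.
gremlin.hadoop.outputLocation=/tmp/janusgraph-olap-output
```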


 

Antriksh Shah

Jul 5, 2019, 11:03:17 AM
to Gremlin-users
The output of g.V().count() is 1209. It is printed in the middle of the Spark progress bar in your console output.

Lisa Fiedler

Jul 5, 2019, 11:41:02 AM
to Gremlin-users
Hey Antriksh,

Thanks so much!!