Hi,
I am trying to use the BulkLoaderVertexProgram in TinkerPop 3.0.0.M6 with Titan 0.9.0-M1 to load the Grateful Dead dataset into a Hadoop + Cassandra cluster.
I am following the documentation at:
http://s3.thinkaurelius.com/docs/titan/0.9.0-M1/titan-hadoop-tp3.html
From gremlin.sh I run the following:
g = GraphFactory.open('conf/hadoop-load-gd.properties')
r = g.compute().program(BulkLoaderVertexProgram.build().titan('conf/titan-cassandra-gd.properties').create()).submit().get()
This is the error I get:
java.lang.IllegalStateException: Wrong FS: hdfs://doop-01:9000/user/royl/grateful-dead-vertices.gio, expected: file:///
* What is the correct way to do this?
* Also, what is the correct way to run this program outside of the Gremlin shell?
Thanks,
Roy.
Here are my config files:
conf/hadoop-load-gd.properties:
# Hadoop-Gremlin settings
gremlin.graph=com.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.tinkerpop.gremlin.hadoop.structure.io.kryo.KryoInputFormat
gremlin.hadoop.graphOutputFormat=com.tinkerpop.gremlin.hadoop.structure.io.kryo.KryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.inputLocation=hdfs://doop-01:9000/user/royl/grateful-dead-vertices.gio
gremlin.hadoop.outputLocation=output
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
# Giraph settings
giraph.SplitMasterWorker=false
giraph.minWorkers=1
giraph.maxWorkers=1
giraph.zkConnectionAttempts=200
giraph.zkServerPort=2181
# my settings
giraph.pure.yarn.job true
giraph.trackJobProgressOnClient true
conf/titan-cassandra-gd.properties:
storage.backend=cassandra
storage.hostname=doop-01
storage.port=9160
storage.cassandra.keyspace=greatfuldead