Bulk loading from HDFS using Hadoop-Gremlin


Roy Levin

Jun 11, 2015, 6:50:21 AM
to aureliu...@googlegroups.com
Hi,

I am trying to use the BulkLoaderVertexProgram in TinkerPop 3.0.0.M6 with Titan 0.9.0-M1 to load the Grateful Dead dataset into a Hadoop + Cassandra (C*) cluster.
I am following the documentation at:
http://s3.thinkaurelius.com/docs/titan/0.9.0-M1/titan-hadoop-tp3.html

From gremlin.sh I run the following:

g = GraphFactory.open('conf/hadoop-load-gd.properties')
r = g.compute().program(BulkLoaderVertexProgram.build().titan('conf/titan-cassandra-gd.properties').create()).submit().get()

This is the error I get:
java.lang.IllegalStateException: Wrong FS: hdfs://doop-01:9000/user/royl/grateful-dead-vertices.gio, expected: file:///


* What is the correct way to do this?
* Also, what is the correct way to run this program outside the Gremlin shell?


Thanks,
Roy.



Here are my config files:

conf/hadoop-load-gd.properties
# Hadoop-Gremlin settings
gremlin.graph=com.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.tinkerpop.gremlin.hadoop.structure.io.kryo.KryoInputFormat
gremlin.hadoop.graphOutputFormat=com.tinkerpop.gremlin.hadoop.structure.io.kryo.KryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.inputLocation=hdfs://doop-01:9000/user/royl/grateful-dead-vertices.gio
gremlin.hadoop.outputLocation=output
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true

# Giraph settings
giraph.SplitMasterWorker=false
giraph.minWorkers=1
giraph.maxWorkers=1
giraph.zkConnectionAttempts=200
giraph.zkServerPort=2181

# my settings
giraph.pure.yarn.job true
giraph.trackJobProgressOnClient true


conf/titan-cassandra-gd.properties
storage.backend=cassandra
storage.hostname=doop-01
storage.port=9160
storage.cassandra.keyspace=greatfuldead
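The "Wrong FS" message above comes from Hadoop's filesystem path check: a fully qualified path such as the hdfs:// inputLocation in conf/hadoop-load-gd.properties is only accepted by a FileSystem whose URI has the same scheme (and, if given, authority), and the FileSystem in use here reports file:///. The following Python sketch is a simplified model of that check, not Hadoop's actual code; `check_path` and `WrongFS` are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse


class WrongFS(ValueError):
    """Raised when a qualified path does not belong to the target filesystem."""


def check_path(path: str, fs_uri: str) -> None:
    """Simplified model of Hadoop's scheme/authority path check (hypothetical helper)."""
    p, fs = urlparse(path), urlparse(fs_uri)
    if not p.scheme:
        # Unqualified paths (e.g. outputLocation=output) resolve against fs_uri.
        return
    scheme_ok = p.scheme.lower() == fs.scheme.lower()
    # Only compare authorities when the path actually specifies one.
    authority_ok = not p.netloc or p.netloc.lower() == fs.netloc.lower()
    if not (scheme_ok and authority_ok):
        raise WrongFS(f"Wrong FS: {path}, expected: {fs_uri}")


if __name__ == "__main__":
    src = "hdfs://doop-01:9000/user/royl/grateful-dead-vertices.gio"
    check_path(src, "hdfs://doop-01:9000")  # accepted: scheme and authority match
    try:
        check_path(src, "file:///")  # the filesystem the console actually resolved
    except WrongFS as err:
        print(err)  # Wrong FS: hdfs://doop-01:9000/user/royl/grateful-dead-vertices.gio, expected: file:///
```

In this model the same input path is accepted as soon as the filesystem resolves to hdfs://doop-01:9000, which is why the fix usually lies in which Hadoop configuration the console loads rather than in the path itself.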

Edi Bice

Jun 17, 2015, 12:07:21 PM
to aureliu...@googlegroups.com
Hi Roy,

I'm running into the same issue and was wondering whether you resolved it.

I just noticed the following when running :show variables in the Gremlin console:

==>hdfs=org.apache.hadoop.fs.LocalFileSystem@29f0802c
==>local=org.apache.hadoop.fs.LocalFileSystem@29f0802c
gremlin>

It looks to me like Gremlin is not really plugged into Hadoop, i.e. local and hdfs are the same object. In fact, when I run hdfs.ls() in the Gremlin console, I see all my local files.

Maybe you're in a similar situation.

Edi
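Edi's observation that both hdfs and local are bound to the same LocalFileSystem is consistent with fs.defaultFS resolving to Hadoop's built-in default of file:///. One common cause is that no core-site.xml (for example, one found via HADOOP_CONF_DIR) is visible on the console's classpath. A minimal Python sketch of that fallback, assuming a simplified lookup of only fs.defaultFS and its deprecated alias fs.default.name; `resolve_default_fs` is a hypothetical helper, and the hdfs://doop-01:9000 value is borrowed from Roy's inputLocation rather than from any real core-site.xml in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Value of fs.defaultFS in Hadoop's core-default.xml.
BUILTIN_DEFAULT_FS = "file:///"


def resolve_default_fs(core_site_xml: Optional[str]) -> str:
    """Return fs.defaultFS from core-site.xml text, or the built-in default.

    Simplified: only fs.defaultFS and its deprecated alias fs.default.name are
    consulted; Hadoop's real Configuration handling is more involved.
    """
    if core_site_xml is None:
        # No core-site.xml visible: Hadoop falls back to the local filesystem.
        return BUILTIN_DEFAULT_FS
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() in ("fs.defaultFS", "fs.default.name"):
            value = prop.findtext("value", "").strip()
            if value:
                return value
    return BUILTIN_DEFAULT_FS


if __name__ == "__main__":
    # Hypothetical core-site.xml pointing at the namenode from Roy's inputLocation.
    core_site = (
        "<configuration><property>"
        "<name>fs.defaultFS</name><value>hdfs://doop-01:9000</value>"
        "</property></configuration>"
    )
    print(resolve_default_fs(None))       # file:///
    print(resolve_default_fs(core_site))  # hdfs://doop-01:9000
```

Under this model, a console that sees no cluster configuration ends up with file:/// as its default filesystem, which matches both the LocalFileSystem bindings and the "expected: file:///" part of Roy's error.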

Matthias Broecheler

Jun 19, 2015, 8:43:49 PM
to aureliu...@googlegroups.com
Please update to Titan 0.9.0-M2, since we added a whole bunch of fixes in that release.

--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/753a7e0e-4f95-4fc7-af4d-8cfa6a981b05%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Roy Levin

Jun 21, 2015, 2:04:52 AM
to aureliu...@googlegroups.com
Hi,

@Edi: yes, I think you are correct. I am seeing the same thing.

Thanks Matthias, I will try M2 and report back on my findings.

Best,
Roy.

Edi Bice

Jun 22, 2015, 9:26:13 AM
to aureliu...@googlegroups.com
Matthias, unlike Roy, I am (and was) already using 0.9.0-M2 and am still seeing this issue.

Matthias Broecheler

Jun 22, 2015, 3:31:44 PM
to aureliu...@googlegroups.com
Thanks, we will look into this.
