Errors with GiraphGraphComputer

vnma...@gmail.com

Jun 29, 2016, 2:04:13 PM

to Gremlin-users

Hello,

I am having some issues with running the GiraphGraphComputer in the gremlin console. I have been able to successfully install the tinkerpop.giraph plugin and activate it, and the HADOOP_GREMLIN_LIBS variable is set to the ext/giraph-gremlin/lib directory.

I am running the Gremlin Console on a VM. Reading the graph works with SparkGraphComputer, but when I try to use Giraph, I get the following output:

plugin activated: tinkerpop.giraph

gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-script.properties')

==>hadoopgraph[scriptinputformat->graphsonoutputformat]

gremlin> g = graph.traversal(computer(GiraphGraphComputer))

==>graphtraversalsource[hadoopgraph[scriptinputformat->graphsonoutputformat], giraphgraphcomputer]

gremlin> g.V().count()

17:54:13 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17:54:14 INFO org.apache.tinkerpop.gremlin.hadoop.process.computer.giraph.GiraphGraphComputer - HadoopGremlin(Giraph): TraversalVertexProgram[GraphStep([],vertex), CountGlobalStep, ComputerResultStep]

java.lang.IllegalStateException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time!

Also, my properties file looks like this:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat

gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat

gremlin.hadoop.jarsInDistributedCache=true

gremlin.hadoop.inputLocation=data/mygraph.txt

gremlin.hadoop.scriptInputFormat.script=data/script-input-tinkerpop.groovy

gremlin.hadoop.outputLocation=output

#####################################

# GiraphGraphComputer Configuration #

#####################################

giraph.minWorkers=1

giraph.maxWorkers=2

giraph.useOutOfCoreGraph=true

giraph.useOutOfCoreMessages=true

mapred.map.child.java.opts=-Xmx1024m

mapred.reduce.child.java.opts=-Xmx1024m

giraph.numInputThreads=4

giraph.numComputeThreads=4

# giraph.maxPartitionsInMemory=1

# giraph.userPartitionCount=2

####################################

# SparkGraphComputer Configuration #

####################################

spark.master=local[4]

# spark.master=yarn-client

spark.executor.memory=1g

spark.serializer=org.apache.spark.serializer.KryoSerializer

# spark.kryo.registrationRequired=true

# spark.storage.memoryFraction=0.2

spark.eventLog.enabled=true

spark.eventLog.dir=tmp/spark-event-logs

# spark.ui.killEnabled=true

Has anyone else been having similar issues or figured out how to run the GiraphGraphComputer?

Jason Plurad

Jun 29, 2016, 4:46:52 PM

to Gremlin-users

Try adding this property:

giraph.SplitMasterWorker=false

I took notes here: https://github.com/pluradj/ambari-vagrant/blob/tp3/ubuntu14.4/tp3/GiraphGraphComputer.md

-- Jason
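For anyone applying this by hand, here is a minimal sketch of setting the property from the shell. It assumes the properties file from the first post (conf/hadoop-graph/hadoop-script.properties); `set_property` is a hypothetical helper, not part of TinkerPop or Giraph. It replaces an existing `key=value` line or appends one if the key is missing, and expects simple values without `|` or `&`.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Hypothetical helper: replaces an existing "KEY=..." line in FILE,
# or appends "KEY=VALUE" if the key is not present yet.
# Assumes FILE ends with a newline and VALUE contains no '|' or '&'.
set_property() {
    file=$1
    key=$2
    value=$3
    # Escape the dots in the key so they match literally in the regex.
    pattern=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    if grep -q "^${pattern}=" "$file"; then
        # Write to a temp file instead of relying on the non-portable sed -i.
        sed "s|^${pattern}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (not executed here):
#   set_property conf/hadoop-graph/hadoop-script.properties giraph.SplitMasterWorker false
```

The error text in the first post says the local job runner allows only one worker, while the posted file has giraph.maxWorkers=2. Lowering it with `set_property conf/hadoop-graph/hadoop-script.properties giraph.maxWorkers 1` is an assumption worth testing alongside the property above; the thread itself does not confirm it.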

vnma...@gmail.com

Jun 29, 2016, 5:49:28 PM

to Gremlin-users

Hi Jason,

Thanks for your reply. I added that to the properties file, and that issue no longer appears. Unfortunately, I am now encountering a connection issue with ZooKeeper.

Here is the (first) error that shows up when I try to run the same commands as above:

21:39:16 WARN org.apache.giraph.zk.ZooKeeperManager - onlineZooKeeperServers: Got ConnectException

java.net.ConnectException: Connection refused

at java.net.PlainSocketImpl.socketConnect(Native Method)

at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)

at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)

at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

at java.net.Socket.connect(Socket.java:589)

at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)

at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)

at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)

at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)

at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)

at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor

Does anyone know of any configurations or properties to deal with ZooKeeper?

Jason Plurad

Jun 30, 2016, 9:34:52 AM

to Gremlin-users

Giraph will start ZooKeeper on its own on port 22181, if you don't specify your own. Have you successfully run any of the standalone Giraph samples to confirm it works with your setup?

If you stand up your own ZooKeeper, you specify it in the properties file like this:

# Use external ZooKeeper instead of local ZooKeeper (optional)
giraph.zkList=192.168.0.1:2181

-- Jason
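Before pointing giraph.zkList at an external ZooKeeper, it can help to confirm that something is actually listening on each address. The sketch below is only a TCP reachability check, not a ZooKeeper health check. It assumes a netcat (`nc`) that supports `-z` and `-w`, and the function names are hypothetical.

```shell
#!/bin/sh
# Split one "host:port" entry into its parts.
zk_host() { printf '%s\n' "${1%:*}"; }
zk_port() { printf '%s\n' "${1##*:}"; }

# check_zk_list "host1:port1,host2:port2"
# Prints one line per entry; returns non-zero if any entry refuses a TCP connection.
# Assumption: nc supports -z (connect without sending data) and -w (timeout in seconds).
check_zk_list() {
    status=0
    old_ifs=$IFS
    IFS=,
    for entry in $1; do
        if nc -z -w 2 "$(zk_host "$entry")" "$(zk_port "$entry")" 2>/dev/null; then
            echo "$entry reachable"
        else
            echo "$entry NOT reachable"
            status=1
        fi
    done
    IFS=$old_ifs
    return $status
}

# Example (not executed here):
#   check_zk_list 192.168.0.1:2181
```

When Giraph starts its own ZooKeeper, the same check against port 22181 on the local host only makes sense while the job is running.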

Marko Rodriguez

Jun 30, 2016, 9:39:02 AM

to gremli...@googlegroups.com

Hi,

Note that if you let Giraph stand up ZooKeeper “on the fly” for you, you will get 1 or 2 “connection refused” exceptions before a valid connection is made, because Giraph tries to connect before ZooKeeper is fully loaded. This is a known “issue” discussed in the Giraph documentation. It's not really an issue, as it just throws an exception and retries, but it does unfortunately fill your logs with exception messages.

HTH,

Marko.

http://markorodriguez.com
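To tell the harmless start-up refusals apart from a connection that never succeeds, one rough approach is to count the warnings in a captured log. This sketch assumes you saved the console output to a file yourself (giraph.log is a placeholder name). It matches the warning text quoted earlier in this thread, and the default limit of 2 is an assumption taken from the "1 or 2" above. The function names are hypothetical.

```shell
#!/bin/sh
# count_zk_refusals LOGFILE
# Prints how many times the ZooKeeper "Got ConnectException" warning was logged.
count_zk_refusals() {
    grep -c 'onlineZooKeeperServers: Got ConnectException' "$1"
}

# zk_refusals_look_benign LOGFILE [LIMIT]
# Succeeds when the warning count is at most LIMIT (default 2, an assumption).
zk_refusals_look_benign() {
    [ "$(count_zk_refusals "$1")" -le "${2:-2}" ]
}

# Example (not executed here):
#   zk_refusals_look_benign giraph.log || echo "ZooKeeper may never have come up"
```

A count well above the limit suggests the retries are not succeeding, which points at the ZooKeeper start-up itself rather than at the timing issue described above.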


HadoopMarc

Aug 20, 2017, 9:50:18 AM

to Gremlin-users

Hi gremlin on Giraph testers,

I also found that to run the GiraphGraphComputer example from the reference docs locally (that is, without external or pseudo-distributed Hadoop services), you need to add the following to gremlin-console's classpath (working from gremlin-console's root):

export CLASSPATH=$PWD/lib/*
bin/gremlin.sh

This seems to be a bug: GiraphGraphComputer attempts to start ZooKeeper from its ZooKeeper directory using the classpath from gremlin.sh, which contains relative paths. Adding the same artifacts to the classpath with absolute paths, as above, works around it.

If nobody reports back that it works without the workaround classpath, I'll open a ticket.

Cheers, Marc
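The idea behind the workaround, turning relative classpath entries into absolute ones, can be sketched as a small shell function. `absolutize_classpath` is a hypothetical helper for illustration only, and it does not change what gremlin.sh itself puts on the classpath.

```shell
#!/bin/sh
# absolutize_classpath BASE_DIR "lib/*:/opt/c.jar:ext/a.jar"
# Prefixes BASE_DIR to every relative entry; absolute entries pass through unchanged.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
absolutize_classpath() (
    base=$1
    result=
    set -f          # keep wildcard entries such as lib/* literal
    IFS=:
    for entry in $2; do
        case $entry in
            /*) abs=$entry ;;
            *)  abs=$base/$entry ;;
        esac
        result=${result:+$result:}$abs
    done
    printf '%s\n' "$result"
)

# Example (not executed here), equivalent to the export above:
#   CLASSPATH=$(absolutize_classpath "$PWD" 'lib/*')
#   export CLASSPATH
```

Because the entries are anchored to a fixed directory, they still resolve if a child process is started from a different working directory, which is the situation described above.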


HadoopMarc

Aug 20, 2017, 10:39:51 AM

to Gremlin-users

Hi all,

While retracing the changes I made, I hit on a step I omitted above (kind of obvious, except that the hundreds of lines of log output give no sensible cue...). When using GiraphGraphComputer locally, HDFS falls back to the local file system, so the corresponding line in conf/hadoop/hadoop-gryo.properties should read:

gremlin.hadoop.inputLocation=data/tinkerpop-modern.kryo

Marc
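Since the log output gives no clear hint, a quick pre-flight check can confirm that the configured input location exists on the local file system. This is a minimal sketch with hypothetical helper names. It resolves a relative path against the current directory, so it assumes you run it from the directory where you start the console.

```shell
#!/bin/sh
# get_property FILE KEY
# Prints the value of the last uncommented "KEY=value" line in FILE.
get_property() {
    # Escape the dots in the key so they match literally in the regex.
    pattern=$(printf '%s' "$2" | sed 's/[.]/\\./g')
    sed -n "s/^${pattern}=//p" "$1" | tail -n 1
}

# input_location_exists FILE
# Succeeds if gremlin.hadoop.inputLocation is set in FILE and names an existing local path.
input_location_exists() {
    loc=$(get_property "$1" gremlin.hadoop.inputLocation)
    [ -n "$loc" ] && [ -e "$loc" ]
}

# Example (not executed here):
#   input_location_exists conf/hadoop/hadoop-gryo.properties ||
#       echo "input location not found on the local file system"
```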


HadoopMarc

Aug 23, 2017, 3:23:41 PM

to Gremlin-users

OK, I wrapped up this entire discussion in the ticket below:
https://issues.apache.org/jira/browse/TINKERPOP-1757

Cheers, Marc
