CQL for OLAP issue with Scylla as backend in both local and YARN mode

rakesh...@zeotap.com

May 2, 2019, 1:48:18 PM

to JanusGraph users

Hi All,

I am unable to run any analytics (OLAP) on JanusGraph with Scylla as the backend.

I tried both local and YARN mode on an AWS EMR cluster.

In YARN mode, it throws an exception: 10:07:58 ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
In local mode, it runs around 500 tasks without errors and then gives an empty output (this is with SparkGraphComputer; without it, the same queries return results).

I built the distribution archive from here (branch Issue_985_spark_via_cql).

Following are the properties given in conf/hadoop-graph/read-cql.properties:

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
#
# JanusGraph Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=X.0.X.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=graph1
storage.cassandra.keyspace=graph1

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

#
# SparkGraphComputer Configuration
#
#spark.master=spark://X.X.X.X:7077
spark.master=yarn
spark.submit.deployMode=client
spark.yarn.jars=/usr/lib/spark/jars/
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

Full stack trace when running in YARN mode:

java.lang.IllegalStateException: org.apache.spark.SparkException: Unable to load YARN support
	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:192)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:214)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:264)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
	at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:460)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:196)
	at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)
	at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)
	at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
	at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:130)
	at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:89)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:146)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:453)
Caused by: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Unable to load YARN support
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)
	... 56 more
Caused by: org.apache.spark.SparkException: Unable to load YARN support
	at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:405)
	at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:400)
	at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:400)
	at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:425)
	at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2387)
	at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:156)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:351)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
	at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
	at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
	at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
	at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:233)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
	at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:401)
	... 18 more

Is anything else required on the classpath, or are any jars missing? Also, what is the problem with local mode?

Is there any alternative for this purpose (analytics on JanusGraph using Spark)? Currently I am running connected components using GraphFrames.

Your help is appreciated, thanks in advance :)

HadoopMarc

May 3, 2019, 2:33:34 PM

to JanusGraph users

Hi,

Regarding spark-yarn: it has been included in the spark-gremlin plugin for the gremlin-console distributed with TinkerPop since TinkerPop 3.3.1, but it has not yet made it into the spark-gremlin Maven dependency. Any project that runs JanusGraph OLAP queries on spark-yarn has to include the spark-yarn Maven dependency explicitly.

If you work in the gremlin-console of the JanusGraph distribution, you can add the spark-yarn jars manually, as described in:

http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
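As a quick sanity check for the ClassNotFoundException on org.apache.spark.deploy.yarn.YarnSparkHadoopUtil, a sketch like the following looks for a spark-yarn jar in the directories that end up on the console classpath. The function name has_spark_yarn_jar is made up for illustration, the directories in the example call are assumptions about a typical setup, and the file name pattern assumes the usual spark-yarn_<scala version>-<spark version>.jar naming.

```shell
# Return 0 if any of the given directories contains a spark-yarn jar.
# The directories are supplied by the caller; no layout is assumed here.
has_spark_yarn_jar() {
  local dir jar
  for dir in "$@"; do
    for jar in "$dir"/spark-yarn_*.jar; do
      # An unmatched glob stays literal, so test for a real file.
      if [ -f "$jar" ]; then
        echo "found: $jar"
        return 0
      fi
    done
  done
  echo "no spark-yarn jar found in: $*"
  return 1
}

# Example call (both paths are assumptions, adjust to your installation):
# has_spark_yarn_jar "$GREMLIN_HOME/lib" /usr/lib/spark/jars
```

If no jar is found, the directory that does contain it still has to be added to the classpath before the console is started.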

Regarding the OLAP query output, you did not specify which line of code, either in the gremlin-console or in your project, should have produced output. If you did not take the graph.traversal().withComputer() approach, take a look at:

http://tinkerpop.apache.org/docs/current/reference/#interacting-with-hdfs

Cheers, Marc

On Thursday, May 2, 2019 at 19:48:18 UTC+2, rakesh...@zeotap.com wrote:

rakeshsh...@gmail.com

May 6, 2019, 11:17:12 AM

to JanusGraph users

Thanks for your response, Marc.

I followed the link you provided above, but it still shows the same error.

Regarding the OLAP query in local mode, I am running the following queries:

// properties as provided above, with spark.master=local[4]

graph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')

g = graph.traversal().withComputer(SparkGraphComputer)

g.V().limit(5)

g.V().count()

For both of the above queries, more than 500 tasks run and finish, but the output is empty.

If I run without SparkGraphComputer, I get the proper output for limit or has queries.

rakeshsh...@gmail.com

May 6, 2019, 11:54:44 AM

to JanusGraph users

In YARN mode, it now throws the error below:

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-syclla.properties')
==>hadoopgraph[cqlinputformat->gryooutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().limit(2)
java.lang.NullPointerException
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.IllegalStateException: java.lang.NullPointerException

Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException

	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)
	... 56 more

Caused by: java.lang.NullPointerException
	at org.apache.tinkerpop.gremlin.hadoop.process.computer.AbstractHadoopGraphComputer.loadJars(AbstractHadoopGraphComputer.java:169)
	at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:251)

	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The properties file I used is as follows:

#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output


#
# JanusGraph Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cql

janusgraphmr.ioformat.conf.storage.hostname=10.X.X.X
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=graphname


#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

cassandra.input.keyspace=graphname
cassandra.input.predicate=0c00020b0001000000000b000200000000020003000800047fffffff0000
cassandra.input.columnfamily=edgestore
cassandra.range.batch.size=214748364

#
# SparkGraphComputer Configuration
#

spark.master=yarn
spark.deploy.mode=client
#spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

gremlin.spark.persistContext=true

# Default Graph Computer
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

When I change to spark.master=local[4], it starts executing and prints an empty output after all tasks finish.

I followed this link: http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
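For switching the same properties file between YARN and local mode without hand-editing it each time, something like the sketch below could be used. The helper name with_spark_master and the output path in the example are made up for illustration. It drops every existing spark.master line, commented out or not, and appends a single new one.

```shell
# Print the properties in file $1 with spark.master set to $2.
with_spark_master() {
  local src="$1" master="$2"
  # Drop every existing spark.master line, commented out or not.
  # grep exits non-zero when nothing is left to print, hence the "|| true".
  grep -v -E '^#?spark\.master=' "$src" || true
  echo "spark.master=$master"
}

# Example (the output path is a placeholder):
# with_spark_master conf/hadoop-graph/read-cql.properties 'local[4]' > /tmp/read-cql-local.properties
```

The copy can then be passed to GraphFactory.open in place of the original file.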

Nitin Poddar

May 27, 2020, 10:12:35 AM

to JanusGraph users

Hi Rakesh,

Were you able to resolve this issue? I am getting the exact same error message, and there isn't much help available online for it. Could you please help me here?

Thanks,

Nitin

Evgeniy Ignatiev

May 27, 2020, 11:29:47 AM

to janusgra...@googlegroups.com

Hello Nitin,

It looks like your installation lacks the required Spark jars (see https://groups.google.com/d/msg/gremlin-users/LYv-cvZ66hU/TJUTvLzCAAAJ); you have to provide a full installation.
Please, also see:

Best regards,
Evgenii Ignatev.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/b65d7bd7-2df4-42c3-819e-f6bd15c9825a%40googlegroups.com.


rakesh...@zeotap.com

May 27, 2020, 12:45:07 PM

to JanusGraph users

Hi Nitin,

Yes, I was able to resolve the issue mentioned above. Please follow the steps below.

Steps:

Create a bash file, e.g. jg.sh:

#!/bin/bash

GREMLIN_HOME=/tmp/janusgraph-0.4.0-hadoop2
cd $GREMLIN_HOME

# Have janusgraph find the hadoop and hbase cluster configs and spark-yarn dependencies
#export CLASSPATH=/etc/hadoop/conf:$GREMLIN_HOME/lib/*:$GREMLIN_HOME/lib2/*:/usr/lib/hadoop/client/*:/usr/lib/hadoop/*:/usr/lib/spark/jars/*

export SPARK_HOME=/usr/lib/spark

export CLASSPATH=/etc/hadoop/conf:/usr/lib/spark/jars/*:$GREMLIN_HOME/lib/*:$GREMLIN_HOME/lib2/*:/usr/lib/hadoop/client/*:/usr/lib/hadoop/*

# Have hadoop find its native libraries
# export JAVA_OPTIONS="-Djava.library.path=/usr/lib/hadoop/client/*:/usr/lib/hadoop/*:/usr/lib/spark/jars/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*"
export JAVA_OPTIONS="-Djava.library.path=/usr/lib/spark/jars/*:/usr/lib/hadoop/client/*:/usr/lib/hadoop/*"

# Does not work for spark-yarn, see spark.yarn.appMasterEnv.CLASSPATH and
# spark.executor.extraClassPath. Set nevertheless to get rid of the warning.
export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/empty

bin/gremlin.sh

Copy jg.sh to the bin folder of the JanusGraph folder.
Update GREMLIN_HOME to point to the absolute path of the JanusGraph folder (e.g. `GREMLIN_HOME=/mnt/spark-olap/janusgraph-0.4.0-hadoop2`).
Change to the JanusGraph folder and run bin/jg.sh.
In the Gremlin console that opens, run:
graph = GraphFactory.open('conf/hadoop-graph/read-cql-dynalloc.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
graph.compute(SparkGraphComputer).program(BulkDumperVertexProgram.build().create()).submit().get()
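Since the fix depends on every CLASSPATH entry in jg.sh actually existing on the machine, a small check along these lines may help before starting the console. This is only a sketch: the function name check_classpath is hypothetical, and it only verifies that each entry exists, not that the right jars are inside it.

```shell
# Report CLASSPATH entries that do not exist on disk.
# The body runs in a subshell so the IFS and globbing changes stay local.
check_classpath() (
  set -f        # keep the /* wildcards literal while splitting
  IFS=':'
  missing=0
  for entry in $1; do
    path="${entry%/\*}"   # strip a trailing /* wildcard, if any
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  # Exit status 0 only when every entry exists.
  [ "$missing" -eq 0 ]
)

# Example call, after the export lines in jg.sh:
# check_classpath "$CLASSPATH" || echo "fix the entries above before running bin/gremlin.sh"
```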

I was unable to finish the job, as we have billions of linkages in the graph, and after that I did not get time to pick up OLAP again. Let me know if you make any progress or find best practices for tuning OLAP queries on large-scale graphs.


Nitin Poddar

May 27, 2020, 12:55:52 PM

to JanusGraph users

Thank you, Evgenii. I followed the post and have been trying to resolve the issue for almost a week now. I might be getting some version conflicts as well. I will try again.


Nitin Poddar

May 27, 2020, 12:58:10 PM

to JanusGraph users

Hi Rakesh,

Thank you for your reply. I will definitely share performance tuning best practices and learnings as I find more. Could you please share the properties file (read-cql-dynalloc.properties) that you pass to GraphFactory?

Thanks

Nitin

rakesh...@zeotap.com

May 27, 2020, 1:15:16 PM

to JanusGraph users

Sure, please find read-cql-dynalloc.properties below.

You can change the parameters as per your requirements.

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
# gremlin.hadoop.outputLocation=output
# gremlin.hadoop.outputLocation=s3://bucket-name/jgdump_20190711
gremlin.hadoop.outputLocation=/tmp/jgdump_20190711_2


#
# JanusGraph Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cql

janusgraphmr.ioformat.conf.storage.hostname=<hostName>
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cql.keyspace=keyspaceName


#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

cassandra.input.widerows=false
cassandra.range.batch.size=2147483640

cql.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cql.input.widerows=false
cql.range.batch.size=2147483640

spark.master=yarn
spark.deploy.mode=client
spark.jars=hdfs:///tmp/janusgraph-0.4.0-hadoop2/lib/
spark.yarn.queue=long_run
spark.serializer=org.apache.spark.serializer.KryoSerializer
#gremlin.spark.graphStorageLevel=MEMORY_AND_DISK
gremlin.spark.graphStorageLevel=DISK_ONLY
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.executor.extraJavaOptions="-verbose:class"
spark.executor.instances=500
spark.executor.cores=2
spark.executor.memory=9g
spark.yarn.executor.memoryOverhead=2g
spark.driver.memory=5g
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true


gremlin.spark.persistContext=true

# Default Graph Computer
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

spark.cassandra.input.page.row.size=1000000
spark.cassandra.input.split.size=1000000
spark.cassandra.input.split.size_in_mb=256

spark.cql.input.page.row.size=1000000
spark.cql.input.split.size=1000000
spark.cql.input.split.size_in_mb=256

mapred.max.split.size=268435456
mapreduce.input.fileinputformat.split=268435456
spark.network.timeout=240
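As a rough check on what these settings ask of the cluster, the arithmetic below multiplies the executor count by the per-executor memory plus overhead. It is only a sketch: the function name is made up, the driver memory is not included, and with dynamic allocation enabled the number of executors actually running may differ from spark.executor.instances.

```shell
# Memory (in GB) requested from YARN if every executor is granted:
# instances * (executor memory + memory overhead).
total_executor_memory_gb() {
  local instances="$1" memory_gb="$2" overhead_gb="$3"
  echo $(( instances * (memory_gb + overhead_gb) ))
}

# Values from the properties file above: 500 executors, 9g memory, 2g overhead.
total_executor_memory_gb 500 9 2   # prints 5500
```

If the YARN queue cannot provide that much, fewer executors will run at once, so the values should be scaled to the cluster at hand.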

Nitin Poddar

May 27, 2020, 1:55:34 PM

to JanusGraph users

Thanks, Rakesh. I will try with the changes from your properties file and let you know if I face any issues. I have been struggling with this for over a week now. :)

Just curious: I see that you did not use Elasticsearch for indexing. Any reason why? It can significantly improve your OLAP performance.

Thanks,

Nitin

Evgeniy Ignatiev

May 27, 2020, 1:58:52 PM

to janusgra...@googlegroups.com

How does ES affect OLAP performance? Correct me if I am wrong, but unless it is explicitly used in custom Spark code, the JanusGraph integration will not leverage it, and it is definitely not contacted when loading graph data in-memory for Spark VertexProgram execution.

Best regards,
Evgenii Ignatev.
