JanusGraph - OLAP Traversal Fails with Spark + Cassandra with SSL


Saket P

May 3, 2023, 6:33:12 AM
to Gremlin-users
Hi Gremlin Users,
I am facing an issue when running OLAP traversals on JanusGraph with a Cassandra backend that has SSL enabled.
When I follow the docs in "JanusGraph with TinkerPop's Hadoop-Gremlin" and Cassandra is not SSL-enabled, I am able to traverse the graph with SparkGraphComputer.
But in our development environment Cassandra is SSL-enabled, and OLAP traversals against this configuration through the Gremlin Console do not work. We see errors while the connection to Cassandra is being established: the Cassandra server rejects the request, citing that it is not an SSL connection. The same SSL and Cassandra configuration works fine for OLTP traversals with regular JanusGraph: the same truststore config, Cassandra endpoints, and credentials all work there.

I suspect that either the SSL configuration is not being passed to the Spark executor that establishes the connection with Cassandra, or I am missing some configuration needed to enable SSL communication with Cassandra from Spark.


Below is the OLAP config:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
#
# JanusGraph Cassandra InputFormat configuration
#
# These properties define the connection properties that were used when writing data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
storage.backend=cql
janusgraphmr.ioformat.conf.storage.ssl.enabled=true
storage.ssl.enabled=true
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=cassandra.cassandra.svc.cluster.local
storage.hostname=cassandra.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassandra123
storage.password=cassandra123
# This specifies the keyspace where data is stored.
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.lock.wait-time = 60000
storage.lock.wait-time = 60000

janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/gremlin/client/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.type=jks

storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/gremlin/client/truststore
storage.cql.ssl.truststore.password=cassandra123
storage.cql.ssl.truststore.type=jks

janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE
storage.cql.read-consistency-level=ONE

janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.keyspace=janusgraph
cassandra.input.widerows=true

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
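The configuration above lists each storage setting twice: once as `storage.*` and once under the `janusgraphmr.ioformat.conf.` prefix. With that much duplication, a misspelled prefix or a missing counterpart is easy to overlook. Below is a minimal Python sketch of a sanity check for such a file. The helper names (`parse_properties`, `lint`), the list of known prefixes, and the sample values are all hypothetical and not part of JanusGraph or TinkerPop. It only checks key spelling and pairing, so a clean result does not mean the SSL settings actually reach the Spark executors.

```python
# Hypothetical helper, not part of JanusGraph or TinkerPop.
MR_PREFIX = "janusgraphmr.ioformat.conf."
# Prefixes seen in the configuration above (assumption: extend as needed).
KNOWN_PREFIXES = (MR_PREFIX, "storage.", "gremlin.", "cassandra.", "spark.")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def lint(props):
    """Return (unknown_keys, unmirrored_keys) for a parsed properties dict."""
    # Keys whose prefix is not one of the expected ones (likely typos).
    unknown = sorted(k for k in props if not k.startswith(KNOWN_PREFIXES))
    # storage.* keys that have no janusgraphmr.ioformat.conf.storage.* twin.
    unmirrored = sorted(
        k for k in props
        if k.startswith("storage.") and MR_PREFIX + k not in props
    )
    return unknown, unmirrored


# Illustrative sample with a deliberately misspelled prefix.
sample = """
janusgraphmr.ioformat.conf.storage.hostname=cassandra.example
storage.hostname=cassandra.example
anusgraphmr.ioformat.conf.storage.port=9042
storage.port=9042
"""

unknown, unmirrored = lint(parse_properties(sample))
print("unknown prefix:", unknown)
print("no janusgraphmr mirror:", unmirrored)
```

To check a real file, read it with `open(path).read()` and pass the text to `parse_properties`.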





Below is the error encountered on traversal:
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
10:17:48 WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin> graph=GraphFactory.open('conf/olapssl.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().limit(10).count()
10:18:52 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.cassandra.svc.cluster.local/10.0.165.158:9042 (com.datastax.driver.core.exceptions.TransportException: [cassandra.cassandra.svc.cluster.local/10.0.165.158:9042] Connection has been closed))
Type ':help' or ':h' for help.
Display stack trace? [yN]



Below is the error from the Cassandra database logs:
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 
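For context on what this server-side message implies: a TLS connection begins with a record whose first byte is a content type (22 for a handshake) followed by a major version byte of 3, whereas a plaintext CQL native-protocol request begins with the protocol version byte. The Python sketch below illustrates the difference on hand-written sample bytes. It is a simplified heuristic with hypothetical function names, not the check Netty actually performs, and the byte sequences are illustrative rather than captured from this cluster.

```python
# Simplified illustration, not Netty's implementation.
TLS_CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def looks_like_tls_record(data):
    """True if the first two bytes match a TLS record header (type, major version 3)."""
    return len(data) >= 2 and data[0] in TLS_CONTENT_TYPES and data[1] == 0x03


def describe_first_bytes(data):
    """Roughly classify the first bytes a client sent to the CQL port."""
    if looks_like_tls_record(data):
        return "TLS record (%s)" % TLS_CONTENT_TYPES[data[0]]
    # CQL native-protocol request frames start with the protocol version byte.
    if data and data[0] in (0x03, 0x04, 0x05):
        return "plaintext CQL native-protocol request, version %d" % data[0]
    return "unknown"


# Start of a TLS ClientHello: handshake type 0x16, record version 0x03 0x01.
tls_hello = bytes([0x16, 0x03, 0x01, 0x00, 0x2F])
# CQL v4 OPTIONS request header: version 4, flags 0, stream 0, opcode 5, empty body.
cql_options = bytes([0x04, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00])

print(describe_first_bytes(tls_hello))
print(describe_first_bytes(cql_options))
```

A server that expects TLS and receives bytes like the second sample would reject them, which is consistent with the client opening the connection without SSL.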




HadoopMarc

May 5, 2023, 8:54:46 AM
to Gremlin-users