newbie questions - how to save/access JanusGraph in Cassandra via gremlin_python?


John B

Jul 25, 2017, 12:31:03 PM
to Gremlin-users
Hello. I am new to every part of a Cassandra, JanusGraph, and gremlin_python setup, and my Java skills are weak. I have a basic setup working via gremlin_python (all downloaded via JanusGraph), but I don't understand how to use gremlin_python to save a new graph to Cassandra, or how to access a JanusGraph graph saved in Cassandra. All the tutorials seem to use in-memory TinkerGraphs for illustration. I am trying to learn Cassandra, JanusGraph, and gremlin_python together, and I run the scripts only on my laptop. My setup is JanusGraph 0.1.1 with tinkergraph-gremlin-3.2.3 and gremlin-server-3.2.3.

My gremlin-server.yaml file contains:

host: localhost
port: 8182
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: conf/gremlin-server/janusgraph-cassandra-es-server.properties}
plugins:
  - janusgraph.imports
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy]},
    gremlin-python: {},
    gremlin-jython: {}}

My janusgraph-cassandra-es-server.properties file contains:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cassandrathrift
storage.hostname=127.0.0.1

The following python code works to access the in-memory example graph:

from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

graph = Graph()
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
a = g.V().both()[1:3].toList()

So far so good. But using gremlin_python only, how do I create a new graph and save it to Cassandra, giving it a name for later access? How do I query a JanusGraph that has been saved in Cassandra, make changes to it, and then save it back to Cassandra? Finally, how can I see what has been saved in Cassandra? The JanusGraph package does not include a bin/cqlsh to open a cql shell. Is there some other way to manage Cassandra while the gremlin and cassandra servers are running?

If I want to move a large set of data from a relational database into JanusGraph form, should I use the Python client for JanusGraph, rather than gremlin_python? For example, I could import gremlin_python and a relational database connector, write SQL queries in Python, and then send the results to the Cassandra store for the graph version of the data. I'm not sure what the best practices are for using different Python tools for Cassandra, JanusGraph, and TinkerPop. Any thoughts on this would be appreciated.

I've spent quite a while searching the web for answers to these questions, but no luck yet. It seems that tutorials/explanations are in short supply, other than for showing the simple python code that I've included above.

John



Jason Plurad

Jul 25, 2017, 4:35:59 PM
to Gremlin-users
Hi John,

Great questions. Welcome to TinkerPop.

Graph management with the Gremlin Server is currently a manual process. There is a pull request in the queue that will start to help with this. In the meantime, you can add multiple graph definitions in the gremlin-server.yaml. For example:

graphs: {
  graph: conf/gremlin-server/janusgraph-cassandra-es-server.properties,
  godsgraph: conf/gremlin-server/graph-of-the-gods.properties
}

graph-of-the-gods.properties looks pretty similar to the other one, with an extra property for the keyspace:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
# each graph gets a separate keyspace, default keyspace is named janusgraph
storage.cassandra.keyspace=godsgraph

Then update the scripts/empty-sample.groovy to bind a graph traversal source for the new graph:

globals << [g : graph.traversal(), godsg : godsgraph.traversal()]

Then from your Python script, you can connect to either or both:

graph = Graph()
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
godsg = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','godsg'))

At this point, you can make changes to the graphs using Gremlin, and those changes would be persisted in the JanusGraph's Cassandra backend. For example:

godsg.addV('god').property('name', 'jupiter').property('age', 5000).as_('J').addV('titan').property('name', 'saturn').property('age', 10000).as_('S').addE('father').from_('J').to('S').iterate()
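To check that the write really landed, you can read the data back with another traversal. Here is a minimal sketch, assuming the `godsg` traversal source and the `name` property and `father` edge label from the example above; the helper name `father_names` is just made up for illustration:

```python
def father_names(g, child_name):
    """Return the names of vertices reached by 'father' edges from child_name."""
    return (
        g.V()
        .has("name", child_name)  # find the starting vertex by its name property
        .out("father")            # follow outgoing 'father' edges
        .values("name")           # keep only the name property of the targets
        .toList()                 # run the traversal on the server, collect results
    )
```

Calling `father_names(godsg, 'jupiter')` should include `saturn` once the traversal above has run. If you call it from a fresh Python process, the result is coming from the server's storage backend rather than from anything held in your script. Without an index on `name`, JanusGraph will most likely scan every vertex for the `has` step, which is fine for a toy graph like this one.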

You can use gremlin_python both to create and to explore your graph. As you mentioned, you could use a relational database connector, write SQL queries in Python, then use Gremlin to construct the vertices, edges, and properties for your graph. JanusGraph doesn't have a separate Python client right now, so you use the same TinkerPop client library (gremlin_python).
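As a rough sketch of that relational-to-graph idea: the code below reads rows with a SQL query and adds one vertex per row. The `people` table, its `name` and `age` columns, the `person` label, and all function names are assumptions made up for illustration, and the in-memory SQLite database only stands in for whatever relational database you actually have. The gremlin_python imports sit inside `remote_traversal` so the row-copying helper can be tried without a server.

```python
import sqlite3


def fetch_people(db):
    """Read (name, age) rows from the hypothetical 'people' table."""
    return db.execute("SELECT name, age FROM people ORDER BY name").fetchall()


def copy_rows_to_graph(rows, g, label="person"):
    """Add one vertex per (name, age) row through the traversal source g."""
    count = 0
    for name, age in rows:
        # Nothing is sent to the server until iterate() is called.
        g.addV(label).property("name", name).property("age", age).iterate()
        count += 1
    return count


def remote_traversal(url="ws://localhost:8182/gremlin", source="godsg"):
    """Open a remote traversal source bound to the given server-side name."""
    from gremlin_python.structure.graph import Graph
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    return Graph().traversal().withRemote(DriverRemoteConnection(url, source))


def main():
    # Stand-in relational source; replace with your own database connection.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    db.executemany("INSERT INTO people VALUES (?, ?)",
                   [("jupiter", 5000), ("saturn", 10000)])
    added = copy_rows_to_graph(fetch_people(db), remote_traversal())
    print(added, "vertices added")
```

Call `main()` with the Gremlin Server running. This sends one request per row, which is simple but slow; for a large data set you would probably want to batch the writes or look at a bulk-loading approach instead.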

You're right that JanusGraph doesn't package a cqlsh shell. If you want to get more serious about working with a Cassandra backend, you're best off downloading it from the Apache Cassandra website rather than using the quick-start version packaged in JanusGraph.

-- Jason

John B

Jul 28, 2017, 11:48:18 AM
to Gremlin-users
Wonderful Jason. This is exactly what I need. Thanks.
John