Making JanusGraph work in both OLTP and OLAP in JanusGraph Server with HBase


John Helmsen

Feb 27, 2018, 3:08:56 PM
to JanusGraph users
Due to the considerable efforts of HadoopMarc and others on the board, as well as a great deal of blood, sweat, and tears from our team here, we have a working implementation of JanusGraph 0.1.1 running on HDP 2.6.4.

This implementation has the following properties:

1) Runs both JanusGraph and SparkGraphComputer.
2) Uses HBase 1.1.2 as its backend.
3) Is addressable from Python.

The general approach we followed is described in: http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

The catch, however, is that now we'd like to have this system run as a part of a general program.

To do this, it would be helpful to be able to address all the capabilities as part of a Python program, which requires:
- Running everything through the Gremlin Server, since that is how Python programs interact with TinkerPop.
- Having the server expose two different ways of addressing the same graph, ideally through different variables on the server.
- This is because SparkGraphComputer cannot be used from the JanusGraph instance.  You need a HadoopGraph instance to run it.

(If the above is incorrectly assumed, please correct me.)

So, how do we run Spark operations through the server?

HadoopMarc

Feb 28, 2018, 2:34:25 AM
to JanusGraph users
Hi John,

I have never tried this myself, but what did you try? Does anything go wrong when configuring a HadoopGraph in Gremlin Server? Any stack traces?

You may not be aware that you can simply configure multiple graphs in Gremlin Server. I show two fragments of the .yaml and script files below as an example. From the Gremlin Console, each graph is accessible through its own separately defined Graph and TraversalSource variables.

channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: conf/gremlin-server/titan-berkeleyje-server.properties,
  graph_tst: conf/gremlin-server/tst.properties}
plugins:
  - tinkerpop.tinkergraph
  - aurelius.titan
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy],
    config: {
      compilerCustomizerProviders: {
        "org.apache.tinkerpop.gremlin.groovy.jsr223.customizer.ThreadInterruptCustomizerProvider":[],
        "org.apache.tinkerpop.gremlin.groovy.jsr223.customizer.TimedInterruptCustomizerProvider":[10000],
        "org.apache.tinkerpop.gremlin.groovy.jsr223.customizer.CompileStaticCustomizerProvider":["org.apache.tinkerpop.gremlin.groovy.jsr223.customizer.SimpleSandboxExtension"]}}}}

// define the default TraversalSource to bind queries to.
g = graph.traversal()
g_tst = graph_tst.traversal()
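
For a Python client, the same idea could be sketched roughly as follows: one remote traversal source per server-side variable name. This is only a sketch under assumptions. The helper name open_traversal_sources and its connect parameter are hypothetical, the URL assumes the default WebSocket endpoint on localhost:8182, and it relies on gremlin_python's Graph and DriverRemoteConnection classes being installed on the client.

```python
def open_traversal_sources(url, names, connect=None):
    """Return a dict mapping each server-side variable name to a traversal source.

    `connect(url, name)` is a hypothetical injectable factory (handy for tests).
    By default it binds through gremlin_python's DriverRemoteConnection.
    """
    if connect is None:
        # Imported lazily so this module still loads where gremlin_python is absent.
        from gremlin_python.structure.graph import Graph
        from gremlin_python.driver.driver_remote_connection import (
            DriverRemoteConnection,
        )

        def connect(url, name):
            # `name` must match a TraversalSource variable defined on the server,
            # e.g. g or g_tst from the init script above.
            return Graph().traversal().withRemote(DriverRemoteConnection(url, name))

    return {name: connect(url, name) for name in names}
```

Usage would look something like sources = open_traversal_sources("ws://localhost:8182/gremlin", ["g", "g_tst"]), followed by sources["g"].V().count().toList(). Whether the second graph can be a HadoopGraph depends on the server-side configuration.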


HTH,    Marc




John Helmsen

Feb 28, 2018, 3:53:32 PM
to JanusGraph users
Thanks for the response Marc,

Here is what I am seeing so far.  This is my yaml file:


host: localhost
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: conf/gremlin-server/janusgraph-stuff-server.properties,
  hgraph: conf/hadoop-graph/hadoop-gryo-stuff.properties}
plugins:
  - janusgraph.imports
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/stuff.groovy]},
  gremlin-python: {}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}


The stuff.groovy script is the following:

// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// defines a sample LifeCycleHook that prints some output to the Gremlin Server console.
// note that the name of the key in the "global" map is unimportant.
globals << [hook : [
        onStartUp: { ctx ->
            ctx.logger.info("Executed once at startup of Gremlin Server.")
        },
        onShutDown: { ctx ->
            ctx.logger.info("Executed once at shutdown of Gremlin Server.")
        }
] as LifeCycleHook]

// define the default TraversalSource to bind queries to - this one will be named "g".
globals << [g : graph.traversal()]
globals << [hg : hgraph.traversal()]

When accessed remotely from the Gremlin Console, both g and hg return as expected.  However, when we try to query with hg:

gremlin> :> hg.V().count()
IllegalStateException
Type ':help' or ':h' for help.
Display stack trace? [yN]
 
and if I try to use the graph computer:

gremlin> :> hg.withComputer(SparkGraphComputer).V().count()
No such property: SparkGraphComputer for class: Script9
Type ':help' or ':h' for help.
Display stack trace? [yN]


So I assume some sort of plugins are not active on the server, and I am going to try to activate those.  If you have any other ideas, please let me know.

John Helmsen

Feb 28, 2018, 4:10:57 PM
to JanusGraph users
Okay, so I added tinkerpop.hadoop and tinkerpop.spark to the plugins list and the following happened:

gremlin> :> hg.V().count()
org.apache.tinkerpop.gremlin.groovy.plugin.RemoteException
Type ':help' or ':h' for help.
Display stack trace? [yN]N
gremlin> :> hg.V().count()
TraversalInterruptedException
Type ':help' or ':h' for help.
Display stack trace? [yN]N
gremlin> :> hg.V().count()
==>12
gremlin> :> hg.V().count()
==>12


So it failed the first two times, but since then it works fine.  It seemed to quit quickly earlier, so perhaps it learned to be patient?
Anyhow, it works for:
g.V().count()
hg.V().count() // long YARN run
g.V().properties()
hg.V().properties() // also a long YARN run
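
The pattern above (two failures, then success) suggests the first Spark/YARN job may simply have needed more time than was allowed. That is a guess and is not established in this thread, although the yaml above does set scriptEvaluationTimeout: 30000. A minimal Python sketch of a client-side retry wrapper for such cold-start failures follows. The names retry, attempts, and delay_seconds are hypothetical and not part of gremlin-python.

```python
import time


def retry(call, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call `call()` up to `attempts` times and return its first successful result.

    Any exception triggers another try after `delay_seconds`; the last
    exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error
```

With a remote traversal source hg already bound, usage might look like retry(lambda: hg.V().count().toList()). Raising the server-side timeout may be the cleaner fix if the timeout really is the cause.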

John Helmsen

Feb 28, 2018, 4:40:58 PM
to JanusGraph users
Okay, so I accessed the server through GremlinPython (which was kind of the whole point of the server) and the following happens:

g.V().count().toList() => [12L]
hg.V().count().toList() => Lots of YARN output, but then [12L]
g.V().properties().toList() => Graph of the Gods Properties
hg.V().properties().toList() => Throws a stack trace for some odd reason in the toList() routine
g.V().valueMap().toList() => Graph of the Gods Properties in a slightly different form
hg.V().valueMap().toList() => Graph of the Gods Properties in the same form as g.V().valueMap().toList()

g.E().<function>.toList() works the same as hg.E().<function>.toList(), including function==properties().

Don't know why only vertex properties are giving trouble.

So clearly, this works, but there are some operations that seem to have a little trouble.  So far, I've found workarounds, so we'll keep pushing forward.
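
One possible workaround for the failing hg.V().properties().toList() call, not necessarily the one used here, is to go through valueMap(), which the results above show working on both traversal sources, and flatten its output into key/value pairs. The helper name vertex_property_pairs is hypothetical, and the sketch assumes valueMap() returns one dict per vertex that maps each property key to a list of values.

```python
def vertex_property_pairs(traversal_source):
    """Return a flat list of (key, value) pairs for all vertex properties.

    Uses V().valueMap() instead of V().properties(). The result holds plain
    tuples, not VertexProperty objects, so property ids are not available.
    """
    pairs = []
    for value_map in traversal_source.V().valueMap().toList():
        # Each property key maps to a list of values (multi-valued properties).
        for key, values in value_map.items():
            for value in values:
                pairs.append((key, value))
    return pairs
```

Calling vertex_property_pairs(hg) would then give roughly the same information as properties(), without the per-vertex grouping.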

