turning off gremlin class cache

Peter Musial

Sep 17, 2018, 10:55:25 AM
to Gremlin-users
Hi all,

I am using JanusGraph in a NodeJS environment and connect to it using the gremlin client. We observed that Gremlin caches compiled queries as classes, which eventually puts pressure on memory and increases query latency. On my end there is an effort to change the queries to be parametrized, but that is a significant code change, hence the desire to simply turn the cache off altogether.

From the gremlin documentation, we found this flag:

#jsr223.groovy.engine.keep.globals : phantom

This is supposed to accomplish the task, but I am having trouble understanding how it works (if it does at all). Here is some sample code.

'use strict';

const Gremlin = require('gremlin');
const GremlinClient = require('./GremlinClient.js');

const client = Gremlin.createClient(port, host, {path: '/gremlin'});

client.execute(script, {'#jsr223.groovy.engine.keep.globals':'phantom'}, (err, results) => { /* process response */ });

Specific environment information: gremlin 2.7.0 node module, Janus 0.2.1 with Cassandra 3.10

An example of a query from gremlin logs where you can see that bindings include the phantom flag: 

45805 [gremlin-server-worker-1] DEBUG log-aggregator-encoder  - [id: 0x1279ba5c, L:/127.0.0.1:8182 - R:/127.0.0.1:51764] READ: RequestMessage{, requestId=2393b6e0-ba85-11e8-9194-87a8c590e3d7, op='eval', processor='', args={gremlin=g.V().has(...).as(...).optional(__.in(...).as(...).select(...), bindings={#jsr223.groovy.engine.keep.globals=phantom}, accept=application/json, language=gremlin-groovy}}


However, in the VisualVM monitor, class unloading is not very predictable, and in general the number of loaded classes keeps growing. Perhaps this is the desired behavior, or perhaps I should use another flag value such as 'weak' or 'soft'.

When the flag is not provided, the class cache grows monotonically and so does query latency. We did a POC with parametrized queries, and latency remained constant regardless of the duration of the run.
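
The parametrized approach can be sketched roughly as below. This is only an illustration, assuming the server's compiled-script cache is keyed on the script text; the helper names, the 'person' label, and the 'name' property are hypothetical and not taken from our code. The point is that a parametrized query sends the same script text for every value, while an inlined query produces a new script text (and so a new compiled class) per value.

```javascript
'use strict';

// Hypothetical helper: the script text stays constant and the value
// travels separately as a binding.
function personByName(name) {
  return {
    script: "g.V().has('person', 'name', personName)",
    bindings: { personName: name },
  };
}

// Inlined variant for comparison: each distinct value yields a distinct
// script text (and it is also open to script injection).
function personByNameInlined(name) {
  return { script: `g.V().has('person', 'name', '${name}')`, bindings: {} };
}

const names = ['alice', 'bob', 'carol'];
const parameterized = new Set(names.map((n) => personByName(n).script));
const inlined = new Set(names.map((n) => personByNameInlined(n).script));

console.log(parameterized.size); // 1 distinct script text
console.log(inlined.size);       // 3 distinct script texts
```

With the client from the sample code above, the result would be passed as client.execute(q.script, q.bindings, callback), where q is the object returned by personByName.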

Some more resources.

Tried several variants of setting it, but nothing worked.
1. added to scripts/empty-sample.groovy
2. added directly to our driver code
3. set Java args -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled to clean up perm heap

some more links:

There is no tangible evidence that the flag works as advertised.

Question to the community: is my approach to disabling the cache for individual queries correct (meaning that there is indeed no caching and each query will have some constant compilation time)?

Regards,

Peter

Stephen Mallette

Sep 17, 2018, 11:11:32 AM
to Gremlin-users
I think your expectation for what this is supposed to do is a little off. First of all, you can see that we do have a test for phantom reference cleanup and it does seem to work:


But note that it's cleaning up the global function cache, not the cache that holds your scripts. That's a different cache altogether, and you don't have a lot of control over it. It's basically configured by default to use "soft" cache values, which means that cached compiled classes should be released in a least-used manner on GC in response to memory demand. There is currently no way to change that behavior:


I wouldn't be against offering more flexibility around cache configuration options - perhaps that would be helpful to folks, especially since soft isn't super predictable.



Robert Dale

Sep 18, 2018, 11:30:52 AM
to gremli...@googlegroups.com
Peter, just curious, have you tried turning on stats?  I wonder if that shows anything useful for this particular issue.

I think caches should always be tunable as different use cases have different needs.  Caffeine also recommends not using 'soft' and instead setting a max value for more predictable behavior [1].  Gremlin Server does this for the side-effects cache.  I think we should do the same here.  I've created https://issues.apache.org/jira/browse/TINKERPOP-1644
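
To illustrate what a max-size setting buys over soft values, here is a minimal sketch of a size-bounded cache with least-recently-used eviction. It is not Gremlin Server's or Caffeine's implementation; the BoundedCache class and the sample keys are hypothetical, and it only shows why eviction becomes predictable when it depends on a fixed entry count rather than on GC pressure.

```javascript
'use strict';

// Minimal size-bounded cache with least-recently-used eviction.
class BoundedCache {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.entries = new Map(); // a Map iterates in insertion order
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  get size() {
    return this.entries.size;
  }
}

const cache = new BoundedCache(2);
cache.set('g.V(1)', 'compiled-1');
cache.set('g.V(2)', 'compiled-2');
cache.get('g.V(1)');               // touch g.V(1) so g.V(2) is now the oldest
cache.set('g.V(3)', 'compiled-3'); // exceeds the limit, evicts g.V(2)
```

The cache never holds more than two entries here regardless of memory pressure, which is the predictability argument for a max value.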


Robert Dale


Stephen Mallette

Sep 18, 2018, 12:50:26 PM
to Gremlin-users