JanusGraph 0.5.2 and BigTable


Assaf Schwartz

Dec 17, 2020, 3:54:32 AM
to JanusGraph users
Hi All,

I'm experiencing an issue running JanusGraph (on top of GKE) against BigTable.
This is the general setup description:
  • We are using a single-node BigTable cluster (for development / integration purposes) with the vanilla 0.5.2 Docker image.
  • Indexing is configured to be done with ES (also running on GKE).
  • JanusGraph is configured through environment variables.
  • Interactions with JanusGraph are done only through a single gRPC server running gremlin-python; let's call it DB-SERVER.
  • The last time we tested against BT was with JanusGraph 0.4.1, precompiled to support HBase 1.
  • All of our components communicate via gRPC.
Description of the problem:
  1. The DB-SERVER creates a Vertex i, generates some XML representing work to be done, and sends it to another service for processing; let's call it ORCHESTRATOR.
  2. The ORCHESTRATOR generates two values, w and r (local identifiers), and sends them back to the DB-SERVER so they will be set as properties on Vertex i. These two properties also have mixed String indexes.
  3. After setting the properties, DB-SERVER acks ORCHESTRATOR, which then starts processing. As part of the processing, ORCHESTRATOR sends updates back to the DB-SERVER using w and r.
  4. On getting these updates, DB-SERVER tries looking up Vertex i based on w and r, like so (see the sketch below):
    g.V().has("r", <some_r>).has("w", <some_w>).next()
  5. At that point, a null / None is returned as the traversal fails to find Vertex i.
  6. Trying the same traversal in a separate console (Python and Gremlin) does fetch the vertex. Since it's a single-instance cluster, I've ruled out any eventual consistency issues.
I'm not sure if it's a regression introduced after 0.4.1.
I've also validated that db-caching is turned off.
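For reference, this is roughly what the lookup path looks like on the DB-SERVER side in gremlin-python (a minimal sketch only; the endpoint hostname, the helper name, and the example values are placeholders, not our actual code):

    # Minimal sketch of the DB-SERVER lookup (endpoint and names are placeholders).
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    connection = DriverRemoteConnection('ws://janusgraph:8182/gremlin', 'g')
    g = traversal().withRemote(connection)

    def find_work_vertex(g, r_value, w_value):
        # Look up Vertex i by the two ORCHESTRATOR-provided identifiers.
        matches = g.V().has('r', r_value).has('w', w_value).toList()
        return matches[0] if matches else None  # None is what we get when the lookup fails

    vertex = find_work_vertex(g, 'some_r', 'some_w')
    connection.close()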

Help! :)
Many thanks in advance,
Assaf



Assaf Schwartz

Dec 17, 2020, 5:09:05 AM
to JanusGraph users
From time to time, usually after setting up a fresh copy of BT and Janus, I'll encounter errors relating to locking. However, this doesn't happen every time.
Sorry, I can't seem to be able to copy the logs nicely from GCP Cloud Logging.


org.janusgraph.diskstorage.locking.PermanentLockingException: Local lock contention
    at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:327)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.acquireLock(ExpectedValueCheckingStore.java:103)
    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.acquireLock(KCVSProxy.java:51)
    at org.janusgraph.diskstorage.BackendTransaction.acquireIndexLock(BackendTransaction.java:255)
    at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:591)
    at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:712)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1425)
    at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:297)
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:104)
    at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.onTraversalSuccess(TraversalOpProcessor.java:471)
    at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:603)
    at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$4(TraversalOpProcessor.java:411)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Assaf Schwartz

Dec 17, 2020, 5:16:43 AM
to JanusGraph users
Could this be related to delays in indexing? I don't know how to figure out whether such a delay exists, but assuming it happens:
how does a traversal behave when looking up by an index key that has not yet been indexed?


owner....@gmail.com

Dec 17, 2020, 5:38:17 AM
to JanusGraph users
You can try using a composite index to check for mixed index delay.
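Something like this (a rough sketch only, not tested against your setup; the index name 'byWandR', the endpoint, and the assumption that the server binds the graph as "graph" are guesses) would add a composite index covering both keys through the management API, submitted from gremlin-python:

    # Sketch: create a composite index on "w" and "r" by submitting a management
    # script to the Gremlin Server. Endpoint, index name, and the "graph" binding
    # are assumptions about the setup.
    from gremlin_python.driver import client

    c = client.Client('ws://janusgraph:8182/gremlin', 'g')
    mgmt_script = """
    mgmt = graph.openManagement()
    w = mgmt.containsPropertyKey('w') ? mgmt.getPropertyKey('w') : mgmt.makePropertyKey('w').dataType(String.class).make()
    r = mgmt.containsPropertyKey('r') ? mgmt.getPropertyKey('r') : mgmt.makePropertyKey('r').dataType(String.class).make()
    if (!mgmt.containsGraphIndex('byWandR')) {
        mgmt.buildIndex('byWandR', Vertex.class).addKey(w).addKey(r).buildCompositeIndex()
    }
    mgmt.commit()
    """
    c.submit(mgmt_script).all().result()
    c.close()

Note that if the property keys already exist and hold data, the new index will not be ENABLED automatically; it has to be registered/enabled (and possibly reindexed) before queries will use it.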

BO XUAN LI

Dec 17, 2020, 5:39:48 AM
to janusgra...@googlegroups.com
Hi Assaf,

I am not familiar with GKE but I can try to answer some of your questions:

> how does a traversal behave when looking up based on an index key when the key is not yet indexed

Assuming the index has been enabled: if a particular key is still in the indexing process (e.g. you are in the middle of committing) in one thread, then another thread will not be able to find any data, because it finds nothing in the index key lookup. Note that when you are using a mixed index, the data is written to your primary backend (e.g. HBase) first, and then to the mixed index backend (e.g. Elasticsearch). If the data has already been written into HBase but not into Elasticsearch yet, the querying thread cannot find any data (if JanusGraph decides your query can be satisfied by a mixed index).
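To illustrate the consequence (a rough sketch only, with made-up helper and parameter names): a lookup answered by the mixed index can come back empty right after the writer's commit, until Elasticsearch has refreshed, so a crude client-side retry makes the window visible. A composite index lookup, by contrast, is consistent with the primary backend as soon as the commit returns.

    # Rough illustration: retry the mixed-index lookup to expose a possible refresh lag.
    # Function and parameter names are made up for the example.
    import time

    def lookup_with_retry(g, r_value, w_value, attempts=5, delay_s=0.5):
        for _ in range(attempts):
            found = g.V().has('r', r_value).has('w', w_value).toList()
            if found:
                return found[0]
            time.sleep(delay_s)  # wait out a possible Elasticsearch refresh cycle
        return None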

> org.janusgraph.diskstorage.locking.PermanentLockingException: Local lock contention at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:327) 

This usually happens when you have multiple local threads (running on the same JVM instance) contending for the same lock. You might want to check your application logic.

Best regards,
Boxuan


Assaf Schwartz

Dec 19, 2020, 2:41:40 AM
to JanusGraph users
Thanks a lot Boxuan!

For some reason I missed the notification about your response.
The indexes were indeed the issue (as I had begun to suspect); switching them to composite indexes (there was no real need for them to be mixed) solved the problem :)

About the locking, what do you consider a JVM instance? An instance of the Gremlin Server? JanusGraph itself? If I try to use Janus as a cluster (multiple Docker containers instead of one), will that translate to having more than one JVM?

Again thanks,
Assaf

BO XUAN LI

Dec 19, 2020, 3:09:42 AM
to janusgra...@googlegroups.com
> About the locking, what do you consider a JVM instance? An instance of the Gremlin Server? JanusGraph itself? If I try to use Janus as a cluster (multiple Docker containers instead of one), will that translate to having more than one JVM?

Sorry, I wasn’t very clear about this. By the JVM instance I meant the one where JanusGraph itself runs. To be accurate, you can see local lock contention when multiple threads under the same process contend for the same lock. This is due to JanusGraph’s locking mechanism:

Step 1: Local lock resolution (inter-thread synchronization), using in-memory data structures (a concurrent hash map). If a conflict is detected, you typically see an error message like "Local lock contention".
Step 2: Inter-process synchronization, using the storage backend (e.g. HBase). If a conflict is detected, you typically see other error messages like "Lock write retry count exceeded".


If you have multiple transactions contending for the same lock, then it’s better to have them running on the same JVM instance because local lock synchronization is faster and can let conflicting transactions fail early.
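As a rough illustration of step 1 (an assumption-laden sketch, not your setup: it presumes a unique composite index with locking enabled on "w" and a single Gremlin Server, so both transactions run in the same JVM), two client threads committing the same locked value concurrently may surface the "Local lock contention" error:

    # Hypothetical demo: two threads commit conflicting writes to the same locked
    # value through one Gremlin Server; one of them may fail fast with
    # "Local lock contention" during step 1, before any backend lock is written.
    import threading
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    def write_same_value(thread_id):
        conn = DriverRemoteConnection('ws://janusgraph:8182/gremlin', 'g')
        g = traversal().withRemote(conn)
        try:
            g.addV('work').property('w', 'duplicate-id').iterate()
        except Exception as exc:
            print(f'thread {thread_id}: {exc}')
        finally:
            conn.close()

    threads = [threading.Thread(target=write_same_value, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()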

Glad to hear you don’t have the problem anymore. To be honest, I don’t know why switching to composite indexes helped you resolve the locking exception issues.

Cheers,
Boxuan

Assaf Schwartz

Dec 26, 2020, 3:43:37 AM
to JanusGraph users
Hi Boxuan!

Perhaps I wasn't clear. The composite indexing didn't solve the locking issue (it went away by itself 🙄, as if there were a cold-start issue).
However, my actual problem, the failing lookup, was indeed solved.

Again, many thanks for the information and the prompt replies.
Assaf