Re: Janusgraph Temporary backend operation


Misha Brukman

Oct 3, 2017, 10:42:08 AM

to Dilan Ranasinghe, JanusGraph users

Hi Dilan,

Looks like you're using the HBase backend, and this is an INFO-level message telling you that when issuing many mutations in parallel ("mutateMany"), some of the mutations succeeded, while some did not. As the log says, this is a temporary (retriable) failure, for which a backoff strategy typically helps. However, if you are sending a large, steady amount of traffic, the cluster may not have enough time to recover from the previous set of operations before the new ones also cause these retriable errors, leading to a cascade of such messages.
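
To make "backoff retry" concrete, here is a generic sketch of retrying a retriable operation with exponential backoff and jitter. It is an illustration only, not JanusGraph's actual retry code: `TemporaryError` and `retry_with_backoff` are hypothetical names, and the delay values are arbitrary assumptions.

```python
import random
import time


class TemporaryError(Exception):
    """Hypothetical stand-in for a retriable (temporary) backend failure."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.05,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(); on TemporaryError, wait and try again.

    The wait ceiling doubles on each attempt (capped at max_delay), and the
    actual wait is a random value below that ceiling so that many clients
    do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TemporaryError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

Note that backoff only helps if the cluster gets a chance to catch up during the waits. If new traffic keeps arriving at the same rate, the retries simply add to the load.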

What this implies is that your application is doing one or both of the following:

- putting a significant amount of load on your HBase cluster, in which case either the application should send less traffic (for the same cluster size) or you need to increase the size of the cluster, or
- hotspotting the HBase cluster, i.e., sending a lot of operations to the same (small) set of vertices

If your load is well-distributed, you need to grow your HBase cluster in proportion to the traffic you are sending. If you are hotspotting, then growing the cluster may not help, so you may need to use a read-only cache (if the hotspotting is on reads), or change your data model to avoid sending many concurrent requests to the same set of vertices.
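
For the read-hotspotting case, a read-only cache in front of the graph could look roughly like the sketch below. This is a minimal illustration under assumptions: `ReadThroughCache` and the `loader` callback are hypothetical, the loader stands in for whatever call your application uses to read a vertex, and the 5-second TTL is an arbitrary choice.

```python
import time


class ReadThroughCache:
    """Serve repeated reads of the same key from memory for a short time.

    Sketch only: the cache is unbounded and not thread-safe, so a real
    deployment would need eviction and locking.
    """

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader      # hypothetical function: key -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no backend request
        value = self._loader(key)  # missing or expired: go to the backend
        self._entries[key] = (now + self._ttl, value)
        return value
```

With something like this, many requests for the same hot vertex within the TTL result in a single backend read, at the cost of serving data that may be up to one TTL old.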

Hope this helps,

Misha

On Tue, Oct 3, 2017 at 12:39 AM, Dilan Ranasinghe <dila...@gmail.com> wrote:

Hello,
I run JanusGraph/Gremlin Server with HBase and connect to it over WebSocket.

In the Gremlin Server log I constantly see the following exception.

[gremlin-server-exec-2] INFO org.janusgraph.diskstorage.util.BackendOperation - Temporary exception during backend operation [CacheMutation]. Attempting backoff retry.
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at org.janusgraph.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:434)
        at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:98)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:95)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:95)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:137)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.mutate(CacheTransaction.java:90)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.KCVSCache.mutateEntries(KCVSCache.java:66)
        at org.janusgraph.diskstorage.BackendTransaction.mutateEdges(BackendTransaction.java:200)
        at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:602)
        at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:713)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1363)
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:272)
        at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:105)
        at org.apache.tinkerpop.gremlin.server.GraphManager.lambda$commitAll$2(GraphManager.java:122)
        at java.util.concurrent.ConcurrentHashMap$EntrySetView.forEach(ConcurrentHashMap.java:4795)
        at org.apache.tinkerpop.gremlin.server.GraphManager.commitAll(GraphManager.java:119)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.attemptCommit(AbstractOpProcessor.java:299)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:168)
        at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$4(AbstractEvalOpProcessor.java:248)
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:298)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.InterruptedIOException
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.waitUntilDone(AsyncProcess.java:1635)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:911)
        at org.janusgraph.diskstorage.hbase.HTable1_0.batch(HTable1_0.java:51)
        at org.janusgraph.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:429)
        ... 26 more

My application constantly sends requests to the server, so the load on the server is high.

What can be the reason for this exception?

Thanks,
Dilan.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/d984cb41-4cc4-47bf-9153-5dcc93764357%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dilan Ranasinghe

Oct 3, 2017, 11:31:42 AM

to JanusGraph users

Hi Misha,

You are right.

Our requirement is to insert and update vertices in JanusGraph at an insert rate of 500,000 vertices per second and an update rate of 5,000 vertices per second.
So far, in a proof of concept (POC), we have achieved 100,000 vertex insertions per second and 450 vertex updates per second, using one JanusGraph/Gremlin Server instance connected to a five-node HBase cluster.
It is clear from your information that we may need to increase the size of the cluster.
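
As a rough sanity check on the gap between those numbers, the sketch below estimates the cluster size under a naive assumption that throughput scales linearly with the number of HBase nodes. Nothing in this thread confirms that assumption, and it ignores the possibility that the single JanusGraph/Gremlin Server is itself a bottleneck. `nodes_needed` is a hypothetical helper.

```python
def nodes_needed(current_nodes, achieved_rate, target_rate):
    """Naive linear estimate of nodes required, rounded up.

    Assumes throughput is proportional to node count, which is an
    assumption for illustration and not a measured property.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-current_nodes * target_rate // achieved_rate)


# Figures from the POC above: five HBase nodes.
insert_nodes = nodes_needed(5, achieved_rate=100_000, target_rate=500_000)
update_nodes = nodes_needed(5, achieved_rate=450, target_rate=5_000)
print(insert_nodes, update_nodes)
```

Under that assumption the update rate, not the insert rate, is the larger gap (about 11x versus 5x), so it may be worth checking whether the updates are hotspotting before sizing the cluster.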

Thank you for the information.
