Graph shutdown hangs

Bharat Dighe

Feb 2, 2015, 8:07:27 PM
to aureliu...@googlegroups.com
I have a two-DC Cassandra cluster (DC1 and DC2), and Titan is connected to DC2.

When DC2 goes down, Titan starts logging the following exceptions:

===

17:02:40 ERROR connectionpool.impl.CountingConnectionPoolMonitor [Thread-4] - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.182.183.162(10.182.183.162):9160, latency=0(0), attempts=1]UnavailableException()

com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.182.183.162(10.182.183.162):9160, latency=0(0), attempts=1]UnavailableException()

at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)

===

In this situation, if I call the graph.shutdown() API, it hangs forever. Is there a workaround for this issue?

We are using Titan 0.5.3.

Thanks
Bharat
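
A possible stopgap for the hang described above, sketched with plain JDK classes only: run the shutdown call on a separate daemon thread and stop waiting after a timeout. Nothing here is Titan API. `BoundedShutdown`, `runWithTimeout`, and the sleeping stand-in task are hypothetical names for illustration, and the real `graph.shutdown()` call would be passed in as the `Runnable`. This only bounds how long the caller blocks; it does not release whatever resources a stuck shutdown still holds.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedShutdown {

    /**
     * Runs shutdownTask on a daemon thread and waits at most the given timeout.
     * Returns true if the task finished normally, false if it timed out or threw.
     */
    static boolean runWithTimeout(Runnable shutdownTask, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graph-shutdown");
            t.setDaemon(true); // a stuck shutdown thread must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(shutdownTask);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the task may ignore the interrupt
            return false;
        } catch (ExecutionException e) {
            return false; // the task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for graph.shutdown(); it simply blocks for a minute.
        Runnable hangingShutdown = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean clean = runWithTimeout(hangingShutdown, 200, TimeUnit.MILLISECONDS);
        System.out.println("shutdown completed cleanly: " + clean);
    }
}
```

If the method returns false, the caller can log the failure and decide whether to retry later or exit the process. Threads started by the graph itself may not be daemon threads, so the JVM could still need an explicit exit.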

Bharat Dighe

Feb 9, 2015, 5:48:25 PM
to aureliu...@googlegroups.com
If anyone can suggest a workaround, that would help a lot.

Thanks
Bharat

Dan LaRocque

Feb 10, 2015, 3:59:15 PM
to aureliu...@googlegroups.com
Hi Bharat,

Can you please paste the full exception?  I would like to see all of the stack frames.

thanks,
Dan
--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/06682bb4-fca5-41be-9606-6cb40eb3e1d1%40googlegroups.com.

Bharat Dighe

Mar 12, 2015, 9:04:40 PM
to aureliu...@googlegroups.com
Hi Dan,

Please find the complete stack trace below.

I debugged the code and found that during graph shutdown Titan attempts to write data to the backend. The write fails because the other Cassandra node is down and there is no quorum. As a result, shutdown throws an uncaught exception and various threads keep running.

For example, consider the following line in StandardTitanGraph.shutdown():

globalConfig.remove(REGISTRATION_TIME,config.getUniqueGraphId());

Since the other Cassandra node is down at this point, the call throws a TemporaryBackendException, which is not handled. The exception aborts the shutdown and leaves the other threads running.

I modified the code to catch this exception, but data is then written again during the shutdown of the Backend.

I am curious: is it a good idea to write to storage during graph shutdown?
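
The failure mode described above can be illustrated without Titan. The sketch below is not Titan's shutdown code, and its real structure is not reproduced here; `BestEffortShutdown`, `Step`, `failFast`, and `bestEffort` are hypothetical names. It contrasts a cleanup sequence that stops at the first exception with one that runs every step and collects the failures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BestEffortShutdown {

    /** One cleanup step; a hypothetical stand-in for a config write or closing a store. */
    interface Step {
        void run() throws Exception;
    }

    /** Fail-fast: the first exception aborts the sequence, so later steps never run. */
    static void failFast(List<Step> steps) throws Exception {
        for (Step step : steps) {
            step.run();
        }
    }

    /** Best-effort: every step runs; failures are collected and returned to the caller. */
    static List<Exception> bestEffort(List<Step> steps) {
        List<Exception> failures = new ArrayList<>();
        for (Step step : steps) {
            try {
                step.run();
            } catch (Exception e) {
                failures.add(e); // remember the failure, keep cleaning up
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Step> steps = Arrays.asList(
            () -> { throw new IllegalStateException("backend write failed"); },
            () -> log.add("stopped worker threads"),
            () -> log.add("closed backend"));

        try {
            failFast(steps);
        } catch (Exception e) {
            System.out.println("fail-fast: steps completed after the failure = " + log.size());
        }

        log.clear();
        List<Exception> failures = bestEffort(steps);
        System.out.println("best-effort: steps completed = " + log.size()
            + ", failures = " + failures.size());
    }
}
```

In the fail-fast variant the steps that would stop threads are skipped once the write fails, which matches the symptom of threads left running. Whether a best-effort variant is appropriate for a real shutdown path depends on how the later steps rely on the earlier ones.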

Complete stack trace:

com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception

at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)

at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:144)

at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.set(KCVSConfiguration.java:145)

at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.set(KCVSConfiguration.java:122)

at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.remove(KCVSConfiguration.java:163)

at com.thinkaurelius.titan.diskstorage.configuration.ModifiableConfiguration.remove(ModifiableConfiguration.java:42)

at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.shutdownInternal(StandardTitanGraph.java:189)

at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.shutdown(StandardTitanGraph.java:157)

at com.abc.itrp.ers.graph.DataSource$1.run(DataSource.java:60)

at com.abc.itrp.ers.graph.DataSource.releaseGraph(DataSource.java:64)

at com.abc.itrp.ers.graph.QuorumCheckThread.run(QuorumCheckThread.java:82)

Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after Duration[60000 ms]

at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:86)

at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)

... 10 more

Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend

at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.mutateMany(AstyanaxStoreManager.java:383)

at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.mutateMany(AstyanaxKeyColumnValueStore.java:172)

at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.mutate(AstyanaxKeyColumnValueStore.java:168)

at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.mutate(ExpectedValueCheckingStore.java:65)

at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration$2.call(KCVSConfiguration.java:150)

at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration$2.call(KCVSConfiguration.java:145)

at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:133)

at com.thinkaurelius.titan.diskstorage.util.BackendOperation$1.call(BackendOperation.java:147)

at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)

... 11 more

Caused by: com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.182.183.161(10.182.183.161):9160, latency=2(2), attempts=1]UnavailableException()

at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)

at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)

at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)

at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)

at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)

at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:493)

at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)

at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:123)

at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.mutateMany(AstyanaxStoreManager.java:381)

... 19 more

Caused by: UnavailableException()

at org.apache.cassandra.thrift.Cassandra$atomic_batch_mutate_result$atomic_batch_mutate_resultStandardScheme.read(Cassandra.java:29445)

at org.apache.cassandra.thrift.Cassandra$atomic_batch_mutate_result$atomic_batch_mutate_resultStandardScheme.read(Cassandra.java:29422)

at org.apache.cassandra.thrift.Cassandra$atomic_batch_mutate_result.read(Cassandra.java:29356)

at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)

at org.apache.cassandra.thrift.Cassandra$Client.recv_atomic_batch_mutate(Cassandra.java:1099)

at org.apache.cassandra.thrift.Cassandra$Client.atomic_batch_mutate(Cassandra.java:1085)

at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:133)

at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:127)

at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)

Matthias Broecheler

Mar 17, 2015, 3:26:13 PM
to aureliu...@googlegroups.com