Some queries with proper indexes fail to return with OperationTimeoutException


Praveen Peddi

Sep 30, 2014, 3:42:56 PM9/30/14
to aureliu...@googlegroups.com
Hi,
We are using Titan 0.5 with Cassandra 2.8. I am running the following query from the gremlin console; it never returned but failed with an exception (see the end of this message). There should be around 3000 vertices matching that criteria. There is a composite index on (key1, key2, key3), which is why the second query below returns really fast.

This query never returns but throws com.netflix.astyanax.connectionpool.exceptions.OperationTimeoutException after a few minutes:
gremlin> g.V('key1', 'value1').has('key2', 'value2').has('key3', 'value3').count()


So I started timing it by selecting only the top few vertices. The query below returns 26 almost immediately.

gremlin> t = System.currentTimeMillis();
gremlin> g.V('key1', 'value1').has('key2', 'value2').has('key3', 'value3')[0..25].count()
==>26
gremlin> System.currentTimeMillis() - t
==>129

But when I select the top 30, again it never returned but threw the exception below. If 25 returned in 129 ms, I can't imagine 30 taking forever.
gremlin> t = System.currentTimeMillis();
gremlin> g.V('key1', 'value1').has('key2', 'value2').has('key3', 'value3')[0..30].count()

Could this be due to some kind of locking?

com.thinkaurelius.titan.core.TitanException: Could not process individual retrieval call 
at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:215)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8.execute(StandardTitanTx.java:1125)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8.execute(StandardTitanTx.java:994)
at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
at com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:56)
at com.google.common.collect.Iterables$7.iterator(Iterables.java:610)
at com.tinkerpop.pipes.transform.GraphQueryPipe.processNextStart(GraphQueryPipe.java:49)
at com.tinkerpop.pipes.transform.GraphQueryPipe.processNextStart(GraphQueryPipe.java:17)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.util.Pipeline.next(Pipeline.java:115)
at com.tinkerpop.pipes.util.PipeHelper.counter(PipeHelper.java:107)
at com.tinkerpop.gremlin.java.GremlinPipeline.count(GremlinPipeline.java:1534)
..
..
at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:61)
at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:68)
at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.main(Console.java:73)
Caused by: com.thinkaurelius.titan.core.TitanException: Could not call index
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6.call(StandardTitanTx.java:1118)
at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:213)
... 69 more
Caused by: com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
..
..
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6.call(StandardTitanTx.java:1111)
... 70 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after Duration[10 s]
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:86)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
... 83 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:114)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:78)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:67)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
at com.thinkaurelius.titan.diskstorage.BackendTransaction$5.call(BackendTransaction.java:383)
at com.thinkaurelius.titan.diskstorage.BackendTransaction$5.call(BackendTransaction.java:380)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
... 84 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.OperationTimeoutException: OperationTimeoutException: [host=172.23.142.40(172.23.142.40):9160, latency=5002(30009), attempts=6]TimedOutException()
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:171)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:527)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112)
... 91 more
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14526)
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14463)
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14389)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:732)
at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:716)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:533)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:530)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
... 97 more
gremlin> 
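A note on the Duration[10 s] in the trace above: Titan's BackendOperation retries temporary backend failures until a total time budget (the storage.read-time setting discussed later in this thread, 10 s by default) is used up, and only then surfaces the TemporaryBackendException. A minimal sketch of that retry-until-deadline pattern; the structure and constants here are illustrative, not Titan's actual implementation:

```java
import java.util.function.Supplier;

// Sketch of retry-until-deadline, loosely modeled on Titan's
// BackendOperation.execute. Constants and structure are illustrative.
public class RetryBudget {
    static class TemporaryFailure extends RuntimeException {}

    // Retries op on TemporaryFailure until budgetMillis has elapsed, then
    // rethrows wrapped -- mirroring "Could not successfully complete backend
    // operation due to repeated temporary exceptions after Duration[10 s]".
    static <T> T executeWithBudget(Supplier<T> op, long budgetMillis) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        RuntimeException last = null;
        while (System.currentTimeMillis() < deadline) {
            try {
                return op.get();
            } catch (TemporaryFailure e) {
                last = e; // temporary: retry (a real implementation backs off first)
            }
        }
        throw new RuntimeException("budget of " + budgetMillis + " ms exhausted", last);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Fails twice with a temporary error, then succeeds within the budget.
        String result = executeWithBudget(() -> {
            if (attempts[0]++ < 2) throw new TemporaryFailure();
            return "ok";
        }, 1000);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

So each OperationTimeoutException from astyanax counts as one temporary failure; the query only dies once the retries have eaten the whole budget.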


Praveen Peddi

Oct 1, 2014, 2:59:46 PM10/1/14
to aureliu...@googlegroups.com
Hi,
We have started hitting this error when querying from Java as well (using GremlinPipeline). Is this a known issue in Titan 0.5? Is there any way to fix it?

Any help would be appreciated.

Thanks
Praveen

Matthias Broecheler

Oct 1, 2014, 3:19:22 PM10/1/14
to aureliu...@googlegroups.com
Can you drop into the Titan API and rewrite this query as:

g.query().has('key1', 'value1').has('key2', 'value2').has('key3', 'value3').vertices()

and see if that helps? Do you have a composite index on those keys?

--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/0963156c-c146-47e6-86a5-77022abeb489%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Matthias Broecheler
http://www.matthiasb.com

Praveen Peddi

Oct 1, 2014, 4:11:42 PM10/1/14
to aureliu...@googlegroups.com
Hi Matthias,
That is the exact query we have in Java; it throws the OperationTimeoutException when iterating through the vertices. Yes, we have a composite index on these 3 fields. The weird part is that from the gremlin console, when I retrieved the top 25 it returned instantaneously, but when I ask for 30 it throws the exception, which makes me feel the index got corrupted or something?

com.thinkaurelius.titan.core.TitanException: Could not process individual retrieval call
        at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:215)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8.execute(StandardTitanTx.java:1125)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8.execute(StandardTitanTx.java:994)
        at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
        at com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
        at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
        at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
        at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:56)
        at com.google.common.collect.Iterables$7.iterator(Iterables.java:610)
        ...
        ...
Caused by: com.thinkaurelius.titan.core.TitanException: Could not call index
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6.call(StandardTitanTx.java:1118)
        at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:213)
        ... 92 more
Caused by: com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:380)
        at com.thinkaurelius.titan.graphdb.database.IndexSerializer.query(IndexSerializer.java:495)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6$1.call(StandardTitanTx.java:1114)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6$1.call(StandardTitanTx.java:1111)
        at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6.call(StandardTitanTx.java:1111)
        ... 93 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after Duration[10 s]
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:86)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
        ... 106 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:114)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:78)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:67)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction$5.call(BackendTransaction.java:383)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction$5.call(BackendTransaction.java:380)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
        ... 107 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.OperationTimeoutException: OperationTimeoutException: [host=<removed>:9160, latency=5002(30009), attempts=6]TimedOutException()
        at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:171)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
        at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
        at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
        at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
        at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:527)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112)
        ... 114 more
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14526)
        at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14463)
        at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14389)

Matthias Broecheler

Oct 1, 2014, 6:11:21 PM10/1/14
to aureliu...@googlegroups.com
The fact that it suddenly takes a lot longer seems to suggest that it's not using the indexes, or not using them correctly.

Can you do this for your query (in gremlin):

qd = g.query().has('key1', 'value1').has('key2', 'value2').has('key3', 'value3').describeForVertices()
qd.getNoCombinedQueries()
qd.getNoSubQueries()
for (sq in qd.getSubQueries()) { println sq.isFitted() }

And send me the results?
Thanks,
Matthias





Praveen Peddi

Oct 1, 2014, 10:11:04 PM10/1/14
to aureliu...@googlegroups.com
Hi Matthias,
Thanks again for the quick reply. Because we were stuck on this issue, we had to truncate the tables and proceed with our testing. We do not currently see the issue on the current dataset, but here is the output from your queries. I am not sure the output is helpful since this particular issue is not happening now. If I see the issue tomorrow, I will re-run your queries and send the output again, or if you can tell me what you are expecting, I will check that too.

Here is the output:

gremlin> qd = g.query().has('key1', 'value1').has('key2', 'value2').has('key3', 'value3').describeForVertices()
==>[(key1 = value1 AND key2 = value2 AND key3 = value3)]:VERTEX
gremlin> qd.getNoCombinedQueries()
==>1
gremlin> qd.getNoSubQueries()
==>1
gremlin> for (sq in qd.getSubQueries()) { println sq.isFitted() }
true
==>null
gremlin>

Matthias Broecheler

Oct 2, 2014, 1:20:29 AM10/2/14
to aureliu...@googlegroups.com
Okay, that output looks correct, which means the query is correctly optimized and executed. Hence, there must be something fishy going on with the underlying table as you suspected.



Praveen Peddi

Oct 3, 2014, 10:28:02 AM10/3/14
to aureliu...@googlegroups.com
Hi Matthias,
The same issue occurs again. I have run the same queries and the output is exactly the same as the previous output I sent, which means it is using the indexes properly. How else do you think I can debug the problem? How can I confirm whether it's bad data or a bad index, and how could I fix that? Is this a known issue with Titan 0.5 (we have not seen this when using titan-0.4.2)?

Thanks
Praveen
...

Matthias Broecheler

Oct 3, 2014, 2:13:52 PM10/3/14
to aureliu...@googlegroups.com
Hello Praveen,

no, this is not a known issue. Is the cutoff point the same one as last time? Do you get a result if you increase the timeout?
Thanks,
Matthias


Praveen Peddi

Oct 3, 2014, 2:35:34 PM10/3/14
to aureliu...@googlegroups.com
I didn't try increasing the timeout. It fails after 5 minutes or so. Even if it returned results after 5 minutes, that's not useful for us. Which property can I change to increase the timeout on the gremlin console? I looked at the configuration reference but could not find the relevant property.

Thanks
Praveen

Matthias Broecheler

Oct 3, 2014, 2:48:11 PM10/3/14
to aureliu...@googlegroups.com
I understand that the performance is not acceptable, but I would like to know whether it does actually, eventually, return the right data.
Have a look at the configuration options for thrift/astyanax (depending on which you use to connect).


Praveen Peddi

Oct 3, 2014, 4:40:24 PM10/3/14
to aureliu...@googlegroups.com
I have changed the following property from 10 seconds to 100 seconds (I don't see any property in astyanax related to max wait or timeout). I still didn't get results; it failed with the same exception.

storage.read-time=100000

Matthias Broecheler

Oct 3, 2014, 5:45:40 PM10/3/14
to aureliu...@googlegroups.com

Can you try changing the backend to cassandrathrift and see what happens then with increased timeout?


Praveen Peddi

Oct 4, 2014, 7:25:53 PM10/4/14
to aureliu...@googlegroups.com
Hi Matthias,
Our Cassandra is SSL-enabled, and I believe the cassandrathrift backend has bugs with SSL (which is why we had to move to astyanax). I can't disable SSL on this cluster, so I will have to reproduce the problem on another cluster with SSL disabled. I will get back to you with results.

Thanks
Praveen


Praveen Peddi

Oct 6, 2014, 12:25:36 PM10/6/14
to aureliu...@googlegroups.com
Hi Matthias,
I have finally disabled SSL on the Cassandra cluster and connected using the cassandrathrift backend. I left storage.read-time=100000. When I run the query I get the same exception, but within 5 seconds; with astyanax it takes 5 minutes or so before I see the exception. The weird thing is that if I change 'state1' to 'state2', I get the count back correctly (both with astyanax and cassandrathrift). So the query fails or succeeds depending on the value.

gremlin> g.V('key1', 'value1').has('type', 'type1').has('state', 'state1').count()
Could not process individual retrieval call
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not process individual retrieval call 
at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:215)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8.execute(StandardTitanTx.java:1125)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8.execute(StandardTitanTx.java:994)
at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:195)
at com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:56)
at com.google.common.collect.Iterables$7.iterator(Iterables.java:610)
at com.tinkerpop.pipes.transform.GraphQueryPipe.processNextStart(GraphQueryPipe.java:49)
at com.tinkerpop.pipes.transform.GraphQueryPipe.processNextStart(GraphQueryPipe.java:17)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
at com.tinkerpop.pipes.util.Pipeline.next(Pipeline.java:115)
at com.tinkerpop.pipes.util.PipeHelper.counter(PipeHelper.java:107)
at com.tinkerpop.gremlin.java.GremlinPipeline.count(GremlinPipeline.java:1534)
at com.tinkerpop.gremlin.java.GremlinFluentPipeline$count.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
at groovysh_evaluate.run(groovysh_evaluate:82)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:67)
at org.codehaus.groovy.tools.shell.Interpreter$evaluate.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:152)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:114)
at org.codehaus.groovy.tools.shell.Shell$leftShift$0.call(Unknown Source)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:137)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:57)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:66)
at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:61)
at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:68)
at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.main(Console.java:73)
Caused by: com.thinkaurelius.titan.core.TitanException: Could not call index
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6.call(StandardTitanTx.java:1118)
at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:213)
... 67 more
Caused by: com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:380)
at com.thinkaurelius.titan.graphdb.database.IndexSerializer.query(IndexSerializer.java:495)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6$1.call(StandardTitanTx.java:1114)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6$1.call(StandardTitanTx.java:1111)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$8$6.call(StandardTitanTx.java:1111)
... 68 more
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.convertException(CassandraThriftKeyColumnValueStore.java:249)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getNamesSlice(CassandraThriftKeyColumnValueStore.java:148)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getNamesSlice(CassandraThriftKeyColumnValueStore.java:91)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getSlice(CassandraThriftKeyColumnValueStore.java:80)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
at com.thinkaurelius.titan.diskstorage.BackendTransaction$5.call(BackendTransaction.java:383)
at com.thinkaurelius.titan.diskstorage.BackendTransaction$5.call(BackendTransaction.java:380)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
... 81 more
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14526)
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14463)
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14389)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:732)
at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:716)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getNamesSlice(CassandraThriftKeyColumnValueStore.java:129)
... 89 more
...

Matthias Broecheler

Oct 6, 2014, 10:56:02 PM10/6/14
to aureliu...@googlegroups.com
How many index entries are there for state1 and state2 respectively?





Praveen Peddi

Oct 6, 2014, 11:18:46 PM10/6/14
to aureliu...@googlegroups.com
For this particular use case, 'state1' should have returned a count of 300 or so (but failed), whereas 'state2' returns a count of 255000. Ironically, the query that returns the larger count succeeds and the query that's supposed to return the small dataset fails.

Praveen

Matthias Broecheler

Oct 7, 2014, 12:01:33 AM10/7/14
to aureliu...@googlegroups.com
That is strange. The type of exception you are seeing is indicative of having too many records in the index. Hence, this would be a "reasonable" exception to get in the case of 'state2', in the sense that I wouldn't have been surprised. Is there any chance the index entries for 'state1' got polluted so that there are equally many records? Many deletes maybe? Cross index usage?



Praveen Peddi

unread,
Oct 7, 2014, 10:05:09 AM10/7/14
to aureliu...@googlegroups.com
Hi Matthias,
I am not sure if I understand all your questions.

Is there any chance the index entries for state1 got polluted so that there are equally many records?
'state1' definitely doesn't have many records. Here is the query I ran. As you can see, 'state2' has 264672 records, but if I don't filter on state, I see 268184 records. So 'state1' cannot have more than 3512 records, unless the index is returning an incorrect count.
gremlin> g.V('key1', 'value1').has('type', 'type1').has('state', 'state2').count()
==>264672
gremlin> g.V('key1', 'value1').has('type', 'type1').count()
==>268184
gremlin> 


Many deletes maybe?
We use the TTL feature since we need to clean up these vertices after a few days (they have no value then). Currently I am using a TTL of 7 days; however, I saw this issue before 7 days had passed. I am not sure if TTL has a bug where the index might get corrupted even though no vertices have expired yet. Is this a possibility?

Cross index usage?
Could you elaborate more on this? What do you mean by cross index usage?

Let me know if you need me to provide more info.

Thanks
Praveen
...

Matthias Broecheler

unread,
Oct 9, 2014, 3:00:10 AM10/9/14
to aureliu...@googlegroups.com
Hello Praveen,

It might have something to do with TTL. You have some pretty "fat" index entries. TTL creates tombstones until they are cleaned up. Together, they might overload the index entry row and lead to timeouts. At least, that's one possible hypothesis.

Cheers,
Matthias


Praveen Peddi

unread,
Oct 11, 2014, 10:48:01 AM10/11/14
to aureliu...@googlegroups.com
Hi Matthias,
I disabled TTL on Thursday and am running the test again. We haven't seen the issue so far, but I don't want to jump to conclusions yet. I will let you know by next week. I still don't understand how TTL would be a factor, since we set it to 7 days and saw the issue before 7 days had passed. Let's see.

Assuming we won't have the issue anymore, what is the best way to clean up the data periodically? We don't want to retain more than a few days' worth, since the data size grows pretty quickly and all the data is unused and useless after a few days.

Thanks
Praveen


Matthias Broecheler

unread,
Oct 13, 2014, 4:51:36 PM10/13/14
to aureliu...@googlegroups.com
Hi Praveen,

It would be weird if this solved the issue, given the larger time horizon of the TTL. If it does indeed work, I think this warrants further investigation, since something strange would be happening with TTL in that case.

Cheers,
Matthias



Praveen Peddi

unread,
Oct 29, 2014, 11:52:35 AM10/29/14
to aureliu...@googlegroups.com
After doing a few more tests with and without TTL, I think it's safe to say it's not related to TTL. We are seeing this same issue even with only 100 thousand vertices. Our indexes are pretty simple. Does the index size matter? We could not figure out a pattern for the failed queries, but some queries work and some don't.
...

Praveen Peddi

unread,
Nov 4, 2014, 3:57:51 PM11/4/14
to aureliu...@googlegroups.com
Hi Matthias,
After spending more time investigating this issue, it looks like we are hitting org.apache.cassandra.db.filter.TombstoneOverwhelmingException. All we are doing is updating vertex properties extensively. More specifically, we are updating a field called "state" from value1->value2->value3->value4->value5 as part of a workflow. Does this scenario cause Titan to create lots of tombstones? Any ideas on how we can avoid this? We have also changed the compaction strategy to leveled compaction and went with aggressive tombstone parameters.

Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1348)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1912)
    ... 3 more


Thanks
Praveen
...

Matthias Broecheler

unread,
Nov 6, 2014, 1:24:13 AM11/6/14
to aureliu...@googlegroups.com
Hello Praveen,

This might be an issue with data modeling: depending on the cardinality of the key, an update might or might not require a remove (i.e., a tombstone).

Cheers,
Matthias


Praveen Peddi

unread,
Nov 7, 2014, 11:11:32 AM11/7/14
to aureliu...@googlegroups.com
Hi Matthias,
Since this is internal to Titan, could you elaborate more on when tombstones are created, so I can think about optimizing the data model? As an application developer, without knowing the details I won't be able to change the model. Maybe you can point me to either a wiki page or even the code, and I can try to understand the logic. As I mentioned, what we are doing is very simple: updating a single property multiple times (the property is indexed).
...

Matthias Broecheler

unread,
Nov 19, 2014, 4:08:48 PM11/19/14
to aureliu...@googlegroups.com
I did some analysis on this. There is a tombstone problem if you repeatedly update an indexed key. With every update to the key, we simply overwrite the original value. That should not create tombstones. But in the index, we need to remove the old entry and write a new one. That does create a tombstone.
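To make the mechanism concrete, here is a toy Python model of a composite-index row (this is not Titan's actual storage code; the class and method names are made up for illustration). Each update to the indexed property overwrites the vertex's own column, but forces a delete plus insert in the index row, so tombstones accumulate with every state transition:

```python
# Toy model: a Titan composite-index row in Cassandra. Hypothetical names,
# for illustration only -- not the real storage implementation.

class IndexRow:
    """One wide row holding index entries (index value, vertex id)."""
    def __init__(self):
        self.live = {}        # (value, vertex_id) -> True for live entries
        self.tombstones = 0   # deletions not yet purged by compaction

    def update_indexed_property(self, vertex_id, old_value, new_value):
        # The vertex row itself is simply overwritten (no tombstone there),
        # but the index must delete the old entry and insert a new one.
        if old_value is not None:
            self.live.pop((old_value, vertex_id), None)
            self.tombstones += 1      # the delete is written as a tombstone
        self.live[(new_value, vertex_id)] = True

row = IndexRow()
states = ["value1", "value2", "value3", "value4", "value5"]
for vertex_id in range(1000):         # 1000 vertices walk the workflow
    prev = None
    for state in states:
        row.update_indexed_property(vertex_id, prev, state)
        prev = state

print(len(row.live))      # 1000 live entries: one per vertex, final state
print(row.tombstones)     # 4000 tombstones: 4 state transitions per vertex
```

Until compaction purges them (after gc_grace_seconds), a read that scans such an index row must step over all those tombstones, which is consistent with the TombstoneOverwhelmingException seen above.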


Praveen Peddi

unread,
Nov 19, 2014, 5:48:17 PM11/19/14
to aureliu...@googlegroups.com
After changing the compaction strategy to leveled compaction and making tombstone compaction more frequent, our problem is resolved (at least for now).

Thanks
Praveen


Praveen Peddi

unread,
Feb 2, 2015, 5:38:28 PM2/2/15
to aureliu...@googlegroups.com
In my previous post I mentioned that the problem was resolved after switching to more frequent leveled compaction, but we hit this problem again. I think the change just delayed the problem and gave us a few more weeks. We are definitely updating indexed keys often. Is there any solution for updating indexed keys, then? Does it mean Titan/Cassandra is not optimal when there are use cases that update indexed keys? We are hitting this issue at 20% of the scale we ultimately want to support. If anyone else has hit this tombstone issue and found an alternate solution, I would love to hear it.
...

Roch Delsalle

unread,
Feb 12, 2015, 5:12:40 AM2/12/15
to aureliu...@googlegroups.com
Hi,

I ran into the same issue this morning after updating most of the records.

I'm using Frames, and basically I have an indexed property "_class" which stores the name of the object.
After updating the value of "_class" on 200k records, I can no longer query using this index.

Could not process individual retrieval call / TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after Duration[10 s]

I ran a compaction in Cassandra but it did not solve the issue.

Roch
...

Roch Delsalle

unread,
Feb 12, 2015, 10:29:28 AM2/12/15
to aureliu...@googlegroups.com
I have increased my tombstone limit for now, which solves it.
...

Praveen Peddi

unread,
Feb 12, 2015, 2:01:14 PM2/12/15
to aureliu...@googlegroups.com
We ended up doing that too, and I think it only delays the problem rather than solving it. Which property did you end up modifying in cassandra.yaml?


Roch Delsalle

unread,
Feb 12, 2015, 6:24:38 PM2/12/15
to aureliu...@googlegroups.com
I changed tombstone_failure_threshold to 1M.
I guess you could also lower the gc grace period.
The error message Titan sends back could be improved, but I think this is more a limitation of Cassandra than a Titan issue.
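For reference, the thresholds being discussed live in cassandra.yaml; the values below are purely illustrative (in Cassandra 2.x the defaults are, to my knowledge, 1000 for the warning and 100000 for the failure threshold), and raising the failure threshold only masks the symptom rather than reducing the tombstone build-up itself:

```yaml
# cassandra.yaml -- illustrative values only
tombstone_warn_threshold: 10000        # log a warning when a slice scans past this many tombstones
tombstone_failure_threshold: 1000000   # abort the read past this many (default 100000)
```

Lowering the gc grace period, by contrast, is a per-table setting (gc_grace_seconds), not a cassandra.yaml one; shortening it lets compaction purge tombstones sooner, at the cost of a narrower window for running repairs.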

Praveen Peddi

unread,
Feb 13, 2015, 7:17:33 AM2/13/15
to aureliu...@googlegroups.com
Yes, we changed that to 500K and made the gc grace and compaction parameters more aggressive. We thought that fixed the problem, but it didn't; we started seeing the issue again after a few weeks. I wouldn't say it's a problem with Cassandra alone. The fact that Cassandra does not handle tombstones well is a Cassandra problem, but Titan creating so many tombstones even though we are not deleting vertices is really a Titan issue. We are updating indexed keys, and that's creating tombstones in Cassandra. I am still surprised that Titan/Cassandra cannot handle normal updates to vertices (and we're not even at large scale yet). I really hope there is a simple solution to this problem.

NewTitanOps

unread,
Jun 26, 2015, 1:50:59 PM6/26/15
to aureliu...@googlegroups.com
Hi Praveen and others,
We too are facing similar issues and would like to check whether you were able to diagnose the issue or find a solution. The issue is documented at https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/aureliusgraphs/HvkHIKmpsvk/y63DQKPNnNQJ

Just to reiterate here:

Our setup comprises Titan 0.5.3 with a Cassandra 2.0.8 cluster and ElasticSearch. We routinely run cassandra repair to reduce tombstone issues.

Recently we started having some issues with a few of the data points.

The query is quite simple:
gremlin> g.V.has('profile_id', 123424).bothE('GraphWeight').count()  

This query returns properly for almost all of the profile_ids, but for a few it just hangs, and the traceback shows:

com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:428)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.edgeStoreQuery(BackendTransaction.java:253)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.edgeQuery(StandardTitanGraph.java:365)
        at com.thinkaurelius.titan.graphdb.query.vertex.SimpleVertexQueryProcessor$2.get(SimpleVertexQueryProcessor.java:109)
        at com.thinkaurelius.titan.graphdb.query.vertex.SimpleVertexQueryProcessor$2.get(SimpleVertexQueryProcessor.java:106)
        at com.thinkaurelius.titan.graphdb.vertices.CacheVertex.loadRelations(CacheVertex.java:47)
        at com.thinkaurelius.titan.graphdb.query.vertex.SimpleVertexQueryProcessor.getBasicIterator(SimpleVertexQueryProcessor.java:106)
        at com.thinkaurelius.titan.graphdb.query.vertex.SimpleVertexQueryProcessor.iterator(SimpleVertexQueryProcessor.java:63)
        at com.thinkaurelius.titan.graphdb.transaction.RelationConstructor$1$1.<init>(RelationConstructor.java:36)
        at com.thinkaurelius.titan.graphdb.transaction.RelationConstructor$1.iterator(RelationConstructor.java:34)
        at com.tinkerpop.pipes.transform.VertexQueryPipe.processNextStart(VertexQueryPipe.java:113)
        at com.tinkerpop.pipes.transform.VertexQueryPipe.processNextStart(VertexQueryPipe.java:19)
        at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)
        at com.tinkerpop.pipes.util.Pipeline.next(Pipeline.java:115)
        at com.tinkerpop.pipes.util.PipeHelper.counter(PipeHelper.java:107)
        at com.tinkerpop.gremlin.java.GremlinPipeline.count(GremlinPipeline.java:1534)
        at com.tinkerpop.gremlin.java.GremlinFluentPipeline$count.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
        at groovysh_evaluate.run(groovysh_evaluate:84)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:70)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:83)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.main(Console.java:87)
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after Duration[10 s]
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:86)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
        ... 63 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:114)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:78)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:67)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:65)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction$1.call(BackendTransaction.java:256)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction$1.call(BackendTransaction.java:253)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
        ... 64 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.OperationTimeoutException: OperationTimeoutException: [host=10.63.191.178(10.63.191.178):9160, latency=5001(100025), attempts=20]TimedOutException()
        at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:171)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
        at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
        at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
        at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
        at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:527)
        at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112)
        ... 71 more
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14526)
        at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14463)
        at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14389)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:732)
        at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:716)
        at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:533)
        at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:530)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
        ... 77 more


  1. We have bumped the Cassandra storage read timeout from 60 sec to as large as 1200 sec [tested in increments], but the error is just the same.
  2. We ran the above just after Cassandra repairs were run [thus, in a way, eliminating issues due to tombstones], but the error is just the same.

Following Daniel's advice there, we bumped the timeouts at Cassandra to a very large number. The effect was that the simple query g.V.has('profile_id', 123424).bothE('GraphWeight').count() hangs on the gremlin shell for hours without any output and without timeouts. It looks like it goes into perpetual execution for hours.

Also, we know from our data set that the said node is not a supernode, since the data associated with it is very small.
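For reference, the server-side read timeouts mentioned in point 1 are set in cassandra.yaml; the values below are illustrative only (the 2.0.x defaults are, to my knowledge, 5000 ms for single-partition reads and 10000 ms for range scans):

```yaml
# cassandra.yaml -- illustrative values; raising these did not help here,
# which suggests the reads never complete rather than merely run long
read_request_timeout_in_ms: 1200000    # single-partition reads (default 5000)
range_request_timeout_in_ms: 1200000   # range scans (default 10000)
```

That raising these to 20 minutes merely converts a timeout into an apparently endless read is itself a useful data point: the server is likely still churning through the same row on every retry.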

