Error handling using execute_async


Kostas Chalikias

Apr 25, 2016, 5:25:44 AM
to python-dr...@lists.datastax.com
Hello,

A slightly open-ended question, as I am looking for ideas on what is causing my problem.

I am starting to suspect that I might be using the async mode of the driver incorrectly in terms of error handling. This leads to my application not writing to Cassandra for long stretches of time, until some other unrelated issue causes it to crash and restart, at which point it starts writing again.

My application wakes up every few seconds, produces some data, creates a BatchStatement (QUORUM consistency level), populates it with a few prepared statements and calls execute_async, attaching a logging function as the errback, so something like this:

import logging

from cassandra import ConsistencyLevel
from cassandra.query import BatchStatement

def log_cassandra_error(exc):
    logging.error('Cassandra operation failed: %s', exc)

batch = BatchStatement(consistency_level=ConsistencyLevel.QUORUM)
batch.add(insert_query, params)
future = cassandra_session.execute_async(batch)
future.add_errback(log_cassandra_error)

While it is obvious that the application starts writing again after restarting, the nature of the data I am collecting unfortunately makes it hard to pinpoint exactly when it stops writing, so I can't easily look at the application logs for that moment, which is why I can only speculate. However, looking at the logs I do see some error messages being logged (which I obviously need to debug in my application).

2016-04-24 22:26:53,197 - root - ERROR - Cassandra operation failed: code=2200 [Invalid query] message="Invalid null value for clustering key part time"
2016-04-24 22:27:23,376 - root - ERROR - Cassandra operation failed: code=2200 [Invalid query] message="Invalid null value for clustering key part time"

The fact that I see error messages in quick succession, without a restart of my process in between, tells me the driver 'loop' doesn't die because of an error being reported, so I should expect further valid queries to work, correct?
Are there any other ideas on how I can debug this further?
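In the meantime I suppose I could guard against that particular error before building the batch, roughly like this (just a sketch; 'time' is the clustering column from the error message, and 'rows' plus the column names in the params are placeholders for whatever I actually produce):

batch = BatchStatement(consistency_level=ConsistencyLevel.QUORUM)
for row in rows:
    # skip and log anything that would put a null into the clustering key
    if row.get('time') is None:
        logging.warning('Skipping row with null clustering key: %s', row)
        continue
    batch.add(insert_query, (row['id'], row['time'], row['value']))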

Many thanks,
Kostas

Laing, Michael

Apr 25, 2016, 7:44:51 AM
to python-dr...@lists.datastax.com
Your errback method is called from within the libev event loop. Therefore libev will absorb the exception when the method returns, and your app will continue.

It may be useful to pass additional data, e.g. the params or a subset of them, into the future; they will be handed back to you in the errback and add context to the error, e.g. a message id.
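For example (message_id and the params here are just placeholders for whatever context you have at submit time - add_errback passes any extra arguments through to your function alongside the exception):

def log_cassandra_error(exc, message_id, params):
    # exc arrives first; the extra positional args are whatever we attached below
    logging.error('Cassandra operation failed for %s (params=%s): %s',
                  message_id, params, exc)

future = cassandra_session.execute_async(batch)
future.add_errback(log_cassandra_error, message_id, params)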

Unless you need the batch, I would get rid of it and use single prepared statements, each of which will be directed by the driver to a node that has the partition of interest.

If you use single statements, you should also use a semaphore to limit the number of overlapping async requests. In that case you also need to add a callback method to your future so you can release the semaphore - do the same in the errback.
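A rough sketch of that pattern - the table, columns, rows_to_write and the in-flight limit are all made up, so adapt them to your schema and cluster:

import logging
from threading import BoundedSemaphore

from cassandra import ConsistencyLevel

MAX_IN_FLIGHT = 32  # example value; tune for your cluster
in_flight = BoundedSemaphore(MAX_IN_FLIGHT)

def on_success(rows):
    in_flight.release()

def on_error(exc, params):
    in_flight.release()
    logging.error('Insert failed for %s: %s', params, exc)

insert = cassandra_session.prepare(
    'INSERT INTO my_table (id, time, value) VALUES (?, ?, ?)')
insert.consistency_level = ConsistencyLevel.QUORUM

for params in rows_to_write:
    in_flight.acquire()  # blocks once MAX_IN_FLIGHT requests are outstanding
    future = cassandra_session.execute_async(insert, params)
    future.add_callbacks(on_success, on_error, errback_args=(params,))

Each single statement can then be routed as described above, and the semaphore gives you back-pressure instead of an unbounded pile of pending futures.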

HTH,
ml


Kostas Chalikias

May 4, 2016, 6:42:25 AM
to python-dr...@lists.datastax.com
Hi Michael,

Sounds like those errors don't explain what I am seeing then, but maybe I should start by making them go away anyway.

Can you please elaborate on how to pass more info to the future so I can print it? That sounds like a good idea.

Also, can you explain what you mean by 'need the batch' - isn't a batch better for performance?

Thanks!