[JIRA] (MLHR-1951) Gracefully Handle Cassandra Down Time In Cassandra Output Operator

1 view

Skip to first unread message

Timothy Farkas (JIRA)

unread,

Dec 17, 2015, 12:40:03 PM12/17/15

to malhar...@googlegroups.com

Timothy Farkas created an issue

Malhar /

MLHR-1951

Gracefully Handle Cassandra Down Time In Cassandra Output Operator

Issue Type:	Improvement
Assignee:	Timothy Farkas
Created:	17/Dec/15 9:39 AM
Priority:	Major
Reporter:	Timothy Farkas

One of our users is outputting to Cassandra, but they want to handle a Cassandra failure or Cassandra down time gracefully from an output operator. Currently a lot of our database operators will just fail and redeploy continually until the database comes back. This is a bad idea for a couple of reasons:

1 - We rely on buffer server spooling to prevent data loss. If the database is down for a long time (several hours or a day) we may run out of space to spool for buffer server on many nodes since it spools to local disk, and data is purged only after a window is committed.

2 - If there is another failure further upstream in the dag, upstream operators will be redeployed to a checkpoint less than or equal to the checkpoint of the database operator. This could mean redoing several hours or a day worth of computation.

We should support a mechanism to detect when the connection to a database is lost and then spool to hdfs using a WAL, and then write the contents of the WAL into the database once it comes back online.

Add Comment

This message was sent by Atlassian JIRA (v7.1.0-OD-02-030#71001-sha1:2ba8c0f)

Reply all

Reply to author

Forward

0 new messages