I have a distributed system where worker processes use Redis to keep track of the work they have to do and the current state of their work. As each worker completes some work, it needs to update some Redis keys to reflect the current state of the work.
To detect and deal with worker failure, I'm thinking of giving each worker its own heartbeat key with a TTL. The worker has to renew that key's TTL periodically to stay alive.
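To make the renewal scheme concrete, here is a minimal sketch that models a TTL key in memory with a simulated clock instead of talking to a real Redis server. The class `HeartbeatStore`, the key name `heartbeat:worker-1`, and the `TTL`/`RENEW_EVERY` values are all hypothetical, chosen only for illustration.

```python
class HeartbeatStore:
    """Hypothetical in-memory stand-in for Redis keys that carry a TTL."""

    def __init__(self):
        self.now = 0.0      # simulated clock, in seconds
        self._expiry = {}   # key -> absolute expiry time

    def set_with_ttl(self, key, ttl):
        # Plays the role of SET key value followed by EXPIRE key ttl.
        self._expiry[key] = self.now + ttl

    def exists(self, key):
        exp = self._expiry.get(key)
        if exp is None:
            return False
        if self.now >= exp:
            # In this model a key is gone once its expiry time is reached.
            del self._expiry[key]
            return False
        return True

    def advance(self, seconds):
        self.now += seconds


TTL = 10          # heartbeat TTL in seconds (assumed value)
RENEW_EVERY = 3   # renewal period, well under the TTL (assumed value)

store = HeartbeatStore()
key = "heartbeat:worker-1"   # hypothetical per-worker key name
store.set_with_ttl(key, TTL)

# A healthy worker renews its heartbeat before the TTL runs out.
for _ in range(5):
    store.advance(RENEW_EVERY)
    store.set_with_ttl(key, TTL)
alive_while_renewing = store.exists(key)

# A stalled worker stops renewing; after TTL seconds its key disappears.
store.advance(TTL)
alive_after_stall = store.exists(key)
print(alive_while_renewing, alive_after_stall)
```

Keeping the renewal period well below the TTL leaves slack for a slow renewal without the worker being declared dead.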
When the heartbeat key expires and disappears, the worker is deemed dead, and a "reaper" salvages that worker's state so that its work doesn't get lost. In addition, the worker must not make any further changes to its work state in Redis; otherwise the officially dead worker and the reaper could race. In other words, a worker whose heartbeat key has expired should know that it is dead and stop writing to Redis.
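The desired behavior can be sketched as a single-threaded in-memory model: a worker's update is accepted only if its heartbeat is still live at the moment of the write, and the reaper moves the work of dead workers aside. Everything here (`WorkStore`, `guarded_update`, `reap`, the worker ID `w1`) is hypothetical. The model is trivially atomic because it is single-threaded; in the real system the liveness check and the write would have to be atomic on the Redis server, which is exactly what the question is about.

```python
class WorkStore:
    """Hypothetical model: per-worker heartbeat expiry plus per-worker work state."""

    def __init__(self):
        self.now = 0.0
        self.heartbeat_expiry = {}   # worker_id -> absolute expiry time
        self.work = {}               # worker_id -> current work state
        self.salvaged = []           # work recovered by the reaper

    def heartbeat(self, worker_id, ttl):
        self.heartbeat_expiry[worker_id] = self.now + ttl

    def is_alive(self, worker_id):
        exp = self.heartbeat_expiry.get(worker_id)
        return exp is not None and self.now < exp

    def guarded_update(self, worker_id, state):
        # Desired semantics: check liveness and write as one indivisible step.
        if not self.is_alive(worker_id):
            return False
        self.work[worker_id] = state
        return True

    def reap(self):
        # Take over the work of every worker whose heartbeat has expired.
        for worker_id in list(self.work):
            if not self.is_alive(worker_id):
                self.salvaged.append(self.work.pop(worker_id))
                self.heartbeat_expiry.pop(worker_id, None)


store = WorkStore()
store.heartbeat("w1", ttl=10)
first = store.guarded_update("w1", "step 1 done")   # accepted: heartbeat is live
store.now += 11                                      # w1 misses its renewal
store.reap()                                         # reaper salvages w1's work
late = store.guarded_update("w1", "step 2 done")    # rejected: w1 is officially dead
print(first, late, store.salvaged, store.work)
```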
Unfortunately, my attempt to detect key expiration doesn't work. I want the expiration of the watched heartbeat key to make the transaction fail, but in an experiment it doesn't:
SET heartbeat 0
EXPIRE heartbeat 10
WATCH heartbeat
MULTI
SET work 'something'
(wait at least 10 seconds, until heartbeat has expired)
EXEC
Although the heartbeat key has expired by the time EXEC is issued, the work key still gets set.
I know of a more complicated approach: keep a sorted set (zset) whose scores are expiration times and whose members are worker IDs, and have the reaper perform expiration manually. It has drawbacks:
1. If there are many workers, a worker that WATCHes the zset key (to make sure it isn't expired while it updates its work) will get spurious transaction failures whenever other workers' expiration times in the zset are updated.
2. It is more complicated than the native expiration solution would be (if that worked as I hoped).
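For comparison, the zset approach can be sketched with an in-memory model of a single sorted set. The method names mirror the real Redis commands ZADD, ZRANGEBYSCORE and ZREM, but the class `SortedSetModel`, the helpers `renew`/`reap`, and the `TTL` value are hypothetical illustrations, not a client library API.

```python
class SortedSetModel:
    """Hypothetical stand-in for one Redis sorted set: member -> score."""

    def __init__(self):
        self.scores = {}

    def zadd(self, member, score):
        # Like ZADD key score member: inserts or overwrites the score.
        self.scores[member] = score

    def zrangebyscore(self, lo, hi):
        # Like ZRANGEBYSCORE key lo hi: members with lo <= score <= hi, lowest first.
        hits = [(s, m) for m, s in self.scores.items() if lo <= s <= hi]
        return [m for s, m in sorted(hits)]

    def zrem(self, member):
        return self.scores.pop(member, None) is not None


TTL = 10                       # assumed heartbeat TTL in seconds
deadlines = SortedSetModel()   # one shared zset for all workers

def renew(worker_id, now):
    # Score is the expiration time, member is the worker ID.
    deadlines.zadd(worker_id, now + TTL)

def reap(now):
    # The reaper expires every worker whose deadline has passed.
    expired = deadlines.zrangebyscore(float("-inf"), now)
    for worker_id in expired:
        deadlines.zrem(worker_id)
    return expired

renew("w1", now=0)
renew("w2", now=0)
renew("w2", now=8)             # w2 keeps renewing; w1 does not
first_sweep = reap(now=12)
second_sweep = reap(now=12)
print(first_sweep, second_sweep)
```

Every `renew` and `reap` call modifies the same shared key, which is why a worker that WATCHes it sees changes caused by other workers (drawback 1).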
Is there a simpler way?
Josh