Using expiration to implement heartbeat

Joshua Chia

Oct 9, 2013, 5:09:58 PM

to redi...@googlegroups.com

I have a distributed system where worker processes use Redis to keep track of the work they have to do and the current state of their work. As each worker completes some work, it needs to update some Redis keys to reflect the current state of the work.

To detect and deal with worker failure, I'm thinking of using a special worker-specific key for heartbeat. Periodically, the worker will have to renew the TTL of that key to stay alive. If that key expires, the worker is deemed to be dead.

Once the heartbeat key has expired and disappeared, a 'reaper' will salvage the state of that worker so that the work doesn't get lost. From that point on, the worker must not make further changes to its work state in Redis, to avoid race conditions between the officially dead worker and the reaper. In other words, a worker with an expired heartbeat key should know that it's dead and stop writing to the Redis DB.

Unfortunately, the following attempt to detect key expiration doesn't work. I would like key expiration to cause the transaction to fail, but in an experiment, I see that it doesn't work that way:

SET heartbeat 0

EXPIRE heartbeat 10

WATCH heartbeat

MULTI

SET work 'something'

(wait for 10 seconds)

EXEC

Although the heartbeat key has expired by the time EXEC is issued, the work key still gets set.

I know of a more complicated way, which is to use a zset and have the reaper do expiration manually, with the score being the expiration time and the member being the worker ID, but it has drawbacks:

1. If there are many workers, a worker that watches the zset key (to make sure its own entry doesn't expire while it's updating its work) will get spurious failures whenever any other worker's expiration time is updated.

2. More complicated than the native expiration solution (if it worked as I hoped)
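
For reference, the zset approach described above could be sketched roughly as follows. This is only an illustration in Python: the key name `worker:heartbeats` and both function names are made up, and `client` is assumed to be a redis-py style object exposing `zadd(name, mapping)` and `zrangebyscore(name, min, max)`. The current time can be passed in explicitly so the logic can be followed without a live server.

```python
import time
from typing import Any, List, Optional

# Hypothetical key name, chosen only for this sketch.
HEARTBEATS_KEY = "worker:heartbeats"


def beat(client: Any, worker_id: str, ttl_seconds: float,
         now: Optional[float] = None) -> None:
    """Store the deadline by which this worker must beat again.

    The zset member is the worker ID and the score is the expiration time.
    """
    now = time.time() if now is None else now
    client.zadd(HEARTBEATS_KEY, {worker_id: now + ttl_seconds})


def find_dead_workers(client: Any, now: Optional[float] = None) -> List[str]:
    """Return workers whose deadline is at or before `now`.

    With a real redis-py client the members come back as bytes unless the
    client was created with decode_responses=True.
    """
    now = time.time() if now is None else now
    return list(client.zrangebyscore(HEARTBEATS_KEY, "-inf", now))
```

The reaper would call `find_dead_workers` periodically and salvage each returned worker. Because every worker writes to the same zset key, a WATCH on that key is invalidated by any other worker's `beat`, which is the first drawback listed above.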

Is there a simpler way?

Josh

Emil Vikström

Oct 12, 2013, 3:49:03 AM

to redi...@googlegroups.com

The transactions manual does state: "Note that if you WATCH a volatile key and Redis expires the key after you WATCHed it, EXEC will still work", so what you see is expected behavior. In fact, I am not sure why you would want to use a transaction at all here, because the commands in a transaction are only executed at the EXEC call at the end.

What you can do is check the state of the heartbeat every time you update the TTL of the key. EXPIRE will return 0 if the key did not exist, so then you know that the key has expired. If you do this before every (write) command, you can abort before sending the write.
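
A minimal sketch of that check in Python, assuming a redis-py style `client` whose `expire(name, time)` returns a falsy value when the key does not exist and which has a `set(name, value)` method. `WorkerDead`, `renew_heartbeat` and `guarded_set` are hypothetical names used only for illustration:

```python
from typing import Any


class WorkerDead(Exception):
    """Raised when the heartbeat key no longer exists."""


def renew_heartbeat(client: Any, heartbeat_key: str, ttl_seconds: int) -> None:
    """Renew the heartbeat TTL, or raise if the key has already expired.

    EXPIRE only sets a timeout on an existing key, so a falsy reply
    means the heartbeat key is gone and the worker must stop writing.
    """
    if not client.expire(heartbeat_key, ttl_seconds):
        raise WorkerDead(heartbeat_key)


def guarded_set(client: Any, heartbeat_key: str, ttl_seconds: int,
                key: str, value: str) -> None:
    """Renew the heartbeat first, and only then send the write."""
    renew_heartbeat(client, heartbeat_key, ttl_seconds)
    client.set(key, value)
```

This narrows the window rather than closing it: the write is only safe if it reaches Redis before the freshly renewed TTL runs out.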

I know this is an unsatisfactory solution in a pipeline of commands, though, because it costs a round trip to Redis each time you need to check the heartbeat state. Perhaps you can gain something by keeping a client-side clock in your code as well?
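
One way the client-side clock idea might look, as a rough Python sketch. The class `LocalLease`, its method names and the safety margin are all assumptions for illustration. The idea is to note the local time just before each successful renewal is sent, and to treat the heartbeat as expired slightly earlier than Redis would, so that a check needs no round trip:

```python
import time
from typing import Callable


class LocalLease:
    """Local, conservative estimate of when the heartbeat key expires."""

    def __init__(self, ttl_seconds: float, safety_margin: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if safety_margin >= ttl_seconds:
            raise ValueError("safety_margin must be smaller than ttl_seconds")
        self._ttl = ttl_seconds
        self._margin = safety_margin
        self._clock = clock
        # No renewal recorded yet, so writes are not allowed.
        self._deadline = float("-inf")

    def now(self) -> float:
        """Read the local clock; call this just before sending EXPIRE."""
        return self._clock()

    def record_renewal(self, sent_at: float) -> None:
        """Call after EXPIRE succeeded, passing the time it was sent.

        Counting from the send time keeps the local deadline no later
        than the deadline Redis computed when it received the command.
        """
        self._deadline = sent_at + self._ttl - self._margin

    def may_write(self) -> bool:
        """True while the local deadline has not been reached."""
        return self._clock() < self._deadline
```

A worker would call `lease.now()` before EXPIRE, `lease.record_renewal(sent_at)` once EXPIRE reports success, and `lease.may_write()` before each write. This still relies on the local clock and the Redis clock advancing at about the same rate, and a write sent just before the local deadline can arrive late, so the margin has to cover network delay as well.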