Hey All, I've found myself in a bit of a pickle, possibly due to not understanding some underlying concepts.
I've got a Python daemon that uses ZooKeeper (and hence Kazoo) extensively for various tasks. At one point in the daemon, an event occurs in which one node "transitions to primary" from another. Specifically, this involves steps such as removing an IP address from the "old" node and adding it to the "new" node.
I'd like the steps of this process to happen between the two servers in lockstep, so for instance:
    Node A          Node B
    Add IP
                    Remove IP
    Do Task
                    Do Other Task
The goal is that B doesn't start "Remove IP" until A is done with "Add IP", then A doesn't start "Do Task" until B is done with "Remove IP", and so on down a chain of about 7 commands.
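To pin down the ordering being asked for: it is strict turn-taking, which can be modelled in-process as a shared step counter that each side waits on. This is a technique swapped in purely for illustration, not the lock-based approach described in this post; the threads, `STEPS` and `run_lockstep` are hypothetical stand-ins for the two servers and their tasks. Across real machines, the counter would have to live somewhere both servers can see, such as a znode.

```python
import threading

# Hypothetical step list: (owning node, task). Threads stand in for the two
# servers; the shared counter stands in for state that would live in ZooKeeper.
STEPS = [
    ("A", "Add IP"),
    ("B", "Remove IP"),
    ("A", "Do Task"),
    ("B", "Do Other Task"),
]


def run_lockstep(steps):
    cond = threading.Condition()
    state = {"next": 0}  # index of the only step allowed to run right now
    log = []

    def worker(node):
        for i, (owner, task) in enumerate(steps):
            if owner != node:
                continue
            with cond:
                # Block until every earlier step (on either node) has finished.
                cond.wait_for(lambda step=i: state["next"] == step)
                log.append(f"{node}: {task}")  # the real work would happen here
                state["next"] += 1
                cond.notify_all()

    # Start B first to show that thread start order does not matter.
    threads = [threading.Thread(target=worker, args=(n,)) for n in ("B", "A")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for line in run_lockstep(STEPS):
        print(line)
```

Each side only ever waits for "has the previous step finished?", which is an ordering question rather than a mutual-exclusion one.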
This seemed to me like something locks should accomplish, so I tried that. For each pair of tasks, I set up a write lock for Node A's task and a read lock for Node B's task.
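For reference, the basic contract of a shared (read) / exclusive (write) lock can be shown with a minimal in-process sketch. The class and method names are hypothetical and this is not Kazoo's implementation; Kazoo's ZooKeeper-backed recipes additionally queue contenders by znode sequence number. The point it illustrates holds for both: a read lock only waits for a writer that already holds (or, in Kazoo, has already queued for) the lock. A lock enforces exclusion; it does not decide which side goes first.

```python
import threading


class ReadWriteLock:
    """Minimal in-process shared/exclusive lock; a sketch, not Kazoo's code.

    Any number of readers may hold it at once; a writer must hold it alone.
    There is no queueing or fairness, unlike ZooKeeper's sequence-node recipe.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def _can_read(self):
        return not self._writer

    def _can_write(self):
        return not self._writer and self._readers == 0

    def acquire_read(self, blocking=True):
        with self._cond:
            if not blocking and not self._can_read():
                return False
            # Waits only if a writer currently holds the lock.
            self._cond.wait_for(self._can_read)
            self._readers += 1
            return True

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, blocking=True):
        with self._cond:
            if not blocking and not self._can_write():
                return False
            self._cond.wait_for(self._can_write)
            self._writer = True
            return True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


if __name__ == "__main__":
    rw = ReadWriteLock()
    # No writer holds the lock, so both reads succeed immediately; nothing
    # makes a reader wait for a writer that has not acquired the lock yet.
    print(rw.acquire_read(blocking=False), rw.acquire_read(blocking=False))
    print(rw.acquire_write(blocking=False))  # False: readers still hold it
```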
Before the sequence begins, I also need to make sure both nodes start at the same time, so I alternate the read and write locks. The code looks something like this (the commands that create the locks are left out for brevity):
    from time import sleep

    # write_lock / read_lock: the write and read locks described above
    # (creation omitted for brevity).

    def become_primary():
        write_lock.acquire()
        sleep(1)
        write_lock.release()
        read_lock.acquire()
        # I expect to block while the other node does stuff
        read_lock.release()
        write_lock.acquire()
        do_add_ip()
        write_lock.release()


    def become_secondary():
        read_lock.acquire()
        # I expect this to block ONLY while the first write lock exists
        # in become_primary(), then release
        read_lock.release()
        write_lock.acquire()
        do_some_stuff()
        write_lock.release()
        read_lock.acquire()
        do_remove_ip()
        read_lock.release()
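The "start at the same time" handshake that the first write/read lock pair is meant to provide is what a barrier expresses. Below is a minimal in-process sketch using the standard library's `threading.Barrier`, with threads standing in for the two nodes and `start_together` a hypothetical helper. Across machines, the equivalent would be a ZooKeeper barrier (Kazoo includes `Barrier` and `DoubleBarrier` recipes), which the code above does not use.

```python
import threading
import time


def start_together(delays):
    """Run one thread per node; none starts its work until all have arrived.

    `delays` maps a node name to how long it takes to reach the start line.
    """
    barrier = threading.Barrier(len(delays))
    events = []
    events_lock = threading.Lock()

    def node(name, delay):
        time.sleep(delay)  # nodes reach the start line at different times
        with events_lock:
            events.append(("arrived", name))
        barrier.wait()  # blocks until every node has called wait()
        with events_lock:
            events.append(("started", name))

    threads = [threading.Thread(target=node, args=item) for item in delays.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for kind, name in start_together({"A": 0.0, "B": 0.2}):
        print(kind, name)
```

Unlike the `sleep(1)` in `become_primary()`, the barrier does not depend on timing: the early node simply waits until the late one arrives.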
The problem I'm having is that the first pair of locks is acquired, but the read side (become_secondary) never releases. The write side (become_primary) does, but it then gets stuck as well, when acquiring the third lock.
Am I missing something here? Is this even what shared locks are designed for? Is this just some weird quirk or bug in Kazoo?
Thanks,
Joshua