Alternating tasks between two hosts using Zookeeper locks?


Joshua Boniface

Dec 19, 2019, 12:50:13 AM
to python-zk
Hey All, I've found myself in a bit of a pickle, possibly due to not understanding some underlying concepts.

I've got a Python daemon that uses Zookeeper (and hence Kazoo) extensively for various tasks. At one point in the daemon, there's an event that happens whereby one node "transitions to primary" from another. For specifics, this is doing some stuff like removing an IP address from the "old" node and adding it to the "new" node.

I'd like the steps of this process to happen between the two servers in lockstep, so for instance:

Node A            Node B

Add IP
                  Remove IP
Do Task
                  Do Other Task

The idea is that B doesn't start "Remove IP" until A is done with "Add IP", then A doesn't start "Do Task" until B is done with "Remove IP", and so on down a chain of about 7 commands.
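To make the intended ordering concrete, here is a minimal in-process sketch of the handoff. It is only a stand-in: two threads play the two hosts and `threading.Event` objects play the synchronization points, so it says nothing about how Zookeeper locks behave. The names `run_lockstep` and `steps` are made up for illustration.

```python
import threading


def run_lockstep(steps):
    """Run (node, action) steps strictly in order, one thread per node."""
    log = []
    # One gate per step; step i may start only once gate i has been set.
    gates = [threading.Event() for _ in range(len(steps) + 1)]
    gates[0].set()  # the first step may start immediately

    def worker(node):
        for i, (owner, action) in enumerate(steps):
            if owner != node:
                continue
            gates[i].wait()                # block until the previous step is done
            log.append(f"{node}: {action}")  # stand-in for the real task
            gates[i + 1].set()             # hand over to the owner of the next step

    nodes = sorted({owner for owner, _ in steps})
    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


steps = [
    ("A", "Add IP"),
    ("B", "Remove IP"),
    ("A", "Do Task"),
    ("B", "Do Other Task"),
]
```

Calling `run_lockstep(steps)` returns the actions in the order listed in `steps`, regardless of how the two threads are scheduled.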

To me this seems like something that Locks should accomplish, so I tried that. For each task combination, I set up a Write lock for Node A's task, and a Read lock for Node B's task.
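For reference, the lock setup looks roughly like the sketch below. It assumes a Kazoo version that ships the `ReadLock` and `WriteLock` recipes on the client; the helper name `make_sync_locks` and the `/locks/primary_transition` path are placeholders, not the names used in my daemon.

```python
def make_sync_locks(zk, phase, base_path="/locks/primary_transition"):
    """Return a (read_lock, write_lock) pair for one synchronization phase.

    `zk` is expected to be a started kazoo.client.KazooClient. Both locks
    are created on the same znode path, so they contend with each other.
    The path layout here is an assumption made for illustration.
    """
    path = f"{base_path}/{phase}"
    read_lock = zk.ReadLock(path)
    write_lock = zk.WriteLock(path)
    return read_lock, write_lock
```

With a started client (for example `zk = KazooClient(hosts="127.0.0.1:2181")` followed by `zk.start()`, where the host is a placeholder), `read_lock, write_lock = make_sync_locks(zk, "A")` would give the pair for synchronization point A.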

Now, before this begins, I need to make sure they both start at the same time, so I alternate the read and write locks. Therefore something like this (commands to create the locks left out for brevity):

def become_primary():
    write_lock.acquire()
    sleep(1)
    write_lock.release()

    read_lock.acquire()
    # I expect to block while the other node does stuff
    read_lock.release()

    write_lock.acquire()
    do_add_ip()
    write_lock.release()


def become_secondary():
    read_lock.acquire()
    # I expect this to block ONLY while the first write lock exists in become_primary(), then release
    read_lock.release()

    write_lock.acquire()
    do_some_stuff()
    write_lock.release()

    read_lock.acquire()
    do_remove_ip()
    read_lock.release()


The problem I'm having is that I acquire that first set of locks, but the read side (become_secondary) never releases. The write side (become_primary) does, but then it eventually gets stuck as well when acquiring the 3rd lock.

Am I missing something here? Is this even what shared locks are designed for? Is this just some weird quirk or bug in Kazoo?

Thanks,
Joshua

Joshua Boniface

Dec 19, 2019, 12:52:06 AM
to python-zk
Here's some actual log output from my program. I print a message before and after each acquire and release:

Node 1:

>>> 2019/12/19 00:23:15.518330 - Setting node hv1 to secondary state
>>> 2019/12/19 00:23:15.569297 - Acquiring read lock for synchronization A


Node 2:

>>> 2019/12/19 00:23:15.517874 - Setting node hv2 to primary state
>>> 2019/12/19 00:23:15.528030 - Acquiring write lock for synchronization A
>>> 2019/12/19 00:23:15.535850 - Acquired write lock for synchronization A
>>> 2019/12/19 00:23:16.536951 - Releasing write lock for synchronization A
>>> 2019/12/19 00:23:16.557612 - Released write lock for synchronization A
>>> 2019/12/19 00:23:16.658158 - Acquiring read lock for synchronization B
>>> 2019/12/19 00:23:16.667455 - Acquired read lock for synchronization B
>>> 2019/12/19 00:23:16.667566 - Releasing read lock for synchronization B
>>> 2019/12/19 00:23:16.675858 - Released read lock for synchronization B
>>> 2019/12/19 00:23:16.676137 - Acquiring write lock for synchronization C

The end of each of these logs is where it hangs.