consul lock doesn't seem to work for background services

Vikram Kone

Aug 19, 2015, 7:01:25 PM
to Consul
Hi,
I'm trying to set up a 3-node Consul cluster so that only a single instance of a service runs on the cluster at any given time.
This works correctly for services that run in the foreground, but not for services that run in the background.
I want to run an open service tool that ships a start.sh script; running the script starts the service and leaves it running in the background.
Is there a way to make this service a singleton using consul lock?

Michael Fischer

Aug 19, 2015, 7:23:42 PM
to consu...@googlegroups.com
That's not really possible as a practical matter. Consul lock needs
to know when the program has exited so that it can release the lock,
and to do so it has to wait on the process to exit. If it forks and
exits, there's not really a way for Consul to know when it's safe to
release the lock.

If you can configure the program not to daemonize itself, that'd be
best. Otherwise you can try to wrap it with fghack from daemontools,
but I can't personally vouch for that solution.
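
A minimal sketch of the problem described above, assuming a POSIX-like system with Python 3. DAEMONIZING_LAUNCHER is a hypothetical stand-in for a start.sh-style script, not the actual tool: it spawns a long-running child and exits immediately. Any wrapper that waits on the launcher, which is what this reply says consul lock does, sees it finish almost at once even though the real service is still running.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a daemonizing start script: it spawns a
# long-running child ("the service") and then exits right away.
DAEMONIZING_LAUNCHER = (
    "import subprocess, sys; "
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(5)'], "
    "stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, "
    "stderr=subprocess.DEVNULL)"
)


def seconds_until_launcher_exits(launcher_code):
    """Run the launcher the way a lock wrapper would and time the wait."""
    started = time.monotonic()
    # subprocess.run waits only for the launcher itself, not for the
    # background child that the launcher leaves behind.
    subprocess.run([sys.executable, "-c", launcher_code], check=True)
    return time.monotonic() - started
```

Calling seconds_until_launcher_exits(DAEMONIZING_LAUNCHER) returns well before the 5-second background process ends, which is the point at which a waiting wrapper would consider the job done.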

Vikram Kone

Aug 19, 2015, 7:35:43 PM
to Consul
OK. Assuming I don't have access to the tool's source code, or don't want to change it, what are my other options?
From what I'm seeing, when I run consul lock on a foreground service on all 3 machines, the service runs on the first machine and blocks on the rest. That means the lock is acquired by whichever instance asks first, not necessarily by the leader of the cluster (which is what I initially expected).
Is there a consul command I can call to check whether the current machine holds the lock, so that I can wrap my service start call in a simple bash script that does something like if (consul lock-holder) { ./start.sh }?
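
One possible shape for such a wrapper, sketched in Python under loud assumptions: the service writes a PID file (the thread never says it does), the system is POSIX, and the wrapper itself is what consul lock runs as its child command. Instead of asking who holds the lock, the wrapper stays in the foreground for as long as the background service is alive, so the lock is held for the same period. The names hold_while_running and pid_is_alive and any paths are hypothetical.

```python
import os
import subprocess
import time


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def hold_while_running(start_cmd, pid_file, poll_seconds=1.0):
    """Start a daemonizing service, then block until its PID disappears."""
    subprocess.run(start_cmd, check=True)
    # Assumption: the PID file already exists when the start command returns.
    with open(pid_file) as fh:
        pid = int(fh.read().strip())
    while pid_is_alive(pid):
        time.sleep(poll_seconds)
    return pid

# Hypothetical call, with made-up paths:
#   hold_while_running(["./start.sh"], "/var/run/myservice.pid")
```

Saved as a script, this would be the command handed to consul lock in place of start.sh. It does not stop the service if the lock is lost, so it is a starting point rather than a complete solution.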

ja...@fpcomplete.com

Sep 13, 2015, 12:15:26 PM
to Consul


You could use a wrapper script using the consul-lock python library. I have been wanting something similar to what you've described, with the service running as a docker container.

Here is a simplified proof of the concept to demonstrate: https://gist.github.com/ketzacoatl/1a25503445a684e31dbd

At the moment this seems to work very well (in my limited testing), but I can see one problem that I am not sure how to address. Maybe @armon or other gurus will have feedback: the lock is ephemeral and will eventually be released, even if my client does not explicitly release it (I set the timeout high in my example: 3600 seconds, or 1 hour). Conceptually, I guess the solution is to retrieve a new lock before the existing one is released. Does Consul (or python-consul / consul-lock) have a recommended method for addressing this?
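
A hedged sketch of one way to keep such a lock from expiring, assuming the timeout corresponds to a Consul session TTL (how consul-lock maps its timeout is not confirmed here). Consul's HTTP API exposes PUT /v1/session/renew/<id>; the loop below calls it at half the TTL. The function names, the default agent address, and the half-TTL interval are illustrative choices, not a recommended method from the Consul authors.

```python
import threading
import urllib.error
import urllib.request


def renew_session(session_id, agent_url="http://127.0.0.1:8500"):
    """Renew a session through the agent's PUT /v1/session/renew/<id>."""
    req = urllib.request.Request(
        f"{agent_url}/v1/session/renew/{session_id}", method="PUT"
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # e.g. the session no longer exists


def keep_session_alive(session_id, ttl_seconds, stop, renew=renew_session):
    """Renew at half the TTL until `stop` is set; return the renewal count."""
    renewals = 0
    # Event.wait returns False on timeout and True once `stop` is set.
    while not stop.wait(ttl_seconds / 2):
        if not renew(session_id):
            break  # treat the lock as lost and let the caller react
        renewals += 1
    return renewals
```

Run keep_session_alive in a background thread next to the service and set the threading.Event when the service stops, so the session can expire or be destroyed afterwards.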

ja...@fpcomplete.com

Sep 13, 2015, 12:32:57 PM
to Consul


https://groups.google.com/forum/#!searchin/consul-tool/lock/consul-tool/lI46VgBCgWI/Zbs34Thu8JkJ led me to http://www.consul.io/docs/internals/sessions.html, which points out a few relevant details, namely that locks are associated with and depend on sessions, service registration, and so on. It seems the problem noted here could be worked around by integrating service (de)registration and health checks into the PoC. Or maybe by careful use of sessions, if there were some desire to separate the registration and checks from the Python code?
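
To make the session and health check idea concrete, here is a small sketch that only builds the JSON body for PUT /v1/session/create. Name, Checks, and Behavior are fields of that endpoint; the helper name and the check ID used in the comment are hypothetical. With no TTL set, such a session is invalidated when one of its checks fails rather than after a fixed time.

```python
import json


def session_request_body(name, service_check_ids, behavior="release"):
    """Build a JSON body for PUT /v1/session/create tied to health checks."""
    body = {
        "Name": name,
        # serfHealth is the default node-level check. Supplying Checks
        # replaces the default list, so it is included explicitly.
        "Checks": ["serfHealth"] + list(service_check_ids),
        # "release" frees held locks when the session is invalidated;
        # "delete" removes the locked keys instead.
        "Behavior": behavior,
    }
    return json.dumps(body)

# Hypothetical usage, assuming a check with ID "service:myservice" exists:
#   session_request_body("myservice-lock", ["service:myservice"])
```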

Thanks for the feedback!