Per-server GUID

Fred Wulff

May 12, 2009, 9:18:09 PM
to beanstalk-talk
Hi all,

At the moment, there doesn't appear to be a way to easily and globally
identify a beanstalk job. I've got a use case where I'd like to log a
beanstalk job's id and use it to tie the producer and consumer
together after the fact, which would be easier with such an identifier.
The logical way to do this seems to be for the server to compute a
GUID at startup (I'm thinking {fixed size uname() prefix or hash}#
{padded timestamp}#{random number}, where # is concatenation). Then
{server GUID}#{job ID} is globally unique, and clients can cache their
server GUIDs for performance. The easiest thing to do would be to add
this to stats, but it might also make sense to make it a new command.
What are your thoughts on the proposal?
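
To make the proposed layout concrete, here is a rough Python sketch of the concatenation and of the client-side cache. It is only an illustration: the 16-character prefix, the 10-digit timestamp, the 32-bit random number, and the names make_server_guid, make_job_guid, and GuidCache are all assumptions, not anything beanstalkd provides.

```python
import secrets
import time

PREFIX_WIDTH = 16  # assumed fixed prefix size; the proposal does not pick one


def make_server_guid(node_name, start_time, nonce):
    """Concatenate a fixed-size name prefix, a padded timestamp, and a random number."""
    prefix = node_name[:PREFIX_WIDTH].ljust(PREFIX_WIDTH, "_")
    # 10 decimal digits for the Unix time, 8 hex digits for a 32-bit nonce.
    return f"{prefix}{start_time:010d}{nonce:08x}"


def make_job_guid(server_guid, job_id):
    """Append the per-server job id to the fixed-width server GUID."""
    return f"{server_guid}{job_id}"


class GuidCache:
    """Client-side cache so each server is asked for its GUID only once."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: (host, port) -> GUID string
        self._known = {}

    def get(self, host, port):
        key = (host, port)
        if key not in self._known:
            self._known[key] = self._fetch(host, port)
        return self._known[key]


if __name__ == "__main__":
    guid = make_server_guid("queue-host", int(time.time()), secrets.randbits(32))
    print(make_job_guid(guid, 42))
```

Because the server part has a fixed width in this sketch, plain concatenation with the job id stays unambiguous. A cached GUID is only valid for the lifetime of the server process, so a client would need to drop the cache entry whenever it reconnects.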

I'm happy to modify the code and submit it, but figured I'd see if the
design seemed reasonable to you all before I did.

-Fred

Keith Rarick

May 13, 2009, 5:30:57 AM
to beansta...@googlegroups.com
On Tue, May 12, 2009 at 6:18 PM, Fred Wulff <frew...@gmail.com> wrote:
> At the moment, there doesn't appear to be a way to easily and globally
> identify a beanstalk job.
> ...

I like this idea. I have a couple of thoughts about the details.

* I think it should appear as a field (or fields) in the stats response.

* I'd prefer if it were transparent. For example, if it is based on
the host name and a time stamp, those should be plainly visible. In
other words, I don't want it to be just a SHA-1 hash or something.

* Perhaps the actual construction can be left to the clients, who
already have almost enough information. They already know (or can get)
the IP address and port number of the service, the pid of the daemon,
and the job id. All they need in addition is the time the server was
started. Perhaps a random number is also necessary. How likely is a
server OS to reuse a pid within one second? These things could be
added as two separate fields in the stats output.
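
As a sketch of what client-side construction could look like, the Python below joins the fields listed in the last bullet into a transparent identifier and parses it back. The separator layout and the names JobRef, format_job_ref, and parse_job_ref are made up for illustration, and start_time stands for the server start time that would have to be added to the stats output.

```python
from typing import NamedTuple


class JobRef(NamedTuple):
    host: str        # address the client used to reach the server
    port: int
    pid: int         # pid of the daemon
    start_time: int  # server start time (the proposed new stats field)
    job_id: int


def format_job_ref(ref):
    """Render every component in plain sight, with no hashing."""
    return f"{ref.host}:{ref.port}/{ref.pid}/{ref.start_time}/{ref.job_id}"


def parse_job_ref(text):
    """Inverse of format_job_ref; assumes the host contains no '/'."""
    endpoint, pid, start_time, job_id = text.rsplit("/", 3)
    host, _, port = endpoint.rpartition(":")
    return JobRef(host, int(port), int(pid), int(start_time), int(job_id))


if __name__ == "__main__":
    ref = JobRef("10.0.0.5", 11300, 4321, 1242177489, 42)
    print(format_job_ref(ref))
```

Note that the host and port here are whatever the client connected to, so two clients reaching the same server through different addresses would produce different identifiers for the same job.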

kr

Erich

May 13, 2009, 11:52:53 AM
to beanstalk-talk
I think a unique instance id (SHA-1 or whatever) is a good idea also.
There are weird edge cases otherwise. For instance, if there is a
beanstalkd on 2 subnets for whatever reason, what is the convention?

I suspect, however, that I am just excited about the idea that this makes
clock-hash style resource management way trivial, and that I'm looking
for other excuses to back me up.

Either way, the globally unique id saves code on the client side as
well, by not needing to maintain references to connections in jobs.
This allows for easier fun like passing jobs off to internal sub
processes and whatnot.

Regards,
Erich

Fred Wulff

May 16, 2009, 9:49:39 PM
to beansta...@googlegroups.com
Hi Erich,

Could you elaborate on what you mean by clock-hash style resource
management? I don't think I'm familiar with the concept.

Thanks
-Fred

Erich

May 18, 2009, 9:48:55 AM
to beanstalk-talk
Hi Fred,

You may also know it as consistent hashing; it always sticks in my
head as clock hashing for some reason. Here is an article about it:
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
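
For readers who don't want to follow the link, here is a minimal Python sketch of a consistent-hash ring. It is one possible reading of how per-server GUIDs might be used as ring members; the HashRing class, the 64 virtual points per server, and the example GUID strings are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to servers so that removing a server only moves that server's keys."""

    def __init__(self, servers, replicas=64):
        if not servers:
            raise ValueError("at least one server is required")
        # Each server gets several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def lookup(self, key):
        """Return the server owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    # Hypothetical server GUIDs used as ring members.
    ring = HashRing(["alpha.1242177489.17", "beta.1242177501.93"])
    print(ring.lookup("resize-image:1234"))
```

Identifying ring members by a server GUID rather than by address would keep the mapping stable even when the same server is reachable under more than one address.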

Regards,
Erich

On May 16, 8:49 pm, Fred Wulff <frewst...@gmail.com> wrote:
> Hi Erich,
>
> Could you elaborate on what you mean by clock-hash style resource
> management? I don't think I'm familiar with the concept.
>
> Thanks
> -Fred
>

Fred Wulff

May 20, 2009, 7:38:05 PM
to beanstalk-talk
Hmm...looks like I sent this from the wrong address and it didn't go
out. Apologies if it did and this is a double post.

Okay, I added a first stab at this in my github fork:
http://github.com/frew/beanstalkd/commit/829879a73898f9334538ffc50360d9e9960f7727

I ended up using {first 255 chars of node name}.{start up unix
time}.{random value}

Thoughts:
* We could serve the node name, start_time, and random value
separately if we wanted, but I don't think we want to delegate any
more than that to the client (since node name can be different
depending on how the client is connecting, and the other two aren't
determinable client-side).
* I thought about including a hash, but since we're in C land here, I
think it would be more trouble than it's worth in terms of additional
code or dependencies. It should be easy to add on the client-side,
since most recent languages have it as part of their standard
libraries.
* I'm assuming time_t is an integral type. I think this is the case on
every reasonable system, but the C standard doesn't actually guarantee
it.
* It might be worth zero-padding the time so that GUIDs sort
lexicographically. *shrug*
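
To illustrate the padding and client-side hashing points, here is a small Python sketch of the {node name}.{start time}.{random value} layout. It is not the code from the commit: format_guid, guid_digest, the 10-digit pad width, and the decimal rendering of the random value are assumptions.

```python
import hashlib

TIME_WIDTH = 10  # assumed pad width; 10 digits covers Unix times up to the year 2286


def format_guid(node_name, start_time, random_value, pad=True):
    """Build {first 255 chars of node name}.{start time}.{random value}."""
    name = node_name[:255]
    stamp = f"{start_time:0{TIME_WIDTH}d}" if pad else str(start_time)
    return f"{name}.{stamp}.{random_value}"


def guid_digest(guid):
    """Hash the GUID on the client side, where a SHA-1 routine is readily available."""
    return hashlib.sha1(guid.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    older = format_guid("host", 999999999, 7)
    newer = format_guid("host", 1000000000, 7)
    print(sorted([newer, older]))  # padded GUIDs from one node sort by start time
    print(guid_digest(newer))
```

Without padding, "host.999999999.7" sorts after "host.1000000000.7" because the comparison is character by character; with padding, string order matches start-time order for GUIDs from the same node.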

-Fred

Fred Wulff

Jun 2, 2009, 9:06:50 AM
to beanstalk-talk
Ping.

Keith Rarick

Dec 4, 2009, 7:10:02 PM
to beansta...@googlegroups.com
On Wed, May 20, 2009 at 3:38 PM, Fred Wulff <frew...@gmail.com> wrote:
> Okay, I added a first stab at this in my github fork:
> http://github.com/frew/beanstalkd/commit/829879a73898f9334538ffc50360d9e9960f7727

I added some comments there. And I'm ashamed it's taken me so long to
reply. Sorry.

kr

Nathaniel Cook

Dec 4, 2012, 5:28:44 PM
to beansta...@googlegroups.com, k...@xph.us
I have added a pull request to hopefully get this done finally. :)