ARM-15 - Master to Produce Meaningful Status Messages (Gerard Hickey)

Dustin J. Mitchell

Aug 22, 2013, 8:36:55 PM
to puppe...@googlegroups.com, hic...@kinetic-compute.com
I thought I'd kick off discussion of this ARM,
https://github.com/puppetlabs/armatures/pull/60/files
(I started discussion in the pull req, but this is probably a better
place for it)

First, the summary is about the implementation - it should probably
start with the goal: "load balancers should be able to poll masters
for their status, to support removal of malfunctioning masters from
the pool" or the like. The summary's the first (and sometimes only)
place people will look, so you have to hook them in early. Then you
can go on to discuss implementation.

Let me know if I've mischaracterized the intent.

Second, as someone who had never heard of the status indirection, the
description is confusing as to what exists and what doesn't. It might
be nice to split that up into "current functionality" and "proposed
functionality".

Perhaps include an example of both -- for me:
  dmitchell@releng-puppet2 ~ $ curl -k https://puppet:8140/production/status/no_key
  Forbidden request: releng-puppet2.srv.releng.scl3.mozilla.com(10.26.48.50) access to /status/no_key [find] at :115

I think you're also suggesting additional keys beyond `is_alive`.
Again, an example might make that clearer.

Finally, on a technical basis, I'm not clear on how command-line
switches would translate into changes of state for a running master.
Is that through another operation on the REST endpoint?

I think this sounds like a great idea. Will you be able to implement
it, once the ARM is ironed out?

Dustin

Gerard Hickey

Aug 23, 2013, 5:04:35 PM
to puppe...@googlegroups.com, hic...@kinetic-compute.com, dus...@v.igoro.us
Thanks Dustin for your suggestions. 

I think you have the basic understanding of the intent. Today I have the load balancers polling the status REST call to determine whether the master is capable of servicing agents. The advantage of the REST call over checking only whether the TCP connection succeeds is that it lets the LB detect whether the service is actually responding.
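
To illustrate the difference between the two kinds of check, here is a minimal Python sketch. It is not taken from the ARM: the function names `tcp_check` and `http_check` are hypothetical, and treating only a 2xx response as healthy is an assumption. Under that assumption, a "Forbidden request" reply like the one in Dustin's example would count as a failed check.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection can be opened (says nothing about the app)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url, timeout=2.0, context=None):
    """Return True only if the status URL answers with a 2xx response in time.

    `context` is an optional ssl.SSLContext for https URLs.
    """
    # An empty ProxyHandler makes the check talk to the master directly.
    handlers = [urllib.request.ProxyHandler({})]
    if context is not None:
        handlers.append(urllib.request.HTTPSHandler(context=context))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and non-2xx replies such as 403.
        return False
```

A process that accepts connections but never answers passes `tcp_check` and fails `http_check`, which is the case the REST-based check is meant to catch.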

I think getting some examples and more suggested responses into the ARM is a good idea. Thank you for that. I did push on this a bit, as Eric thought it might be good to have it for the Armature meeting at PuppetConf, so I probably released it a bit sooner than I should have. I will go back, revise, and get more information into ARM-15.

I have already started some preliminary code changes for this ARM, but the actual implementation is going to take some more thought. On Wednesday, Eric hit me with the question of how to make this work under Rack. Previously I had only been thinking about a single thread, or a multi-threaded environment with access to a shared data segment. Rack adds a new twist to the problem, and I need to think about the best way to handle it.

Thanks. 
--
Gerard

Dustin J. Mitchell

Aug 23, 2013, 7:06:19 PM
to Gerard Hickey, puppe...@googlegroups.com, Gerard Hickey
You could probably use a simple semaphore file to indicate the status
- perhaps just a YAML file, the contents of which are returned by the
indirector?

Gerard Hickey

Aug 25, 2013, 1:54:43 AM
to puppe...@googlegroups.com, Gerard Hickey, Gerard Hickey, dus...@v.igoro.us
Yes, this was sort of the approach I was thinking of too. The file would only be read when the status REST call was hit, so the impact would be small. Even with a load balancer hitting the REST call every few seconds, the impact would be pretty minor.
--
Gerard.
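
The semaphore-file idea discussed above can be sketched roughly as follows. This is an illustration only, not code from Puppet or the ARM. The thread suggests YAML; JSON is used here purely to stay within the Python standard library. The function name `read_status`, the `reason` key, and the choice to report a malformed file as not alive are all assumptions. Only `is_alive` comes from the discussion.

```python
import json
from pathlib import Path

# Status reported when no operator has written a semaphore file (assumed default).
DEFAULT_STATUS = {"is_alive": True}


def read_status(path):
    """Return the status mapping stored in the semaphore file at `path`.

    The file is read on every call, so an operator's change takes effect
    on the next status request without restarting anything.
    """
    try:
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        # No file means nobody has overridden the default.
        return dict(DEFAULT_STATUS)
    except OSError:
        return {"is_alive": False, "reason": "status file unreadable"}
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"is_alive": False, "reason": "status file malformed"}
    if not isinstance(data, dict):
        return {"is_alive": False, "reason": "status file malformed"}
    return data
```

Because the file is only opened when the status call is made, the cost is one small read per poll, in line with Gerard's point about the impact being minor.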

Andy Parker

Aug 26, 2013, 1:34:02 PM
to puppe...@googlegroups.com
I just read over the PR that you linked to. It looks like an interesting idea, but I have to wonder about it a little bit. This proposal isn't to add automated metrics to the puppet master to keep track of things like compile stats, file request stats, etc. It is more of a manual mechanism for the sysadmin to mark a master on- or off-line. If I'm understanding that correctly, then I'm kinda wondering about the utility, given that you can (if you are running the master under Passenger, which you probably are if you are using load balancers) just set up an Apache Location (or similar) that serves the status file. At that point the switch is no longer the master's responsibility, which for this kind of switch seems like the right thing.


On Thu, Aug 22, 2013 at 5:36 PM, Dustin J. Mitchell <dus...@v.igoro.us> wrote:
I thought I'd kick off discussion of this ARM,
  https://github.com/puppetlabs/armatures/pull/60/files
(I started discussion in the pull req, but this is probably a better
place for it)

First, the summary is about the implementation - it should probably
start with the goal: "load balancers should be able to poll masters
for their status, to support removal of malfunctioning masters from
the pool" or the like. The summary's the first (and sometimes only)
place people will look, so you have to hook them in early. Then you
can go on to discuss implementation.


I think this is my concern. I didn't see anything in the ARM about the system tracking its own status. It looked like it is entirely up to the operator to mark the status.

I think that getting a monitoring system into puppet is absolutely necessary, and the best way of doing that is for it to track what is actually happening. Part of doing this, I think, is going to require some clear definitions of what it means for the puppet master to be "working". Maybe some examples of situations where the master stopped working by some definition would be a good place to start; then we can figure out how the master might have been able to signal that it has reached that "non-working" state.
 

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To post to this group, send email to puppe...@googlegroups.com.
Visit this group at http://groups.google.com/group/puppet-dev.
For more options, visit https://groups.google.com/groups/opt_out.



--
Andrew Parker
Freenode: zaphod42
Twitter: @aparker42
Software Developer

Join us at PuppetConf 2013, August 22-23 in San Francisco - http://bit.ly/pupconf13
Register now and take advantage of the Final Countdown discount - save 15%!

Dustin J. Mitchell

Aug 26, 2013, 1:48:02 PM
to puppe...@googlegroups.com
The current situation is that master monitoring basically consists of
a no-op HTTP request. We'd like to get to a point where it includes
live status of the master. It makes sense for that live status to
include among other things some manually-flippable "in_service?" kind
of switch.

My understanding of the ARM is that it gets us a small step closer to
the latter, with a small code change. I think it makes a lot of sense
in that light, although it is perhaps not very ambitious for an ARM.

Dustin

Andy Parker

Aug 26, 2013, 2:02:12 PM
to puppe...@googlegroups.com
On Mon, Aug 26, 2013 at 10:48 AM, Dustin J. Mitchell <dus...@v.igoro.us> wrote:
The current situation is that master monitoring basically consists of
a no-op HTTP request.  We'd like to get to a point where it includes
live status of the master.  It makes sense for that live status to
include among other things some manually-flippable "in_service?" kind
of switch.


Yeah, the current /<env>/status endpoint isn't very useful. 
 
My understanding of the ARM is that it gets us a small step closer to
the latter, with a small code change.  I think it makes a lot of sense
in that light, although it is perhaps not very ambitious for an ARM.


I'd like to make it more useful, but I think we need a clearer definition of what the "status of the master" means. If it is just static content, then I see no point in it.

So what might the master track in order to have a meaningful status page available? Since the status URL currently is /<env>/status/<unused key>, it could at a minimum report whether the environment exists, by doing a stat of the manifest file for that environment. The key might be usable for something, but I'm drawing a blank on that right now.
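
As a rough illustration of that minimum check, the following Python sketch stats a manifest file and reports whether it exists. It is not Puppet code: the function name `environment_status`, the returned keys, and the idea of passing the manifest path in directly are assumptions, since how the master maps an environment name to its manifest is not covered in this thread.

```python
import os
import stat


def environment_status(environment, manifest_path):
    """Report whether the manifest file for `environment` exists.

    `manifest_path` is supplied by the caller; resolving it from the
    environment name is deliberately left out of this sketch.
    """
    try:
        info = os.stat(manifest_path)
    except OSError:
        # Missing file, missing parent directory, or permission problem.
        return {"environment": environment, "exists": False}
    if not stat.S_ISREG(info.st_mode):
        # A directory or device at that path is not a usable manifest.
        return {"environment": environment, "exists": False}
    return {
        "environment": environment,
        "exists": True,
        "manifest_mtime": info.st_mtime,
    }
```

A status request for an environment that does not exist would then come back with `exists` set to false instead of a generic success.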

Because puppet normally works in a pre-fork model, any given worker doesn't automatically have access to a global view of the whole master. It would need some way of communicating and collating information between the different workers.
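
One possible way to collate information across pre-forked workers, sketched in Python under stated assumptions: each worker writes a small counter file of its own into a shared directory, and whichever worker handles the status request sums them. Nothing here comes from Puppet or Rack; the names `record_request` and `collate`, the file naming scheme, and the choice of a request counter as the metric are all hypothetical.

```python
import json
import os
from pathlib import Path


def record_request(stats_dir, worker_id=None):
    """Increment the calling worker's own counter file.

    Each worker writes only to its own file, so no cross-worker locking
    is needed.
    """
    worker_id = os.getpid() if worker_id is None else worker_id
    path = Path(stats_dir) / f"worker-{worker_id}.json"
    try:
        count = json.loads(path.read_text())["requests"]
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        count = 0
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"requests": count + 1}))
    # Rename into place so a reader never sees a half-written file.
    os.replace(tmp, path)


def collate(stats_dir):
    """Sum the per-worker counters into one view of the whole master."""
    total = 0
    workers = 0
    for path in Path(stats_dir).glob("worker-*.json"):
        try:
            total += json.loads(path.read_text())["requests"]
        except (OSError, ValueError, KeyError, TypeError):
            continue  # skip files that vanished or are unreadable
        workers += 1
    return {"workers": workers, "requests": total}
```

Files left behind by workers that have exited would still be counted, so a real implementation would need some form of cleanup or expiry.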

Since puppet is a Rack app, some of the Rack monitoring tools might also fill this niche already. It may be an exercise in either documenting how to use them or giving puppet a few callbacks to send data to them.

 

Dustin J. Mitchell

Sep 1, 2013, 12:40:25 PM
to puppe...@googlegroups.com
It's not quite *static* content. I can see a lot of ways that this
might be useful to users. For example, I host yum/apt/DMG repos (and
maybe nuget too!) on my puppet masters, and as I scale puppet masters
I will want to use shared storage for those - probably NFS. So being
able to add the state of that NFS mount to my server status would be
useful -- if the mount is unavailable, the load balancer should skip
that master and I should get an alert via the monitoring system. But,
I could do that with a crontask that just kills the puppetmaster if
the mount is inaccessible. The effect would be the same, and probably
just as quick.
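
A minimal Python sketch of the mount check described above, whether it feeds a status page or a cron job. The function name `mount_available` is hypothetical, and "available" is assumed to mean that the path is a mount point whose contents can be listed. A hung NFS mount can block these calls indefinitely, so a real check would need a timeout around them.

```python
import os


def mount_available(path):
    """Return True if `path` is a mount point whose contents can be listed."""
    try:
        if not os.path.ismount(path):
            # Unmounted: the path is missing or is just an empty local directory.
            return False
        os.listdir(path)
    except OSError:
        return False
    return True
```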

Another use is to take a master out of service gracefully, manually.
That's the original motivation for this ARM. However, it occurs to me
that this is usually best done via the load balancer itself - it
should have some UI for removing a node or nodes from the pool.

The more we discuss this, as written, the less useful it sounds.
Morphing it into a "make puppet monitorable as a rack app" seems like
a *big* change.

Dustin

Andy Parker

Sep 3, 2013, 2:29:32 PM
to puppe...@googlegroups.com
On Sun, Sep 1, 2013 at 9:40 AM, Dustin J. Mitchell <dus...@v.igoro.us> wrote:
It's not quite *static* content.   I can see a lot of ways that this
might be useful to users.  For example, I host yum/apt/DMG repos (and
maybe nuget too!) on my puppet masters, and as I scale puppet masters
I will want to use shared storage for those - probably NFS.  So being
able to add the state of that NFS mount to my server status would be
useful -- if the mount is unavailable, the load balancer should skip
that master and I should get an alert via the monitoring system.  But,
I could do that with a crontask that just kills the puppetmaster if
the mount is inaccessible.  The effect would be the same, and probably
just as quick.


In that kind of a setup I would expect that you would get the monitoring hooked up to the load balancer to remove "bad" masters. I'm not sure that monitoring aspects of the underlying system should be a part of the master's job. I would expect that a status page on the master would deal with how the master itself is acting.
 
Another use is to take a master out of service gracefully, manually.
That's the original motivation for this ARM.  However, it occurs to me
that this is usually best done via the load balancer itself - it
should have some UI for removing a node or nodes from the pool.

The more we discuss this, as written, the less useful it sounds.
Morphing it into a "make puppet monitorable as a rack app" seems like
a *big* change.


It might be a big change, or it might not. I haven't looked into it too deeply.

However, it is sounding like the conclusion on this ARM is "not compelling as described".
 



--
Andrew Parker
Freenode: zaphod42
Twitter: @aparker42
Software Developer

Join us at PuppetConf 2014, September 23-24 in San Francisco