Re: [Puppet-dev] Export and collect data, not just resources.


Jeff McCune

Jun 19, 2012, 11:32:00 AM
to puppe...@googlegroups.com
Exported data as a concept and mental model is more important than exported resources IMHO.  You've nailed it.

Add me to the list of people who have passed hostnames through exported resources to a reverse proxy when all I really wanted was a variable containing an up-to-date list of worker hostnames.

The use cases for exported data outnumber the use cases for exported resources.

If you had an API to set values you could then read using Hiera, would that satisfy your needs?

For example, you could append a hostname to a specific array, and there would be a puppet subcommand action to do this for you:

puppet data append load_balancer_one_workers newworker.example.com

Something like this, but you'd also be able to set the hierarchy point as an option.
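Under the hood, such an action might just maintain an array inside a YAML data file at a given hierarchy level. A minimal Ruby sketch, assuming a plain YAML backend; the function name, file layout, and semantics here are all hypothetical, not a real Puppet API:

```ruby
# Hypothetical backing logic for a "puppet data append" action: load the
# YAML data file for a hierarchy level, append a value to the named array
# (creating it if absent), and write the file back. Illustrative only.
require 'yaml'

def data_append(datafile, key, value)
  data = File.exist?(datafile) ? YAML.load_file(datafile) : {}
  data[key] ||= []
  data[key] << value unless data[key].include?(value)
  File.write(datafile, YAML.dump(data))
  data[key]
end

# e.g. data_append('common.yaml', 'load_balancer_one_workers', 'newworker.example.com')
```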

--
Jeff McCune

On Tuesday, June 19, 2012 at 3:21 AM, Ryan Bowlby wrote:

I'm bumping up against the apparent limits of Puppet. There appears to be a need for the ability to export data, not just resources, then collect that data, filtering on specific values and/or node facts. Nature will find a way....

Examples of exported resources being used to overcome lack of exported "data":

- Gary Larizza contributed an haproxy module to the Forge that exports a define containing a concat fragment. These are then used to assemble the haproxy config with the appropriate balance members.  https://github.com/glarizza/puppet-haproxy

- Dan Bode created puppet-nodesearch to fill this need. I commend the effort but hope the final result will resemble the existing export-and-collect syntax, and generally be more puppety.  https://github.com/puppetlabs/puppetlabs-nodesearch

- Countless puppet-user threads about abusing the exported file resource: collecting the files in some tempdir, then using an exec to loop through the files' values. :(



Perhaps being able to do something like:

Export a variable to a central datastore:

@@$ganglia_cluster = "www1"

Again, with a metaparam:

@@$ganglia_cluster = "www1" { tag => "foo" }


Realize said variables, on another node, as a hash whose keys are $::fqdn:

$ganglia_clusters = realize("$ganglia_cluster")

Creating:

{ "some.fq.dn" => "www1", "another.fq.dn" => "mail1" }

or with filtering:

$ganglia_clusters = $ganglia_cluster <<|  tag == "foo" |>>

or with fact filtering:

$ganglia_clusters = $ganglia_cluster <<|  tag == "foo" or $::architecture == "i386" |>>


Obviously these examples are crap, but I think the idea has some value. Allowing the use of these collected data types within templates would also make sense. Doing so in a way that fits the DSL and isn't overly complicated would add tremendous power.
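A rough in-memory model of the proposed semantics, in plain Ruby (purely illustrative; none of this exists in Puppet): each node exports a (name, value, tags) triple keyed by its fqdn, and realizing collects every matching export into a fqdn-keyed hash.

```ruby
# Toy store for exported variables. export_var records a value under a
# variable name; realize_var collects all exports of that name into a
# hash keyed by exporting node's fqdn, optionally filtered by tag.
EXPORTS = Hash.new { |h, k| h[k] = [] }

def export_var(fqdn, name, value, tags = [])
  EXPORTS[name] << { fqdn: fqdn, value: value, tags: tags }
end

def realize_var(name, tag: nil)
  EXPORTS[name]
    .select { |e| tag.nil? || e[:tags].include?(tag) }
    .to_h { |e| [e[:fqdn], e[:value]] }
end
```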


Thanks,
Ryan Bowlby

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-dev/-/o2pJGxJSNasJ.
To post to this group, send email to puppe...@googlegroups.com.
To unsubscribe from this group, send email to puppet-dev+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en.

Ryan Bowlby

Jun 20, 2012, 3:50:02 PM
to puppe...@googlegroups.com
Hi Jeff,

That sounds great! If I could set and get in puppet manifests as well as from a face, that would be killer. Would this be in stdlib as a function, or more deeply embedded in the DSL? I'm assuming I could further filter a list of hosts based on facter facts using Hiera (I honestly haven't used it, just read about it)?


- Ryan

Zach Leslie

Jun 20, 2012, 4:16:56 PM
to puppe...@googlegroups.com
As was pointed out to me today, there is also https://github.com/dalen/puppet-puppetdbquery if you are running PuppetDB.

I too have been looking for a way to just export data, since what you really care to export is the data that's used to create a given resource, not the resource itself.  With puppetdbquery above, it looks like I could turn any value I care to collect on other nodes into a fact, and then query it.

Much of this could be doled out to functions that run on the master, update some file on disk, then read it back out to nodes on request, but then the filtering becomes the difficult part, not to mention that you may have "secret" data in a fact.  I wish puppet supported some syntax for doing this sort of thing.
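That master-side workaround, sketched in Ruby: one function appends a node's record to a file on disk, another reads the records back with a filter. The file format (JSON Lines) and function names are invented for illustration:

```ruby
# publish() appends one node's record as a JSON line; collect() reads
# all records back, keeping only those whose fields match the filter.
require 'json'

STORE = 'exported_data.jsonl'

def publish(record, store = STORE)
  File.open(store, 'a') { |f| f.puts(record.to_json) }
end

def collect(store = STORE, **filter)
  return [] unless File.exist?(store)
  File.readlines(store).map { |l| JSON.parse(l) }
      .select { |r| filter.all? { |k, v| r[k.to_s] == v } }
end
```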





--
Zach Leslie :: Puppet Labs :: 503.208.9791

Luke Kanies

Jun 20, 2012, 4:42:09 PM
to puppe...@googlegroups.com
On Jun 20, 2012, at 12:50 PM, Ryan Bowlby <rbow...@gmail.com> wrote:

> Hi Jeff,
>
> That sounds great! If I could set and get in puppet manifests as well as from a face that would be killer. Would this be in stdlib as a function or more deeply embedded into the DSL? I'm assuming I can further filter a list of hosts based on facter facts using Hiera (I honestly haven't used it, just read about it)?

We don't have a clear answer on how we're going to do this - we
recognize the need, but it's unfortunately very difficult to provide
this capability without at the same time making it impossible to
manage. That get/set capability basically results in site-wide global
variables, which are obviously very powerful, but they're also
basically impossible to maintain, especially in the long term.

Can you explain why you need the ability to do this? I think
understanding the use cases behind it might lead us to a solution that
gives you what you need, and hopefully even more than that, without
resulting in what amounts to global variables in a central database.

--
http://puppetlabs.com/ | +1-615-594-8199 | @puppetmasterd

Ryan Bowlby

Jun 20, 2012, 9:24:20 PM
to puppe...@googlegroups.com
I imagine the common use case - and the one I had - is generation of a configuration file where certain values must be dynamically derived from data that is beyond the "node scope". These values are inherently known to a subset of nodes, which could provide them if given a means to do so. This is within the space of "configuration" management, not something like zookeeper.

To be fair, my use case involves software using an out-of-date design. I was thinking through the design of a ganglia module. I wanted to use unicast instead of multicast within the cluster for ec2 compatibility. To do so, the gmond.conf must specify a node within the cluster as the "master" to which to publish metrics. It's a design that scales well due to the tiered approach to data collection, but it was obviously created at a time when a cluster lived for years, not hours. It also requires that one node within a cluster be more important than the others.

The gmond.conf must list the hosts that are masters within the cluster. The central ganglia gmetad.conf must list these cluster masters. I could statically specify a list of masters to each node's gmond.conf as a parameter:


node "host1" {
  class { "ganglia::gmond":
    cluster          => "www",
    udp_send_channel => [ "host1.www.domain", "host2.www.domain" ],  # cluster master gmond/s
  }
}

node "host2" {
  class { "ganglia::gmond":
    cluster          => "mail",
    udp_send_channel => [ "host1.mail.domain", "host2.mail.domain" ],
  }
}

I'd also specify a list of data_sources (nodes to be polled) within each cluster as a parameter to the central server:

# central ganglia server
node "ganglia.domain" {
  class { "ganglia::gmetad":
    data_sources => { "www"  => [ "host1", "host2" ],
                      "mail" => [ "host1", "host2" ], },
  }
}

This works but is suboptimal, since the parameter specified in the central server's node definition is already specified within each node. Also, I'd rather have every node within the cluster be a "master" so we don't have to consider any one node sacred. Further, if I add an additional cluster I'll need to remember to update the ganglia server's node definition.

I'd rather:

- have each gmond node collect a list of nodes within the same cluster.
- have the central gmetad collect a list of clusters and their masters.

I could do this by exporting a file resource on each node whose name and content are /tmp/ganglia-${cluster}-${fqdn}, tagging it with the cluster name and "ganglia", collecting on other nodes with a tag filter for the cluster, and collecting on the central server with the "ganglia" tag, then parsing with a shell exec to sed the appropriate config value into place.

Really I just need to append a hostname to an array within a site-global hash whose keys are the cluster names:

ganglia[${cluster}] += [${fqdn}]
{"mail"=>["host1", "host2"], "www"=>["host1", "host2", "host3"]}
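That accumulation, simulated in plain Ruby (the helper name is invented; nothing like it exists in Puppet today): each node run contributes its own (cluster, fqdn) pair to the site-global hash.

```ruby
# Append a node's fqdn to the array for its cluster, creating the array
# on first use and keeping entries unique across repeated runs.
def register_gmond(registry, cluster, fqdn)
  registry[cluster] ||= []
  registry[cluster] |= [fqdn]   # array union: append if not present
  registry
end
```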

In my case normal data types would work just fine. Perhaps there's a need to further filter a list of hosts based on facts, but I'm only speculating.


Thanks,
Ryan Bowlby

Andrew Forgue

Jun 22, 2012, 3:48:15 PM
to puppe...@googlegroups.com
On Wednesday, June 20, 2012 9:24:20 PM UTC-4, Ryan Bowlby wrote:
I imagine the common use case - and the one I had - is generation of a configuration file where certain values must be dynamically derived based on data that is beyond the "node-scope". These values are inherently known to a subset of nodes who could provide them if given a means to do so. This is within the space of "configuration" management and not something like zookeeper.

We solved this problem using Volcane's registration system.  It's quite complicated and consists of mcollective, his registration agent, and MongoDB (ugh).  However it works quite well.  He describes it in his blog from about two years ago: http://www.devco.net/archives/2010/09/18/puppet_search_engine_with_mcollective.php.  

The basic premise is this:
1.) Puppet runs on a node
2.) Puppet saves the state (facts, classes applied, etc) for that run
3.) MCollective uses 'registration' to broadcast a node's state (fact key/values, class list, agents loaded) every 5 minutes
4.) There's a listening agent on the puppet masters that pulls the registration data and feeds it into a mongo database.
5.) Server side functions on the puppet master query the mongo database for whatever you want to search against.
6.) Catalog is 'generated' from dynamic data.

An example interface for gmond (where we've done exactly what you asked for):

$nodes = search_nodes({ 'class' => 'gmond_master', 'facts.location' => $location })

$nodes will be an array of all hosts that have a $location fact matching the current node's and have the class gmond_master.  A 'gmond_master' can be just an empty class; the mere inclusion of this class makes a node searchable by other nodes.

I haven't done anything with PuppetDB and I don't know if what we do is superseded by it, but it might be.  Querying for current node facts/classes is by far more flexible and useful than stored configurations IMHO.  We took the extra step of adding geospatial indexing as well, so search_nodes will return the list of nodes in order of distance.

This is overkill, I think, for what you're trying to do, and has its own problems, but it's a solution to the general problem: let us search for other nodes by classes/facts/(distance!) or basically any report field (such as last puppet run time, to remove stale nodes, etc.), and allow us to use facts from those nodes to make decisions at catalog compilation time.
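A toy in-memory stand-in for the registration database and the search_nodes() function described above (the real system queries MongoDB; the field names and record shape here are invented for illustration):

```ruby
# Each registered document holds a node's classes and facts, keyed by
# hostname. A query is a hash: 'class' matches class membership, and a
# dotted path like 'facts.location' matches a nested field exactly.
REGISTRY = {}

def register(host, classes:, facts:)
  REGISTRY[host] = { 'classes' => classes, 'facts' => facts }
end

def search_nodes(query)
  REGISTRY.select { |_, doc|
    query.all? { |field, want|
      if field == 'class'
        doc['classes'].include?(want)
      else
        top, sub = field.split('.', 2)
        doc.fetch(top, {})[sub] == want
      end
    }
  }.keys
end
```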

-Andrew

Luke Kanies

Jun 28, 2012, 11:40:35 AM
to puppe...@googlegroups.com
Do you have cases where you need to search for things other than host names?

Both your use case and Andrew's seem to be about finding hosts in sets, which is quite different from arbitrary data.

Luke Kanies

Jun 28, 2012, 11:42:13 AM
to puppe...@googlegroups.com
What are the problems with this approach?

Are there functions you use beyond 'search_nodes'?

If you look at all of your search_nodes functions, are they similar calls, or do you have a wide variety of types of calls?

R.I.Pienaar

Jun 28, 2012, 11:49:20 AM
to puppe...@googlegroups.com


----- Original Message -----
> From: "Luke Kanies" <lu...@puppetlabs.com>
> To: puppe...@googlegroups.com
> Sent: Thursday, June 28, 2012 4:42:13 PM
> Subject: Re: [Puppet-dev] Export and collect data, not just resources.
>
>
>
>> On Jun 22, 2012, at 12:48 PM, Andrew Forgue wrote:
>>
>>
>> On Wednesday, June 20, 2012 9:24:20 PM UTC-4, Ryan Bowlby wrote:
>>
>> We solved this problem using Volcane's registration system. It's
>
> What are the problems with this approach?
>
>
> Are there functions you use beyond 'search_nodes'?
>
>
> If you look at all of your search_nodes functions, are they similar
> calls, or do you have a wide variety of types of calls?

the ones I wrote support simple class and fact matchers and return
just a list of nodes.

There is another function that retrieves the facts for a node back as a hash.

It's functional of course; I think it's completely the wrong solution, but it
scratches the itch till we can do better.

Find all the machines that want to connect to me using ipsec:

$clients = search_nodes({'classes' => 'ipsec::endpoint::monitor1'})

Create ipsec tunnels back to them:

ipsec::endpoints_from_nodes{$clients: }

And this is the define; it will take each node name, load its facts,
then create a tunnel back:


define ipsec::endpoints_from_nodes {
  $node = load_node($name)

  ipsec::endpoint { $node["fqdn"]:
    dest => $node["facts"]["ipaddress"],
  }
}


R.I.Pienaar

Jun 28, 2012, 11:59:37 AM
to puppe...@googlegroups.com


----- Original Message -----
> From: "R.I.Pienaar" <r...@devco.net>
> To: puppe...@googlegroups.com
> Sent: Thursday, June 28, 2012 4:49:20 PM
> Subject: Re: [Puppet-dev] Export and collect data, not just resources.
>
> the ones I wrote supports simple class and fact matchers and returns
> just a list of nodes.

Actually, just to clarify, this is not quite correct - the search hash gets
passed directly to MongoDB, and its query language is very complete.

Additionally, the data in MongoDB that gets fed from MCollective comes
from a user-provided plugin - so you can literally chuck anything
in there keyed to the fqdn and search against it.

The load_node() function will fetch everything there.

By default this is facts, classes, the collectives a node belongs to, and the
list of agents installed on that node, but this is not a limitation; you can
put anything in there.

Ryan Bowlby

Jun 28, 2012, 10:53:06 PM
to puppe...@googlegroups.com


Do you have cases where you need to search for things other than host names?

Both your use case and Andrew's seem to be about finding hosts in sets, which is quite different from arbitrary data.


All the use cases I can think of could be keyed off hostname. This ganglia use case requires that the gmetad "discover" a list of cluster names too. While a host-centric approach would work, it appears rather expensive to iterate through hundreds or thousands of hosts to determine whether a key exists and retrieve its value. If I could key off gmond_clusters - or any arbitrary key-value combination - it would likely be faster. I can appreciate wanting to keep it host-centric.

It would be nice to have this decoupled from mcollective, where both puppet and mcollective have access but neither is a prerequisite of the other. Requiring mongodb (storedata = true?) makes sense; right tool for the job and all that. Forcing this into mysql or puppetdb works too; I'm not picky.

Andrew Forgue

Jun 28, 2012, 11:36:37 PM
to puppe...@googlegroups.com
On Thursday, June 28, 2012 11:40:35 AM UTC-4, Luke Kanies wrote:
Do you have cases where you need to search for things other than host names?

We also search for nodes with the bind::slave class and then get the ipaddress fact of each node and use that for resolv.conf.  I can't think of any other data that's not a hostname or IP that we retrieve.

-Andrew

Daniel Pittman

Jun 29, 2012, 1:31:29 PM
to puppe...@googlegroups.com
As near as I can tell, at least 90 percent of the needs real people
have are satisfied with collecting those two - search on some
criteria, then collect either hostnames or IP addresses that match.

If you extend that to "any fact" rather than just IP address you
probably hit 99 percent.
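Daniel's observation, reduced to code: almost every use is "filter nodes on some predicate, then pluck one fact from each match". A hypothetical generic helper over node fact-hashes:

```ruby
# Select the nodes satisfying the predicate block, then return the named
# fact from each match. Covers the hostname, IP, and "any fact" cases.
def pluck_fact(nodes, fact, &predicate)
  nodes.select(&predicate).map { |n| n[fact] }
end
```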

--
Daniel Pittman
⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons

Luke Kanies

Jun 29, 2012, 5:49:28 PM
to puppe...@googlegroups.com
On Jun 29, 2012, at 10:31 AM, Daniel Pittman wrote:

> On Thu, Jun 28, 2012 at 8:36 PM, Andrew Forgue <andrew...@gmail.com> wrote:
>> On Thursday, June 28, 2012 11:40:35 AM UTC-4, Luke Kanies wrote:
>>>
>>> Do you have cases where you need to search for things other than host
>>> names?
>>
>> We also search for nodes with bind::slave class and then get the ipaddress
>> fact of the node and use that for resolv.conf. I can't think of any other
>> data that's not hostname or ip we retrieve.
>
> As near as I can tell, at least 90 percent of the needs real people
> have are satisfied with collecting those two - search on some
> criteria, then collect either hostnames or IP addresses that match.
>
> If you extend that to "any fact" rather than just IP address you
> probably hit 99 percent.

That's great to know, and is a much smaller use case than what is otherwise described as a general search interface.

Thanks.

Stephen Gran

Jun 29, 2012, 3:54:40 AM
to puppe...@googlegroups.com
On Thu, 2012-06-28 at 08:40 -0700, Luke Kanies wrote:


> Do you have cases where you need to search for things other than host
> names?
>
>
> Both your use case and Andrew's seem to be about finding hosts in
> sets, which is quite different from arbitrary data.

Hi,

We have a wrapper around User resources that also sets up an
authorized_keys file for them and does a little bit of home directory
management and so on. We also create authorized_keys for role accounts
that people can log in as to perform various tasks. It would be great
if we could retrieve a list of Users with $tag (that are in this or that
group, e.g.) and retrieve their keys to automatically build the role
account's authorized_keys. This is probably doable now, but not without
an awful lot of scaffolding and things that aren't all that nice.
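The role-account case above, as a toy transformation (the record shape is invented for illustration): filter user records by tag and join their public keys into an authorized_keys body.

```ruby
# Keep only users carrying the given tag, then assemble their SSH public
# keys into the contents of the role account's authorized_keys file.
def role_authorized_keys(users, tag)
  users.select { |u| u[:tags].include?(tag) }
       .map { |u| u[:ssh_key] }
       .join("\n")
end
```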

Cheers,
--
Stephen Gran
Senior Systems Integrator - guardian.co.uk


Luke Kanies

Jul 4, 2012, 4:55:12 PM
to puppe...@googlegroups.com
On Jun 29, 2012, at 12:54 AM, Stephen Gran wrote:

> On Thu, 2012-06-28 at 08:40 -0700, Luke Kanies wrote:
>
>
>> Do you have cases where you need to search for things other than host
>> names?
>>
>>
>> Both your use case and Andrew's seem to be about finding hosts in
>> sets, which is quite different from arbitrary data.
>
> Hi,
>
> We have a wrapper around User resources that also sets up an
> authorized_key file for them and does a little bit of home directory
> management and so on. We also create authorized_keys for role accounts
> that people can log in as to perform various tasks. It would be great
> if we could retrieve a list of Users with $tag (are in this or that
> group, eg) and retrieve their keys to automatically build the role
> account authorized_keys. This is probably doable now, but not without
> an awful lot of scaffolding and things that aren't all that nice.

I *think* you can retrieve users by tags right now:

User <<| tag == foo |>>

but yeah, I don't think you could easily extract the names from those users.

Thanks.