Data-driven.

Igor Galić

Jul 9, 2013, 4:12:20 AM
to puppet...@googlegroups.com

Hi folks,

last week I asked a question on how to sensibly structure
a puppet setup:

https://ask.puppetlabs.com/question/1932/how-to-tier-a-puppet-setup/

I received a quite sensible request, even though it's not
what I wanted to hear. The reason it's not might be that
I haven't quite been able to get my point across:

Let's go with the example of hosting and develop that:

We have (among other things) a hosting service. Each customer
gets their own instance of httpd running on a separate port
and as a separate user. They can have any number of vhosts, but
most only have one. They can have any kind of setup, but most
choose PHP + MySQL, the rest have static pages. Each of these
vhosts has a separate scp-user. (The db/user and scp-user for
each vhost have the same name). The web is terminated by a
caching proxy (Apache Traffic Server).

If I were to express this as data of its own, not data that
will fill a puppet apache, or puppet trafficserver or mysql
module, I'd express it like this in yaml:

---
excom:
  instance: example.com
  port: 8005
  vhosts:
    - servername: www
      user: excomwww
      scp_password: xxx
      type: static

    - servername: beta
      user: excombeta
      scp_password: zzz
      db_password: yyy

---
exorg:
  instance: example.org
  port: 8006
  vhosts:
    - servername: www
      user: exorgwww
      scp_password: xxx
      db_password: zzzz

    - servername: mail
      user: exorgmail
      scp_password: sseeecret
      db_password: reellyyseecrreet


With the implicit defaults in mind for this service, we
specify only when something deviates from them. With 90% of
vhosts being PHP, we specify that the `www` vhost is `static`.

This `static` may in turn overwrite other defaults such as
`db_provider => mysql` with `db_provider => none`.
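
A minimal sketch of how such layered defaults could be resolved in Puppet,
assuming the `merge()` function from the stdlib module is available. The
defined type name `hosting::effective_vhost`, its `settings` parameter and
the `'php'` type value are hypothetical and only serve to illustrate the idea:

```puppet
# Hypothetical defined type: resolves the effective settings for one vhost.
define hosting::effective_vhost ($settings = {}) {
  # Service-wide defaults; a vhost only specifies what deviates from them.
  $defaults = {
    'type'        => 'php',
    'db_provider' => 'mysql',
  }

  # merge() comes from stdlib; keys in the rightmost hash win.
  $with_defaults = merge($defaults, $settings)

  # A static vhost in turn overrides other defaults.
  if $with_defaults['type'] == 'static' {
    $type_overrides = { 'db_provider' => 'none' }
  } else {
    $type_overrides = {}
  }

  $effective   = merge($with_defaults, $type_overrides)
  $db_provider = $effective['db_provider']

  notice("vhost ${title} uses db_provider ${db_provider}")
}

# Usage: only the deviation from the defaults is given.
hosting::effective_vhost { 'excomwww':
  settings => { 'type' => 'static' },
}
```

In this sketch the type-specific overrides are applied last, so they also win
over an explicitly given `db_provider`; swapping the merge order would let
explicit settings win instead.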

What I'm trying to say here is: Rather than building this data
as each puppet module would expect it, I build it such that
it makes sense to us admins and developers.


* * *

This whole approach has a number of implications. The first
is that we don't treat hiera as a nice-to-have tack-on,
but that we intrinsically rely on it.

Second, the data in hiera in no way reflects what a puppet
module that finally writes the configuration would expect.

This in turn means that we need a puppet module which can make
sense of the data, enriching it and filling the gaps where
necessary.

If we continue from the hosting example, we can roughly split
it into `hosting::lb` (for trafficserver), `hosting::web` (for
handling the httpd config, as well as the creation of users as
necessary), and a `hosting::db` module (which, as we've seen, can be
optional in cases where the site is static).

There are two competing solutions to my problem:

#1: have the "parsing" code in each of those classes

#2: have the "parsing" code in an über class. This would be the
web class, and it would dispatch the data to the other two
via exports.

The first solution has the obvious problem of code-duplication.
The second doesn't scale once the number of customers grows
beyond a couple of thousand, as export/collect will be very slow.
However, this is not a problem we're facing right now… ;)

I'm leaning towards the über-class, but would love to hear some
feedback on whether I'm making any sense at all and how you
would approach such problems!

Thank you very much in advance,

-- i
Igor Galić

Tel: +43 (0) 664 886 22 883
Mail: i.g...@brainsware.org
URL: http://brainsware.org/
GPG: 6880 4155 74BD FD7C B515 2EA5 4B1D 9E08 A097 C9AE

jcbollinger

Jul 9, 2013, 11:34:28 AM
to puppet...@googlegroups.com


On Tuesday, July 9, 2013 3:12:20 AM UTC-5, Igor Galić wrote:

Hi folks,

last week I asked a question on how to sensibly structure
a puppet setup:

  https://ask.puppetlabs.com/question/1932/how-to-tier-a-puppet-setup/

I received a quite sensible request, even though it's not
what I wanted to hear.


Do have a look at "Roles and Profiles" (http://www.craigdunn.org/2012/05/239/), as your respondent over at Ask Puppet suggested.  I know that doesn't really address the question you are raising now, but you may find it to your advantage to build your overall framework along proven lines.

 
The reason it's not might be that
I haven't quite been able to get my point across:

Let's go with the example of hosting and develop that:

We have (among other things) a hosting service. Each customer
gets their own instance of httpd running on a separate port
and as a separate user. They can have any number of vhosts, but
most only have one. They can have any kind of setup, but most
choose PHP + MySQL, the rest have static pages. Each of these
vhosts has a separate scp-user. (The db/user and scp-user for
each vhost have the same name). The web is terminated by a
caching proxy (Apache Traffic Server).


[...]
 

What I'm trying to say here is: Rather than building this data
as each puppet module would expect it, I build it such that
it makes sense to us admins and developers.


                         * * *

This whole approach has a number of implications. The first
is that we don't treat hiera as a nice-to-have tack-on,
but that we intrinsically rely on it.

Second, the data in hiera in no way reflects what a puppet
module that finally writes the configuration would expect.

This in turn means that we need a puppet module which can make
sense of the data, enriching it and filling the gaps where
necessary.


And that's what you mean by "parsing" below?

[...]
 

There are two competing solutions to my problem:

#1: have the "parsing" code in each of those classes

#2: have the "parsing" code in an über class. This would be the
  web class, and it would dispatch the data to the other two
  via exports.


Exported resources don't seem relevant here.  In particular, they do not serve as a communications channel between classes, and especially not as a channel between different classes in the same node's configuration.  Exported resources are a means for the configuration for one node to declare resources that can later be incorporated by reference into other nodes' configurations.  Although exported resources can be collected on the same node that exports them (modulo bugs in a few versions of Puppet), if the exporting node is the only one collecting them then they are hurting rather than helping.
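
To illustrate what exported resources are for, here is a minimal sketch of the cross-node case. It assumes storeconfigs (for example via PuppetDB) is enabled; the node names and the 'hosting_web' tag are hypothetical:

```puppet
# On a web node: export a host entry describing this node.  The resource
# is stored centrally and is NOT applied to the exporting node by itself.
node 'web01.example.com' {
  @@host { $::fqdn:
    ip  => $::ipaddress,
    tag => 'hosting_web',
  }
}

# On a different node (here the load balancer): collect every host entry
# that other nodes exported with that tag, and manage them locally.
node 'lb01.example.com' {
  Host <<| tag == 'hosting_web' |>>
}
```

Data that only needs to move between classes within a single node's catalog can simply be read from a class variable or looked up again in hiera, with no export/collect round trip.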

 

The first solution has the obvious problem of code-duplication.
The second doesn't scale once the number of customers grows
beyond a couple of thousand, as export/collect will be very slow.
However, this is not a problem we're facing right now… ;)



Puppet is very well aligned with data-driven approaches to configuration.  In fact, I think such approaches are far and away the best way to implement a manifest set.

The actual structure of the data for any given node or site is an open question, with multiple good approaches offering different advantages and disadvantages.  Puppet can certainly work with the kind of data structure you would like to use.

 
I'm leaning towards the über-class, but would love to hear some
feedback on whether I'm making any sense at all and how you
would approach such problems!



If your data is structured in a small number of monolithic pieces, then I would have a small number of classes to serve as data stewards.  Maybe just one.  For your particular case, I would make use of these tools:
  • The keys() function provided by Puppet Labs' "stdlib" add-in module, for extracting the keys of a hash into an array
  • Puppet's standard facility for declaring multiple resources of the same type and with the same parameters by specifying an array of their titles
  • Puppet defined types
  • Maybe Puppet's built-in create_resources() function
It might work out something like this:

class customer_sites {
  # The data is expected to be a hash of all the
  # data pertaining to all customer sites that should
  # be configured for this node.  Values of some keys
  # may be arrays or hashes, possibly nested, as
  # appropriate:
  $customer_info = hiera('customer_info')

  # One customer site for each customer described
  # by the data:
  $customers = keys($customer_info)
  customer_sites::customer_site { $customers: }
}

# represents one customer web site, with its own httpd instance,
# user name, vhosts, etc.
define customer_sites::customer_site () {
  include 'customer_sites'
  $this_customer_info = $customer_sites::customer_info[$title]

  # ...

  create_resources('customer_sites::vhost', $this_customer_info['vhosts'])
}

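# Represents one vhost in one httpd instance.  $type and $db_password are
# optional because the sample data omits them for some vhosts; the 'php'
# default value is an assumption.
define customer_sites::vhost($servername, $user, $scp_password, $type = 'php', $db_password = undef) {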
  # ...
}

Please understand that I'm really only trying to show you some of the things you can do to handle data within Puppet.  I'm not trying to lay out a directly usable module setup.  In particular, the create_resources() function has limitations that may make it unsuitable for your needs (though it's pretty slick when it's applicable).
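
One such limitation is that create_resources() expects a hash of the form { title => { parameter => value } }, whereas the example data lists the vhosts as an array.  A rough sketch of data reshaped to fit, assuming the vhosts can be keyed by their user name; the defined type hosting_example::vhost and the 'php' default are hypothetical:

```puppet
# Hypothetical defined type; the name and parameters are illustrative only.
define hosting_example::vhost (
  $servername,
  $scp_password,
  $type        = 'php',
  $db_password = undef
) {
  notice("vhost ${title}: servername=${servername}, type=${type}")
}

# Each key becomes a resource title and each value supplies that
# resource's parameters.  In practice this hash would come from hiera.
$vhosts = {
  'excomwww'  => {
    'servername'   => 'www',
    'scp_password' => 'xxx',
    'type'         => 'static',
  },
  'excombeta' => {
    'servername'   => 'beta',
    'scp_password' => 'zzz',
    'db_password'  => 'yyy',
  },
}

# Declares one hosting_example::vhost resource per key of $vhosts.
create_resources('hosting_example::vhost', $vhosts)
```

Because the titles double as user names here, the separate user key from the original data is no longer needed; whether that reshaping is acceptable depends on how readable the data should remain for admins.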


John
