simple node classification and custom facts

Berkeley

Sep 6, 2016, 7:03:33 AM

to Puppet Users

I'm doing a refactor of my puppet code with the profiles+roles design pattern. I'm encountering what should be a simple problem, but I'm having trouble finding an answer.

With roles/profiles, you instantiate classes using 'include' and fetch the parameter values from hiera. Then, for each node, you specify one role, which in turn includes all the relevant classes. Right now I have a hiera hierarchy that references the node's OS, environment and host name. However, I also want to provide different data based on the role of a node. For example, a webserver and an email server might both run postfix; the webserver uses it to email the admin when something goes wrong, but the email server uses it for much more. However, I still don't want to set the parameters of the postfix class explicitly from any role or profile, to avoid resource conflicts. So, I would like to write hiera data that applies to a particular role (or even a particular profile), as in the following hierarchy, but I have no idea how to do this:

:hierarchy:
    - "fqdn/%{clientcert}"
    - "env/%{environment}"
    - "roles/%{role}"
    - "os/%{osfamily}"
    - common

I should add that I'm very comfortable writing Ruby code, but for something so basic I don't feel like I should be trying to write custom facts. Even a simple way to just specify, per-node, an extra fact I could use for hiera's hierarchy would be helpful.

Luke Bigum

Sep 6, 2016, 9:05:22 AM

to Puppet Users

Hi,

This is mostly a "rethink what you are doing" reply, based on my experience of starting with the business logic of our estate coded almost entirely in Hiera, and now moving towards a role/profile design. If it doesn't fit, feel free to ignore it; I've answered your question at the end :-)

On Tuesday, 6 September 2016 12:03:33 UTC+1, Berkeley wrote:

> I'm doing a refactor of my puppet code with the profiles+roles design pattern. I'm encountering what should be a simple problem, but I'm having trouble finding an answer.

> With roles/profiles, you instantiate classes using 'include' and fetch the parameter values from hiera.

Not necessarily, the design pattern is open to interpretation / what works for you. I have a certain strict view of the pattern and I personally would say the exact opposite of what you've said above. I would never allow Hiera data to control a role, and I would only allow Hiera data to control the functionality of a profile based on the design of my Hiera tree.

> Then, for each node, you specify one role, which in turn includes all the relevant classes.

Relevant "profiles" (which are still classes, yes, but I'm making the point). Roles would include relevant profiles, but I wouldn't limit myself to just the "include" statement. I would not put component modules or resources into roles. If profiles are functionally flexible enough, I might use the resource-style declaration. For example, the profile::mail class below takes an Enum type that controls whether it's a simple internal mail server that you could use on your web server, or a fully fledged external mail relay with all the bells and whistles. In such a design my Postfix code is de-duplicated (to a point) in profile::mail, and I provide an interface where the operator can switch Postfix to one of several preset modes:

class role::webserver {
  class { 'profile::mail':
    type => 'internal',
  }
}

class role::mailserver {
  class { 'profile::mail':
    type => 'relay',
  }
}
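The body of profile::mail isn't shown in the thread, so here is a rough Python analogue of the idea (not Puppet code): the role passes one of a fixed set of mode names, and anything else is rejected, much as an Enum['internal', 'relay'] parameter type would reject it at compile time. The names MailType, PRESETS and mail_profile are hypothetical, and the settings are borrowed from the profile::mail::internal / profile::mail::relay examples further down this thread.

```python
from enum import Enum


class MailType(Enum):
    """Hypothetical analogue of an Enum['internal', 'relay'] parameter."""
    INTERNAL = "internal"
    RELAY = "relay"


# Hypothetical preset Postfix settings, one entry per supported mode.
PRESETS = {
    MailType.INTERNAL: {"relayhost": "192.168.0.1"},
    MailType.RELAY: {
        "relayhost": "external.mail.com",
        "mynetworks": "192.168.0.0/16",
    },
}


def mail_profile(mail_type):
    """Return a copy of the preset settings for mail_type.

    MailType(mail_type) raises ValueError for any value outside the Enum,
    so callers can only pick one of the preset modes.
    """
    return dict(PRESETS[MailType(mail_type)])


if __name__ == "__main__":
    print(mail_profile("internal"))
    print(mail_profile("relay"))
```

The point is the narrow interface: a role can only say which mode it wants, and the Postfix details stay in one place.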

> Right now I have a hiera hierarchy that references the node's OS, environment and host name.

It's probably a bit late to change now, but I cannot think of any reason why you'd need to put OS in as a Hiera level. A well-written component class (postfix, mysql, apache, etc.) should be operating-system agnostic, so in theory you should never have to do anything different yourself between operating systems. In practice there are probably some edge cases; however, the introduction of a role/profile design means you can put if/case statements in your profiles to handle different operating systems, so you still don't need OS in your Hiera tree:

class profile::mail {
  if ($::operatingsystem == 'RedHat') {
    $version = 'latest'
  } else {
    $version = '1.1.1'
  }

  package { 'postfix':
    ensure => $version,
  }
}

If you can remove OS, I would recommend you do that eventually. I imagine it might take a while.

Hiera should be for business-level / design stuff; I would never put a standard Facter fact in a Hiera tree. I have three reasons why. First, it's a key-value store, and that's it. It can merge hashes and arrays and do recursive lookups, but not much else. When things expand and get more complicated, relying too much on Hiera means you end up doing more and more "tricks" to get things to work. Conversely, if you have a simpler Hiera tree and more of your logic in Puppet code (where you have if statements, selectors, and the new iteration functions such as map in Puppet 4), you get a lot more control. With a Puppet profile, I could request four different Hiera keys, take the first two characters of each, sum them together as a hex number and write that to disk. Try doing something that complicated in Hiera.

The second reason has to do with entropy. If you make everything ultra-configurable with Hiera, you're making it easy for servers to be configured differently. If you have several staff with different thought patterns and ways of solving problems, then very soon you'll see drift appear in your Hiera tree. Someone will do it one way, someone will do it another way, and servers will end up different. If the code that configures your servers is a lot more static (in Puppet code), then I would say it's easier to audit, and I could assert to my boss something like "Postfix is only ever configured in one of two ways". If I relied on a complicated set of Hiera keys to control Postfix, I'm not sure I could make that assertion. If I allow it, there's nothing stopping some fool from overriding the version in Hiera at the node level for our mail server. If it is a hard-coded parameter, they can't do that. However, if you have a small estate and only one Puppet guy, you can probably keep on top of this.

The third reason is that I find it a lot easier to test. The more a piece of Puppet code relies on external Hiera data to work properly, the trickier it is to test in all circumstances. In the example above I can write a unit test asserting that profile::mail installs the latest version on Red Hat. If that piece of data is relegated to Hiera, it becomes more difficult. Yes, I could either stub the data or import my entire Hiera tree as part of my unit tests (shudder), but the more I rely on an external data source, the harder it is to test all eventual outcomes.

> However, I also want to provide different data based on the role of a node.

I would not do this, but this is my ultra-purist way of looking at the design pattern talking. I think that any time you need to modify a role slightly to do a different job, you've got a different role. At first glance this would lead to a lot of duplication, but if you structure your profiles right, you shouldn't have too much duplication.

> For example, a webserver and an email server might run postfix; the webserver uses it to email the admin when something goes wrong, but the email server uses it for much more. However, I still don't want to instantiate the parameters of the postfix class explicitly from any role or profile to avoid resource conflicts.

I would. There's no rule that says all profiles should be able to co-exist, and if someone has an implementation where they do, I'd like to see it. Here's your example, using [camptocamp/puppet-postfix] as a component module:

class profile::mail::internal {
  class { 'postfix':
    relayhost => '192.168.0.1',
  }
}

class profile::mail::relay {
  include amavisd
  include something_else

  class { 'postfix':
    relayhost  => 'external.mail.com',
    mynetworks => '192.168.0.0/16',
  }
}

Those two profiles are specifically designed not to run together. How can they? How can a server be an internal-only mail forwarder and a full blown relay at the same time?
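To make "designed not to run together" concrete, here is a toy Python model of the duplicate-declaration check. It is a simplification, not Puppet's compiler: the class names mirror the examples above, and the error text is made up. The model follows Puppet's rules that `include` is safe to repeat, while a resource-like `class { 'postfix': ... }` declaration may happen only once per catalog (and not after the same class was already included), so a node given both profiles fails to compile instead of silently getting one of the two configurations.

```python
class DuplicateDeclaration(Exception):
    """Stands in for Puppet's duplicate class declaration error."""


class Catalog:
    """Toy model of how one node's catalog tracks declared classes."""

    def __init__(self):
        self.classes = {}  # class name -> parameters it was declared with

    def include(self, name):
        # include-like: safe to repeat; parameters come from defaults/Hiera.
        self.classes.setdefault(name, {})

    def declare(self, name, **params):
        # resource-like: at most once, and not after an include of the same class.
        if name in self.classes:
            raise DuplicateDeclaration(f"Class[{name}] is already declared")
        self.classes[name] = params


def profile_mail_internal(catalog):
    catalog.declare("postfix", relayhost="192.168.0.1")


def profile_mail_relay(catalog):
    catalog.include("amavisd")
    catalog.include("something_else")
    catalog.declare("postfix", relayhost="external.mail.com",
                    mynetworks="192.168.0.0/16")


if __name__ == "__main__":
    node = Catalog()
    profile_mail_relay(node)
    try:
        profile_mail_internal(node)
    except DuplicateDeclaration as err:
        print("compile failed:", err)
```

Each profile on its own compiles fine; only the combination is refused, which is the conflict you want Puppet to surface.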

> So, I would like to write hiera data that applies to a particular role (or even a particular profile), as in the following hierarchy, but I have no idea how to do this:
>
> :hierarchy:
>     - "fqdn/%{clientcert}"
>     - "env/%{environment}"
>     - "roles/%{role}"
>     - "os/%{osfamily}"
>     - common
>
> I should add that I'm very comfortable writing Ruby code, but for something so basic I don't feel like I should be trying to write custom facts. Even a simple way to just specify, per-node, an extra fact I could use for hiera's hierarchy would be helpful.

I would argue that with a role/profile design, you don't need Role in the Hierarchy. I introduced Role to our Hierarchy several years ago, and am trying very hard to undo it now. Any time that you think you'd need to configure a Role differently with a Hiera key, then create a new role and/or refactor your profiles to handle your case.

Remember that the role/profile design pattern was originally designed to abstract more and more Puppet details away from various levels of users (business users, developers, admins). So if you are a business user up at the role level, you probably don't even know what Postfix is. All a business guy wants to know is "Web Server" and "Mail Server". If you are at the developer level, you probably don't care whether Postfix 1.1.1 or Postfix 1.1.2 is installed, and you probably don't know which is correct, so all a developer wants to know is "it's an internal mail server" or "it's an external mail relay". You as an administrator probably want in-depth control over which version of Postfix is installed in which profile and how it is configured.

Now, if you'd prefer to ignore me and just want me to answer your question, here's how I added Role to my Hierarchy :-)

I have a very simple External Node Classifier script that reads a YAML file on the Puppet Master and says for certain nodes "you have this role". It inserts a top level ENC Parameter (which is almost a Fact, but not really). I can then use this parameter in my Hiera Hierarchy. Below is the Hiera config, the Puppet configuration line, the Python script, and a little bit of the YAML data that controls it:

[root@puppet puppet]# cat hiera.yaml
# Managed by Puppet
---
:backends:
  - "eyaml"
  - "yaml"
:hierarchy:
  - "node/%{fqdn}"
  - "role/%{lmax_role}_role"
  - "zone/%{zone}_zone"
  - "pop/%{pop}.lmax"
  - "global"
:yaml:
  :datadir: "/etc/puppet/environments/%{::environment}/hiera/"
:eyaml:
  :datadir: "/etc/puppet/environments/%{::environment}/hiera/"

[root@puppet puppet]# grep external_nodes /etc/puppet/puppet.conf
external_nodes = /etc/puppet/simple_lmax_roles_enc.py

[root@puppet puppet]# cat /etc/puppet/simple_lmax_roles_enc.py
#!/usr/bin/env python
# Managed by Puppet
# LB: simple ENC classifier to get the $lmax_role top level variable before any
# classes are evaluated, which makes our Hiera hierarchy work properly.
# Copied a lot from: http://wiki.unixh4cks.com/index.php/Simple_External_Node_Classifier(ENC)_for_puppet_in_python
# 20140102
# Add in support for regex matching hostnames
# - millara

import re
import sys

import yaml

host = sys.argv[1]
# Any second argument switches to "pretty" mode: print just the role name.
pretty = len(sys.argv) > 2

enc_yaml = '/etc/puppet/lmax_roles.yaml'
enc_yaml_regex = '/etc/puppet/lmax_roles_regex.yaml'

# LB: if there is no host in the YAML we need to print an empty hash, or
# Puppet thinks there is no node definition.
empty = {'parameters': {}}


def emit(node):
    """Print the classification for this host and stop."""
    if pretty:
        print(node['parameters'].get('lmax_role', 'nil'))
    else:
        # Puppet parses ENC output as YAML, so dump real YAML rather than
        # printing the Python dict's repr.
        print(yaml.safe_dump(node, default_flow_style=False))
    sys.exit(0)


with open(enc_yaml) as f:
    yaml_host = yaml.safe_load(f) or {}

if host in yaml_host:
    emit(yaml_host[host])

# We didn't get a specific match, now find the first regex match we can.
with open(enc_yaml_regex) as f:
    yaml_regex = yaml.safe_load(f) or {}

for regex in yaml_regex.values():
    if re.match(regex['regex'], host, re.IGNORECASE):
        emit(regex['value'])

emit(empty)
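The post doesn't show lmax_roles_regex.yaml. Judging only from the keys the script reads, each top-level entry seems to need a `regex` string and a `value` mapping shaped like an ENC result. Below is a hypothetical pure-function version of the same lookup order (exact hostname first, then the first matching regex, then an empty parameters hash) that can be tested without touching /etc/puppet. The function name `classify`, the entry name `web_nodes` and the `web01`-style hostnames are made up for illustration.

```python
import re


def classify(host, exact, regex_entries):
    """Return the ENC result for host: exact match, then first regex, then empty."""
    if host in exact:
        return exact[host]
    for entry in regex_entries.values():
        # re.match anchors at the start of the hostname, as in the script.
        if re.match(entry["regex"], host, re.IGNORECASE):
            return entry["value"]
    # No match: an empty parameters hash still counts as a node definition.
    return {"parameters": {}}


# Hypothetical parsed contents of lmax_roles.yaml and lmax_roles_regex.yaml.
EXACT = {
    "something.internal.lmax": {"parameters": {"lmax_role": "woof"}},
}
REGEXES = {
    "web_nodes": {
        "regex": r"^web\d+\.internal\.lmax$",
        "value": {"parameters": {"lmax_role": "webserver"}},
    },
}

if __name__ == "__main__":
    for name in ("something.internal.lmax", "WEB01.internal.lmax", "db01.internal.lmax"):
        print(name, classify(name, EXACT, REGEXES))
```

"First regex match" follows the iteration order of the parsed mapping. On Python 3.7+ that is insertion order, which with PyYAML should be file order; older Python 2 dicts make no ordering guarantee.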

[root@puppet puppet]# head /etc/puppet/lmax_roles.yaml
#Managed by Puppet
#Generated by puppet::master::config
---
something.internal.lmax:
  parameters:
    lmax_role: woof
another.internal.lmax:
  parameters:
    lmax_role: meow

jcbollinger

Sep 7, 2016, 9:12:42 AM

to Puppet Users

On Tuesday, September 6, 2016 at 6:03:33 AM UTC-5, Berkeley wrote:

> I'm doing a refactor of my puppet code with the profiles+roles design pattern. I'm encountering what should be a simple problem, but I'm having trouble finding an answer.

> With roles/profiles, you instantiate classes using 'include' and fetch the parameter values from hiera.

That's not a given. I think choice of data binding approach is largely perpendicular to use of roles & profiles.

> Then, for each node, you specify one role, which in turn includes all the relevant classes. Right now I have a hiera hierarchy that references the node's OS, environment and host name. However, I also want to provide different data based on the role of a node. For example, a webserver and an email server might run postfix; the webserver uses it to email the admin when something goes wrong, but the email server uses it for much more. However, I still don't want to instantiate the parameters of the postfix class explicitly from any role or profile to avoid resource conflicts.

This is fine if the requirement of every class that wants Postfix is simply that Postfix must be installed. But suppose two profiles want conflicting Postfix configurations -- then you want Puppet to notice the conflict. Puppet class / resource conflicts are not a misfeature simply to be avoided or worked around at any cost; they are an intentional feature serving to help ensure internal consistency of your configurations. The biggest problem with them is that they can cast too broad a net.

> So, I would like to write hiera data that applies to a particular role (or even a particular profile), as in the following hierarchy, but I have no idea how to do this:
>
> :hierarchy:
>     - "fqdn/%{clientcert}"
>     - "env/%{environment}"
>     - "roles/%{role}"
>     - "os/%{osfamily}"
>     - common

And what prevents you from doing exactly that? $role does not need to be a fact for that to work -- it can be an ordinary top-scope variable or a node-scope variable. And there are other alternatives, too.
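A toy Python model of Hiera's default first-match lookup shows why: the hierarchy strings are interpolated from whatever variables are in scope when the lookup happens, so it makes no difference whether `role` arrived as a fact, an ENC parameter, or a top-scope variable. This is a simplification, not Hiera's implementation. The data files are an in-memory dict, the `postfix::relayhost` values are hypothetical, and an unset variable interpolates to an empty string, roughly as Hiera 3 does, so "roles/%{role}" becomes "roles/", which matches no file and the lookup falls through to common.

```python
import re

# The hierarchy from the question (closing quote on the os level restored).
HIERARCHY = [
    "fqdn/%{clientcert}",
    "env/%{environment}",
    "roles/%{role}",
    "os/%{osfamily}",
    "common",
]


def interpolate(level, scope):
    """Expand %{var} or %{::var} from scope; unset variables become ''."""
    return re.sub(
        r"%\{(?:::)?([A-Za-z0-9_]+)\}",
        lambda m: str(scope.get(m.group(1), "")),
        level,
    )


def lookup(key, scope, data_files):
    """Return (value, source) from the first level whose data defines key."""
    for level in HIERARCHY:
        source = interpolate(level, scope)
        data = data_files.get(source, {})
        if key in data:
            return data[key], source
    raise KeyError(key)


# Hypothetical data: only the mail server role overrides relayhost.
DATA = {
    "roles/mailserver": {"postfix::relayhost": "external.mail.com"},
    "common": {"postfix::relayhost": "192.168.0.1"},
}

if __name__ == "__main__":
    base = {"clientcert": "host1.example.com", "environment": "production",
            "osfamily": "RedHat"}
    for role in ("mailserver", "webserver", None):
        scope = dict(base, role=role) if role else base
        print(role, lookup("postfix::relayhost", scope, DATA))
```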

John

Arnau

Sep 7, 2016, 9:52:04 AM

to puppet...@googlegroups.com

Hi,

my 2 cents below:

2016-09-04 22:32 GMT+02:00 Berkeley <berkeley...@gmail.com>:

> [...]
>
> :hierarchy:
>     - "fqdn/%{clientcert}"
>     - "env/%{environment}"
>     - "roles/%{role}"
>     - "os/%{osfamily}"
>     - common
>
> I should add that I'm very comfortable writing Ruby code, but for something so basic I don't feel like I should be trying to write custom facts. Even a simple way to just specify, per-node, an extra fact I could use for hiera's hierarchy would be helpful.

I use https://forge.puppet.com/abstractit/puppet to define custom facts.

Then, in hiera, for each node I do something like:

nodeXYZ.yaml
facts::custom_facts: { role: sge_execution_node }

During the first Puppet run it creates the custom fact; on the second, the node knows what role it is and does what is expected (the same happens when you switch a node's role).
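The two-run behaviour follows from ordering: the agent gathers and sends its facts before the catalog is compiled, so a fact written during one run is only visible to Hiera on the next. A toy Python model of that ordering (the names `agent_run` and `sge_master` are hypothetical, and it says nothing about how the abstractit/puppet module actually stores the fact):

```python
def agent_run(node_facts, node_hiera_data):
    """Simulate one agent run; return the role Hiera could see during it."""
    # 1. Facts are gathered and sent before the catalog is compiled.
    submitted = dict(node_facts)
    role_seen_by_hiera = submitted.get("role")
    # 2. Applying the catalog writes facts::custom_facts to the node,
    #    which only takes effect the next time facts are gathered.
    node_facts.update(node_hiera_data.get("facts::custom_facts", {}))
    return role_seen_by_hiera


if __name__ == "__main__":
    facts = {}
    data = {"facts::custom_facts": {"role": "sge_execution_node"}}
    print(agent_run(facts, data))  # first run: role not known yet (None)
    print(agent_run(facts, data))  # second run: sge_execution_node
```

The same lag applies when you switch a node's role: the run that rewrites the fact still compiles with the old role.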

I prefer using a custom fact because I can interrogate the node about its role:

# facter -p role
sge_execution_node

I also use the role YAML file in Hiera for including classes based on the role:

role/sge_execution_node.yaml
---
classes:
 - role::sge_execution_node
sge::node_type: 'execution'