Our site uses a collection of Puppet modules to manage various Linux components using the roles and profiles model. While it works OK for the most part, I often find it necessary to update a module or profile for one reason or another. Modules obtained from the Puppet Forge sometimes don't quite do what is needed, and writing good quality modules on your own can be a challenge.
When using roles and profiles you end up declaring all the module parameters again to avoid losing functionality and flexibility.
You also need to be familiar with all the classes, types, and parameters from all modules in order to use them effectively.
To avoid all of the above, I put together the 'basic' module and posted it on the forge: https://forge.puppet.com/southalc/basic
This module uses the hiera_hash/create_resources model for all the native Puppet (version 5.5) types, using module parameters that match the type (with exceptions for metaparameters, per the README). The module also includes the 'file_line' type from puppetlabs/stdlib, the 'archive' type from puppet/archive, and the local defined type 'binary', which together provide a simple and powerful way to create complex configurations from hiera. All module parameters default to an empty hash and also have a merge strategy of 'hash' to enable a great deal of flexibility. With this approach I've found it possible to replace many single-purpose modules, and it's much faster and easier to get the results I'm looking for.
Yes, the hiera data can become quite large, but I find it much easier to manage data in hiera than coding modules with associated logic, parameters, templates, etc. Is this suitable for hyper-scale deployment? Maybe not, but for a few hundred servers with a few dozen configuration variants it seems to work nicely. Is everyone else using puppet actually happy with the roles/profiles method?
Quite a similar question was posted about two weeks back; you might find it very interesting:
If you are a confident Puppet Coder, you might prefer to import the source, patch the module to add your feature, then submit the patch back upstream.
> When using roles and profiles you end up declaring all the module parameters again to avoid losing functionality and flexibility.

Not sure I agree with that statement. That sounds odd. Why would you be re-declaring module parameters if you're not changing something from the defaults? And if you are intending to change something, then of course you are supplying different parameters?
> You also need to be familiar with all the classes, types, and parameters from all modules in order to use them effectively.

Ideally the README page of a module would contain amazing user level documentation of how the module should work... but not that many do. I often find I have to go read the Puppet code itself to figure out exactly what a parameter does.
> To avoid all of the above, I put together the 'basic' module and posted it on the forge: https://forge.puppet.com/southalc/basic

Ok :-) I'm beginning to see what the core of your problem is. The fact that you've created your own module to effectively do create_resources() hash definitions says to me that you haven't quite grasped the concepts of the Role / Profile design pattern. I know I have a very strong view on this subject and many others will disagree, but personally I think the Role / Profile pattern and the "do-everything-with-Hiera-data" pattern are practically incompatible.
> This module uses the hiera_hash/create_resources model for all the native Puppet (version 5.5) types, using module parameters that match the type (with exceptions for metaparameters, per the README). The module also includes the 'file_line' type from puppetlabs/stdlib, the 'archive' type from puppet/archive, and the local defined type 'binary', which together provide a simple and powerful way to create complex configurations from hiera. All module parameters default to an empty hash and also have a merge strategy of 'hash' to enable a great deal of flexibility. With this approach I've found it possible to replace many single-purpose modules, and it's much faster and easier to get the results I'm looking for.

A Hiera-based, data-driven approach will always be faster to produce a "new" result (just like writing Ansible YAML is faster to produce than Puppet code)... It's very easy to brain dump configuration into YAML and have it work, and that's efficient up to a certain point. For your simple use cases, yes, I can completely see why you would be looking at the Role / Profile pattern and saying to yourself "WTF for?". I think the tipping point of which design method becomes more efficient directly relates to how complicated (or how much control) you want over your systems.
The more complicated you go, the more I think you will find that Hiera just doesn't quite cut it. Hiera is a key/value store. You can start using some neat tricks like hash merging, you can look up other keys to de-duplicate data... When you start to model more and more complicated infrastructure, I think you will find that you don't have enough power in Hiera to describe what you want to describe, and that you need real programming constructs (e.g. if statements, loops, map/reduce). The Puppet DSL gives you those.
> Yes, the hiera data can become quite large, but I find it much easier to manage data in hiera than coding modules with associated logic, parameters, templates, etc. Is this suitable for hyper-scale deployment? Maybe not, but for a few hundred servers with a few dozen configuration variants it seems to work nicely. Is everyone else using puppet actually happy with the roles/profiles method?
I also think team size and composition is a big factor. If I was in a team of one or two people I'm sure I'd be saying "Yeah! Hiera! I can change anything really really easily!". If I was in a team of a dozen engineers geographically spread across the world with vastly different levels of Puppet knowledge I think I'd be saying "Oh god... Everything's in Hiera... It's so easy for someone to mess up. What on earth has someone changed now". If you haven't guessed already, I've been here before.
Personally I think the most useful part of the Role Profile design pattern is the encapsulation of implementation details behind business-specific Profiles. Jesus, what a mouthful. How about "hiding away all the details behind an interface that makes sense to me and my team"?
Best demonstrated with a real life example we use here. The Profile in question is for an LMAX "Statistics Collection Server". A statistics collection server collects statistics. If someone wants to collect statistics, all they have to do is put:

include ::profile::statistics::collection_server

somewhere in a node definition and set *AT MOST* nine Hiera parameters for that Profile. That's the real win: an LMAX statistics collection server has only 9 parameters that can be changed. They don't really have to understand exactly what goes into building a Statistics Collection Server if they don't want to (in practice they might need to browse the code to check what a parameter does, though, because we are lazy and don't document our Profiles).

If you go read that profile in detail you'll see I pull in several component modules: Puppetlabs Apache, InfluxDB, a private LVM module that's a wrapper for Puppetlabs' LVM, Grafana, and Chronograf. Apache (with SSL) is set up to proxy Grafana and Chronograf. Our LVM module creates the file system underneath InfluxDB before it is installed. Most of the parameters to the component modules are hard coded, and this is a great thing because it means every single one of our Statistics Collection Servers is exactly the same. I even pull in a (private) Nagios module to define monitoring resources, so when one of my Engineers uses that profile they get the monitoring _automatically_.

I count 81 parameters to component modules in that Profile, so that would be at least 81 lines of Hiera needed to reproduce that functionality in YAML (and even then, good luck ensuring that the LVM disk is there before InfluxDB is installed). I have condensed that down to 9 possible parameters where I think someone should legitimately change something. Otherwise, you use my defaults, and that keeps things the same, reducing entropy across our estate. Yes, writing this profile took a lot longer than doing it in YAML, but our engineers shouldn't need to "figure out" how to build an InfluxDB server ever again.
Another big win for me: testing. I can write rspec-puppet unit tests for the above Profile to make sure that if someone tries to change it, they don't break any existing functionality. Our current workflow has our engineers committing onto branches and creating Merge Requests in our private GitLab. All tests must pass before they can merge code to master. They usually get notified within minutes if something they've pushed hasn't passed tests.
You can do testing of Hiera-defined infrastructure, however all approaches I've read about seem awfully cumbersome and wasteful. I won't rant about that today.
So tell me, how did I go at convincing you? :-)
Let's say a module has 10 parameters and supplies defaults for most of them. When writing a profile you have to choose how many of the class parameters can remain defaults, how many to override, and how many to expose as profile parameters. It sounds fine to limit the number of parameters at the profile, right up until you hit an edge case that doesn't work with the default values, and the parameter you need to change now requires a profile update...
Hi Luke. Thanks for a thoughtful and detailed response.
I'd like to think I grasp the roles/profiles concept, but am just not convinced it's a better approach. Abstracting away configuration details and exposing a limited set of parameters results in uniform configurations. In doing so it also seems to limit flexibility and ensure that you'll continue to spend a good deal of time maintaining your collection of profiles/modules.
Speaking of hiera tricks, I created an exec resource with the command defined as a multi-line script to include variables and function declarations. I use this to collect data and create local facts. The next puppet run creates additional resources based on the presence of these facts. This is basically the same as creating a module with external facts, but doesn't require a module. An upside is that the fact script doesn't need to execute on every puppet agent run, with the downside being that the host takes a second puppet run to create all resources. I'm not sure if I should be proud or ashamed of what I did, but it works!
This may be the greatest factor to influence the decision. In my case we have 2 people working with puppet, and the system we're building is to be handed over to a team with little to no puppet experience. This system runs at a single site with only a couple hundred managed nodes and maybe a couple dozen unique configurations.
Well, you have caused me some guilt that maybe I've taken the easy way out rather than becoming more proficient with puppet. Once you've had that first hit and instant high from the hiera crack pipe... it's hard not to go back.
1) create_resources() is a bit of a kludge left over from puppet 3. Starting in puppet 4 (and 3's future parser), iteration was added. Instead of create_resources('user', $some_hash), you would say $some_hash.each |$title, $options| {} and create each resource inside the block. You can still use hiera to get the hash as an automatic parameter lookup on the class, but the creation of resources is a bit more explicit.
2) you also get the chance to define defaults, which means users don’t necessarily have to provide everything! Create a $defaults hash and assign it plus the defined overrides as (say for a user) user {$title: * => $defaults + $options}. This merges the options and defaults and applies the resulting hash as the parameters and values for the resource. You can keep your hiera tidier by creating sane defaults and only specifying the overrides in hiera. Have a new default? Modify the code once and all the resources in hiera benefit from it, unless they explicitly override it.
A practical example of this might be creating local users on nodes without access to a central auth mechanism, maybe in staging. In your code you create some defaults:
$defaults = {
  ensure           => present,
  password_max_age => 90,
  shell            => '/bin/fish',
}
Your hiera might look like:
profile::linux::local_users:
  rnelson0:
    password: 'hash1'
    groups:
      - wheel
    password_max_age: 180
  root:
    password: 'hash2'
    password_max_age: 9999
  lbigum:
    ensure: absent
In your code, you iterate over the class parameter local_users and combine your defaults with the specific options:
$local_users.each |$title, $options| {
  user { $title:
    * => $defaults + $options,
  }
}
Now my user is created, root’s password is changed and set to basically never expire, and Luke’s account is deleted if it exists.
This is a good way to combine the power of hiera with the predictability of puppet DSL, maintain unit and acceptance tests, and make it easy for your less familiar puppet admins to manage resources without having to know every single attribute that's required (or even available), while not going too far down the road of recreating a particular well-known CM system. It's always a bit of a balancing act, but I find this is a comfortable boundary for me and one that my teammates understand.
There's a lot more power to iteration, as covered in the puppet documentation and particularly in this article by R.I. Pienaar that I still reference frequently: https://www.devco.net/archives/2015/12/16/iterating-in-puppet.php
> Good points and a nice example. In the case of my basic module I'm currently using a separate create_resources line for each class parameter. Is there a way to iterate over all class parameters using each() so I can use a single nested loop to create everything?
You can - add an extra tier to the hash with the first level being the resource type, then create a default hash with a key for each type you use - but I simply don't think it scales, especially once you need to merge data from multiple layers of hiera. Even the deepest merge will, to my knowledge, end up replacing and not augmenting the hash values under each key.
A deep merge will merge in the new key 'package', but *replace* the 'user' key, resulting in rnelson0 and appuser everywhere but only localbackups on node 'somenode'. Because of this, it's not as flexible as you'd think. You can see more detail at https://puppet.com/docs/puppet/5.0/hiera_merging.html (can't find the 6.x link, but to the best of my knowledge it works the same).
It also doesn’t scale because you’re writing YAML not code, as Luke suggested earlier. Testing is difficult, and troubleshooting is difficult, and ordering is even more difficult. If you want to, say, add a repo and make sure it’s managed prior to any packages, you’re gonna have to spell out the ordering in your YAML, whereas something like ‘Repo <| tag == “class” |> -> Package <| tag == “class” |>’ within a class can set that ordering only for the related resources much more easily.
The last thing I’d point out is that composition is a really good pattern, and a one-class-does-it-all is an anti-pattern to that. Doing just what you need in a series of single, small classes allows you to easily compose a desired state through a role that includes the relevant, and just the relevant, classes. Within each profile, you should be able to delineate much of the specifics, rather than dynamically determine them at runtime via a superclass.
Perhaps a question to ask is, how opinionated are your profiles, and how opinionated should they be? IMO, very, and that would probably lower the number of resources you need to dynamically define.
Since reading the reasoning here I've continued to think about this off and on and still have a hard time with the idea of hard-coding configuration. It seems like a bit of a paradox within puppet. When writing modules it is generally accepted to separate any configuration data from the module code, but when writing profiles go ahead and hard code as many values as possible. I've been trained to think that separating data from code is a "Good Thing", so going counter to that makes me question my own existence.