hiera level explosion

Darragh Bailey

Mar 20, 2017, 11:09:48 AM

to Puppet Users

Hi,

Looking at how our hiera levels are already exploding due to some preferences, I'm wondering how others use hiera.

We prefer to group related data in separate files. However, some colleagues have concerns about using '%{module_name}' and '%{calling_class}', which means that each separate application-related class within our main module ends up with a dedicated level in hiera.

While our existing hierarchy doesn't quite look like the following, once we've migrated to using an eyaml backend (in addition to the current yaml backend) instead of a separate restricted-access git repo, I expect it to look like this:

:hierarchy:
- "node/%{::domain}/%{::hostname}"
- "gerrit"
- "database"
- "jenkins"
- "server"
- "web"
- "%{calling_class}"
- "%{module_name}"
- defaults

Tbh, I'd favour simply doing something like the following:

:hierarchy:
- "node/%{::domain}/%{::hostname}"
- "%{calling_class}"
- "%{module_name}"
- defaults

Anything in 'gerrit', 'database', 'jenkins', 'server' & 'web' that needs to be accessible by other classes would be placed in 'defaults', and anything specific to a class would simply go in a file whose name is picked up by '%{calling_class}'.

However there are concerns that it's difficult to remember that data is only visible to the associated class/module when made accessible under '%{calling_class}' and '%{module_name}', and I think '%{module_name}' goes away in hiera 5 or at least it's deprecated and support for it will be removed in hiera 6.

What concerns me, however, is whether there is a performance impact from creating lots of levels to keep data nicely separated on a per-service/application basis in the name of keeping it easy to understand.

Do others simply use a single file? Or do you favour use of '%{module_name}', '%{calling_class}', and/or '%{calling_class_path}'? If so what are your plans around hiera's future behaviour?

Any clues on assessing the performance impact of either approach? I doubt it currently makes much difference, but I'm sure that as we add more and more puppet code to manage additional services/applications, and consequently many more levels, this will have to start having an impact at some point.

Perhaps it makes more sense to keep these in separate files and add a step to the deployment that combines the application-specific files into a single yaml file for hiera to use. That would give us separation at the source/review level and a simple single file at the point of use, to ensure good performance.
It also seems more in line with hiera, as these application-specific files are not really separate levels of hierarchy; they are just separated for human reading convenience.

Anyone care to provide some insight:
Have you encountered this?
Do you just stick everything for different services/applications in the same file?
Does that isolate which puppet modules/classes the data is accessible from?
Do you prefer explicit isolation through using the special variables, and just trust that people remember these are not visible everywhere?

--
Darragh

Rob Nelson

Mar 20, 2017, 11:30:12 AM

to puppet...@googlegroups.com

If you're looking up hiera data based on the calling class, I'd question whether that's useful to split out to hiera at all - every instance of the class would get the same values, right? And would you really want ALL nodes that `include jenkins` to get the same jenkins server? Even if they're in DCs on opposite sides of the world supporting different groups?

It's more likely that you want to use some facts about the nodes - datacenter, network, owning organization, etc. - to provide that data. Your hierarchy should be modeled after that. Mine is:

:hierarchy:
- "clientcert/%{clientcert}"
- "puppet_role/%{puppet_role}"
- "osfamily-release/%{osfamily}-%{operatingsystemmajrelease}"
- "datacenter/%{datacenter}"
- global

Clientcert lets individual nodes override settings groups normally inherit; puppet_role is a custom fact reflecting a service like AppX, AppY, DHCP, DNS, etc; the next tier is OS since we run a few versions that often require different values; next is the datacenter, where routes and DNS and such might differ; and finally global is things standardized across the board (called 'common' in default installations). IMO, the only tier that should reference a single filename would be global/common, anything else doing so is really just replicating that tier higher up the stack and adding complexity. I'm sure there's some valid use case for it, though.
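
To illustrate how the tiers above resolve a single key, here is a minimal sketch of three data files. The key name `profile::jenkins::server_url`, the node name, the datacenter value `dc1`, and all URLs are hypothetical and not taken from the thread; the file names assume the yaml backend appends `.yaml` to each hierarchy entry.

```yaml
---
# clientcert/build01.example.com.yaml (hypothetical node)
# Most specific tier: wins for this one node only.
profile::jenkins::server_url: "https://jenkins-test.dc1.example.com"
---
# datacenter/dc1.yaml (hypothetical value of the datacenter fact)
# Used by every node in dc1 that has no more specific override.
profile::jenkins::server_url: "https://jenkins.dc1.example.com"
---
# global.yaml
# Fallback for every node that matches nothing above.
profile::jenkins::server_url: "https://jenkins.example.com"
```

With a first-match lookup, a node in dc1 with no clientcert file would get the dc1 URL, and a node in any other datacenter would fall through to the global value. No file name here depends on which class performs the lookup.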

There's some perf impact when you have more tiers, but hiera lookups don't have a high enough cost for us to worry about it. There is a cost to architecting and maintaining additional tiers and that's my main concern. You can only keep so much in your head, so it's easy to lose track of where things are configured and where they should be configured, and of course it affects troubleshooting times as well.

I believe that answers some of your questions and obviates the need for answers to others.

Rob Nelson
rnel...@gmail.com

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-users/8d142857-c985-4902-9346-aaeb577dc2e6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Darragh Bailey

Mar 21, 2017, 12:27:44 PM

to Puppet Users

Hi Rob,

Thanks for some of the suggestions, I suspect part of our problem is that we have multiple applications/services deployed per node using docker containers.

We probably can't easily map a single role per machine because we want to be able to move the application/services between machines relatively easily. This kind of suggests this should all be in a single layer such as the datacenter one suggested. Although our usage of the '%{calling_class}' is analogous to your '%{puppet_role}' above, see comment below.

On Monday, March 20, 2017 at 3:30:12 PM UTC, Rob Nelson wrote:

> If you're looking up hiera data based on the calling class, I'd question whether that's useful to split out to hiera at all - every instance of the class would get the same values, right? And would you really want ALL nodes that `include jenkins` to get the same jenkins server? Even if they're in DCs on opposite sides of the world supporting different groups?

We have a tendency to use a specific class for a specific instance, which can then 'include jenkins'. To properly make use of calling_class across the board to collapse our levels, we would need to rename the yaml files so jenkins -> local_module_name::my_jenkins, gerrit -> local_module_name::my_gerrit, etc.

This meant that the information would only be available when configuring that specific service and if we needed two of them, we would end up creating two separate classes wrapping the same specific base class in order to pull in the desired info.

This seems to follow the same idea as roles, where the information for a specific role is only available to the systems with that role.

> It's more likely that you want to use some facts about the nodes - datacenter, network, owning organization, etc. - to provide that data. Your hierarchy should be modeled after that. Mine is:
>
> :hierarchy:
> - "clientcert/%{clientcert}"
> - "puppet_role/%{puppet_role}"
> - "osfamily-release/%{osfamily}-%{operatingsystemmajrelease}"
> - "datacenter/%{datacenter}"
> - global
>
> Clientcert lets individual nodes override settings groups normally inherit; puppet_role is a custom fact reflecting a service like AppX, AppY, DHCP, DNS, etc; the next tier is OS since we run a few versions that often require different values; next is the datacenter, where routes and DNS and such might differ; and finally global is things standardized across the board (called 'common' in default installations). IMO, the only tier that should reference a single filename would be global/common, anything else doing so is really just replicating that tier higher up the stack and adding complexity. I'm sure there's some valid use case for it, though.

This is closer to the layout I had expected to be used, and it was the desire to split up some of the information into manageable chunks that drove the separate files.

> There's some perf impact when you have more tiers, but hiera lookups don't have a high enough cost for us to worry about it. There is a cost to architecting and maintaining additional tiers and that's my main concern. You can only keep so much in your head, so it's easy to lose track of where things are configured and where they should be configured, and of course it affects troubleshooting times as well.
>
> I believe that answers some of your questions and obviates the need for answers to others.

Thanks, I've a feeling we'll have to think about the hiera layers some more and how data is organized in order to get a better handle on it.


Henrik Lindberg

Mar 21, 2017, 5:32:54 PM

to puppet...@googlegroups.com

Both %{module_name} and %{calling_class} are going away - you cannot use
them when you are switching from legacy hiera 3 style hiera.yaml to
hiera 5 style hiera.yaml. Support for those variables will be removed in
Puppet 6.0.0 where hiera 3 backwards compatibility will be dropped (at
least that is what we think now, but it depends on several factors).

The main problem with interpolation of those "dynamic values" into paths
is that the value for a given key changes during the course of the
compilation.
This creates a performance problem (all caches have to be evicted), and
it makes it a lot harder to debug since the value of a key - say x::y is
different depending on where it is obtained. Such designs should be
avoided as they are confusing and hard to maintain.

There are several new mechanisms in hiera 5 that can be used for various
purposes - hard to say exactly what you would be using as it depends on
what you are trying to achieve with your current design (why is it there
in the first place, who gets to change what where, how do you review and
do QA on data, etc. etc).

Best,
- henrik
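
For readers following the thread, here is a hedged sketch of what a hiera 5 style hiera.yaml could look like for the layout discussed earlier, without '%{calling_class}' or '%{module_name}'. It is not a configuration proposed by anyone in the thread. The `apps/` directory and the level names are assumptions, and the `domain` and `hostname` facts are written in their structured `facts.networking.*` form.

```yaml
---
version: 5
defaults:
  datadir: data          # relative to the directory holding this hiera.yaml
  data_hash: yaml_data   # built-in YAML backend
hierarchy:
  - name: "Per-node data"
    path: "node/%{facts.networking.domain}/%{facts.networking.hostname}.yaml"
  - name: "Per-application data (one level, many files)"
    # Hypothetical directory, e.g. apps/gerrit.yaml, apps/jenkins.yaml
    glob: "apps/*.yaml"
  - name: "Defaults"
    path: "defaults.yaml"
```

The `glob` entry keeps the application files separate for review while presenting them as a single named level. Each matching file is still consulted during a lookup, so this mainly tidies the hierarchy and should not be assumed to reduce lookup cost. Running `puppet lookup <key> --node <certname> --explain` is one way to see which file supplies a given value.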
