On Sun, Mar 08, 2015 at 11:55:03AM -0700, Bostjan Skufca wrote:
> With hiera:
> - How would you go about when certain nodes need data merged from all
> scopes, but other nodes need data from just the last scope?
I've usually had a "classname::merge: true" key in hiera, controlling whether I use hiera() or hiera_hash() to obtain the data I need.
In your position I might try doing hiera('myclass::data', fail()) to mimic a class parameter with no default, in case there was no sensible default and catalog compilation should fail without this data. If I recall correctly, a failed hiera*() lookup function just means a variable set to undef. That's if a default isn't fine, of course.
Of course, maybe the disparate syslog infrastructure is a sign that things have become tangly and you need to prune syslog listeners a bit? Or, to rephrase, maybe spend the time correcting your syslog infrastructure rather than dealing with it in puppet?
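For illustration, the "classname::merge" switch mentioned above might look roughly like the sketch below. The class name myclass and both Hiera keys are hypothetical, and the sketch assumes the Puppet 3 era hiera() and hiera_hash() functions discussed in this thread; it is not taken from a real module.

```puppet
# Hypothetical class and key names, for illustration only.
class myclass {
  # Per-node switch stored in Hiera; falls back to a plain
  # priority (non-merging) lookup when the key is absent.
  $merge = hiera('myclass::merge', false)

  if $merge {
    # Hash merge lookup: combine the hashes found at every hierarchy level.
    $data = hiera_hash('myclass::data')
  } else {
    # Priority lookup: use only the first level that defines the key.
    $data = hiera('myclass::data')
  }
}
```

A node that needs merged data would then set "myclass::merge: true" in its own Hiera file, and every other node keeps the non-merging behaviour.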
On Monday, 9 March 2015 14:45:38 UTC+1, Christopher Wood wrote:
On Sun, Mar 08, 2015 at 11:55:03AM -0700, Bostjan Skufca wrote:
> With hiera:
> - How would you go about when certain nodes need data merged from all
> scopes, but other nodes need data from just the last scope?
> I've usually had a "classname::merge: true" key in hiera, controlling whether I use hiera() or hiera_hash() to obtain the data I need.
And this hits the nail on the spot, even if unknowingly :)
The problem I am seeing here, and which I am only now being able to articulate, is the clash of two contradictory elements:
1. Puppet development is pushed towards decoupling code (manifest) from data, a noble goal
2. Puppet provides two functions, hiera() and hiera_array(), and the very existence of more than one function to retrieve data destroys the notion that code should be unaware of underlying data storage details.
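To make the clash concrete, here is a minimal sketch of how the same key can yield different data depending on which lookup function the manifest calls. The hierarchy, file names and addresses are assumptions made up for this example; only hiera() and hiera_array() come from the discussion.

```puppet
# Assumed Hiera data (hypothetical), highest priority level first:
#
#   nodes/web01.yaml    dns::server: ['10.0.0.53']
#   common.yaml         dns::server: ['192.0.2.53']
#
# Compiled for node web01:

# Priority lookup: the value from the first level that defines the key.
$first_match = hiera('dns::server')        # ['10.0.0.53']

# Array merge lookup: values collected from every level that defines the key.
$all_levels = hiera_array('dns::server')   # ['10.0.0.53', '192.0.2.53']
```

The data files are identical in both cases; only the function chosen in the manifest decides whether common.yaml contributes, which is exactly the knowledge the code is supposed not to need.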
> Puppet in fact provides three functions for lookups: there is
> also hiera_hash().
>
> In any case, you are quite right. Which sort of lookup is intended is an
> attribute of the data -- part of the definition of each key -- but it is
> not represented in or alongside the data. Each user of the data somehow
> has to know. That could be tolerated, inconvenient as it is, except that
> it is incompatible with automated data binding. This is an issue that has
> been recognized and acknowledged, though I'm uncertain whether it is
> actively being addressed.
Could you possibly expound on the "Each user of the data somehow has to know" part? I'm having trouble with the notion that people would use puppet manifests and hiera data without knowing what's in them.
(Replying to two people in one email, hum.)
I rather take your point, but isn't the requirement for different data handling just another data item? Is any code unaware of the underlying data structure? Even if you have a single type of data (plain string-like variables) your code is implicitly aware that it can treat them as that type.
I'm not really sure there's a way to automagically distinguish
"this is an array, do not retrieve its contents from all levels"
"this is an array, do retrieve its contents from all levels"
while still preserving our sanity.
(I've had some nasty run-ins with merging lookups and have decided they're mostly not for me, maybe the smarter people on this list are having better results.)
On Wednesday, March 11, 2015 at 1:57:00 PM UTC, Christopher Wood wrote:
> Could you possibly expound on the "Each user of the data somehow has to know" part? I'm having trouble with the notion that people would use puppet manifests and hiera data without knowing what's in them.
I can't speak for John but I think I get his meaning, but if I don't, here's my own opinion ;-)
If a user of a module is reading that module's documentation and parameters, it seems a bit nasty to assume the user must also go read the Puppet module code in great detail to find out what type of Hiera call is being used. Passing data to the module should be simply defined, e.g. "this parameter takes an array" or "this parameter is a comma separated string". For a module to assume that it can or should attempt to do some sort of deep merging seems overly complicated, and it shifts the focus away from the user providing the right data to a well-written module.
Rather than have "classname::merge => true" I would advocate something like this, which puts the user in complete control of the data reaching its modules in a correct and easily testable manner:
class profile::dns {
  # look up my DNS data
  $hiera_dns_server_array = hiera_array('dns::server')
  $common_dns_server = '127.0.0.1'
  class { 'resolv':
    # flatten() (from puppetlabs-stdlib) avoids passing a nested array
    dns_servers => flatten([ $hiera_dns_server_array, $common_dns_server ]),
  }
}
Something like this seems like I'm telling a module *how* to look up my own data, rather than passing the right data to the module:
class resolv (
  $dns_servers_key_name  = 'dns_servers',
  $dns_servers_key_merge = false,
) {
  if ($dns_servers_key_merge) {
    $dns_servers = hiera_array($dns_servers_key_name)
  } else {
    $dns_servers = hiera($dns_servers_key_name)
  }
}
class { 'resolv': dns_servers_key_merge => true }
I'd also have to code it to selectively use Hiera or not (some people don't) and that would get even worse. The second example of module design may be super awesomely flexible in terms of how I can structure my Hiera data, but it doesn't fit the direction the community is moving in terms of module design.
> I'd also have to code it to selectively use Hiera or not (some people don't) and that would get even worse. The second example of module design may be super awesomely flexible in terms of how I can structure my Hiera data, but it doesn't fit the direction the community is moving in terms of module design.
This is almost what I am looking for. I have an alternate approach: what if merging vs nonmerging is decided based on hiera key?
On Wednesday, March 11, 2015 at 4:35:36 PM UTC, Bostjan Skufca wrote:
> > I'd also have to code it to selectively use Hiera or not (some people don't) and that would get even worse. The second example of module design may be super awesomely flexible in terms of how I can structure my Hiera data, but it doesn't fit the direction the community is moving in terms of module design.
> This is almost what I am looking for. I have an alternate approach: what if merging vs nonmerging is decided based on hiera key?
That is my approach, that class would do an implicit Hiera lookup for those class parameters, I just illustrated the point with a resource-like declaration as an example. While the above method would work, I don't think I've made my point about not putting this personalised logic in the "resolv" module itself. The above example is not so good. Gary Larizza explains it very well here if you haven't seen it (https://www.youtube.com/watch?v=v9LB-NX4_KQ). That video should answer your questions in your second reply to me too, BTW.
The above code example is a bad idea for these reasons:
- the resolv module is tightly coupled to the data, it's in control of how it should look up data, rather than just be *given* data
- you won't be able to replace that resolv module with the super awesome puppetlabs_resolv module because of your custom way of handling data
- it makes a *very* bad assumption that everyone uses Hiera, it is not compatible for people who use ENCs that supply all class parameters for example
- there's a higher barrier to entry on understanding the module, some people would have to read the body of the resolv module code to figure out what's going on (or there would be a long README)
- it's more complicated to test because the range of data it can take is more complicated
Now expand on my first example:
********************
class puppetlabs_resolv($dns_servers) {
  file { '/etc/resolv.conf': content => template(...) }
}
class profile::dns {
  # look up my DNS data from Hiera
  $hiera_dns_server_array = hiera_array('dns::server')
  # and add a global DNS server I have
  $common_dns_server = '127.0.0.1'
  class { 'puppetlabs_resolv':
    # flatten() (from puppetlabs-stdlib) avoids passing a nested array
    dns_servers => flatten([ $hiera_dns_server_array, $common_dns_server ]),
  }
}
class profile::dns_special {
  # don't do a Hiera lookup, DNS here is special
  $special_dns = '10.1.1.1'
  class { 'puppetlabs_resolv':
    dns_servers => [ $special_dns ],
  }
}
node dc1 { include profile::dns }
node dc1_special { include profile::dns_special }
********************
The puppetlabs_resolv module I downloaded from GitHub does one thing well, resolv.conf, in a simple and easily understood manner, and it comes with RSpec tests, so I don't have to reinvent the wheel.
All of my business logic about how I get IP addresses into that resolv module is in my profile::dns* classes. These are *my* profile classes, I can do whatever crazy Hiera lookups and string manipulation I want/need to get the data into a format that puppetlabs_resolv takes. In other words my profiles are the "glue" between my data and the "building block" puppetlabs_resolv module. At any time I can replace puppetlabs_resolv with lukebigum_resolv (which is obviously better) with a few tweaks to my profiles. If I replace my data backend or get rid of Hiera entirely, my profile might have to be adjusted but I don't have to stop using that awesome lukebigum_resolv I downloaded.
Why the use of a second profile, profile::dns_special? It takes complexity out of Hiera. I don't need a complicated hierarchy when I've got profiles, and I rarely need inheritance at all. I've got my "tpl_%{::domain}" which is where my profile::dns looks up data from, and anything that's special is actually a different implementation of how I usually do DNS, so it gets its own profile, hence profile::dns_special. It is better to handle these exceptions in Puppet code because it's an *actual* language, rather than trying to model something complex into Hiera which is just a key-value store.
Your Hiera example where you have tpl_dc1.yaml and tpl_dc1-special.yaml is going to bite you. Your joke about mimicking node inheritance functionality in Hiera worries me a little, because it reminds me of some of my colleagues. Just because it can be modelled in Hiera doesn't mean it should be. To give you an example, at my workplace we can build an entire platform where each node's Hiera file looks like this:
---
ip_address_fourth_octet: 10
And the rest is abstracted, inherited and hidden away. In some ways it's really awesome, but it is also very hard to debug, and extraordinarily hard to understand. I once spent 2 hours tracing a string in a configuration file through too many Hiera files each with over a dozen levels of dictionary/hash depth, about 7 create_resources() calls, several exported resources and luckily only 3-4 recursive Hiera lookups. I was not happy by the end of that. Not long after, my team lead forced us to re-read the Roles and Profiles design pattern and to watch that video ;-)
(Replying to two people in one email, hum.)
On Wed, Mar 11, 2015 at 06:01:39AM -0700, jcbollinger wrote:
> On Tuesday, March 10, 2015 at 9:59:41 PM UTC-5, Bostjan Skufca wrote:
>
> On Monday, 9 March 2015 14:45:38 UTC+1, Christopher Wood wrote:
>
> On Sun, Mar 08, 2015 at 11:55:03AM -0700, Bostjan Skufca wrote:
> > With hiera:
> > - How would you go about when certain nodes need data merged from
> all
> > scopes, but other nodes need data from just the last scope?
>
> I've usually had a "classname::merge: true" key in hiera, controlling
> whether I use hiera() or hiera_hash() to obtain the data I need.
>
> And this hits the nail on the spot, even if unknowingly:)
> The problem I am seeing here and which I am only now being able to
> articulate, is the clash of two contradictory elements:
> 1. Puppet development is pushed towards decoupling code (manifest) from
> data, a noble goal
> 2. Puppet provides two functions, hiera() and hiera_array(), and the
> very existence of more than one function to retrieve data destroys the
> notion, that code should be unaware of underlying data storage details.
I rather take your point, but isn't the requirement for different data handling just another data item?
Is any code unaware of the underlying data structure? Even if you have a single type of data (plain string-like variables) your code is implicitly aware that it can treat them as that type.
I'm not really sure there's a way to automagically distinguish
"this is an array, do not retrieve its contents from all levels"
"this is an array, do retrieve its contents from all levels"
while still preserving our sanity.
> Puppet in fact provides three functions for lookups: there is
> also hiera_hash().
>
> In any case, you are quite right. Which sort of lookup is intended is an
> attribute of the data -- part of the definition of each key -- but it is
> not represented in or alongside the data. Each user of the data somehow
> has to know. That could be tolerated, inconvenient as it is, except that
> it is incompatible with automated data binding. This is an issue that has
> been recognized and acknowledged, though I'm uncertain whether it is
> actively being addressed.
Could you possibly expound on the "Each user of the data somehow has to know" part? I'm having trouble with the notion that people would use puppet manifests and hiera data without knowing what's in them.
On Wed, Mar 11, 2015 at 09:25:04AM -0700, Bostjan Skufca wrote:
> On Wednesday, 11 March 2015 14:57:00 UTC+1, Christopher Wood wrote:
> (I've had some nasty run-ins with merging lookups and have decided
> they're mostly not for me, maybe the smarter people on this list are
> having better results.)
>
> Care to elaborate a bit, especially how did you overcome them (define all
> data for each node)?
> b.
My desired behaviour for the puppetmaster compiling the catalog for an unlicenced host is to error out and fail the catalog compilation, highlighting the missing data at the earliest possible stage of the build. (Your mileage may vary; my rationale is that a server without all its requirements in place should not build. The "Finished catalog run" seems to instill a bit of false confidence that everything worked.)
On Thu, Mar 12, 2015 at 06:32:21AM -0700, jcbollinger wrote:
> No, it is metadata. The metadata could be lumped in with the
> regular data -- and in fact, the default back end provides no other
> alternative if you want to provide that metadata at all -- but that's
> untidy, and it doesn't play nicely with automated data binding.
>
>
>
> Is any code unaware of the underlying data structure? Even if you have a
> single type of data (plain string-like variables) your code is
> implicitly aware that it can treat them as that type.
>
> You're commingling two different concepts: the structure of the data
> provided by Hiera to Puppet, and the structure of the data in the external
> storage on which Hiera relies. Puppet needs to know about the former, but
> it shouldn't have to know or care about the latter. THAT's the whole
> point. The fact that there are three different Hiera lookup functions,
> and that they can return different data for the same key -- even data with
> different structure -- makes Puppet sensitive to the internal layout of
> Hiera's data files.
I grant that I'm not seeing the whole picture; I'm perfectly fine with the notion that code/data/metadata/structure are all subsets of the information required to correctly manage a host. I presume structure has to go somewhere and if it's not in the pp file it's just somewhere else I will have to know about and account for so I'm not really seeing what difference it makes. For instance, what breaks with the current thing that wouldn't if puppet just got data and the hiera_array vs hiera_hash determination was made elsewhere?
> And it would be possible. For example, the YAML back end could be
> modified to refer to an ancillary metadata file that flagged certain keys
> for array or hash-merge lookup. That's a bit ugly, but sometimes ugly
> happens when you have to retrofit.
I don't know that this is better or worse than having structural information about hiera in my pp files. I go from:
having two places where things go (hiera and puppet)
having structural information in each (yaml anchor/alias etc., puppet data bindings and hiera functions)
To:
having three places where things go (hiera, hiera metadata, puppet)
having structural information in two (yaml anchor/alias etc., hiera key flagging)
I've added a place and now I have more to think about, plus it's not obvious from my puppet code where my data is coming from, and I have a lab host where I don't actually want things tagged as merge-only to be merged while I'm experimenting. Ouch my brain.