Hiera Hash Merge - Avoiding Duplication


Dan Gibbons

Apr 29, 2015, 5:39:39 AM
to puppet...@googlegroups.com
Hi,

I'm just starting to use create_resources and hash merging, which I have working, but I'm not sure how I can avoid duplicating hash values that are already defined elsewhere in the hierarchy.

For example:

- I have set :merge_behavior: deeper and installed the deep_merge gem
- I'm using the eyaml backend

# hiera.yaml (partial)
:hierarchy:
  - "node/%{::fqdn}"
  - "%{::environment}"
  - "%{::location}"
  - "%{::flavour}"
  - common


# common.eyaml
---
windows_webconfig:
  website1:
    name: MyWebSiteExample
    32_bit: false
    pipeline_mode: Integrated
    runtime_version: v4.0
    root_web_folder: 'd:\webroot'
    bindings:
      -
        port: 80
        host_header: dan.local.com
        ip_address: '*'
        protocol: http
  website2:
    name: MyWebSiteExample2
    32_bit: false
    pipeline_mode: Integrated
    runtime_version: v4.0
    root_web_folder: 'd:\webroot'
    bindings:
      -
        port: 80
        host_header: dan.somewhere.com
        ip_address: '*'
        protocol: http


Now what I'd like to do is only update the bindings for particular environments and not have to duplicate the whole hash, something like this:

# uat.eyaml
---
windows_webconfig:
  website1:
    bindings:
      -
        port: 80
        host_header: dan.uat.com
        ip_address: '*'
        protocol: http
  website2:
    bindings:
      -
        port: 80
        host_header: dan.uat.com
        ip_address: '*'
        protocol: http
 


The above (uat.eyaml) doesn't work unless I replicate the whole hash, so I was wondering whether this is by design of the merging and whether there is another way to avoid duplicating the site definitions.  I experimented with separating the "bindings" out into a separate hash and passing this into create_resources, but it's kind of messy and doesn't work very well.

Thanks in advance.

Dan


Dan White

Apr 29, 2015, 8:30:46 AM
to puppet...@googlegroups.com
Have you tried plain YAML?
Your code looks OK, but I cannot be certain without tinkering. My initial thought is that the eyaml backend may be to blame.

"Sometimes I think the surest sign that intelligent life exists elsewhere in the universe is that none of it has tried to contact us."

Bill Waterson (Calvin & Hobbes)


jcbollinger

Apr 29, 2015, 9:22:30 AM
to puppet...@googlegroups.com
Like LinuxDan, I don't immediately see an issue with your YAML, and I think it should support hash merging in the way you want.  The issue might very well be with the (3rd-party) eyaml back end.  Hiera's architecture makes each back end responsible for all aspects of translating the keys presented to it into values.  For some, hash merging doesn't even make sense.

However, if you have updated any Ruby components used by the master, then do also be sure to restart it afterward.  The master has no means to switch out Ruby classes once it has loaded them.


John

Joseph Swick

Apr 29, 2015, 9:25:17 AM
to puppet...@googlegroups.com
On 29/04/15 08:30, Dan White wrote:
> Have you tried plain YAML ?
> Your code looks OK, but I cannot be certain without tinkering. My initial thought is that the eyaml backend may be to blame.

So this is an issue I've run into on a couple of machines. As far as I
can tell, it's not eyaml. I found this reference to the issue in
relation to Hiera's automatic data bindings and hash merging:

https://tickets.puppetlabs.com/browse/HI-118

It's been a bit of an issue for us on a couple of machines. As far as
I've been able to tell, explicit Hiera lookup calls are not affected,
only automatic bindings to parameters.


> "Sometimes I think the surest sign that intelligent life exists elsewhere in the universe is that none of it has tried to contact us."
> Bill Waterson (Calvin & Hobbes)
>
>> On Apr 29, 2015, at 5:39 AM, Dan Gibbons <dangibb...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm just starting to use create_resources and hash merging which I have working but I'm not sure how I can avoid duplicating some of the hash values further up in the hierarchy.
>>
<trim>

--
Joseph Swick <joseph...@meltwater.com>
Senior SaaS Operations Engineer
Meltwater Group


jcbollinger

Apr 30, 2015, 9:30:43 AM
to puppet...@googlegroups.com


On Wednesday, April 29, 2015 at 8:25:17 AM UTC-5, Joseph Swick wrote:
> On 29/04/15 08:30, Dan White wrote:
>> Have you tried plain YAML ?
>> Your code looks OK, but I cannot be certain without tinkering. My initial thought is that the eyaml backend may be to blame.
>
> So this is an issue I've run into on a couple of machines.  As far as I
> can tell, it's not eyaml, I found this reference to the issue with
> relation to the automatic data bindings for hiera and hash merging:
>
> https://tickets.puppetlabs.com/browse/HI-118



If you are talking about automated data binding then you threw us off by claiming that you were doing hash merging.  You are not.  Automated data binding uses only priority lookups, which is indeed the subject of HI-118.  Some time ago I recorded my thoughts about that as a comment on that issue, and I will not repeat them here.  At this point, however, if you want a hash-merge lookup then you must use the hiera_hash() function to get it.  That's its whole point.

 
> It's been a bit of an issue for a couple of machines for us.  As far as
> I've been able to tell, for where we do an actual Hiera lookup call it's
> not affected, just for automatic bindings to parameters.



It seems like you may be working under a distressingly common misconception: the choice between functions hiera(), hiera_array(), and hiera_hash() is not about the expected data type of the result; rather, it is about the lookup strategy:
  • The hiera() function performs an ordinary priority lookup, and it can return data of any type supported by Hiera.  That certainly includes arrays and hashes.
  • The hiera_array() function performs an array-merge lookup to yield all values associated with the key from all levels of the hierarchy, in a flattened array form.  At any given hierarchy level, the values do not have to be arrays.  The overall result is an array, but that's not the point.  You should not use this function to retrieve arrays if you want a standard priority lookup.
  • The hiera_hash() function performs a hash-merge lookup, the nature of which you seem to understand.  Each value associated with the key at any hierarchy level must be a hash for this to make sense, but you should not use this function to retrieve hashes if you want a standard priority lookup.
Priority lookup is Hiera's focus and default, and automated data binding uses that mode exclusively.  If you want hash-merge lookup then you must call hiera_hash() explicitly in your manifest.
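
To make the distinction between the three strategies concrete, here is a small Ruby sketch that models them over an in-memory hierarchy. It is not Hiera's implementation: the HIERARCHY data and the three *_lookup helper names are invented for illustration, and the hash merge shown is only a shallow, top-level one.

```ruby
# Toy model of the three lookup strategies (illustrative only, not Hiera's code).
# HIERARCHY is ordered from highest priority to lowest, like :hierarchy: in hiera.yaml.
HIERARCHY = [
  { 'users' => { 'alice' => { 'uid' => 1001 } }, 'packages' => 'vim' },          # node level
  { 'users' => { 'bob' => { 'uid' => 1002 } }, 'packages' => ['git', 'curl'] }   # common level
].freeze

# Priority lookup: the first level that defines the key wins, whatever the value's type.
def priority_lookup(key)
  level = HIERARCHY.find { |data| data.key?(key) }
  level && level[key]
end

# Array-merge lookup: collect the values from every level and flatten them into one array.
def array_lookup(key)
  HIERARCHY.select { |data| data.key?(key) }.map { |data| data[key] }.flatten.uniq
end

# Hash-merge lookup: merge the hashes from every level; higher-priority keys win.
def hash_lookup(key)
  HIERARCHY.reverse.reduce({}) do |merged, data|
    data.key?(key) ? merged.merge(data[key]) : merged
  end
end

p priority_lookup('users')   # only the node-level hash
p array_lookup('packages')   # values from both levels, flattened
p hash_lookup('users')       # keys from both levels
```

Note that priority_lookup('users') returns a hash even though no merging takes place, which is the point made above: the function chosen selects the lookup strategy, not the type of the result.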


John

Joseph Swick

Apr 30, 2015, 1:10:19 PM
to puppet...@googlegroups.com
On 30/04/15 09:30, jcbollinger wrote:
>
<trim of summary of Hiera lookups>
>
> Priority lookup is Hiera's focus and default, and automated data binding
> uses that mode exclusively. If you want hash-merge lookup then you must
> call hiera_hash() explicitly in your manifest.
>
>
> John
>

John,
Here's a simplified example of one of my issues. I'm using the Puppet
Labs MySQL module to manage the base configuration of a MySQL
master/slave pair (replication is set up manually after initial
provisioning). Our Hiera config is set for deeper merges. The behavior
I expect is that I should be able to set the common entries for both
servers' "mysql::server::override_options:" parameter hash at a lower
Hiera level and put the server-specific override options at the
host-specific level. However, because of how automatic data bindings
work, only the highest-priority hash is used.

Example config (this doesn't work as intended for automatic data bindings):

site.pp:
node default {
  hiera_include('classes')
}

hiera.yaml ('server_role' is a custom fact based on hostname):
---
:backends: yaml
:yaml:
  :datadir: "/etc/puppet/environments/%{::environment}/hieradata"
:hierarchy:
  - "%{::clientcert}"
  - "%{::environment}-%{::server_role}"
  - common
:merge_behavior: deeper


dev-db.yaml:
---
classes:
  - mysql::server
mysql::server::remove_default_accounts: true
mysql::server::root_password: some_password
mysql::server::override_options:
  mysqld:
    default_storage_engine: InnoDB
    innodb_file_per_table: 1
    bind_address: 0.0.0.0
    log_bin: mysql-bin
    binlog_format: mixed
    expire_logs_days: 2
    datadir: /var/lib/mysql
    innodb_flush_log_at_trx_commit: 1
    sync_binlog: 1
    max_connections: 2000


dev-db-01.mydomain.net.yaml:
---
mysql::server::override_options:
  mysqld:
    server_id: 1

dev-db-02.mydomain.net.yaml:
---
mysql::server::override_options:
  mysqld:
    server_id: 2


The above is how I would expect the deeper merge to work, and I
think the original poster has this same issue. Instead, what I have to do
is duplicate the hash from "mysql::server::override_options:" into both
servers' files. With my example as written, the priority lookup does no
hash merging, so the only setting that gets applied is the server ID.
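
The gap between the expected and the actual result can be sketched in Ruby. This is a simplified model rather than Hiera or the deep_merge gem: recursive_merge is a hypothetical helper that merges nested hashes with the higher-priority value winning on conflict, the real gem's handling of arrays is not modelled, and the data is a trimmed copy of the example above.

```ruby
# Trimmed copies of the example data (illustrative; most keys omitted).
role_level = { 'mysqld' => { 'bind_address' => '0.0.0.0', 'max_connections' => 2000 } }
node_level = { 'mysqld' => { 'server_id' => 1 } }

# Hypothetical helper: recursively merge nested hashes; the higher-priority
# value wins whenever both sides define the same non-hash key.
def recursive_merge(low, high)
  low.merge(high) do |_key, low_val, high_val|
    low_val.is_a?(Hash) && high_val.is_a?(Hash) ? recursive_merge(low_val, high_val) : high_val
  end
end

# A priority lookup stops at the first level that defines the key,
# so only the node-level hash comes back.
priority_result = node_level

# A hash-merge lookup with a recursive merge combines both levels.
merged_result = recursive_merge(role_level, node_level)

p priority_result
p merged_result
```

In this model the priority result contains server_id alone, while the merged result carries bind_address, max_connections, and server_id together.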

Since it's the official Puppet Labs MySQL module, I'm not going to go
and change every hash parameter in the module to a hash lookup function,
because it would probably break something else. So I deal with the
workaround of unnecessary duplication of data in Hiera and try to let
everyone I work with know of this limitation for hash lookups and
automatic data bindings when working with 3rd-party modules.

We certainly can (and do) use an explicit hiera_hash() lookup in some of
our own internal modules, but this results in inconsistent behavior due
to the limitations of automatic data bindings. The Hiera issue is
the only reference to it I could find when I first started looking into
what was going on and why I wasn't getting the results I expected. It's
even mentioned in the Hiera documentation:

https://docs.puppetlabs.com/hiera/1/lookup_types.html#deep-merging-in-hiera--120

jcbollinger

May 1, 2015, 9:40:55 AM
to puppet...@googlegroups.com


On Thursday, April 30, 2015 at 12:10:19 PM UTC-5, Joseph Swick wrote:
> On 30/04/15 09:30, jcbollinger wrote:
>>
>> <trim of summary of Hiera lookups>
>>
>> Priority lookup is Hiera's focus and default, and automated data binding
>> uses that mode exclusively.  If you want hash-merge lookup then you must
>> call hiera_hash() explicitly in your manifest.
>>
>>
>> John
>>
>
> John,
> [...]
> provisioning).  Our hiera config is set for deeper merges.


And that's relevant only when you are performing hash-merge lookups.

 
> The behavior
> I expect is that I should be able to set the common entries for both
> servers' "mysql::server::override_options:" parameter hash at a lower
> Hiera level and put the server-specific override options at the
> host-specific level.


And you can.  And that should do what you want when you perform a hash-merge lookup to retrieve the data.

 
> However, because of how automatic data bindings work, only the
> highest-priority hash is used.



Yes.  That's what I just finished telling you.  Automated data binding uses only priority lookups, not hash-merge lookups.

[...]
 
> The above is how I would expect the deeper merge to work, and I
> think the original poster has this same issue.  Instead, what I have to do
> is duplicate the hash from "mysql::server::override_options:" into both
> servers' files.  With my example as written, the priority lookup does no
> hash merging, so the only setting that gets applied is the server ID.



Yes, that's one viable alternative.

 
> Since it's the Official Puppet Labs MySQL module, I'm not going to go
> and change every hash parameter in the module to a hash lookup function,
> because it would probably break something else.


That's up to you.  I can't say I blame you.  Sometimes you have only bad alternatives.

 
> So I deal with the workaround
> of unnecessary duplication of data in Hiera and try to let
> everyone I work with know of this limitation for hash lookups and
> automatic data bindings when working with 3rd-party modules.



It's not a limitation of hash-merge lookups, nor of lookups of values that happen to be hashes.  Perhaps I wasn't clear: the fact that a particular key in your hiera data happens to have a hash as its associated value does not imply and is not meant to imply that a hash-merge lookup should be performed to retrieve that value.  You get a hash-merge lookup only via the hiera_hash() function.

The issue was first characterized as a limitation of automated data binding, but it is better characterized as a limitation of Hiera's built-in back ends.  Which type of lookup to use (at least by default) should be a characteristic of the data; it should not be the responsibility of the data consumer to guess which type of lookup to use.

 
> We certainly can (and do) use an explicit hiera_hash() lookup in some of
> our own internal modules, but this results in inconsistent behavior due
> to the limitations of the automatic data bindings.


I'm not following what's inconsistent about getting hash merge lookups when you request them, and not when you don't.  I can understand wishing that you could request them (or better: not need to request them) for automated data binding, but that's not a consistency issue.

 
> The hiera issue is
> the only reference to it I could find when I first started looking into
> what was going on and why I wasn't getting the results I expected.  It's
> even mentioned in the hiera documentation:
>
> https://docs.puppetlabs.com/hiera/1/lookup_types.html#deep-merging-in-hiera--120



I'm not sure what you're trying to say there.  Yes, hash-merge behavior is described in the hiera docs.  That's relevant when you request a hash-merge lookup.  Not so much when you request a priority lookup, whether explicitly or implicitly.


John

Ramin K

May 1, 2015, 4:59:55 PM
to puppet...@googlegroups.com
On 4/30/15 10:10 AM, Joseph Swick wrote:
>
> dev-db-01.mydomain.net.yaml:
> ---
> mysql::server::override_options:
>   mysqld:
>     server_id: 1
>
> dev-db-02.mydomain.net.yaml:
> ---
> mysql::server::override_options:
>   mysqld:
>     server_id: 2


John covered the technical side in detail. However, I'd like to apply some
sysadmin knowledge to your example problem. With Puppet or any other
config framework it's important to specify as little as possible within
your system. Another way to put it is to manage only what you need to
and nothing more. That gives you more flexibility as you grow, because your
code never cared that x might be more than a single digit. It also keeps
you from having to solve technical problems that do not benefit your system.

In this case MySQL's server_id is a 32-bit integer. It needs to be
unique within a set of servers that replicate with each other. That's
it. The number has no meaning other than as an identifier. Once you realize
that only the uniqueness of the number matters, not its value, you can
easily manage it programmatically.

# not sure how to quote this for insertion into Hiera
# I'll leave that to you
server_id = <% require 'ipaddr'%><%= IPAddr.new(@ipaddress).to_i %>

IPs should be unique between servers replicating with each other which
makes the ipaddress fact easy to reuse.
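
As a rough illustration of the idea, the conversion used in the ERB snippet above can be tried in plain Ruby. The addresses here are made-up examples and server_id_for is a hypothetical helper name; inside a template the address would come from the ipaddress fact instead.

```ruby
require 'ipaddr'

# Convert an IPv4 address to its 32-bit integer form for use as a server_id.
# server_id_for is an illustrative helper, not part of any module.
def server_id_for(ip)
  IPAddr.new(ip).to_i
end

# Two hosts with different addresses get different integers.
puts server_id_for('10.0.0.5')
puts server_id_for('10.0.0.6')
```

Because the mapping from an IPv4 address to an integer is one-to-one, two servers only collide if they share an address.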

I encourage you to think about any values you add to Hiera, particularly
at the fqdn level, and decide whether they actually need to be
specially managed or whether some statement at the env, os, etc. level
might be a better option.

Ramin

Joseph Swick

May 4, 2015, 10:04:37 AM
to puppet...@googlegroups.com
My issue is that as more modules support the passing of custom parameters
via a hash that people populate with Hiera lookups (regardless of
storage backend), people are going to run into this issue even more
and wonder why things don't work the way they expect.
Perhaps the docs need to state explicitly that automatic data bindings
between Hiera and Puppet modules are priority-only and won't merge any
hashes, because the docs imply that merges would happen with hashes
(when they obviously don't).

To me there is a disconnect between the documentation of Hiera lookups
with the function calls and the automatic data binding feature of Hiera.

In my organization, I'm not the only one who's gotten burned by the
difference in merge behavior with automatic data bindings. Those
two pages I linked to have been the only documentation I've been able
to find to pass on to people internally when they come across the problem.

jcbollinger

May 5, 2015, 8:52:34 AM
to puppet...@googlegroups.com


On Monday, May 4, 2015 at 9:04:37 AM UTC-5, Joseph Swick wrote:
> My issue is that as more modules support the passing of custom parameters
> via a hash that people populate with Hiera lookups (regardless of
> storage backend), people are going to run into this issue even more
> and wonder why things don't work the way they expect.
> Perhaps the docs need to state explicitly that automatic data bindings
> between Hiera and Puppet modules are priority-only and won't merge any
> hashes, because the docs imply that merges would happen with hashes
> (when they obviously don't).


I suppose we differ on what the docs imply and what they don't, but certainly they could be more explicit in this area.  If that's indeed what this comes down to then I encourage you to file a ticket against the docs.


John
