As per PUP-3901, the host type has some serious issues. Major issues with the current design:
- The namevar:
  - It's currently the canonical hostname. This means that a hostname can be the canonical representation for at most 1 IP address. This is a problem if, for example, you want to provide both IPv4 and IPv6 addresses for a hostname.
  - Changing it to be the IP address would mean that an IP address could have at most one canonical hostname associated with it. This is less of an issue, but still not ideal.
  - Probably the best solution here is to change this to be both the IP address and the canonical hostname (e.g. "1.2.3.4/example.com"). However...
- Parsing is flawed:
  - Multiple records with the same value for the namevar (currently the canonical hostname) overlap and only one is registered. Modifying or removing records that overlap behaves inconsistently and, in the case of removal, requires multiple runs to achieve consistency. Examples in the issue's description.
  - Changing the namevar to be the IP address or both the IP address and the canonical hostname could cause problems on Windows, where the number of hostname aliases per record is limited. This could be resolved by having the provider split a resource into multiple records in the file if the underlying system has alias count limits.
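To make the namevar collision concrete, here is a minimal Python sketch. It is not Puppet's parser: the record tuples, the `index_by` helper, and the choice that the first record wins are all assumptions made for illustration. It only shows why keying by canonical hostname loses a record while a composite "ip/hostname" key keeps both.

```python
# Hypothetical records as (ip, canonical_hostname, aliases); values are illustrative.
RECORDS = [
    ("1.2.3.4", "test", []),
    ("ffff::0001", "test", []),
]


def index_by(records, key_func):
    """Index records by key; the first record seen for a key is kept (an assumption)."""
    index = {}
    for record in records:
        index.setdefault(key_func(record), record)
    return index


# Keyed by canonical hostname alone: the two records collide.
by_hostname = index_by(RECORDS, lambda r: r[1])

# Keyed by "ip/hostname": each record gets its own entry.
by_ip_and_hostname = index_by(RECORDS, lambda r: f"{r[0]}/{r[1]}")

print(len(by_hostname), len(by_ip_and_hostname))
```

The composite key is only one possible encoding; the separator and field order are taken from the "1.2.3.4/example.com" example above.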
The other issues are all consequences of these two issues:
- Inconsistent resource modification and removal (examples in the issue's description) is a result of namevar collision.
- Removal of a hostname causing removal of all the aliases is more of a documentation issue than anything. So long as this is explicitly called out as expected behavior, it's not a problem.
As such, my proposed changes to the host type are to:
1. Change the generated resource namevar (and, by extension, the alias for specified resources) to use both the IP address and the canonical hostname.
2. Fix parsing to handle cases where multiple records specify the same namevar (which, after change #1, would be an IP address and canonical hostname) by merging them into a single resource.
3. Update documentation to indicate that the hostname aliases are not first-class host items and that, when a hostname is removed, all aliases are removed too. If the user wants to retain a hostname alias while removing a hostname, they'll need to put it into a different host resource.
4. To allow manifests to set relationships to hosts without knowing ahead of time what the IP address is, potentially provide resource aliases with titles set to the hostname and all the hostname aliases. Unfortunately, this runs into an issue when multiple host resources service the same hostname; blindly making resource aliases would result in each trying to alias to the same name, but conditionally aliasing based on whether an alias already exists would result in a relationship attaching to different host resources depending on parse order. This is a problem that I'm not sure how to solve.
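The merging step in the second proposed change could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the provider's implementation: `parse_hosts` and the sample text are hypothetical, comments are assumed to start with `#`, and aliases are merged in first-seen order.

```python
def parse_hosts(text):
    """Parse hosts-file text into {(ip, canonical): [aliases]}, merging duplicate keys."""
    merged = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no hostname; skipped in this sketch
        ip, canonical, *aliases = fields
        known = merged.setdefault((ip, canonical), [])
        for alias in aliases:
            if alias not in known:
                known.append(alias)
    return merged


SAMPLE = """\
# illustrative content only
1.2.3.4 example.com www
1.2.3.4 example.com mail www
ffff::0001 example.com
"""

resources = parse_hosts(SAMPLE)
for (ip, canonical), aliases in resources.items():
    print(f"{ip}/{canonical}: {aliases}")
```

Here the two `1.2.3.4 example.com` records collapse into one resource carrying both aliases, while the IPv6 record stays separate because its composite key differs.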
Furthermore, since a resolution to this issue would almost definitely be a breaking change, I recommend that we try to get it in for Puppet 4. If we can figure out a solution for the problem in change #4, I can hammer out a revised type, provider, tests, and documentation ASAP. Any thoughts?
As far as I can tell, it is a design characteristic of the current hosts file format that it associates each address with exactly one canonical name, and each canonical name with exactly one address. This is a bit ticklish, though, because there seems to be no canonical reference for the file format itself. Nevertheless, the Linux manpage for it says it has one line per IP address, and that "[f]or each host a single line should be present [...]" (emphasis added).
1.1.1.1 host
1.1.2.2 host
1.1.1.1 other
1.1.1.1 host
1.1.2.2 host other
1.1.1.1 other
1.1.1.1 host other
1.1.2.2 host
1.1.1.1 other
1.1.1.1 host other
1.1.2.2 host
Indeed, though the type's documentation merely says that a Host resource represents a "host entry", the longtime design demonstrates that it more specifically represents a mapping from a canonical hostname to properties of that hostname including a network address. It is implicit in the historic use of hostname alone as namevar that duplicate canonical names cannot be modeled. That these entries are typically recorded in /etc/hosts (on some systems) is in fact a function of the provider and of the 'target' property, so really the format and allowed usage of particular host files in particular contexts can be only weak guidance for whether the model is appropriate.
Objection, Your Honor! Describing the issue as flawed parsing assumes that the files being parsed are correct, and that they are (intended to be) supported by Puppet, but the validity of both assertions is unclear. To be sure, Host files containing more than one record bearing the same canonical name do not comply with Puppet's model for host entries. It is unsurprising that Puppet does not handle such files well, but that could as easily be ascribed to invalid/incompatible files as to flawed parsing.
Additionally, perhaps it would be better to deprecate the Host resource in favor of something different, maybe a "HostEntry", that is not burdened with the same limitations.
On Tuesday, January 27, 2015 at 2:53:46 PM UTC-8, John Bollinger wrote: As far as I can tell, it is a design characteristic of the current hosts file format that it associates each address with exactly one canonical name, and each canonical name with exactly one address. This is a bit ticklish, though, because there seems to be no canonical reference for the file format itself. Nevertheless, the Linux manpage for it says it has one line per IP address, and that "[f]or each host a single line should be present [...]" (emphasis added).
I've done some investigation into various implementations of DNS resolvers and can say that the documentation (or at least this interpretation thereof) is inaccurate. getaddrinfo(3), which is used on all platforms including Windows to resolve hostnames, provides a linked list of results and can optionally provide the canonical hostname. When a hostname is listed in the hosts file multiple times, getaddrinfo(3) provides all the IP addresses that are listed for that host, just like it would if it had to fetch that information from DNS.
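One way to check this behavior on a given machine is to ask the resolver directly. The Python sketch below wraps the standard library's `socket.getaddrinfo`; the `resolve_all` helper is hypothetical, and what it returns for a name listed several times in the hosts file depends on the platform's resolver and its configuration, so treat the result as an observation rather than a guarantee.

```python
import socket


def resolve_all(hostname):
    """Return the unique addresses getaddrinfo reports for hostname, in the order returned."""
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in socket.getaddrinfo(hostname, None):
        address = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if address not in addresses:
            addresses.append(address)
    return addresses


try:
    print(resolve_all("localhost"))
except socket.gaierror as error:
    print(f"lookup failed: {error}")
```

Running it against a hostname that appears on two hosts-file lines (for example, one IPv4 and one IPv6 record) shows whether the local resolver reports both addresses.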
So yeah. I do think that the host type should support specifying multiple IPs for the same hostname, because every resolver implementation I can track down seems to support that (with the possible exception of Solaris, which I can check tomorrow, though I very much doubt that it'll prove an outlier).
This makes sense as the hosts file is something of a poor man's lightning-fast DNS server. It may be worth also putting some logic in to detect cases where a hostname is specified as both a canonical name and an alias and throwing a warning or an error. Also to detect cases (at least on OS X) where an alias is specified for some but not all of the IP addresses listed for its canonical name.
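The two checks suggested here could be sketched as follows. This is a hypothetical Python illustration, not type or provider code: `check_records`, the record tuples, and the warning wording are all assumptions, and whether each condition should be a warning or an error is left open.

```python
def check_records(records):
    """Return warning strings for a list of (ip, canonical, aliases) records."""
    warnings = []

    # Check 1: a name used as a canonical name in one record and an alias in another.
    canonicals = {canonical for _, canonical, _ in records}
    all_aliases = {alias for _, _, aliases in records for alias in aliases}
    for name in sorted(canonicals & all_aliases):
        warnings.append(f"{name} is used as both a canonical name and an alias")

    # Check 2: an alias present for some, but not all, addresses of a canonical name.
    by_canonical = {}
    for ip, canonical, aliases in records:
        by_canonical.setdefault(canonical, {}).setdefault(ip, set()).update(aliases)
    for canonical, per_ip in sorted(by_canonical.items()):
        every_alias = set().union(*per_ip.values())
        for alias in sorted(every_alias):
            missing = sorted(ip for ip, aliases in per_ip.items() if alias not in aliases)
            if missing:
                warnings.append(
                    f"alias {alias} of {canonical} is missing for {', '.join(missing)}"
                )
    return warnings


# Illustrative records only.
EXAMPLE_RECORDS = [
    ("1.2.3.4", "example.com", ["www"]),
    ("ffff::0001", "example.com", []),
    ("5.6.7.8", "www", []),
]

for message in check_records(EXAMPLE_RECORDS):
    print(message)
```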
Indeed, though the type's documentation merely says that a Host resource represents a "host entry", the longtime design demonstrates that it more specifically represents a mapping from a canonical hostname to properties of that hostname including a network address. It is implicit in the historic use of hostname alone as namevar that duplicate canonical names cannot be modeled. That these entries are typically recorded in /etc/hosts (on some systems) is in fact a function of the provider and of the 'target' property, so really the format and allowed usage of particular host files in particular contexts can be only weak guidance for whether the model is appropriate.
By contrast, in Chef the namevar is the IP address. The fact that the canonical hostname is the namevar here is merely a choice that was made when the type was designed, and I am of the opinion that the design is flawed.
Objection, Your Honor! Describing the issue as flawed parsing assumes that the files being parsed are correct, and that they are (intended to be) supported by Puppet, but the validity of both assertions is unclear. To be sure, Host files containing more than one record bearing the same canonical name do not comply with Puppet's model for host entries. It is unsurprising that Puppet does not handle such files well, but that could as easily be ascribed to invalid/incompatible files as to flawed parsing.
This is true, and doesn't really matter for a system entirely provisioned by/with Puppet, but if one is attempting to Puppetize infrastructure that already exists, this may well be a problem, one with no good solution at present other than to fall back to managing the hosts file with a file resource. This is suboptimal.
Additionally, perhaps it would be better to deprecate the Host resource in favor of something different, maybe a "HostEntry", that is not burdened with the same limitations.
I actually suggested this very thing in my most recent comment on the JIRA issue, albeit as part of a way to enable relationships without requiring collectors.
DNS resolvers are irrelevant to the hosts file format. The hosts file is not a DNS data source, even if resolver libraries happen to consult it, too. In fact, the original objective of DNS was to altogether replace hosts files.
At the time the type was designed, however, hostname as namevar better suited Puppet's capabilities and users' requirements, particularly with respect to resource relationships. It would be rare for another resource to depend on there being a name mapped to (say) address 10.11.12.13, but it is reasonably common for another resource to depend on a name such as 'myservice.my.com' being mapped to an address. Only with the advent of the chain operators did Puppet gain the ability to declare relationships other than via the target resource's namevar.
I think your suggestion could be tweaked such that introduction of the proposed new resource type didn't require any change to most existing code. If, just as you suggested, the hypothetical new type, "Hostentry" / "Hostname" / whatever,
- autogenerated a corresponding Host resource, and
- created a 'before' relationship from itself to the corresponding Host,
then other resources could continue to 'require' Host resources just as they do now. (I assert that other than as described above, no useful purpose is served by a 'before' edge targeting a Host resource).
Furthermore, with a bit of care in the type implementation, and perhaps a new parameter or two for the Host resource, this scheme could also cause duplicate resource errors to be properly thrown when a manually-declared Host resource conflicts with a Hostentry resource.
Avoiding a breaking change is an attractive feature of this strategy in its own right, but it has the additional advantage that it does not require hurrying to get an implementation into Puppet 4 (or waiting years for Puppet 5). With P4 imminent, I am uncertain whether there is any chance at this point to get additional changes or features into 4.0.0.
[rnelson0@test ~]$ cat /etc/hosts
# HEADER: This file was autogenerated at Thu Jan 22 22:08:17 +0000 2015
# HEADER: by puppet. While it can still be managed manually, it
# HEADER: is definitely not recommended.
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
1.2.3.4 test
ffff::0001 test
[rnelson0@build profile]$ puppet resource host
host { 'localhost':
  ensure => 'present',
  host_aliases => ['localhost.localdomain', 'localhost4', 'localhost4.localdomain4'],
  ip => '127.0.0.1',
  target => '/etc/hosts',
}
host { 'test':
  ensure => 'present',
  ip => '1.2.3.4',
  target => '/etc/hosts',
}