avoiding duplicate package definitions with stdlib's "ensure_packages"


Jonathan Proulx

May 21, 2013, 1:27:28 PM
to puppet...@googlegroups.com
One of the most frustrating things about puppet is duplicate definitions of packages.

The "ensure_packages" function from stdlib seems very much like the correct way to handle this:

newfunction(:ensure_packages, :type => :statement, :doc => <<-EOS
Takes a list of packages and only installs them if they don't already exist.
    EOS
  ) do |arguments|

Why isn't this a core feature?  Or better, why aren't resources that are identical merged?  Clearly there is a conflict if they are defined with different parameters (ensure => installed vs. ensure => latest, for example), but if they are really identical, where is the conflict?

My current situation is that we define parted and xfsprogs in our local site configs and the enovance ceph modules also do so.  Both of these seem reasonable to me: we need them to make all our systems go, and the ceph module needs them to set up the object storage volumes (which is its primary function) and can't assume everyone uses parted and xfs.

The two prevailing opinions in my web readings seem to be these.  One is to use virtual resources, which is fine for local modules but not so good for sharing.  The other is the purist opinion that if there is a conflict, the conflicting parts should be split out into an independent module, which is fine for large functional chunks like httpd but ridiculous for single utilities that require no configuration.

Yet there is little discussion of ensure_packages as an alternative.  Is this for cause or just because it is not well known?

-Jon

Joe Topjian

May 21, 2013, 2:21:33 PM
to puppet...@googlegroups.com
Wow - I never knew about ensure_packages. I'm not sure if I overlooked it or if it's relatively new, but either way, I agree that it should be more widely used.

I run into the same issue that you described when using third-party modules. I try to avoid this in my own modules by only managing packages that are core to the module and expecting the user to manage secondary packages on their own. Unfortunately, this method puts more responsibility on the user. ensure_packages might be a good solution to this.


--
Joe Topjian
Systems Architect
Cybera Inc.


Cybera is a not-for-profit organization that works to spur and support innovation, for the economic benefit of Alberta, through the use of cyberinfrastructure.

jcbollinger

May 21, 2013, 4:18:14 PM
to puppet...@googlegroups.com


On Tuesday, May 21, 2013 12:27:28 PM UTC-5, Jonathan Proulx wrote:
> One of the most frustrating things about puppet is duplicate definitions of packages.
>
> The "ensure_packages" function from stdlib seems very much like the correct way to handle this:
>
> newfunction(:ensure_packages, :type => :statement, :doc => <<-EOS
> Takes a list of packages and only installs them if they don't already exist.
>     EOS
>   ) do |arguments|
>
> Why isn't this a core feature?  Or better, why aren't resources that are identical merged?  Clearly there is a conflict if they are defined with different parameters (ensure => installed vs. ensure => latest, for example), but if they are really identical, where is the conflict?



This topic has come up before.  You will probably find a lot of discussion about it under the heading of "module compatibility".

Anyway, where resources are indeed identical, I see no inherent harm in merging them.  Your manifest set would become more brittle the more it relied on such merging, but I suspect the feature would have nevertheless been implemented already if it were easy to do.  I can think of several reasons why it might actually be tricky to do this.

 
> My current situation is that we define parted and xfsprogs in our local site configs and the enovance ceph modules also do so.  Both of these seem reasonable to me: we need them to make all our systems go, and the ceph module needs them to set up the object storage volumes (which is its primary function) and can't assume everyone uses parted and xfs.
>
> The two prevailing opinions in my web readings seem to be these.  One is to use virtual resources, which is fine for local modules but not so good for sharing.  The other is the purist opinion that if there is a conflict, the conflicting parts should be split out into an independent module, which is fine for large functional chunks like httpd but ridiculous for single utilities that require no configuration.


The key factor here is that you very much (should) want a single authoritative source describing each managed resource.  That is what you get by putting it in a separate module on which all others that need the resource rely.  I don't find that ridiculous at all, given how lightweight modules are, but if it made you feel better then you could plan for a single module with which to meet many such needs.

Using virtual resources also gives you a single authoritative source, but it leaves open the question of which module the resource should belong to.  Indeed, it might not be unreasonable to make the resource virtual and put it into a separate module.  Whether use of virtual resources across module boundaries is conducive to sharing matters only if you are planning to share.
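
For readers unfamiliar with the approach, a minimal sketch of a virtual Package resource kept in its own class might look like this.  The class names site::packages and ceph_local are hypothetical, chosen only for illustration; they do not come from any module mentioned in this thread.

```puppet
# Hypothetical class acting as the single authoritative source
# for shared utility packages.
class site::packages {
  # The leading '@' makes the resources virtual: they are declared here,
  # but are not added to a node's catalog until something realizes them.
  @package { ['parted', 'xfsprogs']:
    ensure => installed,
  }
}

# Hypothetical consumer class. Any number of classes can realize the same
# virtual resource without causing a duplicate declaration.
class ceph_local {
  include site::packages
  realize(Package['parted'], Package['xfsprogs'])
}
```

The trade-off is the one noted above: every consumer has to know where the virtual declaration lives.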

The simple fact that two modules both depend on the same resource, while neither can depend on the other, indicates to me that the resource in question does not belong in either module.  It is a question of accurately modeling the target configuration space, and how large or small the resulting classes and modules end up being is at best a secondary consideration.  If you use crude models, then you are likely to encounter problems from time to time, and that's exactly what is happening to you now.
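
As a rough sketch of the separate-module approach, assuming a hypothetical one-class module named parted (the class names site::base and storage below are likewise made up for illustration):

```puppet
# Hypothetical tiny module, e.g. modules/parted/manifests/init.pp.
# It is the only place where Package['parted'] is declared.
class parted {
  package { 'parted':
    ensure => installed,
  }
}

# Hypothetical local site class that needs the package.
class site::base {
  include parted
}

# Hypothetical storage class that also needs it. Including a class more
# than once is safe: it is added to the catalog only once.
class storage {
  include parted
}
```

Both consumers depend on the small module rather than on each other, so the package has exactly one declaration.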

 

> Yet there is little discussion of ensure_packages as an alternative.  Is this for cause or just because it is not well known?



It is at least partly because ensure_packages() is little known.  It may also be because ensure_packages() is itself a crude tool, susceptible to producing inconsistent configurations for different machines and/or over time, and applicable only to Package resources.  It is worse than merging identical resource declarations, because it will hide even non-identical declarations of the same resource.  If you use it then it is highly likely to take a chunk out of your gluteus maximus sometime in the future.


John

David Schmitt

May 22, 2013, 2:48:26 AM
to puppet...@googlegroups.com
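On Tue, 21 May 2013 13:27:28 -0400, Jonathan Proulx <j...@jonproulx.com> wrote:
> Yet there is little discussion of ensure_packages as an alternative.  Is this for cause or just because it is not well known?
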
Just because it is not well known. ensure_packages seems to be designed
for exactly the use-case you are describing: you have two modules both
requiring the same utility packages without caring at all about versions or
anything else.

To elaborate: contrary to (most, all?) other resources, packages have this
no-parameter use case, which allows conflict-free merging. Managing more
complex resources (users, files) would require more cooperation between
modules to ensure that the different requirements do not step on each
other.


Regards, David

jcbollinger

May 22, 2013, 10:00:10 AM
to puppet...@googlegroups.com


On Wednesday, May 22, 2013 1:48:26 AM UTC-5, David Schmitt wrote:
> Just because it is not well known. ensure_packages seems to be designed
> for exactly the use-case you are describing: you have two modules both
> requiring the same utility packages without caring at all about versions or
> anything else.
>
> To elaborate: contrary to (most, all?) other resources, packages have this
> no-parameter use case, which allows conflict-free merging. Managing more
> complex resources (users, files) would require more cooperation between
> modules to ensure that the different requirements do not step on each
> other.


No, the only resource types that could have no-parameter use cases would be those with no parameters, but there are none such.  Some types, including Package, admit sensible use cases involving only default parameter values, but that's a rather different thing, and not necessarily the thing you want.  Also, either ensure_packages() must ignore the context-specific Package parameter defaults (in which case those won't work, which could be a hard-to-debug surprise) or else uses of the function in different contexts can be inequivalent, so that one set of package parameters is chosen over all others, at semi-random (also a potential nasty surprise).

Furthermore, ensure_packages() is inherently parse-order dependent, and parse-order dependencies are a major source of concern in Puppet manifest design.  That is mitigated in this case if you require that packages used with ensure_packages() are never declared any other way, but then you're placing a constraint on your manifest set that is easy to overlook, that might not be enforced by the catalog compiler, and that you cannot by any means expect third-party modules to comply with.
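
To illustrate what a parse-order dependency looks like in general, here is a sketch that uses the core defined() function as a stand-in for any "declare it only if nobody else has" helper.  It is not a claim about how ensure_packages() is implemented, and the class names module_a and site_local are hypothetical.

```puppet
# Hypothetical third-party class: declares wget only if no
# Package['wget'] has been declared at the moment this line is evaluated.
class module_a {
  if ! defined(Package['wget']) {
    package { 'wget':
      ensure => installed,
    }
  }
}

# Hypothetical local class with a conventional declaration.
class site_local {
  package { 'wget':
    ensure => latest,
  }
}

# If site_local is evaluated first, the guard in module_a sees the
# resource and skips its own declaration, so compilation succeeds.
# If module_a is evaluated first, the guard declares the package and
# site_local then fails with a duplicate declaration error.
include site_local
include module_a
```

Swapping the two include lines changes the outcome, which is the kind of fragility described above.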

Ensure_packages() is bad juju.  If you design your manifest set well in the first place then you will not need it.  Sadly, the current state of module compatibility is that in some cases you may need to refactor third-party modules to achieve the overall design criteria you want.  At least Puppet will notify you of those cases (provided that ensure_packages() or some similar device doesn't mask it).


John

David Schmitt

May 24, 2013, 2:42:39 AM
to puppet...@googlegroups.com
I totally agree with your sentiment in theory, but would like to note
that in practice, "ensure_packages(['wget'])" is an efficient and low-risk
way to solve a real problem across otherwise non-cooperating modules.

Please also note that in my years of module writing, I only encountered
a handful of packages that should be given this treatment. Wget and
rsync being the most prominent of those.


Regards, David

Chris Barbour

Nov 25, 2013, 3:36:49 PM
to puppet...@googlegroups.com
ensure_packages is not low risk, IMO.

In testing, the following will result in a Duplicate declaration error:

$packages = ['wget']

package { $packages:
  ensure => 'installed',
}

ensure_packages($packages)

ensure_packages will not create a resource conflict with another instance of ensure_packages, but it will create a resource conflict with a conventional package declaration. This is only a very slight improvement over other approaches to in-module package dependency management.

If your 3rd party forge module uses ensure_packages to manage a dependency, and I've set up a module to manage those dependencies using conventional package resources, your module will conflict with my module.

Regards,
Chris