I recently came across a couple of hosts that were hanging when
attempting to install a particular package. This particular package
was always the first in the list, so there were many other packages
that the client never even attempted to install. It turns out when we
fixed that single package, all others installed without a problem.
This got me thinking: a single bad package should not totally prevent
Munki from working for all packages.
Is there a reason we install packages in a particular order (the same
order) on every execution? I haven't dug too deep yet, but I think
it's installing items based on the order defined in the manifest. I
can't think of a reason why this would be necessary for the admin to
control, since they have "update_for" and "requires" pkginfo keys.
So ... would it be worthwhile to change Munki to install items in a
random order, so a single bad package doesn't completely kill a client
and other updates can continue to install successfully on subsequent
runs?
- Justin
I agree that it would be ideal to just skip pkgs that previously
failed, but that's a ton more work to implement, wouldn't be
foolproof, etc.
> I like that it installs
> in the order listed as it provides a deeper level of organization which can
> be helpful.
Can you elaborate on the advantages of maintaining control of the
order? I don't understand what you mean when mentioning
organization...
> Greg and all,
>
> I recently came across a couple of hosts that were hanging when
> attempting to install a particular package. This particular package
> was always the first in the list, so there were many other packages
> that the client never even attempted to install. It turns out when we
> fixed that single package, all others installed without a problem.
> This got me thinking: a single bad package should not totally prevent
> Munki from working for all packages.
It depends on how you define "bad package". If you mean one that causes /usr/sbin/installer to exit with an installation error, Munki continues on with the rest of the packages from there.
I could easily create a package with a preflight script that did `/sbin/shutdown -h now`. This would kill everything, and so Munki would attempt to install it over and over, always failing.
>
> Is there a reason we install packages in a particular order (the same
> order) on every execution? I haven't dug too deep yet, but I think
> it's installing items based on the order defined in the manifest.
Yes, sort of, with dependencies injected and included_manifests in there, too.
> I
> can't think of a reason why this would be necessary for the admin to
> control, since they have "update_for" and "requires" pkginfo keys.
"update_for" and "requires" information is not currently used at install time; updatecheck.py writes out items in intended install order; installer.py installs them in that order.
So adding some randomness to the install order without screwing up the correct install order for items in a dependency relationship would not be trivial.
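To illustrate why this is more than a plain shuffle, here is a minimal Python sketch of a randomized ordering that still keeps each item behind the items it requires. It is not Munki code: the `requires` mapping is a hypothetical, simplified stand-in for the pkginfo "requires" key, and "update_for" and included_manifests are ignored entirely.

```python
import random


def randomized_install_order(requires, rng=random):
    """Return item names in a random order in which every item comes
    after the items it requires.

    `requires` is a hypothetical mapping of item name -> list of names
    that must be installed first (not Munki's real data structure).
    """
    # Collect every name, including ones that appear only as a dependency.
    names = set(requires)
    for deps in requires.values():
        names.update(deps)
    remaining = {name: set(requires.get(name, ())) for name in names}

    order = []
    while remaining:
        # Items whose dependencies have all been placed already.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(
                "circular dependency among: %s" % sorted(remaining))
        # The random pick among the ready items is the only source of
        # randomness; dependency order is never violated.
        choice = rng.choice(ready)
        order.append(choice)
        del remaining[choice]
        for deps in remaining.values():
            deps.discard(choice)
    return order
```

Even this toy version has to track which items are ready at each step and detect cycles, which hints at the extra work a real change to installer.py would involve.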
> So ... would it be worthwhile to change Munki to install items in a
> random order, so a single bad package doesn't completely kill a client
> and other updates can continue to install successfully on subsequent
> runs?
Seems like an edge case better handled by fixing the bad package(s).
Not opposed to the idea (though there may be people who rely on the current behavior), but not sure it's worth the effort.
-Greg
>
> - Justin
I agree that fixing the broken package is the best thing to do, but
discovering such a problem isn't immediate. In the cases I've found, a
package was working well on 99.x% of clients, so it took a while to
realize that there were a handful of clients that went *weeks* without
installing *any* updates. I'd prefer during those weeks *most* updates
install just fine, and the *one* update that is causing problems
doesn't hold everything else up, as then the machine would at least be
mostly updated while we're not yet aware of the bad package.
As is, installation order seems undocumented and questionable when
taking into consideration mixed managed_updates/managed_installs,
included_manifests, etc., so I just thought this wouldn't be an issue.
But it seems like enough people are (wrongfully?) relying on manifest
defined order too heavily to change this.
That sounds like something you should handle in postflight, not a package.
> If this happens in random order, my entire deployment strategy is
> shot, because if the configuration for Munki access changes halfway
> through, that means half of my private server's Munki updates won't go
> in.
>
> Whether or not this is an ideal setup is irrelevant to
> the debate; the point is that randomizing the Munki installs from the
> list means you actually have no idea what is going to install when. I
> don't particularly understand how it's advantageous to not know how
> your software deployment is actually going to go. Isn't the entire
> point of something like Munki to make deployment predictable and
> easy? Why would I want to play Russian roulette with updates every
> time I bootstrap a new computer with Munki?
If packages have dependencies, the pkginfo files should ensure those
dependencies are defined and met before installation takes place.
Order in manifest doesn't guarantee that previous installations were
successful and dependencies are all met. See my example below for
iLife.
> As Raul points out, that would make it excruciatingly difficult to
> troubleshoot problems with installation order, especially if you have
> lots of packages that are dependent on other packages being present
> first. I don't want packages deploying MCX settings for iLife
> preferences before iLife has been installed.
For this, the MCX settings package should use an "installs" key in the
pkginfo, so it doesn't get installed until iLife exists. Simply
relying on manifest order is not sufficient. Think of the case where
iLife fails to install for whatever reason (installer returns
non-zero); if you're only relying on order in the manifest, then the
MCX package would still get installed so the same problem you've just
identified would still occur.
You can randomise the order of managed_installs items in Simian & that would give you much of the desired behaviour (particularly if you are not using included_manifests).
Rob.
If the order of the content of "managed_installs" is randomized by Simian, Justin can get what he wants without affecting those who rely on the current behavior and without affecting dependency ordering.
In other words, randomize the INPUT to updatecheck instead of the OUTPUT...
-Greg
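A rough sketch of what "randomize the input" could look like on the server side, assuming the manifest is an ordinary plist with a top-level "managed_installs" list. The function name is made up for illustration and this is not Simian's actual code.

```python
import plistlib
import random


def shuffle_managed_installs(manifest_bytes, rng=random):
    """Return manifest plist bytes with managed_installs in random order.

    Hypothetical helper: only the managed_installs list is shuffled;
    every other key is passed through unchanged.
    """
    manifest = plistlib.loads(manifest_bytes)
    if "managed_installs" in manifest:
        items = list(manifest["managed_installs"])
        rng.shuffle(items)
        manifest["managed_installs"] = items
    return plistlib.dumps(manifest)
```

Because only the list handed to the client changes, whatever ordering updatecheck derives for dependencies is left alone.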
Yea I thought of that before mailing munki-dev, but thought perhaps
Munki would benefit from a similar change. I was obviously wrong ;)
Currently Simian sorts alphabetically so viewing the manifest in the
web interface is easier on the eyes, but we could just randomly sort
before sending off to the clients. Or we could automatically detect
repeated failures, omit those items from the manifest entirely, and
raise alerts to admins ... if only the day had more hours.
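The "detect repeated failures" idea might be sketched roughly as follows. Everything here is an assumption for illustration: the threshold value, the function name, and the shape of the failure log (one item name per failed attempt reported by a client) are all hypothetical.

```python
from collections import Counter

FAILURE_THRESHOLD = 3  # hypothetical cutoff, not a real Simian setting


def split_manifest_items(items, failure_log, threshold=FAILURE_THRESHOLD):
    """Split manifest items into (to_send, suppressed) for one client.

    `failure_log` is an iterable of item names, one entry per failed
    install attempt reported by that client (assumed format).
    Items that failed `threshold` or more times are suppressed so an
    admin can be alerted about them instead.
    """
    failures = Counter(failure_log)
    to_send = [item for item in items if failures[item] < threshold]
    suppressed = [item for item in items if failures[item] >= threshold]
    return to_send, suppressed
```

The suppressed list is what would feed the admin alerts.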
I don't rely on the order - but for me it helps in debugging to have the same behaviour on many machines.
I would prefer a deterministic outcome -- I prefer 100 computers failing in the same way rather than 100 computers failing in an additionally random way. When I'm busy I need a critical mass of failure before I take a look.
Rob.
> (snip snip)
>
> I agree that fixing the broken package is the best thing to do, but
> discovering such a problem isn't immediate. In the cases I've found, a
> package was working well on 99.x% of clients, so it took a while to
> realize that there were a handful of clients that went *weeks* without
> installing *any* updates. I'd prefer during those weeks *most* updates
> install just fine, and the *one* update that is causing problems
> doesn't hold everything else up, as then the machine would at least be
> mostly updated while we're not yet aware of the bad package.
>
When I read this I hear the need for an auditing system where I can ask the question "which systems are not fully patched?" and get an answer. For 'mission critical' packages, this same system could raise an alert.
The auditing system is ideally independent of the processes it is checking so it doesn't fail in the same way. But ideal, schmideal.
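As a rough illustration of the kind of question such an audit could answer, here is a small Python sketch. The report format (hostname mapped to a list of pending items and the time of the last successful run) is invented for this example and is not something Munki or Simian is known to produce in this shape.

```python
from datetime import datetime, timedelta


def stale_clients(reports, now, max_age=timedelta(days=7)):
    """Return hostnames that still have pending installs and have not
    completed a successful run within `max_age`.

    `reports` is a hypothetical mapping:
        hostname -> {"pending": [item, ...], "last_success": datetime}
    """
    stale = []
    for host, report in reports.items():
        overdue = now - report["last_success"] > max_age
        if report["pending"] and overdue:
            stale.append(host)
    return sorted(stale)
```

A check like this would flag the handful of clients that quietly go weeks without installing anything, regardless of which package is to blame.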
And this comment is tangential to your desire that a package causing munki to fail in some way not interfere with other package installations.
Raúl