Meaningful managedsoftwareupdate exit codes, for crash detection in supervisor


Justin McWilliams

Oct 17, 2012, 5:47:40 PM
to munk...@googlegroups.com
Greg et al,

For Simian, we've just implemented a feature where
managedsoftwareupdate crashes are reported to the server with full stack
traces, using supervisor and the report_broken_client script.

The problem I'm seeing is that graceful exits in preflight cause
managedsoftwareupdate to exit with the same code as a real crash,
such as an uncaught exception in Munki source. By "graceful exits" I
mean cases where we don't want Munki to execute (e.g. while connected
via a WWAN card): preflight exits with a non-zero code,
managedsoftwareupdate exits with the same code as it does elsewhere,
and supervisor thinks it crashed.

I propose we:
1) define and document exit codes as constants near the top of
managedsoftwareupdate.
and
2) if preflight/postflight exits with a non-zero code, exit
managedsoftwareupdate with the same code (currently it's always -1,
which OS X sees as 255).

This way, supervisor can easily distinguish between a crash (exit code ==
1), a graceful exit, or some other documented exit case, and act
accordingly.
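
To illustrate the proposal, here is a minimal Python sketch. The constant and function names are hypothetical, not taken from the Munki source; it only shows documented constants plus passing a non-zero preflight code through instead of always exiting with -1.

```python
import subprocess
import sys

# Hypothetical constants for illustration; not Munki's actual names or values.
EXIT_STATUS_OK = 0
EXIT_STATUS_PYTHON_CRASH = 1  # Python's exit code for an uncaught exception


def run_flight_script(argv):
    """Run a preflight/postflight script and return its exit code."""
    return subprocess.call(argv)


def exit_code_for_preflight(preflight_code):
    """Return the code to exit with, or None if the run should continue."""
    if preflight_code == EXIT_STATUS_OK:
        return None
    # Exit statuses are reported modulo 256, which is why -1 shows up as 255.
    return preflight_code & 0xFF
```

Under this scheme a preflight script would need to avoid exiting with 1 for its own graceful cases, since that value would still look like a Python crash to supervisor.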

BTW, we plan to submit our supervisor changes to Munki directly, as
this feature could be used with custom report_broken_client scripts in
non-Simian deployments.

Thoughts?

- Justin

Justin McWilliams

Oct 24, 2012, 12:24:25 PM
to munk...@googlegroups.com
FYI, this is done here:
https://code.google.com/p/munki/source/detail?r=5a4a4f572e7f5b79538291156dfefa150d64fa1d

Internally, our Simian instance is running an enhanced supervisor
which detects Python crashes and uploads the traceback to our server.
This has already exposed a couple of bugs, one of which was fixed
here: https://code.google.com/p/munki/source/detail?r=cc4ddcfb4df358ae7426f9464f90681ae202590e

I'll submit the supervisor changes to Munki soon...

- Justin

On Thu, Oct 18, 2012 at 11:25 AM, Anthony Lieuallen <alie...@google.com> wrote:
> On Wednesday, October 17, 2012 5:48:02 PM UTC-4, Justin McWilliams wrote:
>>
>> I propose we:
>> 1) define and document exit codes as constants near the top of
>> managedsoftwareupdate.
>> and
>> 2) if preflight/postflight exit with non-zero, exit
>> managedsoftwareupdate with the same code (currently it's always -1,
>> which OS X sees as 255).
>
>
> Sounds very reasonable. I'd suggest defining:
>
> A) Currently known errors (no network, avoidable network like airplane wifi,
> etc.).
> B) Catch all unknown values (fatal, temporary, etc.).
> C) Reserved ranges that can be defined in the future, and are guaranteed to
> otherwise not be used until then.
>
> A and B would probably mean "ranges" of goodness and badness happening, much
> like HTTP has 2xx, 3xx, 4xx; you can get a high level of information from
> the most significant digit, and often more detail from the less significant
> ones, if you care.
>
> That way whatever interprets these values has the best chance of being able
> to.
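
The range idea quoted above could be sketched roughly as follows. The ranges and labels here are made up purely for illustration; they are not values Munki defines.

```python
# Hypothetical exit-code ranges, loosely modelled on HTTP status classes.
# None of these values come from Munki; they only illustrate the idea.
EXIT_RANGES = [
    (0, 0, "success"),
    (1, 1, "crash"),             # uncaught Python exception
    (10, 19, "known condition"),  # e.g. no network, avoidable network
    (20, 29, "unknown error"),    # catch-all: fatal, temporary, etc.
    (30, 99, "reserved"),         # guaranteed unused until defined
]


def classify_exit_code(code):
    """Return the label of the range containing code, or "undefined"."""
    for low, high, label in EXIT_RANGES:
        if low <= code <= high:
            return label
    return "undefined"
```

A caller such as supervisor could then act on the label alone and only look at the exact code when it wants more detail.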

Justin McWilliams

Nov 12, 2012, 2:47:47 PM
to munk...@googlegroups.com
Supervisor updated here:
https://code.google.com/p/munki/source/detail?r=e6cd53188d3562842c49451fc3b302b0cdc6dbe1

Our launchd plists now look like:

<key>ProgramArguments</key>
<array>
<string>/usr/local/munki/supervisor</string>
<string>--delayrandom</string>
<string>600</string>
<string>--timeout</string>
<string>43200</string>
<string>--error-exec</string>
<string>/usr/local/munki/report_broken_client --reason supervisor --detail-file {STDERR}</string>
<string>--</string>
<string>/usr/local/munki/managedsoftwareupdate</string>
<string>--auto</string>
</array>

So, as you can see, when Munki crashes (Python exit == 1),
report_broken_client POSTs stderr (the Python traceback) to our Simian
server. So far we've found 4-5 intermittent/minor bugs in Munki from
such tracebacks, so it's already proving useful.
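
For readers unfamiliar with the pattern, the behaviour described above can be approximated like this. This is not the actual supervisor code: the function name is hypothetical, and the handler is passed as an argument list rather than the single string used in the plist.

```python
import subprocess
import sys
import tempfile

PYTHON_CRASH_EXIT = 1  # exit code Python uses for an uncaught exception


def supervise(command, error_exec=None):
    """Run command, capturing stderr to a temporary file.

    If the command exits with 1, run error_exec with every "{STDERR}"
    placeholder replaced by the path of the captured stderr file.
    Returns the exit code of command.
    """
    with tempfile.NamedTemporaryFile(suffix=".stderr") as stderr_file:
        exit_code = subprocess.call(command, stderr=stderr_file)
        if exit_code == PYTHON_CRASH_EXIT and error_exec:
            handler = [arg.replace("{STDERR}", stderr_file.name)
                       for arg in error_exec]
            # The temp file still exists here, so the handler can read it.
            subprocess.call(handler)
    return exit_code
```

Any other non-zero exit code is returned to the caller untouched, so graceful preflight exits would not trigger the error handler.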

If widely desired, we could set Munki's default launchd plists
similarly, add to the report_broken_client documentation at
http://code.google.com/p/munki/wiki/ReportBrokenClient , and have
supervisor only attempt to launch report_broken_client if it exists.
Thoughts?

- Justin

Rob Middleton

Nov 12, 2012, 3:57:09 PM
to munk...@googlegroups.com
That seems worthwhile to have as standard to me. I should be capturing such info on a central server, but I'm not yet.

Do we need to make a change to the launchd scripts? Or can we make --error-exec as specified the default unless overridden?

I'm undecided whether it is better to:
- have the launchd plist arguments quite explicit
- not change the launchd plists unless we really need to, as that increments the subpackage version & requires a restart on upgrades.

Rob.

Nate

Nov 12, 2012, 4:09:26 PM
to munk...@googlegroups.com
I second having more information regarding a run and how it went.  I'd vote for not updating the launchd items unless there is a solid reason to do so.  If it is the best way of doing it, then update the launchd item.

Nate

Miq Viq

Nov 12, 2012, 4:23:40 PM
to munk...@googlegroups.com
I think that if only launchdaemons (not launchagents) need updating it can be done on the fly?

First just stop and unload current launchdaemons with:

launchctl stop com.googlecode.munki.managedsoftwareupdate-check
launchctl unload /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-check.plist

launchctl stop com.googlecode.munki.managedsoftwareupdate-manualcheck
launchctl unload /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-manualcheck.plist

Then write or copy new launchdaemons and re-activate them with:

launchctl load /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-check.plist
launchctl start com.googlecode.munki.managedsoftwareupdate-check

launchctl load /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-manualcheck.plist
launchctl start com.googlecode.munki.managedsoftwareupdate-manualcheck
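
The same sequence could be generated from a script, sketched here in Python. The helper name is hypothetical and the function only builds the command lists; it does not run launchctl, and it assumes the boot volume.

```python
LAUNCHDAEMON_DIR = "/Library/LaunchDaemons"

MUNKI_DAEMON_LABELS = [
    "com.googlecode.munki.managedsoftwareupdate-check",
    "com.googlecode.munki.managedsoftwareupdate-manualcheck",
]


def reload_commands(label, plist_dir=LAUNCHDAEMON_DIR):
    """Return (before, after) launchctl argument lists for one daemon.

    Run the "before" commands, replace the plist, then run the "after"
    commands. Nothing is executed here.
    """
    plist = "%s/%s.plist" % (plist_dir, label)
    before = [["launchctl", "stop", label], ["launchctl", "unload", plist]]
    after = [["launchctl", "load", plist], ["launchctl", "start", label]]
    return before, after
```

Each argument list could be handed to subprocess.call by a root-owned script once the new plists are in place.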

When modifying LaunchAgents a reboot is generally needed, I guess.

And after writing this I noticed that there actually is a launchagent which may need the same mod:

/Library/LaunchAgents/com.googlecode.munki.managedsoftwareupdate-loginwindow.plist

So, can we update all launchd-items AND have those changes be effective immediately without rebooting?


-MiqViq

Gregory Neagle

Nov 12, 2012, 4:41:13 PM
to munk...@googlegroups.com
On Nov 12, 2012, at 1:23 PM, Miq Viq <miq...@gmail.com> wrote:

> I think that if only launchdaemons (not launchagents) need updating it can be done on the fly?
>
> First just stop and unload current launchdaemons with:
>
> launchctl stop com.googlecode.munki.managedsoftwareupdate-check
> launchctl unload /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-check.plist
>
> launchctl stop com.googlecode.munki.managedsoftwareupdate-manualcheck
> launchctl unload /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-manualcheck.plist
>
> Then write or copy new launchdaemons and re-activate them with:
>
> launchctl load /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-check.plist
> launchctl start com.googlecode.munki.managedsoftwareupdate-check
>
> launchctl load /Library/LaunchDaemons/com.googlecode.munki.managedsoftwareupdate-manualcheck.plist
> launchctl start com.googlecode.munki.managedsoftwareupdate-manualcheck

Now: make sure you are doing the right thing for non-boot volumes, and test the behaviors on all supported OS versions (10.5-10.8). And realize that if Munki itself is doing the install, unloading the wrong launchdaemon will kill the managedsoftwareupdate process (as it was launched via a launchdaemon).

I won't go so far as to say dynamically updating all the launchagents and launchdaemons is impossible, but it's certainly very hard to do completely correctly. A restart is safer and is much more likely to be successful.


> When modifying LaunchAgents a reboot is generally needed, I guess.
>
> And after writing this I noticed that there actually is a launchagent which may need the same mod:
>
> /Library/LaunchAgents/com.googlecode.munki.managedsoftwareupdate-loginwindow.plist
>
> So, can we update all launchd-items AND have those changes be effective immediately without rebooting?

No. You cannot reliably update launchagents for all users without all users logging out. Since the installer can't require that, a restart is what must be done.

MiqViq

Nov 13, 2012, 12:53:35 AM
to munk...@googlegroups.com
Yes, a reboot is the easiest way to ensure that all launchd items are truly refreshed.

And for getting these new functionalities enabled in Munki I would say a reboot is a small issue.

- MiqViq

Marnin Goldberg

Nov 13, 2012, 9:08:27 AM
to munk...@googlegroups.com

On 11/13/12 12:53 AM, MiqViq wrote:
> Yes, a reboot is the easiest way to ensure that all launchd items are truly refreshed.
>
> And for getting these new functionalities enabled in Munki I would say a reboot is a small issue.
>
> - MiqViq
>

I agree. I have no problems with requiring a reboot to make sure all the
launchd items are refreshed.

Marnin


nico

Nov 13, 2012, 6:56:46 PM
to munk...@googlegroups.com
Hi everyone,
I am running through the setup instructions for munki web admin.
I am getting the following error when trying to set up mercurial.

pip install mercurial
Downloading/unpacking mercurial
Running setup.py egg_info for package mercurial

Python headers are required to build Mercurial
Complete output from command python setup.py egg_info:
running egg_info

writing pip-egg-info/mercurial.egg-info/PKG-INFO

writing top-level names to pip-egg-info/mercurial.egg-info/top_level.txt

writing dependency_links to pip-egg-info/mercurial.egg-info/dependency_links.txt

warning: manifest_maker: standard file '-c' not found

Python headers are required to build Mercurial

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /Users/Shared/munkiwebadmin_env/build/mercurial
Storing complete log in /Users/macadmin/.pip/pip.log

Here is the Log
------------------------------------------------------------
/Users/Shared/munkiwebadmin_env/bin/pip run on Wed Nov 14 10:18:54 2012
Downloading/unpacking mercurial

Running setup.py egg_info for package mercurial

running egg_info
writing pip-egg-info/mercurial.egg-info/PKG-INFO
writing top-level names to pip-egg-info/mercurial.egg-info/top_level.txt
writing dependency_links to pip-egg-info/mercurial.egg-info/dependency_links.txt
warning: manifest_maker: standard file '-c' not found


Python headers are required to build Mercurial

Complete output from command python setup.py egg_info:

running egg_info

writing pip-egg-info/mercurial.egg-info/PKG-INFO

writing top-level names to pip-egg-info/mercurial.egg-info/top_level.txt

writing dependency_links to pip-egg-info/mercurial.egg-info/dependency_links.txt

warning: manifest_maker: standard file '-c' not found



Python headers are required to build Mercurial

----------------------------------------

Command python setup.py egg_info failed with error code 1 in /Users/Shared/munkiwebadmin_env/build/mercurial

Exception information:
Traceback (most recent call last):
File "/Users/Shared/munkiwebadmin_env/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg/pip/basecommand.py", line 107, i$
status = self.run(options, args)
File "/Users/Shared/munkiwebadmin_env/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg/pip/commands/install.py", line 2$
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/Users/Shared/munkiwebadmin_env/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg/pip/req.py", line 1042, in prepa$
req_to_install.run_egg_info()
File "/Users/Shared/munkiwebadmin_env/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg/pip/req.py", line 236, in run_eg$
command_desc='python setup.py egg_info')
File "/Users/Shared/munkiwebadmin_env/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg/pip/util.py", line 612, in call_$
% (command_desc, proc.returncode, cwd))
InstallationError: Command python setup.py egg_info failed with error code 1 in /Users/Shared/munkiwebadmin_env/build/me$

Gregory Neagle

Nov 13, 2012, 7:00:25 PM
to munk...@googlegroups.com
Please post MunkiWebAdmin questions/topics on the munki-web-admin Google Group.

http://groups.google.com/group/munki-web-admin

This particular topic was covered yesterday, so review the archives first.

-Greg

nico

Nov 13, 2012, 7:05:14 PM
to munk...@googlegroups.com
Hi Greg, thanks for that. I didn't realise there was a separate Google Group for munkiwebadmin.
I will go check it out now.


Thanks

nick