I'm looking for background information on how bug #7127[1],
"prerun_command don't stop puppet on error", should be fixed.
I think there's general agreement that if the prerun command fails,
then the catalog should not be applied, but the report should be sent,
and the report's status should be "failed".
However, what about the post-run command? In particular, if the
catalog is applied successfully, but the postrun command fails, should
the overall run be considered a failure? The documentation[2] says it
should be:
"A command to run after every agent run. If this command returns a
non-zero return code, the entire Puppet run will be considered to have
failed, even though it might have performed work during the normal
run."
But there are several problems with the way the code is currently implemented.
* If the postrun command fails, puppet never sends the report.
* Errors that occur while running the pre- and postrun commands are
not captured in the report's log.
* If the catalog is applied successfully, but the postrun command
fails, the report status is not changed to "failed".
Right now it doesn't matter because the report is never sent, but if I
fix that, it could matter.
Thoughts? The only use case I know of is etckeeper, but its postrun
command, etckeeper-commit-post[3], always returns 0 even if the
etckeeper command fails.
Finally, the prerun command is executed after dostorage,
download_plugins, and download_fact_plugins. Is there a reason for the
prerun command to occur first?
It'd be great to hear about your experience with the pre/post run
commands and what use cases you are trying to solve.
Also, is there anything that is being solved with pre/post run
commands that can't be solved using stages? For example, if the prerun
command, catalog, and postrun commands are executed as stages, in that
order, with each stage depending on its predecessor(s), it would
ensure that:
* An error in one stage would prevent the following stage(s) from executing.
* The report would contain all errors from stages that were executed.
* The report status, resource statuses, and metrics would be consistent.
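A minimal Python sketch of those stage semantics, under stated assumptions: each stage is modelled as a name plus a callable that raises on failure, and `run_stages` and `Report` are hypothetical names, not Puppet's API. Because each stage depends on all of its predecessors, the chain is linear, so "depends on its predecessors" reduces to "stop at the first failure and record what was skipped".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Report:
    status: str = "ok"   # simplified, hypothetical status: "ok" or "failed"
    logs: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

def run_stages(stages: List[Tuple[str, Callable[[], None]]]) -> Report:
    """Run stages in order; a failure skips every later stage (hypothetical)."""
    report = Report()
    for name, action in stages:
        if report.status == "failed":
            # A predecessor failed, so this stage is not executed.
            report.skipped.append(name)
            continue
        try:
            action()
            report.logs.append(f"{name}: ok")
        except Exception as exc:
            # The error lands in the same report as every other result.
            report.logs.append(f"{name}: {exc}")
            report.status = "failed"
    return report

def ok() -> None:
    pass

def boom() -> None:
    raise RuntimeError("exit code 1")

r = run_stages([("prerun", ok), ("main", boom), ("postrun", ok)])
print(r.status, r.skipped)   # failed ['postrun']
```

The flip side shows up in the same sketch: once `main` fails, `postrun` never runs, so stages on their own can't guarantee that a postrun-style step executes.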
Thanks,
Josh
[1] http://projects.puppetlabs.com/issues/7127
[2] http://docs.puppetlabs.com/references/stable/configuration.html#postruncommand
[3] https://code.launchpad.net/~soren/ubuntu/lucid/puppet/etckeeper-integration
On Wed, Jun 08, 2011 at 05:50:58PM -0700, Josh Cooper wrote:
> It'd be great to hear about your experience with the pre/post run
> commands and what use cases you are trying to solve.
We use the feature to generate additional information about how the
puppet run has changed the system:
http://www.unixdaemon.net/tools/puppet/nagios-wrapped-puppet-runs.html
In our use case we're only using the two stages as information hooks for
their side-effects - not to alter the puppet run.
> Also, is there anything that is being solved with pre/post run
> commands that can't be solved using stages? For example, if the prerun
> command, catalog, and postrun commands are executed as stages, in that
> order, with each stage depending on its predecessor(s), it would
> ensure that:
For my example usage I could quite easily move the commands to be execs
in the pre and post stages.
This makes complete sense and is how the feature was intended to work,
but unfortunately, it never has worked that way (for the agent).
Currently, if the prerun command fails, puppet will attempt to apply
the catalog. Puppet will also always attempt to send a report (due to
#1054), which partially breaks the "master under too much load" use
case.
So there are several ways in which pre/post run failure states can be
handled. I'm curious what you think the default behavior
should be and whether you would like to see these other failure states
supported:
If the prerun_command fails:
1. Ignore the failure, continue applying the catalog, send the report, etc.
2. Stop puppet, don't apply the catalog, don't send the report, and exit(1)
3. Stop puppet, don't apply the catalog, but do send the report,
including information about why the prerun command failed, and exit(1)
#1 is the current behavior, but could also be accomplished by
appending "|| true" to the prerun_command option, e.g. prerun_command
= /bin/meow || true.
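The `|| true` trick relies on ordinary POSIX shell semantics: the right-hand side runs only if the left-hand side exits non-zero, and the compound command takes the exit status of whichever side ran last. The check below assumes the command string is handed to `/bin/sh` (here via `subprocess.run(..., shell=True)`); whether the agent actually runs `prerun_command` through a shell is an assumption worth verifying, since without a shell `||` would just be passed along as a literal argument.

```python
import subprocess

def exit_status(command: str) -> int:
    # shell=True runs the string through /bin/sh, so "||" is parsed by the shell.
    return subprocess.run(command, shell=True).returncode

print(exit_status("false"))              # 1
print(exit_status("false || true"))      # 0
# Always 0: either /bin/meow succeeds, or it fails and "true" runs instead.
print(exit_status("/bin/meow || true"))  # 0
```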
#2 was how the feature was originally implemented and how it is
documented, but due to the merge with #1054, the default behavior was
changed to #1.
#3 can also be accomplished using stages. This would be best used in
cases where the prerun command should be "in-band" and its failure
should affect the overall report status, resource_statuses, metrics,
etc.
I'd like to propose that we change the default behavior for
prerun_command to #2 and document how to accomplish #1 and #3.
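A minimal Python sketch of the three prerun failure policies, with the proposed #2 as the default. `PrerunPolicy`, `agent_run`, `apply_catalog`, and `send_report` are hypothetical names used only for illustration, and the exit code is simplified to 0/1 rather than modelling how the real agent computes its exit status.

```python
import subprocess
from enum import Enum

class PrerunPolicy(Enum):
    IGNORE = 1            # 1: ignore the failure, apply the catalog, send the report
    ABORT_SILENT = 2      # 2: don't apply, don't send the report, exit(1)
    ABORT_AND_REPORT = 3  # 3: don't apply, send a "failed" report, exit(1)

def agent_run(prerun_command, apply_catalog, send_report,
              policy=PrerunPolicy.ABORT_SILENT):
    """Hypothetical agent run; returns a simplified exit code (0 or 1)."""
    rc = subprocess.run(prerun_command, shell=True).returncode
    if rc == 0 or policy is PrerunPolicy.IGNORE:
        status = apply_catalog()   # e.g. "changed", "unchanged" or "failed"
        send_report(status, [])
        return 1 if status == "failed" else 0
    if policy is PrerunPolicy.ABORT_AND_REPORT:
        # Record why the run stopped so the failure is visible in the report.
        send_report("failed", [f"prerun_command exited with {rc}"])
    return 1
```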
Similarly, if the postrun command fails, there are several different options:
1. Ignore the failure, send the report with whatever status resulted
from applying the catalog, etc.
2. Stop puppet, don't send the report (even though the catalog may
have been applied), and exit(1)
3. Add the postrun command error to the report, change the report
status to "failed", etc., and exit(1)
#1 can be accomplished by appending "|| true" to the postrun_command option.
#2 is the current behavior.
#3 could ideally be handled using stages, but there is currently no
way to ensure a stage is run.
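For comparison, a minimal Python sketch of the postrun options, where the catalog has already been applied and its outcome is in `report.status`. `Report`, `finish_run`, and `send_report` are hypothetical, and `policy` is simply the option number from the list above. Of the three, only #3 both sends the report and makes the postrun failure visible in it.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Report:
    status: str                      # outcome of applying the catalog
    logs: List[str] = field(default_factory=list)

def finish_run(report: Report, postrun_command: str, policy: int,
               send_report: Callable[[Report], None]) -> int:
    """Hypothetical postrun handling; policy is option 1, 2 or 3."""
    rc = subprocess.run(postrun_command, shell=True).returncode
    if rc != 0:
        if policy == 2:
            # Option 2 (current behavior): exit without sending the report.
            return 1
        if policy == 3:
            # Option 3: record the error and mark the whole run as failed.
            report.logs.append(f"postrun_command exited with {rc}")
            report.status = "failed"
        # Option 1 falls through and leaves the report untouched.
    send_report(report)
    return 1 if report.status == "failed" else 0
```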
I'm not sure what the default should be here. For example, if the
postrun command "/sbin/iptables -A rule" fails, should the report have
a "failed" status? If we don't send the report, will you ever know?
Will you care?
Josh
> One use of pre commands that isn't solved with stages is to check "Should I
> even do a Puppet run right now?" or anything else that is out of band in a
> similar sense.