Hi guys,
I've searched but haven't found what I'm looking for, so apologies if this has been asked before.
Background:
I am trying to monitor whether puppet runs succeed by watching the file /var/lib/puppet/state/last_run_summary.yaml. To test that the check works and that our monitoring system catches failures, I deliberately break a puppet run by temporarily removing a manifest on the puppet master that a client needs.
With the manifest removed, a manual puppet agent -t looks like this:
{code}
puppet agent -t
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/filesystems.rb
Info: Loading facts in /var/lib/puppet/lib/facter/postgres_default_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rabbitmq_erlang_cookie.rb
Info: Loading facts in /var/lib/puppet/lib/facter/ip6tables_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/iptables_persistent_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/iptables_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/os_maj_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::ouf for ov28.fqdn on node ov28.fqdn
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
{code}
I then run my monitoring check to see whether it detects the broken run:
{code}
sudo -u xymon sudo /usr/libexec/xymon/client/ext/check_puppet.rb -w 2000 -c 3600
CRITICAL: FAILED - Puppet failed to run. Missing dependencies? Catalog compilation failed? Last run 23 seconds ago|time_since_last_run=23s;2000;3600;0 failed_resources=99;;;0 failed_events=99;;;0
{code}
Great, the check detects that puppet has failed. The last_run_summary looks like this after the run:
{code}
cat /var/lib/puppet/state/last_run_summary.yaml
---
version:
  config:
  puppet: "3.4.3"
time:
  last_run: 1401807503
{code}
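A check of this kind can be sketched roughly as follows. This is a minimal Ruby sketch, not the actual check_puppet.rb: the name summarize_run and the status symbols are made up for illustration, and it assumes that a summary without resources or events sections means no catalog was applied.

```ruby
require 'yaml'

# Hypothetical helper for illustration only; not taken from check_puppet.rb.
# Classifies the contents of a last_run_summary.yaml file.
def summarize_run(yaml_text, now: Time.now.to_i)
  summary   = YAML.load(yaml_text) || {}
  time      = summary['time'] || {}
  resources = summary['resources']
  events    = summary['events']

  # Seconds since the last run, or nil if the file records no timestamp.
  age = time['last_run'] ? now - time['last_run'].to_i : nil

  status =
    if resources.nil? || events.nil?
      # Assumption: missing sections mean the catalog was never applied.
      :no_catalog
    elsif resources['failed'].to_i > 0 || events['failure'].to_i > 0
      :failed
    else
      :ok
    end

  { status: status, age: age }
end
```

Calling summarize_run(File.read('/var/lib/puppet/state/last_run_summary.yaml')) on the summary above would fall into the :no_catalog branch, because only the version and time sections are present.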
However, when the puppet agent performs its next scheduled run, I do not get the same errors. The contents of last_run_summary.yaml look as if a normal puppet run had completed successfully:
{code}
cat /var/lib/puppet/state/last_run_summary.yaml
---
changes:
  total: 0
version:
  puppet: "3.4.3"
  config: 1401798243
time:
  last_run: 1401808053
  anchor: 0.002382
  total: 227.941278069473
  exec: 0.552989
  datacat_fragment: 0.00575
  mount: 0.001974
  ssh_authorized_key: 0.025437
  schedule: 0.000933
  package: 0.542415
  datacat_collector: 0.012692
  user: 0.130179
  host: 0.000364
  filebucket: 0.000187
  file: 220.198688
  config_retrieval: 1.89250206947327
  service: 4.57266
  group: 0.002126
resources:
  changed: 0
  failed_to_restart: 0
  total: 513
  out_of_sync: 0
  skipped: 0
  restarted: 0
  failed: 0
  scheduled: 0
events:
  failure: 0
  total: 0
  success: 0
{code}
As a result, the monitor does not pick up the failure.
Any ideas? What am I doing wrong?
Thanks in advance :)