Upgrade to puppet-agent 5.3.5 report failure

44 views
Skip to first unread message

Darragh Bailey

unread,
Nov 27, 2018, 1:10:53 PM11/27/18
to Puppet Users
Hi,


Currently in the process of testing out an upgrade to version 5 of the puppet-agent within our local virtual environment used to validate changes before they can be landed and I'm running into a few problems around the report at the end run.

Have 5 VMs in a vagrant environment, that are initially bootstrapped with some scripts to get the puppet 5 packages installed, then uses puppet apply to perform some initial setup around network/apt-caching, followed by applying the 'puppet_server' provisioner which runs puppet agent.

Unfortunately I'm seeing an error, that doesn't occur on all the VM's and I'm not sure how to debug it further or understand what's missing.

vagrant up
...
==> srv-1: Warning: Event['previous_value'] contains a Process::Status value. It will be converted to the String 'pid 30408 exit 1'
==> srv-1: Warning: Event['previous_value'] contains a Process::Status value. It will be converted to the String 'pid 32434 exit 1'
==> srv-1: Error: Could not send report: Error 400 on SERVER: Bad Request: The request body is invalid: Could not intern from json: Internal Error: Puppet Context ':loaders' missing
....
==> srv-3: Warning: Event['previous_value'] contains a Process::Status value. It will be converted to the String 'pid 28777 exit 1'
==> srv-3: Error: Could not send report: Error 400 on SERVER: Bad Request: The request body is invalid: Could not intern from json: Internal Error: Puppet Context ':loaders' missing


What is also surprising is that it doesn't occur on all of the VM's, and subsequently it doesn't appear if I re-run the provisioning with: vagrant up --provision --provision-with puppet_server

There was a suggestion that there could be some stale code around as the image starts with puppet 3 pre-installed, but I've got the bootstrapping scripts to purge the old packages and delete any files that could have been placed under /var/lib/puppet and /etc/puppet

bash code:

package=puppet5-release-xenial.deb
env https_proxy=$HTTPS_PROXY wget \
    --quiet --continue -O /tmp/$package https://apt.puppetlabs.com/$package
dpkg -i /tmp/$package
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get purge --yes puppet hiera facter
rm -rf /var/lib/puppet /etc/puppet
apt-get install --yes --no-install-recommends puppet-agent=5.3.5-1xenial ruby policykit-1

Currently pinned to 5.3.5 because there was an issues with a subsequent release and decided to just pin to the same version as the upgraded puppet master was running.

I've tried switching the clients to 5.5.8 and I get the same error, so it's not solved by moving to the most recent version.

Grep'ing through /var/lib/puppet hasn't been illuminating, didn't spot anything when switching it to use debug, and neither has been inspecting the puppet master log so I'm not sure where exactly to look?

The quick fix is to disable reporting within the virtual environment, which certainly solves the problem, but seems like the wrong approach.

Any thoughts on how to debug this? What do I need to enable on the puppet master to be able to capture report requests both good and bad so I can see what it is that is being sent that gets rejected, and what should be sent?

--
Darragh Bailey

Eric Sorenson

unread,
Nov 28, 2018, 1:32:44 PM11/28/18
to Puppet Users
Hi Darragh, the fact that the error message contains a '400' error suggests the problem happens on the server when it receives the report. 

My first guess given that error message is also that there's a mix of versions installed, but it's weird that it only happens on some reports. Maybe there is something malformed in those reports that triggers a different code path on the server.

You can save a copy of the reports by adding `store` to the type of report submission on the master: `reports = https,store` and see what they look like. They should go into a subdirectory of /opt/puppetlabs/puppet/cache/reports

HTH
--eric0

Darragh Bailey

unread,
Nov 29, 2018, 6:27:13 AM11/29/18
to Puppet Users


On Wednesday, November 28, 2018 at 6:32:44 PM UTC, Eric Sorenson wrote:
Hi Darragh, the fact that the error message contains a '400' error suggests the problem happens on the server when it receives the report. 

My first guess given that error message is also that there's a mix of versions installed, but it's weird that it only happens on some reports. Maybe there is something malformed in those reports that triggers a different code path on the server.

Indeed there is, I missed that there were errors further back in the output log previously and assumed the failure to post the report successfully was the error. Seems they are not compilation errors but rather runtime errors where something unexpected occurred in applying the catalog.

For successful runs there is no problem in posting back the report, this at least now gives me something to reproduce consistently, just add an error somewhere that will trigger it.


 
You can save a copy of the reports by adding `store` to the type of report submission on the master: `reports = https,store` and see what they look like. They should go into a subdirectory of /opt/puppetlabs/puppet/cache/reports

HTH
--eric0

Ah, fantastic, didn't think of that! That will definitely help, many thanks.

With spotting why it was occurring this will hopefully help me track down the cause. Fortunately now that I've discovered it's not the cause of the failures but a side effect, it's definitely less worrying, but still odd that it occurs.

--
Darragh Bailey
 

Darragh Bailey

unread,
Nov 30, 2018, 10:39:16 AM11/30/18
to Puppet Users


On Thursday, November 29, 2018 at 11:27:13 AM UTC, Darragh Bailey wrote:


On Wednesday, November 28, 2018 at 6:32:44 PM UTC, Eric Sorenson wrote:
Hi Darragh, the fact that the error message contains a '400' error suggests the problem happens on the server when it receives the report. 

My first guess given that error message is also that there's a mix of versions installed, but it's weird that it only happens on some reports. Maybe there is something malformed in those reports that triggers a different code path on the server.

Indeed there is, I missed that there were errors further back in the output log previously and assumed the failure to post the report successfully was the error. Seems they are not compilation errors but rather runtime errors where something unexpected occurred in applying the catalog.

For successful runs there is no problem in posting back the report, this at least now gives me something to reproduce consistently, just add an error somewhere that will trigger it.

So confirmed I can reproduce this by simply adding the following block to the site.pp for our vagrant environment

  exec { 'break the run':
    command => '/bin/false',
  }

However it only triggers the error on the first run, once there is a successful run, can't trigger the problem subsequently.
 
 
You can save a copy of the reports by adding `store` to the type of report submission on the master: `reports = https,store` and see what they look like. They should go into a subdirectory of /opt/puppetlabs/puppet/cache/reports

HTH
--eric0

Ah, fantastic, didn't think of that! That will definitely help, many thanks.

 
So apparently 'store' is the default based on https://puppet.com/docs/puppet/5.4/reporting_about.html#configuring-reporting and it should be 'http' just in case anyone else encounters this.

But it doesn't appear to have any effect in capturing the report being sent. Might need to make use of nginx and report_port as a way to forward on the request while logging it's content it so that I can grab what was sent.

Looking on the agent node I see the /var/lib/puppet/state/last_run_summary.yaml containing

events:
  failure: 1
  success: 362
  total: 363

But unfortunately there's no /var/lib/puppet/state/last_run_report.yaml that corresponds, as it doesn't appear to get updated due to the failure, so there's no actual record of the report being sent to the master.

--
Darragh Bailey

Josh Cooper

unread,
Nov 30, 2018, 5:51:30 PM11/30/18
to puppet...@googlegroups.com
On Fri, Nov 30, 2018 at 7:39 AM Darragh Bailey <daragh...@gmail.com> wrote:


On Thursday, November 29, 2018 at 11:27:13 AM UTC, Darragh Bailey wrote:


On Wednesday, November 28, 2018 at 6:32:44 PM UTC, Eric Sorenson wrote:
Hi Darragh, the fact that the error message contains a '400' error suggests the problem happens on the server when it receives the report. 

My first guess given that error message is also that there's a mix of versions installed, but it's weird that it only happens on some reports. Maybe there is something malformed in those reports that triggers a different code path on the server.

Indeed there is, I missed that there were errors further back in the output log previously and assumed the failure to post the report successfully was the error. Seems they are not compilation errors but rather runtime errors where something unexpected occurred in applying the catalog.

For successful runs there is no problem in posting back the report, this at least now gives me something to reproduce consistently, just add an error somewhere that will trigger it.

So confirmed I can reproduce this by simply adding the following block to the site.pp for our vagrant environment

  exec { 'break the run':
    command => '/bin/false',
  }

However it only triggers the error on the first run, once there is a successful run, can't trigger the problem subsequently.

This makes me think it's related to the transactionstore functionality (it keeps track of what the "current" value is the last time puppet applied a catalog, so that it can tell if a change is due to an intentional manifest change or the system drifted and needed to be remediated). I say that because the transaction store records information differently depending if the event was successful or not. Might want to look in `puppet config print transactionstorefile --section agent` on the agent to see if there are any clues.

Also earlier you mentioned:

> Warning: Event['previous_value'] contains a Process::Status value. It will be converted to the String 'pid 30408 exit 1'

The "previous_value" for an exec should always be "notrun" or the desired value (due to the onlyif/unless/creates checks causing the exec to skip). I wonder if somehow the previous system from the transactionstore is ending up in the next report?

 
You can save a copy of the reports by adding `store` to the type of report submission on the master: `reports = https,store` and see what they look like. They should go into a subdirectory of /opt/puppetlabs/puppet/cache/reports

HTH
--eric0

Ah, fantastic, didn't think of that! That will definitely help, many thanks.

 
So apparently 'store' is the default based on https://puppet.com/docs/puppet/5.4/reporting_about.html#configuring-reporting and it should be 'http' just in case anyone else encounters this.

But it doesn't appear to have any effect in capturing the report being sent. Might need to make use of nginx and report_port as a way to forward on the request while logging it's content it so that I can grab what was sent.

Looking on the agent node I see the /var/lib/puppet/state/last_run_summary.yaml containing

events:
  failure: 1
  success: 362
  total: 363

But unfortunately there's no /var/lib/puppet/state/last_run_report.yaml that corresponds, as it doesn't appear to get updated due to the failure, so there's no actual record of the report being sent to the master.



--
Darragh Bailey

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-users/45e5e998-f3a1-4aa1-8066-5bed2a4dab29%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Josh Cooper | Software Engineer
Reply all
Reply to author
Forward
0 new messages