Jira (PDB-4020) Puppetserver should handle 503s from PuppetDB


Jarret Lavallee (JIRA)

Aug 24, 2018, 2:15:04 PM
to puppe...@googlegroups.com
Jarret Lavallee moved an issue
 
PuppetDB / Bug PDB-4020
Puppetserver should handle 503s from PuppetDB
Change By: Jarret Lavallee
Fix Version/s: SERVER 6.y
Affects Version/s: SERVER 5.3.1
Affects Version/s: PDB 5.2.4
Component/s: Puppet Server
Component/s: PuppetDB
Key: SERVER-2230 → PDB-4020
Project: Puppet Server → PuppetDB
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Jarret Lavallee (JIRA)

Aug 24, 2018, 2:24:04 PM
to puppe...@googlegroups.com
Jarret Lavallee updated an issue
Change By: Jarret Lavallee
Labels: maintenance
Team: Server → PuppetDB
As a Puppet user I expect the puppetserver to gracefully handle a transient PuppetDB maintenance mode state without reporting agent failures when there are multiple PuppetDB instances configured.

When one of the PuppetDB instances is in maintenance mode, it returns a 503 to the Puppetserver (maintenance mode was implemented in PDB-1019). This should be handled when {{command_broadcast = true}} and {{min_successful_submissions = 1}}. However, puppetserver sees this as a failure and sends a 500 error to the agent.

Since the other PuppetDB instance is available and {{min_successful_submissions = 1}}, the 503 should be ignored and the command to the other PuppetDB should be successful. The actual result is that Puppetserver sends a 500 back to the agent, and the agent runs are all failures for the duration that any of the PuppetDB nodes are in maintenance mode.
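The expected {{command_broadcast}} semantics can be sketched in a few lines of Ruby. This is a minimal illustration only, not the actual PuppetDB terminus code; {{broadcast_command}} and the block-based per-server submission are hypothetical stand-ins:

```ruby
# Sketch of the expected command_broadcast behaviour: submit the
# command to every server_url, count the successes, and only fail
# the run when fewer than min_successful_submissions servers
# accepted the command. A 503 from one server should not matter
# as long as enough other servers return success.
def broadcast_command(server_urls, min_successful_submissions)
  successes = 0
  errors = []
  server_urls.each do |url|
    begin
      yield url          # hypothetical per-server submission (e.g. an HTTP POST)
      successes += 1
    rescue => e          # includes "[503] PuppetDB is currently down"
      errors << "#{url}: #{e.message}"
    end
  end
  return true if successes >= min_successful_submissions
  raise "command submission failed: #{errors.join('; ')}"
end
```

With two servers where one raises a 503 and {{min_successful_submissions = 1}}, this returns success instead of failing the agent run; only when every submission fails does the error propagate.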


 

Steps to reproduce:
# Configure PuppetDB replication
# Configure {{command_broadcast = true}} in the {{puppetdb.conf}} on the master
# Run a puppet agent while one of the PuppetDBs is in maintenance mode and the other one is available.
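For the second step, the relevant {{puppetdb.conf}} on the master looks roughly like this (hostnames are illustrative; {{server_urls}} must list both PuppetDB instances):

```ini
# /etc/puppetlabs/puppet/puppetdb.conf on the master
[main]
server_urls = https://pdb1.example.com:8081,https://pdb2.example.com:8081
command_broadcast = true
min_successful_submissions = 1
```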


h2. Logs:

 

From the puppetserver.log

 
{code:java}2018-06-08T10:35:23.831-07:00 WARN [qtp2042713953-1209] [puppetserver] Puppet Error connecting to pe-201810-agent-replica.puppetdebug.vlan on 8081 at route /pdb/cmd/v1?checksum=c36ef6428032c548e53a17e5c22e5b4447748574&version=9&certname=pe-201810-agent.puppetdebug.vlan&command=replace_catalog&producer-timestamp=1528479323, error message received was ''. Failing over to the next PuppetDB server_url in the 'server_urls' list
2018-06-08T10:35:23.835-07:00 ERROR [qtp2042713953-1209] [puppetserver] Puppet [503 ] PuppetDB is currently down. Try again later.
2018-06-08T10:35:23.835-07:00 ERROR [qtp2042713953-1209] [puppetserver] Puppet /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/puppetdb/command.rb:82:in `submit'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:62:in `block in submit_command'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:99:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:59:in `submit_command'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/catalog/puppetdb.rb:14:in `block in save'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:99:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/catalog/puppetdb.rb:11:in `save'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/store_configs.rb:24:in `save'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/indirection.rb:204:in `find'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/api/indirected_routes.rb:121:in `do_find'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/api/indirected_routes.rb:48:in `block in call'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/context.rb:65:in `override'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet.rb:260:in `override'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/api/indirected_routes.rb:47:in `call'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/route.rb:82:in `block in process'
org/jruby/RubyArray.java:1735:in `each'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/route.rb:81:in `process'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/route.rb:87:in `process'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/route.rb:87:in `process'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/handler.rb:64:in `block in process'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/handler.rb:62:in `process'
uri:classloader:/puppetserver-lib/puppet/server/master.rb:42:in `handleRequest'{code}
 

From the agent, which gets a 500 instead of a 503.

 
{code:java} -> "HTTP/1.1 500 Server Error\r\n"
-> "Date: Fri, 08 Jun 2018 17:35:33 GMT\r\n"
-> "Content-Type: application/json;charset=utf-8\r\n"
-> "X-Puppet-Version: 5.5.1\r\n"
-> "Content-Length: 108\r\n"
-> "Server: Jetty(9.4.z-SNAPSHOT)\r\n"
-> "\r\n"
reading 108 bytes...
-> "{\"message\":\"Server Error: [503 ] PuppetDB is currently down. Try again later.\",\"issue_kind\":\"RUNTIME_ERROR\"}"
read 108 bytes
Conn keep-alive
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: [503 ] PuppetDB is currently down. Try again later.
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run{code}
 


I suspect this is coming from an unhandled or unsent exception in https://github.com/puppetlabs/puppetdb/blob/master/puppet/lib/puppet/util/puppetdb/http.rb#L162-L192
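If that is the spot, the desired behaviour for the failover loop can be sketched as follows. This is a simplified sketch with hypothetical names, not the actual {{http.rb}} code: a 503 from one PuppetDB is treated like a connection failure, so the loop moves on to the next {{server_urls}} entry instead of raising straight back to the caller (which is what currently surfaces as a 500 to the agent).

```ruby
# Sketch: fail over past a 503 (maintenance mode) instead of
# propagating it; only raise when no server accepts the command.
def submit_with_failover(server_urls)
  last_error = nil
  server_urls.each do |url|
    status = yield url                 # hypothetical submission returning an HTTP status
    return url if status == 200        # success on this server
    if status == 503                   # maintenance mode: try the next server
      last_error = "#{url} returned 503, failing over"
      next
    end
    raise "#{url} returned #{status}"  # other errors still fail fast
  end
  raise "no PuppetDB server accepted the command (#{last_error})"
end
```

With this shape, the scenario from the reproduction steps (one server in maintenance mode, one healthy) completes against the healthy server and the agent run succeeds.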

Jarret Lavallee (JIRA)

Aug 24, 2018, 2:24:04 PM
to puppe...@googlegroups.com
Jarret Lavallee commented on Bug PDB-4020
 
Re: Puppetserver should handle 503s from PuppetDB

Charlie Sharpsteen I think you are right about where this is occurring. I have moved this to be a PDB ticket.

Jarret Lavallee (JIRA)

Aug 30, 2018, 5:46:04 PM
to puppe...@googlegroups.com


Zachary Kent (JIRA)

Sep 13, 2018, 4:53:05 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Fix Version/s: PDB 6.0.0

Kenn Hussey (JIRA)

Sep 18, 2018, 11:15:05 AM
to puppe...@googlegroups.com
Kenn Hussey updated an issue
Change By: Kenn Hussey
Fix Version/s: PDB 5.2.z → PDB 5.2.5

Zachary Kent (JIRA)

Oct 22, 2018, 6:37:07 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Fix Version/s: PDB 5.2.5 → PDB 5.2.6

Austin Boyd (JIRA)

Dec 5, 2019, 11:29:05 AM
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket IDs: 32647
Zendesk Ticket Count: 1

Austin Boyd (JIRA)

Dec 5, 2019, 11:29:06 AM
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket IDs: 32647,32663
Zendesk Ticket Count: 2

Austin Boyd (JIRA)

Dec 5, 2019, 11:30:05 AM
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket IDs: 32647,32663,32694
Zendesk Ticket Count: 3

Alvin Rodis (Jira)

Aug 16, 2022, 4:08:03 PM
to puppe...@googlegroups.com
Alvin Rodis updated an issue
Change By: Alvin Rodis
Labels: jira_escalated maintenance
This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)

Alvin Rodis (Jira)

Aug 16, 2022, 4:09:03 PM
to puppe...@googlegroups.com
Alvin Rodis updated an issue
Change By: Alvin Rodis
Zendesk Ticket Count: 4
Zendesk Ticket IDs: 32647,32663,32694,49292