Jira (PUP-1965) Clients are hung when server has intermittent service

6 views
Skip to first unread message

Maggie Dreyer (JIRA)

unread,
May 16, 2017, 1:06:04 PM5/16/17
to puppe...@googlegroups.com
Maggie Dreyer updated an issue
 
Puppet / Bug PUP-1965
Clients are hung when server has intermittent service
Change By: Maggie Dreyer
Labels: triaged
Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v6.4.14#64029-sha1:ae256fe)
Atlassian logo

Moses Mendoza (JIRA)

unread,
May 18, 2017, 1:49:19 PM5/18/17
to puppe...@googlegroups.com

Sam McLeod (JIRA)

unread,
May 22, 2018, 1:52:04 AM5/22/18
to puppe...@googlegroups.com
Sam McLeod commented on Bug PUP-1965
 
Re: Clients are hung when server has intermittent service

We have this issue as well.

We can reproduce it on Puppet Enterprise 2018.1 and CentOS 7.5 clients.

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)
Atlassian logo

Sam McLeod (JIRA)

unread,
May 22, 2018, 1:59:02 AM5/22/18
to puppe...@googlegroups.com
Sam McLeod updated an issue
 
Change By: Sam McLeod
Comment:
We have this issue as well.

We can reproduce it on Puppet Enterprise 2018.1 and CentOS 7.5 clients.

R.I.Pienaar (JIRA)

unread,
May 22, 2018, 1:33:03 PM5/22/18
to puppe...@googlegroups.com

Josh Cooper (JIRA)

unread,
May 29, 2018, 12:59:04 PM5/29/18
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-1965

We've added a watchdog/supervisor to the agent (in PUP-7517), but it not activated by default in 5.x. I recommend setting runtimeout to a non-infinite value to activate it.

Also note you may want to change http_read_timeout to a non-infinite value. We will be changing the default value of both settings in 6.0 in PUP-8683.

Josh Cooper (JIRA)

unread,
Jun 7, 2018, 12:17:05 PM6/7/18
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-1965

Sam McLeod can you verify that setting http_read_timeout to a non-infinite value resolves the issue? If not please capture strace output when the agent is stuck.

Josh Cooper (JIRA)

unread,
Jun 7, 2018, 12:17:05 PM6/7/18
to puppe...@googlegroups.com
Josh Cooper assigned an issue to Sam McLeod
 
Change By: Josh Cooper
Assignee: jon yeargers Sam McLeod

Michael Marod (JIRA)

unread,
Jun 14, 2018, 11:51:05 PM6/14/18
to puppe...@googlegroups.com
Michael Marod commented on Bug PUP-1965
 
Re: Clients are hung when server has intermittent service

Suffering from the same symptoms and am led to believe that setting http_read_timeout will resolve the problem. I hooked up gdb to one of our stuck puppet agents and grabbed a backtrace:

(gdb) call (void) rb_backtrace()
	from /opt/puppetlabs/puppet/bin/puppet:5:in `<main>'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/command_line.rb:73:in `execute'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/command_line.rb:137:in `run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application.rb:375:in `run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util.rb:661:in `exit_on_fail'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application.rb:375:in `block in run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application/agent.rb:352:in `run_command'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application/agent.rb:390:in `main'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/daemon.rb:149:in `start'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/daemon.rb:193:in `run_event_loop'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/scheduler/scheduler.rb:11:in `run_loop'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/scheduler/scheduler.rb:34:in `run_ready'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/scheduler/scheduler.rb:34:in `each'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/scheduler/scheduler.rb:39:in `block in run_ready'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/scheduler/job.rb:49:in `run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/daemon.rb:170:in `block in run_event_loop'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:46:in `run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application.rb:179:in `controlled_run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:48:in `block in run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:94:in `run_in_fork'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:94:in `fork'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:98:in `block in run_in_fork'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:49:in `block (2 levels) in run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:130:in `with_client'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:52:in `block (3 levels) in run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent/locker.rb:21:in `lock'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:58:in `block (4 levels) in run'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/timeout.rb:103:in `timeout'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/timeout.rb:93:in `block in timeout'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:59:in `block (5 levels) in run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/configurer.rb:211:in `run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet.rb:260:in `override'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/context.rb:65:in `override'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/configurer.rb:237:in `block in run'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/configurer.rb:383:in `run_internal'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/configurer.rb:382:in `ensure in run_internal'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/configurer.rb:419:in `send_report'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/indirection.rb:289:in `save'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/rest.rb:235:in `save'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/rest.rb:254:in `do_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/request.rb:220:in `do_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/rest.rb:254:in `block in do_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/rest.rb:236:in `block in save'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/rest.rb:146:in `http_put'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/rest.rb:151:in `http_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:118:in `put'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:174:in `do_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:174:in `upto'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:177:in `block in do_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:313:in `with_connection'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/pool.rb:31:in `with_connection'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:314:in `block in with_connection'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:180:in `block (2 levels) in do_request'
	from /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:302:in `execute_request'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1416:in `request'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1443:in `transport_request'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1443:in `catch'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1446:in `block in transport_request'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http/response.rb:29:in `read_new'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http/response.rb:40:in `read_status_line'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:164:in `readline'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:154:in `readuntil'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:176:in `rbuf_fill'
	from /opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:176:in `wait_readable' 

strace'ing the pid gave:

# strace -p 10907
strace: Process 10907 attached
ppoll([{fd=8, events=POLLIN}], 1, NULL, NULL, 8 

/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:176

171     def rbuf_fill
172       case rv = @io.read_nonblock(BUFSIZE, exception: false)
173       when String
174         return @rbuf << rv
175       when :wait_readable
176         @io.to_io.wait_readable(@read_timeout) or raise Net::ReadTimeout
177         # continue looping

So it looks like its stuck waiting for read_timeout to hit which is unset (aka infinity) in our environment. We should be able to set the http_read_timeout value in the puppet agent config to toggle this setting. Its a little bit tricky to test but it looks like the solution.

"runtimeout" in this case does nothing. If you follow the backtrace further you'll get to...

/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/agent.rb:59

 52             lock do
 53               # NOTE: Timeout is pretty heinous as the location in which it
 54               # throws an error is entirely unpredictable, which means that
 55               # it can interrupt code blocks that perform cleanup or enforce
 56               # sanity. The only thing a Puppet agent should do after this
 57               # error is thrown is die with as much dignity as possible.
 58               Timeout.timeout(Puppet[:runtimeout], RunTimeoutError) do
 59                 client.run(client_args)
 60               end
 61             end 

The timeout set on line 58 in this case does nothing. We have runtimeout set to 1800s and it is ineffective.

 

Josh Cooper (JIRA)

unread,
Jul 16, 2018, 7:45:02 PM7/16/18
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-1965

We have runtimeout set to 1800s and it is ineffective.

Michael Marod Did you mean runtimeout is not ineffective if http_read_timeout is left as the default infinite value? That would make sense since ruby is stuck in the call to @io.to_io.wait_readable(@read_timeout). Also Timeout.timeout is evil and shouldn't be used.

Btw, we will be changing http_read_timeout and runtimeout to default to a non-infinite value in puppet 6, see PUP-8683.

Josh Cooper (JIRA)

unread,
Mar 29, 2019, 6:13:06 PM3/29/19
to puppe...@googlegroups.com

Josh Cooper (JIRA)

unread,
Feb 25, 2020, 7:43:04 PM2/25/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-1965
 
Re: Clients are hung when server has intermittent service

Given the changes in PUP-7517 and PUP-8683, and lack of feedback, I'm going to close this ticket. Please reopen if you see this issue when using a non-infinite http_read_timeout.

Reply all
Reply to author
Forward
0 new messages