err: Connection timeout calling puppetmaster.getconfig: execution expired

Arnau Bria

Jun 9, 2009, 11:33:44 AM

to puppet...@googlegroups.com

Hi all,

My current configuration spreads the runs of 188 clients over one hour,
and puppet runs as a cron job. My server (2 CPUs, 2 GB RAM) runs mongrel
(with 8 puppetmasterd processes), and this configuration works fine.

We'd like puppet to run on all clients at the same time (e.g. to force a
change), so we're testing several things. (The previous configuration
does not support mass execution.)

The first and most important step is moving our server to a better host:
4 CPUs, 8 GB RAM. We also run mongrel there, but now with 4 puppetmasterd
processes, so each one has its own CPU.

With the first server we could run up to 40 clients at once; now we can
run 135. So we're improving.

The error on the nodes where puppet did not run is:

err: Connection timeout calling puppetmaster.getconfig: execution expired
err: Could not retrieve catalog: Connection Timeout
warning: Not using cache on failed catalog

Which step of
http://reductivelabs.com/trac/puppet/wiki/PuppetInternals produces this
error? Compiling? It's not clear to me; I see no transfer step in that
diagram. We have no network errors.

So, in order to avoid this error, our first idea was to increase the
timeout variable on both sides, client and server:

# How long the client should wait for the configuration to be retrieved
# before considering it a failure. This can help reduce flapping if too
# many clients contact the server at one time.
# The default value is '120'.

server:
# puppetmasterd --genconf|grep timeout
configtimeout = 360

client:
# puppetd --genconf|grep timeout
configtimeout = 360

But then we get more errors! Only 1 client is able to apply its configuration.

Does this make sense to anyone?
What other tuning could we test in order to reduce connection timeouts?

server:
# rpm -qa|grep puppet
puppet-0.24.7-4.el5
puppet-server-0.24.7-4.el5
# rpm -qa|grep mongrel
rubygem-mongrel-1.0.1-6.el5

client:
# rpm -qa|grep puppet
puppet-0.24.7-4.el4.x86_64

TIA,
Arnau

David Schmitt

Jun 10, 2009, 3:47:26 AM

Arnau Bria wrote:

> The error on the nodes where puppet did not run is:
>
> err: Connection timeout calling puppetmaster.getconfig: execution expired
> err: Could not retrieve catalog: Connection Timeout
> warning: Not using cache on failed catalog
>
> Which step of
> http://reductivelabs.com/trac/puppet/wiki/PuppetInternals produces this
> error? Compiling? It's not clear to me; I see no transfer step in that
> diagram. We have no network errors.

This error means that the client timed out when waiting for the
configuration from the puppetmasterd, marked as "Request to apply the
configuration" in the diagram and "Configuration Transport" in the text.

> So, in order to avoid this error, our first idea was to increase the
> timeout variable on both sides, client and server:
>
> [...]
>
> But then we get more errors! Only 1 client is able to apply its configuration.
>
> Does this make sense to anyone?

Not to me. I'm wondering whether those are the same "getconfig" errors,
or whether clients that already have a configuration time out while
trying to fetch file resources.

> What other tuning could we test in order to reduce connection timeouts?

Two things I would recommend:

1) Do not start all your clients at once. Look at the fqdn_rand
function[1] or --splay[2]. Even spreading the updates over only a few
minutes could make a big difference for you.

2) Upgrade to 0.24.8. If you are using storeconfigs, this is an absolute
must.
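As a rough illustration of recommendation 1 for cron-driven runs, the shell sketch below derives a stable per-host delay from the host name, in the spirit of fqdn_rand, so that the clients are spread across a window instead of all reaching the puppetmaster in the same second. The function name `splay_delay` and the 600-second window are hypothetical choices for this sketch; this is not puppet's own splay implementation.

```sh
#!/bin/sh
# Hypothetical helper (not part of puppet): print a delay in seconds,
# 0 <= delay < window, derived deterministically from the host name,
# similar in spirit to puppet's fqdn_rand function.
splay_delay() {
    host=$1
    window=$2
    # cksum prints "CRC BYTECOUNT"; keep only the CRC as an integer.
    sum=$(printf '%s' "$host" | cksum | awk '{print $1}')
    # Reduce the checksum to the range [0, window).
    echo $((sum % window))
}
```

A cron job could then run something like `sleep "$(splay_delay "$(hostname -f)" 600)" && puppetd --onetime --no-daemonize`, assuming `hostname -f` returns the node's FQDN on your systems. Because the delay depends only on the host name, each node always runs at the same offset, which keeps the load on the master predictable from one run to the next.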

Regards, DavidS

[1] http://reductivelabs.com/trac/puppet/wiki/FunctionReference#fqdn-rand
[2] http://reductivelabs.com/trac/puppet/wiki/ConfigurationReference#configuration-parameter-reference

--
dasz.at OG Tel: +43 (0)664 2602670 Web: http://dasz.at
Klosterneuburg UID: ATU64260999

FB-Nr.: FN 309285 g FB-Gericht: LG Korneuburg

Arnau Bria

Jun 10, 2009, 5:10:25 AM

On Wed, 10 Jun 2009 09:47:26 +0200
David Schmitt wrote:

Hi David,

[...]

> This error means that the client timed out when waiting for the
> configuration from the puppetmasterd, marked as "Request to apply the
> configuration" in the diagram and "Configuration Transport" in the
> text.

Yep, sorry, it's in the text:

Puppet currently converts the Transportable objects to YAML, which it
then CGI-escapes and sends over the wire using XMLRPC over HTTPS.

[...]

> > Does it make sense for anyone?
>
> Not to me. I'm wondering whether those are the same "getconfig"
> errors or do the clients already having a configuration time out on
> trying to fetch file resources?

IIRC, it was the getconfig error...

> Two things I would recommend:
>
> 1) Do not start all your clients at once. Look at the fqdn_rand
> function[1] or --splay[2]. Even spreading the updates over only a few
> minutes might make much of a difference for you.

OK, that somewhat defeats the main purpose of the test, but I'm going to
see how much time it takes for all nodes to be reconfigured. 5-10 minutes
is our goal.

> 2) Upgrade to 0.24.8. If you are using storeconfigs, this is an
> absolute must.

Doing so, but I got some errors right after the update... I'm going to
open a new thread.

> Regards, DavidS
>
> [1] http://reductivelabs.com/trac/puppet/wiki/FunctionReference#fqdn-rand
> [2] http://reductivelabs.com/trac/puppet/wiki/ConfigurationReference#configuration-parameter-reference
>

Many thanks for your reply

Arnau Bria

Jun 10, 2009, 6:55:32 AM

On Wed, 10 Jun 2009 11:10:25 +0200
Arnau Bria wrote:

Hi
[...]

I've upgraded, set splay to true and tested again (against the old
server: 2 CPUs, 2 GB of RAM and 8 puppetmasterd processes with mongrel).

My test consists of running the complete puppet configuration plus a
test file (tmp/dummy). I remove the file and launch puppet on all hosts,
then I check how many hosts created the dummy file.

Out of a total of 188 nodes, it worked on 135.

But I see some strange behaviour:
$ grep " Connection timeout calling puppetmaster.getconfig" *out |wc -l
105
$ grep "Failed to generate additional resources during transaction: Connection Timeout" *out |wc -l
56

So 161 got errors, but they still reconfigured themselves ?¿??¿?

Again, does this make sense?
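One possible explanation for the mismatch: `grep ... | wc -l` counts matching lines, not hosts, and a host that logged both errors (or the same error more than once) is counted more than once. So 105 + 56 = 161 need not be comparable with the 188 - 135 = 53 hosts that did not create the file. The sketch below assumes one `*out` file per host, as in the commands above, and counts distinct files with `grep -l`, reporting the overlap as well. The function name `summarise_puppet_logs` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count hosts (not log lines) per error, assuming one
# *out file per host in the given directory. grep -l prints each matching
# file once; -F treats the messages as fixed strings.
summarise_puppet_logs() {
    dir=$1
    getconfig='Connection timeout calling puppetmaster.getconfig'
    genres='Failed to generate additional resources during transaction: Connection Timeout'
    tmp=$(mktemp -d) || return 1
    # Sorted file lists, as required by comm below.
    (cd "$dir" && grep -lF "$getconfig" *out) 2>/dev/null | sort > "$tmp/getconfig"
    (cd "$dir" && grep -lF "$genres" *out) 2>/dev/null | sort > "$tmp/genres"
    echo "getconfig_hosts=$(wc -l < "$tmp/getconfig" | tr -d ' ')"
    echo "genres_hosts=$(wc -l < "$tmp/genres" | tr -d ' ')"
    # Hosts that logged both errors.
    echo "both=$(comm -12 "$tmp/getconfig" "$tmp/genres" | wc -l | tr -d ' ')"
    # Hosts that logged at least one of the two errors.
    echo "either=$(sort -u "$tmp/getconfig" "$tmp/genres" | wc -l | tr -d ' ')"
    rm -rf "$tmp"
}
```

Running `summarise_puppet_logs .` in the directory holding the output files prints four `key=value` lines. The `either` value is a closer figure to compare with the 53 failed hosts than the sum of the two line counts, although a host can still log an error and apply the rest of its catalog.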

I was thinking of dropping the puppet file server and using something
like svn or rsync instead: I'd add an exec of svn/rsync, and then puppet
would handle the rpm/exec/services/repos... etc.
I have 150 files, 1.5 MB in total.