Long config retrieval on nodes

Andrew Stangl

Mar 6, 2012, 6:46:14 AM
to puppet...@googlegroups.com
Hi all, hoping someone may have encountered a problem similar to this before:

On my customer's EC2-based infrastructure, we have implemented the nodeless, truth-driven module approach outlined by Jordan Sissel here: http://www.semicomplete.com/blog/geekery/puppet-nodeless-configuration.
It's quite an effective model, especially on EC2. We still run a puppetmaster rather than going masterless, since we'd like to implement a Nagios module using storeconfigs and exported resources.
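
To give a rough idea of the layout: our site.pp boils down to a single default node that includes a fact-driven classifier class, and each client exports its Nagios resources via storeconfigs. Something along these lines (class and fact names here are illustrative rather than our actual ones):

    # site.pp -- one catch-all node; classification is driven entirely by facts
    node default {
      include truth::enforcer
    }

    # modules/truth/manifests/enforcer.pp (simplified)
    class truth::enforcer {
      # $::server_role is a custom fact set on each EC2 instance at boot
      case $::server_role {
        'monitoring': { include nagios::server }
        'webapp':     { include webapp }
        default:      { include base }
      }
    }

    # modules/nagios/manifests/client.pp (simplified) -- each client exports a
    # nagios_host via storeconfigs; the monitoring server collects them with
    # Nagios_host <<| |>>
    class nagios::client {
      @@nagios_host { $::fqdn:
        ensure  => present,
        address => $::ipaddress,
        use     => 'generic-host',
      }
    }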

The problem is that we're seeing high latency during agent runs, which appears to come down to the config retrieval step; so much so that we've had to increase timeouts to stop our Apache/Mongrel puppetmasterd setup from timing out the connection. I've done some basic profiling using --summarize and --evaltrace, which shows that the bottleneck is at the config retrieval level:

(This is on our monitoring server)

Time:
       Attachedstorage: 2.00
                 Class: 0.00
         Collectd conf: 0.01
      Config retrieval: 85.91
                  Cron: 0.00
                  Exec: 34.11
                  File: 35.56
            Filebucket: 0.00
                 Group: 0.26
             Mailalias: 0.17
                 Mount: 3.48
        Nagios command: 0.02
        Nagios contact: 0.00
   Nagios contactgroup: 0.00
           Nagios host: 0.02
        Nagios service: 0.12
   Nagios servicegroup: 0.00
               Package: 3.12
             Resources: 0.00
              Schedule: 0.01
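
For reference, those figures came from an agent run along these lines (exact invocation from memory):

    puppet agent --test --summarize --evaltrace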

This agent run was done with storeconfigs enabled, yet the Nagios resources themselves are processed fairly quickly; when we disable storeconfigs, config retrieval drops to roughly half the time, which suggests database latency. The puppetmaster is running on an m1.small EC2 instance, which only has a single core; could that be the cause of the bottleneck?
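
For completeness, the storeconfigs-related section of puppet.conf on the master looks roughly like this (connection details anonymised); we haven't tried thin_storeconfigs yet, which I gather only stores exported resources, facts and node entries:

    # /etc/puppet/puppet.conf on the master (simplified, details anonymised)
    [master]
        storeconfigs = true
        dbadapter    = mysql
        dbname       = puppet
        dbuser       = puppet
        dbpassword   = ********
        dbserver     = localhost
        # thin_storeconfigs = true    # untried so far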

Any suggestions / advice would be much appreciated, thanks in advance!

Cheers,
Andrew



Andrew Stangl

Mar 8, 2012, 7:01:35 AM
to puppet...@googlegroups.com
We ended up upgrading the EC2 instance from an m1.small to a c1.medium:
the master was maxing out on CPU, and moving to a dual-core instance resolved the issue :)

No more timeouts, and a happy customer too!