Performance issue with puppetserver


n.be...@gaijin.ru

Dec 12, 2017, 12:17:20 PM
to Puppet Users
Hello!

We have a problem with puppetserver running very slowly.


We have ~300 nodes, master runs on a server with 24 cores and 20 GB of memory.

Puppet agent log:

Notice: Applied catalog in 4845.82 seconds
Changes:
  Total: 1043
Events:
  Success: 1043
  Total: 1043
Resources:
  Out of sync: 1025
  Changed: 1025
  Total: 1521
  Restarted: 38
Time:
  Filebucket: 0.00
  Resources: 0.00
  Sensu api config: 0.00
  Sensu redis config: 0.00
  Sensu enterprise dashboard config: 0.00
  Concat file: 0.00
  Schedule: 0.00
  Sensu client config: 0.00
  Anchor: 0.00
  Concat fragment: 0.00
  Mailalias: 0.00
  Sensu rabbitmq config: 0.02
  Ipmi user: 0.03
  Group: 0.05
  Cron: 0.07
  Yumrepo: 0.07
  Sensu check: 0.08
  Host: 0.09
  User: 0.97
  Last run: 1513034474
  Exec: 19.23
  Sshkey: 2.83
  Package: 232.51
  Augeas: 5.00
  File: 4466.81
  Total: 4810.97
  Config retrieval: 75.19
  Service: 8.01
Version:
  Config: 1513029567
  Puppet: 4.4.1



In fact, the actual run time is about 1.5 times longer still.

strace of the master process (and its threads, of course):
# cat strace.txt | grep stat | wc -l
2094587
# cat strace.txt | grep -v stat | wc -l
566745

There are a lot of calls like this:
26104 stat("/etc/puppetlabs/code/environments/production/shared/modules/elasticsearch/lib/puppet/parser/functions/../../../hiera/backend/eyaml/encryptors/pkcs7.rb", <unfinished ...>
Threads make this call many times and hang in this state. We have used the elastic module for a long time, and this problem did not occur before.

How can I find out what is causing this?
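
One way to narrow this down is to group the stat calls in the strace output by path, so that it becomes visible whether a handful of files account for most of them. The following is only a sketch: it assumes strace.txt has one call per line in the format shown above, and the function name top_stat_paths is made up for illustration.

```shell
# Rank the paths passed to stat()/lstat() in an strace log by call count.
# Usage: top_stat_paths strace.txt [limit]
top_stat_paths() {
    # Extract each stat("...") or lstat("...") call, strip it down to the
    # bare path, count identical paths, and print the most frequent first.
    grep -oE 'l?stat\("[^"]+"' "$1" \
        | sed -E 's/^l?stat\("//; s/"$//' \
        | awk '{ count[$0]++ } END { for (p in count) printf "%d %s\n", count[p], p }' \
        | LC_ALL=C sort -rn \
        | head -n "${2:-10}"
}
```

Running top_stat_paths strace.txt 20 prints lines of the form "count path". Calls that take no quoted path, such as fstat on a file descriptor, are not counted.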

Martin Alfke

Dec 13, 2017, 11:23:00 AM
to puppet...@googlegroups.com

Hi,

> On 12 Dec 2017, at 12:41, n.be...@gaijin.ru wrote:
>
> Hello!
>
> We have a problem with puppetserver running very slowly.
>
>
> We have ~300 nodes, master runs on a server with 24 cores and 20 GB of memory.

Low number of nodes. Enough CPU and RAM.

>
> Puppet agent log:
>
> Notice: Applied catalog in 4845.82 seconds
> Changes:
> Total: 1043

Many changes. Is this initial Puppet run or is this standard that you have 1043 changes on every Puppet run?
Which Puppetserver version are you running?
What are you doing with file resources?
Which packages do you manage?
In the timing list you can see that the File and Package resource types take the most time.



> Config retrieval: 75.19
> Service: 8.01
> Version:
> Config: 1513029567
> Puppet: 4.4.1
>
>
>
> In fact, the actual run time is about 1.5 times longer still.
>
> strace of the master process (and its threads, of course):
> # cat strace.txt | grep stat | wc -l
> 2094587
> # cat strace.txt | grep -v stat | wc -l
> 566745
>
> There are a lot of calls like this:
> 26104 stat("/etc/puppetlabs/code/environments/production/shared/modules/elasticsearch/lib/puppet/parser/functions/../../../hiera/backend/eyaml/encryptors/pkcs7.rb", <unfinished ...>
> Threads make this call many times and hang in this state. We have used the elastic module for a long time, and this problem did not occur before.

That is expected: it is hiera-eyaml being called via lookup or automatic data binding.


>
> How can I understand, what is the reason for this?
>
> --
> You received this message because you are subscribed to the Google Groups "Puppet Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-users/deefad16-f62e-4b3f-98f8-eedeeb12b30c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

jcbollinger

Dec 13, 2017, 11:57:22 AM
to Puppet Users


On Tuesday, December 12, 2017 at 11:17:20 AM UTC-6, n.be...@gaijin.ru wrote:

> We have a problem with puppetserver running very slowly.
>
> We have ~300 nodes, master runs on a server with 24 cores and 20 GB of memory.
>
> Puppet agent log:
> [...]
> Time:
> [...]
>   File: 4466.81
>   Total: 4810.97
>   Config retrieval: 75.19
>   Service: 8.01
> Version:
>   Config: 1513029567
>   Puppet: 4.4.1


Your log shows very long runtime (80+ minutes) for the Puppet agent.  That's neither puppetserver nor the master.

The vast majority of the agent runtime is consumed in syncing File resources.  That tells me that you are syncing an enormous volume of files, and possibly also a great number of them.  You will find numerous previous discussions of such problems in this group; here are the usual recommendations:
  1. Prefer to reserve File resources for smallish numbers of smallish files.  Config files are the sweet spot for this resource type.
  2. Prefer to package files and manage them via Package resources, as opposed to recursively syncing directories full of files, or otherwise syncing large numbers of related files.
  3. Avoid syncing temporary files.  If you use a File resource to manage a file, then it should be one that will remain on the system, because if you remove it (or modify it) then Puppet will just sync it again on the next run.  This can mean leaving a file in place that otherwise you would remove.
  4. If you must manage a large file via a File resource then consider specifying a different `checksum` attribute for it.  The default is md5, but 'md5lite' will be faster and still give an ok test for modification.  Or you can even go with 'mtime', which is very fast, but is susceptible to both false positives and false negatives.

Your log also shows a longish runtime (75 seconds) for catalog retrieval.  The `stat()` calls in your strace suggest that you're using the 'eyaml' Hiera back end, and this does add overhead.  If you're storing all your Hiera data in the eyaml back end, then it may add a lot of overhead.  I suggest using eyaml only for those data that actually need to be kept in confidence, and among them, only those for which the general access controls provided by the master's system are insufficient.  Configure the standard back end too, and use it for the rest of your data.  There may be other issues, too, such as the system load from all those File transfers, but I see no details that point me to specific server-side problems.


John

Rob Nelson

Dec 13, 2017, 12:24:04 PM
to puppet...@googlegroups.com
For packaging files (or applications, or anything) as John suggested (#2), I recommend FPM. It makes it really easy to quickly create an rpm, deb, or other package without having to learn the arcane options of each package builder. It's not considered suitable for distribution-quality packages, but I do not think that is a problem here. https://github.com/jordansissel/fpm
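
As a rough illustration of how little is needed, a directory of files can be turned into an RPM with a single fpm call. This is a sketch under assumptions: the wrapper name package_dir_as_rpm and the example arguments are made up, and fpm needs rpmbuild on the build host to produce RPMs.

```shell
# Package a directory tree as an RPM with fpm.
# Usage: package_dir_as_rpm NAME VERSION SOURCE_DIR INSTALL_PREFIX
package_dir_as_rpm() {
    # -s dir : the input is a plain directory
    # -t rpm : the output is an RPM package
    # -C     : change into SOURCE_DIR before collecting files
    # --prefix : directory the files are installed under on the node
    fpm -s dir -t rpm \
        -n "$1" -v "$2" \
        -C "$3" --prefix "$4" \
        .
}
```

For example, package_dir_as_rpm myapp-files 1.0.0 ./files /opt/myapp (hypothetical names) would package everything under ./files so that it installs below /opt/myapp. The resulting package can then be published in a yum repository and installed with a single Package resource instead of many File resources.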


n.be...@gaijin.ru

Dec 13, 2017, 12:40:27 PM
to Puppet Users


On Wednesday, December 13, 2017 at 19:23:00 UTC+3, Martin Alfke wrote:

> Hi,
>
> > On 12 Dec 2017, at 12:41, n.be...@gaijin.ru wrote:
> >
> > Hello!
> >
> > We have a problem with puppetserver running very slowly.
> >
> > We have ~300 nodes, master runs on a server with 24 cores and 20 GB of memory.
>
> Low number of nodes. Enough CPU and RAM.
>
> > Puppet agent log:
> >
> > Notice: Applied catalog in 4845.82 seconds
> > Changes:
> >             Total: 1043
>
> Many changes. Is this initial Puppet run or is this standard that you have 1043 changes on every Puppet run?

This is the initial run. We are installing a new server.

> Which Puppetserver version are you running?

# puppetserver -v
puppetserver version: 2.7.2

> What are you doing with file resources?

We create files from templates or simply upload them to the node.

> Which packages do you manage?

RPM packages on CentOS 7.

Martin Alfke

Dec 13, 2017, 1:30:56 PM
to puppet...@googlegroups.com

> On 13 Dec 2017, at 18:40, n.be...@gaijin.ru wrote:
>
>
> Many changes. Is this initial Puppet run or is this standard that you have 1043 changes on every Puppet run?
> This is the initial run. Installing a new server.

OK.

> > File: 4466.81
>
> Which Puppetserver version are you running?
>
> # puppetserver -v
> puppetserver version: 2.7.2
>
> What are you doing with file resources?
> Create files from templates or simply upload to node.

You are managing lots of files on your agent.
How many are managed?
grep file /opt/puppetlabs/puppet/cache/state/resources.txt | wc -l

For comparison, a standard Puppet Enterprise master has ~300 managed files.

Do you manage directories using recurse => true?
This is something you should not do. In that case you should create packages instead (see https://github.com/jordansissel/fpm/wiki).

Are most files templates (using content => epp(…)) or static files (using source => 'puppet:///…')?

Robert

Dec 13, 2017, 2:27:06 PM
to puppet...@googlegroups.com
Hey,

> Do you manage directories using recurse => true?

A good question indeed. I don't recommend recurse => true: beyond a certain number of files (I don't know exactly how many) it needs a LOT of time and memory. We had two servers with a similar role but different environments; one had about 20 GB of data, the other around 80 GB. The Puppet run took a few minutes on the first one vs. 3000+ seconds on the second.

Without recursion it takes about 80 seconds. For ensuring ownership or similar, it's better to use something like fswatch.

> This is something you should not do. In this case you should create packages (see https://github.com/jordansissel/fpm/wiki)

+1 for FPM as well. I got the same advice from you guys 1-2 years ago; since then I build everything with it, and it's perfect.
 
Best
Rp

n.be...@gaijin.ru

Dec 13, 2017, 2:48:32 PM
to Puppet Users


On Wednesday, December 13, 2017 at 21:30:56 UTC+3, Martin Alfke wrote:

> > On 13 Dec 2017, at 18:40, n.be...@gaijin.ru wrote:
> >
> > Many changes. Is this initial Puppet run or is this standard that you have 1043 changes on every Puppet run?
> > This is the initial run. Installing a new server.
>
> OK.
>
> > >              File: 4466.81
> >
> > Which Puppetserver version are you running?
> >
> > # puppetserver -v
> > puppetserver version: 2.7.2
> >
> > What are you doing with file resources?
> > Create files from templates or simply upload to node.
>
> You are managing lots of files on your agent.
> How many are managed?
> grep file /opt/puppetlabs/puppet/cache/state/resources.txt | wc -l

grep -c file /opt/puppetlabs/puppet/cache/state/resources.txt
223

> Example: a standard Puppet Enterprise Master has ~300 files managed.
>
> Do you manage directories using recurse => true?

No, we do not use it at all.

> This is something you should not do. In this case you should create packages (see https://github.com/jordansissel/fpm/wiki)
>
> Are most files templates (using content => epp(…)) or static files (using source => 'puppet:///…')?

30% templates / 70% static.

Is it possible to determine which file resources take a lot of time?

The machine where we run the agent has 32 cores and 24 GB of memory.

Thanks for your help!
 

n.be...@gaijin.ru

Dec 13, 2017, 2:57:39 PM
to Puppet Users
Thanks for the advice! We will try to use packages more often.

On Wednesday, December 13, 2017 at 20:24:04 UTC+3, Rob Nelson wrote:
[...]

Martin Alfke

Dec 13, 2017, 3:03:59 PM
to puppet...@googlegroups.com

> On 13 Dec 2017, at 20:48, n.be...@gaijin.ru wrote:
>
> You are managing lots of files on your agent.
> How many are managed?
> grep file /opt/puppetlabs/puppet/cache/state/resources.txt | wc -l
>
>
> grep -c file /opt/puppetlabs/puppet/cache/state/resources.txt
> 223

That is not many.
>
> Do you manage directories using recurse => true?
>
> No, we do not use it at all.

Good.
>
> Are most files templates (using content => epp(…)) or static files (using source => 'puppet:///…')?
>
>
> 30% templates\70% static.

In this case I would analyze the network latency between the master and the agent.
Every file resource with a source attribute gets fetched from the master over a new HTTPS connection with client-certificate authentication.

If network latency is the cause, we usually try to remove source and use content with a template for all files.
Yes, this increases compile times on the master, and the catalog gets larger.
The benefit is that the agent already has all file content and never has to make a separate call to the master.
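
A rough way to test the latency hypothesis is to time a few connection setups from the agent to the master. The sketch below assumes curl is installed on the agent and that the master listens on port 8140, the Puppet default; the function name and the host name in the usage note are made up. It presents no client certificate, so it measures only TCP and TLS setup, not an actual file fetch.

```shell
# Time repeated TCP + TLS connection setups to the Puppet master.
# Prints one line per sample: connect, TLS-handshake-done and total time (seconds).
# Usage: measure_master_latency MASTER_HOST [samples]
measure_master_latency() {
    i=0
    while [ "$i" -lt "${2:-5}" ]; do
        # -k skips CA verification: only the timing matters here, not trust.
        curl -sk -o /dev/null \
            -w '%{time_connect} %{time_appconnect} %{time_total}\n' \
            "https://$1:8140/"
        i=$((i + 1))
    done
}
```

For example, measure_master_latency puppet.example.com 10 prints ten samples. For scale: if about 70% of the 223 file resources are static, that is roughly 156 fetches, so each fetch would have to cost close to 30 seconds for connection overhead alone to explain the 4466 seconds spent on File resources.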

>
> Is it possible to determine what file resources require a lot of time?

Hard to say.
You could enable agent debug mode before the first Puppet run, but that will produce a big debug log to parse.
Either run puppet agent --test --debug or add debug=true to the agent section of puppet.conf.
Please remember to remove the debug setting once you have finished your analysis.

Are there big files that you sync to the agents? Files of several hundred MB? In that case the time might be going into generating the md5 sums.
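
To check for big files quickly, the managed paths can be taken from resources.txt and sorted by size on the agent. This is a sketch that assumes each File resource appears in that file as a line of the form file[/path/to/file]; the function name largest_managed_files is made up.

```shell
# List the largest regular files among the File resources in resources.txt.
# Usage: largest_managed_files /opt/puppetlabs/puppet/cache/state/resources.txt [limit]
largest_managed_files() {
    # Assumed line format: file[/path/to/file], one resource per line.
    # Directories and paths that do not exist on this node are skipped.
    sed -n 's/^file\[\(.*\)]$/\1/p' "$1" \
        | while IFS= read -r path; do
            if [ -f "$path" ]; then
                printf '%s %s\n' "$(wc -c < "$path" | tr -d ' ')" "$path"
            fi
        done \
        | LC_ALL=C sort -rn \
        | head -n "${2:-10}"
}
```

The output is "size-in-bytes path", largest first. Files that are only deployed later in the run will not show up until they exist on the node.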

>
> On the machine where we run agent 32 core and 24GB of memory.

Enough for puppet agent ;-)

Rob Nelson

Dec 13, 2017, 11:55:54 PM
to puppet...@googlegroups.com
The file /opt/puppetlabs/puppet/cache/state/last_run_report.yaml on the agent (using puppet AIO builds, anyway) has some timing information for every resource. I think that might help determine if a few files are taking a majority of the time or if it’s latency on every file - or even an issue with .epp templates that take a long time to generate for some reason.
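
As a sketch of how that report could be mined, the function below prints the resources with the largest evaluation times. It assumes that each entry under resource_statuses carries a resource: line followed later by an evaluation_time: line, so check the layout of your own report first; the function name slowest_resources is made up. It treats the YAML as plain text because the file carries Ruby-specific tags that generic YAML tools may refuse to load.

```shell
# Print the slowest resources recorded in a Puppet agent run report.
# Usage: slowest_resources /opt/puppetlabs/puppet/cache/state/last_run_report.yaml [limit]
slowest_resources() {
    # Remember the most recent "resource:" value, then emit
    # "seconds resource" whenever its "evaluation_time:" line appears.
    awk '
        /^ +resource: /        { sub(/^ +resource: /, ""); name = $0 }
        /^ +evaluation_time: / { print $2, name }
    ' "$1" \
        | LC_ALL=C sort -rn \
        | head -n "${2:-10}"
}
```

If a few File resources dominate the list, those files are the ones to look at; if the time is spread evenly over all of them, per-file latency is the more likely explanation.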



--
Rob Nelson