Long puppet catalog run times on certain nodes after "using pson" is called


r.yeo

Feb 26, 2013, 1:32:33 PM
to puppet...@googlegroups.com
Running the agent in debug mode on some clients, I am seeing long run times like this, but only after the "using pson" lines:

Tue Feb 26 10:57:38 2013: Debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
Tue Feb 26 10:58:53 2013: Info: /Stage[main]/Vpn::Config/File[/opt/cisco/vpn/.anyconnect_global]: Evaluated in 75.03 seconds

Tue Feb 26 10:58:53 2013: Debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
Tue Feb 26 11:00:08 2013: Info: /Stage[main]/Amadeuspro/File[/etc/puppet/tmp/com.HairerSoft.AmadeusPro.plist]: Evaluated in 74.78 seconds
Tue Feb 26 11:00:08 2013: Info: /Stage[main]/Adobe/File[/etc/puppet/tmp/com.adobe.Reader.plist]: Starting to evaluate the resource
Tue Feb 26 11:00:08 2013: Debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
Tue Feb 26 11:01:23 2013: Info: /Stage[main]/Adobe/File[/etc/puppet/tmp/com.adobe.Reader.plist]: Evaluated in 75.00 seconds

Tue Feb 26 11:01:23 2013: Debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
Tue Feb 26 11:02:38 2013: Info: /Stage[main]/Puppet::Snow/File[/etc/hooks/cleanpup.pl]: Evaluated in 75.01 seconds

This catalog took 921.02 seconds to complete:

Tue Feb 26 11:02:38 2013: Notice: Finished catalog run in 916.78 seconds
Tue Feb 26 11:02:38 2013: Changes:
Tue Feb 26 11:02:38 2013: Events:
Tue Feb 26 11:02:38 2013: Resources:
Tue Feb 26 11:02:38 2013:             Total: 55
Tue Feb 26 11:02:38 2013:           Skipped: 6
Tue Feb 26 11:02:38 2013: Time:
Tue Feb 26 11:02:38 2013:        Filebucket: 0.00
Tue Feb 26 11:02:38 2013:           Package: 0.00
Tue Feb 26 11:02:38 2013:    Ssh authorized key: 0.00
Tue Feb 26 11:02:38 2013:              User: 0.06
Tue Feb 26 11:02:38 2013:             Group: 0.11
Tue Feb 26 11:02:38 2013:              Exec: 0.59
Tue Feb 26 11:02:38 2013:    Config retrieval: 13.28
Tue Feb 26 11:02:38 2013:          Last run: 1361901758
Tue Feb 26 11:02:38 2013:              File: 906.98
Tue Feb 26 11:02:38 2013:             Total: 921.02
Tue Feb 26 11:02:38 2013: Version:
Tue Feb 26 11:02:38 2013:            Config: 1361893868
Tue Feb 26 11:02:38 2013:            Puppet: 3.1.0

Also, this isn't global, as I have nodes getting identical complete configs in 7 seconds, and their run times are just fine after the "using pson" lines.

Example of a fast node:

Tue Feb 26 11:22:57 2013: Debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
Tue Feb 26 11:22:58 2013: Info: /Stage[main]/Amadeuspro/File[/etc/puppet/tmp/com.HairerSoft.AmadeusPro.plist]: Evaluated in 0.15 seconds
Tue Feb 26 11:22:58 2013: Info: /Stage[main]/Adobe/File[/etc/puppet/tmp/com.adobe.Reader.plist]: Starting to evaluate the resource
Tue Feb 26 11:22:58 2013: Debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
Tue Feb 26 11:22:58 2013: Info: /Stage[main]/Adobe/File[/etc/puppet/tmp/com.adobe.Reader.plist]: Evaluated in 0.07 seconds
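As an aside, evaluation times like these can be pulled out of the agent debug output mechanically rather than by eyeballing; a rough Ruby sketch, with a two-line embedded sample standing in for a real log file:

```ruby
# Extract per-resource evaluation times from agent debug/info output and
# flag the slow ones. The heredoc is a stand-in for a real log file.
log = <<~LOG
  Tue Feb 26 10:58:53 2013: Info: /Stage[main]/Vpn::Config/File[/opt/cisco/vpn/.anyconnect_global]: Evaluated in 75.03 seconds
  Tue Feb 26 11:22:58 2013: Info: /Stage[main]/Adobe/File[/etc/puppet/tmp/com.adobe.Reader.plist]: Evaluated in 0.07 seconds
LOG

# Capture the resource reference and the number of seconds.
times = log.scan(/(\S+\[[^\]]+\]): Evaluated in ([\d.]+) seconds/)
           .map { |res, secs| [res, secs.to_f] }

# Anything over a minute is suspicious for a <5KB file.
slow = times.select { |_, secs| secs > 60 }
slow.each { |res, secs| puts format('%8.2fs  %s', secs, res) }
```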

Puppet master and clients are on 3.1, with Dashboard 1.2.22, and we are also using PuppetDB.

Any thoughts why I'd have trouble and long run times on select nodes when pson is called?

Thanks!

Felix Frank

Feb 27, 2013, 6:37:44 AM
to puppet...@googlegroups.com
Hi,

On 02/26/2013 07:32 PM, r.yeo wrote:
> Tue Feb 26 11:02:38 2013: Info:
> /Stage[main]/Puppet::Snow/File[/etc/hooks/cleanpup.pl]: Evaluated in
> 75.01 seconds
...
> Any thoughts why I'd have trouble and long run times on
> select nodes when pson is called?

PSON is not actually called, it's just a serialization method that is
selected during agent/master interaction.

Seeing as fetching the file metadata takes roughly 75 seconds each time, I
suspect you are hitting some sort of timeout. Multiples of three make me
think of TCP timeouts.

What is the architecture of your puppet master? Are you using passenger?
Are there possible network bottlenecks?

Cheers,
Felix

刘长元

Feb 27, 2013, 8:23:58 PM
to puppet...@googlegroups.com
Tue Feb 26 11:02:38 2013:    Config retrieval: 13.28
Tue Feb 26 11:02:38 2013:          Last run: 1361901758
Tue Feb 26 11:02:38 2013:              File: 906.98
Tue Feb 26 11:02:38 2013:             Total: 921.02

Your file resources took 906.98 seconds. How many file resources are in your manifests? Any directories? Are any using recurse => true?

r.yeo

Feb 28, 2013, 1:16:58 PM
to puppet...@googlegroups.com
Architecture of PM is Ubuntu 12.04 in a VM with 8GB of RAM (4.7GB used) with Apache 2.2 as the front end.

No network bottlenecks that I have been able to identify and the nodes this is happening on seem to be random.  Different buildings on campus, different vlans, etc.  I have nodes on the same network segment getting identical configs that are performing flawlessly.

recurse => true is not being called in any module and no directories in my file manifests.

My typical module with files has about 5 files, and they are all smaller than 5KB.

I'll keep digging and see if I can find any similarities among the client machines that might be causing this.  It really seems client-side at this point, since Dashboard shows the problem nodes are consistently slow and the good nodes consistently fine, even though they are getting identical configs.

Thanks,

Rob

Felix Frank

Mar 1, 2013, 4:11:02 AM
to puppet...@googlegroups.com
Alright, then.

Are your agents *nix? Please try to strace an agent run and determine
what's going on during the 75-second waiting periods.

A simpler first step would be to capture the agent/master communication
with a tool such as ssldump. But that will only help if there is in fact
communication happening. (I can still see TCP timeouts being an issue
here, if you're having packet loss you're not yet aware of.)

HTH,
Felix

jcbollinger

Mar 1, 2013, 9:54:46 AM
to puppet...@googlegroups.com


On Thursday, February 28, 2013 12:16:58 PM UTC-6, r.yeo wrote:
Architecture of PM is Ubuntu 12.04 in a VM with 8GB of RAM (4.7GB used) with Apache 2.2 as the front end.

No network bottlenecks that I have been able to identify and the nodes this is happening on seem to be random.  Different buildings on campus, different vlans, etc.  I have nodes on the same network segment getting identical configs that are performing flawlessly.

recurse => true is not being called in any module and no directories in my file manifests.

My typical module with files has about 5 files and they are all smaller then 5KB.

I'll keep digging and see if I can find any similarities on the client machines that might be causing this.  It really seems client side at this point since looking in dashboard it does seem to be very consistent on the nodes that are having issues and the nodes that are good even though they are getting identical configs.


I concur with Felix: your best bet is to monitor and analyze the actual failure behavior.  The characteristics of the machines on which that behavior manifests are secondary.

Also, the more you say, the more likely Felix's timeout theory seems.  Indeed, I find it particularly striking that the timing of the slow file operations is so tightly clustered around 75s.  That is very suggestive of a fixed-length external delay, or a series of a consistent number of them.

My first guess, then, would be that the failing clients are talking to a name resolution service that accepts requests but never responds (perhaps its firewall intentionally discards the requests, or maybe it's just wedged).  The client eventually falls back to an alternative server, at which point it gets the file.  Note, too, that this theory supposes that the client systems are not caching successful name lookups; caching might be something to consider configuring for better networking performance overall.
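That theory is testable directly: query each configured nameserver individually and time the answer, so a black-holing first resolver shows up as a timeout while a healthy one answers in milliseconds. A hedged Ruby sketch — the hostname and nameserver addresses in the commented usage are placeholders, not values from this thread:

```ruby
require 'resolv'

# Time a DNS lookup against one specific nameserver. A server that
# accepts queries but never answers shows up as :no_answer after roughly
# `timeout` seconds; a healthy server returns :ok in milliseconds.
def time_lookup(name, nameserver, timeout: 2)
  dns = Resolv::DNS.new(nameserver: [nameserver])
  dns.timeouts = timeout
  start = Time.now
  begin
    dns.getaddress(name)
    [:ok, Time.now - start]
  rescue Resolv::ResolvError, Resolv::ResolvTimeout, SystemCallError
    [:no_answer, Time.now - start]
  ensure
    dns.close
  end
end

# Hypothetical usage - substitute the client's real resolv.conf entries
# and the master's name:
# %w[10.0.0.1 10.0.0.2].each do |ns|
#   status, secs = time_lookup('puppetmaster.example.com', ns)
#   puts format('%-12s %-10s %.2fs', ns, status, secs)
# end
```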


John

Brice Figureau

Mar 1, 2013, 10:55:44 AM
to puppet...@googlegroups.com
On Thu, 2013-02-28 at 10:16 -0800, r.yeo wrote:
> Architecture of PM is Ubuntu 12.04 in a VM with 8GB of RAM (4.7GB
> used) with Apache 2.2 as the front end.
>
> No network bottlenecks that I have been able to identify and the nodes
> this is happening on seem to be random. Different buildings on
> campus, different vlans, etc. I have nodes on the same network
> segment getting identical configs that are performing flawlessly.
>
> recurse => true is not being called in any module and no directories
> in my file manifests.
>
> My typical module with files has about 5 files and they are all
> smaller then 5KB.
>
> I'll keep digging and see if I can find any similarities on the client
> machines that might be causing this. It really seems client side at
> this point since looking in dashboard it does seem to be very
> consistent on the nodes that are having issues and the nodes that are
> good even though they are getting identical configs.

Could it be that you are imposing a larger concurrency (i.e. number of
puppet agents trying to access your puppetmaster) than what is
configured in your Passenger setup?
For example, if you have 30 agents trying to access the master at the
same time but can only accommodate 25, the other 5 will have to wait for
every request they send to your master VM.
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

r.yeo

Mar 1, 2013, 4:24:16 PM
to puppet...@googlegroups.com
Passenger settings are currently -

PassengerHighPerformance on
PassengerUseGlobalQueue on

PassengerMaxPoolSize 3
PassengerMaxRequests 1000
PassengerPoolIdleTime 1800

I have run an strace (dtruss, actually, since these are mostly OS X clients).

I am seeing this in huge blocks (100s) on the nodes having issues -

__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60
__semwait_signal(0x50B, 0x0, 0x1)         = -1 Err#60

They are also scattered throughout other sections of the run, but in smaller batches of 4 or 5.

The clients vary from 10.6 to 10.8, but the ones having issues consistently show this error.  The traces are also punctuated with blocks like this every once in a while -

select(0x7, 0x7FFF62AECE10, 0x7FFF62AECD90, 0x7FFF62AECD10, 0x7FFF62AECD00)         = 1 0
connect(0x5, 0x7F84D40FA5A0, 0x10)         = -1 Err#56
fcntl(0x5, 0x4, 0x2)         = 0 0
fcntl_nocancel(0x5, 0x3, 0x0)         = 2 0
fcntl_nocancel(0x5, 0x3, 0x0)         = 2 0
madvise(0x105128000, 0x5C000, 0x9)         = 0 0
fcntl(0x5, 0x3, 0x0)         = 2 0
fcntl(0x5, 0x4, 0x6)         = 0 0
write(0x5, "\200p\001\003\001\0", 0x72)         = 114 0
dtrace: error on enabled probe ID 1791 (ID 116: syscall::read:return): out of scratch space in action #12 at DIF offset 44

But that looks like dtrace having issues and nothing to do with Puppet.
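One detail worth noting about the dtruss output above: Err#60 is the raw errno, and on OS X errno 60 is ETIMEDOUT ("Operation timed out"), which fits the timeout theories in this thread. Errno numbering is platform-specific (Linux uses 110 for ETIMEDOUT), which a Ruby one-liner can confirm on any given box:

```ruby
# Print the platform's numeric value and message for ETIMEDOUT.
# 60 on OS X / BSD, 110 on Linux.
err = Errno::ETIMEDOUT
puts "#{err::Errno}: #{err.new.message}"
```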

The disks on the machines check out clean.  DNS is also good and consistent with our working nodes.  These machines are imaged from a standard image, so a one-off is typically very rare; we also use Puppet to help keep things nice and tidy.  I am not currently seeing network errors on the switches.

This one is quite odd, thanks to everyone who has suggested things.  Very much appreciated.


Rob



On Tuesday, February 26, 2013 11:32:33 AM UTC-7, r.yeo wrote:

Brice Figureau

Mar 3, 2013, 4:53:29 AM
to puppet...@googlegroups.com
On 01/03/13 22:24, r.yeo wrote:
> Passenger settings are currently -
>
> PassengerHighPerformance on
> PassengerUseGlobalQueue on
>
> PassengerMaxPoolSize 3

This parameter means you're allowing only 3 puppet clients to talk to
your master at any time.

I can understand why you've set it this low: a puppet master is
essentially a CPU-bound process, so your master host might not be able
to compile more than 3 catalogs at a time.

Unfortunately, in a scenario where you have 4 clients checking in at
approximately the same time (A, B, C, D), D will wait in the Passenger
queue, and when A's catalog is delivered and A starts applying it, all
file resources pointing back to your master will get delayed until D's
catalog is ready.

To solve this kind of issue, the best way is to dedicate part of your
Passenger pool to the puppet file service (use a vhost, for instance).
File resources are theoretically much quicker to serve (especially
metadata) than compilation requests, so separating the two would allow
for smarter queueing at the master level.
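A sketch of what this could look like in the Apache config — purely illustrative: the port, the backend, and the URL regex are placeholders, not taken from this thread, and the backend would need a second Passenger vhost with the same SSL certificate setup as the main one:

```apache
# In the main puppetmaster vhost: route file_content/file_metadata(s)
# requests to a dedicated backend so they never queue behind catalog
# compilations. Hypothetical backend: a second Passenger vhost on 18140.
SSLProxyEngine On
ProxyPassMatch ^/([^/]+/file_(content|metadata)s?/.*)$ https://127.0.0.1:18140/$1
```

This is the same load-splitting pattern used for multi-master setups, just pointed at a local second application group instead of another host.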

[snip]
--
Brice Figureau
My Blog: http://www.masterzen.fr/

jcbollinger

Mar 4, 2013, 9:09:59 AM
to puppet...@googlegroups.com


On Sunday, March 3, 2013 3:53:29 AM UTC-6, Brice Figureau wrote:
On 01/03/13 22:24, r.yeo wrote:
> Passenger settings are currently -
>
> PassengerHighPerformance on
> PassengerUseGlobalQueue on
>
> PassengerMaxPoolSize 3

This parameter means you're allowing only 3 puppet clients to talk to
your master at any time.

I can understand why you've set it up this low, because essentially a
puppet master is a CPU bound process, so your master host might not be
able to compile more than 3 catalog at one time.


I think you're missing the key characteristics of the problem:
  1. The error does not occur during catalog compilation.  That's fast (enough) in all the failure cases.  Instead, the failure occurs during retrieval of 'sourced' files.
  2. On failing nodes, all the file retrievals exhibit the same delay.
  3. File retrieval delays are always within a few milliseconds of 75s.
If the problem arose from congestion at the master then the delays would be less consistent from file to file and from run to run, and they would sometimes affect catalog compilation.


John

Brice Figureau

Mar 4, 2013, 11:25:54 AM
to puppet...@googlegroups.com
On Mon, 2013-03-04 at 06:09 -0800, jcbollinger wrote:
>
>
> On Sunday, March 3, 2013 3:53:29 AM UTC-6, Brice Figureau wrote:
> On 01/03/13 22:24, r.yeo wrote:
> > Passenger settings are currently -
> >
> > PassengerHighPerformance on
> > PassengerUseGlobalQueue on
> >
> > PassengerMaxPoolSize 3
>
> This parameter means you're allowing only 3 puppet clients to
> talk to
> your master at any time.
>
> I can understand why you've set it up this low, because
> essentially a
> puppet master is a CPU bound process, so your master host
> might not be
> able to compile more than 3 catalog at one time.
>
>
> I think you're missing the key characteristics of the problem:
> 1. The error does not occur during catalog compilation. That's
> fast (enough) in all the failure cases. Instead, the failure
> occurs during retrieval of 'sourced' files.

I know that the problem appears while trying to get a file resource.

What I was saying is that with such low concurrency allowed on the
master side, there is a good chance of large queues forming when several
lengthy operations (like catalog compilations) are in flight, and thus
slow metadata retrieval for the other poor nodes.

Let's take a simple example to show what I really meant:

Concurrency is set at 3, with 4 nodes.

A checks in
B checks in a couple of seconds after
C checks in a couple of seconds after
D has to wait.

Now assuming all hosts take the same time to compile, and the machine is
able to provide enough CPU time for 3 compilations at the same time,
then:

A will get its catalog
D can now run, because a free "concurrency" slot is available
A will now apply its catalog
A's requests for file metadata will wait for at least B to finish its
catalog compilation, because the concurrency is at 3.

Rinse & repeat
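That queueing arithmetic can be sketched as a toy simulation — the numbers are purely illustrative (a pool of 3 slots, 4 agents, each compilation holding a slot for 10 seconds), and it models only the compilation requests, not the follow-up metadata requests:

```ruby
# Toy model: N_SLOTS concurrent request slots on the master, each
# catalog compilation holds a slot for COMPILE seconds. Agents that
# arrive when all slots are busy queue until one frees up.
N_SLOTS = 3
COMPILE = 10.0  # seconds a compilation holds a slot (illustrative)

# Four agents check in within a few seconds of each other.
starts = [0.0, 1.0, 2.0, 3.0]
slot_free_at = Array.new(N_SLOTS, 0.0)

finish_times = starts.map do |t|
  # Take the slot that frees up earliest.
  i = slot_free_at.each_with_index.min_by { |free, _| free }[1]
  begin_at = [t, slot_free_at[i]].max
  slot_free_at[i] = begin_at + COMPILE
  slot_free_at[i]
end

# Time spent queueing = total elapsed minus the compile time itself.
waits = starts.zip(finish_times).map { |s, f| (f - s - COMPILE).round(2) }
puts "queueing delay per agent: #{waits.inspect}"
```

With these numbers the fourth agent waits 7 seconds before its request is even looked at; every extra concurrent client past the pool size pushes that delay further out.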

> 1. On failing nodes, all the file retrievals exhibit the same
> delay.
> 2. File retrieval delays are always within a few milliseconds of
> 75s.
> If the problem arose from congestion at the master then the delays
> would be less consistent from file to file and from run to run, and
> they would sometimes affect catalog compilation.

Well, I've seen stranger things happen :)

Only the OP can tell us if there is queueing or not, but with this
limited concurrency, and depending on the number of clients checking in,
that's certainly something to check.

jcbollinger

Mar 4, 2013, 5:38:11 PM
to puppet...@googlegroups.com

Only the OP can tell us whether there is queueing, but I can tell you that if queueing factors into the failure behavior then it's only as part of a more complex problem.  Sure, it's worth checking as part of the failure analysis that Felix and I each suggested, but if a queueing issue is in fact detected then that shouldn't be taken as the final answer.


John

r.yeo

Mar 5, 2013, 11:34:35 AM
to puppet...@googlegroups.com
I am not seeing any queueing, but perhaps I am not looking in the right place.  At most I am seeing 1 pending task; typically "all systems go" is the only thing that registers in Dashboard.  Also in Dashboard I occasionally see this -

/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/validations.rb:1101:in `save_without_dirty!'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/dirty.rb:87:in `save_without_transactions!'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/transactions.rb:200:in `save!'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:136:in `transaction'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/transactions.rb:182:in `transaction'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/transactions.rb:200:in `save!'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/transactions.rb:208:in `rollback_active_record_state!'
/usr/share/puppet-dashboard/vendor/rails/activerecord/lib/active_record/transactions.rb:200:in `save!'
/usr/share/puppet-dashboard/app/models/report.rb:117:in `create_from_yaml'
/usr/share/puppet-dashboard/app/models/report.rb:90:in `create_from_yaml_file'

But I haven't mentioned it because I assumed it was unrelated and was another thing I needed to look at later (it still might be).

Should I be looking in other places on the master to show long queues?

I'll run Wireshark on the clients next and see if that adds anything.

Thanks,


Rob




Felix Frank

Mar 5, 2013, 11:53:52 AM
to puppet...@googlegroups.com
On 03/05/2013 05:34 PM, r.yeo wrote:
> Should I be looking in other places on the master to show long queues?

Do the most simple thing: Raise the Passenger workers to 30 and see what
happens :-)

Erik Dalén

Mar 5, 2013, 2:00:40 PM
to puppet...@googlegroups.com
On 5 March 2013 17:34, r.yeo <rob....@gmail.com> wrote:
[snip]
Should I be looking in other places on the master to show long queues?

You should probably check using the command "passenger-status" instead of looking at the pending tasks in Dashboard; they show very different things, and pending tasks in Dashboard are not really relevant for this.
 
--
Erik Dalén

r.yeo

Mar 6, 2013, 12:33:07 PM
to puppet...@googlegroups.com
Nothing in passenger-status while a problem node is running.

----------- General information -----------
max      = 3
count    = 2
active   = 0
inactive = 2
Waiting on global queue: 0

----------- Application groups -----------
/usr/share/puppet-dashboard:
  App root: /usr/share/puppet-dashboard
  * PID: 1936    Sessions: 0    Processed: 43      Uptime: 18m 57s
  * PID: 1951    Sessions: 0    Processed: 38      Uptime: 18m 52s

I'll run Wireshark next to see if I can find anything else to add.


Rob


Felix Frank

Mar 8, 2013, 8:54:21 AM
to puppet...@googlegroups.com
Have you tried raising the worker pool yet?

r.yeo

Mar 8, 2013, 12:33:40 PM
to puppet...@googlegroups.com
Yeah, I have tried raising both max workers and PassengerMaxPoolSize to 30.

No change from my problem clients.  So odd.

Thank you for the suggestion.


Rob


jcbollinger

Mar 11, 2013, 9:10:42 AM
to puppet...@googlegroups.com


On Friday, March 8, 2013 11:33:40 AM UTC-6, r.yeo wrote:
Yeah, I have played with both max workers to 30 and PassengerMaxPoolSize to 30. 

No change from my problem clients.  So odd.


Not so odd.  The proposition that the difficulties arose from congestion at the master was a long shot to begin with.

You can put the final nail in that one by looking at the master's file service logs (probably just part of the apache log) to see how long it takes the master to serve the affected files to the affected clients (request-to-response time).  I think you will find that there is no significant difference from the master's point of view.

Supposing you do find that, you're back to looking at the affected clients' environments.  Did you ever investigate the clients' name resolution configuration as I suggested?


John

xin.m...@gmail.com

Jun 10, 2015, 9:40:32 PM
to puppet...@googlegroups.com
I am hitting the same problem now. Did you ever figure it out in the end? Can you give me some suggestions?

On Wednesday, February 27, 2013, 2:32:33 AM UTC+8, r.yeo wrote:

Henrik Lindberg

Jun 11, 2015, 7:59:51 PM
to puppet...@googlegroups.com
> [snip]
> Any thoughts why I'd have trouble and long run times on select nodes
> when pson is called?
>

A number of performance enhancements have gone into Puppet after version
3.1 - whether your case is helped by one of them I am not sure, but we
have worked on scenarios like "many modules" and "many resources".
If you want to find out in more detail, you can search for the closed
tickets for the various enhancements in our Jira - or look at the
release notes - to see if anything resembles your case.

OTOH - Puppet is now at 3.8.1 on the 3.x branch and 4.1 is released on
4.x, so you may want to consider updating to latest 3.x and see if there
is a difference in time.

- henrik

--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

xin.m...@gmail.com

Jun 13, 2015, 5:16:57 AM
to puppet...@googlegroups.com
I'm using Puppet 3.7.1 and Facter 2.2.0.
I have now found that the time is spent gathering facts.

class Puppet::Node::Facts::Facter < Puppet::Indirector::Code

  # Look up a host's facts in Facter.
  def find(request)
    Facter.reset
    self.class.setup_external_search_paths(request) if Puppet.features.external_facts?
    self.class.setup_search_paths(request)

    Puppet.notice "before Puppet::Node::Facts.new( #{Facter.to_hash})---------"
    result = Puppet::Node::Facts.new(request.key, Facter.to_hash)

The call to Facter.to_hash takes about 75 seconds every time if I trigger the run with "mco puppet runonce -v -F computer_ip='172.16.233.110'" from a remote mcollective master, but it works fine if I run "puppet agent -t -v" on the target computer itself.
Do you have any idea why Facter behaves like this?
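To narrow down which fact is the slow one, the resolutions can be timed individually rather than all at once through Facter.to_hash. A sketch of the idea using stand-in lambdas — the names and bodies below are hypothetical; with real Facter you would time Facter.value(name) for each known fact name instead:

```ruby
# Time each fact resolution separately to find the slow one. The lambdas
# are stand-ins for real fact resolutions.
facts = {
  'hostname' => -> { 'node1' },
  'slow_one' => -> { sleep 0.2; 'late' },  # stand-in for a hanging lookup
}

timings = facts.map do |name, resolve|
  start = Time.now
  value = resolve.call
  [name, Time.now - start, value]
end

# Print slowest first.
timings.sort_by { |_, secs, _| -secs }.each do |name, secs, _|
  puts format('%8.3fs  %s', secs, name)
end
```

If a single fact accounts for the whole 75 seconds, that points at whatever external lookup that fact performs rather than at Facter itself.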