Re: [Puppet Users] stdlib takes almost 14 minutes to sync on CentOS Vagrant VM


Ken Barber

Jan 7, 2013, 12:20:34 PM
to Puppet Users
My immediate gut feeling is that this is a network issue rather than
Puppet, especially since another client is syncing successfully.

It's a VM, so it could be hypervisor drivers or some other issue, and
it's an old kernel as well, which makes that more likely - although I
haven't seen it myself lately :-). Run something like 'iperf' on both
ends to confirm you're getting the right kind of network performance,
compare that to your Ubuntu target ... and check 'netstat -in' for
errors etc.

Also check that the right VirtualBox guest additions
(VBoxGuestAdditions) were installed when your CentOS Vagrant box was
built ... I'm not sure where your box came from, but this can be an
important step for network performance.


On Mon, Jan 7, 2013 at 4:50 PM, Kirk Steffensen
<ki...@steffensenfamily.com> wrote:
> Hi,
>
> I have a fresh CentOS 5.8 Vagrant VM that I'm using to emulate a customer's
> server. During the first Puppet run, it takes 13 minutes and 48 seconds to
> sync the Puppet Labs stdlib module. On a similar Ubuntu 12.04.1 Vagrant VM,
> Puppet starts up, and almost instantly goes from plugin sync to facter load.
>
> Is this load time for stdlib on a RHEL variant normal? It's only 580K of
> data to transfer. It seems like it should be almost instantaneous, even on
> an older kernel like CentOS 5.8.
>
> I've posted the relevant part of the Puppet run's output to this gist:
> https://gist.github.com/4475110
>
> If anyone has any insight or any troubleshooting steps to try, I'd
> appreciate it.
>
> Thanks,
> Kirk

Kirk Steffensen

Jan 7, 2013, 1:28:11 PM
to puppet...@googlegroups.com
Ken,

Thanks.  I agree with your gut feeling, but after running the first round of tests you suggested, I don't think that's it.  Bandwidth is good and there are no netstat errors.  (Test results at the end of this email.)

Actually, I just realized that the Puppet client on the Ubuntu box is running Puppet 2.7.11, which does not have pluginsync set to true by default.  (The CentOS boxes have Puppet 3.0.2, which does.)  I watched the Ubuntu box start up, and it didn't give me an "Info: Retrieving plugin" notice like the CentOS box does.  When I changed pluginsync to true on the Ubuntu box, it also took many minutes to sync the stdlib module.  (I didn't time it during this run, but it was about 10 minutes.  I'd be happy to time it properly if that would help troubleshoot.)

While I didn't time the entire stdlib sync on the Ubuntu box, I did time several of the individual files.  It takes about 10 seconds per file to copy each one from the Puppet Master to the client.  Since there are 90 files in stdlib, syncing the plugins takes many minutes, even though there is only 852 KB of data in total.  Each file results in a Puppet notice like the following:

notice: /File[/var/lib/puppet/lib/puppet/parser/functions/str2bool.rb]/ensure: defined content as '{md5}ab045013031d01a0e9335af92580dde6'

For my short-term needs, my solution is to change pluginsync to false on the CentOS boxes, since I'm only using the stdlib module to get the anchor definition, and that doesn't need to be on the client side.  But this will not be a long-term solution, as I will probably end up developing modules that need pluginsync to be true.  
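
(For reference, the workaround is just one line in the agent's puppet.conf - shown here under [agent], though [main] works as well:)

[agent]
    pluginsync = false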

Is anyone else who is using stdlib with pluginsync = true seeing this same 13-14 minute sync time during the first puppet run on a machine?

Here are the results of the tests:

iperf on CentOS: 298 Mbits/sec
iperf on Ubuntu: 2.55 Gbits/sec

$ netstat -in [on CentOS - no errors]
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500 0   4813366      0      0 0       3076279      0      0      0 BMRU
lo        16436 0   1140661      0      0 0       1140661      0      0      0 LRU
vboxnet0   1500 0         0      0      0 0           866      0      0      0 BMRU

The VBoxGuestAdditions are added during basebox generation (using veewee, another great tool!) with the following in postinstall.sh.  This results in the most current version (and, more importantly, the version matching the host):

***********
# Install vagrant keys
mkdir /home/vagrant/.ssh
chmod 700 /home/vagrant/.ssh
cd /home/vagrant/.ssh
wget --no-check-certificate 'https://raw.github.com/mitchellh/vagrant/master/keys/vagrant.pub' -O authorized_keys
chown -R vagrant /home/vagrant/.ssh

# Install the virtualbox guest additions
VBOX_VERSION=$(cat /home/vagrant/.vbox_version)
cd /tmp
mount -o loop VBoxGuestAdditions_$VBOX_VERSION.iso /mnt
sh /mnt/VBoxLinuxAdditions.run
umount /mnt

rm VBoxGuestAdditions_$VBOX_VERSION.iso
***********

Thanks,
Kirk

Ken Barber

Jan 7, 2013, 2:01:00 PM
to Puppet Users
There are two primary bottlenecks for pluginsync:

* the md5 sum it needs to do on both ends to compare and see if a
resync is needed
* the transfer itself

If you run puppet a second time with pluginsync enabled (but this
time nothing should be transferred), how much time does it take
compared to the first run? I'm only curious as it might rule out the
transfer versus the 'md5' operation. *shrug*. Alas, I haven't seen
this specific problem myself - are you running the puppetmaster
under webrick or some sort of application server like 'passenger' or
'unicorn'? It's a long shot, but webrick is single-threaded, which
has been known to be a problem - although it shouldn't matter during
a sequential pluginsync :-(.
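
Something like this would give a rough comparison (just a sketch -
the /var/lib/puppet/lib path assumes the default agent vardir):

# first run with an empty lib dir - md5 checks plus the transfer
rm -rf /var/lib/puppet/lib/*
time puppet agent --test

# second run with everything already synced - md5 comparison only
time puppet agent --test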


Kirk Steffensen

Jan 7, 2013, 4:03:06 PM
to puppet...@googlegroups.com
The second time it runs (with pluginsync enabled), it only pauses at the "Info: Retrieving plugin" notice for a few seconds.  So, it sounds like the md5sum is not the bottleneck.

The specific setup I'm using is a pretty beefy Xeon workstation with 24 GB of RAM acting as the puppetmaster for itself and all of the local Vagrant VMs.  It is running webrick, which I hadn't thought would be a problem, since it's only serving itself and 1-6 VMs.  During "vagrant up", I make sure to do only one machine at a time, so that I can watch them go by.  So it is only really doing any heavy lifting for one VM at a time.

We do use passenger on our production system, so I'll be happy to try passenger on my workstation.  I'll let you know if I see any improvement. 

jcbollinger

Jan 8, 2013, 9:25:06 AM
to puppet...@googlegroups.com


On Monday, January 7, 2013 3:03:06 PM UTC-6, Kirk Steffensen wrote:
The second time it runs (with pluginsync enabled), it only pauses at the "Info: Retrieving plugin" notice for a few seconds.  So, it sounds like the md5sum is not the bottleneck.

Is it possible that you have a name resolution issue?  The time scale you described is about right for a small number of name resolution timeouts before the client falls back to a different name server or name resolution mechanism.
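
A quick way to check on the client (just a sketch; 'puppet' and the agent's own FQDN are the names I would try first):

time getent hosts puppet
time getent hosts $(hostname -f)

If either of those sits for several seconds before returning, name resolution is your culprit.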


John

Kirk Steffensen

Jan 9, 2013, 11:02:35 AM
to puppet...@googlegroups.com
John, I don't believe there is any name resolution issue.  Both the CentOS and Ubuntu base boxes have "192.168.33.1 puppet" in their /etc/hosts.  From inside the Vagrant VM, a ping to puppet responds immediately, and a tracepath to puppet returns as quickly as I can hit enter, showing sub-millisecond results.

I'll be happy to try any other troubleshooting steps.  For now, I've just inserted pluginsync=false in the base boxes' puppet.conf files.  But I'll make another couple of base boxes with pluginsync=true for troubleshooting, if needed.

Thanks,
Kirk

Ken Barber

Jan 9, 2013, 11:39:06 AM
to Puppet Users
Damn, I thought John had it :-(.

Here's a question I hadn't asked - what's your pluginsync performance
like on the puppetmaster node itself? I.e. clear all the synced
files, run it locally, time it, and compare with the other nodes.

Kirk Steffensen

Jan 9, 2013, 12:30:10 PM
to puppet...@googlegroups.com
Ken, thanks.  Unfortunately (from a troubleshooting standpoint), it only took one or two seconds to sync stdlib on the local box.

rm -rf /var/lib/puppet/lib/*
puppet agent --test

I saw the same stream of File notices, but they streamed by in real time, instead of taking 10 seconds per notice.

John may still have something.  There may still be some name resolution issue on the client, but it's just not obvious from the testing that I did.  

I haven't dug into the file provider code.  What mechanism is it using to move the files from the server to the client?  Could it be a delay in an scp handshake or the like?

Thanks,
Kirk

Ken Barber

Jan 9, 2013, 12:59:40 PM
to Puppet Users
> Ken, thanks. Unfortunately, (from a troubleshooting standpoint), it only
> took one or two seconds to sync stdlib on the local box.
>
> rm -rf /var/lib/puppet/lib/*
> puppet agent --test
>
> I saw the same stream of File notices, but they streamed by in real time,
> instead of taking 10 seconds per notice.

Yeah, that's got to be some comms problem between your puppetmaster
and clients then ... just not sure what.

> John may still have something. There may still be some name resolution
> issue on the client, but it's just not obvious from the testing that I did.

Perhaps, perhaps ... I'd strace the puppet process to see what's
happening in the gaps. 10-15 seconds would lead me to suspect DNS as
well. Maybe we haven't exhausted this avenue ... try adding all your
hosts to /etc/hosts on each box? Temporarily disabling DNS (in, say,
nsswitch.conf) will then take that possibility out of the picture
:-).
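
Roughly along these lines (a sketch only - the pgrep pattern and the
output path are just examples):

# attach to the running agent and record per-syscall timings
strace -f -tt -T -o /tmp/puppet-agent.strace -p $(pgrep -f 'puppet agent')

# and/or temporarily take DNS out of the picture by changing the
# hosts line in /etc/nsswitch.conf to:
#   hosts: files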

> I haven't dug into the file provider code. What mechanism is it using to
> move the files from the server to the client? Could it be a delay in an scp
> handshake or the like?

HTTPS actually ... with client and server certificates.

ken.

Kirk Steffensen

Jan 9, 2013, 5:00:46 PM
to puppet...@googlegroups.com
Here is the strace output from one of the 10-second periods while waiting for the File notice to appear.  https://gist.github.com/4497263

The strace output came in two bursts during these 10 seconds.

The thing that leaps out at me is that, of the 4061 lines of output, 3754 are rt_sigprocmask calls.  I'm wondering if it's a Ruby bug.  For these VMs, I'm using Ruby 1.8.7, installed from source: http://ftp.ruby-lang.org/pub/ruby/ruby-1.8.7-p358.tar.gz

I created another gist with the rt_sigprocmask calls filtered out, to make it easier to tell if there is anything else going on: https://gist.github.com/4497335  From my quick review, nothing leaps out.
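
(The filtering was nothing fancy - something like the following, assuming the raw trace was saved as puppet.strace:)

grep -c rt_sigprocmask puppet.strace                            # count the noise
grep -v rt_sigprocmask puppet.strace > puppet-filtered.strace   # keep everything else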

Hope this helps point the troubleshooting in the right direction.

Thanks,
Kirk

Josh Cooper

Jan 9, 2013, 5:13:04 PM
to puppet...@googlegroups.com
Hi Kirk,

Do you happen to have SRV lookups enabled via the `use_srv_records`
setting? You may want to run tcpdump and look for extraneous DNS
lookups.
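
Something along these lines, run on the agent (or the master) while
the pluginsync is happening, should show whether anything is stalling
on DNS (just a sketch - the interface name will differ on your
boxes):

tcpdump -i eth1 -nn port 53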

Josh



--
Josh Cooper
Developer, Puppet Labs

Kirk Steffensen

Jan 9, 2013, 5:45:16 PM
to puppet...@googlegroups.com
Josh,

use_srv_records is not set in puppet.conf.  'puppet config print use_srv_records' shows it set to the default of false.

I ran tcpdump from inside the Vagrant VM during pluginsync.  On eth1, where the VM connects to the puppet master running on the host, the only traffic is Puppet traffic on port 8140, with replies coming back to port 34757.  Running 'tcpdump port 53' shows no DNS traffic at all during the pluginsync.  (FWIW, I'm not running DNS for these Vagrant boxes.  Everything is in /etc/hosts, so it doesn't surprise me that there are zero DNS calls.)

Thanks,
Kirk

Kirk Steffensen

Jan 10, 2013, 10:58:19 AM
to puppet...@googlegroups.com
SOLVED: I didn't have the VMs in the host's /etc/hosts.  Once I put all of them into the host's file (i.e., the Puppet Master's), everything sped up to full speed on the clients.  (I had been thinking it was a problem on the client side, so all of my troubleshooting had been isolated to the clients.  I finally thought to try the server side this morning.)

Question: Should I submit this as a bug?  I'm not sure if it's a Puppet bug or an OS bug.  The Puppet Master was eventually serving up the files, but only after 10 seconds (presumably while it was trying to resolve the name).  Since the Puppet Master was able to reply without being able to resolve the name, why was it bothering to resolve the name instead of just replying directly to the request?
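
(For anyone hitting the same thing: the fix was just adding host entries for the VMs on the Puppet Master side.  The addresses and names below are only illustrative - the host itself is 192.168.33.1:)

# appended to /etc/hosts on the host/Puppet Master
192.168.33.10   centos58-vm1
192.168.33.11   centos58-vm2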

Thanks,
Kirk

Ken Barber

Jan 10, 2013, 12:21:31 PM
to Puppet Users
Huh, good find. I'm guessing you're still running webrick? Based on a
quick google, it looks like it does reverse DNS by default. Most web
servers have this disabled by default, but I think webrick does not
(Apache, at least, disables DNS lookups by default, I'm sure).

I think if Puppet passes DoNotReverseLookup => true when creating the
WEBrick HTTPServer object, this will avoid the reverse DNS lookup.
It's quite a small thing, but it might help a few people avoid this
kind of hassle going forward:

https://gist.github.com/4503915

The only problem, I think, is that this probably only works with the
WEBrick available in Ruby 1.9.x ... :-(. Ruby 1.8.7 might need some
other hackery to disable it ...

I can't speak for the core Puppet team (Josh?), but I think it might
be bug-worthy. Most people use Apache or Nginx in production, so they
don't see the problem, but for small dev-style setups like yours it's
still an annoying distraction.

ken.


Josh Cooper

Jan 10, 2013, 12:37:59 PM
to puppet...@googlegroups.com
Hi Kirk

On Thu, Jan 10, 2013 at 7:58 AM, Kirk Steffensen
<ki...@steffensenfamily.com> wrote:
> SOLVED: I didn't have the VMs in the host's /etc/hosts. Once I put all of
> them into the host (i.e, the Puppet Master), then everything sped up to full
> speed on the clients. (I had been thinking it was a problem on the client
> side, so all of my troubleshooting had been isolated to the clients. I
> finally thought to try the server side this morning.)
>
> Question: Should I submit this as a bug? I'm not sure if this is a Puppet
> bug or an OS bug? The Puppet Master was eventually serving up the files,
> but only after 10 seconds (while it was trying to resolve the name?) Since
> the Puppet Master was able to reply without being able to resolve the name,
> why was it bothering to resolve the name instead of just replying directly
> to the request?
>

The puppetmaster, specifically
Puppet::Network::HTTP::Handler#resolve_node, will perform a reverse
lookup in some situations. When using webrick, the puppetmaster will
perform a reverse lookup if the request does not contain a
client_cert. When using puppetmaster inside a rack application, it
will perform a reverse lookup if the request does not contain a header
parameter that matches Puppet's `ssl_client_header` setting.

If all of your agents have already been issued certificates, then I
would not expect any "unauthenticated" requests, so I wouldn't expect
any further lookups...
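
A quick sanity check that the agents really do have signed certs
(just a sketch - commands and paths assume a standard Puppet 3
install with the default ssldir):

# on the master: list signed certificates
puppet cert list --all

# on an agent: the signed cert should be sitting here
ls /var/lib/puppet/ssl/certs/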

Also, when looking into the SRV support in Puppet, I found that
Ruby's internal DNS resolver does not cache SRV lookups, at least in
1.8.7. It may not cache reverse lookups either, which could make this
issue worse for you.

Josh

Kirk Steffensen

Jan 11, 2013, 2:13:16 PM
to puppet...@googlegroups.com
Josh, thanks for the info.  Based on your description, I think I was seeing a bug, because the agents were all definitely getting certificates.  When I did the tcpdump, I could see them being used in the exchange.  So it sounds like the puppetmaster running under webrick was still performing a reverse lookup even though the agent had a client cert.

Is there anything I could try that would help nail down whether it's doing the right thing or not?

Kirk

Bruno Harbulot

Mar 17, 2013, 2:01:07 PM
to puppet...@googlegroups.com
Hi Kirk,

Sorry, I've only just noticed this thread. I've also experienced the same problem. I've commented on this issue (although it might not be the correct bug after all, since it's not quite the same problem): https://projects.puppetlabs.com/issues/18812#note-1

It would be great to disable reverse lookup by default if possible (although I must admit I'm using Ruby 1.8, so this might be difficult). Trying to debug this issue is quite involved (Wireshark, strace...), and is indeed an unnecessary distraction, especially in an initial test environment, where I guess it's typical to find client nodes that don't have a reverse DNS entry.

My guess is that Webrick performs a reverse lookup either for no reason at all or for potential logging (this reverse lookup is turned off by default in Apache Httpd as far as I know).

Best wishes,

Bruno.