Considerations for puppet/cluster to manage 6000 hosts.

trey85stang

Jan 8, 2011, 4:02:56 PM
to Puppet Users
I'm trying to get an idea of what kind of backend setup I would need
to run puppet to manage roughly 6000 hosts.

I see Puppet by itself is limited to 10-20 clients, but with mongrel/apache
that number shoots up, though I'm not sure by how much.

I'm still new to Puppet and running it in a lab, but I want to take it to
our production environment. There are some details I need to work out,
so I thought I would pose this question first, since there are surely
people who have already been through all of this.

1. With a 1000 Mb connection, how many clients can I serve with a
mongrel/apache setup? I'm guessing around 200-300, or can it take more?

2. Should I let a high availability apache frontend manage a puppet
backend?

( i.e. load balance port 8140 from apache to multiple puppet backend
servers like so:
<Proxy balancer://puppetmaster>
BalancerMember http://10.0.0.10:18140
BalancerMember http://10.0.0.10:18141
BalancerMember http://10.0.0.10:18142
BalancerMember http://10.0.0.10:18143
BalancerMember http://10.0.0.11:18140
BalancerMember http://10.0.0.11:18141
BalancerMember http://10.0.0.11:18142
BalancerMember http://10.0.0.11:18143
BalancerMember http://10.0.0.12:18140
BalancerMember http://10.0.0.12:18141
BalancerMember http://10.0.0.12:18142
BalancerMember http://10.0.0.12:18143
</Proxy>
)

3. What is the best way to manage client signing and keep the pem
files in sync across such a backend?

4. Am I thinking about this type of setup all wrong?

Any advice appreciated

Eduardo S. Scarpellini

Jan 8, 2011, 5:06:14 PM
to puppet...@googlegroups.com
My suggestion for big scenarios is: mod_passenger/apache 2.2 (+ Ruby Enterprise Edition), subversion (or another SCM you like), Puppet 2.6.4 + async_storeconfigs, some STOMP server (like ActiveMQ), and a couple of MySQL servers.
You don't need to sync the SSL keys (pem files, etc.) between backend servers, as long as you copy your CA to all of them.
Mongrel + proxy_http is not a good idea for high-load scenarios; you should consider a hardware load balancer and separating the puppet instances into manifest servers and file servers.
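
As a rough illustration (not a tested config), one Passenger-backed master vhost could look something like this; the paths, port and pool size are assumptions to adapt to your distribution, and SSL is assumed to be terminated in front of it:

# Sketch of a single Passenger-backed puppet master vhost (apache 2.2 syntax).
# /etc/puppet/rack is an assumed location for puppet's config.ru.
PassengerHighPerformance on
PassengerMaxPoolSize 8

Listen 18140
<VirtualHost *:18140>
    # SSL handled by the frontend/balancer in this sketch
    DocumentRoot /etc/puppet/rack/public
    RackBaseURI /
    <Directory /etc/puppet/rack>
        Options None
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>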

--
 Eduardo S. Scarpellini
<scarp...@gmail.com>

trey85stang

Jan 8, 2011, 5:11:31 PM
to Puppet Users
Thanks for the reply, is there any documentation available on this
type of setup? Where would the sql servers come into play?


Patrick

Jan 8, 2011, 5:18:56 PM
to puppet...@googlegroups.com
You'll need one or more mysql servers if you use storedconfigs. Storedconfigs can be useful, but will drastically increase the server CPU usage and will require a mysql backend. You can always turn it on later.
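
For reference, the storeconfigs bits in puppet.conf on the master look roughly like this (a sketch; the db host and credentials are placeholders):

# puppet.conf on the master -- placeholder values
[master]
    storeconfigs = true
    dbadapter    = mysql
    dbserver     = mysql.example.com
    dbname       = puppet
    dbuser       = puppet
    dbpassword   = changeme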

There are two (working) ways to set up SSL, which is used for authentication and security. When I last checked, the "chained certificates" method still doesn't work due to bugs.

1) Copy the same certificate authority to each server. This is easy to do, but will break certificate revocation lists (CRL).
2) Dedicate one computer to be the certificate authority. Requires more client config, but allows CRLs to work.
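
For option 2, the extra client configuration is mostly just pointing the agents at the CA host; a minimal sketch, assuming a hostname of ca.example.com:

# puppet.conf on each agent -- ca.example.com is a placeholder
[agent]
    ca_server = ca.example.com
    ca_port   = 8140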


trey85stang

Jan 8, 2011, 7:05:12 PM
to Puppet Users
Thanks for all the replies; looks like I have a lot more reading to do.

Appreciate all the info!

Thanks

Daniel Pittman

Jan 8, 2011, 10:50:14 PM
to puppet...@googlegroups.com
On Sat, Jan 8, 2011 at 14:18, Patrick <kc7...@gmail.com> wrote:

> You'll need one or more mysql servers if you use storedconfigs.  Storedconfigs can be useful, but will drastically increase the server CPU usage and will require a mysql backend.  You can always turn it on later.

One or more *SQL* servers: we ran happily on PostgreSQL 8.4, which we
found scaled much better than MySQL did, and was our standard server
platform anyway. Otherwise I absolutely agree with this. :)

Regards,
Daniel
--
✉ Daniel Pittman <dan...@rimspace.net>
dan...@rimspace.net (XMPP)
+1 503 893 2285
♻ made with 100 percent post-consumer electrons

Peter Meier

Jan 9, 2011, 10:35:45 AM
to puppet...@googlegroups.com

>> You'll need one or more mysql servers if you use storedconfigs. Storedconfigs can be useful, but will drastically increase the server CPU usage and will require a mysql backend. You can always turn it on later.
>
> One or more *SQL* servers: we ran happily on PostgreSQL 8.4, which we
> found scaled much better than MySQL did, and was our standard server
> platform anyway. Otherwise I absolutely agree with this. :)

If you are only interested in exported resources then you might want to
enable thin_storeconfigs, which also reduces the load drastically.
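
On the masters that's a one-line switch in puppet.conf, e.g.:

[master]
    storeconfigs      = true
    thin_storeconfigs = true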

~pete

Dan Bode

Jan 9, 2011, 12:47:47 PM
to puppet...@googlegroups.com
On Sat, Jan 8, 2011 at 1:02 PM, trey85stang <trey8...@gmail.com> wrote:
> 1. With a 1000 Mb connection, how many clients can I serve with a
> mongrel/apache setup? I'm guessing around 200-300, or can it take more?
 
The answer really depends on a lot of factors:
     - number of source files being managed (the files are only transferred if they differ, but it makes a separate REST call per host per file to determine the current md5sum)
     - size of catalog (related to the number of resources and relationships)
     - size of reports (if reporting is enabled)

Nodes per master usually comes down to CPU consumption on the master; the major factors that contribute to CPU consumption are:
   - compilation of catalogs (every catalog is re-compiled per node per request; there are non-trivial ways to get around this)
   - checking md5 hashes for source files; these checks happen per host, per file, per run, and can be expensive (I would recommend using either package management or the vcsrepo type for large collections of files; see the sketch below)
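
For example, a large file tree can be pulled from version control instead of being served file-by-file; a hedged sketch using the vcsrepo type (from the puppetlabs-vcsrepo module; the path and repo URL are placeholders):

# sketch only -- requires the vcsrepo module; path and source are placeholders
vcsrepo { '/opt/bigapp':
  ensure   => present,
  provider => git,
  source   => 'git://git.example.com/bigapp.git',
}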

 
> 2. Should I let a high availability apache frontend manage a puppet
> backend? (i.e. load balance port 8140 from apache to multiple puppet
> backend servers, as in the <Proxy balancer://puppetmaster> block above)


I use something like passenger behind apache on each host, then load balance the apaches

 
> 3. What is the best way to manage client signing and keep the pem
> files in sync across such a backend?

There are two ways: either delegate one of the masters to be the CA, or terminate SSL on the load balancer.
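
If you terminate SSL on the load balancer, the masters have to be told to trust the client-certificate information passed along in headers; a rough sketch of the two pieces (the header names follow a common convention, treat them as assumptions to verify against your frontend):

# On the SSL-terminating frontend (apache shown; an F5 would set equivalent headers)
SSLOptions +StdEnvVars
RequestHeader set X-Client-DN "%{SSL_CLIENT_S_DN}e"
RequestHeader set X-Client-Verify "%{SSL_CLIENT_VERIFY}e"

# puppet.conf on the backend masters
[master]
    ssl_client_header        = HTTP_X_CLIENT_DN
    ssl_client_verify_header = HTTP_X_CLIENT_VERIFY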

 

Nigel Kersten

Jan 9, 2011, 1:20:53 PM
to puppet...@googlegroups.com
On Sat, Jan 8, 2011 at 7:50 PM, Daniel Pittman <dan...@rimspace.net> wrote:
> On Sat, Jan 8, 2011 at 14:18, Patrick <kc7...@gmail.com> wrote:
>
>> You'll need one or more mysql servers if you use storedconfigs.  Storedconfigs can be useful, but will drastically increase the server CPU usage and will require a mysql backend.  You can always turn it on later.
>
> One or more *SQL* servers: we ran happily on PostgreSQL 8.4, which we
> found scaled much better than MySQL did, and was our standard server
> platform anyway.  Otherwise I absolutely agree with this. :)

I would go so far as to see how much you can get done without storeconfigs.

You may not actually need it.



donavan

Jan 11, 2011, 1:19:32 AM
to Puppet Users
On Jan 8, 1:02 pm, trey85stang <trey85st...@gmail.com> wrote:
> I'm trying to get an idea of what kind of backend setup I would need
> to run puppet to manage roughly 6000 hosts.

No one else has asked, but what's the geographic/network distribution
look like?

> I see Puppet by itself is limited to 10-20 clients, but with mongrel/apache
> that number shoots up, though I'm not sure by how much.

At Puppet Camp US I think responses were in the 300-1000 clients/master
range. A dual-socket x86_64 whitebox should do at least a few hundred
clients. Masters seem to be CPU-bound almost all the time. Client run
interval, catalog size, and storeconfigs are the biggest factors that
come to mind.

> 2. Should I let a high availability apache frontend manage a puppet
> backend?

Using a front end load balancer, Apache or hardware (F5 etc), works
fine. Depending on your DNS control there's also a pending feature to
support SRV records for clients to find masters.

> 3. What is the best way to manage client signing and keep the pem
> files in sync across such a backend?

I'd suggest a single/central CA. The certificate signing/creation ties
easily into the host provisioning step (kickstart definition, EC2 setup,
etc.). Signing on each master works fine, but hinders later management;
the CRL/inventory becomes worthless, for example.

Storeconfigs is a special issue. At Puppet Camp SF, a show of hands had
only two large (1000+) sites using storeconfigs. Three or four more
wanted to, but couldn't take the performance hit, as I recall. An SQL
server is required, with postgres, mysql & oracle supported IIRC. The
performance requirements for that machine shouldn't be too bad; the
dataset should be in the MB range, easy to keep in memory. Setting
thin_storeconfigs on the masters makes a very large difference in
compilation time. A (very) rough estimate for compilation times of 300
resources: 6s with "full" storeconfigs, 3-4s with 'thin' storeconfigs,
and 2s without.

Carles Amigó

Jan 11, 2011, 4:05:32 AM
to puppet...@googlegroups.com
> Setting thin_storeconfigs on the masters makes a very large difference
> in compilation time. A (very) rough estimate for compilation times of
> 300 resources: 6s with "full" storeconfigs, 3-4s with 'thin'
> storeconfigs, and 2s without.


What data is exactly discarded with "thin" storeconfigs? 

--
Carles Amigó
fr...@fr3nd.net
http://www.fr3nd.net
Hey dol! merry dol! ring a dong dillo!

Adrian Bridgett

Jan 11, 2011, 4:47:48 PM
to Puppet Users
It may also be worth looking at some form of improved scheduling in
order to avoid a thundering herd of requests to your puppetmasters.
One option that looks interesting (about to try it myself) is to use
mcollective:
http://www.devco.net/archives/2010/03/17/scheduling_puppet_with_mcollective.php

One other thing I've not seen mentioned in this thread is to use a
dedicated fileserver:
http://www.masterzen.fr/2010/01/28/puppet-memory-usage-not-a-fatality/
http://projects.puppetlabs.com/projects/puppet/wiki/Puppet_Scalability

It'd be interesting to see how you get on.

donavan

Jan 11, 2011, 5:15:44 PM
to Puppet Users
On Jan 11, 1:05 am, Carles Amigó <fr...@fr3nd.net> wrote:
> What data is exactly discarded with "thin" storeconfigs?

Effectively only facts and exported resources are stored for each
node[1]. This is opposed to storing the complete set of resources (and
other stuff?) for each node. For normal puppet usage there's no loss
to using thin_storeconfigs. The full set is useful if you want to
query it as part of an external process. A monitoring or inventory
service, for example.

Bryce F did most of the work that actually made storeconfigs useful.
There are some very good posts on his blog[2].

[1] http://docs.puppetlabs.com/references/2.6.3/configuration.html#thinstoreconfigs
[2] http://www.masterzen.fr/tag/storeconfigs/

On Jan 11, 1:47 pm, Adrian Bridgett <adrian.bridg...@gmail.com> wrote:
> It may also be worth looking at some form of improved scheduling in
> order to avoid a thundering herd of requests to your puppetmasters.

Using a cron resource with an fqdn_rand() interval works pretty well and
is dead simple:

cron { 'puppet_agent':
  command => 'puppet agent --onetime',
  minute  => [fqdn_rand(30), (fqdn_rand(30) + 30)],
}

DaveQB

Jan 11, 2011, 5:45:37 PM
to Puppet Users
We had trouble scaling with 400+ nodes. Our puppet server is a VM on an
ESX cluster with 3.5GB of RAM and 1.5GB of swap, but it would regularly
trigger the OOM killer, which would kill off most if not all of the 10
puppetmaster instances.
We felt scheduling a restart of the puppetmasters a few times a day was
not a sustainable solution.

So we are in the midst of removing the server from the equation
altogether. Seeing as all nodes have a common NFS mount(s), we are
testing simply calling puppet with the sites.pp file as the only
command-line argument.
So far in testing, it has been working great.
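
For the curious, the pattern boils down to a crontab entry like the sketch below (the NFS path and the 30-minute interval are placeholders, not our exact setup):

# crontab on each node -- path and interval are placeholders
*/30 * * * * puppet apply /net/puppet/manifests/sites.pp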

Just thought I'd mention this as a potential option.




Nigel Kersten

Jan 11, 2011, 6:35:48 PM
to puppet...@googlegroups.com
On Tue, Jan 11, 2011 at 2:45 PM, DaveQB <david...@drdstudios.com> wrote:
> We had trouble scaling with 400+ nodes. Our puppet server is a VM on an
> ESX cluster with 3.5GB of RAM and 1.5GB of swap, but it would regularly
> trigger the OOM killer, which would kill off most if not all of the 10
> puppetmaster instances.
> We felt scheduling a restart of the puppetmasters a few times a day was
> not a sustainable solution.

Ruby version?
Puppet version?
Puppet server architecture? (mongrel, webrick, passenger, etc)

donavan

Jan 11, 2011, 9:44:00 PM
to Puppet Users
On Jan 11, 2:45 pm, DaveQB <david.w...@drdstudios.com> wrote:
> We had trouble scaling with 400+ nodes. Our puppet server is a VM on an
> ESX cluster with 3.5GB of RAM and 1.5GB of swap, but it would regularly
> trigger the OOM killer, which would kill off most if not all of the 10
> puppetmaster instances.

This is very surprising to me. Is this 0.24 or 0.25, perchance serving
large files via the File resource? There were some big memory
improvements in File handling around 2.6.0.

Using 2.6.x, Ruby 1.8.7, Apache 2.2 and Passenger, I'd expect around
100-200MB usage per process. Even that seems a bit high to me, though I
don't know what's shared and what's resident offhand.

Matt

Jan 17, 2011, 12:49:29 PM
to Puppet Users
Not sure what his issue was, but in my organization we had one puppet
master with mod_passenger and Puppet 2.6.3 running fine with 200
clients in a VM. We expanded to a 2-node cluster, with the original
puppet master serving as the master for the secondaries. The
secondaries have an F5 in front of them doing round-robin with no
session persistence, plus a health monitor to know if one of the
masters has gone down.

Eduardo S. Scarpellini

Jan 17, 2011, 1:31:11 PM
to puppet...@googlegroups.com
Matt,
what type of service/health check do you use on the F5?
What URL (expected string/HTTP code) does the load balancer check to determine if puppet is alive?


Christopher Pisano

Nov 14, 2013, 8:10:57 PM
to puppet...@googlegroups.com
Sorry to revive an old thread, but I am currently trying to load balance two puppet masters behind an F5 and am running into issues. Can you share your configuration? I have a CA/Foreman server outside of the F5 and 2 puppetmasters behind the F5. The VIP on the F5 has a generic DNS name, with the certificate generated from the CA. The certificate, private key, and CA certificate are all loaded onto the F5 and configured in a client SSL profile which is applied to the VIP. Am I missing anything on the F5 configuration side? Do I need to dig into the Apache config on the puppetmasters?

Jo Rhett

Nov 15, 2013, 4:32:07 AM
to puppet...@googlegroups.com
The puppet master needs the SSL certs to sign new client certs, etc. So the SSL traffic cannot terminate at the F5.  You can't offload the SSL from the puppet master.

Remove the SSL cert from the F5, and have it load balance across the nodes without altering the connection and it will work fine.


-- 
Jo Rhett
Net Consonance : net philanthropy to improve open source and internet projects.



