Help with scaling puppetdb/postgres

David Mesler
Oct 24, 2013, 11:55:31 AM
to puppet...@googlegroups.com
Hello, I'm currently trying to deploy PuppetDB in my environment, but I'm having difficulties and am unsure how to proceed.
I have 1300+ nodes checking in at 15-minute intervals (3.7 million resources in the population). The load is spread across 6 puppet masters. I requisitioned what I thought would be a powerful enough machine for the PuppetDB/Postgres server: 128GB of RAM, 16 physical CPU cores, and a 500GB SSD for the database. I can point one or two of my puppet masters at PuppetDB with reasonable performance, but any more than that and commands start stacking up in the PuppetDB command queue and agents start timing out. (Actually, even with just one puppet master using PuppetDB I still get occasional agent timeouts.) Is one Postgres server not going to cut it? Do I need to look into clustering? I'm sure some of you run PuppetDB in larger environments than this; any tips?

Darin Perusich
Oct 24, 2013, 12:54:54 PM
to puppet...@googlegroups.com
Have you tuned PG? You can run pgtune,
http://pgfoundry.org/projects/pgtune, and it will suggest settings for
postgresql.conf based on the resources available on the Postgres
server.
--
Later,
Darin

Ken Barber
Oct 24, 2013, 1:02:55 PM
to Puppet Users
pgtune is probably a good place to start:
https://github.com/gregs1104/pgtune ... available as an rpm/deb on the
more popular distros I believe.
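
For what it's worth, a typical pgtune invocation looks something like the
following (the exact flags vary a bit between pgtune versions, and the paths
are just examples, so treat this as a sketch):

pgtune -i /etc/postgresql/9.2/main/postgresql.conf \
       -o /etc/postgresql/9.2/main/postgresql.conf.pgtune \
       -T OLTP -c 200

It appends suggested values for shared_buffers, effective_cache_size,
work_mem, checkpoint_segments and friends, sized from the RAM on the box;
you then review them and merge them into your real postgresql.conf.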

Also, this is probably very premature, but I have a draft doc with
notes on how to tune your DB for PuppetDB:

https://docs.google.com/document/d/1hpFbh2q0WmxAvwfWRlurdaEF70fLc6oZtdktsCq2UFU/edit?usp=sharing

Use at your own risk, as it hasn't been completely vetted. Happy to
get any feedback on this, as I plan on making this part of our
endorsed documentation.

Also ... there is an index, 'idx_catalog_resources_tags_gin', that has
lately been causing people problems. You might want to try dropping it
to see if it improves performance (thanks to Erik Dalen and his
colleagues for that one):

DROP INDEX idx_catalog_resources_tags_gin;

It is easily restored if it doesn't help ... but may take some time to build:

CREATE INDEX idx_catalog_resources_tags_gin
ON catalog_resources
USING gin
(tags COLLATE pg_catalog."default");

ken.

Ken Barber
Oct 24, 2013, 1:08:06 PM
to Puppet Users
Here is the URL for the GIN index problem:
http://projects.puppetlabs.com/issues/22947
If removing the index does help, please let us know either in this
thread or, preferably, in the ticket as well.

ken.

David Mesler
Oct 28, 2013, 10:26:20 PM
to puppet...@googlegroups.com
I reconfigured Postgres based on the recommendations from pgtune and your document. I still had a lot of agent timeouts, and after running overnight the command queue on the PuppetDB server was over 4000. Maybe I need a box with traditional RAID and a lot of spindles instead of the SSD. Or maybe I need a cluster of Postgres servers (if that's even possible), I don't know. The PuppetDB docs said a laptop with a consumer-grade SSD was enough for 5000 virtual nodes, so I was optimistic this would be a simple setup. Oh well.

ak0ska
Oct 29, 2013, 9:04:44 AM
to puppet...@googlegroups.com
Just out of curiosity, what is your catalog duplication rate?

Ken Barber
Oct 29, 2013, 9:06:37 AM
to Puppet Users
Hmm.

> I reconfigured postgres based on the recommendations from pgtune and your
> document. I still had a lot of agent timeouts and eventually after running
> overnight the command queue on the puppetdb server was over 4000. Maybe I
> need a box with traditional RAID and a lot of spindles instead of the SSD.
> Or maybe I need a cluster of postgres servers (if that's possible), I don't
> know. The puppetdb docs said a laptop with a consumer grade SSD was enough
> for 5000 virtual nodes so I was optimistic this would be a simple setup. Oh
> well.

So the reality is, you are effectively running 5200 nodes compared
with the vague statement in the docs: 1300 nodes checking in every 15
minutes is 1300 x 4 = 5200 catalog submissions per hour, whereas that
statement assumes nodes running once per hour.

Can we get a look at your dashboard? In particular your catalog and
resource duplication rate?

ken.

David Mesler
Oct 29, 2013, 12:50:29 PM
to puppet...@googlegroups.com
Resource duplication is 98.7%, catalog duplication is 1.5%.

Ryan Senior
Oct 29, 2013, 2:32:54 PM
to puppet...@googlegroups.com
1.5% catalog duplication is really low and, from a PuppetDB perspective, means a lot more database I/O.  I think that probably explains the problems you are seeing.  A more typical duplication percentage would be something over 90%.

The next step here is figuring out why the duplication percentage is so low.  There's a ticket I'm working on now [1] to help with debugging these kinds of issues with catalogs, but it's not done yet.  One option you have now is to query for the current catalog of a node after a few subsequent catalog updates.  You can do this using curl and the catalogs API [2]; that call gives you a JSON representation of the catalog PuppetDB holds for that node.  You can then compare the JSON files and see whether a resource is changing with each run.  If you need help getting that information or want more help troubleshooting the output, head over to #puppet on IRC [3] and one of the PuppetDB folks can help you out.
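
Roughly something like this (the exact URL depends on your PuppetDB version;
around 1.5 the catalogs endpoint still lives under the experimental API, so
check the API docs for your release, and node1.example.com is just a
placeholder):

curl -s 'http://localhost:8080/experimental/catalogs/node1.example.com' | python -m json.tool > node1-run1.json
# ...wait for one or two more agent runs, then fetch it again...
curl -s 'http://localhost:8080/experimental/catalogs/node1.example.com' | python -m json.tool > node1-run2.json
diff node1-run1.json node1-run2.json

Piping through python -m json.tool pretty-prints the catalog one attribute
per line (and the Python 2.7 version sorts keys), which makes the diff much
easier to read.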




ak0ska
Oct 30, 2013, 4:11:36 AM
to puppet...@googlegroups.com
Also, looking at the reports (Foreman, PuppetDB) might give a clue about what is changing.

David Mesler
Nov 7, 2013, 7:53:25 PM
to puppet...@googlegroups.com
Well, I found the cause of my 1% duplication rate. I was using the recommendation from this page (http://projects.puppetlabs.com/projects/mcollective-plugins/wiki/FactsFacterYAML) to generate a facts.yaml file for mcollective. I got rid of that and my catalog duplication went up to 73%. I'm not sure what else is changing; my catalogs are huge and I don't know how to diff unsorted JSON files.

I also moved to a server with a 10-disk RAID10 and performance is better. I'm still having trouble tuning autovacuum, though: either vacuums never finish because they're constantly delayed, or they eat up all the I/O and things grind to a halt. And even when I/O seems low, there are still times when the PuppetDB queue swells to over 1000 before draining.
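
As far as I can tell the trade-off lives in these settings (values shown are
the stock defaults, not what I'm actually running):

# postgresql.conf: cost-based throttling behind autovacuum
autovacuum_max_workers = 3             # tables vacuumed in parallel
autovacuum_naptime = 1min              # how often the launcher wakes up
autovacuum_vacuum_cost_delay = 20ms    # longer delay = gentler on I/O, but vacuums may never catch up
autovacuum_vacuum_cost_limit = -1      # -1 falls back to vacuum_cost_limit (200); raising it lets vacuums finish faster at the cost of more I/O
autovacuum_vacuum_scale_factor = 0.2   # fraction of a table that must change before it gets vacuumed

Lowering the delay or raising the cost limit gets vacuums done, but that's
exactly when the I/O spikes hit.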

jcbollinger
Nov 8, 2013, 9:35:51 AM
to puppet...@googlegroups.com


On Thursday, November 7, 2013 6:53:25 PM UTC-6, David Mesler wrote:
Well I found the cause of my 1% duplication rate. I was using the recommendation from this page (http://projects.puppetlabs.com/projects/mcollective-plugins/wiki/FactsFacterYAML) to generate a facts.yaml file for mcollective.


Most likely that is because of the 'content' parameter of the resource File['/etc/mcollective/facts.yaml'].  The values of resource parameters are part of the catalog, so to the extent that nodes have different facts ($::hostname, for instance), their catalogs will differ.  That seems to present a fundamental problem for scaling to large numbers of nodes when you're also using PuppetDB.
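
For reference, the pattern that wiki page recommends is roughly the following
(quoted from memory, so treat it as illustrative rather than exact):

file { '/etc/mcollective/facts.yaml':
  owner    => 'root',
  group    => 'root',
  mode     => '0400',
  loglevel => debug,   # cut down on report noise
  # inline_template is evaluated on the master at compile time, so every
  # node's fact values are embedded in its catalog; any fact that changes
  # between runs makes each compiled catalog unique.
  content  => inline_template('<%= scope.to_hash.reject { |k,v| k.to_s =~ /(uptime_seconds|timestamp|free)/ }.to_yaml %>'),
}

Even with that reject filter, slower-moving facts such as uptime_hours still
tick over during the day, so catalogs keep drifting.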

 
I got rid of that and my catalog duplication went up to 73%. I'm not sure what else is changing, my catalogs are huge and I don't know how to diff unsorted json files.



A quick and dirty way to compare is simply to pass the catalogs through 'sort' before 'diff'ing them.  Doing so will trash the JSON structure, but you should still get some useful information out of it.  At minimum you will find out whether there are few or many differences between your catalogs, and you should get at least a general idea of what differs.  This would be most effective if applied to catalogs that are distinct with the facts.yaml generation in place, but duplicates without it.
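
Something along these lines, assuming the catalogs have already been
pretty-printed with one attribute per line (the file names are placeholders):

sort node1-run1.json > run1.sorted
sort node1-run2.json > run2.sorted
diff run1.sorted run2.sorted | less

Lines that merely moved around collapse together, so what survives the diff
is mostly genuine differences between the two catalogs.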



John
