PuppetDB Issue with large array-valued fact

Mike Sharpton

Apr 19, 2016, 10:38:19 AM
to Puppet Users
Hello all,

We are running Puppet 4.2.2 and PuppetDB 3.2.0 with around 2400 nodes and growing.  I am noticing some bad behavior from our PuppetDB, which is only going to get worse if my read of the situation below is correct.  I see this error in the puppetdb log:

 ERROR [p.p.mq-listener] [cfe52545-29f8-4538-bf32-6ed41922be90] [replace facts] Retrying after attempt 8, due to: org.postgresql.util.PSQLException: This connection has been closed.
org.postgresql.util.PSQLException: This connection has been closed.


I see this in the postgres log:

incomplete message from client

and I can see the command queue depth jumping up after the connection is reset, for obvious reasons.

I searched around and found this article.


It looks like this may be the cause of the issue I am seeing, as we have several large array-valued structured facts.  Here is a breakdown by fact name:


 count |                name
-------+------------------------------------
 26848 | partitions
 26014 | disks
  6141 | mountpoints
   494 | networking
   133 | processors
    22 | os
    15 | memory
    14 | dmi
    13 | ssh
     9 | dhcp_servers
     5 | system_uptime
     5 | identity
     4 | ruby
     4 | load_averages
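
For reference, a breakdown like this can be pulled with something along
these lines in a psql session against the puppetdb database (the exact
table layout varies between PuppetDB versions, so treat it as a sketch):

-- count stored fact paths per top-level fact name
select count(*), name
from fact_paths
group by name
order by count(*) desc;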



I can't get rid of these facts out of the box.  There doesn't seem to be any way to filter facts, which is what it appears I need to do.  These facts also clutter PuppetBoard heavily.  Has anyone run into this issue before and found a way around it, or a way to fix it?  Thanks in advance,

Mike




Wyatt Alt

Apr 19, 2016, 11:53:28 AM
to puppet...@googlegroups.com, Mike Sharpton
Hey Mike,

The unsatisfying answer is that PuppetDB handles giant facts
(particularly array-valued facts) pretty badly right now, and facter's
disks, partitions, and mountpoints facts can all get huge in
environments with SANs and the like. Can you check whether the bulk of
those fact paths are coming from a small set of your nodes? I expect
this query might help:

https://gist.github.com/wkalt/4a58b9a97c79eee31971e5fc04dec0e4

You can mask the facts on a per-node basis by creating a custom fact
with value nil and weight 100, as described here:

https://docs.puppet.com/facter/3.1/custom_facts.html#fact-precedence

(This assumes you aren't using these facts for anything, but it sounds
like that's the case.)
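
In practice each mask is a small Ruby custom fact shipped in some
module's lib/facter directory so pluginsync gets it onto the agents. A
minimal, untested sketch for disks (the module name/path is just a
placeholder):

# <your_module>/lib/facter/disks.rb
# Override the built-in 'disks' fact with an empty value; the weight of
# 100 outranks the default resolution, so nothing gets reported.
Facter.add(:disks) do
  has_weight 100
  setcode { nil }
end

You'd add one file each for partitions and mountpoints along the same
lines.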

Longer term, this is something we need to fix on our end. I created
https://tickets.puppetlabs.com/browse/PDB-2631 to track the issue.
https://tickets.puppetlabs.com/browse/FACT-1345 may also be related.

If you get those nodes tracked down, would you mind telling us the
operating system?

Wyatt


Mike Sharpton

Apr 19, 2016, 1:39:16 PM
to Puppet Users, shar...@gmail.com
Wyatt,

Thank you very much for your time and reply.  I greatly appreciate it.  I ran your query and your suspicions are correct.  Some DB servers lead the pack with a massive amount of data due to all the disks attached to them.  We will probably just make these facts nil on all machines, as we don't need them.  I assume this will relieve the strain on PuppetDB and stop the connection resets.  Again, thank you very much.  If I could buy you a beer, I would.  The machines in question are a mix of RHEL5/6/7.

Mike

Wyatt Alt

Apr 19, 2016, 11:49:27 PM
to puppet...@googlegroups.com, shar...@gmail.com


On 04/19/2016 10:39 AM, Mike Sharpton wrote:
> Again, thank you very much.  If I could buy you a beer, I would.  The machines in question are a mix of RHEL5/6/7.
Hah, you're very welcome. Thanks for confirming the OS; that means this isn't just a Solaris issue like that facter ticket suggests.

Mike Sharpton

Apr 26, 2016, 10:01:24 AM
to Puppet Users, shar...@gmail.com
Wyatt,

We implemented the code to null out these facts.  We can see it working on nodes with small fact sets, and it worked in our test environment.  However, we still have the issue that PuppetDB cannot replace facts for the nodes with large facts; it still chokes on clearing out the old values.  I tried deactivating a node, but it obviously still exists in the DB until GC happens (a week for me).  That is too long.  Is there a query I can run to wipe these facts out of PuppetDB?  I don't care about them anyway.  Thanks again in advance.

Mike

Wyatt Alt

Apr 26, 2016, 11:45:40 AM
to puppet...@googlegroups.com, shar...@gmail.com
Hey Mike, give this a shot (in a psql session):

begin;
-- remove the facts rows that reference the offending fact paths
delete from facts where fact_path_id in
  (select id from fact_paths where name = any('{"disks", "partitions", "mountpoints"}'));
-- then clear out paths and values that are no longer referenced
delete from fact_paths where id not in (select fact_path_id from facts);
delete from fact_values where id not in (select fact_value_id from facts);
commit;


If you hit a transaction rollback you may need to run it with PDB
stopped. Those last two deletes may take some time since your
gc-interval is long, so you should probably run it in tmux/screen or
something.

Wyatt

Mike Sharpton

Apr 26, 2016, 12:50:09 PM
to Puppet Users, shar...@gmail.com
Thanks Wyatt.  I see what you mean; this may take too long.  What if I get desperate and decide to just drop the entire PuppetDB database?  Is there an easy way to do this?  I really don't care about historical data, as we basically use this for monitoring the environment.

Mike Sharpton

Apr 26, 2016, 1:09:29 PM
to Puppet Users, shar...@gmail.com
Just drop the database and let it get recreated, I mean.

Wyatt Alt

Apr 26, 2016, 4:49:13 PM
to puppet...@googlegroups.com, shar...@gmail.com
Mike,

If you have no issue dropping and recreating the full database, that's
totally a workaround here (you know your requirements better than I do,
so please don't take this as an endorsement of the approach :-) ).

To do this, just stop PuppetDB, drop the puppetdb database, recreate
the database with the options you want (the usual ones are described
here, though some people use different settings:
https://docs.puppet.com/puppetdb/latest/configure.html#using-postgresql),
and restart PuppetDB.

PuppetDB won't recreate the database for you, but once it's in place it
will create the tables and indices on startup. You may also want to
consider setting gc-interval to something less than one week, unless
there's a good reason for it being so long; a one-week interval allows
a lot of time for bloat to build up.
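
Assuming the conventional puppetdb database and user names from that
page (adjust to whatever you created originally), the drop/recreate
step looks roughly like this in a psql superuser session, with PuppetDB
stopped:

drop database puppetdb;
create database puppetdb owner puppetdb encoding 'UTF8' template template0;
\c puppetdb
create extension pg_trgm;  -- recommended for PuppetDB's regex-optimized indexes

Treat that as a sketch; the configure doc above is the authoritative
reference for the options.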

Wyatt

Mike Sharpton

Apr 26, 2016, 5:59:31 PM
to Puppet Users, shar...@gmail.com
Thanks Wyatt, I dropped it a while back after realizing how easy this was.  Everything is back now, and metrics are returning to normal.  The large facts are gone and so are our issues.  :-)  I was wrong about the GC; it's at the default of one hour.  I was thinking of node-ttl, which we set to one week.  Thanks again, case closed.