Puppetdb garbage collection failing


Matt Jarvis

Sep 28, 2015, 10:56:18 AM
to Puppet Users
We seem to have hit a bit of an issue with puppetdb garbage collection. Initial symptoms were exceptions in the puppetdb logs:

Retrying after attempt 6, due to: org.postgresql.util.PSQLException: This connection has been closed.


And on the postgres side:


LOG:  incomplete message from client


Having turned up the logging on postgres, it appears that the query 


DELETE FROM fact_paths fp
  WHERE fp.id in ( $some_ids )
    AND NOT EXISTS (SELECT 1 FROM facts f
                      WHERE f.fact_path_id in ( $some_more_ids )
                        AND f.fact_path_id = fp.id
                        AND f.factset_id <> $26355)


is the culprit. This query is absolutely massive, with over 26000 IDs specified as parameters - as soon as the query is executed, postgres returns "incomplete message from client" and drops the connection.
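For context on the failure mode: a statement carrying tens of thousands of bind parameters produces a very large protocol message from the client, which may be what trips the "incomplete message from client" error. A common general mitigation (a hedged Python sketch, not PuppetDB's actual GC code; all names here are illustrative) is to issue such deletes in bounded batches:

```python
def chunks(ids, size):
    """Split a list of ids into batches of at most `size` elements,
    so no single DELETE statement carries an unbounded parameter list."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Illustrative: 26000 ids become 26 statements of 1000 parameters each,
# instead of one statement with 26000 parameters.
ids = list(range(26000))
batches = list(chunks(ids, 1000))
```

Each batch would then be bound to its own `DELETE ... WHERE fp.id IN (...)` statement.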


puppetdb is 2.3.7-1puppetlabs1

postgres is 9.3


Does anyone have any clues what's going on here?


Thanks


Matt


DataCentred Limited registered in England and Wales no. 05611763

Wyatt Alt

Sep 28, 2015, 1:41:43 PM
to puppet...@googlegroups.com, Matt Jarvis
Hey Matt,

I can reproduce this by inserting a value at the beginning of an extremely large array-valued structured fact, but we'll need to know more about your data to confirm whether that's what's happening in your case. This could be some large custom fact you're creating or something generated by a module.

I've created a ticket around this issue here:
https://tickets.puppetlabs.com/browse/PDB-2003

Can you connect to the database via psql and share (either here or in the ticket) the output of the following?

select count(*),name from fact_paths group by name order by count desc;

My hope is that that will identify one or more large structured facts associated with a lot of leaf values, and then we'll need to figure out where they're coming from.

Wyatt

Wyatt Alt

Sep 28, 2015, 1:45:49 PM
to puppet...@googlegroups.com, Matt Jarvis
Just to clarify, I think the top few rows of that result should be enough to illustrate -- no need to include the whole thing.

Wyatt

Matt Jarvis

Sep 29, 2015, 3:20:33 AM
to Puppet Users, matt....@datacentred.co.uk

 count |                      name
-------+-------------------------------------------------
     1 | macaddress_qvb34470225_cd
     1 | mtu_qbr2fb476b3_ff
     1 | speed_qvbfa2ec4e3_15
     1 | macaddress_qvo547572f9_14
     1 | speed_qvo2e200191_c0
     1 | mtu_qbr5eaffca5_fb
     1 | macaddress_qbr0d4ed278_e3
     1 | mtu_qvb8166a899_d1
     1 | speed_qvb4e0d1069_13
     1 | speed_qvbb2d99f31_86
     1 | mtu_qbr65afa39a_9a
     1 | speed_qvb336884d1_12
     1 | speed_qvbf81c2831_4f
     1 | mtu_qbr6d9cbcfc_82
     1 | mtu_qbr441a8d9c_9e
     1 | macaddress_qbrb400a4cf_a3
     1 | mtu_qbr0bdbfadc_6a
     1 | macaddress_qbrf9e0c7d4_7b
     1 | macaddress_qbr3fe74368_2f
     1 | macaddress_qvoc943cbcd_c3
     1 | macaddress_qvb7e04f0db_2b
     1 | mtu_qbrb42e4516_13
     1 | macaddress_qvbefdec85e_5b
     1 | mtu_qbr4575c981_84
     1 | speed_qvbb771b00f_b4
     1 | speed_qvo04f9f59c_d2
     1 | macaddress_qbre4308db4_12
     1 | speed_qvb997d8a21_72
     1 | mtu_qvo699d2518_05
     1 | mtu_qvbc5dcb18f_8b
     1 | mtu_qvb766c608d_7a
     1 | speed_qvo137786a3_ce
     1 | speed_qvo02ec32fd_28
     1 | macaddress_qbr3b6455da_f1
     1 | mtu_qvb993a2dfb_5e
     1 | macaddress_qvo14369bd5_d3

Is that enough of the query result? We're an OpenStack public cloud provider, so in our cluster we have many network interfaces changing frequently as new virtual networks and machines are created - those facts all relate to virtual interfaces. It looks like the majority of that table is full of them.

Wyatt Alt

Sep 29, 2015, 3:21:46 PM
to puppet...@googlegroups.com
It's enough to shoot down my theory about structured facts. Assuming the "desc" was included in the order by, that result indicates that you aren't storing any structured facts at all.

The long parameter list in the query you've identified represents the fact paths (equivalent to fact names when there are no structured facts) that become invalidated when a node updates its set of facts in PuppetDB. In the case of a structured fact, this could happen if you inserted an element at the beginning of a large array, but with flat facts like you appear to have I think this would have to mean that a) the node has 26k+ facts associated with it and b) 26k facts are being renamed or removed between the last successful puppet run and the run that's failing.
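To make the structured-fact case concrete, here is a small illustrative Python sketch (not PuppetDB's storage code; the path tuples and flattening are simplified assumptions) of how every leaf of a nested fact becomes its own path, and why inserting at the front of a large array invalidates every indexed path:

```python
def leaf_paths(value, prefix=()):
    """Yield (path, leaf) pairs for every leaf value in a nested fact.
    Simplified stand-in for PuppetDB storing one fact_paths row per leaf."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from leaf_paths(v, prefix + (k,))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from leaf_paths(v, prefix + (i,))
    else:
        yield prefix, value

# A flat fact contributes exactly one path.
flat = list(leaf_paths("10.0.0.1", ("ipaddress",)))

# An array-valued fact contributes one path per element; inserting at the
# front shifts every index, so every pre-existing path maps to a new value
# and must be invalidated on the next facts update.
fact = {"disks": ["sda", "sdb", "sdc"]}
before = dict(leaf_paths(fact))
fact["disks"].insert(0, "sdx")
after = dict(leaf_paths(fact))
changed = [p for p in before if before[p] != after[p]]
```

With only flat facts, as in Matt's output above, each fact is a single path, which is why a 26k-entry parameter list would imply a 26k-fact node rather than one big structured fact.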

The final parameter ($26355 in your case) identifies the node that's failing; you can get the associated certname by taking the value of that parameter from your postgres logs and issuing

select certname from factsets where id=<value of $26355>;

from psql.

Can you give me answers to the following:
- has PuppetDB been running fine prior to this issue or have you recently adopted it?
- does it seem possible that you have no structured facts in your database?
- can you give me the first 10 rows of this query?
select count(*),factset_id from facts group by factset_id order by count desc;
- can you get the certname of the failing node using
select certname from factsets where id=<value of $26355>;
and send me the output of
curl -X GET http://localhost:8080/v4/factsets -d 'query=["=","certname","<your certname>"]'
- once you have the certname, is there anything special about that node that you're aware of?
- can you send me the compressed contents of the failed replace-facts commands in your dead letter directory? These will be located at
/opt/puppetlabs/server/data/puppetdb/mq/discarded/replace-facts
if you're on PC1 and
/var/lib/puppetdb/mq/discarded/replace-facts
if you aren't, assuming you're using the default pathing.

Additionally, this is probably going to require some back and forth between us -- if you want to chime in on the ticket at https://tickets.puppetlabs.com/browse/PDB-2003 we can continue the discussion there, and if you're on IRC I'm available in #puppet on freenode as wkalt, mostly during work hours on US pacific time.

Thanks,
Wyatt

Mike Sharpton

Apr 19, 2016, 10:38:11 AM
to Puppet Users, matt....@datacentred.co.uk
Hello Wyatt, 

I think I have run into an issue with large structured facts.  I posted a new message about it, but I was wondering if you have a solution for large partitions and disks facts?  I am not sure what to do, as I cannot disable facts and I have a large number of nodes.  Your help is appreciated.  Thanks,

Mike