PuppetDB and PostgreSQL 12 errors

Bob Negri

Sep 17, 2020, 3:12:15 PM
to Puppet Users
We recently updated our PuppetDB servers to PuppetDB 6.12.0 and PostgreSQL 12.
We started getting these errors:

ERROR:  relation "resource_events_20200917z" does not exist at character 13
ERROR:  relation "resource_events_20200917z" already exists
ERROR:  deadlock detected
ERROR:  could not serialize access due to concurrent delete

I'm not sure whether this is a PuppetDB setting or a PostgreSQL issue. Has anyone else seen this?

Here is more detail:

2020-09-17 14:32:49.515 UTC [3941] ERROR:  relation "resource_events_20200917z" does not exist at character 13
2020-09-17 14:32:49.515 UTC [3941] QUERY:  INSERT INTO resource_events_20200917Z SELECT ($1).*
2020-09-17 14:32:49.515 UTC [3941] CONTEXT:  PL/pgSQL function resource_events_insert_trigger() line 8 at EXECUTE
2020-09-17 14:32:49.515 UTC [3941] STATEMENT:  INSERT INTO resource_events ( new_value, property, name, file, report_id, event_hash, old_value, containing_class, certname_id, line, resource_type, status, resource_title, timestamp, containment_path, message ) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16 )
        RETURNING *
2020-09-17 14:32:49.538 UTC [3937] ERROR:  relation "resource_events_20200917z" already exists
2020-09-17 14:32:49.538 UTC [3937] STATEMENT:  CREATE TABLE IF NOT EXISTS resource_events_20200917Z (CHECK ( "timestamp" >= TIMESTAMP WITH TIME ZONE '2020-09-17T00:00:00Z' AND "timestamp" < TIMESTAMP WITH TIME ZONE '2020-09-18T00:00:00Z' )) INHERITS (resource_events)
2020-09-17 14:32:49.538 UTC [3945] ERROR:  relation "resource_events_20200917z" already exists
2020-09-17 14:32:49.538 UTC [3945] STATEMENT:  CREATE TABLE IF NOT EXISTS resource_events_20200917Z (CHECK ( "timestamp" >= TIMESTAMP WITH TIME ZONE '2020-09-17T00:00:00Z' AND "timestamp" < TIMESTAMP WITH TIME ZONE '2020-09-18T00:00:00Z' )) INHERITS (resource_events)
2020-09-17 14:32:49.538 UTC [3941] ERROR:  relation "resource_events_20200917z" already exists
2020-09-17 14:32:49.538 UTC [3941] STATEMENT:  CREATE TABLE IF NOT EXISTS resource_events_20200917Z (CHECK ( "timestamp" >= TIMESTAMP WITH TIME ZONE '2020-09-17T00:00:00Z' AND "timestamp" < TIMESTAMP WITH TIME ZONE '2020-09-18T00:00:00Z' )) INHERITS (resource_events)
2020-09-17 14:33:27.917 UTC [2875] ERROR:  deadlock detected
2020-09-17 14:33:27.917 UTC [2875] DETAIL:  Process 2875 waits for AccessExclusiveLock on relation 7883116 of database 16385; blocked by process 3945.
        Process 3945 waits for RowExclusiveLock on relation 7883178 of database 16385; blocked by process 2875.
        Process 2875: drop table if exists reports_20200917z cascade
        Process 3945: INSERT INTO resource_events ( new_value, property, name, file, report_id, event_hash, old_value, containing_class, certname_id, line, resource_type, status, resource_title, timestamp, containment_path, message ) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16 )
        RETURNING *


2020-09-17 14:34:47.339 UTC [2875] ERROR:  could not serialize access due to concurrent delete
2020-09-17 14:34:47.339 UTC [2875] STATEMENT:  delete from fact_paths fp  where not exists (select 1 from tmp_live_paths                      where tmp_live_paths.path = fp.path)

Zachary Kent

Sep 17, 2020, 11:27:29 PM
to puppet...@googlegroups.com
From the logs above it looks like the gc process in PuppetDB, represented by Postgres process 2875, is trying to drop the reports_20200917z partition while other transactions are trying to insert into the resource_events partitions for the same day. PuppetDB handles partition creation by first trying to insert a row and then creating the partition for a given day if one doesn't yet exist.
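For illustration, here is a simplified sketch of what such a routing trigger typically looks like. The function name resource_events_insert_trigger() and the daily table-naming scheme come from the log excerpts above; everything else is an assumption, not PuppetDB's actual source:

```sql
-- Hypothetical sketch of a daily-partition routing trigger, modeled on the
-- resource_events_insert_trigger() seen in the logs; not PuppetDB's actual
-- implementation.
CREATE OR REPLACE FUNCTION resource_events_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- Route the row to the child table for its day,
    -- e.g. resource_events_20200917z.
    EXECUTE format(
        'INSERT INTO %I SELECT ($1).*',
        'resource_events_'
            || to_char(NEW."timestamp" AT TIME ZONE 'UTC', 'YYYYMMDD')
            || 'z')
    USING NEW;
    RETURN NULL;  -- row was routed to the child table
END;
$$ LANGUAGE plpgsql;
```

If the child table doesn't exist yet, the EXECUTE raises 'relation ... does not exist'; the caller then creates the partition (the CREATE TABLE IF NOT EXISTS ... INHERITS (resource_events) statements in the logs) and retries the insert, which is why a few of those errors are expected around the first insert of each day.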

I'm wondering if your install has the report_ttl or resource_events_ttl setting (described in the PuppetDB config docs) set to a value less than one day. If so, it's possible that the gc process is trying to drop partitions for the same day that incoming commands are trying to create them, causing conflicts. Normally you'd expect to see a couple of the 'relation "resource_events_20200917z" does not exist' errors per day, around the same time, for both the reports_<date> and resource_events_<date> partitions. If your TTL settings aren't less than a day and you're seeing these errors more often than daily, please let us know.
  


Bob Negri

Sep 18, 2020, 1:29:26 PM
to Puppet Users
So 4 hrs might not be long enough? (Looks like the person who did the update didn't check the docs.) Trying the new settings now.

Zachary Kent

Sep 18, 2020, 4:04:54 PM
to puppet...@googlegroups.com
On Fri, Sep 18, 2020 at 6:29 AM Bob Negri <bobne...@gmail.com> wrote:
So 4 hrs might not be long enough? (Looks like the person who did the update didn't check the docs.) Trying the new settings now.

Yeah, I don't think 4 hrs is long enough, and that would cause the behavior you're seeing. In that case, when gc fires after 4 hours, there will be incoming commands trying to recreate the table that gc is trying to drop. Then, every time the gc_interval passes, the same thing happens again, producing the errors you're seeing. If you set the report_ttl and resource_events_ttl settings to over 24 hours, this issue should go away. You'll still see some of the 'relation "resource_events_20200917z" does not exist' errors, but those should only happen once a day, when a new partition needs to be created.
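For reference, those knobs live in the [database] section of PuppetDB's database.ini (or the conf.d equivalent); the values below are illustrative, not recommendations for any particular site:

```ini
# [database] section of PuppetDB's config (database.ini / conf.d).
# Keep both TTLs at a day or more so daily partitions aren't dropped
# and recreated within the same day; values here are illustrative.
[database]
report-ttl = 14d
resource-events-ttl = 14d
# How often (in minutes) the garbage collector runs.
gc-interval = 60
```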

I think our documentation for those two settings might be lacking with regard to this issue. We added partitioning for reports and resource_events recently and didn't cover the case you're describing in the docs. We'll update the docs soon; PDB-4899 tracks the work. Thanks for bringing this up!

 