Jira (PDB-2398) PuppetDB should enable emergency deletion of historical data

37 views
Skip to first unread message

Charlie Sharpsteen (JIRA)

unread,
Feb 5, 2016, 9:20:02 PM2/5/16
to puppe...@googlegroups.com
Charlie Sharpsteen created an issue
 
PuppetDB / Improvement PDB-2398
PuppetDB should enable emergency deletion of historical data
Issue Type: Improvement Improvement
Assignee: Unassigned
Created: 2016/02/05 6:19 PM
Labels: support
Priority: Normal Normal
Reporter: Charlie Sharpsteen

When a disk fills up on a database server, it is useful to have an option for reducing the disk space used without deleting data used by catalog compilation or requiring an increase in disk space. Disk increases can be infeasible or may require a long change cycle during which the Puppet infrastructure is inoperative.

Historical Context

In PuppetDB 2.x disk usage could be reduced by truncating the reports table, which is often the largest table in the database by several orders of magnitude. This would return large amounts of space to the operating system, enabling further maintenance operations, while keeping exported resources intact for catalog compilation:

# PE 3.8.x
 
# sudo -u pe-postgres /opt/puppet/bin/psql -d pe-puppetdb
could not change directory to "/root"
psql (9.2.14)
Type "help" for help.
 
pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     5
(1 row)
 
pe-puppetdb=# TRUNCATE TABLE reports CASCADE;
NOTICE:  truncate cascades to table "resource_events"
NOTICE:  truncate cascades to table "latest_reports"
TRUNCATE TABLE
pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     5
(1 row)

In 2015.3.x, the TRUNCATE operation now cascades beyond the tables related to reporting and wipes out exported resources, which negatively impacts catalog compilation:

# PE 2015.x
 
# sudo -u pe-postgres /opt/puppetlabs/server/bin/psql -d pe-puppetdb
could not change directory to "/root": Permission denied
psql (9.4.5)
Type "help" for help.
 
pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     6
(1 row)
 
pe-puppetdb=# TRUNCATE TABLE reports CASCADE;
NOTICE:  truncate cascades to table "certnames"
NOTICE:  truncate cascades to table "resource_events"
NOTICE:  truncate cascades to table "factsets"
NOTICE:  truncate cascades to table "catalogs"
NOTICE:  truncate cascades to table "facts"
NOTICE:  truncate cascades to table "catalog_resources"
TRUNCATE TABLE
pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     0
(1 row)

Desired Functionality

PuppetDB should allow for emergency removal of historical data without impacting catalog compilation or requiring changes to disk space allocation.

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v6.4.12#64027-sha1:e3691cc)
Atlassian logo

Charlie Sharpsteen (JIRA)

unread,
Feb 5, 2016, 9:23:07 PM2/5/16
to puppe...@googlegroups.com

Charlie Sharpsteen (JIRA)

unread,
Feb 5, 2016, 9:30:03 PM2/5/16
to puppe...@googlegroups.com

Charlie Sharpsteen (JIRA)

unread,
Feb 5, 2016, 9:31:02 PM2/5/16
to puppe...@googlegroups.com
Charlie Sharpsteen updated an issue
When a disk fills up on a database server, it is useful to have an option for reducing the disk space used without deleting data used by catalog compilation or requiring an increase in disk space. Disk increases can be infeasible or may require a long change cycle during which the Puppet infrastructure is inoperative.

h2. Historical Context

In PuppetDB 2.x disk usage could be reduced
 in an emergency  by truncating the reports table, which is often the largest table in the database by several orders of magnitude. This would return large amounts of space to the operating system, enabling further maintenance operations, while keeping exported resources intact for catalog compilation:

{noformat}

# PE 3.8.x

# sudo -u pe-postgres /opt/puppet/bin/psql -d pe-puppetdb
could not change directory to "/root"
psql (9.2.14)
Type "help" for help.

pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     5
(1 row)

pe-puppetdb=# TRUNCATE TABLE reports CASCADE;
NOTICE:  truncate cascades to table "resource_events"
NOTICE:  truncate cascades to table "latest_reports"
TRUNCATE TABLE
pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     5
(1 row)
{noformat}


In 2015.3.x, the TRUNCATE operation now cascades beyond the tables related to reporting and wipes out exported resources, which negatively impacts catalog compilation:

{noformat}

# PE 2015.x

# sudo -u pe-postgres /opt/puppetlabs/server/bin/psql -d pe-puppetdb
could not change directory to "/root": Permission denied
psql (9.4.5)
Type "help" for help.

pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     6
(1 row)

pe-puppetdb=# TRUNCATE TABLE reports CASCADE;
NOTICE:  truncate cascades to table "certnames"
NOTICE:  truncate cascades to table "resource_events"
NOTICE:  truncate cascades to table "factsets"
NOTICE:  truncate cascades to table "catalogs"
NOTICE:  truncate cascades to table "facts"
NOTICE:  truncate cascades to table "catalog_resources"
TRUNCATE TABLE
pe-puppetdb=# SELECT count(*) FROM catalog_resources WHERE exported = true;
 count
-------
     0
(1 row)
{noformat}

h2. Desired Functionality


PuppetDB should allow for emergency removal of historical data without impacting catalog compilation or requiring changes to disk space allocation.

Charlie Sharpsteen (JIRA)

unread,
Feb 7, 2016, 3:55:03 PM2/7/16
to puppe...@googlegroups.com

Charlie Sharpsteen (JIRA)

unread,
Feb 10, 2016, 12:41:07 PM2/10/16
to puppe...@googlegroups.com
Charlie Sharpsteen commented on Improvement PDB-2398
 
Re: PuppetDB should enable emergency deletion of historical data

Currently, in PDB 3.x, the certnames_reports_id_fkey constraint is what propagates TRUNCATE CASCADE beyond the reports and resource_events tables:

pe-puppetdb=# \d+ reports
                                                             Table "public.reports"
        Column         |           Type           |                      Modifiers                       | Storage  | Stats target | Description
-----------------------+--------------------------+------------------------------------------------------+----------+--------------+-------------
 id                    | bigint                   | not null default nextval('reports_id_seq'::regclass) | plain    |              |
 hash                  | bytea                    | not null                                             | extended |              |
 transaction_uuid      | uuid                     |                                                      | plain    |              |
 certname              | text                     | not null                                             | extended |              |
 puppet_version        | character varying(255)   | not null                                             | extended |              |
 report_format         | smallint                 | not null                                             | plain    |              |
 configuration_version | character varying(255)   | not null                                             | extended |              |
 start_time            | timestamp with time zone | not null                                             | plain    |              |
 end_time              | timestamp with time zone | not null                                             | plain    |              |
 receive_time          | timestamp with time zone | not null                                             | plain    |              |
 noop                  | boolean                  |                                                      | plain    |              |
 environment_id        | bigint                   |                                                      | plain    |              |
 status_id             | bigint                   |                                                      | plain    |              |
 metrics_json          | json                     |                                                      | extended |              |
 logs_json             | json                     |                                                      | extended |              |
 producer_timestamp    | timestamp with time zone | not null                                             | plain    |              |
 metrics               | jsonb                    |                                                      | extended |              |
 logs                  | jsonb                    |                                                      | extended |              |
 resources             | jsonb                    |                                                      | extended |              |
Indexes:
    "reports_pkey" PRIMARY KEY, btree (id)
    "reports_hash_expr_idx" UNIQUE, btree (ltrim(hash::text, '\x'::text))
    "idx_reports_producer_timestamp" btree (producer_timestamp)
    "reports_certname_idx" btree (certname)
    "reports_end_time_idx" btree (end_time)
    "reports_environment_id_idx" btree (environment_id)
    "reports_status_id_idx" btree (status_id)
    "reports_tx_uuid_expr_idx" btree ((transaction_uuid::text))
Foreign-key constraints:
    "reports_certname_fkey" FOREIGN KEY (certname) REFERENCES certnames(certname) ON DELETE CASCADE
    "reports_env_fkey" FOREIGN KEY (environment_id) REFERENCES environments(id) ON DELETE CASCADE
    "reports_status_fkey" FOREIGN KEY (status_id) REFERENCES report_statuses(id) ON DELETE CASCADE
Referenced by:
    TABLE "certnames" CONSTRAINT "certnames_reports_id_fkey" FOREIGN KEY (latest_report_id) REFERENCES reports(id) ON DELETE SET NULL
    TABLE "resource_events" CONSTRAINT "resource_events_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE

Russell Mull (JIRA)

unread,
Jun 9, 2017, 12:57:03 PM6/9/17
to puppe...@googlegroups.com
Russell Mull updated an issue
 
Change By: Russell Mull
Sprint: Hopper
This message was sent by Atlassian JIRA (v6.4.14#64029-sha1:ae256fe)
Atlassian logo

Nick Walker (JIRA)

unread,
Jun 19, 2017, 7:49:02 PM6/19/17
to puppe...@googlegroups.com
Nick Walker updated an issue
Change By: Nick Walker

h1.  Workaround  / Proposed Implementation 

{code}
# cat /tmp/emergency_delete.sql
BEGIN TRANSACTION;

ALTER TABLE certnames DROP CONSTRAINT IF EXISTS certnames_reports_id_fkey;

UPDATE certnames SET latest_report_id = NULL;

TRUNCATE TABLE reports CASCADE;

ALTER TABLE certnames ADD CONSTRAINT "certnames_reports_id_fkey" FOREIGN KEY (latest_report_id) REFERENCES reports(id) ON DELETE SET NULL;

COMMIT TRANSACTION;
{code}
{code}
# su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -f /tmp/emergency_delete.sql"
BEGIN
ALTER TABLE
UPDATE 1
psql:/tmp/emergency_delete.sql:7: NOTICE:  truncate cascades to table "resource_events"
TRUNCATE TABLE
ALTER TABLE
COMMIT
{code}

Nick Walker (JIRA)

unread,
Sep 27, 2017, 2:11:08 PM9/27/17
to puppe...@googlegroups.com
echo " BEGIN TRANSACTION;


ALTER TABLE certnames DROP CONSTRAINT IF EXISTS certnames_reports_id_fkey;

UPDATE certnames SET latest_report_id = NULL;

TRUNCATE TABLE reports CASCADE;

ALTER TABLE certnames ADD CONSTRAINT "certnames_reports_id_fkey" FOREIGN KEY (latest_report_id) REFERENCES reports(id) ON DELETE SET NULL;

COMMIT TRANSACTION; " > /tmp/emergency_delete.sql
{code}
{code}
# su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -f /tmp/emergency_delete.sql"
BEGIN
ALTER TABLE
UPDATE 1
psql:/tmp/emergency_delete.sql:7: NOTICE:  truncate cascades to table "resource_events"
TRUNCATE TABLE
ALTER TABLE
COMMIT
{code}

Rob Browning (JIRA)

unread,
Aug 13, 2019, 3:35:38 PM8/13/19
to puppe...@googlegroups.com
Rob Browning commented on Improvement PDB-2398
 
Re: PuppetDB should enable emergency deletion of historical data

It sounds like there's a chance this might be obviated by PDB-4464 if it pans out.

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)
Atlassian logo

Nick Walker (JIRA)

unread,
Dec 13, 2019, 11:35:04 AM12/13/19
to puppe...@googlegroups.com
Nick Walker updated an issue
Change By: Nick Walker
When a disk fills up on a database server, it is useful to have an option for reducing the disk space used without deleting data used by catalog compilation or requiring an increase in disk space. Disk increases can be infeasible or may require a long change cycle during which the Puppet infrastructure is inoperative.

It's also useful to TRUNCATE the reports table before a major upgrade that will need to migrate the reports table or if the upgrade contains a PostgreSQL upgrade that will copy the entire database.  

echo "BEGIN TRANSACTION;

ALTER TABLE certnames DROP CONSTRAINT IF EXISTS certnames_reports_id_fkey;

UPDATE certnames SET latest_report_id = NULL;

TRUNCATE TABLE reports CASCADE;

ALTER TABLE certnames ADD CONSTRAINT "certnames_reports_id_fkey" FOREIGN KEY (latest_report_id) REFERENCES reports(id) ON DELETE SET NULL;

COMMIT TRANSACTION;" > /tmp/emergency_delete.sql
{code}
{code}
# su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -f /tmp/emergency_delete.sql"
BEGIN
ALTER TABLE
UPDATE 1
psql:/tmp/emergency_delete.sql:7: NOTICE:  truncate cascades to table "resource_events"
TRUNCATE TABLE
ALTER TABLE
COMMIT
{code}

Austin Blatt (Jira)

unread,
Apr 10, 2020, 12:15:04 PM4/10/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Fix Version/s: PDB 5.2.14
Fix Version/s: PDB 6.10.0
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Austin Blatt (Jira)

unread,
Apr 10, 2020, 12:16:04 PM4/10/20
to puppe...@googlegroups.com
Austin Blatt assigned an issue to Austin Blatt
Change By: Austin Blatt
Release Notes: New Feature
Release Notes Summary: Adds a puppetdb subcommand, `delete-reports`, that will stop PuppetDB and delete all reports and their resource events.
Assignee: Austin Blatt
Resolution: Fixed
Status: Accepted Resolved

Zachary Kent (Jira)

unread,
Mar 10, 2021, 3:32:03 PM3/10/21
to puppe...@googlegroups.com

Zachary Kent (Jira)

unread,
Mar 10, 2021, 3:34:03 PM3/10/21
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Sprint: Release Engineering Hopper

Morgan Rhodes (Jira)

unread,
Mar 11, 2021, 2:30:08 PM3/11/21
to puppe...@googlegroups.com
Morgan Rhodes updated an issue
Change By: Morgan Rhodes
Sprint: Release Engineering Hopper
Reply all
Reply to author
Forward
0 new messages