Jira (PDB-4832) Partitioned reports table can result in slower PDB queries

16 views
Skip to first unread message

Adrian Parreiras Horta (Jira)

unread,
Jul 28, 2020, 7:19:03 PM7/28/20
to puppe...@googlegroups.com
Adrian Parreiras Horta created an issue
 
PuppetDB / Bug PDB-4832
Partitioned reports table can result in slower PDB queries
Issue Type: Bug Bug
Assignee: Unassigned
Created: 2020/07/28 4:18 PM
Priority: Major Major
Reporter: Adrian Parreiras Horta

Particularly with the PE Console, the newly partitioned reports table can result in slower queries compared to previous versions, possibly because of the removal of this index. For an example, see this explain plan where it has to sequence scan 13 million rows to populate part of the "Status" page.

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Jarret Lavallee (Jira)

unread,
Jul 28, 2020, 7:21:03 PM7/28/20
to puppe...@googlegroups.com

Austin Blatt (Jira)

unread,
Jul 29, 2020, 2:32:03 PM7/29/20
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-4832
 
Re: Partitioned reports table can result in slower PDB queries

Yeah, it definitely seems that the report partitioning has had a negative effect on the lastest_reports join in the nodes query. We will need to look into a strategy to help the certnames table find the proper partitions to look into.

Nick Walker (Jira)

unread,
Jul 30, 2020, 11:27:10 AM7/30/20
to puppe...@googlegroups.com
Nick Walker commented on Bug PDB-4832

We're missing an index on id for all of the child tables

pe-puppetdb=# \d+ reports_20200717z
                                                           Table "public.reports_20200717z"
        Column         |           Type           | Collation | Nullable |               Default               | Storage  | Stats target | Description
-----------------------+--------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
 id                    | bigint                   |           | not null | nextval('reports_id_seq'::regclass) | plain    |              |
 hash                  | bytea                    |           | not null |                                     | extended |              |
 transaction_uuid      | uuid                     |           |          |                                     | plain    |              |
 certname              | text                     |           | not null |                                     | extended |              |
 puppet_version        | text                     |           | not null |                                     | extended |              |
 report_format         | smallint                 |           | not null |                                     | plain    |              |
 configuration_version | text                     |           | not null |                                     | extended |              |
 start_time            | timestamp with time zone |           | not null |                                     | plain    |              |
 end_time              | timestamp with time zone |           | not null |                                     | plain    |              |
 receive_time          | timestamp with time zone |           | not null |                                     | plain    |              |
 noop                  | boolean                  |           |          |                                     | plain    |              |
 environment_id        | bigint                   |           |          |                                     | plain    |              |
 status_id             | bigint                   |           |          |                                     | plain    |              |
 metrics_json          | json                     |           |          |                                     | extended |              |
 logs_json             | json                     |           |          |                                     | extended |              |
 producer_timestamp    | timestamp with time zone |           | not null |                                     | plain    |              |
 metrics               | jsonb                    |           |          |                                     | extended |              |
 logs                  | jsonb                    |           |          |                                     | extended |              |
 resources             | jsonb                    |           |          |                                     | extended |              |
 catalog_uuid          | uuid                     |           |          |                                     | plain    |              |
 cached_catalog_status | text                     |           |          |                                     | extended |              |
 code_id               | text                     |           |          |                                     | extended |              |
 producer_id           | bigint                   |           |          |                                     | plain    |              |
 noop_pending          | boolean                  |           |          |                                     | plain    |              |
 corrective_change     | boolean                  |           |          |                                     | plain    |              |
 job_id                | text                     |           |          |                                     | extended |              |
 report_type           | text                     |           | not null | 'agent'::text                       | extended |              |
Indexes:
    "reports_hash_expr_idx_20200717z" UNIQUE, btree (encode(hash, 'hex'::text))
    "idx_reports_compound_id_20200717z" btree (producer_timestamp, certname, hash) WHERE start_time IS NOT NULL
    "idx_reports_noop_pending_20200717z" btree (noop_pending) WHERE noop_pending = true
    "idx_reports_prod_20200717z" btree (producer_id)
    "idx_reports_producer_timestamp_20200717z" btree (producer_timestamp)
    "idx_reports_producer_timestamp_by_hour_certname_20200717z" btree (date_trunc('hour'::text, timezone('UTC'::text, producer_timestamp)), producer_timestamp, certname)
    "reports_cached_catalog_status_on_fail_20200717z" btree (cached_catalog_status) WHERE cached_catalog_status = 'on_failure'::text
    "reports_catalog_uuid_idx_20200717z" btree (catalog_uuid)
    "reports_certname_idx_20200717z" btree (certname)
    "reports_end_time_idx_20200717z" btree (end_time)
    "reports_environment_id_idx_20200717z" btree (environment_id)
    "reports_job_id_idx_20200717z" btree (job_id) WHERE job_id IS NOT NULL
    "reports_noop_idx_20200717z" btree (noop) WHERE noop = true
    "reports_status_id_idx_20200717z" btree (status_id)
    "reports_tx_uuid_expr_idx_20200717z" btree ((transaction_uuid::text))
Check constraints:
    "reports_20200717z_producer_timestamp_check" CHECK (producer_timestamp >= '2020-07-17 00:00:00+00'::timestamp with time zone AND producer_timestamp < '2020-07-18 00:00:00+00'::timestamp with time zone)
Foreign-key constraints:
    "reports_certname_fkey_20200717z" FOREIGN KEY (certname) REFERENCES certnames(certname) ON DELETE CASCADE
    "reports_env_fkey_20200717z" FOREIGN KEY (environment_id) REFERENCES environments(id) ON DELETE CASCADE
    "reports_prod_fkey_20200717z" FOREIGN KEY (producer_id) REFERENCES producers(id)
    "reports_status_fkey_20200717z" FOREIGN KEY (status_id) REFERENCES report_statuses(id) ON DELETE CASCADE
Inherits: reports

pe-puppetdb=# EXPLAIN ANALYZE
select id from reports where id = 100; ;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..164.57 rows=15 width=8) (actual time=0.050..0.322 rows=1 loops=1)
   ->  Seq Scan on reports  (cost=0.00..0.00 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: (id = 100)
   ->  Seq Scan on reports_20200717z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.028..0.028 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200718z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.016..0.018 rows=1 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 47
   ->  Seq Scan on reports_20200719z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.021..0.022 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200720z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.020..0.020 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200721z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200722z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.022..0.022 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200723z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.021..0.021 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200724z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.020..0.020 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200725z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.024..0.024 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200726z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.025..0.025 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200727z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.020..0.020 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200728z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.024..0.024 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200729z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.020..0.020 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200730z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.027..0.027 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 30
 Planning Time: 2.009 ms
 Execution Time: 0.460 ms
(47 rows)

pe-puppetdb=# CREATE UNIQUE INDEX test on reports_20200717z(id);
CREATE INDEX
pe-puppetdb=# CREATE UNIQUE INDEX test2 on reports_20200718z(id);
CREATE INDEX
pe-puppetdb=# CREATE UNIQUE INDEX test3 on reports_20200719z(id);
CREATE INDEX
pe-puppetdb=# CREATE UNIQUE INDEX test3 on reports_20200720z(id);
ERROR:  relation "test3" already exists
pe-puppetdb=# CREATE UNIQUE INDEX test4 on reports_20200720z(id);
CREATE INDEX
pe-puppetdb=# CREATE UNIQUE INDEX test5 on reports_20200721z(id);
CREATE INDEX
pe-puppetdb=# CREATE UNIQUE INDEX test6 on reports_20200722z(id);
CREATE INDEX
pe-puppetdb=# EXPLAIN ANALYZE
select id from reports where id = 100;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..141.90 rows=15 width=8) (actual time=0.046..0.334 rows=1 loops=1)
   ->  Seq Scan on reports  (cost=0.00..0.00 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: (id = 100)
   ->  Seq Scan on reports_20200717z  (cost=0.00..7.60 rows=1 width=8) (actual time=0.022..0.022 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200718z  (cost=0.00..7.60 rows=1 width=8) (actual time=0.019..0.021 rows=1 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 47
   ->  Index Only Scan using test3 on reports_20200719z  (cost=0.14..8.16 rows=1 width=8) (actual time=0.051..0.051 rows=0 loops=1)
         Index Cond: (id = 100)
         Heap Fetches: 0
   ->  Index Only Scan using test4 on reports_20200720z  (cost=0.14..8.16 rows=1 width=8) (actual time=0.026..0.026 rows=0 loops=1)
         Index Cond: (id = 100)
         Heap Fetches: 0
   ->  Index Only Scan using test5 on reports_20200721z  (cost=0.14..8.16 rows=1 width=8) (actual time=0.020..0.020 rows=0 loops=1)
         Index Cond: (id = 100)
         Heap Fetches: 0
   ->  Index Only Scan using test6 on reports_20200722z  (cost=0.14..8.16 rows=1 width=8) (actual time=0.024..0.024 rows=0 loops=1)
         Index Cond: (id = 100)
         Heap Fetches: 0
   ->  Seq Scan on reports_20200723z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.026..0.026 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200724z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.017..0.017 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200725z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.017..0.017 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200726z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200727z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.017..0.017 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200728z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.032..0.032 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200729z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 48
   ->  Seq Scan on reports_20200730z  (cost=0.00..11.75 rows=1 width=8) (actual time=0.017..0.017 rows=0 loops=1)
         Filter: (id = 100)
         Rows Removed by Filter: 30
 Planning Time: 2.608 ms
 Execution Time: 0.422 ms
(47 rows)

Eric Thompson (Jira)

unread,
Jul 30, 2020, 4:54:03 PM7/30/20
to puppe...@googlegroups.com

Eric Thompson (Jira)

unread,
Jul 30, 2020, 4:55:03 PM7/30/20
to puppe...@googlegroups.com

Jarret Lavallee (Jira)

unread,
Jul 30, 2020, 6:43:02 PM7/30/20
to puppe...@googlegroups.com
Jarret Lavallee updated an issue
Change By: Jarret Lavallee
CS Priority: Needs Priority Reviewed

Austin Blatt (Jira)

unread,
Jul 30, 2020, 7:24:03 PM7/30/20
to puppe...@googlegroups.com
Austin Blatt assigned an issue to Rob Browning
Change By: Austin Blatt
Assignee: Austin Blatt Rob Browning

Austin Blatt (Jira)

unread,
Jul 30, 2020, 7:25:03 PM7/30/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Affects Version/s: PDB 6.10.0

Austin Blatt (Jira)

unread,
Jul 30, 2020, 7:25:03 PM7/30/20
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages