Jira (PDB-4769) active_nodes CTE does a costly HashAggregate

0 views
Skip to first unread message

Austin Blatt (Jira)

unread,
Jun 3, 2020, 6:47:03 PM6/3/20
to puppe...@googlegroups.com
Austin Blatt created an issue
 
PuppetDB / Improvement PDB-4769
active_nodes CTE does a costly HashAggregate
Issue Type: Improvement Improvement
Assignee: Austin Blatt
Created: 2020/06/03 3:46 PM
Priority: Normal Normal
Reporter: Austin Blatt
Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Austin Blatt (Jira)

unread,
Jun 3, 2020, 7:54:03 PM6/3/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Before
{code}
EXPLAIN ANALYZE WITH inactive_nodes AS (SELECT certname FROM certnames WHERE (deactivated IS NOT NULL OR expired IS NOT NULL)) SELECT fs.certname AS certname, env.environment AS environment, fs.key AS name, fs.value AS value FROM (select certname,        environment_id,        (jsonb_each((stable||volatile))).*  from factsets) fs LEFT JOIN environments env ON fs.environment_id = env.id WHERE ((fs.certname = 'wine-underworld.delivery.puppetlabs.net') AND ((fs.certname) in (SELECT certname FROM  ( (SELECT inactive_nodes.certname AS certname FROM inactive_nodes) )  sub)));
                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join  (cost=28.12..70.31 rows=100 width=128) (actual time=0.416..0.416 rows=0 loops=1)
   CTE inactive_nodes
     ->  Seq Scan on certnames  (cost=0.00..15.20 rows=520 width=32) (actual time=0.004..0.004 rows=0 loops=1)
           Filter: ((deactivated IS NOT NULL) OR (expired IS NOT NULL))
           Rows Removed by Filter: 1
   ->  Hash Right Join  (cost=12.93..40.42 rows=100 width=128) (actual time=0.373..0.392 rows=130 loops=1)
         Hash Cond: (env.id = fs.environment_id)
         ->  Seq Scan on environments env  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.005..0.006 rows=1 loops=1)
         ->  Hash  (cost=11.68..11.68 rows=100 width=104) (actual time=0.362..0.362 rows=130 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 34kB
               ->  Subquery Scan on fs  (cost=0.15..11.68 rows=100 width=104) (actual time=0.258..0.312 rows=130 loops=1)
                     ->  Result  (cost=0.15..10.68 rows=100 width=104) (actual time=0.257..0.299 rows=130 loops=1)
                     ->  ProjectSet  (cost=0.15..8.68 rows=100 width=72) (actual time=0.254..0.280 rows=130 loops=1)
                     ->  Index Scan using factsets_certname_idx on factsets  (cost=0.15..8.17 rows=1 width=104) (actual time=0.013..0.013 rows=1 loops=1)
                     Index Cond: (certname = 'wine-underworld.delivery.puppetlabs.net'::text)
   ->  CTE Scan on inactive_nodes  (cost=0.00..11.70 rows=3 width=32) (actual time=0.000..0.000 rows=0 loops=130)
         Filter: (certname = 'wine-underworld.delivery.puppetlabs.net'::text)
Planning Time: 0.254 ms
Execution Time: 0.480 ms
{code}

After
{code}
(19 rows)pe-puppetdb=# EXPLAIN ANALYZE WITH inactive_nodes AS (SELECT certname FROM certnames WHERE (deactivated IS NULL OR expired IS NULL)) SELECT fs.certname AS certname, env.environment AS environment, fs.key AS name,
fs.value AS value FROM (select certname,        environment_id,        (jsonb_each((stable||volatile))).*  from factsets) fs LEFT JOIN environments env ON fs.environment_id = env.id WHERE ((fs.certname = 'wine-u
nderworld.delivery.puppetlabs.net') AND ((fs.certname) in (SELECT certname FROM  ( (SELECT inactive_nodes.certname AS certname FROM inactive_nodes) )  sub)));
                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=28.24..56.75 rows=100 width=128) (actual time=0.404..0.445 rows=130 loops=1)
   CTE inactive_nodes
     ->  Seq Scan on certnames  (cost=0.00..15.20 rows=5 width=32) (actual time=0.010..0.010 rows=1 loops=1)
           Filter: ((deactivated IS NULL) OR (expired IS NULL))
   ->  HashAggregate  (cost=0.11..0.12 rows=1 width=32) (actual time=0.016..0.016 rows=1 loops=1)
         Group Key: inactive_nodes.certname
         ->  CTE Scan on inactive_nodes  (cost=0.00..0.11 rows=1 width=32) (actual time=0.012..0.013 rows=1 loops=1)
               Filter: (certname = 'wine-underworld.delivery.puppetlabs.net'::text)
   ->  Hash Right Join  (cost=12.93..40.42 rows=100 width=128) (actual time=0.386..0.406 rows=130 loops=1)
         Hash Cond: (env.id = fs.environment_id)
         ->  Seq Scan on environments env  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.009..0.010 rows=1 loops=1)
         ->  Hash  (cost=11.68..11.68 rows=100 width=104) (actual time=0.372..0.372 rows=130 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 34kB
               ->  Subquery Scan on fs  (cost=0.15..11.68 rows=100 width=104) (actual time=0.259..0.315 rows=130 loops=1)
                     ->  Result  (cost=0.15..10.68 rows=100 width=104) (actual time=0.258..0.302 rows=130 loops=1)
                     ->  ProjectSet  (cost=0.15..8.68 rows=100 width=72) (actual time=0.256..0.283 rows=130 loops=1)
                     ->  Index Scan using factsets_certname_idx on factsets  (cost=0.15..8.17 rows=1 width=104) (actual time=0.017..0.017 rows=1 loops=1)
                     Index Cond: (certname = 'wine-underworld.delivery.puppetlabs.net'::text)
Planning Time: 0.240 ms
Execution Time: 0.542 ms
(20 rows)
{code}

In a large database, for a reports query, this HashAggregate step took more than two thirds the total execution time.

Austin Blatt (Jira)

unread,
Jun 5, 2020, 5:22:03 PM6/5/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Fix Version/s: PDB 6.10.2

Austin Blatt (Jira)

unread,
Jun 5, 2020, 5:23:02 PM6/5/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Release Notes Summary: Fixes a bug that caused PuppetDB to use a CTE that materialized a large table, slowing queries

Austin Blatt (Jira)

unread,
Jun 5, 2020, 6:49:03 PM6/5/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Fix Version/s: PDB 6.11.1

Austin Boyd (Jira)

unread,
Jun 16, 2020, 1:26:03 AM6/16/20
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 39600

Austin Boyd (Jira)

unread,
Jun 16, 2020, 1:26:03 AM6/16/20
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Labels: jira_escalated

Austin Boyd (Jira)

unread,
Jun 24, 2020, 5:00:04 PM6/24/20
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 39600
Labels: jira_escalated

Zachary Kent (Jira)

unread,
Jul 10, 2020, 4:45:03 PM7/10/20
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Fix Version/s: PDB 6.10.2
Fix Version/s: PDB 6.11.2

Austin Blatt (Jira)

unread,
Aug 24, 2020, 10:46:03 AM8/24/20
to puppe...@googlegroups.com
Austin Blatt commented on Improvement PDB-4769
 
Re: active_nodes CTE does a costly HashAggregate

Re-opened for backporting this to 5.2.x

Austin Blatt (Jira)

unread,
Aug 26, 2020, 12:59:03 PM8/26/20
to puppe...@googlegroups.com
Austin Blatt updated an issue
 
Change By: Austin Blatt
Fix Version/s: PDB 5.2.19
Reply all
Reply to author
Forward
0 new messages