Jira (PDB-5051) Slow queries from PuppetDB to PostgreSQL after migration 4->7

62 views
Skip to first unread message

kaswob (Jira)

unread,
Feb 26, 2021, 9:22:04 AM2/26/21
to puppe...@googlegroups.com
kaswob created an issue
 
PuppetDB / Bug PDB-5051
Slow queries from PuppetDB to PostgreSQL after migration 4->7
Issue Type: Bug Bug
Assignee: Unassigned
Components: PuppetDB
Created: 2021/02/26 6:21 AM
Priority: Normal Normal
Reporter: kaswob

We are in the process of upgrading our infrastructure from Puppet 4 to Puppet 7 and we've run to an issue with slow queries to PostgreSQL created by Puppetdb.
It takes between 30 and 60s to execute the query on PostgreSQL server when it's executed manually and freezes the whole database server when there are plenty of nodes plugged in.

During the query execution one of CPU cores is 100% consumed on the PSQL server.

Migration process:
Puppetdb 4.4.0 -> Puppetdb 5.2.19 (versions earlier than 5.2.9 failed while starting Puppetdb service - possibly migration issues) -> Puppetdb 6.14.0 -> Puppetdb 7.1.0
PostgreSQL 9.4 -> 13 (although tried with version 9.6 and there is no difference/the performance is even worse)

 

Curl with query execution:

curl 127.0.0.1:8080/pdb/query/v4 -H 'Content-Type:application/json' -XPOST -d '{"query": ["from", "facts", ["and", ["=", "name", "ipaddress"], ["in", "certname", ["extract", "certname", ["select_resources", ["and", ["=", ["fact", "project"], "project_name"], ["=", ["fact", "server_class"], "server_class_name"], ["=", ["fact", "server_type"], "server_type_name"]]]]]]]}' 

where project/server_class/server_type are taken from ENC

Query that is executed on PostgreSQL server:

WITH inactive_nodes AS (SELECT certname FROM certnames WHERE (deactivated IS NOT NULL AND deactivated > '2021-02-09T12:42:14.899Z') OR (expired IS NOT NULL and expired > '2021-02-09T12:42:14.899Z')), not_active_nodes AS (SELECT certname FROM certnames WHERE (deactivated IS NOT NULL OR expired IS NOT NULL)) SELECT fs.certname AS certname, env.environment AS environment, fs.key AS name, fs.value AS value FROM (select certname, environment_id, 'ipaddress'::text as key, (stable||volatile)->'ipaddress' as value from factsets where (stable||volatile) ? 'ipaddress') fs LEFT JOIN environments env ON fs.environment_id = env.id WHERE (((fs.key = 'ipaddress') AND ((fs.certname) in (SELECT certname FROM ( (SELECT c.certname AS certname FROM catalog_resources resources INNER JOIN certnames ON resources.certname_id = certnames.id INNER JOIN catalogs c ON c.certname = certnames.certname LEFT JOIN environments e ON c.environment_id = e.id LEFT JOIN resource_params_cache rpc ON rpc.resource = resources.resource WHERE (((c.certname) in (SELECT certname FROM ( (SELECT certnames.certname AS certname FROM factsets fs LEFT JOIN environments ON fs.environment_id = environments.id LEFT JOIN certnames ON fs.certname = certnames.certname WHERE (fs.stable||fs.volatile) @> '{"project": "project_name"}') ) sub)) AND ((c.certname) in (SELECT certname FROM ( (SELECT certnames.certname AS certname FROM factsets fs LEFT JOIN environments ON fs.environment_id = environments.id LEFT JOIN certnames ON fs.certname = certnames.certname WHERE (fs.stable||fs.volatile) @> '{"server_class":"server_class_name"}') ) sub)) AND ((c.certname) in (SELECT certname FROM ( (SELECT certnames.certname AS certname FROM factsets fs LEFT JOIN environments ON fs.environment_id = environments.id LEFT JOIN certnames ON fs.certname = certnames.certname WHERE (fs.stable||fs.volatile) @> '{"server_type":"server_type_name"}') ) sub)))) ) sub))) AND NOT ((fs.certname) in (SELECT certname FROM ( (SELECT not_active_nodes.certname AS certname FROM not_active_nodes) ) sub)));

explain of the query:

 Nested Loop Left Join  (cost=19.97..1767.19 rows=1 width=118)
   Join Filter: (factsets.environment_id = env.id)
   ->  Nested Loop Semi Join  (cost=19.97..1766.11 rows=1 width=53)
         ->  Seq Scan on factsets  (cost=17.22..64.32 rows=5 width=53)
               Filter: ((NOT (hashed SubPlan 1)) AND ((stable || volatile) ? 'ipaddress'::text))
               SubPlan 1
                 ->  Seq Scan on certnames certnames_4  (cost=0.00..17.20 rows=8 width=22)
                       Filter: ((deactivated IS NOT NULL) OR (expired IS NOT NULL))
         ->  Nested Loop Semi Join  (cost=2.75..340.35 rows=1 width=176)
               ->  Nested Loop Semi Join  (cost=2.20..339.48 rows=1 width=132)
                     ->  Nested Loop  (cost=1.65..338.62 rows=1 width=88)
                           ->  Nested Loop Semi Join  (cost=1.10..7.98 rows=1 width=96)
                                 ->  Nested Loop  (cost=0.55..7.12 rows=1 width=52)
                                       Join Filter: (factsets.certname = c.certname)
                                       ->  Index Scan using certnames_transform_certname_key on certnames  (cost=0.28..6.69 rows=1 width=30)
                                             Index Cond: (certname = factsets.certname)
                                       ->  Index Scan using catalogs_certname_idx on catalogs c  (cost=0.28..0.41 rows=1 width=30)
                                             Index Cond: (certname = certnames.certname)
                                 ->  Nested Loop  (cost=0.55..0.85 rows=1 width=44)
                                       Join Filter: (fs.certname = certnames_1.certname)
                                       ->  Index Scan using factsets_certname_idx on factsets fs  (cost=0.28..0.48 rows=1 width=30)
                                             Index Cond: (certname = c.certname)
                                             Filter: ((stable || volatile) @> '{"project": "project_name"}'::jsonb)
                                       ->  Index Only Scan using certnames_transform_certname_key on certnames certnames_1  (cost=0.28..0.36 rows=1 width=22)
                                             Index Cond: (certname = c.certname)
                           ->  Index Scan using catalog_resources_pkey1 on catalog_resources resources  (cost=0.55..318.82 rows=1182 width=29)
                                 Index Cond: (certname_id = certnames.id)
                     ->  Nested Loop  (cost=0.55..0.85 rows=1 width=44)
                           Join Filter: (fs_2.certname = certnames_3.certname)
                           ->  Index Scan using factsets_certname_idx on factsets fs_2  (cost=0.28..0.48 rows=1 width=30)
                                 Index Cond: (certname = c.certname)
                                 Filter: ((stable || volatile) @> '{"server_type": "server_type_name"}'::jsonb)
                           ->  Index Only Scan using certnames_transform_certname_key on certnames certnames_3  (cost=0.28..0.36 rows=1 width=22)
                                 Index Cond: (certname = c.certname)
               ->  Nested Loop  (cost=0.55..0.85 rows=1 width=44)
                     Join Filter: (fs_1.certname = certnames_2.certname)
                     ->  Index Scan using factsets_certname_idx on factsets fs_1  (cost=0.28..0.48 rows=1 width=30)
                           Index Cond: (certname = c.certname)
                           Filter: ((stable || volatile) @> '{"server_class": "server_class_name"}'::jsonb)
                     ->  Index Only Scan using certnames_transform_certname_key on certnames certnames_2  (cost=0.28..0.36 rows=1 width=22)
                           Index Cond: (certname = c.certname)
   ->  Seq Scan on environments env  (cost=0.00..1.03 rows=3 width=40)
(42 rows)

Query execution time on Puppetdb 4 is ~1s.
Issue appears even if migrating just from Puppetdb 4 to Puppetdb 5 and seems to improve a little in Puppetdb7, but still it's really slow

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Kamil Swoboda (Jira)

unread,
Mar 2, 2021, 7:23:57 PM3/2/21
to puppe...@googlegroups.com
Kamil Swoboda commented on Bug PDB-5051
 
Re: Slow queries from PuppetDB to PostgreSQL after migration 4->7

Query from Puppetdb 4.4.0 to PostgreSQL 9.4:

 

WITH inactive_nodes AS (SELECT certname FROM certnames WHERE (deactivated IS NOT NULL OR expired IS NOT NULL)) SELECT fs.certname AS certname, fp.name AS name, f.value AS value, env.environment AS environment FROM factsets fs INNER JOIN facts f ON fs.id = f.factset_id INNER JOIN fact_paths fp ON f.fact_path_id = fp.id INNER JOIN value_types vt ON vt.id = f.value_type_id LEFT JOIN environments env ON fs.environment_id = env.id WHERE (fp.depth = 0 AND ((fp.name = 'ipaddress') AND ((fs.certname) in  ( (SELECT c.certname AS certname FROM catalog_resources resources INNER JOIN certnames ON resources.certname_id = certnames.id INNER JOIN catalogs c ON c.certname = certnames.certname LEFT JOIN environments e ON c.environment_id = e.id LEFT JOIN resource_params_cache rpc ON rpc.resource = resources.resource WHERE (((c.certname) in  ( (SELECT fs.certname AS certname FROM factsets fs INNER JOIN facts f ON fs.id = f.factset_id INNER JOIN fact_paths fp ON f.fact_path_id = fp.id INNER JOIN value_types vt ON vt.id = f.value_type_id LEFT JOIN environments env ON fs.environment_id = env.id WHERE (fp.depth = 0 AND ((fp.name = 'project') AND (f.value_string = 'project_name')))) ) ) AND ((c.certname) in  ( (SELECT fs.certname AS certname FROM factsets fs INNER JOIN facts f ON fs.id = f.factset_id INNER JOIN fact_paths fp ON f.fact_path_id = fp.id INNER JOIN value_types vt ON vt.id = f.value_type_id LEFT JOIN environments env ON fs.environment_id = env.id WHERE (fp.depth = 0 AND ((fp.name = 'server_class') AND (f.value_string = 'server_class_name')))) ) ) AND ((c.certname) in  ( (SELECT fs.certname AS certname FROM factsets fs INNER JOIN facts f ON fs.id = f.factset_id INNER JOIN fact_paths fp ON f.fact_path_id = fp.id INNER JOIN value_types vt ON vt.id = f.value_type_id LEFT JOIN environments env ON fs.environment_id = env.id WHERE (fp.depth = 0 AND ((fp.name = 'server_type') AND (f.value_string = 'server_type_name')))) ) ))) ) )));

explain of the query:

 

 Nested Loop Left Join  (cost=287.19..3294.34 rows=1 width=103)   ->  Nested Loop  (cost=287.06..3294.17 rows=1 width=79)         ->  Nested Loop Semi Join  (cost=286.93..3294.01 rows=1 width=87)               ->  Nested Loop  (cost=6.50..678.42 rows=7 width=87)                     ->  Nested Loop  (cost=6.23..675.06 rows=11 width=65)                           ->  Index Scan using fact_paths_name on fact_paths fp  (cost=0.42..13.99 rows=1 width=20)                                 Index Cond: ((name)::text = 'ipaddress'::text)                                 Filter: (depth = 0)                           ->  Bitmap Heap Scan on facts f  (cost=5.81..659.28 rows=179 width=61)                                 Recheck Cond: (fact_path_id = fp.id)                                 ->  Bitmap Index Scan on facts_fact_path_id_idx  (cost=0.00..5.77 rows=179 width=0)                                       Index Cond: (fact_path_id = fp.id)                     ->  Index Scan using factsets_pkey on factsets fs  (cost=0.28..0.30 rows=1 width=38)                           Index Cond: (id = f.factset_id)               ->  Hash Semi Join  (cost=280.42..572.31 rows=1 width=110)                     Hash Cond: (c.certname = fs_3.certname)                     ->  Hash Semi Join  (cost=199.57..491.46 rows=1 width=88)                           Hash Cond: (c.certname = fs_1.certname)                           ->  Hash Semi Join  (cost=118.72..410.61 rows=1 width=66)                                 Hash Cond: (c.certname = fs_2.certname)                                 ->  Nested Loop  (cost=37.87..329.75 rows=1 width=44)                                       ->  Nested Loop  (cost=0.55..0.79 rows=1 width=52)                                             ->  Index Scan using catalogs_certname_idx on catalogs c  (cost=0.28..0.42 rows=1 width=30)                                                   Index Cond: (certname = fs.certname)                                             ->  Index Scan using certnames_transform_certname_key on certnames  (cost=0.28..0.36 rows=1 width=30)                                                   Index Cond: (certname = c.certname)                                       ->  Bitmap Heap Scan on catalog_resources resources  (cost=37.32..317.24 rows=1173 width=29)                                             Recheck Cond: (certname_id = certnames.id)                                             ->  Bitmap Index Scan on catalog_resources_pkey1  (cost=0.00..37.03 rows=1173 width=0)                                                   Index Cond: (certname_id = certnames.id)                                 ->  Hash  (cost=80.84..80.84 rows=1 width=22)                                       ->  Nested Loop  (cost=1.12..80.84 rows=1 width=22)                                             ->  Nested Loop  (cost=0.99..80.50 rows=1 width=30)                                                   ->  Nested Loop  (cost=0.70..72.70 rows=1 width=38)                                                         ->  Index Scan using factsets_certname_idx on factsets fs_2  (cost=0.28..0.39 rows=1 width=38)                                                               Index Cond: (certname = fs.certname)                                                         ->  Index Scan using facts_factset_id_idx on facts f_2  (cost=0.42..72.30 rows=1 width=24)                                                               Index Cond: (factset_id = fs_2.id)                                                               Filter: (value_string = 'server_class_name'::text)                                                   ->  Index Scan using fact_paths_pkey on fact_paths fp_2  (cost=0.29..7.79 rows=1 width=8)                                                         Index Cond: (id = f_2.fact_path_id)                                                         Filter: ((depth = 0) AND ((name)::text = 'server_class'::text))                                             ->  Index Only Scan using value_types_pkey on value_types vt_2  (cost=0.13..0.33 rows=1 width=8)                                                   Index Cond: (id = f_2.value_type_id)                           ->  Hash  (cost=80.84..80.84 rows=1 width=22)                                 ->  Nested Loop  (cost=1.12..80.84 rows=1 width=22)                                       ->  Nested Loop  (cost=0.99..80.50 rows=1 width=30)                                             ->  Nested Loop  (cost=0.70..72.70 rows=1 width=38)                                                   ->  Index Scan using factsets_certname_idx on factsets fs_1  (cost=0.28..0.39 rows=1 width=38)                                                         Index Cond: (certname = fs.certname)                                                   ->  Index Scan using facts_factset_id_idx on facts f_1  (cost=0.42..72.30 rows=1 width=24)                                                         Index Cond: (factset_id = fs_1.id)                                                         Filter: (value_string = 'project_name'::text)                                             ->  Index Scan using fact_paths_pkey on fact_paths fp_1  (cost=0.29..7.79 rows=1 width=8)                                                   Index Cond: (id = f_1.fact_path_id)                                                   Filter: (value_string = 'project_name'::text)                                             ->  Index Scan using fact_paths_pkey on fact_paths fp_1  (cost=0.29..7.79 rows=1 width=8)                                                   Index Cond: (id = f_1.fact_path_id)                                                   Filter: ((depth = 0) AND ((name)::text = 'project'::text))                                       ->  Index Only Scan using value_types_pkey on value_types vt_1  (cost=0.13..0.33 rows=1 width=8)                                             Index Cond: (id = f_1.value_type_id)                     ->  Hash  (cost=80.84..80.84 rows=1 width=22)                           ->  Nested Loop  (cost=1.12..80.84 rows=1 width=22)                                 ->  Nested Loop  (cost=0.99..80.50 rows=1 width=30)                                       ->  Nested Loop  (cost=0.70..72.70 rows=1 width=38)                                             ->  Index Scan using factsets_certname_idx on factsets fs_3  (cost=0.28..0.39 rows=1 width=38)                                                   Index Cond: (certname = fs.certname)                                             ->  Index Scan using facts_factset_id_idx on facts f_3  (cost=0.42..72.30 rows=1 width=24)                                                   Index Cond: (factset_id = fs_3.id)                                                   Filter: (value_string = 'server_type_name'::text)                                       ->  Index Scan using fact_paths_pkey on fact_paths fp_3  (cost=0.29..7.79 rows=1 width=8)                                             Index Cond: (id = f_3.fact_path_id)                                             Filter: ((depth = 0) AND ((name)::text = 'server_type'::text))                                 ->  Index Only Scan using value_types_pkey on value_types vt_3  (cost=0.13..0.33 rows=1 width=8)                                       Index Cond: (id = f_3.value_type_id)         ->  Index Only Scan using value_types_pkey on value_types vt  (cost=0.13..0.15 rows=1 width=8)               Index Cond: (id = f.value_type_id)   ->  Index Scan using environments_pkey on environments env  (cost=0.13..0.16 rows=1 width=40)         Index Cond: (fs.environment_id = id)(76 rows)

Execution time:

0m0.603s

Compared to 

~30s on Puppetdb7

 

Margaret Lee (Jira)

unread,
Apr 29, 2021, 6:28:03 PM4/29/21
to puppe...@googlegroups.com
Margaret Lee updated an issue
 
Change By: Margaret Lee
Epic Link: PE-31891
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo
Reply all
Reply to author
Forward
0 new messages