Jira (PDB-4941) Evaluate current table-specific parameters (autovacuum, etc.)

Rob Browning (Jira)

unread,

Oct 22, 2020, 3:00:03 PM10/22/20

to puppe...@googlegroups.com

Rob Browning created an issue

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Issue Type:	Task
Assignee:	Unassigned
Created:	2020/10/22 11:59 AM
Priority:	Normal
Reporter:	Rob Browning

In light of the recent considerations around autovacuuming, double-check the per-table settings pdb currently establishes in migrations.

Add Comment

This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)

Rob Browning (Jira)

unread,

Oct 22, 2020, 3:25:03 PM10/22/20

to puppe...@googlegroups.com

Rob Browning updated an issue

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Change By:	Rob Browning
Fix Version/s:	PDB 7.0.0

Add Comment

Austin Blatt (Jira)

unread,

Nov 4, 2020, 12:46:03 PM11/4/20

to puppe...@googlegroups.com

Austin Blatt updated an issue

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Change By:	Austin Blatt
Fix Version/s:	PDB 7.0.0

Add Comment

Zachary Kent (Jira)

unread,

Nov 18, 2020, 2:53:03 PM11/18/20

to puppe...@googlegroups.com

Zachary Kent updated an issue

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Change By:	Zachary Kent
Epic Link:	PDB-4969

Add Comment

Zachary Kent (Jira)

unread,

Jan 12, 2021, 5:18:05 PM1/12/21

to puppe...@googlegroups.com

Zachary Kent assigned an issue to Zachary Kent

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Change By:	Zachary Kent
Assignee:	Zachary Kent

Add Comment

Zachary Kent (Jira)

unread,

Jan 13, 2021, 2:12:03 PM1/13/21

to puppe...@googlegroups.com

Zachary Kent updated an issue

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Change By:	Zachary Kent
Sprint:	HA 2021-01-27

Add Comment

Zachary Kent (Jira)

unread,

Jan 20, 2021, 5:27:02 PM1/20/21

to puppe...@googlegroups.com

Zachary Kent commented on

PDB-4941

Re: Evaluate current table-specific parameters (autovacuum, etc.)

We currently set a table specific autovacuum_vacuum_scale_factor setting on the factset, catalog, and certnames tables. It is set to 0.75 on the catalog and certnames tables and 0.80 on the factset table. This setting controls the threshold of dead rows that must be hit before an autovacuum of the table will be scheduled. See Thresholds and Scale Factors section of the linked article for background info on this setting. This setting is toggled in two places depending on whether it is a FOSS or PE install. In FOSS the table settings are added in a migration here. In PE the settings are duplicated in the pe_databases::postgresql_settings::table_settings class.

I believe setting the scale factor to 0.80 means that in a steady state assuming a 30 min agent run interval the table will get vacuumed every 24 mins and will bloat up to 44% dead space before being vacuumed. It was found, sometime in the past, that spacing out autovacuums in this way helped reduce CPU use for customers. See comments in this PR-2393 for related details. Setting the scale_factor this high seems to go against what's generally recommended in the PostgreSQL docs but given what was observed in the field it may be the right choice in our case. Another observation is that our update pattern being throttled by the agent run interval means there should usually be only ~ 1 dead row per certname in the tables mentioned above before they are vacuumed.

We recently had issues with some customers related to table bloat and autovacuum not keeping up which lead to the creation of this ticket. See comments in PE-27630 for more detailed background on what was done to resolve these issues. To summarize the solution in that case, the autovacuum_vacuum_cost_limit was increased to allow the autovacuum workers to do more work before they were forced to sleep for autovacuum_vacuum_cost_delay. See this ZD-35697 ticket for background on the problem at a specific customer site. Note the excellent write up from Dylan Ratcliffe copied into the ZD ticket linked above about some of the debugging that was done to figure out this issue.

One outstanding question that came from some of the recent changes mentioned above was about the impact of setting autovacuum_analyze_scale_factor to 0.04. This change was made when the autovacuum_vacuum_scale_factor was originally decreased from the default of 0.20 to 0.08. Later when the autovacuum_vacuum_scale_factor was increased to 0.80 the autovacuum_analyze_scale_factor remained set at 0.04. See here for where the pe_databases module sets this value. Some of the concerns voiced by Nick W. about this change can be found here in the slack history. It seems possible that we may want to increase the value of this setting back to the default of 0.10. In the steady state with a 30 min agent run interval we will be analyzing tables every ~72 seconds. There are some potential issues with this analyze process keeping a transaction held open and preventing clean up of some rows (see slack message linked above). We similarly decreased the autovacuum_analyze_scale_factor to 0.01 on the catalog_resources table recently see ~~PDB-4640~~ for details.

Regardless of any potential changes to the vacuum/analyze_scale_factors mentioned above, the changes we're making to no longer hold sync summary query transactions open should help to reduce the amount of dead rows which are ineligible for clean up during an autovacuum due to open transactions. Once the change to the sync summary queries is merged we should reach out to Support and see if they know if any customers would be willing to gather some before/after metrics related to the autovacuum process and table bloat. If we see improvement to the amount of bloat being created we could then see if customers would be willing to tweak the vacuum/analyze_scale_factor settings to see if we're able to reduce bloat further and decrease the frequency of the analyze runs without hurting query perf. We don't currently have an easy way to test these changes at any reasonable scale/load in house, so getting customers on board to help test this may be our best option.

Add Comment

Zachary Kent (Jira)

unread,

Jan 20, 2021, 5:29:03 PM1/20/21

to puppe...@googlegroups.com

Zachary Kent updated an issue

PuppetDB /

PDB-4941

Evaluate current table-specific parameters (autovacuum, etc.)

Change By:	Zachary Kent
Fix Version/s:	PDB n/a

Add Comment

Reply all

Reply to author

Forward