- Scenario 1: Dataset is loaded into database A, then a workload is run against database A.
- Scenario 2: Dataset is loaded into database A and analyzed. Then a continuous data load is run into database B while a workload is run against database A.
The dataset under test was generated via TPC-DS at scale factor 100. Heap and AOCO tables were tested, with and without btree indexes. The reported database size is ~200GB on disk.
The workload consisted of TPC-DS queries Q1-Q15, excluding Q6. This subset of queries was sufficient to gauge the impact of autoanalyze on the workload.
The test environment is a 3-segment gpdemo cluster with 32 cores, NVMe storage, and 128GB of memory.
GUC modifications (applied as sketched below):
[{"optimizer": "off"},
 {"gp_vmem_protect_limit": "32768"},
 {"statement_mem": "1024MB"},
 {"statement_timeout": "1200000"}]
Scenario 1:
Method:
Create a fresh cluster
Set autoanalyze off|on
Begin perf record, for example:
  perf record -F 99 -g --call-graph dwarf -o /data/aocol_btree_runanalyze/perf.out -a -- sleep 3600 &
Using gprestore, load a ~200GB database, for example:
  gprestore --timestamp 20230525115442 --redirect-db postgres --backup-dir /data/aocol_btree_100 --jobs 8
Optionally run analyzedb after the load completes
Run the TPC-DS workload
Collect results (the perf profiles are summarized as sketched below)
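The do_analyze_rel percentages in the results tables were read from the recorded profiles. A minimal sketch of how such numbers can be pulled out of a perf.out file (assuming perf is available on the host; the exact reporting invocation is not recorded here):

  # Summarize the recorded profile as a plain-text call tree
  perf report --stdio -g -i /data/aocol_btree_runanalyze/perf.out > perf_report.txt

  # Pull out the analyze-related entries reported in the results tables
  grep -E 'do_analyze_rel|compute_scalar_stats|gp_acquire_sample_rows_func|aoco_acquire_sample_rows' perf_report.txt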
Results:
Heap, no indexes

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on  | heap | none | yes | 1757 | 90 | 820 | 910 | do_analyze_rel 10.54% (compute_scalar_stats 6.48%, gp_acquire_sample_rows_func 3.69%)
on  | heap | none | no  | 1517 |  0 | 822 | 822 | do_analyze_rel 6.07% (compute_scalar_stats 4.14%, gp_acquire_sample_rows_func 1.73%)
off | heap | none | yes | 1573 | 81 | 796 | 877 | do_analyze_rel 7.40% (compute_scalar_stats 4.60%, gp_acquire_sample_rows_func 2.48%)
- No significant difference in workload durations.
- Enabling autoanalyze and skipping analyzedb after the load is actually slightly faster overall.
- do_analyze_rel shows up slightly more in the profile with autoanalyze + analyzedb, but with no effect on overall performance.
Heap + Btree

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on  | heap | btree | yes | 2331 | 58 | 765 | 823 | do_analyze_rel 5.57% (compute_scalar_stats 3.98%, gp_acquire_sample_rows_func 1.38%)
on  | heap | btree | no  | 2354 |  0 | 769 | 769 | do_analyze_rel 5.82% (compute_scalar_stats 4.10%, gp_acquire_sample_rows_func 1.52%)
off | heap | btree | yes | 2333 | 55 | 768 | 823 | N/A
- No significant difference in workload durations.
- do_analyze_rel perf is similar with autoanalyze on, whether or not analyzedb is run.
- Analyze is more performant with the btree index than without: the extra time spent creating the indexes during the load gave autoanalyze room to complete all of its work before the load finished.
- At a database size of 200GB, the overhead of creating the indexes is not worth the performance benefit for either analyze or the query workload.
AOCO, no indexes

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on  | aoco | none | yes | 1227 | 54 | 663 | 717 | do_analyze_rel 7.23% (compute_scalar_stats 4.69%, aoco_acquire_sample_rows 2.07%)
on  | aoco | none | no  | 1245 |  0 | 687 | 687 | do_analyze_rel 5.05% (compute_scalar_stats 3.29%, aoco_acquire_sample_rows 1.47%)
off | aoco | none | yes | 1157 | 53 | 666 | 719 | do_analyze_rel 4.52% (compute_scalar_stats 2.78%, aoco_acquire_sample_rows 1.44%)
- Again, workload durations are very similar.
- Again, enabling autoanalyze and skipping analyzedb after the load is actually slightly faster overall.
- autoanalyze on AOCO appears slightly more performant than on heap.
AOCO + Btree

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on | aoco | btree | yes | 2648 | 56 | 668 | 724 | do_analyze_rel 9.94% (compute_scalar_stats 6.32%, aoco_acquire_sample_rows 2.98%)
on | aoco | btree | no  | 2762 |  0 | 733 | 733 | do_analyze_rel 9.95% (compute_scalar_stats 6.32%, aoco_acquire_sample_rows 2.98%)
- We take a big hit on load duration and see no significant performance benefit on the workload.
- No difference in perf output: autoanalyze completed before the load finished, so analyzedb did no actual work.
Scenario 2:
Method:
Create a fresh cluster
Set autoanalyze off|on
Begin perf record, for example:
  perf record -F 99 -g --call-graph dwarf -o /data/heap_noidx_avon_runanalyze_concurrent/perf.out -a -- sleep 3600 &
Using gprestore, load a ~200GB database, for example:
  gprestore --timestamp 20230525115442 --redirect-db postgres --backup-dir /data/heap_noidx_100 --jobs 8
Run analyzedb after the load completes
Run gprestore in a loop, loading into additional databases, for example:
  for i in {1..10}; do gprestore --timestamp 20230611203616 --create-db --redirect-db testdb_$i --backup-dir /data/heap_noidx_10; done
Run the TPC-DS workload
Collect results (AV worker I/O is captured as sketched below)
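The per-process I/O figures in the results below came from iotop. A minimal sketch of how the autovacuum worker aggregate I/O could be captured (assuming iotop is installed and run with root privileges; the exact invocation used is not recorded here):

  # -b batch mode, -a accumulated I/O since start, -o only show processes doing I/O,
  # -d 5 sample every 5 seconds; leave running for the duration of the workload
  iotop -b -a -o -d 5 > /data/iotop_workload.log &

  # Afterwards, pull out the autovacuum worker lines for the aggregate numbers
  grep 'autovacuum worker' /data/iotop_workload.log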
Results:
autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | AV worker iotop aggregate I/O
on  | heap | none | yes | 1678 | 189 | 1094 | 1283 | see iotop output below
off | heap | none | yes | 1662 | 173 | 1006 | 1179 | N/A

Aggregate I/O of the autovacuum workers during the autoanalyze-on run:

  DISK READ   DISK WRITE   COMMAND
  57.57 M     194.94 M     postgres: 7000, autovacuum worker con208 postgres
  8.00 K      108.66 M     postgres: 7000, autovacuum worker con726 testdb_1
  0.00 B      23.72 M      postgres: 7000, autovacuum worker con74 testdb_4
  0.00 B      21.25 M      postgres: 7000, autovacuum worker con81 testdb_4
  32.00 K     26.44 M      postgres: 7000, autovacuum worker con137 testdb_5
  224.00 K    22.53 M      postgres: 7000, autovacuum worker con147 testdb_6
- autoanalyze running continuously against the additional databases resulted in an ~8% increase in overall execution time.
- Here we profiled disk I/O via iotop to determine whether the AV worker sessions were using significant resources. The aggregate I/O of the workers was insignificant compared to the I/O of the COPY commands, motions, and WAL senders/receivers.
- For comparison, a single slice of one of the workload queries read over 26GB on each segment:

  DISK READ   DISK WRITE   COMMAND
  28.38 G     512.00 K     postgres: 7003, postgres 127.0.0.1(58120) con111 seg1 cmd3 slice1 MPPEXEC SELECT
  27.54 G     640.00 K     postgres: 7004, postgres 127.0.0.1(46012) con111 seg2 cmd3 slice1 MPPEXEC SELECT
  26.77 G     768.00 K     postgres: 7002, postgres 127.0.0.1(54736) con111 seg0 cmd3 slice1 MPPEXEC SELECT
Summary:
- Enabling autoanalyze for all tables shows no significant negative performance impact, and may be a performance benefit for some workloads.
- autoanalyze running on other databases during a workload has a performance impact of roughly 8% on the workload.
- Resource consumption is minimal in comparison to other postgres processes.
- The default GUC settings for autovacuum_analyze_threshold and autovacuum_analyze_scale_factor are adequate and don't need to be changed.
- EXPLAIN ANALYZE outputs and gp_stat_all_tables_summary/gp_statio_all_tables_summary outputs are available for finer-grained follow-ups (see the sketch after this list).
- The dataset needs to scale well above 200GB to see a benefit from creating indexes.
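A quick way to verify analyze activity after a run is to query the statistics views. A minimal sketch using pg_stat_all_tables (which gp_stat_all_tables_summary aggregates across segments), assuming a standard psql connection to the restored database:

  # Show when each table was last analyzed manually vs. by autoanalyze
  psql -d postgres -c "
    SELECT schemaname, relname, last_analyze, last_autoanalyze
    FROM pg_stat_all_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY last_autoanalyze DESC NULLS LAST
    LIMIT 20;"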
Follow-ups:
- Re-validate and test with ORCA enabled after autoanalyze is able to populate root stats.
- Automate the above workloads to run on a 1TB+ dataset on a weekly cadence.