- Scenario 1: Dataset is loaded into database A, then a workload is run against database A.
- Scenario 2: Dataset is loaded into database A and analyzed. Then a continuous data load is run into database B while a workload is run against database A.
The dataset under test was generated via TPC-DS at scale factor 100. Heap and AOCO tables were tested, with and without btree indexes. The reported database size is ~200GB on disk.
The workload consisted of TPC-DS queries Q1-Q15, excluding Q6. This subset of queries was sufficient to gauge the impact of autoanalyze on the workload.
The test environment is a 3-segment gpdemo cluster with 32 cores, NVMe storage, and 128GB of memory.
GUC modifications (applied as sketched below):
[{"optimizer": "off"},
 {"gp_vmem_protect_limit": "32768"},
 {"statement_mem": "1024MB"},
 {"statement_timeout": "1200000"}]
Scenario 1:
Method:
Create a fresh cluster
Set autoanalyze off|on
Begin perf record, for example:
  perf record -F 99 -g --call-graph dwarf -o /data/aocol_btree_runanalyze/perf.out -a -- sleep 3600 &
Using gprestore, load a ~200GB database, for example:
  gprestore --timestamp 20230525115442 --redirect-db postgres --backup-dir /data/aocol_btree_100 --jobs 8
Optionally run analyzedb after the load completes
Run the TPC-DS workload
Collect results (the perf profiles are summarized as sketched below)
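The do_analyze_rel percentages in the results tables were read from the recorded profiles. A minimal sketch of how such numbers can be pulled out of a perf.out file (assuming perf is available on the host; the exact reporting invocation is not recorded here):

  # Summarize the recorded profile as a plain-text call tree
  perf report --stdio -g -i /data/aocol_btree_runanalyze/perf.out > perf_report.txt

  # Pull out the analyze-related entries reported in the results tables
  grep -E 'do_analyze_rel|compute_scalar_stats|gp_acquire_sample_rows_func|aoco_acquire_sample_rows' perf_report.txt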
Results:
Heap, no indexes

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on  | heap | none | yes | 1757 | 90 | 820 | 910 | do_analyze_rel 10.54% (compute_scalar_stats 6.48%, gp_acquire_sample_rows_func 3.69%)
on  | heap | none | no  | 1517 |  0 | 822 | 822 | do_analyze_rel 6.07% (compute_scalar_stats 4.14%, gp_acquire_sample_rows_func 1.73%)
off | heap | none | yes | 1573 | 81 | 796 | 877 | do_analyze_rel 7.40% (compute_scalar_stats 4.60%, gp_acquire_sample_rows_func 2.48%)
- No significant difference in workload durations.
- Enabling autoanalyze and skipping analyzedb after the load is actually slightly faster overall.
- do_analyze_rel shows up slightly more in the profile with autoanalyze + analyzedb, but with no effect on overall performance.
Heap + Btree

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on  | heap | btree | yes | 2331 | 58 | 765 | 823 | do_analyze_rel 5.57% (compute_scalar_stats 3.98%, gp_acquire_sample_rows_func 1.38%)
on  | heap | btree | no  | 2354 |  0 | 769 | 769 | do_analyze_rel 5.82% (compute_scalar_stats 4.10%, gp_acquire_sample_rows_func 1.52%)
off | heap | btree | yes | 2333 | 55 | 768 | 823 | N/A
- No significant difference in workload durations.
- do_analyze_rel perf is similar with autoanalyze on, whether or not analyzedb is run.
- Analyze is more performant with the btree index than without: the extra time spent creating the indexes during the load gave autoanalyze room to complete all of its work before the load finished.
- At a database size of 200GB, the overhead of creating the indexes is not worth the performance benefit for either analyze or the query workload.
AOCO, no indexes

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on  | aoco | none | yes | 1227 | 54 | 663 | 717 | do_analyze_rel 7.23% (compute_scalar_stats 4.69%, aoco_acquire_sample_rows 2.07%)
on  | aoco | none | no  | 1245 |  0 | 687 | 687 | do_analyze_rel 5.05% (compute_scalar_stats 3.29%, aoco_acquire_sample_rows 1.47%)
off | aoco | none | yes | 1157 | 53 | 666 | 719 | do_analyze_rel 4.52% (compute_scalar_stats 2.78%, aoco_acquire_sample_rows 1.44%)
- Again, workload durations are very similar.
- Again, enabling autoanalyze and skipping analyzedb after the load is actually slightly faster overall.
- autoanalyze on AOCO appears slightly more performant than on heap.
AOCO + Btree

autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | analyze_rel perf
on | aoco | btree | yes | 2648 | 56 | 668 | 724 | do_analyze_rel 9.94% (compute_scalar_stats 6.32%, aoco_acquire_sample_rows 2.98%)
on | aoco | btree | no  | 2762 |  0 | 733 | 733 | do_analyze_rel 9.95% (compute_scalar_stats 6.32%, aoco_acquire_sample_rows 2.98%)
- We take a big hit on load duration and see no significant performance benefit on the workload.
- No difference in perf output: autoanalyze completed before the load finished, so analyzedb did no actual work.
Scenario 2:
Method:
Create a fresh cluster
Set autoanalyze off|on
Begin perf record, for example:
  perf record -F 99 -g --call-graph dwarf -o /data/heap_noidx_avon_runanalyze_concurrent/perf.out -a -- sleep 3600 &
Using gprestore, load a ~200GB database, for example:
  gprestore --timestamp 20230525115442 --redirect-db postgres --backup-dir /data/heap_noidx_100 --jobs 8
Run analyzedb after the load completes
Run gprestore in a loop, loading into additional databases, for example:
  for i in {1..10}; do gprestore --timestamp 20230611203616 --create-db --redirect-db testdb_$i --backup-dir /data/heap_noidx_10; done
Run the TPC-DS workload
Collect results (AV worker I/O is captured as sketched below)
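The per-process I/O figures in the results below came from iotop. A minimal sketch of how the autovacuum worker aggregate I/O could be captured (assuming iotop is installed and run with root privileges; the exact invocation used is not recorded here):

  # -b batch mode, -a accumulated I/O since start, -o only show processes doing I/O,
  # -d 5 sample every 5 seconds; leave running for the duration of the workload
  iotop -b -a -o -d 5 > /data/iotop_workload.log &

  # Afterwards, pull out the autovacuum worker lines for the aggregate numbers
  grep 'autovacuum worker' /data/iotop_workload.log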
Results:
autoanalyze | table AM | indexes | analyzedb after load | load duration (s) | analyzedb duration (s) | workload duration (s) | analyzedb + workload duration (s) | AV worker iotop aggregate I/O
on  | heap | none | yes | 1678 | 189 | 1094 | 1283 | see iotop output below
off | heap | none | yes | 1662 | 173 | 1006 | 1179 | N/A

Aggregate I/O of the autovacuum workers during the autoanalyze-on run:

  DISK READ   DISK WRITE   COMMAND
  57.57 M     194.94 M     postgres: 7000, autovacuum worker con208 postgres
  8.00 K      108.66 M     postgres: 7000, autovacuum worker con726 testdb_1
  0.00 B      23.72 M      postgres: 7000, autovacuum worker con74 testdb_4
  0.00 B      21.25 M      postgres: 7000, autovacuum worker con81 testdb_4
  32.00 K     26.44 M      postgres: 7000, autovacuum worker con137 testdb_5
  224.00 K    22.53 M      postgres: 7000, autovacuum worker con147 testdb_6
- autoanalyze running continuously against the additional databases resulted in an ~8% increase in overall execution time.
- Here we profiled disk I/O via iotop to determine whether the AV worker sessions were using significant resources. The aggregate I/O of the workers was insignificant compared to the I/O of the COPY commands, motions, and WAL senders/receivers.
- For comparison, a single slice of one of the workload queries read over 26GB on each segment:

  DISK READ   DISK WRITE   COMMAND
  28.38 G     512.00 K     postgres: 7003, postgres 127.0.0.1(58120) con111 seg1 cmd3 slice1 MPPEXEC SELECT
  27.54 G     640.00 K     postgres: 7004, postgres 127.0.0.1(46012) con111 seg2 cmd3 slice1 MPPEXEC SELECT
  26.77 G     768.00 K     postgres: 7002, postgres 127.0.0.1(54736) con111 seg0 cmd3 slice1 MPPEXEC SELECT
Summary:
- Enabling autoanalyze for all tables shows no significant negative performance impact, and may be a performance benefit for some workloads.
- autoanalyze running on other databases during a workload has a performance impact of roughly 8% on the workload.
- Resource consumption is minimal in comparison to other postgres processes.
- The default GUC settings for autovacuum_analyze_threshold and autovacuum_analyze_scale_factor are adequate and don't need to be changed.
- EXPLAIN ANALYZE outputs and gp_stat_all_tables_summary/gp_statio_all_tables_summary outputs are available for finer-grained follow-ups (see the sketch after this list).
- The dataset needs to scale well above 200GB to see a benefit from creating indexes.
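A quick way to verify analyze activity after a run is to query the statistics views. A minimal sketch using pg_stat_all_tables (which gp_stat_all_tables_summary aggregates across segments), assuming a standard psql connection to the restored database:

  # Show when each table was last analyzed manually vs. by autoanalyze
  psql -d postgres -c "
    SELECT schemaname, relname, last_analyze, last_autoanalyze
    FROM pg_stat_all_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY last_autoanalyze DESC NULLS LAST
    LIMIT 20;"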
Follow-ups:
- Re-validate and test with ORCA enabled after autoanalyze is able to populate root stats.
- Automate the above workloads to run on a 1TB+ dataset on a weekly cadence.