autoanalyze performance analysis


Brent Doil

Jun 13, 2023, 8:04:11 PM
to Greenplum Developers
Hi All,

With the introduction of autoanalyze for all tables in Greenplum 7, we measured the impact of autoanalyze sessions on user workloads to ensure acceptable resource consumption.

Note that currently autoanalyze does not populate root statistics, as discussed in this thread.
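
(For anyone who wants to see this on their own cluster: root-partition rows should be missing from pg_stats after autoanalyze runs. A minimal check, assuming a partitioned TPC-DS table such as store_sales:)

    # Sketch: root partition rows stay absent from pg_stats until an explicit
    # ANALYZE/analyzedb collects them (store_sales is an illustrative table).
    psql -d postgres -c "SELECT tablename, attname, n_distinct
                         FROM pg_stats
                         WHERE tablename = 'store_sales';"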

Questions:
  • Does autoanalyze have a performance impact during data load or query execution for a TPC-DS scale 100 dataset?
  • Is there a difference between autoanalyze on heap vs. AOCO tables? No indexes vs. indexes?
  • Are autoanalyze sessions "fair"? That is, does autoanalyze running on a separate database steal significant resources from the primary workload?

To help answer these questions we tested two primary scenarios:
  1. Dataset is loaded into database A and a workload is then run against database A
  2. Dataset is loaded into database A and analyzed. Then a continuous data load is run into database B while a workload is run against database A.
The dataset under test was generated via TPC-DS with scale 100. Heap and AOCO tables were tested along with btree indexes. The reported database size is ~200GB on disk.

The workload consisted of TPC-DS Q1-15, excluding Q6. With this subset of queries, we were able to gauge the impact of autoanalyze on the workload.
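
For anyone reproducing this, the workload driver is essentially a psql loop over the query files. A sketch, with file names assumed rather than taken from our actual harness:

    # Run TPC-DS Q1-15, skipping Q6 (queries/query01.sql etc. are assumed paths).
    for q in 01 02 03 04 05 07 08 09 10 11 12 13 14 15; do
        psql -d postgres -f queries/query${q}.sql > /dev/null
    done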

The test environment is a 3-segment gpdemo cluster with 32 cores, NVMe storage, and 128GB of memory.

GUC modifications:
[{"optimizer": "off"},
  {"gp_vmem_protect_limit": "32768"},
  {"statement_mem": "1024MB"},
  {"statement_timeout": "1200000"}

Scenario 1:

Method:

Create a fresh cluster
Set autoanalyze off|on
Begin perf record, for example "perf record -F 99 -g --call-graph dwarf -o /data/aocol_btree_runanalyze/perf.out -a -- sleep 3600 &"
Using gprestore, load a ~200GB database, for example "gprestore --timestamp 20230525115442 --redirect-db postgres --backup-dir /data/aocol_btree_100 --jobs 8"
Optionally run analyzedb after load completes
Run TPC-DS workload
Collect results
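
Put together, a single run looks roughly like the sketch below. Paths are illustrative, and the final perf report pass is one way to pull out the do_analyze_rel percentages shown in the results:

    # Sketch of one scenario 1 run (paths illustrative, autoanalyze toggled per run).
    OUT=/data/aocol_btree_runanalyze
    perf record -F 99 -g --call-graph dwarf -o $OUT/perf.out -a -- sleep 3600 &
    gprestore --timestamp 20230525115442 --redirect-db postgres \
              --backup-dir /data/aocol_btree_100 --jobs 8
    analyzedb -d postgres -a      # optional, per the result matrices below
    # ... run the TPC-DS workload here ...
    wait                          # let the perf sampling window close
    perf report -i $OUT/perf.out --stdio | grep -A2 do_analyze_rel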

Results:

Heap no indexes
autoanalyze | table AM | indexes | analyzedb after load | load (s) | analyzedb (s) | workload (s) | analyzedb + workload (s)
on          | heap     | none    | yes                  | 1757     | 90            | 820          | 910
on          | heap     | none    | no                   | 1517     | 0             | 822          | 822
off         | heap     | none    | yes                  | 1573     | 81            | 796          | 877

analyze_rel perf:
  on,  analyzedb=yes:  10.54% do_analyze_rel (6.48% compute_scalar_stats, 3.69% gp_acquire_sample_rows_func)
  on,  analyzedb=no:    6.07% do_analyze_rel (4.14% compute_scalar_stats, 1.73% gp_acquire_sample_rows_func)
  off, analyzedb=yes:   7.40% do_analyze_rel (4.60% compute_scalar_stats, 2.48% gp_acquire_sample_rows_func)

  • No significant difference in workload durations
  • Enabling autoanalyze and not running analyzedb after load is actually slightly faster overall
  • Slight increase in do_analyze_rel perf samples with autoanalyze + analyzedb, but no effect on overall performance

Heap + Btree
autoanalyze | table AM | indexes | analyzedb after load | load (s) | analyzedb (s) | workload (s) | analyzedb + workload (s)
on          | heap     | btree   | yes                  | 2331     | 58            | 765          | 823
on          | heap     | btree   | no                   | 2354     | 0             | 769          | 769
off         | heap     | btree   | yes                  | 2333     | 55            | 768          | 823

analyze_rel perf:
  on,  analyzedb=yes:  5.57% do_analyze_rel (3.98% compute_scalar_stats, 1.38% gp_acquire_sample_rows_func)
  on,  analyzedb=no:   5.82% do_analyze_rel (4.10% compute_scalar_stats, 1.52% gp_acquire_sample_rows_func)
  off, analyzedb=yes:  N/A

  • No significant difference in workload durations
  • do_analyze_rel perf is similar with autoanalyze on and analyzedb off|on. 
  • Analyze with the btree indexes present was more performant than with no indexes: the extra time spent creating the indexes during the load gave autoanalyze enough time to complete all of its work before the load finished.
  • For a database of ~200GB, the overhead of creating indexes is not worth the performance benefit for analyze or for the query workload



AOCO no indexes
autoanalyze | table AM | indexes | analyzedb after load | load (s) | analyzedb (s) | workload (s) | analyzedb + workload (s)
on          | aoco     | none    | yes                  | 1227     | 54            | 663          | 717
on          | aoco     | none    | no                   | 1245     | 0             | 687          | 687
off         | aoco     | none    | yes                  | 1157     | 53            | 666          | 719

analyze_rel perf:
  on,  analyzedb=yes:  7.23% do_analyze_rel (4.69% compute_scalar_stats, 2.07% aoco_acquire_sample_rows)
  on,  analyzedb=no:   5.05% do_analyze_rel (3.29% compute_scalar_stats, 1.47% aoco_acquire_sample_rows)
  off, analyzedb=yes:  4.52% do_analyze_rel (2.78% compute_scalar_stats, 1.44% aoco_acquire_sample_rows)

  • Again, workload times are very similar.
  • Again, enabling autoanalyze and not running analyzedb after load is actually slightly faster overall
  • autoanalyze on AOCO tables appears slightly more performant than on heap tables

AOCO + Btree
autoanalyze | table AM | indexes | analyzedb after load | load (s) | analyzedb (s) | workload (s) | analyzedb + workload (s)
on          | aoco     | btree   | yes                  | 2648     | 56            | 668          | 724
on          | aoco     | btree   | no                   | 2762     | 0             | 733          | 733

analyze_rel perf:
  on, analyzedb=yes:  9.94% do_analyze_rel (6.32% compute_scalar_stats, 2.98% aoco_acquire_sample_rows)
  on, analyzedb=no:   9.95% do_analyze_rel (6.32% compute_scalar_stats, 2.98% aoco_acquire_sample_rows)

  • We take a big hit on the load duration and see no significant performance benefit on the workload
  • No difference in perf output. Autoanalyze completed before the load finished, so analyzedb did no actual work


Scenario 2:

Method:

Create a fresh cluster
Set autoanalyze off|on
Begin perf record, for example "perf record -F 99 -g --call-graph dwarf -o /data/heap_noidx_avon_runanalyze_concurrent/perf.out -a -- sleep 3600 &"
Using gprestore, load a ~200GB database, for example "gprestore --timestamp 20230525115442 --redirect-db postgres --backup-dir /data/heap_noidx_100 --jobs 8"
Run analyzedb after load completes
Run gprestore in a loop, loading into additional databases, for example "for i in {1..10}; do gprestore --timestamp 20230611203616 --create-db --redirect-db testdb_$i --backup-dir /data/heap_noidx_10; done"
Run TPC-DS workload
Collect results
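
Concretely, the concurrent part amounts to a background restore loop racing the foreground workload. A sketch:

    # Scenario 2 sketch: keep loading copies into testdb_N while the TPC-DS
    # workload runs against the already-analyzed primary database.
    for i in {1..10}; do
        gprestore --timestamp 20230611203616 --create-db \
                  --redirect-db testdb_$i --backup-dir /data/heap_noidx_10
    done &
    LOADER=$!
    # ... run the TPC-DS workload against the primary database ...
    kill $LOADER 2>/dev/null      # stop the background loader when done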


Results:

autoanalyze | table AM | indexes | analyzedb after load | load (s) | analyzedb (s) | workload (s) | analyzedb + workload (s)
on          | heap     | none    | yes                  | 1678     | 189           | 1094         | 1283
off         | heap     | none    | yes                  | 1662     | 173           | 1006         | 1179

AV worker - iotop aggregate I/O (autoanalyze on; N/A with autoanalyze off):
DISK READ  DISK WRITE  COMMAND
57.57 M    194.94 M    postgres:  7000, autovacuum worker   con208 postgres
8.00 K     108.66 M    postgres:  7000, autovacuum worker   con726 testdb_1
0.00 B     23.72 M     postgres:  7000, autovacuum worker   con74 testdb_4
0.00 B     21.25 M     postgres:  7000, autovacuum worker   con81 testdb_4
32.00 K    26.44 M     postgres:  7000, autovacuum worker   con137 testdb_5
224.00 K   22.53 M     postgres:  7000, autovacuum worker   con147 testdb_6

  • autoanalyze running continuously on the additional databases resulted in an ~8% increase in overall execution time.
  • Here we profiled disk I/O via iotop to determine if the AV worker sessions were using significant resources. The aggregate I/O for the workers was insignificant compared to I/O of the COPY commands, motions, and wal sender/receivers. 
    • As a comparison, a single slice of one of the queries read over 26GB on each segment:
            DISK READ  DISK WRITE          COMMAND
            28.38 G    512.00 K postgres:  7003, postgres 127.0.0.1(58120) con111 seg1 cmd3 slice1 MPPEXEC SELECT
            27.54 G    640.00 K postgres:  7004, postgres 127.0.0.1(46012) con111 seg2 cmd3 slice1 MPPEXEC SELECT
            26.77 G    768.00 K postgres:  7002, postgres 127.0.0.1(54736) con111 seg0 cmd3 slice1 MPPEXEC SELECT
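
Aggregate per-process numbers like the above can be captured with iotop in accumulated batch mode (run as root); a sketch, with the grep filter as an assumption:

    # Accumulated (-a) batch-mode (-b) capture, only processes doing I/O (-o),
    # sampled every 5 seconds (-d 5). Filter for autovacuum workers afterwards.
    iotop -b -o -a -d 5 > iotop.log &
    # ... run the load + workload ...
    grep "autovacuum worker" iotop.log | tail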


Summary:
  • Enabling autoanalyze for all tables shows no significant negative performance impact, and may even be a performance benefit for some workloads.
  • autoanalyze running on other databases during a workload has a performance impact of roughly 8% on the workload
  • Resource consumption is minimal in comparison to other postgres processes
  • The default GUC settings for autovacuum_analyze_threshold and autovacuum_analyze_scale_factor are adequate and don't need to be changed (see the sketch after this list)
  • EXPLAIN ANALYZE outputs and gp_stat_all_tables_summary/gp_statio_all_tables_summary outputs are available for finer grained follow-ups
  • The dataset needs to scale well above 200GB before creating indexes pays off
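
For context, autoanalyze fires for a table once its modified-tuple count since the last analyze exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples, i.e. 50 + 0.1 * reltuples with the defaults. A sketch for checking how close tables are to that trigger:

    # Compare modified-tuple counts against the default analyze trigger.
    psql -d postgres -c "
        SELECT s.relname, s.n_mod_since_analyze,
               50 + 0.1 * c.reltuples AS analyze_trigger
        FROM pg_stat_all_tables s
        JOIN pg_class c ON c.oid = s.relid
        ORDER BY s.n_mod_since_analyze DESC
        LIMIT 10;"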
Follow-ups:
  • Re-validate and test with ORCA enabled after autoanalyze is able to populate root stats.
  • Automate above workloads to run on a 1TB+ dataset on a weekly cadence.






