Possible approach: What if we didn't merge stats in the regular workflow
(by detecting if we are doing auto-analyze and bailing)? For relkind = 'p', we
could then call merge instead of recursive analyze. We could do this whenever
reltuples for the partitioned table exceeds the scale factor percentage?
I talked with Soumyadeep offline a bit. There are 2 approaches we discussed:
1) For each leaf, add its root partition to a set. Perform the sampling of all tables, and once all sampling has finished, merge statistics for each root table in the set if all leaves of that root have been analyzed (sampled). Ignore unsupported/extended columns. This has the advantage of minimizing merging and is fairly simple to implement.
For this approach, we would also want to handle the case where a partition is attached, detached, exchanged, or dropped. Consider an ETL process that inserts data into a table, analyzes that table, and then attaches it to a partitioned table. Under this proposal, the stats of the root wouldn't be updated, so we would either want to analyze automatically or, even better, add it to the autoanalyze queue to be analyzed.
2) The 2nd approach is simpler. During the autoanalyze loop, determine the root partitions. If a root's stats haven't been merged within some user-configurable interval (maybe an hour, or 12h), run the merge step. This would sometimes merge stats unnecessarily, and doing it too often would waste processing (since merging stats is more CPU intensive).
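
To make the two selection rules concrete, here is a rough Python sketch of just the "which roots get merged" decision in each approach. It is only an illustration: the function names, the leaf-to-root mapping, and the timestamp dictionary are all hypothetical stand-ins, not anything from the actual autoanalyze code, and the merge step itself is left out.

```python
from collections import defaultdict


def roots_ready_to_merge(leaf_to_root, analyzed_leaves):
    """Approach 1 (sketch): after all sampling is done, return the roots
    for which every leaf was analyzed in this pass.

    leaf_to_root: hypothetical mapping of leaf partition name -> root name.
    analyzed_leaves: leaves that were sampled in this autoanalyze pass.
    """
    leaves_by_root = defaultdict(set)
    for leaf, root in leaf_to_root.items():
        leaves_by_root[root].add(leaf)

    analyzed = set(analyzed_leaves)
    # Only roots that had at least one leaf analyzed are candidates.
    candidates = {leaf_to_root[leaf] for leaf in analyzed if leaf in leaf_to_root}
    # Merge only when the full set of leaves for the root was analyzed.
    return sorted(root for root in candidates if leaves_by_root[root] <= analyzed)


def roots_due_for_merge(last_merge_time, now, min_interval):
    """Approach 2 (sketch): return the roots whose stats were never merged
    or were last merged at least min_interval seconds ago.

    last_merge_time: hypothetical mapping of root name -> last merge time
    in seconds, or None if the root has never been merged.
    """
    return sorted(
        root
        for root, merged_at in last_merge_time.items()
        if merged_at is None or now - merged_at >= min_interval
    )


if __name__ == "__main__":
    mapping = {"sales_1": "sales", "sales_2": "sales",
               "logs_1": "logs", "logs_2": "logs"}
    print(roots_ready_to_merge(mapping, ["sales_1", "sales_2", "logs_1"]))
    print(roots_due_for_merge({"sales": 0, "logs": 3000, "new": None}, 3600, 3600))
```

Note that in approach 1 a root with even one unanalyzed leaf is skipped entirely, which is why the attach/detach case above needs separate handling, while approach 2 ignores leaf state and depends only on elapsed time.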