On Wed, Nov 8, 2023 at 2:08 PM 'Chris Hajas' via Greenplum Developers <gpdb...@greenplum.org> wrote:
> On the more extreme end, there's also using `analyze fullscan <table>` which does a scan of the entire table to get perfect NDVs, but this isn't something we advertise/use often.
Wow we have that? :D
If ANALYZE means scanning the full table, then it's not really sampling! I
think that is a deal breaker.
> Additionally, we could ask users to also increase the default statistics target for large tables
I think this is the way to go. There are a few options:
(0) Setting default_statistics_target cluster wide with gpconfig.
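For example, a rough sketch (1000 is just an illustrative target; gpstop -u
should pick up the change on the cluster without a restart):
gpconfig -c default_statistics_target -v 1000
gpstop -u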
(1) Setting default_statistics_target database wide with:
alter database postgres set default_statistics_target=1000;
(2) Setting default_statistics_target session wide, maybe for the
session running analyze.
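For example, in the session doing the manual ANALYZE (using the rank table
from the example below purely as an illustration):
set default_statistics_target = 1000;
analyze rank;
reset default_statistics_target;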
PS: Unfortunately, there is no way to hook up a target for autoanalyze :(
(3) Running ALTER TABLE SET STATISTICS on the columns of the table to
hike the stats target for just that table.
This only acquires a ShareUpdateExclusiveLock, which is not that
restrictive and is a very quick operation.
For partitioned tables this command recurses appropriately.
-- hikes stats target for count column in partitioned hierarchy
alter table rank alter count set statistics 1000;
select attrelid::regclass, attstattarget from pg_attribute where
attrelid IN ('rank'::regclass, 'rank_1_prt_boys'::regclass) and
attname = 'count';
    attrelid     | attstattarget
-----------------+---------------
 rank            |          1000
 rank_1_prt_boys |          1000
(2 rows)
Regards,
Soumyadeep (VMware)