post to summarize this interesting topic.
If you are familiar with Greenplum, please skip this section.
- only the QD (query dispatcher, on the coordinator) can create gangs and dispatch plans to the QEs (query executors, on the segments)
- during a query, the QD can manage at most one dispatch context
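If you want to see the topology behind these terms on a running cluster, the catalog table gp_segment_configuration lists the coordinator (where the QD runs) and the segments (where the QEs run). A minimal sketch:

```sql
-- content = -1 is the coordinator, where the QD runs;
-- content >= 0 are the segments, where QEs run.
select content, role, hostname, port
from gp_segment_configuration
order by content, role;
```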
## pg_relation_size's MPP pattern
pg_relation_size comes from Postgres. Greenplum turns it into an MPP function, and many other functions are made MPP using the same code pattern:

- if executed on the QD, dispatch the statement to the QEs, fetch all the results, and combine them on the QD into the final result (see the sketch after this list)
- if executed on a QE, just behave like Postgres
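A minimal sketch of the two behaviors. Here `some_table` is a placeholder for any existing heap table, and gp_dist_random('gp_id') is the usual trick to run an expression once per segment; depending on the planner the projection may land elsewhere, which is exactly what the rest of this post explores.

```sql
-- Dispatched from the QD: every QE reports its local size and the
-- results are combined into one cluster-wide number.
select pg_relation_size('some_table'::regclass);

-- Forced onto the QEs (one row per segment): each QE behaves like
-- plain Postgres and reports only its local size.
select gp_segment_id, pg_relation_size('some_table'::regclass)
from gp_dist_random('gp_id');
```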
## The problem
Q: given a table that stores the OIDs of some other tables, how do we write SQL that computes the size of each of those tables?

This problem is trivial in single-instance Postgres, but it turns out to be much harder than it first appears to solve efficiently in Greenplum.
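For concreteness in the solutions below, assume t is a hypothetical table like this (the table names in the insert are placeholders):

```sql
-- Hypothetical setup: t stores the OIDs of the tables we want to measure.
create table t (oid oid);

insert into t
select oid from pg_class
where relname in ('orders', 'lineitem');  -- whatever tables you care about
```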
## Solutions
### S1
```sql
select oid, pg_relation_size(oid) from t;
```
This does not work: pg_relation_size(oid) sits in the target list, so for each oid stored in t it is executed by only the one QE that holds that row, and (per the MPP pattern above) that QE returns just its local size.
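One way to see this, assuming the setup above: the rows of t are spread across segments, and each function call is evaluated on whichever QE owns that row.

```sql
-- gp_segment_id shows which segment each row of t lives on; the
-- pg_relation_size call for that oid runs only there.
select gp_segment_id, oid, pg_relation_size(oid) from t;
```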
### S2
```sql
select oid, pg_relation_size(oid)
from (select oid from t limit 1000) x;
```
This does not work either. The LIMIT in the subquery forces a gather to the QD, so pg_relation_size is executed on the QD and tries to dispatch; unfortunately, Greenplum does not support dispatching twice within a single query's dispatch context.
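A sketch of how to confirm this (the exact plan shape depends on the optimizer and version): EXPLAIN should show the pg_relation_size call in the QD slice, above a Gather Motion, i.e. the call has been placed on the QD.

```sql
-- EXPLAIN does not execute the query, so the plan can be inspected
-- even though actually running the query would not work.
explain
select oid, pg_relation_size(oid)
from (select oid from t limit 1000) x;
```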
### S3
```sql
select oid, pg_relation_size(oid)
from unnest(array(select oid from t)) as u(oid);
```
This might work under the Postgres-based planner, which evaluates the subquery as an InitPlan, but the performance is poor because every execution of pg_relation_size dispatches to all segments. This method was found by Xuebin Su.
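Again, a plan sketch can make the cost visible (shape varies by optimizer): the subquery is collected into an array by an InitPlan on the QD, unnest emits the oids on the QD, and each pg_relation_size call then has to dispatch to the segments on its own.

```sql
-- Expect an InitPlan that gathers the oids to the QD; every subsequent
-- pg_relation_size call is a fresh dispatch to the segments.
explain
select oid, pg_relation_size(oid)
from unnest(array(select oid from t)) as u(oid);
```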
### S4
```sql
alter table t set distributed replicated;

select oid, sum(pg_relation_size(oid))
from gp_dist_random('t')
group by oid;
```
This method forces t to be replicated and then uses the gp_dist_random trick so that pg_relation_size is computed on the QEs, after which the per-QE results are summed manually. But it is not correct if the planner chooses a single-stage aggregate: the rows are first redistributed by oid and the sizes are computed only after that motion, so the function no longer runs once on every segment. We might use GUCs to force a multi-stage aggregate, as sketched below.
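A sketch of nudging the planners toward a two-stage aggregate so that the partial aggregation, and hence pg_relation_size, runs on each QE before any motion. The GUC names below exist in recent Greenplum releases but should be verified against your version; this is not a guaranteed fix.

```sql
-- Encourage multi-stage aggregation; verify these GUC names on your GPDB.
set gp_enable_multiphase_agg = on;        -- Postgres-based planner
set optimizer_force_multistage_agg = on;  -- ORCA

select oid, sum(pg_relation_size(oid))
from gp_dist_random('t')
group by oid;
```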
### S5
This is the most efficient and correct method I have found so far. It fixes the drawback of S4 using a feature from upstream Postgres: security-barrier views.
```sql
alter table t set distributed replicated;

create view mv with (security_barrier) as
select oid, pg_relation_size(oid) as size
from gp_dist_random('t');

select oid, sum(size) from mv group by oid;
```
Thanks to the security barrier, pg_relation_size is evaluated before the motion, i.e. on the QEs. The SQL dispatches only once and returns every table's size.
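As a final sanity check (a sketch; plan shape varies by optimizer and version), EXPLAIN VERBOSE should show the pg_relation_size call inside the scan of the security-barrier subquery, below the motion, so only small (oid, size) rows travel to the QD.

```sql
-- The function call should appear below the motion, confirming it is
-- evaluated on the QEs; the QD only sums the pre-computed sizes.
explain verbose
select oid, sum(size) from mv group by oid;
```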