post to summarize this interesting topic.
If you are familiar with Greenplum, please skip this section.
- only the QD (query dispatcher, on the coordinator) can create gangs and dispatch plans to the QEs (query executors, on the segments)
- during a query, the QD can manage at most one dispatch context
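If you want to see the topology behind these terms on a running cluster, the catalog table gp_segment_configuration lists the coordinator (where the QD runs) and the segments (where the QEs run). A minimal sketch:

```sql
-- content = -1 is the coordinator, where the QD runs;
-- content >= 0 are the segments, where QEs run.
select content, role, hostname, port
from gp_segment_configuration
order by content, role;
```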
## pg_relation_size's MPP pattern
pg_relation_size comes from Postgres. Greenplum turns it into an MPP function, and many other functions are made MPP using the same code pattern:

- if executed on the QD, dispatch the statement to the QEs, fetch all the results, and combine them on the QD into the final result (see the sketch after this list)
- if executed on a QE, just behave like Postgres
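A minimal sketch of the two behaviors. Here `some_table` is a placeholder for any existing heap table, and gp_dist_random('gp_id') is the usual trick to run an expression once per segment; depending on the planner the projection may land elsewhere, which is exactly what the rest of this post explores.

```sql
-- Dispatched from the QD: every QE reports its local size and the
-- results are combined into one cluster-wide number.
select pg_relation_size('some_table'::regclass);

-- Forced onto the QEs (one row per segment): each QE behaves like
-- plain Postgres and reports only its local size.
select gp_segment_id, pg_relation_size('some_table'::regclass)
from gp_dist_random('gp_id');
```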
## The problem
Q: given a table that stores the OIDs of some other tables, how do we write SQL that computes the size of each of those tables?

This problem is trivial in single-instance Postgres, but it turns out to be much harder than it first appears to solve efficiently in Greenplum.
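For concreteness in the solutions below, assume t is a hypothetical table like this (the table names in the insert are placeholders):

```sql
-- Hypothetical setup: t stores the OIDs of the tables we want to measure.
create table t (oid oid);

insert into t
select oid from pg_class
where relname in ('orders', 'lineitem');  -- whatever tables you care about
```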
## Solutions
### S1
```sql
select oid, pg_relation_size(oid) from t;
```
This does not work: pg_relation_size(oid) sits in the target list, so for each oid stored in t it is executed by only the one QE that holds that row, and (per the MPP pattern above) that QE returns just its local size.
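One way to see this, assuming the setup above: the rows of t are spread across segments, and each function call is evaluated on whichever QE owns that row.

```sql
-- gp_segment_id shows which segment each row of t lives on; the
-- pg_relation_size call for that oid runs only there.
select gp_segment_id, oid, pg_relation_size(oid) from t;
```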
### S2
```sql
select oid, pg_relation_size(oid)
from (select oid from t limit 1000) x;
```
This does not work either. The LIMIT in the subquery forces a gather to the QD, so pg_relation_size is executed on the QD and tries to dispatch; unfortunately, Greenplum does not support dispatching twice within a single query's dispatch context.
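A sketch of how to confirm this (the exact plan shape depends on the optimizer and version): EXPLAIN should show the pg_relation_size call in the QD slice, above a Gather Motion, i.e. the call has been placed on the QD.

```sql
-- EXPLAIN does not execute the query, so the plan can be inspected
-- even though actually running the query would not work.
explain
select oid, pg_relation_size(oid)
from (select oid from t limit 1000) x;
```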
### S3
```sql
select oid, pg_relation_size(oid)
from unnest(array(select oid from t)) as u(oid);
```
This might work under the Postgres-based planner, which evaluates the subquery as an InitPlan, but the performance is poor because every execution of pg_relation_size dispatches to all segments. This method was found by Xuebin Su.
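Again, a plan sketch can make the cost visible (shape varies by optimizer): the subquery is collected into an array by an InitPlan on the QD, unnest emits the oids on the QD, and each pg_relation_size call then has to dispatch to the segments on its own.

```sql
-- Expect an InitPlan that gathers the oids to the QD; every subsequent
-- pg_relation_size call is a fresh dispatch to the segments.
explain
select oid, pg_relation_size(oid)
from unnest(array(select oid from t)) as u(oid);
```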
### S4
```sql
alter table t set distributed replicated;

select oid, sum(pg_relation_size(oid))
from gp_dist_random('t')
group by oid;
```
This method forces t to be replicated and then uses the gp_dist_random trick so that pg_relation_size is computed on the QEs, after which the per-QE results are summed manually. But it is not correct if the planner chooses a single-stage aggregate: the rows are first redistributed by oid and the sizes are computed only after that motion, so the function no longer runs once on every segment. We might use GUCs to force a multi-stage aggregate, as sketched below.
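A sketch of nudging the planners toward a two-stage aggregate so that the partial aggregation, and hence pg_relation_size, runs on each QE before any motion. The GUC names below exist in recent Greenplum releases but should be verified against your version; this is not a guaranteed fix.

```sql
-- Encourage multi-stage aggregation; verify these GUC names on your GPDB.
set gp_enable_multiphase_agg = on;        -- Postgres-based planner
set optimizer_force_multistage_agg = on;  -- ORCA

select oid, sum(pg_relation_size(oid))
from gp_dist_random('t')
group by oid;
```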
### S5
This is the most efficient and correct method I have found so far. It fixes the drawback of S4 using a feature from upstream Postgres: security-barrier views.
```sql
alter table t set distributed replicated;

create view mv with (security_barrier) as
select oid, pg_relation_size(oid) as size
from gp_dist_random('t');

select oid, sum(size) from mv group by oid;
```
Thanks to the security barrier, pg_relation_size is evaluated before the motion, i.e. on the QEs. The SQL dispatches only once and returns every table's size.
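As a final sanity check (a sketch; plan shape varies by optimizer and version), EXPLAIN VERBOSE should show the pg_relation_size call inside the scan of the security-barrier subquery, below the motion, so only small (oid, size) rows travel to the QD.

```sql
-- The function call should appear below the motion, confirming it is
-- evaluated on the QEs; the QD only sums the pre-computed sizes.
explain verbose
select oid, sum(size) from mv group by oid;
```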