Changing Behavior of pg_relation_size for Append-Optimized Tables

47 views
Skip to first unread message

Andrew

unread,
May 26, 2023, 6:46:22 PM5/26/23
to Ashuka Xue
Hello Team!

There's been an ongoing issue with pg_relation_size (and pg_total_relation_size) on
append-optimized tables, where these functions may not return accurate results.  This is because
the appendonly_am relation_size function relies on an EOF marker, and returns the logical size of
the file.  For example, if there have been a number of aborted transactions for the given table,
the actual size on disk might be much larger than the logical size. This is in contrast both to
heap tables and to the expected behavior, which is to return the physical size on disk.

On the other hand, several other code paths rely on the current behavior, and simply changing the
AM function would be inadvisable.

Instead we propose that pg_relation_size for AO tables should not call out to the AM handler and
instead should implement its own logic for measuring file size on disk of AO tables. One possible
implementation might be to leverage ao_foreach_extent_file with a callback that stats each file.

This has the obvious implication of changing the reported size of AO tables, and so I believe
should be done before GPDB7 goes GA.

I would love any feedback on this change, as well as any context on the current implementation
anyone might have to offer.  Thank you for your help!

Sincerely,
Andrew (VMware)

Luis Macedo

unread,
May 29, 2023, 10:38:40 AM5/29/23
to Greenplum Developers, Andrew
Andrew,

I think this could be treated as a bug... Not sure what is the impacts you are talking about but I would also backport it to 6 as well...

Rgds,

Luis

Andrew

unread,
Jun 4, 2023, 4:11:58 PM6/4/23
to Ashuka Xue
Hi Team,

Some further digging in on the pg_relsize work has shown an interesting decision to make about how
we want this to work for AO tables.

For heap tables, we consider all forks of the table (including _fsm and _vm) as part of the core
relation size. This is as opposed to TOAST, which are part of the "table" size, but not the
"relation" size (see appendix below).

Now, for AO tables there are not forks in this way, we
consider only the "main" fork. However, much of the equivalent table metadta is instead housed in
"auxiliary" tables such as the ao_visimap table or aoblkdir table. Thus, to make the behavior of
the relsize functions equivalent between heap and AO, we think it would be appropriate to include
the size of these auxiliary tables in the core "relation" size.

I'd be very interested in any thoughts on this change, in particular if anyone has any ideas of why
the auxiliary tables were excluded from relation size reporting originally. Thank you for your
help!


# Appendix - Three relsize functions
Given there are multiple "relsize" functions to consider, here's a list of them and how their
behavior is/was. This language is definite but this isn't set in stone, I was just
cribbing from my notes. If there are reasons not to make this change we may not do so.

* ALL include the core relation file sizes
* pg_relation_size
* DOES NOT include TOAST
* DOES NOT include indexes
* USED TO NOT include ao_aux tables. DOES now.
* calls `calculate_relation_size`
* pg_total_relation_size
* DOES include indexes.
* DOES include TOAST
* DOES include ao_aux tables. now via `calculate_relation_size`, before in
`calculate_table_size`
function
* calls `calculate_table_size`
* calls `calculate_relation_size`
* calls `calculate_toast_table_size`
* calls `calculate_indexes_size`
* pg_table_size
* DOES includes TOAST
* DOES NOT include indexes
* DOES include ao_aux tables.
* calls `calculate_table_size`
* calls `calculate_relation_size`
* calls `calculate_toast_table_size`

Andrew

unread,
Jun 4, 2023, 10:14:38 PM6/4/23
to gpdb...@greenplum.org
Hi Team,

Some further digging in on the pg_relsize work has shown an interesting decision to make about how
we want this to work for AO tables.

For heap tables, we consider all forks of the table (including _fsm and _vm) as part of the core
relation size.  This is as opposed to TOAST, which are part of the "table" size, but not the
"relation" size (see appendix below).

Now, for AO tables there are not  forks in this way, we
consider only the "main" fork.  However, much of the equivalent table metadta is instead housed in
"auxiliary" tables such as the ao_visimap table or aoblkdir table. Thus, to make the behavior of
the relsize functions equivalent between heap and AO, we think it would be appropriate to include
the size of these auxiliary tables in the core "relation" size.

I'd be very interested in any thoughts on this change, in particular if anyone has any ideas of why
the auxiliary tables were excluded from relation size reporting originally.  Thank you for your
help!

Ashwin Agrawal

unread,
Jun 5, 2023, 6:40:34 AM6/5/23
to Andrew, gpdb...@greenplum.org
On Mon, Jun 5, 2023 at 7:44 AM Andrew <and...@andrewrepp.com> wrote:
Hi Team,

Some further digging in on the pg_relsize work has shown an interesting decision to make about how
we want this to work for AO tables.

For heap tables, we consider all forks of the table (including _fsm and _vm) as part of the core
relation size.  This is as opposed to TOAST, which are part of the "table" size, but not the
"relation" size (see appendix below).

I liked this picture from the stack overflow [1] thread to easily understand the details between these functions.
 
Now, for AO tables there are not  forks in this way, we
consider only the "main" fork.  However, much of the equivalent table metadta is instead housed in
"auxiliary" tables such as the ao_visimap table or aoblkdir table. Thus, to make the behavior of
the relsize functions equivalent between heap and AO, we think it would be appropriate to include
the size of these auxiliary tables in the core "relation" size.

I'd be very interested in any thoughts on this change, in particular if anyone has any ideas of why
the auxiliary tables were excluded from relation size reporting originally.  Thank you for your
help!

I don't think much thought was given in the past to align these functions to heap. That's the reason even
the basic version is only giving logical size and not physical size of AO.

# Appendix - Three relsize functions
Given there are multiple "relsize" functions to consider, here's a list of them and how their
behavior is/was. This language is definite but this isn't set in stone, I was just
cribbing from my notes. If there are reasons not to make this change we may not do so.

* ALL include the core relation file sizes
* pg_relation_size
    * DOES NOT include TOAST
    * DOES NOT include indexes
    * USED TO NOT include ao_aux tables. DOES now.
    * calls `calculate_relation_size`

Based on [2], the single argument of pg_relation_size() only includes size for the "main" fork.
An Additional, second option should be provided to this function "aux" to get the size of auxiliary tables.
 
* pg_total_relation_size
    * DOES include indexes.
    * DOES include TOAST
    * DOES include ao_aux tables.  now via `calculate_relation_size`, before in
      `calculate_table_size`
      function
    * calls `calculate_table_size`
        * calls `calculate_relation_size`
        * calls `calculate_toast_table_size`
    * calls `calculate_indexes_size`
* pg_table_size
    * DOES includes TOAST
    * DOES NOT include indexes
    * DOES include ao_aux tables.
    * calls `calculate_table_size`
        * calls `calculate_relation_size`
        * calls `calculate_toast_table_size`

These look fine to include all the auxiliary tables and align AO with HEAP.

Understand this thread is only focused on behavior change for existing size functions,
just curious to know if thought has been given after this behavior change to easily detect bloated AO tables from two context:
- bloat due to aborted transactions means logical EOF is not equal to physical EOF
- bloat (similar to heap) due to deleted tuples


--
Ashwin Agrawal (VMware)

Andrew

unread,
Jun 13, 2023, 7:18:02 PM6/13/23
to Ashwin Agrawal, Ashuka Xue
Hi Team,

Some further review showed that, in addition to a user-invoked call to this function, GPDB will dispatch `pg_relation_size` for internal use, and in the case of AO tables will still need the logical table size. This is done in `AcquireNumberOfBlocks` used by ANALYZE. Here our optimizer wants logical size to correctly estimate query costs. Rewriting this code to use a different function would either require chewing up more function OIDs in the catalog, or introducing a different dispatch approach. Either of those seems to me like a bad path.

Upstream already recognizes two function signatures for `pg_relation_size`, which are `pg_relation_size ('regclass text')` and `pg_relation_size('regclass')` . The second is equivalent to `pg_relation_size('regclass', 'main')` and is a user-friendly way to just retrieve the size of the 'main' fork in a heap table.

After some more review, and based on the feedback of this thread, the current design we're looking at is as follows:
* Add a third signature: `pg_relation_size('regclass bool bool')`. These `bool` arguments are interpreted as:
* 'includeAuxTables', to meet the request by Ashwin to allow the user to indicate whether to include them
* Defaulted to False, to match the behavior of assuming only the 'main' fork of heap tables shown in the second upstream signature
* `isPhysicalSize`, to indicate whether to return the physical table size (as opposed to logical) for an AO table.
* Defaulted to True, to match the expected behavior as discussed in previously this thread.
* For internal dispatches, in the case of AO tables, use this new signature to pull logical size where needed
* For the common user-facing case, match the expected behavior we've discussed without requiring the user to change the function signature from what they're used to. That is, `pg_relation_size('regclass')` on an AO table would be equivalent to `pg_relation_size('regclass', false, true)`
* To detect transaction bloat in AO tables, as Ashwin had asked about, it then becomes as simple as `SELECT pg_relation_size('regclass', false, false) - pg_relation_size('regclass', false, true) AS bloatSize`. This preserves that visibility without adding any complexity burden to the common case.

Does anyone have any feedback on this approach?

Thanks for your help!

Sincerely,
Andrew Repp (VMware)
Reply all
Reply to author
Forward
0 new messages