TL;DR: the 'CREATE TABLE LIKE INCLUDING STORAGE' command currently
presents undocumented behavior for append-optimized tables,
particularly when multiple 'LIKE' clauses are involved.
Recently, I stumbled upon a
GPDB_12_MERGE_FIXME related to the
CREATE TABLE LIKE command with the INCLUDING STORAGE option.
In upstream Postgres, the INCLUDING STORAGE option is relevant only to
toast management. Specifically, when this option is specified, the
attstorge field from the source table's pg_attribute entry gets copied
over.
Here's an example that demonstrates the use of INCLUDING STORAGE for
heap tables in PG16dev. I have also included INCLUDING COMPRESSION and
INCLUDING ALL for comparison. (The COMPRESSION keyword is only
available in PG14+, hence I used upstream Postgres for this
experiment. GP7 exhibits the same behavior with heap tables)
```sql
-- pre-cleanup
DROP TABLE IF EXISTS heap1;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t3;
DROP TABLE IF EXISTS t4;
-- create a source heap table
CREATE TABLE heap1 (a text COMPRESSION pglz) WITH (fillfactor = 50);
ALTER TABLE heap1 ALTER COLUMN a SET STORAGE MAIN;
-- create table likes
CREATE TABLE t1 (LIKE heap1);
CREATE TABLE t2 (LIKE heap1 INCLUDING STORAGE);
CREATE TABLE t3 (LIKE heap1 INCLUDING COMPRESSION);
CREATE TABLE t4 (LIKE heap1 INCLUDING ALL);
\d+ heap1
\d+ t1
\d+ t2
\d+ t3
\d+ t4
```
\d+ output:
```
Table "public.heap1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+------+-----------+----------+---------+---------+-------------+--------------+-------------
a | text | | | | main | pglz | |
Access method: heap
Options: fillfactor=50
Table "public.t1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
a | text | | | | extended | | |
Access method: heap
Table "public.t2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+------+-----------+----------+---------+---------+-------------+--------------+-------------
a | text | | | | main | | |
Access method: heap
Table "public.t3"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
a | text | | | | extended | pglz | |
Access method: heap
Table "public.t4"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+------+-----------+----------+---------+---------+-------------+--------------+-------------
a | text | | | | main | pglz | |
Access method: heap
```
### INCLUDING STORAGE behavior in Postgres1. INCLUDING STORAGE is only relevant to toast management, and the
storage option is column-specific. 2. INCLUDING COMPRESSION is also only relevant to toast management,
and the compression option is also column-specific.
3. INCLUDING STORAGE and INCLUDING COMPRESSION don't necessarily
depend on each other.
4. Even with INCLUDING ALL, the per-table property "fillfactor" is not
copied. (Nor is the table access method, due to upstream Postgres
having only one in-core access method, this aspect is hard to
demonstrate on the main branch)
Returning to Greenplum, commit 951065c48cc as part of PR 7609 extends
the use of INCLUDING STORAGE, making it responsible also for copying
append-optimized table's encoding options. This specific behavior of
Greenplum is currently not well-documented.(See both
6X doc and
7Xdoc)
To understand how INCLUDING STORAGE works with append-optimized
tables, I carried out the following queries:
```sql
-- pre-cleanup
DROP TABLE IF EXISTS heap;
DROP TABLE IF EXISTS heap2;
DROP TABLE IF EXISTS ao;
DROP TABLE IF EXISTS ao2;
DROP TABLE IF EXISTS aoco;
DROP TABLE IF EXISTS aoco2;
DROP TABLE IF EXISTS like_heap;
DROP TABLE IF EXISTS like_heap_storage;
DROP TABLE IF EXISTS like_ao;
DROP TABLE IF EXISTS like_ao_storage;
DROP TABLE IF EXISTS like_aoco;
DROP TABLE IF EXISTS like_aoco_storage;
DROP TABLE IF EXISTS like_heap_heap_storage;
DROP TABLE IF EXISTS like_heap_ao_storage;
DROP TABLE IF EXISTS like_heap_aoco_storage;
-- create two identical source heap tables
CREATE TABLE heap (a text) WITH (fillfactor = 50);
ALTER TABLE heap ALTER COLUMN a SET STORAGE MAIN;
CREATE TABLE heap2 (a2 text) WITH (fillfactor = 50);
ALTER TABLE heap2 ALTER COLUMN a2 SET STORAGE MAIN;
-- create two identical source ao_row tables
CREATE TABLE ao (b text) USING ao_row WITH (compresstype=zstd,compresslevel=5,blocksize=65536);
ALTER TABLE ao ALTER COLUMN b SET STORAGE MAIN;
CREATE TABLE ao2 (b2 text) USING ao_row WITH (compresstype=zstd,compresslevel=5,blocksize=65536);
ALTER TABLE ao2 ALTER COLUMN b2 SET STORAGE MAIN;
-- create two identical source ao_column tables
CREATE TABLE aoco (c text ENCODING (compresstype='zlib', compresslevel=3, blocksize=8192)) USING ao_column;
ALTER TABLE aoco ALTER COLUMN c SET STORAGE MAIN;
CREATE TABLE aoco2 (c2 text ENCODING (compresstype='zlib', compresslevel=3, blocksize=8192)) USING ao_column;
ALTER TABLE aoco2 ALTER COLUMN c2 SET STORAGE MAIN;
-- create table likes (single)
create table like_heap (like heap);
create table like_heap_storage (like heap including storage);
create table like_ao (like ao);
create table like_ao_storage (like ao including storage);
create table like_aoco (like aoco);
create table like_aoco_storage (like aoco including storage);
-- create table likes (multiple including storage w/ same AMs)
create table like_heap_heap_storage (like heap including storage, like
heap2 including storage);
create table like_ao_ao_storage (like ao including storage, like ao2 including storage); -- ERROR: parameter "blocksize" specified more than once
create table like_aoco_aoco_storage (like aoco including storage, like aoco2 including storage); -- ERROR: parameter "blocksize" specified more than once
-- create table likes (multiple including storage w/ mixed AMs)
create table like_ao_aoco_storage (like ao including storage, like aoco including storage); -- ERROR: parameter "blocksize" specified more than once
create table like_heap_ao_storage (like heap including storage, like ao including storage);
create table like_heap_aoco_storage (like heap including storage, like aoco including storage);
\d+ heap
\d+ ao
\d+ aoco
\d+ like_heap
\d+ like_heap_storage
\d+ like_ao
\d+ like_ao_storage
\d+ like_aoco
\d+ like_aoco_storage
\d+ like_heap_heap_storage
\d+ like_heap_ao_storage
\d+ like_heap_aoco_storage
```
\d+ outputs:
```
Table "public.heap"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
a | text | | | | main | |
Distributed by: (a)
Access method: heap
Options: fillfactor=50
Table "
public.ao"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
b | text | | | | main | |
Distributed by: (b)
Access method: ao_row
Options: compresstype=zstd, compresslevel=5, blocksize=65536, checksum=true
Table "public.aoco"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Compression Type | Compression Level | Block Size | Description
--------+------+-----------+----------+---------+---------+--------------+------------------+-------------------+------------+-------------
c | text | | | | main | | zlib | 3 | 8192 |
Distributed by: (c)
Access method: ao_column
Options: blocksize=32768, compresslevel=0, compresstype=none, checksum=true
Table "public.like_heap"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+----------+--------------+-------------
a | text | | | | extended | |
Distributed by: (a)
Access method: heap
Table "public.like_heap_storage"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
a | text | | | | main | |
Distributed by: (a)
Access method: heap
Table "public.like_ao"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+----------+--------------+-------------
b | text | | | | extended | |
Distributed by: (b)
Access method: heap
Table "public.like_ao_storage"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
b | text | | | | main | |
Distributed by: (b)
Access method: ao_row
Options: blocksize=65536, checksum=true, compresslevel=5, compresstype=zstd
Table "public.like_aoco"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+----------+--------------+-------------
c | text | | | | extended | |
Distributed by: (c)
Access method: heap
Table "public.like_aoco_storage"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Compression Type | Compression Level | Block Size | Description
--------+------+-----------+----------+---------+---------+--------------+------------------+-------------------+------------+-------------
c | text | | | | main | | zlib | 3 | 8192 |
Distributed by: (c)
Access method: ao_column
Options: blocksize=32768, checksum=true, compresslevel=0, compresstype=none
Table "public.like_heap_heap_storage"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
a | text | | | | main | |
a2 | text | | | | main | |
Distributed by: (a)
Access method: heap
Table "public.like_heap_ao_storage"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
a | text | | | | main | |
b | text | | | | main | |
Distributed by: (a)
Access method: ao_row
Options: blocksize=65536, checksum=true, compresslevel=5, compresstype=zstd
Table "public.like_heap_aoco_storage"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Compression Type | Compression Level | Block Size | Description
--------+------+-----------+----------+---------+---------+--------------+------------------+-------------------+------------+-------------
a | text | | | | main | | none | 0 | 32768 |
c | text | | | | main | | zlib | 3 | 8192 |
Distributed by: (a)
Access method: ao_column
Options: blocksize=32768, checksum=true, compresslevel=0, compresstype=none
```
Note: the above output is from 7X. 6X behaves almost the same, only
difference is that on 6X the ERRORed query errors on "compresslevel"
instead of "blocksize".
### INCLUDING STORAGE behavior in Greenplum 7 and 6
1. For a single like .. including storage clause with append-optimized
tables, all the following properties are copied as expected:
- The toast related storage options such as "main", "extended"
- The table access method
- The per-table encoding options for both ao_row and ao_column tables
- The per-column encoding options for ao_column tables
I did not find documentation that clearly explains what STORAGE means in
this command.
2. For multiple LIKE .. INCLUDING STORAGE clauses, we encounter an
ERROR if there are more than one append-optimized source tables. This
is because both ao_row and ao_column tables have default encoding
options, and currently, the same encoding option can't be specified
more than once, even if the values are the same. I did not find documentation
about this.
3. For multiple LIKE .. INCLUDING STORAGE clauses, if we mix heap
source tables and an append-optimized source table, the new table will
always be an append-optimized table and will take the per-table
properties from the source append-optimized table. I did not find
documentation about this.
Huansong and I had an offline discussion and we think that the
original toast related STORAGE options and the additional encoding
options for append-optimized tables are really not the same thing.
Here are our reasons:
1. The original STORAGE option is only relevant to toast management,
hence it is only useful if the table AM needs inline and/or
out-of-line toast tables. In Greenplum, ao_row tables may need toast,
while ao_column tables never need toast.
2. The original toast related STORAGE option is per-column, whereas
the append-optimized additions can be per-table.
3. When INCLUDING STORAGE is specified for a source append-optimized
table, the access method also gets copied along with the encoding
options, whereas the toast related STORAGE option is table AM
agnostic.
4. For an ao_column table, it is arguable that the encoding options
can be per-column, which is similar to the toast-related storage
options. However, it is also similar to the COMPRESSION options that
were later introduced in PG14. In upstream Postgres, INCLUDING STORAGE
and INCLUDING COMPRESSION are not the same thing either.
### Questions:
1. Given right now 7X and 6X have the same behavior, do we want to
keep this behavior moving forward?
2. To make things easier, should we just disallow mixing AMs for
multiple INCLUDING STORAGE?
3. Alternatively, do we want to consider using a different keyword for
the append-optimized encoding options? For example, something like:
CREATE TABLE t3 (LIKE t1 INCLUDING ENCODING, LIKE t2 INCLUDING ENCODING) USING ao_column;
We'd like to hear your comments and thoughts!
Regards,
Alex