Compatibility of gp_default_storage_options from GP6 to GP7

58 views
Skip to first unread message

Marbin Tan

unread,
Mar 27, 2024, 5:14:24 PMMar 27
to gpdb...@greenplum.org
Hi folks,

I want to start a discussion about gp_default_storage_options and its compatibility from GP6 to GP7.
In GP7, a new GUC (default_table_access_method) was introduced and due to its overlap with gp_default_storage_options in GP7, several options of gp_default_storage_options were removed.

For example, you will see the following error when trying to set appendonly for gp_default_storage_options in GP7:
postgres=# SET gp_default_storage_options = 'appendonly=true, compresstype=rle_type';
ERROR:  invalid storage option "appendonly"
HINT:  For table access methods use "default_table_access_method" instead

NOTE: this issue also applies for orientation and appendoptimized option.

SCENARIO: on top of my mind, there are two scenarios that the issue can have a big enough impact for a user (there might be more that I'm missing?).

1. Upgrade from GP6 to GP7 will have issues, specially if they have roles that modified gp_default_storage_options.
postgres=# ALTER ROLE foo SET gp_default_storage_options TO 'appendonly=true,compresstype=zstd,compresslevel=3';
ERROR:  invalid storage option "appendonly"
HINT:  For table access methods use "default_table_access_method" instead.
Upgrading from GP6 to GP7 will definitely fail due to this (or at the very least have inconsistencies). If we ignore this problem, then the tables that are created GP7 will not have the same options compared to the GP6 counterpart.

2. Users may have scripts based on GP6 with gp_default_storage_options that include the removed options; the scripts will no longer work for GP7.

Do we want to introduce backwards compatibility for this and have a seamless transition to GP7 for their scripts or will the users need to modify their script to adhere to GP7 standard?

APPROACH: assuming that we agree that we need some form of backwards compatibility, here are some approaches:
1. Enable GP7 pg_dump/pg_dumpall to be able to modify SQL when dumping from a GP6 cluster.
2. Enable pg_restore to read old dumps and modify the SQL on the fly to restore on GP7.
3. Enable to have GPDB (server) have backwards compatibility and it will handle things internally with in the server. (keep GP6 syntax intact).
4. gpupgrade will create a script that will update the dumps to the GP7 version.

If I'm missing other interesting approaches for this issue, please feel free to chime in.

Some observations with the approaches.
Approach 1,2, and 4 do not handle scenario 2.
If we go for approach 3, then it opens up more questions about resolving conflicting options, who takes priority, etc.
For example, gp_default_storage_options and default_table_access_method both have option appendonly, but one is set to true and the other as false.
Who is the correct one?

Input into about this topic would greatly appreciated!

Thank you,
Marbin

This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.

Ashwin Agrawal

unread,
Apr 4, 2024, 2:25:52 PMApr 4
to Marbin Tan, gpdb...@greenplum.org
On Wed, Mar 27, 2024 at 2:14 PM 'Marbin Tan' via Greenplum Developers <gpdb...@greenplum.org> wrote:
Hi folks,

I want to start a discussion about gp_default_storage_options and its compatibility from GP6 to GP7.
In GP7, a new GUC (default_table_access_method) was introduced and due to its overlap with gp_default_storage_options in GP7, several options of gp_default_storage_options were removed.

For example, you will see the following error when trying to set appendonly for gp_default_storage_options in GP7:
postgres=# SET gp_default_storage_options = 'appendonly=true, compresstype=rle_type';
ERROR:  invalid storage option "appendonly"
HINT:  For table access methods use "default_table_access_method" instead

NOTE: this issue also applies for orientation and appendoptimized option.

SCENARIO: on top of my mind, there are two scenarios that the issue can have a big enough impact for a user (there might be more that I'm missing?).

1. Upgrade from GP6 to GP7 will have issues, specially if they have roles that modified gp_default_storage_options.
postgres=# ALTER ROLE foo SET gp_default_storage_options TO 'appendonly=true,compresstype=zstd,compresslevel=3';
ERROR:  invalid storage option "appendonly"
HINT:  For table access methods use "default_table_access_method" instead.
Upgrading from GP6 to GP7 will definitely fail due to this (or at the very least have inconsistencies). If we ignore this problem, then the tables that are created GP7 will not have the same options compared to the GP6 counterpart.

2. Users may have scripts based on GP6 with gp_default_storage_options that include the removed options; the scripts will no longer work for GP7.

Do we want to introduce backwards compatibility for this and have a seamless transition to GP7 for their scripts or will the users need to modify their script to adhere to GP7 standard?

I do feel resolving and providing backward compatibility will ease adoption and migration. So definitely considering options is helpful. My feeling is it shouldn't be too hard to support and provide this backward compatibility aspect and hence makes sense to do. If after evaluation complications are high in providing the compatibility then we can re-discuss not having one.


APPROACH: assuming that we agree that we need some form of backwards compatibility, here are some approaches:
1. Enable GP7 pg_dump/pg_dumpall to be able to modify SQL when dumping from a GP6 cluster.
2. Enable pg_restore to read old dumps and modify the SQL on the fly to restore on GP7.
3. Enable to have GPDB (server) have backwards compatibility and it will handle things internally with in the server. (keep GP6 syntax intact).
4. gpupgrade will create a script that will update the dumps to the GP7 version.

If I'm missing other interesting approaches for this issue, please feel free to chime in.

Option 1 we should be taking gpbackup into account as well for option 1. pg_dump and pg_restore only come into play mostly for gpupgrade (and little gpcopy) context.

2 reads very similar to 3 when actually trying to implement, as I don't see without server side change modifying  backup files is at all feasible. So 2 just seems not an option

4 definitely doesn't sound good. 1 is better for it and takes care of the upgrade scenario.
 

Some observations with the approaches.
Approach 1,2, and 4 do not handle scenario 2.
If we go for approach 3, then it opens up more questions about resolving conflicting options, who takes priority, etc.
For example, gp_default_storage_options and default_table_access_method both have option appendonly, but one is set to true and the other as false.
Who is the correct one?

Reading through the list option 3 seems best to me and also least invasive plus covers all the bases.

There is a lot of tricky code around table options so need to be very cautious at what layer we are providing the backward compatibility and have code well abstracted for the same.

In terms of defining priorities - I would say default_table_access_method has least priority as that's the new thing users would be using. So if the user has defined gp_default_storage_options at database or role or session level it should take precedence over default_table_access_method, similar to if the options were defined in CREATE TABLE command. And this behavior can be clearly documented.

I would really prefer a longer run like GPDB 8 if we can have a route towards being able to remove this backward compatibility layer. Thinking from that perspective, not sure how good or ugly it looks in implementation where backward compatibility is provided only in syntax but finally in catalog or on-disk it gets mapped to default_table_access_method and stored using it. I know that definitely now complicates things if both are defined. So we need to think more on that aspect.

Thanks,
Ashwin (Broadcom)
Reply all
Reply to author
Forward
0 new messages