Proposal to replace gpreload utility with ALTER TABLE...ORDER BY

50 views
Skip to first unread message

Ashwin Agrawal

unread,
Oct 22, 2020, 7:12:06 PM10/22/20
to Greenplum Developers
Recently came to know about the gpreload utility. It helps to reload
table data in sorted order based on user specified columns (kind of
similar to CLUSTER though with some difference we will discuss later).

The main use case for the utility is for append-optimized tables with
compression enabled. Reloading the data to get better compression by
resorting and repacking data with new/old attributes. Also helps to
improve query performance as a result. Plus, helps to save (or
reclaim) disk space by optimizing compression for mostly now static
tables in the system.

The utility takes as input a file where it can specify schema qualified
table names to reload and columns to be used for sorting.

High-level gpreload utility flow for each table:

1. BEGIN;
2. CREATE TEMP TABLE <tmp_table> AS SELECT * FROM <src_table>;
3. SELECT COUNT(*) from <src_table>; -- just for some unsuitable checking
4. SELECT COUNT(*) from <tmp_table>; -- just for some unsuitable checking
5. TRUNCATE TABLE <src_table>;
6. INSERT INTO <src_table> SELECT * FROM <tmp_table> ORDER BY {user_specified_column_list};
7. COMMIT;

Utility has following limitations:
- Table is scanned multiple times by the utility
- Race condition exist and hence may cause data loss if new data is
  inserted into table after creating temp table and before truncating
  actual table
- Wildcard characters are not supported in object names
- Zero (now at least some thanks to Ed and Abhijeet) test coverage
- It's a utility in Python and not SQL command
- Double the size of table free disk space required for the operation
  (temp table and new table), as the truncate is inside the transaction, old
  data is deleted only after commit.

I played with it and am convinced it should be deprecated for GPDB7.
Given its very useful functionality, we should replace it with SQL command:

ALTER TABLE <name> WITH (REORGANIZE=true) ORDER BY clause.

This can be implemented very similarly to how changing table's
distribution is implemented currently.

CLUSTER command has some similarities but doesn't cover the full
use case as index is must for the cluster to work whereas not for gpreload
utility. Also, the database currently doesn't remember the columns on
which the table is sorted.

Future potential enhancements to this command (delivered in iterations)
may be to allow altering storage types like row to column or
compression type or to a foreign table.

Thoughts?


--
Ashwin Agrawal (VMware)

Jesse Zhang

unread,
Oct 23, 2020, 2:23:03 PM10/23/20
to Ashwin Agrawal, Greenplum Developers
I think this actually the working space is almost triple (the explicit
temp table,
the implicit "temp table" created by TRUNCATE, and the tuplesort) the original
table size.

We also lose the visibility information of the original tuples, which
_might_ be undesirable -- an older ongoing transaction will wait for the
"reload" transaction and once unblocked will not see a tuple it
otherwise would see (I believe this breaks isolation level
SERIALIZABLE, but not REPEATABLE READ).

>
> I played with it and am convinced it should be deprecated for GPDB7.
> Given its very useful functionality, we should replace it with SQL command:
>
> ALTER TABLE <name> WITH (REORGANIZE=true) ORDER BY clause.
>
> This can be implemented very similarly to how changing table's
> distribution is implemented currently.
>
> CLUSTER command has some similarities but doesn't cover the full
> use case as index is must for the cluster to work whereas not for gpreload
> utility. Also, the database currently doesn't remember the columns on
> which the table is sorted.
>

May I ask why gpreload (and the suggested "ALTER TABLE ... ORDER BY") is
useful in the absence of a btree index sorting by the same columns?
Specifically what kind of queries or operations can benefit from this
CLUSTERing-without-an-index? My gut feeling is, in the absence of an
index, reordering a table à la CLUSTER is not sufficiently useful. And
if that's true, we should completely get rid of such utilities.

Jesse

Ashwin Agrawal

unread,
Oct 23, 2020, 5:22:23 PM10/23/20
to Jesse Zhang, Greenplum Developers
On Fri, Oct 23, 2020 at 11:23 AM Jesse Zhang <sbj...@gmail.com> wrote:
We also lose the visibility information of the original tuples, which
_might_ be undesirable -- an older ongoing transaction will wait for the
"reload" transaction and once unblocked will not see a tuple it
otherwise would see (I believe this breaks isolation level
SERIALIZABLE, but not REPEATABLE READ).

Yes. Though, AO tables specifically (and GPDB in general) anyways not support much with SERIALIZABLE transactions, so that's something least of our worries I think for now. Even CLUSTER or other ALTER TABLE operations on AO are currently not serializable safe.

May I ask why gpreload (and the suggested "ALTER TABLE ... ORDER BY") is
useful in the absence of a btree index sorting by the same columns?
Specifically what kind of queries or operations can benefit from this
CLUSTERing-without-an-index? My gut feeling is, in the absence of an
index, reordering a table à la CLUSTER is not sufficiently useful. And
if that's true, we should completely get rid of such utilities.

As mentioned initially, the main use case for the tool is less to do with improving query performance instead to reload the data to get better compression by
resorting and repacking data with new/old attributes. This is to reclaim disk space.
Sources mentioned to me that this tool helped many times when running close to disk full situations to get back significant disk space back by reloading the tables via this tool. Consider this more of a VACUUM bucket tool than a CLUSTER bucket tool.

Asim Praveen

unread,
Nov 25, 2020, 7:09:41 AM11/25/20
to Ashwin Agrawal, pvtl-cont-sbjesse, Greenplum Developers
In fact, for heap tables, vacuum full performs cluster without index
if a suitable index is not defined on a table. Vacuum for
append-optimized tables works similarly but at the granularity of a
segment file, rather than the whole table.

That brings another option to the table - should vacuum command be
modified to accept a list of columns to order by for append-optimized
tables?

Asim

Ashwin Agrawal

unread,
Nov 25, 2020, 6:15:12 PM11/25/20
to Asim Praveen, pvtl-cont-sbjesse, Greenplum Developers
On Wed, Nov 25, 2020 at 4:09 AM Asim Praveen <pa...@vmware.com> wrote:
In fact, for heap tables, vacuum full performs cluster without index
if a suitable index is not defined on a table.  Vacuum for
append-optimized tables works similarly but at the granularity of a
segment file, rather than the whole table.
That brings another option to the table - should vacuum command be
modified to accept a list of columns to order by for append-optimized
tables?
 
VACUUM is already a complicated activity and options. I would prefer if we don't further complicate its options. Also, VACUUM usage for this doesn't seem intuitive compared to dedicated ALTER TABLE command. Like do we also move ALTER TABLE .... REORGANIZE option to VACUUM FULL to specify different distribution keys? Doesn't feel good as part of VACUUM. Vacuum full invokes the cluster internally is implementation detail and would prefer not to expose that as frontend detail to the user. In future we may decide to implement Vacuum full in some other way maybe without table rewrite again (for some table AM) and hence clubbing this functionality with its command syntax might not be a good idea.

Internally implementing this new command may also use the same cluster_rel() functionality but at user level having separate command feels better. Any ALTER TABLE command which performs rewrite also does vacuuming currently.

Also, as stated in the initial email, future potential helpful enhancements are to allow altering storage types like row to column or compression type or to a foreign table. Hence, considering all of these, seems better if don't tangle these into VACUUM. Sorting based on specified columns during rewrite seems helpful with these types of ALTER operations.

Luis Macedo

unread,
Nov 26, 2020, 9:29:19 AM11/26/20
to Greenplum Developers, Ashwin Agrawal, Jesse Zhang, Greenplum Developers, Asim Praveen, Louis Mugnano
Ashwin,

+1 to your idea. Reloading the tables does make the compression algorithm more efficient, specially if the table is loaded in mini batch mode. (many small inserts during the day)

Also on GPDB 7 we will eventually have BRIN indexes which can take extra advantage of this sorting utility.

Rgds,

Luis Macedo

Ashwin Agrawal

unread,
Aug 29, 2023, 2:25:54 PM8/29/23
to Luis Macedo, Greenplum Developers, Jesse Zhang, Asim Praveen, Louis Mugnano
To close the loop on this thread, team (Andrew) has implemented ALTER TABLE REPACK SQL functionality as replacement for gpreload utility for GPDB7 to reload table data in sorted order based on user specified columns via PR https://github.com/greenplum-db/gpdb/pull/16132.

So, will be deprecating gpreload utility from GPDB7 (eventually removing the same for GPDB8).

Thanks,
-- 
Ashwin Agrawal (VMware)
Reply all
Reply to author
Forward
0 new messages