VACUUM on append-only compressed tables

399 views
Skip to first unread message

Akon Dey

unread,
Dec 7, 2016, 3:28:30 PM12/7/16
to gpdb-...@greenplum.org
Hi GPDB experts,

We have been running a database with a large column oriented, compressed table with many columns that is partitioned by time. We have noticed that running VACUUM on the entire database seems to slow it down significantly. This seems to have gotten worse after we added the partitioning.

The documentation indicates that running VACUUM on append-only tables helps (http://gpdb.docs.pivotal.io/4320/ref_guide/sql_commands/VACUUM.html).

It would be good to know what VACUUM and VACUUM FULL does when applied to partitioned, append-only, column oriented tables with bitmap indexes.

Thank you for your input and time.
Regards,
Akon

Jon Roberts

unread,
Dec 7, 2016, 4:38:05 PM12/7/16
to Akon Dey, Greenplum Users
Append-Only Tables (which are really now Append Optimized because you can now UPDATE and DELETE these tables), use a visibility map to hide stale rows.  When you execute a VACUUM on an AO table, the database will check the percentage of hidden rows and if it exceeds the GUC gp_appendonly_compaction_threshold, then it will reorganize the table.  It will be the same as a VACUUM FULL at this point.

I wouldn't VACUUM the entire database in a single command.  I would loop through the tables and execute a normal VACUUM for each.  If the percentage of hidden rows exceeds 10% (default gp_appendonly_compaction_threshold), then the table will be reorganized.


Jon Roberts

--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+unsubscribe@greenplum.org.
To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.

Ivan Novick

unread,
Dec 8, 2016, 10:25:33 AM12/8/16
to Akon Dey, Greenplum Users
Hi Akon,

Can you provide some numbers around size of database in terms of entries in pg_class or number of partitions and number of tables?

VACUUM uses CPU and IO resources so surely will take some amount of the available hardware resources.  Can you share any perf numbers you are seeing?  What % slow down in workload and how long the VACUUM takes.

VACUUM FULL should not be run on normal basis only if you have extreme bloat from lack of regular VACUUM, VACUUM FULL is basically a fully rewrite of the data with no empty blocks.

Cheers,
Ivan 

--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+unsubscribe@greenplum.org.
To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.



--
Ivan Novick
Product Manager Pivotal Greenplum

Akon Dey

unread,
Dec 8, 2016, 6:55:20 PM12/8/16
to Ivan Novick, Greenplum Users
Hi Jon,

Thanks for your reply. This is very helpful.

Hi Ivan,

At this point the database isn't too large. We have roughly 1 to 2 billion rows in one fairly wide table (~200 columns). It has hourly partitions for the most recent 48 hours and daily partitions after that. We have roughly 7 days worth of data.

The slowdown is significant. I can't put an estimate but it prevents us from actively updating the database including other tables.

Switching to running VACUUM on only the heap tables that have updates improved the situation. The CPU and IO overhead does not seem to be the issue. It looks like there is some lock contention or shared resource contention other than IO.

I would be great if you can recommend ways to keep track of that.

Thanks,
Akon

Ashwin Agrawal

unread,
Dec 8, 2016, 7:02:45 PM12/8/16
to Akon Dey, Ivan Novick, Greenplum Users

Indexes definitely plays a role to slow-down when running vacuum on AO tables. As it needs to perform checking for every tuple with visimap in index to cleanup index during vacuum along with fully table re-write. 

So, I would also recommend VACUUM lazy instead of full as others recommended for AO and also on table level instead of database wide.

Luis Macedo

unread,
Dec 9, 2016, 8:18:48 AM12/9/16
to Ashwin Agrawal, Akon Dey, Ivan Novick, Greenplum Users
Akon,

Vacuum Full is an operation to fix a very bad problem, its not for daily maintenance. Do not use it unless you get bloated which you don't want to get. Vacuum full is meant to be run alone on the box.

Vacuum is important only on tables that you do update or delete operations, heap or append optimized (BTW we should definitively change the syntax from "only" to "optimized"... keeping the old syntax for retrocompatibility). 

Another scenario that vacuum is necessary is when you are bulk loading in a table and the process fails. 

Vacuum is meant to be a scheduled process that runs in a quiet time like Sunday 3 AM. You don't need to run it daily. 

Daily you could run analyze on the tables you load a lot of data every day. GPDB has automated this with analyzedb which should be enough for most people.

Makes sense?

Tks


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself

Scott Kahler

unread,
Dec 9, 2016, 10:07:33 AM12/9/16
to Ashwin Agrawal, Akon Dey, Ivan Novick, Greenplum Users
I'll add into this that the table I believe you are mentioning is divided up into partitions that represent time slots. Once the data in those partitions have become static there shouldn't be any reason to analyze or vacuum those partitions. You may want to look at a more targeted approach aimed at just the partitions that have changed.

Scott Kahler | Pivotal, R&D, Platform Engineering  | ska...@pivotal.io | 816.237.0610

Akon Dey

unread,
Dec 12, 2016, 12:18:13 PM12/12/16
to Scott Kahler, Ashwin Agrawal, Ivan Novick, Greenplum Users
Thanks Scott, Luis, Ashwin for the information. This is very helpful.

We will incorporate these into our vacuuming strategy and make the recommended changes. I will keep you all posted.

Regards,
Akon
Reply all
Reply to author
Forward
0 new messages