pg_attribute bloat


来熊

Jan 4, 2017, 3:56:00 AM1/4/17
to Greenplum Users
We have used gpload to load massive amounts of data for years, and now we find that the table pg_attribute has taken 150 GB of disk space.
Now when using gpload, it reads the metadata of the target table very slowly, which makes loading very slow.
How can we shrink the space?

Heikki Linnakangas

Jan 4, 2017, 4:49:36 AM1/4/17
to 来熊, Greenplum Users
"VACUUM FULL pg_attribute" to shrink it back.

To avoid getting into that situation again, you should schedule a
regular VACUUM of the catalog tables (plain VACUUM, not VACUUM FULL).
With e.g. a cron job, or perhaps add it as a final step of the script
that you use for the data loads.
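For example, a crontab entry along these lines would keep the most churn-prone catalog tables vacuumed (the database name, schedule, and table list here are placeholders to adapt to your system):

```cron
# Hypothetical crontab entry: plain VACUUM of the busiest catalog tables every night at 02:00
0 2 * * * psql -d mydb -c "VACUUM pg_catalog.pg_attribute; VACUUM pg_catalog.pg_class; VACUUM pg_catalog.pg_type;"
```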

- Heikki

Danilo Fortunato

Jan 4, 2017, 5:44:26 AM1/4/17
to Heikki Linnakangas, 来熊, Greenplum Users
The indexes on the pg_attribute table will probably be bloated as well, so I suggest rebuilding them to shrink their size:

REINDEX TABLE pg_catalog.pg_attribute;

I also suggest rebuilding the indexes of all pg_catalog tables on a regular basis, for example once a week or once a month, just like the VACUUM of the pg_catalog tables.
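To gauge how badly pg_attribute and its indexes are bloated, before and after maintenance, you can check their on-disk size; a sketch using standard PostgreSQL size functions, which Greenplum inherits:

```sql
-- Heap size of pg_attribute, plus everything on top of it (indexes and toast)
SELECT pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute')
                      - pg_relation_size('pg_catalog.pg_attribute')) AS index_and_toast_size;
```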

Regards,
Danilo Fortunato



--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+unsubscribe@greenplum.org.
To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.

Marshall

Jan 4, 2017, 9:06:23 AM1/4/17
to Greenplum Users
In addition to the very helpful comments already in this thread, you will find a wealth of good ideas for administrators and users of GPDB in the Greenplum Best Practices Guide. 
MEP

..: Mark Sloan :..

Jan 4, 2017, 6:24:40 PM1/4/17
to Greenplum Users

On Wednesday, January 4, 2017 at 3:56:00 AM UTC-5, 来熊 wrote:
> we used gpload to load massive data for years,now we found that table pg_attribute has taken 150G disk space.
> now when using gpload,it read the meta data of the target table very slowly.so,make the loading very slowly.
> how to shink the space?

I suspect you will find that pg_attribute probably isn't the only extremely bloated table.

You could try something like the script below as a first pass.

#!/usr/bin/env bash
# VACUUM FULL every heap table in pg_catalog, then reindex and analyze the system catalogs.
DBNAME=${1}
VCOMMAND="VACUUM FULL "
SYSTABLES="'pg_catalog.' || relname || ';' FROM pg_class a, pg_namespace b
WHERE a.relnamespace = b.oid AND b.nspname = 'pg_catalog' AND a.relkind = 'r'"
psql -tc "SELECT '$VCOMMAND' || $SYSTABLES" "$DBNAME" | psql -a "$DBNAME"

printf "starting reindex\n"
reindexdb --system -d "$DBNAME"
printf "starting analyze\n"
analyzedb -a -s pg_catalog -d "$DBNAME"

Scott Kahler

Jan 4, 2017, 8:01:33 PM1/4/17
to Greenplum Users
If you aren't already, take a look at the REUSE_TABLES option in gpload and see if it is a fit for your ETL process. That could save you some catalog cruft.
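If it fits, REUSE_TABLES goes in the PRELOAD section of the gpload control file; a minimal sketch, where the database, file path, and table names are placeholders:

```yaml
VERSION: 1.0.0.1
DATABASE: mydb
USER: gpadmin
GPLOAD:
  INPUT:
    - SOURCE:
        FILE: [ /data/load/input.dat ]
    - FORMAT: text
  OUTPUT:
    - TABLE: public.target_table
  PRELOAD:
    - REUSE_TABLES: true
```

With REUSE_TABLES enabled, gpload keeps and reuses its staging external tables between runs instead of dropping and recreating them, which is what generates the catalog churn that bloats pg_attribute.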
--

Scott Kahler | Pivotal, R&D, Platform Engineering  | ska...@pivotal.io | 816.237.0610

来熊

Jan 5, 2017, 6:18:02 AM1/5/17
to Scott Kahler, Greenplum Users
Thanks to all of you, we will do that next.
Our production system runs 24/7, so we will use VACUUM ANALYZE instead of VACUUM FULL,
and I tested the reuse_tables option but found it does not work, because my delimiter is a tab.
Thanks again.

..: Mark Sloan :..

Jan 5, 2017, 12:27:00 PM1/5/17
to 来熊, Scott Kahler, Greenplum Users
If your tables are as bloated as you indicated, I'd be pretty surprised at this point if a regular vacuum will be sufficient.

I suspect you are going to need a one-time run of VACUUM FULL + REINDEX to compact them back to something reasonable, and then keep them in check with a regular VACUUM in the future.
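In SQL terms, the one-time pass plus the recurring maintenance might look like this (pg_attribute shown; repeat for the other bloated catalog tables):

```sql
-- One-time compaction: VACUUM FULL takes an exclusive lock, so run it in a maintenance window
VACUUM FULL pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_attribute;

-- Recurring maintenance: plain VACUUM, safe to run while the system is in use
VACUUM ANALYZE pg_catalog.pg_attribute;
```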

