Looking for a small bit of advice...
I have a script that updates several tables with large amounts of data.
Before running the updates, it drops all indexes for optimal
performance. When the updates have finished, I run the following
procedure:
- recreate the indexes
- cluster the tables
- run VACUUM FULL ANALYZE on the tables
I was hoping an expert could comment on the optimal way to order these
three commands. For instance, I have a hunch that creating the indexes
first (as I do now) could slow down the clustering - perhaps all the
row pointers in the indexes have to be updated as the CLUSTER command
shifts the rows around? And perhaps vacuuming should be done before
clustering so that dead tuples aren't "in the way"?
Of course I could just test every combination until I get it right, but
I'd like to have a good understanding as well.
Any insight would be much appreciated.
Thank you,
Mark
Clustering also removes the dead tuples.
I would just:
- create one index, the one to be clustered
- cluster the table
- create the remaining indexes
--
Eat right. Exercise regularly. Die anyway.
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
And then run ANALYZE. No need to vacuum, because the CLUSTER did that
already.
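Putting that advice together, the whole sequence looks something like
this (table, column, and index names are placeholders for your own):

```sql
-- 1. Create only the index you intend to cluster on
CREATE INDEX big_table_key_idx ON big_table (key_col);

-- 2. CLUSTER rewrites the table in index order and discards dead tuples
--    (8.1-era syntax; newer releases also accept
--     CLUSTER big_table USING big_table_key_idx)
CLUSTER big_table_key_idx ON big_table;

-- 3. Build the remaining indexes against the compacted, sorted table
CREATE INDEX big_table_other_idx ON big_table (other_col);

-- 4. Refresh planner statistics; no separate VACUUM needed
ANALYZE big_table;
```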
Have a nice day,
--
Martijn van Oosterhout <kle...@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
I have a followup question. What if the set of dead tuples is too big
and I need to VACUUM FULL, as opposed to VACUUM. (The update size
varies greatly from day to day.) Will the clustering effectively do a
VACUUM FULL, or just a VACUUM?
Thanks again for your help,
Mark
Hi all,
I'd like to see a general way to take indexes offline without actually
losing their definitions. For example, something like "ALTER TABLE ...
[ENABLE | DISABLE] INDEXES" or "ALTER INDEX ... [ENABLE | DISABLE]". This could
also be used internally when a backend encounters an error
reading/writing an index. Rather than refusing to execute queries, it
could just ignore indexes it knows are disabled or bad in some way and
re-plan as needed.
This would have two benefits. First, the above scenario would be much
simpler: rather than dropping and re-creating indexes, you could
just disable and then re-enable them without having any knowledge of
their structure. Second, it would allow us to put indexes in an
alternate table space on a non-redundant volume and, in the case of a
drive failure, be able to limp along, and get the system back to normal
simply by replacing the disk and issuing a REINDEX command.
I realize there are a couple of gotchas with this. For example, what to do
with unique indexes? Perhaps a backend would still need to refuse to do
updates/inserts on a table with degraded unique indexes, unless the index
was disabled explicitly? And then refuse to rebuild/re-enable the
index as normal if non-unique values are found?
Thx for considering :-)
-Glen Parker
Since PITR works well, my use of pg_dump has shifted. Rather than using
it as a backup tool, I now use it as a snapshotting tool. At the end of
each month we do an ASCII dump to keep around, so if we ever need to,
we can see the data as it was any number of months or years ago. Not a
backup at all, just a raw data archive.
These archives do not need to hold all our data; for example, system
logs would be useless later. There is also no reason to include
indexes. Ignoring extraneous tables and indexes is a great way to keep
the archive small and keep the time to restore as low as possible.
So, it would be great if pg_dump could accept some sort of argument to
make it simply not dump certain types of objects. Indexes, views,
functions, etc.
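For what it's worth, later pg_dump releases grew options that cover much
of this (none of them existed at the time of this thread). Indexes,
constraints, and triggers all land in the "post-data" section, so
skipping that section drops them; `-T` excludes whole tables by pattern.
Database and table names below are examples:

```shell
# Dump schema (pre-data) plus rows, but skip post-data objects
# such as indexes, constraints, and triggers (pg_dump 9.2+)
pg_dump --section=pre-data --section=data mydb > archive.sql

# Leave extraneous tables out entirely, e.g. system logs
pg_dump -T 'syslog*' -T audit_log mydb > archive.sql
```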
Thx for considering :-)
-Glen Parker
One issue would be that even disabled indexes would need to be updated
when there are new rows. If you don't update the index when it's
disabled, then re-enabling will essentially need to rebuild the index.
That's what I had in mind. You could just as easily blow away the index
file(s). It's just that I don't want it to toss the index *definition*.
To continue updating such an index would completely negate the
benefit of disabling it!
-Glen
I assume that's what he wants. However, it's not immediately clear that
there's a sufficient use case for this to justify the extra apparatus
compared to just DROP INDEX (and recreate it later).
regards, tom lane
CLUSTER is the equivalent of VACUUM FULL, with the addition that it
orders the data in the table in the order of the index you specify.
VACUUM FULL doesn't follow any particular order for the data.
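So, for the follow-up question above, the two are interchangeable as
far as reclaiming dead space goes; only the resulting row order differs
(names are placeholders, CLUSTER syntax as in 8.1):

```sql
-- Both rewrite the table and return dead-tuple space to the filesystem:
VACUUM FULL big_table;                   -- compacts, keeps no particular order
CLUSTER big_table_key_idx ON big_table;  -- compacts AND sorts by the index
```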
-Bill
I hope I can make this clear somehow... Anyway... This all involves PG
8.1.4 on a 64-bit FC5 box.
Select version() says "PostgreSQL 8.1.4 on x86_64-redhat-linux-gnu,
compiled by GCC x86_64-redhat-linux-gcc (GCC) 4.1.0 20060304 (Red Hat
4.1.0-3)".
I guess the best question I can see is, under what circumstances is the
directory name in pg_tablespace actually used?
I have a scenario where I want to restore from a PITR backup, into an
alternate location on the same machine it came from, while the original
database is still up and running. I have one alternate table space.
It goes like this. First I expand the base archive into an alternate
location, then expand the table space archive(s) into alternate
location(s). Then I recreate the links under pg_tblspc. I then fiddle
a little bit with config files and run postgres on the new alternate
location. Everything goes fine: the database rolls forward, and then
postgres quits (because I give it an SQL file for stdin). Great.
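The procedure above, sketched with placeholder paths and a made-up
tablespace OID (1234); the original cluster keeps running out of its
own data directory the whole time:

```shell
# Expand the base archive, then each tablespace archive, into
# alternate locations
tar -xf base_backup.tar  -C /restore/pgdata
tar -xf tblspc_1234.tar  -C /restore/tblspc_alt

# Re-point the tablespace symlink at the alternate copy
rm /restore/pgdata/pg_tblspc/1234
ln -s /restore/tblspc_alt /restore/pgdata/pg_tblspc/1234

# Adjust port/paths in postgresql.conf, set up recovery.conf,
# then run the server against the alternate location
postgres -D /restore/pgdata < commands.sql
```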
But now I have a problem. What if I move objects from the main
tablespace to the alternate one, such as indexes, between the time of
the backup and the restore? During the restore/recovery, the
pg_tablespace table is out of date. If the tablespace directory listed
there was used in copying files, I'd have a big fat mess involving a
badly broken production database.
Hopefully that all makes sense...
-Glen
It isn't used (except by pg_dumpall) ... what counts is where the
symlink in $PGDATA/pg_tblspc points.
> I have a scenario where I want to restore from a PITR backup, into an
> alternate location on the same machine it came from, while the original
> database is still up and running. I have one alternate table space.
You can probably make this work if you don't issue any CREATE TABLESPACE
commands while PITR logging is active, but you'll want to test your
procedures pretty carefully.
regards, tom lane
That's what I thought, and after your message, I went ahead with it and
had no problems. Thx, Tom.
Well, you can just pg_dump the index definition to a file. What more
would someone want?
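Concretely, the definition can also be pulled straight from the system
catalogs with pg_get_indexdef(), which returns the full CREATE INDEX
statement ready to replay after the bulk load (index and table names
here are examples):

```sql
-- Save the definition before dropping
SELECT pg_get_indexdef(indexrelid)
FROM pg_index
WHERE indexrelid = 'big_table_key_idx'::regclass;

DROP INDEX big_table_key_idx;
-- ... bulk update/load ...
-- then re-run the saved CREATE INDEX statement
```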
--
Bruce Momjian br...@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +