As we switch from the current file replication to WAL-based replication,
and remove all the persistent table stuff and more, let's revisit the
filespaces feature. Keeping it working would take some work, so now is a
good time to consider how we'd want it to work, or whether we could
revert it all to the way it is in upstream.
Here's what I propose we do:
* Remove the concept of filespaces.
* Revert tablespaces to the way they are in upstream.
* Cherry-pick commit 16d8e594ac from PostgreSQL 9.2 to remove the
spclocation field from pg_tablespace
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=16d8e594acd96661267cb7897834f9cba51a2ffd).
With these changes, we'll have one feature less to maintain. I believe
everything users currently do with filespaces can be done with the
unmodified upstream tablespaces feature, although the UI and the details
will be different.
If I understand correctly, the idea behind filespaces has been that you
can specify a different "mount point" on each server, for each
tablespace. With the upstream commit that removes
pg_tablespace.spclocation, you can do that with plain tablespaces,
without the concept of filespaces. The mount point for each tablespace
is stored as a symlink in the data directory (pg_tblspc/<tablespace
OID>), and it can be different on each server.
To make that nicer to use, we may need some embellishments, like a UDF
to set the symlink, so that you don't need to ssh into each server and
set it up manually. Or some Python scripts, or something along those
lines.
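As a rough idea of what such a script could do, here is a minimal Python sketch. It assumes the upstream layout, where each tablespace is a symlink named after its OID under pg_tblspc in the data directory. The function name is made up for illustration, and it assumes the server is shut down and the tablespace's files have already been copied to the new location.

```python
import os


def set_tablespace_location(datadir: str, tablespace_oid: int, new_location: str) -> None:
    """Repoint <datadir>/pg_tblspc/<oid> at new_location.

    Hypothetical helper, not an existing Greenplum tool. Assumes the server
    is stopped and the tablespace contents already live at new_location.
    """
    if not os.path.isabs(new_location):
        raise ValueError("tablespace location must be an absolute path")
    if not os.path.isdir(new_location):
        raise FileNotFoundError(new_location)

    link_path = os.path.join(datadir, "pg_tblspc", str(tablespace_oid))
    # Only repoint tablespaces that already exist in this data directory.
    if not os.path.islink(link_path):
        raise FileNotFoundError(f"no tablespace symlink at {link_path}")

    # Create the new symlink under a temporary name, then rename it over the
    # old one, so the link is never missing if we crash halfway through.
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_location, tmp_path)
    os.replace(tmp_path, link_path)
```

A UDF version would do the same thing from inside the server, but then it would have to deal with the tablespace possibly being in use.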
Currently, even the location of the data directory on each segment is
stored in the master's pg_filespace_entry table. Do we need that? A lot
of the management tools currently depend on it, so I guess we do. I
propose that as we remove the pg_filespace_entry table, we add a field
to gp_segment_configuration for the data directory's path instead. But
I don't think the master needs to have the paths of every tablespace on
every segment. When you know the path to the main data directory, you
can look at the tablespace's symlink inside it to see where it points.
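For instance, a management tool that knows a segment's data directory could recover that segment's tablespace locations with something like the following Python sketch. It assumes the upstream pg_tblspc/<OID> symlink layout; the function name is hypothetical, and the tool would still have to run it on each segment host (over ssh or similar).

```python
import os


def tablespace_locations(datadir: str) -> dict[int, str]:
    """Map tablespace OID -> symlink target for one data directory.

    Hypothetical helper; assumes the upstream layout where each tablespace
    is a symlink named after its OID under <datadir>/pg_tblspc.
    """
    tblspc_dir = os.path.join(datadir, "pg_tblspc")
    locations: dict[int, str] = {}
    if not os.path.isdir(tblspc_dir):
        return locations
    for entry in os.scandir(tblspc_dir):
        # Only OID-named symlinks are tablespaces; skip anything else.
        if entry.name.isdigit() and entry.is_symlink():
            locations[int(entry.name)] = os.readlink(entry.path)
    return locations
```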
One question is what pg_dumpall should do when dumping from an old
server that uses filespaces. If the concept of filespaces goes away
altogether, we can't restore them the way they were. Maybe we can find
some semi-intelligent mapping from filespaces+tablespaces to just
tablespaces, or maybe ask the user for more information on how to map
them. The same applies to pg_upgrade.
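One possible shape for that mapping, as a Python sketch: combine a default guess with user-supplied overrides. The input shapes and the default rule (a per-tablespace subdirectory of the old filespace location, so tablespaces that shared a filespace don't end up in the same directory) are assumptions for illustration, not an existing pg_dumpall or pg_upgrade interface.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical input shapes:
#   filespace_paths:      filespace name -> {segment dbid -> location on that segment}
#   tablespace_filespace: tablespace name -> filespace name it lives in
#   overrides:            optional user input {tablespace name -> {dbid -> location}}


def map_filespaces_to_tablespaces(
    filespace_paths: Mapping[str, Mapping[int, str]],
    tablespace_filespace: Mapping[str, str],
    overrides: Optional[Mapping[str, Mapping[int, str]]] = None,
) -> Dict[str, Dict[int, str]]:
    """Return tablespace name -> {dbid -> tablespace location}."""
    overrides = overrides or {}
    result: Dict[str, Dict[int, str]] = {}
    for tblspc, fs in tablespace_filespace.items():
        per_segment: Dict[int, str] = {}
        for dbid, fs_path in filespace_paths[fs].items():
            user_choice = overrides.get(tblspc, {}).get(dbid)
            # Default guess: a subdirectory of the old filespace location,
            # named after the tablespace; the user's answer wins if given.
            per_segment[dbid] = user_choice or os.path.join(fs_path, tblspc)
        result[tblspc] = per_segment
    return result
```

Usage: with filespace fs1 at /data/m/fs1 on dbid 1 and /data/s1/fs1 on dbid 2, holding tablespaces ts_a and ts_b, and an override placing ts_b at /mnt/ssd/ts_b on dbid 2, ts_a maps to {1: "/data/m/fs1/ts_a", 2: "/data/s1/fs1/ts_a"} and ts_b to {1: "/data/m/fs1/ts_b", 2: "/mnt/ssd/ts_b"}.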
Thoughts?
- Heikki