For me, my main struggle with gp_segment_configuration is when dealing with coordinator segment replicas/copies. The gp_segment_configuration catalog always needs to be updated by first promoting and starting up the coordinator segment replica (so that the catalog table can be manually updated). This locks away some useful features that a coordinator segment replica could enable... one major thing being hot standby dispatch (e.g. the ability to have a full replica cluster up in hot standby recovery mode and dispatch read-only queries to it).
So my proposal would be the following:
1. Convert gp_segment_configuration into a view that calls an internal catalog function gp_get_segment_configuration(). The main gp_segment_configuration catalog table would now be called gp_internal_segment_configuration.
2. Move the current gp_segment_configuration choosing logic from getCdbComponentInfo() into the new gp_get_segment_configuration() so that it'd look like this:
gp_get_segment_configuration()
=> readGpSegConfigFromCatalog() // if (IsTransactionState())
=> readGpSegConfigFromFTSFiles() // else
Make getCdbComponentInfo() use the new gp_get_segment_configuration() (a rough sketch of this routing follows below).
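To make point 2 concrete, here is a minimal sketch (not the actual implementation) of what that routing could look like once it is hoisted out of getCdbComponentInfo(). The reader functions and IsTransactionState() are the ones mentioned above; GpSegConfigEntry is the struct those readers return today, and the helper name and its exact placement are assumptions.

/*
 * Sketch only: the segment configuration "choosing" logic shared by the
 * dispatcher and the proposed gp_get_segment_configuration() catalog function.
 * The helper name is hypothetical.
 */
static GpSegConfigEntry *
gp_read_segment_configuration(int *total_dbs)
{
    if (IsTransactionState())
    {
        /* Normal backends: read the catalog table directly. */
        return readGpSegConfigFromCatalog(total_dbs);
    }
    else
    {
        /* No catalog access available: fall back to the FTS dump files. */
        return readGpSegConfigFromFTSFiles(total_dbs);
    }
}

gp_get_segment_configuration() would then just wrap this helper as a set-returning function, and the gp_segment_configuration view would be a thin SELECT over it.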
After implementing the above, we would be at functional parity, but gp_segment_configuration could now be extended much more easily. For example, we could later on have:
gp_get_segment_configuration()
=> readGpSegConfigFromFlatFile() // if (EnableHotStandby)
=> readGpSegConfigFromCatalog() // else if (IsTransactionState())
=> readGpSegConfigFromFTSFiles() // else
The above would allow the hot standby dispatcher to create a hot standby cdbgang according to some static flat file, similar to how readGpSegConfigFromFTSFiles() works today... and the user would be able to SELECT from gp_segment_configuration and get an accurate view of their hot standby cluster. This is just one use case, but it could be further extended (more conditional cases or a GUC setting) to help other use cases that struggle due to the limitations of gp_segment_configuration.
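As an illustration, readGpSegConfigFromFlatFile() could be as simple as the sketch below. The file format (one whitespace-separated line per segment) and every name in it are assumptions made for the example, loosely modeled on what readGpSegConfigFromFTSFiles() consumes; a local struct keeps the sketch self-contained where the real code would fill GpSegConfigEntry instead.

/*
 * Hypothetical sketch: read a static segment configuration from a flat file,
 * one line per segment:
 *   dbid content role preferred_role mode status port hostname address datadir
 * Error handling and palloc-based allocation are elided for brevity.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct FlatFileSegConfig
{
    int  dbid;
    int  content;
    char role;
    char preferred_role;
    char mode;
    char status;
    int  port;
    char hostname[256];
    char address[256];
    char datadir[1024];
} FlatFileSegConfig;

static FlatFileSegConfig *
readGpSegConfigFromFlatFile(const char *path, int *total_dbs)
{
    FILE              *fp = fopen(path, "r");
    FlatFileSegConfig *configs = NULL;
    int                n = 0;

    if (fp == NULL)
        return NULL;

    for (;;)
    {
        FlatFileSegConfig entry;

        if (fscanf(fp, "%d %d %c %c %c %c %d %255s %255s %1023s",
                   &entry.dbid, &entry.content, &entry.role,
                   &entry.preferred_role, &entry.mode, &entry.status,
                   &entry.port, entry.hostname, entry.address,
                   entry.datadir) != 10)
            break;

        configs = realloc(configs, (n + 1) * sizeof(FlatFileSegConfig));
        configs[n++] = entry;
    }

    fclose(fp);
    *total_dbs = n;
    return configs;
}

Since the replica cluster's layout is essentially static, the file could be written once by whatever tooling sets up the hot standby cluster.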
Previous discussion thread reference:
https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/_zIdZlnZKK8/m/4f3TIMBMCgAJ
Thanks Jimmy.
So far, the discussion has been centered around mirrored deployments, with the auto-failover scenario in mind. I don't have direct comments on that.
I'd like to bring up the related scenarios of what the impact would be on a mirrorless deployment, and on the historical table `gp_configuration_history`.
Here are my questions:
- Under mirrorless deployments, where FTS is disabled, will `gp_segment_configuration` still be readable?
- Since we capture the `gp_segment_configuration` differently, what's the impact to the `gp_configuration_history` table?
Thanks,
Shine
> - Under mirrorless deployments, where FTS is disabled, will `gp_segment_configuration` still be readable?
There shouldn't be a difference. Querying gp_segment_configuration in a mirrorless deployment would behave the same as in a mirrored deployment: the gp_segment_configuration view would simply return the contents of the new gp_internal_segment_configuration catalog table (which would be populated by gpinitsystem, same as how it works today).
> - Since we capture the `gp_segment_configuration` differently, what's the impact to the `gp_configuration_history` table?
There should be no impact to the `gp_configuration_history` table. Its purpose would still be the same: to record updates from FTS for regular GPDB high availability events. If necessary, other things could INSERT into the table as well, since it's just a table logging high-availability-relevant changes... but I imagine non-FTS gp_segment_configuration changes would probably be logged elsewhere if that's even needed (e.g. I think pg_auto_failover has its own events table). If needed/desired, we could also extend this proposal to make `gp_configuration_history` into a view too (so that, for example, we could access the pg_auto_failover events table as a foreign server table)... but I don't see that being a requirement, whereas gp_segment_configuration being a table has made starting some new features harder.
This is great to know. Thanks.
As stated in the proposal, it could be done by simply checking for EnableHotStandby and maybe recovery mode state. Current dispatcher logic already checks for transaction state so we'd just be extending the logic with more cases. Or we could trivially control/force it with a new reloadable GUC.
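A minimal sketch of that check, assuming a hypothetical reloadable GUC named gp_hot_standby_dispatch (EnableHotStandby and RecoveryInProgress() already exist in PostgreSQL today):

extern bool gp_hot_standby_dispatch;   /* hypothetical GUC variable */

/* Sketch only: should the dispatcher build its gang from the flat-file config? */
static bool
useStandbySegConfig(void)
{
    return gp_hot_standby_dispatch && EnableHotStandby && RecoveryInProgress();
}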
In the replica cluster scenario, the gp_segment_configuration info could be stored in a flat file similar to the FTS twophase file. The replica cluster info is generally static and would essentially never change (at least until gpexpand is supported, and even then you'd just add an entry to the flat file).
> - Clearly articulate the functional APIs (add primary, add mirror, modify config and such ....) required to be exposed by underlying Segment Configuration implementations (Catalog, FlatFile, FTSFile, ...)
We already have catalog functions to do this:
gp_add_coordinator_standby
gp_add_segment
gp_add_segment_primary
gp_add_segment_mirror
gp_remove_coordinator_standby
gp_remove_segment
gp_remove_segment_mirror
They can be updated later on to satisfy any new logic where required.
Hey folks,
I realize I am late to this party. Let me share my thoughts nonetheless.
(1) Having a pluggable component fully manage the cluster configuration
throughout the uptime of the cluster is a challenging ask, with complete APIs
etc.
I thought about this a bit, and I feel that at this stage of the 7 dev cycle we should
not introduce this change. There is a fair bit to do to get it right, and I
don't think we have the cycles to develop and test it. The server backend won't be
the only thing changing; utilities (and their tests) will have to change and be
tested too (and, as mentioned, KBs, docs, etc.).
I think we should do it in the best way possible and not compromise due to the
amount of time we have left.
(2) There is quite a bit of code around (gangs, dispatch, etc.) which lies in
critical paths. The gang and dispatch code today relies on a cache,
cdb_component_dbs, which is currently invalidated via FTS and even gpexpand! Now
we would need to add interface routines around these as well.
else if ((cdb_component_dbs->fts_version != ftsVersion ||
          cdb_component_dbs->expand_version != expandVersion))
{
    ...
    cdbcomponent_destroyCdbComponents();
    cdb_component_dbs = getCdbComponentInfo();
}
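For what it's worth, a rough sketch of what such interface routines could look like: a small provider vtable that the cache code above consults instead of calling the read functions directly. Every name below is illustrative, not existing code.

/*
 * Illustrative only: a pluggable segment-configuration provider. The cache
 * code would ask the active provider whether cdb_component_dbs is stale and,
 * if so, rebuild it through the provider's read_config callback.
 */
typedef struct GpSegConfigEntry GpSegConfigEntry;   /* real struct lives in cdbutil.h */

typedef struct SegConfigProvider
{
    /* Read the full segment configuration; caller owns the returned array. */
    GpSegConfigEntry *(*read_config) (int *total_dbs);

    /* Report whether a cached copy tagged with these versions is stale. */
    bool (*needs_refresh) (int cached_fts_version, int cached_expand_version);
} SegConfigProvider;

/* Chosen at startup: catalog-backed, flat-file-backed, etcd-backed, ... */
extern SegConfigProvider *active_seg_config_provider;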
(3) One other way to tackle the problem of cluster config maintenance is to
completely decouple it from the server -> just have one provider of truth: etcd
(or similar).
Instead of having multiple ways to query/update cluster configuration, there
would only be one way (and there should really be one way).
Utilities can directly update/read it and so can the server
backends.
(4) Possible temporary compromise to support read replica clusters:
I think the main requirement for a read replica cluster and hot standby dispatch
is that a mechanism is needed to seed a fresh gp_segment_configuration that
departs from the primary cluster's. What if, through a utility, we could modify the
gp_segment_configuration catalog? The main challenge of doing that though is
that gp_segment_configuration is WAL replicated and any updates in the primary
cluster will mess with the read replica cluster.
What if we made gp_segment_configuration UNLOGGED? Making it so
might mean a bit of work for gpactivatestandby, but might help us here.
I can't quite recall what Jimmy did here in the code to read the cluster config
from the catalog in one of our POCs for hot standby dispatch, but maybe it
looked like this:
if (guc)
    configs = readGpSegConfigFromFile(&total_dbs);
else if (IsTransactionState())
    configs = readGpSegConfigFromCatalog(&total_dbs);
else
    configs = readGpSegConfigFromFTSFiles(&total_dbs);
I think this is the simplest thing to do at this point for the 7X release, using
a flat file as an integration mechanism to tackle this one need. It can also be
done post-release as a new feature.
We also would not want to read the segment config from the catalog every time in the
dispatch code, especially for OLTP queries (see all callers of
readGpSegConfigFromCatalog()). The segment config cache exists for this purpose
today and we can continue to use it. So we would need a way to invalidate these
backend-local caches. That should be part of the framework code that would call
into the provider APIs.
(5) gp_configuration_history is a pretty critical entity and would need
interfaces around it as well. It could be made into a view now, too.