These are the parameters currently passed on the command line to start the master and segments in GPDB. I looked into how each one is used to see whether we can eliminate the need to pass it on the command line, which would simplify starting the cluster. Let's look at each of them separately.
gp_dbid: This is the unique identifier for each postgres instance in a Greenplum cluster. The value comes from the dbid recorded in the gp_segment_configuration table on the master. Currently gp_dbid is not used on segments at all, except for one recently introduced dependency: it is used to generate a unique tablespace path for each instance, primary and mirror included. If we can find some other way to achieve the same, the need to pass gp_dbid around can be eliminated completely. One option is to use the contentid in the tablespace path instead of the dbid, but that only makes the path unique across segments, not between the primary and mirror of a pair. So if anyone has ideas for generating unique names without the dbid, I would love to hear them. Another option is to add 'p' and 'm', but that will not work in the future with multiple mirrors. Upstream suffers from this problem as well: tablespace names can collide between a primary and its mirror.
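To make the uniqueness argument concrete, here is a minimal sketch that builds tablespace paths from different keys and reports collisions. The path layout, the `LOCATION` value, and the sample rows are assumptions made for illustration; they are not the actual GPDB layout.

```python
from collections import Counter

# Hypothetical (dbid, content, role) rows, shaped like the matching columns
# of gp_segment_configuration for a two-segment cluster with one mirror each.
INSTANCES = [
    (2, 0, "p"), (3, 1, "p"),
    (4, 0, "m"), (5, 1, "m"),
]

LOCATION = "/data/tblspc"  # hypothetical tablespace location


def tablespace_path(location, key):
    # Assumed layout: one subdirectory per key under the tablespace location.
    return f"{location}/{key}"


def colliding_paths(paths):
    """Return the sorted paths that are generated more than once."""
    counts = Counter(paths)
    return sorted(path for path, n in counts.items() if n > 1)


# Keyed by dbid: every instance gets its own directory.
by_dbid = [tablespace_path(LOCATION, dbid) for dbid, _, _ in INSTANCES]

# Keyed by contentid: primary and mirror of a pair share a directory.
by_content = [tablespace_path(LOCATION, content) for _, content, _ in INSTANCES]

# Keyed by contentid plus role: unique only while each pair has one mirror.
by_content_role = [
    tablespace_path(LOCATION, f"{content}{role}") for _, content, role in INSTANCES
]
```

With these rows, `colliding_paths(by_content)` reports both content directories, while the dbid and content-plus-role keys report none. Adding a second mirror for content 0 makes the content-plus-role key collide too, which is the multiple-mirrors concern raised above.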
gp_contentid: This is the unique identifier for a primary-mirror pair. It is used on segments and returned as part of results to identify the executing segment or the segment where a tuple resides. Since it is the same for primary and mirror and never changes during the lifetime of the segment, it can simply be recorded in postgresql.conf at segment creation and never needs to be passed on later starts. I tried this and it works fine. Any parameter that differs between primary and mirror can't be recorded in postgresql.conf, because the file gets copied from primary to mirror during mirror initialization. For this parameter, that copy is exactly what keeps the value consistent across primary and mirror.
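A minimal sketch of the "record once at creation" idea, assuming the value is stored as a plain `gp_contentid = N` line in postgresql.conf. The helper names `read_contentid` and `record_contentid` are hypothetical and are not part of any GPDB tool.

```python
import re
from pathlib import Path

# Matches an active (uncommented) gp_contentid setting.
_SETTING = re.compile(r"^\s*gp_contentid\s*=\s*(-?\d+)", re.MULTILINE)


def read_contentid(conf_path):
    """Return the last gp_contentid set in the file, or None if absent."""
    matches = _SETTING.findall(Path(conf_path).read_text())
    return int(matches[-1]) if matches else None


def record_contentid(conf_path, contentid):
    """Append gp_contentid to the file unless a value is already recorded."""
    if read_contentid(conf_path) is None:
        with open(conf_path, "a") as conf:
            conf.write(f"gp_contentid = {contentid}\n")
```

Because the mirror's postgresql.conf starts as a copy of the primary's, a value written this way carries over to the mirror without any extra step, which is why the approach only suits parameters that are identical for both.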
gp_num_contents_in_cluster: This is the number of primaries in the system and is used for hashing/distribution. So it is useful, but I feel it can easily be dispatched to segments as part of query execution instead of being a command-line argument at segment start. The master can cache the value, fetching it once on startup from the catalog table gp_segment_configuration. gpexpand currently restarts the master, so this should also just work when more primaries are added.
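As a rough sketch of the caching idea, the master could derive the count once from the segment configuration rows and reuse it afterwards. The `NumContentsCache` class and the `fetch_rows` callable are hypothetical; the rows are assumed to be (dbid, content, role) tuples, with the master at content -1.

```python
def count_contents(rows):
    """Count distinct segment content IDs, skipping the master (content -1)."""
    return len({content for _, content, _ in rows if content >= 0})


class NumContentsCache:
    """Fetch the number of contents once and reuse it until restart."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # callable returning configuration rows
        self._value = None

    def get(self):
        if self._value is None:
            # Only the first call reads the catalog.
            self._value = count_contents(self._fetch_rows())
        return self._value
```

Since the cache lives only as long as the process, a master restart after expansion would naturally pick up the new count.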
So, with those changes the command line for starting GPDB postgres instances can be simplified. Does it eliminate the need to start the master once in utility mode during gpstart to fetch the segment info? No, because we still need to know the port and data directory to start the segments. It is possible, though, to modify the gpstart flow to start the master, get the info, start the segments, and then, instead of restarting the master in dispatch mode, use some UDF to switch the already-running master into dispatch mode.
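The proposed flow can be sketched as follows. Everything here is hypothetical: `FakeMaster` is a stand-in that only records the order of steps, and `promote_to_dispatch` represents the UDF that does not exist yet. None of this reflects current gpstart code.

```python
class FakeMaster:
    """Stand-in for the master; it only records which steps were taken."""

    def __init__(self, segments):
        self.segments = segments
        self.events = []

    def start_utility(self):
        self.events.append("master: start in utility mode")

    def fetch_segments(self):
        self.events.append("master: read segment info")
        return list(self.segments)

    def promote_to_dispatch(self):
        # Hypothetical UDF call replacing the master restart.
        self.events.append("master: switch to dispatch mode")


def start_cluster(master, start_segment):
    """Start the master once, start segments, then switch modes in place."""
    master.start_utility()
    for seg in master.fetch_segments():
        # Port and data directory are still required to start each segment.
        start_segment(seg["port"], seg["datadir"])
    master.promote_to_dispatch()
```

The point of the sketch is that the master is started exactly once, and the segment start only needs the port and data directory read from the master.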
(Mostly wished to have the findings recorded here for reference.)