gp_dbid, gp_contentid and gp_num_contents_in_cluster


Ashwin Agrawal

unread,
May 17, 2018, 1:49:32 PM5/17/18
to Greenplum Developers

These are the params currently passed on the command line to start the master and segments in GPDB. I was looking into the usage of each to see if we can eliminate the need to pass them on the command line; this would simplify starting the cluster. Let's look at each separately.


gp_dbid: This provides a unique identifier for each postgres instance in the Greenplum cluster. The number comes from the value entered in the gp_segment_configuration table on the master. Currently gp_dbid has no usage on segments at all except the recently introduced dependency for generating the tablespace path: it is used to generate a unique tablespace path for each instance, including primary and mirror. If we can find some other way to achieve the same, the need to pass around gp_dbid can be completely eliminated. One way could be to use contentid in the tablespace path instead of dbid, but that only makes it unique across segments, not within a primary/mirror pair. So if you have any ideas on how to generate unique names without dbid, I would love to hear them. One option could be to add 'p' and 'm', but that will not work in the future with multiple mirrors. Upstream suffers from this tablespace-name collision problem for primary and mirror as well.
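To make the collision concrete, here is a minimal Python sketch (the path format and the four-instance layout are hypothetical, only loosely modeled on how a per-instance tablespace location could be composed): a dbid-based suffix yields one path per instance, while a contentid-based suffix collides for each primary/mirror pair.

```python
# Hypothetical sketch of per-instance tablespace path suffixes.
# dbid is unique per postgres instance; contentid is shared by a
# primary/mirror pair, so a contentid-only suffix collides.

def tablespace_path(base, suffix):
    """Compose a tablespace location for one instance (illustrative only)."""
    return f"{base}/GPDB_tblspc_{suffix}"

# A two-segment cluster: each content has a primary and a mirror.
instances = [
    {"dbid": 2, "contentid": 0, "role": "p"},
    {"dbid": 4, "contentid": 0, "role": "m"},
    {"dbid": 3, "contentid": 1, "role": "p"},
    {"dbid": 5, "contentid": 1, "role": "m"},
]

dbid_paths = {tablespace_path("/ts", i["dbid"]) for i in instances}
content_paths = {tablespace_path("/ts", i["contentid"]) for i in instances}

print(len(dbid_paths))     # 4 distinct paths: no collision
print(len(content_paths))  # 2 distinct paths: each primary/mirror pair collides
```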

gp_contentid: This provides a unique identifier for each primary/mirror pair. It is used on segments and returned as part of results to identify the executing segment, i.e. where a tuple resides. Since it is the same for primary and mirror and never changes in the lifetime of a segment, it can easily be recorded in the postgresql.conf file during segment creation and never needs to be passed at later starts. I tried this and it works perfectly fine. Any parameter which varies between primary and mirror can't be recorded in postgresql.conf, as the file gets copied from primary to mirror during mirror initialization. For this param, though, it is a perfectly good way to keep the value consistent across primary and mirror.
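The record-once idea can be sketched as below (a hypothetical helper, not actual GPDB code; the GUC name matches the thread, everything else is illustrative). Because the value is identical for a primary and its mirror, the file can be copied verbatim by pg_basebackup without any later fix-up:

```python
# Hypothetical sketch: record gp_contentid in postgresql.conf once, at
# segment creation time, instead of passing it on the command line.
import os
import tempfile

def record_contentid(datadir, contentid):
    """Append the gp_contentid setting to the instance's postgresql.conf."""
    conf = os.path.join(datadir, "postgresql.conf")
    with open(conf, "a") as f:
        f.write(f"gp_contentid = {contentid}\n")

datadir = tempfile.mkdtemp()
open(os.path.join(datadir, "postgresql.conf"), "w").close()  # stand-in conf
record_contentid(datadir, 0)

print(open(os.path.join(datadir, "postgresql.conf")).read().strip())
# gp_contentid = 0
```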

gp_num_contents_in_cluster: This provides the number of primaries in the system and is used for hashing/distribution. So this seems useful, but I feel it can easily be dispatched to segments as part of query execution instead of being a command-line argument at segment start. The master can cache this value by fetching it once on startup from the catalog table gp_segment_configuration. gpexpand restarts the master today, so this should just work when more primaries are added.
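A sketch of what the master-side computation could look like, with gp_segment_configuration rows reduced to hypothetical (dbid, content, role) tuples (content -1 being the master, role 'p' a primary):

```python
# Hypothetical sketch: derive gp_num_contents_in_cluster from
# gp_segment_configuration at master startup instead of a startup flag.
# Rows are (dbid, content, role); content -1 is the master.

def num_contents(segment_configuration):
    """Count distinct primary contents, excluding the master (content -1)."""
    return len({content for (dbid, content, role) in segment_configuration
                if content >= 0 and role == "p"})

rows = [
    (1, -1, "p"),            # master
    (2, 0, "p"), (4, 0, "m"),
    (3, 1, "p"), (5, 1, "m"),
]

print(num_contents(rows))  # 2
```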

So, with those changes the command line for starting GPDB postgres instances can be simplified. Does it eliminate the need to start the master once in utility mode during gpstart to fetch the segment info? NO, as we still need to know the port and data directory to start the segments. But it is now possible to modify the gpstart flow to start the master, get the info, start the segments, and then, instead of restarting the master in dispatch mode, use some UDF to move the already-started master into dispatch mode, and be done.

(Mostly wished to have the findings recorded here for reference.)

Xin Zhang

unread,
May 17, 2018, 5:24:53 PM5/17/18
to Ashwin Agrawal, Greenplum Developers
If `gp_contentid` can be stored in `postgresql.conf`, then `gp_dbid` can also be stored in `postgresql.conf`. That's a unique number identifying which segment we are talking to within the cluster.

If we want to remove `gp_dbid`, then we can follow the path of adding `p` and `m` with `gp_contentid`. If we want to support multiple mirrors, then we can have `m1`, `m2`, etc.

To conclude: if we can add all the id information to `postgresql.conf`, then we don't need it on the command line.

Thanks,
Shin
--
Shin
Pivotal | Sr. Principal Software Eng, Data R&D

Oak Barrett

unread,
May 17, 2018, 5:44:06 PM5/17/18
to Xin Zhang, Ashwin Agrawal, Greenplum Developers
Shin, 

Can you provide an example of what you are referring to when you mentioned the following, as I am trying to understand if it would affect the naming of our backup files:

"... can follow the path of adding `p` and `m` with `gp_contentid`. If we want to support multiple mirrors, then we can have `m1`, `m2`, etc."


Specific to the data files which gpbackup creates, we use the following naming convention:

gpbackup_0_20180516141302_737638.gz

where the _0_ in this example is the content id of the data backed up, regardless of whether the segment was the preferred primary or a mirror which was promoted to primary.
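Since the name embeds the content id rather than the dbid, it stays stable across failovers. A small sketch of pulling the content id back out of such a name (the regex is an assumption inferred from the single example filename above, not gpbackup's actual parser):

```python
# Hypothetical sketch: extract the content id from a gpbackup data file
# name of the form gpbackup_<contentid>_<timestamp>_<oid>.gz.
import re

def backup_contentid(filename):
    m = re.match(r"gpbackup_(-?\d+)_\d+_\d+\.gz$", filename)
    if m is None:
        raise ValueError(f"not a gpbackup data file: {filename}")
    return int(m.group(1))

print(backup_contentid("gpbackup_0_20180516141302_737638.gz"))  # 0
```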

Thanks,
oak 


Max Yang

unread,
May 17, 2018, 9:23:16 PM5/17/18
to Xin Zhang, Ashwin Agrawal, Greenplum Developers
There is already a file called gp_dbid that records the gp_dbid information.
So it seems we don't need to record it in postgresql.conf or pass it on the command line?
--
Best Regards,
Max

Ashwin Agrawal

unread,
May 17, 2018, 9:26:05 PM5/17/18
to Xin Zhang, Greenplum Developers
On Thu, May 17, 2018 at 2:24 PM, Xin Zhang <xzh...@pivotal.io> wrote:
If `gp_contentid` can be stored in `postgresql.conf`, then `gp_dbid` can also be stored in `postgresql.conf`. That's a unique number identifying which segment we are talking to within the cluster.

gp_dbid can be stored, but it is different for primary and mirror, and postgresql.conf gets copied from primary to mirror as part of pg_basebackup and the like. So I do not wish to introduce anything which can introduce inconsistencies and require manipulating the file later.

If we want to remove `gp_dbid`, then we can follow the path of adding `p` and `m` with `gp_contentid`. If we want to support multiple mirrors, then we can have `m1`, `m2`, etc.

How would a mirror know whether it is 1 or 2?

Ashwin Agrawal

unread,
May 17, 2018, 9:28:27 PM5/17/18
to Oak Barrett, Xin Zhang, Greenplum Developers
On Thu, May 17, 2018 at 2:44 PM, Oak Barrett <obar...@pivotal.io> wrote:
Shin, 

Can you provide an example of what you are referring to when you mentioned the following, as I am trying to understand if it would affect the naming of our backup files:

"... can follow the path of adding `p` and `m` with `gp_contentid`. If we want to support multiple mirrors, then we can have `m1`, `m2`, etc."

Oak, it's in the context of generating unique physical tablespace paths for primaries and mirrors on the filesystem. Backup using content-id is perfectly fine.

Max Yang

unread,
May 17, 2018, 9:31:30 PM5/17/18
to Ashwin Agrawal, Xin Zhang, Greenplum Developers
How about we split role and id into different parts? For example, if m1 is promoted to primary, it is not easy to change from m1 to p1.
--
Best Regards,
Max

Pengzhou Tang

unread,
May 17, 2018, 11:04:20 PM5/17/18
to Ashwin Agrawal, Greenplum Developers

So, with those changes the command line for starting GPDB postgres instances can be simplified. Does it eliminate the need to start the master once in utility mode during gpstart to fetch the segment info? NO, as we still need to know the port and data directory to start the segments. But it is now possible to modify the gpstart flow to start the master, get the info, start the segments, and then, instead of restarting the master in dispatch mode, use some UDF to move the already-started master into dispatch mode, and be done.

So, will the master be started in dispatch mode instead of utility mode the first time? Will initTM() during InitPostgres() be a problem, since it needs all segments up?

Ashwin Agrawal

unread,
May 22, 2018, 3:09:31 PM5/22/18
to Pengzhou Tang, Greenplum Developers
On Thu, May 17, 2018 at 8:04 PM, Pengzhou Tang <pt...@pivotal.io> wrote:



So, with those changes the command line for starting GPDB postgres instances can be simplified. Does it eliminate the need to start the master once in utility mode during gpstart to fetch the segment info? NO, as we still need to know the port and data directory to start the segments. But it is now possible to modify the gpstart flow to start the master, get the info, start the segments, and then, instead of restarting the master in dispatch mode, use some UDF to move the already-started master into dispatch mode, and be done.

So, will the master be started in dispatch mode instead of utility mode the first time? Will initTM() during InitPostgres() be a problem, since it needs all segments up?

What I was proposing was to not directly start the master in dispatch mode, but instead have a UDF which can later move the already-started master into dispatch mode. So, completion of distributed transactions, done today in initTM() on the first distributed query, would instead be performed as part of this UDF.

It's only worth introducing this complexity if we save much time. Our first effort should be toward speeding up start and stop. With the recent PR https://github.com/greenplum-db/gpdb/pull/5015, the time to start the master, fetch catalog information, and shut down is reduced drastically, so restarting the master is not too much of a problem and gives a much cleaner flow: start the master only when it is ready to run distributed user queries.

Ashwin Agrawal

unread,
May 23, 2018, 9:35:35 PM5/23/18
to Max Yang, Xin Zhang, Greenplum Developers
On Thu, May 17, 2018 at 6:22 PM, Max Yang <my...@pivotal.io> wrote:
There is already a file called gp_dbid that records the gp_dbid information.

That file is currently present only on the master, not on any of the primaries or mirrors. I do not fully understand its usage, but it seems related to switching the dbid of the standby after activation: the standby has a different dbid while acting as standby, and after activation its dbid changes to that of the master. So, removing the need to pass around dbid and the dependency on it will also help completely remove this very weird usage of the gp_dbid file.

Pengzhou Tang

unread,
May 23, 2018, 10:36:49 PM5/23/18
to Ashwin Agrawal, Greenplum Developers

What I was proposing was to not directly start the master in dispatch mode, but instead have a UDF which can later move the already-started master into dispatch mode. So, completion of distributed transactions, done today in initTM() on the first distributed query, would instead be performed as part of this UDF.

It's only worth introducing this complexity if we save much time. Our first effort should be toward speeding up start and stop. With the recent PR https://github.com/greenplum-db/gpdb/pull/5015, the time to start the master, fetch catalog information, and shut down is reduced drastically, so restarting the master is not too much of a problem and gives a much cleaner flow: start the master only when it is ready to run distributed user queries.

I see now, thanks. We also notice that gpexpand needs to restart the cluster after adding a machine; maybe this proposal can also help minimize that whole process.

Max Yang

unread,
May 23, 2018, 11:59:01 PM5/23/18
to Ashwin Agrawal, Xin Zhang, Greenplum Developers
Thanks for the explanation. Agreed on removing that gp_dbid file, given how much confusion it causes.
--
Best Regards,
Max

Xin Zhang

unread,
May 24, 2018, 12:13:39 PM5/24/18
to Max Yang, Ashwin Agrawal, Greenplum Developers
Hi Ashwin,
 
If `gp_contentid` can be stored in `postgresql.conf`, then `gp_dbid` can also be stored in `postgresql.conf`. That's a unique number identifying which segment we are talking to within the cluster.

gp_dbid can be stored, but it is different for primary and mirror, and postgresql.conf gets copied from primary to mirror as part of pg_basebackup and the like. So I do not wish to introduce anything which can introduce inconsistencies and require manipulating the file later.

Thanks a lot. You are right, `postgresql.conf` is also copied during `pg_basebackup`. In that case, `gp_dbid` shouldn't be there. As later discussion pointed out, we don't even need it at all.

If we want to remove `gp_dbid`, then we can follow the path of adding `p` and `m` with `gp_contentid`. If we want to support multiple mirrors, then we can have `m1`, `m2`, etc.

How would a mirror know whether it is 1 or 2?

I haven't thought further about how to identify m1 or m2. As we don't support multiple mirrors now, maybe just `p` and `m` are good enough.

Now back to our scenarios without `gp_dbid` and `gp_num_contents_in_cluster`:

- Adding a mirror through pg_basebackup: gp_contentid is enough.

- Create tablespace: need a way to uniquely identify each instance. Maybe the master can add a unique path during dispatch?

- gpstart: gp_contentid is enough.

- backup/restore: gp_contentid is enough.

I see a lot of references to `GpIdentity` of type `GpId`, which stores the `gp_dbid` and `gp_num_contents_in_cluster` startup parameter values. I am wondering what the fundamental MPP design assumptions on those two parameters from `GpIdentity` are, and whether we can totally remove them from the code base.

If we cannot remove them totally, the question is WHEN to initialize them. I can see a lot of benefits in removing them from startup time. As Ashwin pointed out, a UDF can be used to move the master from utility mode to dispatch mode to avoid a master restart.

If we can piggyback on that UDF to also configure the segments with their proper `GpIdentity.dbid` and `GpIdentity.numsegments`, then it's probably doable without impacting much of the existing functionality (including tablespace).

Thoughts?

-- 
Shin
Pivotal | Sr. Principal Software Eng, Data R&D

Ashwin Agrawal

unread,
May 24, 2018, 1:41:15 PM5/24/18
to Xin Zhang, Max Yang, Greenplum Developers
On Thu, May 24, 2018 at 9:12 AM, Xin Zhang <xzh...@pivotal.io> wrote:

If we want to remove `gp_dbid`, then we can follow the path of adding `p` and `m` with `gp_contentid`. If we want to support multiple mirrors, then we can have `m1`, `m2`, etc.

How would a mirror know whether it is 1 or 2?

I haven't thought further about how to identify m1 or m2. As we don't support multiple mirrors now, maybe just `p` and `m` are good enough.

Even adding `p` or `m` doesn't work, as the role of a segment keeps flipping between primary and mirror. Essentially a unique number is needed which sticks with the segment irrespective of its role. Plus, this being an on-disk change, we need to think it through long-term to get it right.

 
I see a lot of references to `GpIdentity` of type `GpId`, which stores the `gp_dbid` and `gp_num_contents_in_cluster` startup parameter values. I am wondering what the fundamental MPP design assumptions on those two parameters from `GpIdentity` are, and whether we can totally remove them from the code base.

I have looked into all the references to gp_dbid and GpIdentity.dbid, and except for tablespace path generation it has no real usage on segments at all.
 

If we cannot remove them totally, the question is WHEN to initialize them. I can see a lot of benefits in removing them from startup time. As Ashwin pointed out, a UDF can be used to move the master from utility mode to dispatch mode to avoid a master restart.

If we can piggyback on that UDF to also configure the segments with their proper `GpIdentity.dbid` and `GpIdentity.numsegments`, then it's probably doable without impacting much of the existing functionality (including tablespace).

numsegments can be dispatched with the query, as it is needed only on primaries and not on mirrors. So the only real hurdle is unique tablespace path generation that remains consistent for the lifetime of a segment irrespective of its role; once that is achieved, removing the dependency on these params is a cakewalk. One possibility could be some file which persists the unique number used in the tablespace path, created when the primary or mirror is created. I am still thinking of better alternatives to avoid collision of tablespace paths for a primary/mirror pair. Ideally, once a solution is found, the same can be proposed upstream as well.
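The "persist a unique number in a file" idea could look something like the sketch below (file name, format, and the use of a random UUID are all made up for illustration): write an identifier into the data directory when the instance is created, whether primary or mirror, and read it back at every start, so the id sticks with the instance regardless of role flips.

```python
# Hypothetical sketch: a per-instance identity file written once at
# creation time and stable across role changes and restarts.
import os
import tempfile
import uuid

ID_FILE = "gp_instance_id"  # hypothetical file name

def assign_instance_id(datadir):
    """Create the identity file once, when the instance is created."""
    path = os.path.join(datadir, ID_FILE)
    if not os.path.exists(path):
        with open(path, "w") as f:
            f.write(uuid.uuid4().hex)

def read_instance_id(datadir):
    with open(os.path.join(datadir, ID_FILE)) as f:
        return f.read().strip()

primary = tempfile.mkdtemp()
mirror = tempfile.mkdtemp()
assign_instance_id(primary)
assign_instance_id(mirror)  # the mirror gets its own id at creation time

# The two instances now have distinct, role-independent identifiers.
print(read_instance_id(primary) != read_instance_id(mirror))  # True
```

Note the id must be assigned to the mirror after pg_basebackup copies the primary's data directory, otherwise the mirror would inherit the primary's file.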

Xin Zhang

unread,
May 24, 2018, 4:34:37 PM5/24/18
to Ashwin Agrawal, Max Yang, Greenplum Developers
Yeah, walking down that path, a unique id is required when a postgres instance joins the GPDB cluster, and that unique id should be persistent with that instance.

However, the unique id doesn't have to be assigned by the master; it could be a GUID.

Again, thinking further, if we can decouple the segment's identity from its content and purpose, we will gain much better flexibility.

I hope I am on the right track.

Thanks,
Shin

Ashwin Agrawal

unread,
May 24, 2018, 4:43:44 PM5/24/18
to Xin Zhang, Max Yang, Greenplum Developers
On Thu, May 24, 2018 at 1:33 PM, Xin Zhang <xzh...@pivotal.io> wrote:
Yeah, walking down that path, a unique id is required when a postgres instance joins the GPDB cluster, and that unique id should be persistent with that instance.

However, the unique id doesn't have to be assigned by the master; it could be a GUID.

Again, thinking further, if we can decouple the segment's identity from its content and purpose, we will gain much better flexibility.

Correct Xin, that's the intent. After discussing more with Asim, I will be exploring the idea of adding a timestamp to the tablespace path, as ideally uniqueness should only be required while creating the tablespace; after that, the path should always be found via the symbolic link from the data directory. So having a `_contentid_timestamp` suffix should give us the uniqueness we need. I haven't looked into the full details yet, so that proposal is pretty much an educated guess as of now.
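A rough sketch of that `_contentid_timestamp` idea (directory naming and helper names are invented; the symlink mimics how a postgres data directory remembers tablespace locations under pg_tblspc): uniqueness is only needed at creation time, after which the instance resolves its directory through the link.

```python
# Hypothetical sketch: create a tablespace directory named with contentid
# plus a creation timestamp, then remember it via a pg_tblspc-style symlink.
import os
import tempfile
import time

def create_tablespace_dir(location, contentid):
    """Create a per-instance directory; the timestamp disambiguates the
    primary/mirror pair sharing a contentid (assumed sufficient here)."""
    suffix = f"{contentid}_{int(time.time() * 1e6)}"
    path = os.path.join(location, f"GPDB_{suffix}")
    os.makedirs(path)
    return path

def link_tablespace(datadir, tablespace_oid, path):
    """Record the directory via a symlink, as pg_tblspc does."""
    pg_tblspc = os.path.join(datadir, "pg_tblspc")
    os.makedirs(pg_tblspc, exist_ok=True)
    link = os.path.join(pg_tblspc, str(tablespace_oid))
    os.symlink(path, link)
    return link

location = tempfile.mkdtemp()
datadir = tempfile.mkdtemp()
path = create_tablespace_dir(location, contentid=0)
link = link_tablespace(datadir, 16385, path)

# Later starts resolve the path through the symlink, not by recomputing it.
print(os.readlink(link) == path)  # True
```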

Xin Zhang

unread,
May 29, 2018, 1:29:08 PM5/29/18
to Ashwin Agrawal, Max Yang, Greenplum Developers
Wonderful. I am glad we are on the same page.

However, even a timestamp cannot guarantee uniqueness across the cluster. A GUID, or a timestamp, or maybe hostname and port, or maybe a SHA of something unique, should be able to help us uniquely identify the instance.

Have fun, and looking forward to seeing us make progress in this area.

Thanks,
Shin

Scott Kahler

unread,
May 30, 2018, 4:00:21 PM5/30/18
to Xin Zhang, Ashwin Agrawal, Max Yang, Greenplum Developers
Would creating a GUID during initsystem help with this? Having a unique identifier for a given system could be advantageous in other ways.
--

Scott Kahler | Pivotal, Greenplum Product Management  | ska...@pivotal.io | 816.237.0610

Ashwin Agrawal

unread,
May 31, 2018, 12:35:07 PM5/31/18
to Scott Kahler, Xin Zhang, Max Yang, Greenplum Developers
On Wed, May 30, 2018 at 1:00 PM, Scott Kahler <ska...@pivotal.io> wrote:
Would creating a GUID during initsystem help with this? Having a unique identifier for a given system could be advantageous in other ways.

We already generate a unique identifier today:

aagrawal@aagrawal-MacBookPro:~/workspace/gpdb$ pg_controldata gpAux/gpdemo/datadirs/qddir/demoDataDir-1/
pg_control version number:            9030600
Catalog version number:               301805171
Database system identifier:           6561539402119501787
Database cluster state:               in production
pg_control last modified:             Thu 31 May 2018 09:26:59 AM PDT

Let me restate the problem: using content id we can differentiate between different primaries today. The only issue is differentiating between the primary and mirror of a pair on the same host, which is a development and testing issue only; in production, a primary and its mirror will never be on the same host. Now, since the mirror gets created from the primary, it inherits everything from the primary today, including this unique identifier. So we would need an extra step to generate some unique number for the mirror, which is then effectively the same as dbid today.

Xin Zhang

unread,
Jun 1, 2018, 12:01:28 PM6/1/18
to Ashwin Agrawal, Scott Kahler, Max Yang, Greenplum Developers
Seems like the only thing blocking us from removing dbid is a unique way to identify the postgresql instance for tablespace creation.

Then maybe we can update the tablespace-creation implementation to always generate a GUID for each postgresql instance?

Overall, I vote for reducing the command-line parameters to just gp_contentid.

Thanks,
Shin