Discuss: clarify hostname/address and port in GPDB


Hao Wu

Feb 17, 2020, 9:22:19 PM
to Greenplum Developers, Jim Doty
Hi hackers,

The addresses and ports in GPDB are confusing, which causes unexpected behavior. Several GitHub issues (#8755, #9060, #9132) relate to the topic we want to discuss. The confusion has two parts:
1. The semantics of address and hostname in gp_segment_configuration, and of listen_addresses.
2. Which port value takes precedence: the one in gp_segment_configuration or the one in postgresql.conf?

----------------------------------------------------------------------
Let's talk about the first confusion.
The semantics of address and hostname in gp_segment_configuration were proposed by Jim Doty (see https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/jim$20hostname%7Csort:date/gpdb-dev/LfusrgthupY/pRSAqNYHBAAJ). I strongly suggest reading the original mail. In short, Jim divides addresses into 3 categories:
1. run once per segment (interconnect, replication)
2. run once per node (one node may have more than one segment)
3. cluster service address (mainly used by external tools)

Unfortunately, gp_segment_configuration and some tools like gpinitsystem don't distinguish between them, so replication might use the public address rather than the address on the private network.
Confusing address and hostname can cause bugs in multi-NIC, multi-address environments.

----------------------------------------------------------------------
Now, we'll talk about the second confusion.
The port in gp_segment_configuration and the port in postgresql.conf are both described as 'The TCP port the database server listener process is using'. I'm not sure whether one port always takes precedence over the other. Having two values means there are at least two ways to get the port. If code happens to read the value that the server is not actually using, that's a bug.
My point is that we should at least clarify their usage in the documentation.

For listen_addresses in postgresql.conf, I see its initial value is '*'. That's OK, but I'm not sure whether it may be changed to other values. If it may, what will happen?

----------------------------------------------------------------------
Next:
Once we have reached a consensus on the semantics, I propose keeping a sheet to track all code and tools, to make sure that they use addresses and ports correctly.

Thank you

Tyler Ramer

Feb 18, 2020, 10:31:08 AM
to Hao Wu, Greenplum Developers, Jim Doty
Hao,

Regarding your second question, which I think is actually two questions... 
For the first question:
> Both ports in gp_segment_configuration and postgresql.conf say 'The TCP port the database server listener process is using'. I'm not sure whether one port always takes precedence over the other one. 

The ports in gp_segment_configuration are in fact the ports the postgres process binds to for each segment; the values in postgresql.conf and gp_segment_configuration should be the same as one another. I believe gpinitsystem assigns these, although I don't know the exact mechanism.

Here's a live example, however, to prove that point:

[root@sdw3 gpseg33]# grep "listener port"  postgresql.conf
port=4009 ##port = 5432                         # sets the database listener port for 

gpadmin=# select * from gp_segment_configuration where content=33;
 dbid | content | role | preferred_role | mode | status | port |        hostname        | address | replication_port
------+---------+------+----------------+------+--------+------+------------------------+---------+------------------
   35 |      33 | p    | p              | s    | u      | 4009 | sdw3.pivotal.hpc.local | sdw3-x1 |             4509
   83 |      33 | m    | m              | s    | u      | 5009 | sdw1.pivotal.hpc.local | sdw1-x2 |             5509
(2 rows)
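
The comparison in this example can be automated. Below is a minimal sketch, not a GPDB tool: it assumes the conf file text has already been read from the segment's data directory and the catalog port has already been fetched from gp_segment_configuration. The helper names conf_port and ports_agree are hypothetical, and the parser is simplified to handle only the "port = N" form.

```python
import re

# Matches an active (uncommented) "port = N" or "port=N" setting at line start.
# Simplification: postgresql.conf also allows omitting "=", which is not handled.
_PORT_RE = re.compile(r"^\s*port\s*=\s*(\d+)")


def conf_port(conf_text):
    """Return the last active port setting in postgresql.conf text, or None."""
    port = None
    for line in conf_text.splitlines():
        match = _PORT_RE.match(line)
        if match:
            port = int(match.group(1))  # a later entry overrides an earlier one
    return port


def ports_agree(conf_text, catalog_port):
    """True when the conf file's active port equals the catalog's port."""
    return conf_port(conf_text) == catalog_port


# The grep output from the example above, used as sample input.
sample = "port=4009 ##port = 5432    # sets the database listener port for"
print(conf_port(sample))
print(ports_agree(sample, 4009))
```

Run against a mirror's conf file, a check like this would flag the mismatch between a port copied from the primary and the mirror's own catalog port.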


For the second:
> For listen_addresses in postgresql.conf, I see its init value is '*'. It's OK, but I'm not sure if it's allowed to change to other values. If it's allowed, what will happen?

There is no mechanism currently available for utilizing the listen_addresses in postgresql.conf, at least not one governed by a gpconfig style option. 
There was previously a commit that attempted to add this, but the logic used was faulty and I submitted and merged a PR to revert it.


I'd imagine that a user might be able to edit listen_addresses for each segment postgresql.conf, and the default binding should work, but I've not tested this. Without a better mechanism to set this value, I would not recommend setting it manually on each segment.



Tyler Ramer  |  Software Engineer, Greenplum Building Blocks | tra...@pivotal.io

Hao Wu

Feb 18, 2020, 9:48:27 PM
to Tyler Ramer, Greenplum Developers, Jim Doty
Thanks, Tyler.

> The ports in gp_segment_configuration are in fact the ports the postgres process binds to for each segment - the values in postgresql.conf and gp_segment_configuration should be the same as one another. I believe that gpinitsystem should be assigning these although I don't know the exact mechanism.

Yes, keeping the two values in gp_segment_configuration and in postgresql.conf equal should be no problem. It's easy to verify which port takes precedence over the other. We should document this behavior:
1. Whether the port may be changed after the segment is initialized. If not, no further effort is needed.
2. If the user may change the port, we must make sure the two values stay consistent.
The port cannot be updated via gpconfig. However, the port in gp_segment_configuration can be changed in other ways, which is dangerous (a separate topic: "unbreakable greenplum").


> I'd imagine that a user might be able to edit listen_addresses for each segment postgresql.conf, and the default binding should work, but I've not tested this. Without a better mechanism to set this value, I would not recommend setting it manually on each segment.

The default listening address is '*', i.e., listen on all available addresses. This GUC cannot be updated via gpconfig.

The topic seems messy, but the focus is the semantics of the addresses and the port that a segment uses.

Ashwin Agrawal

Feb 19, 2020, 5:45:05 PM
to Hao Wu, Tyler Ramer, Greenplum Developers, Jim Doty

On Tue, Feb 18, 2020 at 6:48 PM Hao Wu <ha...@pivotal.io> wrote:
> Thanks, Tyler.
>
> The ports in gp_segment_configuration are in fact the ports the postgres process binds to for each segment - the values in postgresql.conf and gp_segment_configuration should be the same as one another. I believe that gpinitsystem should be assigning these although I don't know the exact mechanism.
>
> Yes, keeping the two values in gp_segment_configuration and in postgresql.conf equal should be no problem. It's easy to verify which port takes precedence over the other port. We should document this behavior:
> 1. whether we are allowed to change the port after initialization of the segment, if not, no further effort is needed.
> 2. if the user wants to change the port, we should make sure the two values be consistent.
> The port is not allowed to update via gpconfig. But we could change the port in gp_segment_configuration in some ways, which is dangerous. (another topic on "unbreakable greenplum")

I am missing why we need the PORT setting in postgresql.conf at all. When starting the cluster, the utilities pass the PORT value from gp_segment_configuration to pg_ctl on the command line. The master needs to know each segment's PORT, so gp_segment_configuration definitely has to store it. Even today, for GPDB6 and later, the PORT value in a mirror's postgresql.conf is wrong, because the file is copied from the primary. So the primary's PORT value is present in the mirror's conf file, while the utilities pass the correct value from gp_segment_configuration. If we stop persisting PORT in the conf file, changing the value becomes easier. I understand that having the value in the conf file makes for a simpler pg_ctl command when starting segments manually, but the utilities always pass the PORT value. Removing this redundancy and keeping the value only in gp_segment_configuration would be better.

One use case for changing the PORT value, I think, is upgrades, so the need definitely exists. Making it simpler would be better.
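
To make the startup path concrete, here is a minimal sketch of building such a command. It is not the code the GPDB utilities use (they pass more options than shown). It relies only on standard pg_ctl behavior: -D selects the data directory, -o forwards options to the postgres server, and a port given on the server command line overrides the one in postgresql.conf. The function name and the data directory path are hypothetical.

```python
import shlex


def pg_ctl_start_command(datadir, port):
    """Build a pg_ctl start command that passes the port on the command line."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    # "-p PORT" inside -o reaches the postgres server and takes precedence
    # over any port setting persisted in postgresql.conf.
    return ["pg_ctl", "start", "-D", datadir, "-o", f"-p {port}"]


# Hypothetical mirror: the port comes from its gp_segment_configuration row,
# even if its conf file still carries the primary's port.
cmd = pg_ctl_start_command("/data/mirror/gpseg33", 5009)
print(shlex.join(cmd))
```

Because the command-line value wins, the stale port in a mirror's conf file is harmless at startup; it only becomes a problem when something reads the file directly.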

Hao Wu

Feb 20, 2020, 9:08:31 PM
to Ashwin Agrawal, Tyler Ramer, Greenplum Developers, Jim Doty
> I am missing why we need to have PORT setting in postgresql.conf file. When starting the cluster, utilities specify PORT value from gp_segment_configuration to pg_ctl on command. We need master to know the
> segments PORT so gp_segment_configuration definitely needs to store the same. Even today for GPDB6 and forward the value of PORT in postgresql.conf for mirror is wrong as it copies the file from primary. So,
> primaries PORT value is present on mirror in conf file but gp_segment_configuration passes the correct value. If we eliminate persisting value in conf file for PORT it makes changing the value easier.

Thanks, Ashwin.
 
I didn't mean that we need to set PORT in postgresql.conf. As far as I know, gpinitsystem sets PORT in a primary's postgresql.conf to the port value from gp_segment_configuration. Sorry, I wasn't aware that the PORT value in a mirror's postgresql.conf is copied from its primary. I had assumed the PORT in the mirror's postgresql.conf was correct from the very beginning.

> Removing this redundancy would be better, just have value in gp_segment_configuration.

I agree. Since the PORT in a mirror's postgresql.conf is incorrect from the beginning, we should not touch (neither read nor write) the GUC value anywhere.
Maybe we should mark the PORT in postgresql.conf as deprecated to avoid accidental misuse.


Hao Wu

Mar 3, 2020, 1:26:06 AM
to Greenplum Developers
Hi hackers,

I have reorganized the proposal in the attachment. Here are some key points. Please see more details in the attachment.

Proposal 1: The semantics of the address and hostname in gp_segment_configuration

1. address will never be changed after the initialization of the segment, even if we restart the cluster.
2. address will always be resolved to the same single IP address at runtime when it’s a name.

All connections inside the GPDB cluster should use the address, not the hostname.

If the address is an IP address, the IP address is fixed forever.

If the address is a resolvable name, the name itself is fixed. The mapping from the name to its IP address can’t change online. We must shut down the GPDB cluster before re-mapping the address name to a new single IP address.
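
The "single IP address" requirement can be checked mechanically. The following is a minimal sketch, not part of the proposal: resolve_single_ip is a hypothetical helper that uses socket.getaddrinfo by default, and fake_resolver is a stand-in for DNS with made-up names and IPs so the example runs offline.

```python
import socket


def resolve_single_ip(address, resolver=socket.getaddrinfo):
    """Return the one IP that `address` maps to, or raise if it maps to several."""
    infos = resolver(address, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    ips = {info[4][0] for info in infos}
    if len(ips) != 1:
        raise ValueError(f"{address!r} resolves to {len(ips)} addresses: {sorted(ips)}")
    return ips.pop()


def fake_resolver(name, port):
    # Offline stand-in for DNS; the IPs and the name "sdw-multi" are made up.
    table = {"sdw3-x1": ["10.0.1.3"], "sdw-multi": ["10.0.1.4", "10.0.2.4"]}
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, 0)) for ip in table[name]]


print(resolve_single_ip("sdw3-x1", resolver=fake_resolver))
```

A check like this only covers the moment it runs; the rule that the mapping must not change while the cluster is online still has to be enforced operationally.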


hostname in gp_segment_configuration is defined as the public network address of the node that hosts the segment.

hostname can be used inside or outside the GPDB cluster. It’s usually used to access the node of the cluster, not a specific segment, but there is an exception.



Proposal 2: Try to avoid running out of ports
1. Avoid assigning one address to too many segments. Assign one address to one segment if possible.
2. The QE process for interconnect/UDP binds to its address, a unicast address, in gp_segment_configuration.
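
Point 1 can be audited from the catalog. Here is a minimal sketch, assuming the (dbid, address) pairs have already been read from gp_segment_configuration; crowded_addresses and its max_per_address threshold are hypothetical, and the third sample row is made up.

```python
from collections import Counter


def crowded_addresses(segments, max_per_address=1):
    """Return {address: segment_count} for addresses shared by too many segments."""
    counts = Counter(address for _dbid, address in segments)
    return {addr: n for addr, n in counts.items() if n > max_per_address}


# (dbid, address) pairs: the first two come from the example earlier in the
# thread; the third is invented to show a shared address.
segments = [(35, "sdw3-x1"), (83, "sdw1-x2"), (36, "sdw3-x1")]
print(crowded_addresses(segments))
```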


Please leave your comments in the doc.
Thank you