**Summary**
There are some inconsistencies in how different parts of the product use the
`hostname` vs the `address` columns in `gp_segment_configuration`. We would like
to propose an intended use for each column, based on a configuration in use
today that has separate networks and hostnames for external traffic and for
internal interconnect traffic.
We would like to solicit GPDB-dev feedback on the proposed design, as well as
ask for help ensuring GPDB tooling is in compliance with the design, if it is
regarded as sound.
**Design Overview**
Consider the following setup for a single node of the cluster:
```
                    +-----------+
                    |           |      SDW1-X1
  sdw1.local.net    |           +----- 10.10.1.101
  172.29.100.1 +----+           |      Interconnect
                    |           |      Network
  Data Center       |           |      SDW1-X2
  Network           |           +----- 10.10.2.101
                    |           |
                    +-----------+
```
and the following `/etc/hosts` file:
```
172.29.100.1   sdw1.local.net  sdw1-pa
# ...
10.10.1.101    sdw1-x1         sdw1
10.10.2.101    sdw1-x2
```
This setup has the following conventions:
+ The FQDN resolves to the IP that is routable on the data center network.
+ The FQDN has a short name, `sdw1-pa`, for convenience.
+ The interconnect interfaces are resolvable by the short names above (`sdw1-x1`,
`sdw1-x2`, and `sdw1`), but only inside the cluster; they are not routable from
outside the cluster.
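To make these conventions concrete, here is a minimal sketch that parses the example `/etc/hosts` entries and classifies each name by the network its IP sits on. The subnets `172.29.0.0/16` (data center) and `10.10.1.0/24`, `10.10.2.0/24` (interconnect) are assumptions inferred from the example addresses; a real cluster would take them from its own network configuration.

```python
import ipaddress

# Assumed subnets, inferred from the example addresses above.
DATA_CENTER_NET = ipaddress.ip_network("172.29.0.0/16")
INTERCONNECT_NETS = [
    ipaddress.ip_network("10.10.1.0/24"),
    ipaddress.ip_network("10.10.2.0/24"),
]

HOSTS = """\
172.29.100.1   sdw1.local.net  sdw1-pa
# ...
10.10.1.101    sdw1-x1         sdw1
10.10.2.101    sdw1-x2
"""


def parse_hosts(text):
    """Map every name in an /etc/hosts-style text to its IP address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *aliases = line.split()
        for name in aliases:
            names[name] = ipaddress.ip_address(ip)
    return names


def network_of(ip):
    """Classify an IP as 'data center', 'interconnect', or 'unknown'."""
    if ip in DATA_CENTER_NET:
        return "data center"
    if any(ip in net for net in INTERCONNECT_NETS):
        return "interconnect"
    return "unknown"


for name, ip in parse_hosts(HOSTS).items():
    print(f"{name:15} {ip!s:13} {network_of(ip)}")
```

With this layout, `sdw1.local.net` and `sdw1-pa` land on the data center network, while `sdw1`, `sdw1-x1`, and `sdw1-x2` land on the interconnect.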
This design isolates traffic into and out of the cluster from the bandwidth
available to operations inside the cluster. For example, moving a database
backup out of the cluster should not slow down queries through network
congestion, because query traffic uses the interconnect exclusively.
We find that there are three possible addresses to consider:
**Segment address(es)**: The “address” column of `gp_segment_configuration`.
These could be used as the listen addresses that segment servers bind to, for
security (not yet implemented). Any “run per segment” feature should use the
interconnect, as each segment might be bound to a specific NIC for optimization
(for example, NUMA locality).
Examples:
+ All database interconnect traffic
+ All replication traffic (`gprecoverseg`)
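As a rough illustration of the “run per segment” rule, the sketch below builds one connection target per segment from the “address” column rather than the “hostname” column. The `SegmentRow` type and the example rows (two primaries on `sdw1`, each on its own interconnect NIC) are hypothetical; only the column names mirror `gp_segment_configuration`.

```python
from typing import NamedTuple


class SegmentRow(NamedTuple):
    """Hypothetical subset of gp_segment_configuration columns."""
    content: int
    role: str
    hostname: str
    address: str
    port: int


# Hypothetical rows: two primaries on one node, each bound to its own NIC.
ROWS = [
    SegmentRow(0, "p", "sdw1", "sdw1-x1", 6000),
    SegmentRow(1, "p", "sdw1", "sdw1-x2", 6001),
]


def per_segment_targets(rows):
    """One (address, port) pair per segment, taken from the address column."""
    return [(r.address, r.port) for r in rows]


print(per_segment_targets(ROWS))
```

Because each target uses the segment's own address, traffic for content 0 and content 1 travels over different interconnect interfaces instead of sharing the generic hostname's route.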
**Generic hostname**: The “hostname” column of `gp_segment_configuration`.
This can be used for “run once” operations, where iterating over the segment
address(es) would run an operation more than once on a node that hosts several
segments. Each hostname should map one-to-one to a node.
Examples:
+ Any “run once” tool, such as `gppkg`
+ GPCC for reporting host statistics
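A minimal sketch of the “run once” rule: de-duplicate on the “hostname” column so that each node is visited once, regardless of how many segments or interconnect addresses it has. The rows, including the `mdw` master host, are hypothetical examples.

```python
from typing import NamedTuple


class SegmentRow(NamedTuple):
    """Hypothetical subset of gp_segment_configuration columns."""
    content: int
    hostname: str
    address: str


# Hypothetical rows: a master (content -1) plus two segments on each of two nodes.
ROWS = [
    SegmentRow(-1, "mdw", "mdw"),
    SegmentRow(0, "sdw1", "sdw1-x1"),
    SegmentRow(1, "sdw1", "sdw1-x2"),
    SegmentRow(2, "sdw2", "sdw2-x1"),
    SegmentRow(3, "sdw2", "sdw2-x2"),
]


def hosts_to_visit(rows):
    """Unique hostnames in first-seen order: one entry per node."""
    return list(dict.fromkeys(r.hostname for r in rows))


print(hosts_to_visit(ROWS))  # ['mdw', 'sdw1', 'sdw2']
```

De-duplicating on the “address” column instead would yield five targets here, so a tool like `gppkg` would run twice on each segment node.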
**Fully Qualified Domain Name (FQDN)**: The canonical hostname, used by
services outside the cluster to direct traffic to the cluster.
The FQDN is not stored in `gp_segment_configuration`. It is useful for services
like `gpcopy`, which might route traffic to the cluster from another server. It
might also be used as the listen address the Postgres server binds to, for
security (implemented at least for the master server).
Examples:
+ `gpcopy`
+ `gptransfer`
+ `gpfdist`
+ Routing for external tables
+ UI for the database, perhaps GPCC
+ `pgadmin`
Generic and FQDN hostnames may both be routable from inside the cluster. The
important distinction is that external traffic can only be routed via the
external FQDN and devices, whereas internal traffic can be directed over the
interconnect devices by using the generic hostname, instead of being sent over
the external network via the FQDN hostnames and devices.
Note that this proposal leaves it to the sysadmin whether the output of the
`hostname` command is the generic hostname or the FQDN. We would like our
tooling not to depend on that aesthetic choice.
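One possible way to avoid that coupling, sketched here under stated assumptions, is for a tool to find its own rows by checking whether each row's hostname resolves to an IP owned by the local machine, without ever reading the output of `hostname`. The `rows_for_this_node` helper, the `resolve` callable, and the example data are hypothetical; on a real node, `local_ips` would come from the machine's interfaces and `resolve` from the system resolver.

```python
from typing import Callable, Iterable, Optional


def rows_for_this_node(rows: Iterable[dict], local_ips: set,
                       resolve: Callable[[str], Optional[str]]) -> list:
    """Select rows whose hostname resolves to an IP owned by this machine.

    The output of the `hostname` command is never consulted, so it does not
    matter whether the sysadmin set it to the generic name or the FQDN.
    """
    return [r for r in rows if resolve(r["hostname"]) in local_ips]


# Hypothetical stand-ins for the resolver and this node's interface IPs.
HOSTS = {"sdw1": "10.10.1.101", "sdw2": "10.10.1.102"}
LOCAL_IPS = {"172.29.100.1", "10.10.1.101", "10.10.2.101"}
ROWS = [
    {"content": 0, "hostname": "sdw1"},
    {"content": 1, "hostname": "sdw1"},
    {"content": 2, "hostname": "sdw2"},
]

mine = rows_for_this_node(ROWS, LOCAL_IPS, HOSTS.get)
print([r["content"] for r in mine])  # [0, 1]
```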
**Proposed behavior**
Any traffic that benefits from or requires high intra-cluster network throughput
should take the IP or hostname from the “address” column of
`gp_segment_configuration`, to make the best use of the interconnect and avoid
congesting the PA (data center) network.
Any tool that requires “run once per server” behavior should use the “hostname”
column to de-duplicate servers. It is up to the designer of the GPDB cluster to
decide whether this “hostname” is externally routable or still resolves to the
interconnect.
External utilities must not assume that the “interconnect” network is
addressable from outside the cluster.
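A minimal sketch of how an external utility might guard against this: reject any target address that falls on the interconnect and ask for the FQDN instead. The `check_external_target` helper is hypothetical, and the interconnect subnets are assumed from the example `/etc/hosts` above.

```python
import ipaddress

# Assumed interconnect subnets, matching the example /etc/hosts above.
INTERCONNECT_NETS = [
    ipaddress.ip_network("10.10.1.0/24"),
    ipaddress.ip_network("10.10.2.0/24"),
]


def check_external_target(ip_text):
    """Raise if an address used from outside the cluster is on the interconnect."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in INTERCONNECT_NETS):
        raise ValueError(f"{ip} is an interconnect address; use the FQDN instead")
    return ip


check_external_target("172.29.100.1")  # data center address: accepted
try:
    check_external_target("10.10.1.101")
except ValueError as err:
    print(err)
```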
Measured against this proposed design, we know of at least two tools that do
not follow these rules:
1. `gprecoverseg` routes recovery traffic over the “hostname” network, not the
interconnect network.
https://github.com/greenplum-db/gpdb/issues/9060
2. `gpinitsystem` creates its hostfile incorrectly from a provided config file.
https://github.com/greenplum-db/gpdb/issues/9132
Cheers,
Jim & Tyler
--
Jim Doty | R&D Greenplum Building Blocks Team |
jd...@pivotal.io
Tyler Ramer | R&D Greenplum Building Blocks Team |
tra...@pivotal.io