Per Gang’s design, we will implement a HA GPDB cluster which will do fail-over automatically. The work also involves client escalation. For after cluster auto-failover, the client must connect to new master. This step should be done without human intervention. As we know multiple hostaddrs libpq has been introduced into postgres10 by this commit:https://github.com/postgres/postgres/commit/7b02ba62e9ffad5b14c24756a0c2aeae839c9d05#diff-710b105f6095e65526ac264f1694db17. It will allow user set multiple hosts and ports and try each destination until connect to one of them successfully.
Example:
"hostaddr=127.0.0.1,127.0.0.1 port=15432,16432 user=gpadmin dbname=postgres connect_timeout=2 target_session_attrs=any"
It also provide an attribute target_session_attrs with value “read-write” or “any”.
So we want to merge this feature to GPDB and provide two hosts/ports in production environment, master and standby as below.
Master and standby are all available, but the client may fail to connect to master due to network failure intermittently.
In this case, if client connects to master successfully, it is all OK. If the client fail to connect to master, it will try standby instead. But the standby is read only, so this connection fails.
As master is down, so the client can’t connect to master and it will try standby. After standby activated, it can accept read-write connection. So the switching finished automatically.
ODBC is based on libpq, so it will support multi-hosts with latest version.
JDBC has already implemented multi-hosts configuration and can provide more advanced features. Below is document of JDBC.
To support simple connection fail-over it is possible to define multiple endpoints (host and port pairs) in the connection url separated by commas. The driver will try to once connect to each of them in order until the connection succeeds. If none succeed, a normal connection exception is thrown.
The syntax for the connection url is:
jdbc:postgresql://host1:port1,host2:port2/database
The simple connection fail-over is useful when running against a high availability postgres installation that has identical data on each node. For example streaming replication postgres or postgres-xc cluster.
For example an application can create two connection pools. One data source is for writes, another for reads. The write pool limits connections only to master node:
jdbc:postgresql://node1,node2,node3/accounting?targetServerType=master . And read pool balances connections between slaves nodes, but allows connections also to master if no slaves are available:
jdbc:postgresql://node1,node2,node3/accounting?targetServerType=preferSlave&loadBalanceHosts=true
Now the libpq will try another connection only when it fails to connect to the node, or connection timeout, or the node is read-only(if target_session_attrs is set to “read-write”). But if it connects to standby now, it will get the error message “the database system is in recovery mode” and will not try the next. This should be solved.
MPP team.
