In a vSphere environment, network bandwidth between virtual machines is not guaranteed by default. Under heavy workload, sockets created between VMs can fail. When they do, the product must handle the socket failure appropriately.
Interconnect Sockets Lifecycle
Socket Creation
Currently, the interconnect sockets are created in cdb_setup(), which is called in the MPP initialization part of InitPostgres(). To initialize the motion layer IPC subsystem, cdb_setup() calls InitMotionLayerIPC(), which in turn calls InitMotionUDPIFC() or InitMotionTCP() according to the interconnect type. Inside these functions, setupUDPListeningSocket() is called twice, once for the listening socket and once for the sender socket.
Socket descriptors
The socket descriptors are saved so that the created sockets can be referenced later:
TCP_listenerFd / UDP_listenerFd
ICSenderSocket (Just for UDP, initialized to -1)
Socket Close
When the main process exits, cdb_cleanup() is called to perform all necessary cleanup. As part of it, CleanUpMotionLayerIPC() closes all sockets by calling CleanupMotionTCP() or CleanupMotionUDPIFC(), according to the interconnect type.
In the current design, the sockets are created only once, and there is no error handling when a socket fails. We can make the IC sockets more robust against socket failures (for example, failures caused by limited network bandwidth).
Start a background thread as a socket state checker that checks the socket state periodically.
The check can be done with the getsockopt() or ioctl() system call. With getsockopt(), we can pass a general socket option such as SO_ACCEPTCONN to check whether the socket is listening for connections. If the socket is in an invalid state and the socket fd is still a normal one (that is, it has not been reset to an invalid value by cleanup), the checker thread recreates the socket. For TCP specifically, we can use the TCP_INFO option to read tcpi_state, a member of the kernel's struct tcp_info.
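The two getsockopt() probes described above could look roughly like the following. This is a minimal sketch, not the existing interconnect code: the names ICSockState, ic_probe_listener() and ic_tcp_state() are hypothetical, and TCP_INFO with struct tcp_info is assumed to be available as on Linux. Note that SO_ACCEPTCONN only reports true for a socket on which listen() has been called, so this probe fits the TCP listener; a UDP socket would need a different check (SO_ERROR is one possible candidate).

```c
#define _GNU_SOURCE             /* exposes struct tcp_info on glibc and musl */
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>             /* close(), for callers of these helpers */

/* Hypothetical result type; not part of the existing interconnect code. */
typedef enum
{
    IC_SOCK_OK,                 /* descriptor is a listening socket */
    IC_SOCK_CLOSED,             /* fd < 0: closed on purpose, nothing to do */
    IC_SOCK_INVALID             /* bad descriptor or not listening: recreate */
} ICSockState;

/* Probe a listener descriptor with the generic SO_ACCEPTCONN option. */
ICSockState
ic_probe_listener(int fd)
{
    int         accepting = 0;
    socklen_t   len = sizeof(accepting);

    if (fd < 0)
        return IC_SOCK_CLOSED;
    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &accepting, &len) < 0)
        return IC_SOCK_INVALID; /* for example EBADF or ENOTSOCK */
    return accepting ? IC_SOCK_OK : IC_SOCK_INVALID;
}

/*
 * TCP only (Linux): return tcpi_state from struct tcp_info, or -1 if the
 * option cannot be read. The caller is expected to pass a non-negative fd.
 */
int
ic_tcp_state(int fd)
{
    struct tcp_info info;
    socklen_t   len = sizeof(info);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
        return -1;
    return info.tcpi_state;
}
```

For a healthy TCP listener, ic_probe_listener() returns IC_SOCK_OK and ic_tcp_state() returns TCP_LISTEN. A negative descriptor is reported as IC_SOCK_CLOSED so that a caller can tell an intentional close apart from a failure.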
When cleanup is performed, the socket file descriptor should be reset to an invalid value, so that the checker knows the socket was closed intentionally and does not recreate it.
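The interaction between the checker thread and cleanup can be sketched as follows. Everything here is illustrative and hypothetical: ICSocketChecker, ic_checker_run_once(), ic_checker_main(), ic_checker_cleanup() and ic_demo_recreate_listener() do not exist in the code base, listener_fd merely stands in for TCP_listenerFd, and the recreate callback stands in for whatever setup routine the real checker would call. Two further assumptions go beyond the text above: a mutex serializes the checker against cleanup, and a second sentinel, IC_FD_BROKEN, marks a failed recreation so that it is retried rather than mistaken for an intentional close.

```c
#include <assert.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define IC_FD_CLOSED (-1)   /* closed on purpose by cleanup: never recreate */
#define IC_FD_BROKEN (-2)   /* recreation failed: retry on the next pass */

/* Hypothetical checker state; none of these names exist in the code base. */
typedef struct ICSocketChecker
{
    pthread_mutex_t lock;           /* protects listener_fd, recreate_count */
    int         listener_fd;        /* stand-in for TCP_listenerFd */
    int         recreate_count;     /* successful recreations so far */
    int         period_ms;          /* check interval, a candidate GUC */
    int       (*recreate)(void);    /* returns a new fd, or -1 on failure */
    atomic_bool stop;               /* set to true to end the thread */
} ICSocketChecker;

/* Demo recreate callback: a TCP listener on an ephemeral loopback port. */
int
ic_demo_recreate_listener(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    int         fd = socket(AF_INET, SOCK_STREAM, 0);

    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* sin_port 0: any port */
    if (fd >= 0 && (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
                    listen(fd, 16) < 0))
    {
        close(fd);
        fd = -1;
    }
    return fd;
}

/* One pass of the checker: probe the listener and recreate it if needed. */
void
ic_checker_run_once(ICSocketChecker *c)
{
    int         accepting = 0;
    socklen_t   len = sizeof(accepting);
    int         rc = -1;

    pthread_mutex_lock(&c->lock);
    if (c->listener_fd != IC_FD_CLOSED)
    {
        if (c->listener_fd >= 0)
            rc = getsockopt(c->listener_fd, SOL_SOCKET, SO_ACCEPTCONN,
                            &accepting, &len);
        if (rc != 0 || !accepting)
        {
            int         newfd;

            if (rc == 0)
                close(c->listener_fd);  /* a socket, but not listening */
            newfd = c->recreate();
            c->listener_fd = (newfd >= 0) ? newfd : IC_FD_BROKEN;
            if (newfd >= 0)
                c->recreate_count++;
        }
    }
    pthread_mutex_unlock(&c->lock);
}

/* Thread body: run one check per period until asked to stop. */
void *
ic_checker_main(void *arg)
{
    ICSocketChecker *c = arg;
    struct timespec period = { .tv_sec = c->period_ms / 1000,
                               .tv_nsec = (c->period_ms % 1000) * 1000000L };

    assert(c->period_ms > 0);
    while (!atomic_load(&c->stop))
    {
        ic_checker_run_once(c);
        nanosleep(&period, NULL);
    }
    return NULL;
}

/* Cleanup path: close the socket and mark it as closed on purpose. */
void
ic_checker_cleanup(ICSocketChecker *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->listener_fd >= 0)
        close(c->listener_fd);
    c->listener_fd = IC_FD_CLOSED;
    pthread_mutex_unlock(&c->lock);
}
```

The thread would be started with pthread_create() on ic_checker_main() after the sockets have been set up, and stopped by setting the stop flag and joining it. Once ic_checker_cleanup() has run, later passes of the checker leave the descriptor alone, which is the behavior the sentinel value is meant to guarantee.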
We may also consider adding GUCs to control the checker thread, such as the check interval and the error types to handle.