Jian Guo

Jan 19, 2023, 3:21:43 AM

to Greenplum Developers

Background

In a vSphere environment, there is by default no guaranteed network bandwidth between virtual machines. Under heavy workload, sockets created between VMs can fail. When they do, it is important that the product handles the socket failure appropriately.

Interconnect Sockets Lifecycle

Socket Creation

Currently, the interconnect sockets are created in cdb_setup(), which is called during the MPP initialization part of InitPostgres(). In the motion layer IPC subsystem initialization part of cdb_setup(), InitMotionLayerIPC() calls InitMotionUDPIFC() or InitMotionTCP() according to the interconnect type. For UDP, setupUDPListeningSocket() is called twice, separately: once for the listening socket and once for the sender socket.
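
To make the creation step concrete, here is a minimal sketch of what creating the UDP listener and sender sockets amounts to at the system-call level. The names setup_udp_socket(), init_udp_sockets(), listener_fd and sender_fd are hypothetical stand-ins, not the actual Greenplum code, and binding to the loopback address on a kernel-chosen port is an assumption made only to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical, simplified stand-in for setupUDPListeningSocket():
 * create a UDP socket, bind it to an ephemeral loopback port, and report
 * the descriptor and the port the kernel chose. Returns 0 on success,
 * -1 on failure (with *fd left at -1).
 */
static int setup_udp_socket(int *fd, unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int s;

    *fd = -1;
    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);           /* let the kernel pick a port */

    if (bind(s, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(s, (struct sockaddr *) &addr, &len) < 0)
    {
        close(s);
        return -1;
    }

    *fd = s;
    *port = ntohs(addr.sin_port);
    return 0;
}

/* Hypothetical saved descriptors: one listener, one sender (UDP case). */
static int listener_fd = -1;
static int sender_fd = -1;
static unsigned short listener_port;
static unsigned short sender_port;

/* Called once at startup, mirroring the "create only once" design. */
static int init_udp_sockets(void)
{
    if (setup_udp_socket(&listener_fd, &listener_port) < 0)
        return -1;
    if (setup_udp_socket(&sender_fd, &sender_port) < 0)
    {
        close(listener_fd);
        listener_fd = -1;
        return -1;
    }
    return 0;
}
```
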
Socket descriptors

The descriptors of the created sockets are saved for later reference:
- TCP_listenerFd / UDP_listenerFd
- ICSenderSocket (UDP only; initialized to -1)
Socket Close

When the main process exits, cdb_cleanup() is called to perform all necessary cleanup. As part of it, CleanUpMotionLayerIPC() closes all sockets (via CleanupMotionTCP() or CleanupMotionUDPIFC(), according to the interconnect type).
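
A minimal sketch of the close step, assuming the descriptors are plain ints as listed above. The helpers close_and_reset() and cleanup_udp_sockets() are hypothetical and are not the CleanupMotionUDPIFC() implementation. Besides closing each descriptor, the sketch resets it to -1, so that later code can tell an intentionally closed socket from a broken one.

```c
#include <unistd.h>

/*
 * Hypothetical helper: close a saved descriptor (if any) and reset it
 * to -1. The reset happens even if close() reports an error, so the
 * same descriptor is never closed twice.
 * Returns 0 on success or when there was nothing to close, -1 otherwise.
 */
static int close_and_reset(int *fd)
{
    int rc = 0;

    if (*fd >= 0)
    {
        rc = close(*fd);
        *fd = -1;                       /* mark as closed on purpose */
    }
    return rc == 0 ? 0 : -1;
}

/* Hypothetical cleanup for the UDP case: listener plus sender socket. */
static int cleanup_udp_sockets(int *listener_fd, int *sender_fd)
{
    int rc = 0;

    if (close_and_reset(listener_fd) < 0)
        rc = -1;
    if (close_and_reset(sender_fd) < 0)
        rc = -1;
    return rc;
}
```
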

Enhancement to socket robustness

In the current design, the sockets are created only once, and there is no error handling when a socket fails. We can make the IC sockets more robust against socket failures (e.g., caused by limited network bandwidth or other reasons).

- Start a background thread as a socket state checker that periodically checks the socket state.
- The check can be done with the getsockopt() or ioctl() system call. With getsockopt(), we can pass a general socket option such as SO_ACCEPTCONN to check whether the socket is ready to accept connections. If the socket is in an invalid state and the socket fd is a normal one (i.e., not reset to an invalid value), the checker thread recreates the sockets. For TCP specifically, we can use the TCP_INFO option to read tcpi_state, a member of the kernel struct tcp_info.
- When a cleanup operation is performed, the socket file descriptor should be reset to an invalid value, so the checker knows the socket was closed intentionally and does not recreate it.
- Consider adding some GUCs to control the checker thread, such as the check interval and the error types to handle.
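
The checker described above could look roughly like the following sketch. It is not the proposed implementation: check_socket(), check_sockets_once(), checker_main(), the recreate hook and interval_sec are all hypothetical names, and interval_sec only stands in for a possible GUC. The sketch uses SO_TYPE to confirm the descriptor still refers to a socket of the expected type, SO_ACCEPTCONN for the TCP listener (this option is 1 only for a socket in the listening state, so it says nothing useful about the UDP sockets), and, on Linux, the tcpi_state field returned by TCP_INFO. A descriptor already reset to -1 is treated as closed on purpose and is never recreated. It also ignores the locking a real implementation would need, since the checker would replace descriptors that the backend may be using concurrently.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Result of checking one saved descriptor (hypothetical). */
typedef enum
{
    SOCK_OK,                /* socket looks usable */
    SOCK_CLOSED_ON_PURPOSE, /* fd was reset to -1 by cleanup: leave it */
    SOCK_BROKEN             /* fd is unusable: candidate for recreation */
} sock_state;

/* expected_type is SOCK_STREAM for the TCP listener, SOCK_DGRAM for UDP. */
static sock_state check_socket(int fd, int expected_type)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (fd < 0)
        return SOCK_CLOSED_ON_PURPOSE;

    /* Fails with EBADF/ENOTSOCK if fd no longer refers to a socket. */
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &val, &len) < 0 ||
        val != expected_type)
        return SOCK_BROKEN;

    if (expected_type != SOCK_STREAM)
        return SOCK_OK;

    /* SO_ACCEPTCONN is 1 only for a socket in the listening state. */
    len = sizeof(val);
    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0 || val != 1)
        return SOCK_BROKEN;

#ifdef __linux__
    {
        /* Linux-specific: TCP_INFO exposes the kernel's tcpi_state. */
        struct tcp_info info;
        socklen_t ilen = sizeof(info);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &ilen) < 0 ||
            info.tcpi_state != TCP_LISTEN)
            return SOCK_BROKEN;
    }
#endif
    return SOCK_OK;
}

/* One watched descriptor and a hypothetical hook that rebuilds it. */
typedef struct
{
    int  *fd;                 /* points at the saved descriptor */
    int   type;               /* SOCK_STREAM or SOCK_DGRAM */
    int (*recreate)(int *fd); /* should close *fd if open, then set a new one */
} watched_socket;

/* One pass of the checker; returns the number of sockets recreated. */
static int check_sockets_once(watched_socket *socks, int n)
{
    int recreated = 0;

    for (int i = 0; i < n; i++)
        if (check_socket(*socks[i].fd, socks[i].type) == SOCK_BROKEN &&
            socks[i].recreate(socks[i].fd) == 0)
            recreated++;
    return recreated;
}

typedef struct
{
    watched_socket *socks;
    int             n;
    unsigned int    interval_sec; /* hypothetical GUC: check period */
    atomic_bool     stop;         /* set by the owner to end the loop */
} checker_state;

/* Start routine with the signature pthread_create() expects. */
static void *checker_main(void *arg)
{
    checker_state *st = arg;

    while (!atomic_load(&st->stop))
    {
        check_sockets_once(st->socks, st->n);
        sleep(st->interval_sec);
    }
    return NULL;
}
```

In this sketch the owning process would start the loop with pthread_create(&tid, NULL, checker_main, &state) and stop it by setting state.stop and joining the thread. Polling with sleep() keeps it simple; the period trades detection latency against the cost of a few getsockopt() calls per socket per pass.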

Jian Guo

Jan 20, 2023, 4:33:00 AM

to gpdb...@greenplum.org

[Proposal] Enhancement to Interconnect sockets robustness

Shine Zhang

Jan 23, 2023, 1:12:58 PM

to Greenplum Developers, Jian Guo

Thank you very much for putting the proposal together.

I am looking forward to this feature bringing interconnect robustness to the next level, and I cannot wait to test it out.

Thanks

Shine

Shine Zhang

Jan 23, 2023, 2:13:46 PM

to Greenplum Developers, Shine Zhang, Jian Guo

One more thing: when we add another thread as a socket state checker, will there be one per backend, one per segment, or one per slice?

I am wondering how many threads we are going to create for each connection, and what additional overhead the checks will add to the cluster.

I am also trying to understand the impact of this monitoring thread under very high system load, where we hit the upper limit on the number of sockets that can be opened. How does it affect that limit? My goal is to introduce no regression, so that any customer running close to the socket limits is still able to run after the upgrade.