Note that channelz is not necessarily a debugging tool; it can be used for monitoring use cases as well. But I do admit that it only gives you a snapshot, so you have to keep polling to detect additions/deletions.
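If you do go the channelz polling route, detecting additions/deletions comes down to diffing consecutive snapshots. Below is a minimal sketch. It assumes you have already reduced each channelz poll to a set of peer identifiers (for example, the remote addresses of the server's sockets). That extraction step and the `SnapshotDiff` name are hypothetical and not part of the channelz API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a channelz API): compares two polled snapshots of
// "currently connected peers" and reports what appeared and what disappeared.
final class SnapshotDiff {
    final Set<String> added;
    final Set<String> removed;

    private SnapshotDiff(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }

    static SnapshotDiff between(Set<String> previous, Set<String> current) {
        Set<String> added = new HashSet<>(current);
        added.removeAll(previous);   // present now, absent in the previous poll
        Set<String> removed = new HashSet<>(previous);
        removed.removeAll(current);  // present in the previous poll, absent now
        return new SnapshotDiff(added, removed);
    }
}
```

Anything that connects and disconnects between two polls never shows up in either snapshot, which is the main limitation of the polling approach.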
I have a better idea: you can use a streaming RPC both for heartbeats and for notifying other nodes of node-down events. You will have to add some business logic on the server side of your streaming RPC, as follows:
- it maintains a list/table of the currently connected, active clients
- on each new `ServerCall`, it uses `Grpc.TRANSPORT_ATTR_REMOTE_ADDR` to get the remote IP (example here) and adds it to the list if it is not already there
- if the call gets dropped, disconnected, or completed for any reason (i.e. the listener receives `onHalfClose()`, `onCancel()`, or `onComplete()`), that implies the node is no longer alive. Use the same `Grpc.TRANSPORT_ATTR_REMOTE_ADDR` trick to get the remote IP, remove that IP from your list, and at the same time add an event to an event queue that will be used to inform other nodes of this event (so you do this asynchronously instead of blocking the original thread on which you received the `ServerCall` listener event).
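The bookkeeping described in the list above can be sketched with plain JDK classes. This is a minimal sketch under some assumptions: one heartbeat stream per client, peers keyed by the full `SocketAddress` (reduce it to `((InetSocketAddress) addr).getAddress()` if you want per-IP tracking as described above), and a hypothetical `broadcaster` callback standing in for however you actually notify the other nodes. `ActiveClientRegistry` and `NodeDownEvent` are illustrative names, not grpc-java classes. In a real server you would call `onCallStarted` from a `ServerInterceptor`'s `interceptCall`, passing `call.getAttributes().get(Grpc.TRANSPORT_ATTR_REMOTE_ADDR)`. You would then wrap the returned listener in a `ForwardingServerCallListener.SimpleForwardingServerCallListener` whose `onHalfClose()`, `onCancel()`, and `onComplete()` overrides call `onCallEnded`.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical registry (not part of grpc-java) tracking peers with an open heartbeat stream.
final class ActiveClientRegistry {
    // A node-down event that will be broadcast to the other nodes.
    static final class NodeDownEvent {
        final SocketAddress address;
        NodeDownEvent(SocketAddress address) { this.address = address; }
    }

    private final Set<SocketAddress> active = ConcurrentHashMap.newKeySet();
    private final BlockingQueue<NodeDownEvent> events = new LinkedBlockingQueue<>();
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();

    ActiveClientRegistry(Consumer<NodeDownEvent> broadcaster) {
        // Drain the queue on a dedicated thread so gRPC callback threads never block on it.
        notifier.execute(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    broadcaster.accept(events.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        });
    }

    // Called when a new heartbeat call starts; adding an already-present peer is a no-op.
    void onCallStarted(SocketAddress remote) {
        active.add(remote);
    }

    // Called from onHalfClose/onCancel/onComplete. Several of these can fire for the
    // same call, so only the callback that actually removes the peer enqueues an event.
    void onCallEnded(SocketAddress remote) {
        if (active.remove(remote)) {
            events.offer(new NodeDownEvent(remote)); // non-blocking on an unbounded queue
        }
    }

    boolean isActive(SocketAddress remote) {
        return active.contains(remote);
    }

    void shutdown() throws InterruptedException {
        notifier.shutdownNow(); // interrupts the draining thread blocked in take()
        notifier.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

If a client may hold several concurrent streams from the same address, a plain set is not enough; you would need a per-call flag plus a reference count per peer instead.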
Hope that helps.