In gRPC 1.5 there is a race condition. When a channel goes idle, its load balancer is shut down and its sub-channels are scheduled to shut down 5 seconds later. If an RPC arrives and succeeds within those 5 seconds, a new load balancer and a new sub-channel are created.
However, when the old sub-channel finally shuts down, the old load balancer is notified of the state change and updates the channel picker with an empty list of sub-channels.
If another RPC is then made before the channel goes idle again, that RPC is buffered in the delayed transport forever.
The following code reproduces the scenario above:
    public static void main(String[] args) throws Exception {
        // Start a local server for the channel to talk to.
        ServerBuilder.forPort(12345)
            .addService(new GreeterImpl().bindService())
            .build()
            .start();

        // A 1-second idle timeout makes the channel go idle quickly.
        Channel channel = NettyChannelBuilder.forTarget("localhost:12345")
            .idleTimeout(1, TimeUnit.SECONDS)
            .negotiationType(NegotiationType.PLAINTEXT)
            .loadBalancerFactory(RoundRobinLoadBalancerFactory.getInstance())
            .usePlaintext(true)
            .build();
        GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);

        stub.sayHello(HelloRequest.getDefaultInstance());
        // The idle timer fires after 1 s; the old sub-channel shuts down 5 s later, at about 6 s.
        Thread.sleep(5500);
        // The connection is re-established and this RPC succeeds.
        stub.sayHello(HelloRequest.getDefaultInstance());
        // Wait for the old sub-channel to shut down, which installs the bad channel picker.
        Thread.sleep(600);
        // An RPC made now, before the channel goes idle again, never returns.
        stub.sayHello(HelloRequest.getDefaultInstance());
    }
The bug is fixed in 1.7 because the following change protects the channel picker from being updated by a load balancer after that load balancer has been shut down.
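To make the sequence of events easier to follow, here is a minimal toy model of the race and of one plausible shape for such a guard. It is only a sketch and does not use gRPC: `ToyChannel`, `ToyLoadBalancer`, `updatePicker`, and `simulate` are hypothetical names invented for illustration, and the "picker" is reduced to a list of ready sub-channel names. It is not the actual 1.7 change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PickerGuardSketch {

    /** Toy channel; its "picker" is just the list of ready sub-channel names. */
    static final class ToyChannel {
        private final boolean guardEnabled;
        private ToyLoadBalancer balancer;
        private List<String> picker = Collections.emptyList();

        ToyChannel(boolean guardEnabled) {
            this.guardEnabled = guardEnabled;
        }

        /** Leaving idle mode creates a fresh load balancer. */
        ToyLoadBalancer exitIdle() {
            balancer = new ToyLoadBalancer(this);
            return balancer;
        }

        /** Entering idle mode shuts the balancer down; its sub-channels close later. */
        void enterIdle() {
            balancer.shutdown();
            balancer = null;
            picker = Collections.emptyList();
        }

        /** Called by a balancer whenever its set of ready sub-channels changes. */
        void updatePicker(ToyLoadBalancer source, List<String> ready) {
            // Hypothetical guard: ignore updates from a balancer that was shut down.
            if (guardEnabled && source.isShutdown()) {
                return;
            }
            picker = ready;
        }

        List<String> picker() {
            return picker;
        }
    }

    /** Toy load balancer that forwards sub-channel state changes to the channel. */
    static final class ToyLoadBalancer {
        private final ToyChannel channel;
        private final List<String> ready = new ArrayList<>();
        private boolean shutdown;

        ToyLoadBalancer(ToyChannel channel) {
            this.channel = channel;
        }

        void shutdown() {
            shutdown = true;
        }

        boolean isShutdown() {
            return shutdown;
        }

        void onSubchannelState(String name, boolean isReady) {
            ready.remove(name);
            if (isReady) {
                ready.add(name);
            }
            channel.updatePicker(this, new ArrayList<>(ready));
        }
    }

    /** Replays the scenario and returns the picker contents seen by the third RPC. */
    static List<String> simulate(boolean guardEnabled) {
        ToyChannel channel = new ToyChannel(guardEnabled);
        ToyLoadBalancer oldLb = channel.exitIdle();
        oldLb.onSubchannelState("sub-1", true);   // first RPC succeeds
        channel.enterIdle();                      // idle timeout; sub-1 closes later
        ToyLoadBalancer newLb = channel.exitIdle();
        newLb.onSubchannelState("sub-2", true);   // second RPC succeeds
        oldLb.onSubchannelState("sub-1", false);  // delayed shutdown of sub-1
        return channel.picker();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + simulate(false));
        System.out.println("with guard:    " + simulate(true));
    }
}
```

Without the guard, the late notification from the old balancer replaces the picker with an empty list, which corresponds to the third RPC having nothing to be dispatched to. With the guard, the picker installed by the new balancer is left untouched.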