Java LB round-robin has a 30-minute blank window before re-resolving


eleano...@gmail.com

Nov 28, 2018, 5:23:13 PM
to grpc.io
Here is the test case:

I have implemented a custom NameResolver and am using RoundRobinLoadBalancer via ManagedChannelBuilder.

1. Initially there are 4 instances running (serverA, serverB, serverC, serverD).

2. Then I kill 2 instances (serverC, serverD); serverA and serverB continue serving the requests.

3. Then I create 2 more instances (serverE, serverF), but only serverA and serverB continue serving the requests, since NameResolver::refresh is only triggered by connection failures or a GOAWAY signal.

4. Then I kill serverA and serverB. There is a 30-minute blank window during which gRPC seems to do nothing; after 30 minutes NameResolver::refresh is triggered and the messages are served by serverE and serverF (there seems to be no message loss).
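Round-robin balancing only rotates over the addresses the name resolver last delivered; it does not discover servers on its own, which is consistent with serverE and serverF getting no traffic in step 3. The following is a minimal stdlib-only model of that behavior, not gRPC's actual RoundRobinLoadBalancer; the class RoundRobinModel and its methods are hypothetical:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of round-robin picking; NOT gRPC's RoundRobinLoadBalancer.
// The picker only knows the snapshot it was last given: new servers stay
// invisible until updateAddresses() is called (i.e. until the resolver reports them).
class RoundRobinModel {
    private volatile List<String> addresses;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinModel(List<String> initial) {
        this.addresses = List.copyOf(initial);
    }

    // Stands in for the resolver delivering a fresh address list.
    void updateAddresses(List<String> fresh) {
        this.addresses = List.copyOf(fresh);
    }

    // Returns the next address in rotation from the current snapshot.
    String pick() {
        List<String> snapshot = addresses;
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no addresses to pick from");
        }
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }
}
```

With the initial list [serverA, serverB], picks alternate between serverA and serverB no matter what else is running; only after updateAddresses(List.of("serverE", "serverF")) do picks land on the new servers.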

Can someone please explain why there is a 30-minute blank window, and is there any way to configure it to be shorter?

Thanks a lot!

Carl Mastrangelo

Nov 29, 2018, 4:14:37 PM
to grpc.io
Responses inline


On Wednesday, November 28, 2018 at 2:23:13 PM UTC-8, eleano...@gmail.com wrote:
> Here is the test case:
>
> I have implemented a custom NameResolver and am using RoundRobinLoadBalancer via ManagedChannelBuilder.
>
> 1. Initially there are 4 instances running (serverA, serverB, serverC, serverD).
>
> 2. Then I kill 2 instances (serverC, serverD); serverA and serverB continue serving the requests.

Do you mean a graceful shutdown, or just pulling the plug? gRPC has no way of detecting the latter case on its own, which means you need to turn on keepalives in the channel.
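As a sketch, assuming grpc-java and a hypothetical target string "my-scheme://my-service" served by the custom NameResolver, client-side keepalives are enabled on the channel builder like this (the 30 s and 10 s values are illustrative, not recommendations; keep whatever load-balancer setting you already use):

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

final class KeepAliveChannelExample {
    // "my-scheme://my-service" is a hypothetical target handled by a custom NameResolver.
    static ManagedChannel build() {
        return ManagedChannelBuilder.forTarget("my-scheme://my-service")
                // Send a keepalive PING after this long without read activity.
                .keepAliveTime(30, TimeUnit.SECONDS)
                // Consider the connection dead if nothing is read within this time after the PING.
                .keepAliveTimeout(10, TimeUnit.SECONDS)
                .usePlaintext()
                .build();
    }
}
```

By default keepalive pings are only sent while calls are active (see keepAliveWithoutCalls). The server must also accept the ping rate: grpc-java servers by default reject clients that ping more often than every 5 minutes and close the connection with a "too_many_pings" GOAWAY, so on a Netty server the permitted rate would need lowering via NettyServerBuilder.permitKeepAliveTime(...).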
 

> 3. Then I create 2 more instances (serverE, serverF), but only serverA and serverB continue serving the requests, since NameResolver::refresh is only triggered by connection failures or a GOAWAY signal.

Name resolvers are meant to be push-based: some other service is expected to notify your name resolver when new servers enter the pool. DNS is pull-based, so we implemented it as a timer-based refresh, but that isn't desirable. If your custom resolver pulls, you'll have to use a timer like DNS does.
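A pull-based custom resolver following this advice re-queries its address source on a timer and notifies the channel only when the address list changes. Below is a stdlib-only sketch under stated assumptions: PollingResolverSketch is hypothetical, the Supplier stands in for whatever registry or lookup you poll, and the Consumer stands in for the listener that a real io.grpc.NameResolver receives in start(). The refresh period is a constructor parameter, so it is as configurable as you make it:

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical model of a pull-based resolver; NOT the io.grpc.NameResolver API.
class PollingResolverSketch {
    private final Supplier<List<String>> addressSource; // assumed: registry/lookup to poll
    private final long periodMillis;                    // refresh period, caller-chosen
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "resolver-poll");
                t.setDaemon(true); // don't keep the JVM alive just for polling
                return t;
            });
    private Consumer<List<String>> listener;
    private List<String> lastAddresses;
    private ScheduledFuture<?> task;

    PollingResolverSketch(Supplier<List<String>> addressSource, long periodMillis) {
        this.addressSource = addressSource;
        this.periodMillis = periodMillis;
    }

    // Like NameResolver.start(): keep the listener, resolve once, then poll periodically.
    synchronized void start(Consumer<List<String>> listener) {
        this.listener = listener;
        refresh();
        task = timer.scheduleAtFixedRate(
                this::pollSafely, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Like NameResolver.refresh(): re-query and notify only if the address list changed.
    synchronized void refresh() {
        if (listener == null) {
            return; // not started yet
        }
        List<String> current = List.copyOf(addressSource.get());
        if (!Objects.equals(current, lastAddresses)) {
            lastAddresses = current;
            listener.accept(current);
        }
    }

    // An exception escaping a scheduleAtFixedRate task cancels all later runs,
    // so the timer path catches lookup failures and just reports them.
    private void pollSafely() {
        try {
            refresh();
        } catch (RuntimeException e) {
            System.err.println("address lookup failed: " + e);
        }
    }

    // Like NameResolver.shutdown(): stop polling.
    synchronized void shutdown() {
        if (task != null) {
            task.cancel(false);
        }
        timer.shutdownNow();
    }
}
```

Calling refresh() directly (as happens after a connection failure or GOAWAY) and letting the timer call it share one code path, so an on-demand refresh and a periodic one behave the same way.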

eleano...@gmail.com

Nov 29, 2018, 4:37:59 PM
to grpc.io
Hi Carl, 

Thanks for the reply:

1. How I kill the instances: I run docker stop on the gRPC server's container. Is this what you meant by 'pull the plug'?
2. When you say "DNS is pull based, so we implemented as a timer based refresh, but it isn't desirable", do you mean the DNS refresh is called periodically? If so, how is the period set: is it hard-coded (could you please point me to the class) or configurable (if so, could you please point me to how to configure it)?

For my project, I just extend io.grpc.NameResolver and override the getServiceAuthority, start(Listener listener), refresh and shutdown methods.


Thanks a lot!