grpc-go balancer behavior when there are no backends


aa...@dropbox.com

May 7, 2018, 2:32:51 PM
to grpc.io
Hi,

I have a question about grpc-go's behavior when you use a custom balancer and there are no backends available. My understanding is that with the FailFast call option, the RPC should fail immediately rather than consume the entire RPC deadline budget. However, that does not seem to be the case. I've modified the helloworld example to use the manual balancer https://gist.github.com/aamitdb/7df4bc5c6b8c0bf3b667023f3b779ebf and, despite FailFast, the program terminates with `2018/05/07 11:07:03 could not greet: rpc error: code = DeadlineExceeded desc = context deadline exceeded`.

This seems different from the behavior of the deprecated resolver package. The cause seems to be that the RPC waits at https://github.com/grpc/grpc-go/blame/master/picker_wrapper.go#L170, but in most cases blockingCh is written to only in response to a SubConn state change (and we have no SubConns).
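
As a rough illustration of the wait described above, here is a toy model that uses only the Go standard library. It is not grpc-go's actual code; `waitForPicker` and `rpcWithNoSubConns` are made-up names, and the channel merely stands in for blockingCh. An RPC that blocks on a channel nobody ever signals can only return once its context expires:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForPicker is a hypothetical stand-in for the blocking wait: it returns
// nil when a state change is signalled on blockingCh, or the context error if
// the deadline is reached first.
func waitForPicker(ctx context.Context, blockingCh <-chan struct{}) error {
	select {
	case <-blockingCh:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// rpcWithNoSubConns models an RPC issued while the balancer owns no SubConns:
// the channel is never closed because no SubConn state change can occur.
func rpcWithNoSubConns(timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	blockingCh := make(chan struct{}) // never signalled in this model
	return waitForPicker(ctx, blockingCh)
}

func main() {
	err := rpcWithNoSubConns(50 * time.Millisecond)
	fmt.Println(err) // context deadline exceeded
}
```

In this model the whole timeout is consumed before the error comes back, which matches the DeadlineExceeded result reported above.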

Am I missing something about how to use the balancer APIs? Or is the expectation that we always call NewAddress at least once, even with an empty slice? Doing so seems to resolve it, since somewhere in the stack a connection is attempted and fails fast. Below is a log of a run where I pass an empty slice and get the desired FailFast behavior.

```
$ GRPC_GO_LOG_SEVERITY_LEVEL=INFO GRPC_GO_LOG_VERBOSITY_LEVEL=1000 go run greeter_client/main.go
INFO: 2018/05/07 11:31:44 parsed scheme: "blatux149k5z"
INFO: 2018/05/07 11:31:44 ccResolverWrapper: sending new addresses to cc: [{ 0  <nil>}]
INFO: 2018/05/07 11:31:44 base.baseBalancer: got new resolved addresses:  [{ 0  <nil>}]
INFO: 2018/05/07 11:31:44 base.baseBalancer: handle SubConn state change: 0xc420172070, CONNECTING
WARNING: 2018/05/07 11:31:44 grpc: addrConn.createTransport failed to connect to { 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: missing address". Reconnecting...
INFO: 2018/05/07 11:31:44 base.baseBalancer: handle SubConn state change: 0xc420172070, TRANSIENT_FAILURE
2018/05/07 11:31:44 could not greet: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: missing address"
exit status 1
```


Thanks,
Ashwin

Menghan Li

May 23, 2018, 5:33:40 PM
to grpc.io
Ashwin,

What you observed is correct.

In a bit more detail: fail-fast RPCs fail when the ClientConn is in TRANSIENT_FAILURE, which means that all underlying connections are in TRANSIENT_FAILURE. But if the resolver never returned any addresses, no connection was ever started, so the connectivity state is not TRANSIENT_FAILURE.
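
To make that rule concrete, here is a simplified sketch in plain Go. It is a toy model, not grpc-go's implementation: the `clientConn` type, its methods, and the aggregation rule are assumptions chosen to mirror the description above. The aggregate state is only recomputed when some connection reports a state, so with zero connections it never leaves its initial value:

```go
package main

import "fmt"

// State is a toy connectivity state; the names mirror gRPC's states.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// clientConn is a hypothetical model, not grpc-go's ClientConn.
type clientConn struct {
	state    State            // aggregate state, starts at Idle
	subConns map[string]State // per-connection state keyed by address
}

func newClientConn() *clientConn {
	return &clientConn{state: Idle, subConns: map[string]State{}}
}

// updateSubConn records one connection's state and recomputes the aggregate.
// This is the only place the aggregate changes in this model.
func (cc *clientConn) updateSubConn(addr string, s State) {
	cc.subConns[addr] = s
	cc.state = aggregate(cc.subConns)
}

// aggregate returns Ready if any connection is Ready, TransientFailure only
// if every connection is in TransientFailure, and Connecting otherwise.
func aggregate(subConns map[string]State) State {
	allFailed := true
	for _, s := range subConns {
		if s == Ready {
			return Ready
		}
		if s != TransientFailure {
			allFailed = false
		}
	}
	if allFailed {
		return TransientFailure
	}
	return Connecting
}

// failFastFailsNow reports whether a fail-fast RPC would be rejected at once.
func (cc *clientConn) failFastFailsNow() bool {
	return cc.state == TransientFailure
}

func main() {
	cc := newClientConn()
	fmt.Println(cc.failFastFailsNow()) // false: no addresses, state is still Idle

	cc.updateSubConn("fake-address", Connecting)
	cc.updateSubConn("fake-address", TransientFailure)
	fmt.Println(cc.failFastFailsNow()) // true: the only connection has failed
}
```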

The expectation was that the resolver returns some addresses when the ClientConn starts up. However, this isn't always true, because there are cases where the resolver simply cannot resolve the name.
There should be a way for a resolver to report an error back when name resolution fails. I filed https://github.com/grpc/grpc-go/issues/2102 to track this.

A workaround today is to return a fake address that doesn't work, which triggers a connection attempt and the resulting connectivity state change.
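
One way to sketch that workaround, independent of any grpc-go types: before handing results to the ClientConn, a resolver could substitute a placeholder whenever the resolved list is empty. The helper name `addressesToReport` and the `placeholderAddr` value are hypothetical and only illustrate the idea; any address that is guaranteed not to connect would do.

```go
package main

import "fmt"

// placeholderAddr is a hypothetical, deliberately unreachable address
// (the .invalid TLD is reserved and never resolves).
const placeholderAddr = "no-backends.invalid:1"

// addressesToReport returns the resolved addresses unchanged, or a single
// placeholder when there are none, so that a connection attempt is made
// and fails instead of no attempt being made at all.
func addressesToReport(resolved []string) []string {
	if len(resolved) > 0 {
		return resolved
	}
	return []string{placeholderAddr}
}

func main() {
	fmt.Println(addressesToReport(nil))
	fmt.Println(addressesToReport([]string{"10.0.0.1:50051"}))
}
```

The trade-off is that the reported error then describes a failed dial to the placeholder rather than the real cause, which is the gap the error-reporting API in issue 2102 is meant to close.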

In your case, why does your resolver not return any addresses? Is it because of an error? Would adding the error reporting API solve your problem?

Thanks,
Menghan

aa...@dropbox.com

Jun 18, 2018, 1:46:10 PM
to grpc.io
Hey Menghan,

Sorry, I missed this reply. The resolver can sometimes return no addresses when no backends are available for transient reasons, but the expectation is that the RPC fails rather than consumes the deadline budget. Having https://github.com/grpc/grpc-go/issues/2102 addressed would be great.

Thanks,
Ashwin

shuwen.u...@gmail.com

Aug 16, 2019, 3:58:05 AM
to grpc.io
rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: missing address"