failover when Deadline Exceeded


Damon Lee

Aug 10, 2022, 4:54:26 AM
to grpc.io
Hi,
I ran into the following issue: when the server cannot communicate with its infrastructure (DB, Kafka, ...), the client receives a DEADLINE_EXCEEDED error but does not fail over, because failover is triggered only when the server is unreachable.

I want to close the connection on DEADLINE_EXCEEDED and connect to another server that is ready for failover.

I can think of two options, but neither is great:
1. Check for DEADLINE_EXCEEDED at every gRPC call site in the client.
2. Have the server close the connection before the deadline expires.

Is it possible to trigger failover on DEADLINE_EXCEEDED, as if the connection had been closed? What about a custom load balancer?

Mark D. Roth

Aug 17, 2022, 1:18:12 PM
to Damon Lee, grpc.io
It's important to understand the difference between a connection and an RPC.  There are generally many RPCs sent on a given connection, and DEADLINE_EXCEEDED is a failure status for an individual RPC, not for a connection.  In the general case, just because one individual RPC failed does not mean that the underlying connection itself is bad; there might be reasons why that individual RPC failed, but other RPCs on the same connection might succeed.  As a result, it is generally not a good idea for an LB policy to stop using a connection just because one individual RPC failed.

If the failure mode you're worried about here is that the server cannot communicate with some underlying database that it needs to function, then the best approach would be to have the server detect that and tell the client that it is unhealthy, so that the client will stop sending it any traffic.  If you're using an LB policy other than pick_first, you can probably use client-side health checking to do this (see gRFC A17: Client-Side Health Checking).  However, client-side health checking is not supported with the pick_first LB policy, so if you have to use that policy, then another option would be to simply have the server shut down when it is not in communication with its database; if the server is not listening on its port, then clients will not be able to send traffic to it.
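As a rough illustration of the client-side health checking option, here is a minimal sketch that builds a service config enabling health checks together with round_robin. The helper name `health_checked_service_config` and the target address in the comment are made up for this example; the `healthCheckConfig` / `serviceName` fields follow gRFC A17, and passing the JSON through the `grpc.service_config` channel option is how it is usually done in gRPC Python, but please check this against the gRPC version you run.

```python
import json


def health_checked_service_config(service_name: str = "") -> str:
    """Build a service config JSON string (hypothetical helper).

    An empty service name asks about the server's overall health.
    """
    config = {
        # Health checking does not work with pick_first, so pick
        # another policy such as round_robin.
        "loadBalancingConfig": [{"round_robin": {}}],
        # Field names as described in gRFC A17.
        "healthCheckConfig": {"serviceName": service_name},
    }
    return json.dumps(config)


service_config_json = health_checked_service_config("")

# Assumed usage with gRPC Python (target address is a placeholder):
#   channel = grpc.insecure_channel(
#       "dns:///backend.example.com:50051",
#       options=[("grpc.service_config", service_config_json)],
#   )
```

For this to have any effect, the server also has to expose the standard health service and report NOT_SERVING while it cannot reach its database; clients that watch the health status should then stop picking that backend until it reports SERVING again.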

Alternatively, you can also try using the new outlier detection LB policy to stop sending traffic to backends that are failing RPCs.  See gRFC A50: gRPC xDS Outlier Detection Support for more info.
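For the outlier detection option, a service config might look roughly like the sketch below. This is only an illustration under assumptions: the policy name `outlier_detection_experimental` and the field names are taken from my reading of gRFC A50, every number is an arbitrary placeholder rather than a recommendation, and support differs between gRPC languages and versions, so verify against the gRFC and your release before relying on it.

```python
import json


def outlier_detection_service_config() -> str:
    """Build an outlier-detection service config (hypothetical helper)."""
    config = {
        "loadBalancingConfig": [{
            # Policy name as I understand it from gRFC A50.
            "outlier_detection_experimental": {
                "interval": "10s",          # how often backends are evaluated
                "baseEjectionTime": "30s",  # base duration of an ejection
                "maxEjectionTime": "300s",
                "maxEjectionPercent": 50,   # never eject more than half
                # Eject backends whose share of failed RPCs is too high.
                "failurePercentageEjection": {
                    "threshold": 85,
                    "enforcementPercentage": 100,
                    "minimumHosts": 3,
                    "requestVolume": 20,
                },
                # Policy used among the backends that are not ejected.
                "childPolicy": [{"round_robin": {}}],
            }
        }]
    }
    return json.dumps(config)


outlier_config_json = outlier_detection_service_config()
```

With a config along these lines, a backend that keeps failing RPCs (DEADLINE_EXCEEDED is a non-OK status, so it counts as a failure) would be ejected for a while instead of being dropped after a single failed RPC. Note that `minimumHosts` means nothing is ejected when there are fewer backends than that value.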

I hope this info is helpful.

--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.