health checking a gRPC-based upstream host?


weit...@gmail.com

Jun 27, 2017, 6:14:03 PM
to envoy-users
Hi,

We use Envoy to load balance an upstream cluster of gRPC-based servers. To health check the servers, we use the gRPC health checking protocol, and Envoy (the client) is configured to use TCP-based health checking via "send" and "receive" bytes.

However, we found that this approach doesn't work, because for a unary gRPC method the server closes the HTTP/2 stream after responding to the client's request. At the next health-checking interval, Envoy re-sends the same bytes to the server, which reuse the same stream ID, and gets a network_failure because the TCP connection has been closed. The server closes the TCP connection because it has already closed that stream and does not expect another request on the same stream ID. As a result, Envoy's health checking alternates between getting a health status and hitting a network_failure.
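The alternation follows from HTTP/2's stream-identifier rule (RFC 7540, section 5.1.1): a client must open each new stream with an odd ID larger than any it has already used on that connection, and an unexpected ID is a connection error. Below is a simplified Python model of that rule, not Envoy's or any gRPC server's code. It assumes (hypothetically) that the canned "send" bytes boil down to one HEADERS frame on stream 1, and it ignores the connection preface, SETTINGS, HPACK, and DATA frames.

```python
import struct

HEADERS = 0x1      # HTTP/2 HEADERS frame type (RFC 7540, section 6.2)
END_HEADERS = 0x4  # flag: the header block is complete in this frame


def frame_header(length: int, ftype: int, flags: int, stream_id: int) -> bytes:
    """Build a 9-byte HTTP/2 frame header (RFC 7540, section 4.1)."""
    return (struct.pack(">I", length)[1:]          # 24-bit payload length
            + bytes([ftype, flags])
            + struct.pack(">I", stream_id & 0x7FFFFFFF))


def parse_stream_id(header: bytes) -> int:
    """Extract the 31-bit stream identifier, ignoring the reserved bit."""
    return struct.unpack(">I", header[5:9])[0] & 0x7FFFFFFF


class ServerConnection:
    """Simplified server-side view of client-initiated streams on one connection."""

    def __init__(self) -> None:
        self.highest_client_stream = 0
        self.open = True

    def on_headers(self, stream_id: int) -> bool:
        """Return True if a HEADERS frame opening a new stream is accepted."""
        if not self.open:
            return False
        # New client streams must use odd IDs greater than any used before.
        if stream_id % 2 == 0 or stream_id <= self.highest_client_stream:
            self.open = False  # connection error: server sends GOAWAY and closes TCP
            return False
        self.highest_client_stream = stream_id
        return True


# Hypothetical canned payload: a request captured on a fresh connection
# always opens stream 1 (payload length 0 because HPACK is not modeled).
CANNED_REQUEST = frame_header(0, HEADERS, END_HEADERS, 1)


def simulate(rounds: int) -> list:
    """Replay CANNED_REQUEST once per health-check interval, reconnecting when closed."""
    conn = ServerConnection()
    results = []
    for _ in range(rounds):
        if not conn.open:
            conn = ServerConnection()  # the checker opens a new TCP connection
        results.append(conn.on_headers(parse_stream_id(CANNED_REQUEST)))
    return results


if __name__ == "__main__":
    print(simulate(4))  # [True, False, True, False]
```

Each round is one health-check interval: the first request on a fresh connection is accepted, the replay on the same connection reuses stream 1 and kills the connection, and the next interval reconnects, which reproduces the success/failure alternation.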

Is there a way for Envoy to work with the gRPC health checking protocol?

Thanks a lot for your help!

Weita

Matt Klein

Jun 27, 2017, 10:53:56 PM
to weit...@gmail.com, envoy-users
We have this issue open to define a proper gRPC HC type: https://github.com/lyft/envoy/issues/369

I wasn't aware of the existing HC proto, so thanks for that. I added it to the issue.

Until that is implemented, your best course of action is to run an HTTP endpoint on a different port and use that as the health-check (HC) port. It's not a great answer, but it's what we do currently at Lyft.
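A minimal sketch of such a side endpoint, using only Python's standard library. The port, the `/healthcheck` path, and the `serving` flag are hypothetical; the idea is simply a plain HTTP server, separate from the gRPC port, that answers 200 while the service is healthy and 503 otherwise, so an HTTP health checker can target it.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/healthcheck"  # hypothetical path for the HTTP health checker
serving = threading.Event()   # hypothetical flag; clear it for lame-duck mode
serving.set()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != HEALTH_PATH:
            status, body = 404, b"not found\n"
        elif serving.is_set():
            status, body = 200, b"OK\n"
        else:
            status, body = 503, b"NOT_SERVING\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args) -> None:
        pass  # keep periodic health-check traffic out of the logs


def start_health_server(port: int, host: str = "127.0.0.1") -> HTTPServer:
    """Serve health checks on a background thread, next to the gRPC server."""
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int, path: str = HEALTH_PATH) -> int:
    """Issue an HTTP/1.1 GET the way a health checker would; return the status."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}", timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


if __name__ == "__main__":
    # Hypothetical HTTP health port, distinct from the gRPC port.
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```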

--
You received this message because you are subscribed to the Google Groups "envoy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to envoy-users+unsubscribe@googlegroups.com.
To post to this group, send email to envoy...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-users/3133b938-2d8a-4156-8273-ac4a59eb9513%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Matt Klein
Software Engineer
mkl...@lyft.com

Weita Chen

Jun 28, 2017, 1:50:21 AM
to Matt Klein, envoy-users
Hi Matt,

Thanks for your reply.

To follow up, how can I configure Envoy to health check on a port other than the one exposed for the gRPC service? I only see the path parameter in the Health Checking component and didn't see a port parameter. BTW, is there a sample configuration using this approach that I can look at?

Thanks a lot!

Jose Nino

Jun 28, 2017, 10:29:00 AM
to Weita Chen, Matt Klein, envoy-users
Hi Weita,
The way we do this at Lyft is to have a route table in the gRPC service's Envoy configuration that pivots off the content-type header: gRPC requests go to the cluster with the gRPC port, and non-gRPC requests go to the cluster with the HTTP port. Like this:

Route Table excerpt:

"virtual_hosts": [
  {
    "name": "local_service",
    "domains": ["*"],
    "routes": [
      {
        "timeout_ms": 0,
        "prefix": "/",
        "headers": [
          {
            "name": "content-type",
            "value": "application/grpc"
          }
        ],
        "cluster": "local_service_grpc"
      },
      {
        "timeout_ms": 0,
        "prefix": "/",
        "cluster": "local_service"
      }
    ]
  }
]
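To make the pivot concrete, here is a simplified Python model of how this route table picks a cluster: routes are tried in order, the first match wins, and the first route additionally requires `content-type: application/grpc`. This is an illustrative sketch under that first-match assumption, not Envoy's router. The port map mirrors the cluster excerpt in this message, and header names are compared case-insensitively while values are compared exactly.

```python
from typing import Dict, List, Optional, Tuple

Route = Dict[str, object]

# Mirrors the route table above: gRPC traffic first, everything else second.
ROUTES: List[Route] = [
    {"prefix": "/", "headers": [("content-type", "application/grpc")],
     "cluster": "local_service_grpc"},
    {"prefix": "/", "headers": [], "cluster": "local_service"},
]

# Ports taken from the cluster definitions in the same message.
CLUSTER_PORTS = {"local_service": 8080, "local_service_grpc": 8081}


def select_cluster(path: str, headers: Dict[str, str]) -> Optional[str]:
    """Return the cluster of the first route whose prefix and headers match."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for route in ROUTES:
        if not path.startswith(str(route["prefix"])):
            continue
        required: List[Tuple[str, str]] = route["headers"]  # type: ignore[assignment]
        if all(lowered.get(name) == value for name, value in required):
            return str(route["cluster"])
    return None


if __name__ == "__main__":
    print(select_cluster("/healthcheck", {}))  # local_service
```

Under this exact-match model, a plain HTTP health-check request carries no gRPC content type and falls through to `local_service` on the HTTP port. A client that sends `application/grpc+proto` (which the gRPC wire format permits) would also fall through.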

Then, an excerpt of the two clusters:

{
  "name": "local_service",
  "connect_timeout_ms": 250,
  "type": "static",
  "lb_type": "round_robin",
  "circuit_breakers": {
    "default": {
      "max_pending_requests": 30,
      "max_connections": 200
    }
  },
  "hosts": [
    {
      "url": "tcp://127.0.0.1:8080"
    }
  ]
},
{
  "name": "local_service_grpc",
  "connect_timeout_ms": 250,
  "type": "static",
  "lb_type": "round_robin",
  "features": "http2",
  "circuit_breakers": {
    "default": {
      "max_requests": 200
    }
  },
  "hosts": [
    {
      "url": "tcp://127.0.0.1:8081"
    }
  ]
}

Hope that helps,
Jose 

Matt Klein

Jun 28, 2017, 10:29:39 AM
to Weita Chen, envoy-users
Currently there is no way to provide an alternate HC port. There is another issue open for this, and we are going to support it in the v2 API: https://github.com/lyft/envoy/issues/439

The way we do this today at Lyft relies on the fact that we run a full mesh of Envoys (meaning we run them in front of all of our gRPC services).

Basically, on ingress, each service's Envoy routes a request to either the gRPC port or the HTTP port, and HTTP HCs automatically get routed to the HTTP port.

I realize this is suboptimal in cases where people want to do a simpler config with direct HC, and am happy to add direct gRPC health checking, etc. Someone just needs to do the work.

Weita Chen

Jun 28, 2017, 5:29:45 PM
to Jose Nino, Matt Klein, envoy-users
Hi Matt and Jose,

Thanks for your replies.

Is it possible to have the Envoy associated with the gRPC server do the HTTP-to-gRPC translation for the health checks? We are thinking about having the gRPC server speak only gRPC, rather than hosting another HTTP/1 or HTTP/2 server. Then our architecture will look like this:

Frontend -> Frontend_Envoy   -----> Backend_Envoy -> Backend

Backend only speaks gRPC, and Backend_Envoy will translate the HTTP health-check requests from Frontend_Envoy into gRPC requests.

BTW, does the HTTP health-check request use HTTP/1.1 or HTTP/2?
Thanks.

Weita   


Matt Klein

Jun 28, 2017, 7:34:20 PM
to Weita Chen, Jose Nino, envoy-users
inline below

On Wed, Jun 28, 2017 at 3:29 PM, Weita Chen <weit...@gmail.com> wrote:
> Is it possible to have the Envoy associated with the gRPC server do the HTTP-to-gRPC translation for the health checks? We are thinking about having the gRPC server speak only gRPC, rather than hosting another HTTP/1 or HTTP/2 server. Then our architecture will look like this:
>
> Frontend -> Frontend_Envoy   -----> Backend_Envoy -> Backend
>
> Backend only speaks gRPC, and Backend_Envoy will translate the HTTP health-check requests from Frontend_Envoy into gRPC requests.

This is not possible currently. The correct solution here is to add a new HC type which does direct gRPC calls using the proto that you referenced.
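A direct gRPC health check would call the `Check` method of the `grpc.health.v1.Health` service from that proto: it sends a `HealthCheckRequest { string service = 1; }` and reads back a `HealthCheckResponse` whose `status` field is a `ServingStatus` enum. Because every check is a new gRPC call, it opens a new HTTP/2 stream with a fresh ID, which avoids the replay problem with canned TCP bytes. The Python sketch below covers only the message layer. It hand-encodes the protobuf fields and the 5-byte gRPC length prefix using the standard library, leaves out the HTTP/2 transport (the `content-type: application/grpc` header and the `grpc-status` trailer), and is not how Envoy implements this.

```python
import struct
from typing import Tuple

HEALTH_CHECK_PATH = "/grpc.health.v1.Health/Check"
# ServingStatus values from grpc/health/v1/health.proto
# (SERVICE_UNKNOWN is used only by the Watch method).
STATUS_NAMES = {0: "UNKNOWN", 1: "SERVING", 2: "NOT_SERVING", 3: "SERVICE_UNKNOWN"}


def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint, least-significant group first."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint at buf[pos]; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def grpc_frame(message: bytes) -> bytes:
    """gRPC Length-Prefixed-Message: 1-byte compressed flag, 4-byte big-endian length."""
    return struct.pack(">BI", 0, len(message)) + message


def encode_health_request(service: str = "") -> bytes:
    """Frame a HealthCheckRequest; an empty service name checks the whole server."""
    if not service:
        return grpc_frame(b"")  # proto3 omits fields left at their default
    raw = service.encode("utf-8")
    return grpc_frame(b"\x0a" + encode_varint(len(raw)) + raw)  # field 1, wire type 2


def decode_health_response(frame: bytes) -> str:
    """Parse a framed HealthCheckResponse and return the status name."""
    compressed, length = struct.unpack(">BI", frame[:5])
    if compressed:
        raise ValueError("compressed messages are not handled in this sketch")
    body, pos, status = frame[5:5 + length], 0, 0  # absent status means UNKNOWN
    while pos < len(body):
        key, pos = decode_varint(body, pos)
        field, wire_type = key >> 3, key & 0x7
        if field == 1 and wire_type == 0:
            status, pos = decode_varint(body, pos)
        elif wire_type == 0:
            _, pos = decode_varint(body, pos)  # skip unknown varint field
        elif wire_type == 2:
            size, pos = decode_varint(body, pos)
            pos += size  # skip unknown length-delimited field
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return STATUS_NAMES.get(status, f"UNRECOGNIZED({status})")
```

A checker built on this would treat only `SERVING` as healthy; a non-OK `grpc-status`, a timeout, or any other status would count as a failed check.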
 

> BTW, does the HTTP health-check request use HTTP/1.1 or HTTP/2?

HTTP/1.1

weit...@gmail.com

Jul 12, 2017, 8:37:56 PM
to envoy-users, weit...@gmail.com, jn...@lyft.com

Hi Matt and Jose,

Another question, about Envoy's behavior when health-check results show that all upstream hosts are unhealthy.

I followed your recommendation by setting up an Envoy mesh for both our frontend and backend, with a route table that dispatches gRPC requests and HTTP health-check requests. So the system looks like this:

Frontend -> Front_Envoy -------> Back_Envoy -> Backend

When all the backend servers are in lame-duck mode or are down, Front_Envoy sees its health checks failing. At this point, if new requests come in to Front_Envoy, Front_Envoy still sends them upstream.
Is this the intended behavior?
If so, what is the rationale for Front_Envoy still sending requests upstream in this situation, rather than not sending them at all?

Thanks a lot!
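The behavior described here matches what Envoy documents as its panic threshold, assuming default settings. During load balancing Envoy normally considers only healthy hosts. When the percentage of healthy hosts in a cluster drops below the panic threshold (50% by default, and in this era adjustable through the `upstream.healthy_panic_threshold` runtime key), Envoy disregards health status and balances across all hosts. The usual motivation is to avoid piling all traffic onto a shrinking set of "healthy" hosts, or failing every request, when health checking itself may be what is broken. Below is a simplified Python model of that selection rule, not Envoy's implementation; the strict `<` comparison at the boundary is an assumption.

```python
from typing import List, Sequence

DEFAULT_PANIC_THRESHOLD_PCT = 50.0  # Envoy's documented default


def eligible_hosts(hosts: Sequence[str],
                   healthy: Sequence[bool],
                   panic_threshold_pct: float = DEFAULT_PANIC_THRESHOLD_PCT) -> List[str]:
    """Return the hosts a load balancer would pick from under the panic rule."""
    if not hosts:
        return []
    healthy_hosts = [host for host, ok in zip(hosts, healthy) if ok]
    healthy_pct = 100.0 * len(healthy_hosts) / len(hosts)
    if healthy_pct < panic_threshold_pct:
        # Panic mode: ignore health status and balance across every host.
        return list(hosts)
    return healthy_hosts
```

With every backend failing its health checks, the healthy percentage is 0, which is below the threshold, so requests keep flowing to all hosts, as observed at Front_Envoy.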

Matt Klein

Jul 13, 2017, 11:05:53 AM
to Weita Chen, envoy-users, Jose Nino
