Help understand DNS resolution between client and Nginx Ingress

Staniel Yao

Aug 21, 2022, 1:32:33 PM
to grpc.io
Hello gRPC community,

I am looking for help understanding my gRPC client's DNS resolution behavior. Both the service and the clients are written in Java and deployed in EKS.

My architecture is: my gRPC client uses a DNS name that can resolve to 3 different A records via Route 53. Each request then goes through NLB -> Nginx Ingress Controller (with gRPC enabled) -> my backend. Nginx is able to load balance the requests as long as the connection is established.

However, since the DNS name resolves to 3 records, it points to 3 different NLBs. The client only switches from one NLB to another during startup or a service restart, and any follow-up switch takes a long time (83 minutes, consistently, based on my experiments over the last 2 days). I would like to find a way to shorten this interval so the Route 53 DNS load balancing is more even.
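
For reference, the client channel is created roughly like this (a simplified sketch; the class name, hostname, and port are placeholders, not our real ones). The "dns:///" target uses grpc-java's default DNS name resolver, and the channel then keeps the resulting connection open until one side closes it:

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public class ChannelSetup {
        // Sketch: "dns:///" selects grpc-java's built-in DNS name resolver;
        // the resolved address is kept for as long as the connection lives.
        static ManagedChannel buildChannel() {
            return ManagedChannelBuilder
                    .forTarget("dns:///my-service.example.com:443")  // placeholder hostname
                    .useTransportSecurity()
                    .build();
        }
    }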

I read some documents and discussions, but they haven't helped. I think that is because MAX_CONNECTION_AGE only affects the connection between Nginx and the backend and doesn't affect the client connection to Nginx itself. Am I right about this? I also tried tuning some Nginx config map values such as keep-alive, but that isn't helping either.
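
For context, the backend sets MAX_CONNECTION_AGE roughly like this (a simplified sketch; the class name, port, and durations are placeholders), and my understanding is that this only recycles the connections Nginx holds to the backend:

    import java.util.concurrent.TimeUnit;

    import io.grpc.Server;
    import io.grpc.netty.NettyServerBuilder;

    public class BackendServer {
        // Sketch: maxConnectionAge makes the *server* send GOAWAY and close the
        // connections it accepted (here, the ones from Nginx). It cannot touch
        // the separate client -> Nginx connection.
        static Server buildServer() {
            return NettyServerBuilder.forPort(8443)              // placeholder port
                    .maxConnectionAge(30, TimeUnit.MINUTES)      // placeholder value
                    .maxConnectionAgeGrace(5, TimeUnit.MINUTES)
                    // .addService(...) omitted
                    .build();
        }
    }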

I also enabled debug logging on the client side, but no notable log entry appears around the switch time. The only thing I noticed is that the debug log line's ID and IP address change at the switch point, like this:


So in summary, I would like some suggestions to understand:
1: Why does my client switch DNS every 83 minutes? It seems like a strange magic number, and I cannot find anything related to it.
2: What other config could help shorten the interval that triggers a DNS resolution switch when my client is talking to the Nginx Ingress Controller?

Thanks for any suggestions!

Eric Anderson

Aug 25, 2022, 1:24:10 PM
to Staniel Yao, grpc.io
On Sun, Aug 21, 2022 at 10:32 AM Staniel Yao <yaoli...@gmail.com> wrote:
I think that is because MAX_CONNECTION_AGE only affects the connection between Nginx and the backend and doesn't affect the client connection to Nginx itself. Am I right about this?

Correct. The necessary configuration would be in nginx.

I also tried tuning some Nginx config map values such as keep-alive, but that isn't helping either.

I'd expect keep-alive or keep-alive-requests to help. Maybe you are using an older version of nginx before they got rid of the http2-specific configuration?
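
Something along these lines in the controller ConfigMap is what I have in mind (a sketch; the metadata and values depend on your install, and on older ingress-nginx versions the HTTP/2 limit lived in the separate http2-max-requests key):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ingress-nginx-controller     # name/namespace depend on your install
      namespace: ingress-nginx
    data:
      keep-alive: "75"              # keepalive_timeout for client connections, in seconds
      keep-alive-requests: "1000"   # requests served on one client connection before nginx closes it
      # Older ingress-nginx versions configured HTTP/2 separately:
      # http2-max-requests: "1000"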

So in summary, I would like some suggestions to understand:
1: Why does my client switch DNS every 83 minutes? It seems like a strange magic number, and I cannot find anything related to it.

I have to imagine that is related to nginx cycling the connection after 10k requests. Multiply 83 minutes by an estimated queries per minute and maybe it will make more sense.
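For example, if the client averages roughly 2 queries per second (about 120 per minute), then 83 min x 120 queries/min is approximately 10,000 requests, which would line up with a 10k-requests-per-connection limit. Your actual rate may be different, but the arithmetic is worth checking.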