HttpClient - DNS refresh after TTL expiration


ionut rusu

Apr 20, 2017, 12:08:39 PM
to vert.x

Hi everyone,

We've started to build a reverse proxy using Vert.x (based on HttpServer and HttpClient) and we found a limitation for our use-case.

Our backend service is behind an AWS ELB, which is addressed through a CNAME and periodically updates the list of IPs that the CNAME resolves to (e.g. when an ELB instance is marked for retirement, it is first removed from the IP list of the ELB CNAME and only then becomes unavailable).

With HttpClient (configured with HttpClientOptions.setKeepAlive(true)), it seems that the list of IPs the host resolves to is never refreshed; it remains constant.

Do you have any suggestions for a configuration or a workaround? The only workaround we have found is to periodically recreate the HttpClient instance, so that the host is re-resolved.
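A minimal sketch of the periodic-recreation workaround, written without Vert.x so the bookkeeping can be checked on its own. RecyclingHolder is a hypothetical helper, not a Vert.x API; in an application the factory would presumably be something like () -> vertx.createHttpClient(options) and the close action HttpClient::close. This version recycles lazily when the client is requested; a vertx.setPeriodic(...) timer would be the eager alternative.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper (not a Vert.x API): hands out a shared resource and replaces it
// once it is older than maxAgeMillis, closing the previous instance.
final class RecyclingHolder<T> {
    private final Supplier<T> factory;   // e.g. () -> vertx.createHttpClient(options)
    private final Consumer<T> closer;    // e.g. HttpClient::close
    private final long maxAgeMillis;
    private final LongSupplier clock;    // injectable clock, e.g. System::currentTimeMillis
    private T current;
    private long createdAt;

    RecyclingHolder(Supplier<T> factory, Consumer<T> closer, long maxAgeMillis, LongSupplier clock) {
        this.factory = factory;
        this.closer = closer;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.current = factory.get();
        this.createdAt = clock.getAsLong();
    }

    // Returns the current instance. If it has reached maxAgeMillis, a fresh one is created
    // first (so new connections are opened and the host name is resolved again),
    // then the old one is closed.
    synchronized T get() {
        long now = clock.getAsLong();
        if (now - createdAt >= maxAgeMillis) {
            T old = current;
            current = factory.get();
            createdAt = now;
            closer.accept(old);
        }
        return current;
    }
}
```

One caveat: closing the old client also closes its pooled connections, so requests still in flight on it may fail; a real proxy might delay the close until those requests complete.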

Thanks,
Ionut

Julien Viet

Apr 20, 2017, 1:27:54 PM
to ve...@googlegroups.com
Can you see where this is cached? In the client pool itself, or in the Vert.x DNS resolver?

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/fafb218a-1c43-478e-8e0e-698f57b310d4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ionut rusu

May 8, 2017, 7:52:41 AM
to vert.x
I had a look: the DNS caching is done in the Netty layer, in the DefaultDnsCache class. A timer is attached to each record's TTL and the record is removed from the cache once the TTL expires, but the TCP connections that were opened using that entry remain active.

In the case of an ELB endpoint, an IP can be removed from the list, and the machine that hosts that IP may be taken offline some time later.
Since we're using keep-alive, the HttpClient will continue to talk to that IP and eventually fail, even though the IP has already been withdrawn from the DNS answer.

We found some workarounds for this:
1. Periodically recreate the HttpClient, so that TCP connections are re-established to the correct IPs
2. Implement a custom resolver, and recreate the HttpClient only when we detect that the hostname resolves to a different list of IPs
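Workaround 2 needs a way to notice that the address set changed. A minimal sketch, assuming a resolver function that returns the current set of IPs for a host; AddressChangeDetector is hypothetical, and its default JDK-backed resolver is subject to the JVM's own DNS cache (the networkaddress.cache.ttl security property), not the Vert.x/Netty one.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: re-resolves a host and reports whether its address set changed.
final class AddressChangeDetector {
    private final String host;
    private final Function<String, Set<String>> resolver;
    private Set<String> lastSeen;

    AddressChangeDetector(String host, Function<String, Set<String>> resolver) {
        this.host = host;
        this.resolver = resolver;
        this.lastSeen = resolver.apply(host);
    }

    // Example resolver backed by the JDK (cached according to JVM settings).
    static Set<String> resolveWithJdk(String host) {
        try {
            return Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.toCollection(TreeSet::new));
        } catch (UnknownHostException e) {
            throw new IllegalStateException("cannot resolve " + host, e);
        }
    }

    // True when the resolved set differs from the previous check. Set.equals ignores
    // ordering, so a DNS server rotating the same IPs does not count as a change.
    synchronized boolean hasChanged() {
        Set<String> now = resolver.apply(host);
        boolean changed = !now.equals(lastSeen);
        lastSeen = now;
        return changed;
    }
}
```

Calling hasChanged() from a timer and recreating the client only when it returns true avoids tearing down healthy connections on every tick.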

Thoughts :) ?

Ionut

Tim Fox

May 8, 2017, 8:59:18 AM
to vert.x


On Monday, 8 May 2017 12:52:41 UTC+1, ionut rusu wrote:
> I had a look, and the DNS caching is done in the Netty layer, somewhere in DefaultDnsCache class, there is a timer attached to the TTL of the record - and the record is removed from the cache list after ttl expires, but the tcp connections that are using that entry still remain active.


A TCP connection doesn't know anything about DNS; it only knows the IP addresses and ports of the source and target. Consequently, if the DNS record that was used to obtain the original target IP address changes, it's not possible to "update" the IP for the connection; it wouldn't make any sense.

Julien Viet

May 8, 2017, 9:11:12 AM
to ve...@googlegroups.com
Hi,

Have you looked at AddressResolverOptions, which has a cache setting? The current default value is very large; have you tried using a smaller value?
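A configuration sketch, assuming Vert.x 3.x, where AddressResolverOptions exposes setCacheMaxTimeToLive (in seconds, capping the TTL of cached answers) and VertxOptions accepts it through setAddressResolverOptions. The 30-second value is an arbitrary example. As the rest of the thread points out, this only changes how new connections are resolved; pooled keep-alive connections keep the IP they were opened with.

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.dns.AddressResolverOptions;
import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientOptions;

class ShortDnsCacheExample {
    public static void main(String[] args) {
        // Cap how long a resolved address stays in the Vert.x resolver cache,
        // whatever TTL the DNS response carried (30 s is only an example).
        AddressResolverOptions resolverOptions = new AddressResolverOptions()
                .setCacheMaxTimeToLive(30);

        Vertx vertx = Vertx.vertx(new VertxOptions().setAddressResolverOptions(resolverOptions));

        // Clients created from this Vertx instance use the resolver configured above.
        HttpClient client = vertx.createHttpClient(new HttpClientOptions().setKeepAlive(true));

        client.close();
        vertx.close();
    }
}
```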

Julien

Julien Viet

May 8, 2017, 9:18:24 AM
to ve...@googlegroups.com
As Tim said, HttpClient connections don't know about these changes once they have been established.

One possibility is to resolve the DNS name yourself and open the connection with the resolved IP address instead of the DNS name in the HttpClient.

Julien




Julien Viet

May 8, 2017, 9:19:13 AM
to ve...@googlegroups.com
I mean: use the resolved IP in HttpClient#get(port, ip, …) instead of the host name.
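A sketch of resolving the name yourself, assuming the plain JDK lookup (InetAddress.getAllByName, cached according to the JVM's networkaddress.cache.ttl setting) is acceptable. RoundRobinResolver and its Lookup interface are hypothetical; the helper resolves on every call and rotates through the returned addresses, so an IP that drops out of the answer stops receiving new requests.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: resolves the host on each call and rotates through its IPs.
final class RoundRobinResolver {
    // Pluggable lookup so tests can avoid real DNS; defaults to the JDK resolver.
    interface Lookup {
        InetAddress[] lookup(String host) throws UnknownHostException;
    }

    private final Lookup lookup;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinResolver() {
        this(InetAddress::getAllByName);
    }

    RoundRobinResolver(Lookup lookup) {
        this.lookup = lookup;
    }

    // Returns one IP literal for the host; floorMod keeps the index valid after int overflow.
    String nextIp(String host) {
        try {
            InetAddress[] addresses = lookup.lookup(host);
            int i = Math.floorMod(counter.getAndIncrement(), addresses.length);
            return addresses[i].getHostAddress();
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned IP would then be passed as the host argument, e.g. client.get(port, ip, uri, handler). The backend presumably still expects the original name in the Host header (request.putHeader("Host", host)), and with TLS, hostname verification against an IP literal will likely need separate handling.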

Jason Copeland

May 8, 2017, 2:59:30 PM
to vert.x
If the issue is simply a keep-alive connection to a backend IP that will go away, ideally the ELB would send a Connection: close response header to close keep-alive connections. If it doesn't, that seems like a miss on their end.

HttpClientOptions does have setIdleTimeout(), which can be used to clean up connections that aren't used for a period of time, but if a connection is constantly in use, it will never be cleared out. I don't see any setMaxLifetime()-type option either, nor an easy way to know how long a connection has been open. If you could tell how long it has been open, you could close it and obtain a new one, which would effectively give you the maxLifetime() behavior.
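A small model of the distinction drawn here; ConnectionAge is hypothetical, not a Vert.x type. An idle timeout measures time since the last use, so a busy connection never expires, whereas a maximum lifetime measures time since the connection was opened.

```java
// Hypothetical model of one pooled connection's timestamps (milliseconds).
final class ConnectionAge {
    final long openedAt;
    long lastUsedAt;

    ConnectionAge(long openedAt) {
        this.openedAt = openedAt;
        this.lastUsedAt = openedAt;
    }

    void markUsed(long now) {
        lastUsedAt = now;
    }

    // What an idle timeout checks: time since the connection was last used.
    boolean idleExpired(long now, long idleTimeoutMillis) {
        return now - lastUsedAt >= idleTimeoutMillis;
    }

    // What the missing "max lifetime" option would check: time since it was opened.
    boolean lifetimeExpired(long now, long maxLifetimeMillis) {
        return now - openedAt >= maxLifetimeMillis;
    }
}
```

For a connection used once a second, after ten minutes idleExpired(now, 30_000) is still false while lifetimeExpired(now, 300_000) is true, which is exactly the gap described above.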

It would be interesting to see some glue between a resolver implementation and the http pool under the HttpClient stuff, such that the resolver would identify changes (removal) of IP from the result set, and instruct the pool to evict all the entries that have resolved to that particular IP.  I don't immediately see a way to get at the Http*Pool which is private within ConnectionManager.ConnQueue.
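Because the pool is private, any such glue would have to work on connections you track yourself. A minimal sketch, assuming you keep a map from each connection to its remote IP (in Vert.x that IP could presumably come from HttpConnection.remoteAddress().host()); RemovedIpEvictor is hypothetical. It computes which IPs disappeared between two resolutions and closes the tracked connections to them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper: evicts connections whose remote IP left the DNS answer.
final class RemovedIpEvictor {
    // Addresses present in `before` but missing from `after`.
    static Set<String> removed(Set<String> before, Set<String> after) {
        Set<String> gone = new HashSet<>(before);
        gone.removeAll(after);
        return gone;
    }

    // Closes every tracked connection whose remote IP is in removedIps and returns how many.
    // Iterates over a snapshot because close callbacks may remove entries from the live map.
    static <C> int evict(Map<C, String> remoteIpByConnection, Set<String> removedIps, Consumer<C> close) {
        int count = 0;
        for (Map.Entry<C, String> e : new HashMap<>(remoteIpByConnection).entrySet()) {
            if (removedIps.contains(e.getValue())) {
                close.accept(e.getKey());
                count++;
            }
        }
        return count;
    }
}
```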

Julien Viet

May 8, 2017, 3:27:27 PM
to ve...@googlegroups.com
On May 8, 2017, at 8:59 PM, 'Jason Copeland' via vert.x <ve...@googlegroups.com> wrote:

> If the issue is simply a keep-alive connection to an IP on the backend that will go away, ideally the ELB would inject a Connection: close response to close keep-alive connections.  If they aren't, seems like a miss on their end.

Good to know.

> HttpClientOptions does have setIdleTimeout(), which can be used to clean up connections if they aren't used for a period of time, but if the connection is constantly being used, it won't get cleared out.  I don't really see any kind of setMaxLifetime() type options, either.  I also don't see an easy way to know how long a connection has been opened.  If you could identify how long it was opened, you could close the connection and obtain a new one, that would effectively allow you to do the maxLifetime() thing.

You can track it yourself using the underlying HttpConnection obtained from the HttpClientRequest, which remains the same object for the lifetime of the connection: put it in a map and evict it from the map in the HttpConnection closeHandler.
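A sketch of the map-based tracking suggested here, extended with a max-lifetime check. ConnectionLifetimeTracker is hypothetical and generic in the connection type so it can be tested without Vert.x; in an application C would presumably be HttpConnection, track(...) would be called with each request's connection, onClosed(...) would be registered through the connection's closeHandler, and closeExpired(HttpConnection::close) could run from a periodic timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical tracker: remembers when each connection was first seen, forgets it when it
// closes, and closes connections that have exceeded a maximum lifetime.
final class ConnectionLifetimeTracker<C> {
    private final Map<C, Long> firstSeen = new ConcurrentHashMap<>();
    private final long maxLifetimeMillis;
    private final LongSupplier clock;

    ConnectionLifetimeTracker(long maxLifetimeMillis, LongSupplier clock) {
        this.maxLifetimeMillis = maxLifetimeMillis;
        this.clock = clock;
    }

    // Call with the connection behind each request. putIfAbsent keeps the original
    // timestamp for reused connections; returns true only the first time.
    boolean track(C connection) {
        return firstSeen.putIfAbsent(connection, clock.getAsLong()) == null;
    }

    // Call from the connection's close handler so the map does not leak.
    void onClosed(C connection) {
        firstSeen.remove(connection);
    }

    // Closes every connection tracked for at least maxLifetimeMillis; returns those closed.
    List<C> closeExpired(Consumer<C> close) {
        long now = clock.getAsLong();
        List<C> expired = new ArrayList<>();
        firstSeen.forEach((c, seenAt) -> {
            if (now - seenAt >= maxLifetimeMillis) {
                expired.add(c);
            }
        });
        for (C c : expired) {
            close.accept(c);
        }
        return expired;
    }
}
```

The recorded time is when the tracker first saw the connection, which approximates its opening time if every request goes through track(...). Closing a connection that still carries an in-flight request will likely fail that request, so a production version might only close connections that are idle at that moment.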


> It would be interesting to see some glue between a resolver implementation and the http pool under the HttpClient stuff, such that the resolver would identify changes (removal) of IP from the result set, and instruct the pool to evict all the entries that have resolved to that particular IP.  I don't immediately see a way to get at the Http*Pool which is private within ConnectionManager.ConnQueue.

IMHO we should first identify the real issue. If the ELB is supposed to close the connection, that should solve the problem (in any case, the connection becomes unusable and will be evicted from the pool at some point). So if there is an issue, isn't it more a caching issue in the actual resolver?

Jason Copeland

May 8, 2017, 5:21:17 PM
to vert.x
>> HttpClientOptions does have setIdleTimeout(), which can be used to clean up connections if they aren't used for a period of time, but if the connection is constantly being used, it won't get cleared out.  I don't really see any kind of setMaxLifetime() type options, either.  I also don't see an easy way to know how long a connection has been opened.  If you could identify how long it was opened, you could close the connection and obtain a new one, that would effectively allow you to do the maxLifetime() thing.

> you can maintain it using the underlying HttpConnection obtained from the HttpClientRequest that shall remain the same object and put that in a map that you evict with the HttpConnection closeHandler

Ohhh, a most interesting idea! I personally don't need this behavior, but I'll file this one in the back of my mind for the future.

>> It would be interesting to see some glue between a resolver implementation and the http pool under the HttpClient stuff, such that the resolver would identify changes (removal) of IP from the result set, and instruct the pool to evict all the entries that have resolved to that particular IP.  I don't immediately see a way to get at the Http*Pool which is private within ConnectionManager.ConnQueue.

> imho we should first identify the real issue, if the ELB is supposed to close the connection that should solve the problem (anyway the connection become unusable and will be evicted from the pool at some point), so if there is an issue it's more a caching issue in the actual resolver ?

Given all the details so far, it seems that, at least on the Vert.x side, everything is working as it should. Things like a connection lifetime limit, or even test invocations on idle connections, would help, but those would be feature enhancements.

Julien Viet

May 9, 2017, 2:25:54 AM
to ve...@googlegroups.com
Yes, a connection lifetime can be helpful. Can you open an issue?



ionut rusu

May 9, 2017, 3:26:16 AM
to vert.x
+1 for connection lifetime 

Jason Copeland

May 10, 2017, 2:53:48 AM
to vert.x
https://github.com/eclipse/vert.x/issues/1977

Then, because it is right in there with something else I care about, I also filed:


For ensuring a minimum number of idle connections.

Thanks!
Jason