DNS TTL for RDS


Anshu Prateek

Nov 20, 2018, 8:07:23 PM
to vault...@googlegroups.com
Hi,

We use Vault to generate dynamic creds for RDS.

During a multi-AZ failover, Vault fails to pick up the new DNS resolution for ~30 minutes. This suggests Vault is keeping its own local DNS cache.

Is there any way to control this cache or fix this issue without restarting Vault?


Lowe Schmidt

Nov 22, 2018, 6:58:40 AM
to vault...@googlegroups.com
RDS failover should not change the DNS for the RDS cluster endpoint. 

(I'm assuming AWS here). 

Could you give some more detail on the scenario, please?


--
Lowe Schmidt | +46 723 867 157


--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.

GitHub Issues: https://github.com/hashicorp/vault/issues
IRC: #vault-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Vault" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vault-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vault-tool/3BCF5A99-3A7C-402D-BCAB-46F73C2B6015%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Anshu Prateek

Nov 22, 2018, 7:08:59 AM
to vault...@googlegroups.com
We create AWS Route53 aliases for RDS AWS DNS names and use the aliases in Vault and our apps.

After a multi-AZ failover, Vault just gets stuck on generating new creds until it times out. This goes on for about 30 minutes after the failover.

Our assumption is that the IP resolution changes on a multi-AZ failover and Vault is still resolving the name to the older IP.

If we restart Vault, it immediately starts working, which further supports the assumption that some DNS cache is in play.

We run Vault in Docker with Consul storage (in another Docker container), with host DNS managed by a dnsmasq cache in yet another container.

Let me check if the RDS resolution changes in multi AZ failover.


Anshu Prateek

Nov 22, 2018, 7:42:53 AM
to vault...@googlegroups.com
Cross-checked and confirmed that the IP resolution changes (the DNS name remains the same; only the IP it resolves to changes).
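For anyone who wants to repeat that check, here is a minimal sketch of comparing what a name resolves to before and after a failover. It only uses the Python standard library. The function names are mine, and the `resolver` parameter exists purely so the lookup can be swapped out in tests; nothing here comes from Vault itself.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for hostname."""
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def resolution_changed(before, after):
    """True if the two lookups returned different address sets."""
    return set(before) != set(after)
```

Call `resolve_ipv4()` on the Route53 alias once before the failover and once after, from inside the same container Vault runs in, and pass both results to `resolution_changed()`. Running it inside the container matters because that is the resolver path (dnsmasq included) that Vault would actually see.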



Joel Thompson

Nov 23, 2018, 2:14:45 AM
to vault...@googlegroups.com
This is a little bit outside of my areas of expertise, but the first thing that comes to mind is database connection pooling. Often a persistent connection will be maintained rather than configuring a new one every time. I could see, for example, a scenario where a failover as you have described doesn't actually invalidate the connection in the connection pool but just leaves it pointing at an RDS instance that throws errors when Vault tries to generate/revoke credentials. At the connection level, it's still a valid connection. I've also seen connection pool implementations where the underlying TCP connection gets closed but the pooling mechanism doesn't detect that and returns the invalid connection to the caller, and it's only when the caller tries to use the connection and it errors out that it gets removed from the pool.

I could totally see a scenario where Vault uses a connection pool, the RDS failover doesn't proactively flush the pool, and it's only when all the old, invalid connections have been removed from the pool (through whatever means) that Vault attempts to create a new connection, which then triggers a fresh DNS resolution.
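To make that theory concrete, here is a toy simulation. It is not Vault's pooling code, just a sketch under the assumption that a name is resolved only when a new connection is opened. `ToyPool`, `ip_used_after_failover` and the IP addresses are all made up for illustration.

```python
class ToyPool:
    """Toy pool: DNS is consulted only when a new connection is opened."""

    def __init__(self, resolve, max_lifetime=None):
        self.resolve = resolve            # callable returning the current IP
        self.max_lifetime = max_lifetime  # seconds; None keeps connections forever
        self.idle = []                    # pooled connections as (ip, opened_at)

    def acquire(self, now):
        # Discard pooled connections that have outlived max_lifetime.
        if self.max_lifetime is not None:
            self.idle = [(ip, t) for ip, t in self.idle
                         if now - t < self.max_lifetime]
        if self.idle:
            return self.idle.pop()        # reuse: no DNS lookup happens
        return (self.resolve(), now)      # new connection: fresh lookup

    def release(self, conn):
        self.idle.append(conn)


def ip_used_after_failover(max_lifetime):
    """Open a connection, fail over, then report which IP gets used 600s later."""
    current = {"ip": "10.0.0.5"}          # hypothetical pre-failover address
    pool = ToyPool(lambda: current["ip"], max_lifetime)
    pool.release(pool.acquire(now=0))
    current["ip"] = "10.0.1.9"            # failover: same name, new address
    return pool.acquire(now=600)[0]
```

With `max_lifetime=None` the pool keeps handing back the connection to the old IP even though the name now resolves elsewhere; with `max_lifetime=300` the stale connection is dropped and the next acquire resolves the new IP. If this is what is going on, the connection settings on the database config (Vault's MySQL plugin documents a `max_connection_lifetime` parameter) would be the place to look rather than DNS caching.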

A few questions. What RDS engine are you using? Are you seeing it as exactly 30 minutes, even under varying load? And what is the exact error message you're getting?

Thanks,

--Joel

Anshu Prateek

Nov 23, 2018, 2:52:27 AM
to vault...@googlegroups.com
It’s MySQL RDS.

30 minutes was what we observed in our last controlled prod failover, so it was under load (it was done at a low-traffic time, but the system was still loaded enough). Treat the 30 minutes as approximate, though.

Will check on the connection count in RDS monitoring to validate this theory.

I don't have the exact error message; I'll check next week, but as far as I remember it times out.


Daniel Scott

Jun 18, 2020, 4:38:31 AM
to Vault
We're seeing a similar issue with vault registration of an RDS database where it is a 're-registration' of a previous database. (This is during load testing where I'm re-creating the RDS database and vault registration in a loop). This is what I believe is happening.

We create an RDS instance (Either MySQL or Postgres).
We register the database in vault, but the DNS has not yet been created/propagated, so vault (via python hvac library) throws a 'no such host' error.

This would be OK, because we can catch that error, and retry, but even after retrying for 5 minutes (the TTL is 60s), vault still appears to have cached the missing host record and is unable to register the database.
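For reference, the catch-and-retry step looks roughly like the sketch below. The helper names (`retry`, `is_no_such_host`) are hypothetical, and the error is matched on its message text because the exact exception type hvac raises isn't shown here. `operation` stands in for whatever call registers the database.

```python
import time


def is_no_such_host(exc):
    """Treat only DNS lookup failures as retryable (matched on message text)."""
    return "no such host" in str(exc)


def retry(operation, is_retryable, attempts=10, delay=1.0,
          max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors we don't recognise.
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Retrying on its own doesn't fix the problem described here, but capping the wait well above the 60s record TTL at least shows how long the stale answer really persists.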

I believe that this is partly caused by the previous Vault registration, which still exists. So is it possible that there are other Vault processes still trying to connect to the previous DB instance, causing the missing record to be cached? It's strange that Vault appears to detect that the DNS record has been deleted when the RDS instance is deleted, but doesn't seem to detect the new record once the new instance has been created.

Is there anything special about vault's DNS config? Or does it just use the system resolver?