Vault token expiring (not auto-renewed) between Nomad -> Vault (nomad servers)

Jason M

Jul 16, 2019, 9:46:06 AM
to Nomad
Hi guys

I am running a complete Nomad 0.9.1 estate with Vault integration and Consul. Every 30 days or so, the Vault token on my Nomad servers expires and is not renewed automatically. When Nomad then tries to deploy or execute jobs, the resulting error (as seen in journalctl -u nomad) is:

Vault: server failed to derive vault token: failed to create an alloc vault token: Error making API request. URL: POST https://vault.sandbox.local:443/v1/auth/token/create/sandbox-nomad-cluster Code: 403. Errors: * permission denied

I thought this issue was fixed in 0.9.1? The only way to renew the token is to send SIGHUP to the Nomad server service or restart the service completely.

Has anyone also run into this issue?

I will double check the GitHub release notes.

Thanks

Jason M

Jul 16, 2019, 9:53:14 AM
to Nomad
Does anyone know if 0.9.2 addresses this issue?

Specifically: https://github.com/hashicorp/nomad/pull/5479 "vault: fix renewal time"

Thanks

Jason M

Jul 16, 2019, 11:39:36 AM
to Nomad
I've just been looking at the nomad-0.9.1 source code, and it seems to stipulate that the Nomad CLIENT, NOT the server, is responsible for renewing the token and secrets. That would explain why mine has been failing, as I've been running a mixed 0.9.1 server / 0.8.x client environment for a couple of weeks.

I've just upgraded the clients to 0.9.1 as well.

Chris Raborg

Jul 16, 2019, 7:30:08 PM
to Nomad
Hey Jason - based on my understanding of Nomad (my team's been using it for a couple of years now), the servers are the ones that are Vault-aware, not the clients.
You should see a Vault token in their configuration files (possibly under /etc/nomad/server.hcl).

That being said, I think I've run into a similar, if not the same, issue as you. To my knowledge, all non-root tokens generated by Vault have a TTL, even if they are renewable (see here). That means your token backend may have a max_lease_ttl of 30d, so once you hit that limit your token cannot be renewed at all. To my knowledge, Nomad does not have the built-in ability to generate new root tokens for itself; that would be something the Vault agent would do (if you have one set up).
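
To make the max TTL point concrete, here is a minimal Python sketch of the arithmetic, assuming an ordinary (non-periodic) token whose lifetime is capped at a fixed max TTL counted from its creation time. The function names and the example dates are hypothetical and are not taken from Vault or Nomad; the 30-day figure is only the value guessed at above.

```python
from datetime import datetime, timedelta, timezone


def renewal_deadline(creation_time: datetime, max_ttl: timedelta) -> datetime:
    """Latest instant the token can live to, however often it is renewed."""
    return creation_time + max_ttl


def can_still_renew(creation_time: datetime, max_ttl: timedelta, now: datetime) -> bool:
    """True while the token has not yet reached its max TTL deadline."""
    return now < renewal_deadline(creation_time, max_ttl)


def effective_ttl_after_renew(
    creation_time: datetime,
    max_ttl: timedelta,
    requested_increment: timedelta,
    now: datetime,
) -> timedelta:
    """TTL a renewal could grant: the request, capped at the time left before the deadline."""
    remaining = renewal_deadline(creation_time, max_ttl) - now
    if remaining <= timedelta(0):
        # Past the deadline: renewal is no longer possible, a new token is needed.
        return timedelta(0)
    return min(requested_increment, remaining)


if __name__ == "__main__":
    # Hypothetical example values.
    created = datetime(2019, 6, 16, tzinfo=timezone.utc)
    cap = timedelta(days=30)
    now = datetime(2019, 7, 15, tzinfo=timezone.utc)
    print(renewal_deadline(created, cap))
    print(effective_ttl_after_renew(created, cap, timedelta(hours=72), now))
```

Under these assumptions, a token created on June 16 with a 30-day cap cannot live past July 16 no matter how often it is renewed, which would be consistent with a failure that recurs roughly every 30 days.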

Lang Martin

Jul 17, 2019, 11:30:08 AM
to Nomad
Jason, your understanding is correct: it's the client that renews the secrets. PR 5479, which you referenced earlier, is important; before that fix it was easy for the client to retain a stale token around the time of its expiration. Upgrading to 0.9.1 clients should fix the permanently stuck token problem, but 0.9.3 has some significant improvements and includes the renewal time fix.
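
As a rough illustration of why renewal timing matters, here is a generic Python sketch of one common scheduling pattern: renew after a fixed fraction of the remaining TTL, so that there is still room to retry before expiry. This is not Nomad's actual renewal logic, which is not shown in this thread; the function name, the 0.5 fraction, and the one-second floor are all assumptions made for the example.

```python
def next_renewal_delay(
    remaining_ttl_s: float,
    fraction: float = 0.5,     # assumed: renew halfway through the remaining TTL
    min_delay_s: float = 1.0,  # assumed: avoid renewing in a tight loop
) -> float:
    """Seconds to wait before the next renewal attempt.

    Returns 0.0 when the token has already expired; at that point renewal
    cannot help and the caller has to obtain a fresh token instead.
    """
    if remaining_ttl_s <= 0:
        return 0.0
    delay = max(min_delay_s, remaining_ttl_s * fraction)
    # Never schedule the attempt after the token's own expiry.
    return min(delay, remaining_ttl_s)
```

A scheduler that instead waits the full remaining TTL, or miscalculates it, attempts the renewal right at or after expiry, which is the kind of window in which a stale token can end up being kept.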


Lang Martin

Jul 18, 2019, 12:43:45 PM
to Nomad
Chris, the Nomad server has the Vault configuration and gives the clients derived tokens, but the Nomad clients use cubbyhole to obtain the final tokens used by child processes, and they refresh those tokens.

Chris Raborg

Jul 18, 2019, 8:46:46 PM
to Nomad
Thank you for the clarification! I think that helps put this issue into context, and in fact I think I've experienced the same issue in my cluster. I incorrectly assumed that one of my servers' Vault tokens had expired, when it seems that one of the clients' tokens did instead.

Jason M

Aug 1, 2019, 1:49:37 PM
to Nomad
Hi guys

I just wanted to apologize as I was sure I had replied to this thread and thanked you both for your input!

Anyway, thanks for the clarification and replies; that helped a lot!