client nonce mismatch and instance meta-data incorrect is getting annoying!!

Greg Keys

Sep 12, 2016, 8:37:43 PM
to Vault
We are using EC2 login tokens on our CI/CD runners so that we can pull credentials, such as private NPM tokens, from our Vault.

The problem is that these runners run several jobs one after another, and they autoscale when we add many tasks and jobs at the same time, so it isn't always the same runner asking Vault for an EC2 token.

If we use a different nonce per task, we get a "client nonce mismatch" error whenever the same machine asks for a token with two different nonces.

So we set the same nonce for all CI/CD tasks and jobs and enabled allow_instance_migration. Now, if the same nonce is used on a different EC2 instance, we get the error "client nonce mismatch and instance meta-data incorrect", which looks like the result of pendingTime being after the storedPendingTime.

I'm at a loss as to how to configure this so that we can use EC2 tokens reliably. Does this have something to do with the TTL? Should I set it much shorter?

Vishal Nayak

Sep 13, 2016, 10:36:09 AM
to Vault
Hi Greg,

> If we use a different nonce per task, we get a "client nonce mismatch" error whenever the same machine asks for a token with two different nonces.

The aws-ec2 auth backend works on the Trust On First Use (TOFU) principle, so the client nonce _tries_ to add an extra layer of validation to establish the identity of the caller.
In this regard, having tasks hard-code a constant nonce does not look like the right thing to do.
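
To make the Trust On First Use behaviour concrete, here is a simplified, hypothetical Python model of the per-instance check as it is described in this thread: the first login for an instance ID is trusted and its nonce is remembered; later logins must present the same nonce, unless allow_instance_migration is enabled and the identity document's pendingTime is newer than the stored one. The class and method names are made up for illustration; this is not Vault's code, and the real backend validates much more than this.

```python
from __future__ import annotations

import hmac
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WhitelistEntry:
    """What the model remembers about an instance after its first login."""
    client_nonce: str
    pending_time: datetime


class NonceWhitelist:
    """Hypothetical, simplified model of TOFU nonce checking (not Vault's code)."""

    def __init__(self, allow_instance_migration: bool = False) -> None:
        self.allow_instance_migration = allow_instance_migration
        self.entries: dict[str, WhitelistEntry] = {}

    def login(self, instance_id: str, nonce: str, pending_time: datetime) -> str | None:
        """Return None if the login is accepted, otherwise the rejection reason."""
        stored = self.entries.get(instance_id)
        if stored is None:
            # First use: trust the caller and remember its nonce.
            self.entries[instance_id] = WhitelistEntry(nonce, pending_time)
            return None

        if not hmac.compare_digest(nonce.encode(), stored.client_nonce.encode()):
            if not self.allow_instance_migration:
                return "client nonce mismatch"
            # With migration allowed, a different nonce is accepted only if the
            # identity document is newer than the one seen before.
            if not pending_time > stored.pending_time:
                return "client nonce mismatch and instance meta-data incorrect"

        if pending_time < stored.pending_time:
            # Never accept an identity document older than the stored one.
            return "identity document is older than the one used for the previous login"

        # In this model the first nonce stays pinned; only pendingTime advances.
        stored.pending_time = pending_time
        return None
```

In this model, a second nonce from the same instance is rejected outright unless migration is allowed, which is the first error reported above.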

> So we set the same nonce for all CI/CD tasks and jobs and enabled allow_instance_migration. Now, if the same nonce is used on a different EC2 instance, we get the error "client nonce mismatch and instance meta-data incorrect", which looks like the result of pendingTime being after the storedPendingTime.

Is it possible for the tasks to create a random nonce and store it on the disk of the instance that is trying to get the token?
If a task finds the nonce at the expected location, it could use it instead of creating a new one.
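
A minimal sketch of that idea in Python, assuming the task can write to a directory that survives between tasks on the same instance (the path below is hypothetical). It writes a random nonce to a temporary file and then hard-links it into place, so if several tasks race, exactly one nonce wins and no task reads a half-written file.

```python
import os
import secrets
import tempfile


def get_or_create_nonce(path: str) -> str:
    """Return the nonce stored at `path`, creating it exactly once if missing."""
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip()
    except FileNotFoundError:
        pass

    # Write a fresh random nonce to a private temporary file next to `path`.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="ascii") as f:
            f.write(secrets.token_hex(32))
        os.chmod(tmp, 0o400)  # owner read-only
        try:
            # link() fails if `path` already exists, so only one racer wins.
            os.link(tmp, path)
        except FileExistsError:
            pass
    finally:
        os.unlink(tmp)

    with open(path, encoding="ascii") as f:
        return f.read().strip()
```

A task would call something like `get_or_create_nonce("/var/run/vault-nonce")` (hypothetical location) before every login.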

Also, "client nonce mismatch and instance meta-data incorrect" shows up when the backend has cached the login metadata for a particular instance and an older identity document is used for a subsequent login.
I'm not sure why this is happening in the use case you described.
However, I recommend clearing out the identity-whitelist entries to get rid of the problem.
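
Clearing a cached entry is a single authenticated request. The sketch below only builds it with Python's standard library. The path `auth/aws-ec2/identity-whitelist/<instance_id>` is assumed from the aws-ec2 backend's API of that era, so check it against the API docs for your Vault version; the address, token, and instance ID are placeholders.

```python
import urllib.request


def whitelist_delete_request(vault_addr: str, token: str, instance_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE for one instance's identity-whitelist entry."""
    # Path assumed from the aws-ec2 backend's API; verify for your Vault version.
    url = f"{vault_addr.rstrip('/')}/v1/auth/aws-ec2/identity-whitelist/{instance_id}"
    return urllib.request.Request(url, method="DELETE", headers={"X-Vault-Token": token})
```

Passing the result to `urllib.request.urlopen` sends it.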

Regards,
Vishal

David Adams

Sep 13, 2016, 10:53:43 AM
to vault...@googlegroups.com
You need to generate the nonce exactly once per instance. Do this after the instance starts and before your workers run. If multiple workers need to share it, write it to a file they can all read. To avoid storing it on a physical disk, you can write it to a tmpfs filesystem such as /var/run, which is part of the filesystem but is only volatile storage.

The way I have been using ec2 auth (to be fair, not for very long) is with a script that runs at boot time and once a week (I use the @reboot and @weekly special cron times for this). The script checks a location in /etc for a nonce file and creates it if it isn't found. That file is set to mode 400 so only root can read it. The script then uses the nonce to authenticate to Vault, gets a token with a 30-day lifetime, and writes the token to /var/run/vault-instance-token with mode 640, owner root, and group "vault-users". So the token has a 30-day lifetime but is refreshed every seven days and on every reboot by the cron jobs. The nonce file in /etc, however, is never recreated after the first run.
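
A rough Python sketch of the login-and-write step of such a script, assuming the nonce has already been read from its file under /etc. It builds an aws-ec2 login request from the instance's PKCS#7 identity signature (fetched from the EC2 metadata endpoint, with newlines stripped) and writes the returned token atomically with mode 0640. The function names are hypothetical, the optional `gid` (for example the vault-users group) needs root to apply, and the 30-day token lifetime is not set here; it is assumed to come from the Vault role's settings.

```python
import json
import os
import tempfile
import urllib.request
from typing import Optional

# EC2 instance metadata endpoint for the signed identity document.
PKCS7_URL = "http://169.254.169.254/latest/dynamic/instance-identity/pkcs7"


def login_request(vault_addr: str, pkcs7: str, nonce: str,
                  role: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an aws-ec2 login request."""
    body = {"pkcs7": pkcs7.replace("\n", ""), "nonce": nonce}
    if role is not None:
        body["role"] = role
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/aws-ec2/login",
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def fetch_token(req: urllib.request.Request) -> str:
    """Send the login request and return the client token from the response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["auth"]["client_token"]


def write_token_file(path: str, token: str, gid: Optional[int] = None) -> None:
    """Atomically replace `path` with the token, mode 0640, optionally root:gid."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            f.write(token)
        os.chmod(tmp, 0o640)
        if gid is not None:
            os.chown(tmp, 0, gid)  # requires root
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

On the instance, the cron job would read `PKCS7_URL`, call `fetch_token(login_request(addr, pkcs7, nonce))`, and then `write_token_file("/var/run/vault-instance-token", token, grp.getgrnam("vault-users").gr_gid)`.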

Then I put my application user into the vault-users group so it can read the Vault token, which it uses to read secrets, create new tokens, and so on.

Shorter TTLs than the ones I'm currently using are probably more secure, but for now I'm opting for the relative safety of having plenty of leeway to fix or work around things when problems occur.

The important thing is that your app shouldn't be messing with the nonce. Leave that to a single root-level process.

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/vault/issues
IRC: #vault-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Vault" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vault-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vault-tool/0d22d4b7-869e-439e-bb56-ec59ab09b9e6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Greg Keys

Sep 13, 2016, 8:01:29 PM
to Vault

> Is it possible for the tasks to create a random nonce and store it on the disk of the instance that is trying to get the token?
> If a task finds the nonce at the expected location, it could use it instead of creating a new one.

I would love to do that, but we run the runners in Docker containers, so I can't easily write anything to disk that persists. Each task starts with a clean environment; there is caching and artifact support, but I can't control it per instance, because those span the runners.

I dropped the TTL down to 5 seconds, which seems to have helped. I know the shared nonce isn't the best pattern, but I'm not sure how else to solve this until I can persist instance-specific data between tasks.

David Adams

Sep 13, 2016, 8:17:10 PM
to vault...@googlegroups.com
It sounds like you are using ec2 auth in a way it wasn't designed to support, so it's not surprising that it isn't working well. The host itself should perform the authentication and then provide the nonce to the workers, or better, create an individual token for each worker when it starts. There are other authentication options if ec2 auth doesn't work the way you want.
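
A minimal sketch of the second option, assuming the host already holds its own Vault token: the host asks Vault's token store (`auth/token/create`) for a short-lived child token per worker and hands it to the worker, for example through the `VAULT_TOKEN` environment variable that the Vault CLI reads. The policy name, TTL, and display name are placeholders; a non-root parent token can only grant policies it holds itself.

```python
from __future__ import annotations

import json
import urllib.request


def child_token_request(vault_addr: str, host_token: str, policies: list[str],
                        ttl: str = "15m") -> urllib.request.Request:
    """Build (but do not send) a token-store request for one worker's child token."""
    body = {"policies": policies, "ttl": ttl, "display_name": "ci-worker"}
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/token/create",
        data=json.dumps(body).encode(),
        method="POST",
        headers={"X-Vault-Token": host_token, "Content-Type": "application/json"},
    )
```

The worker's token is the `auth.client_token` field of the response; the worker never sees the nonce or the host token.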

Greg Keys

Sep 13, 2016, 9:17:43 PM
to Vault
Using a dynamic nonce generated from the instance metadata, rather than trying to store it on the instance, seems to be working well.
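
Greg doesn't say which metadata fields he used. One hypothetical way to derive a stable per-instance nonce is to hash the `instanceId` from the EC2 instance identity document (served at http://169.254.169.254/latest/dynamic/instance-identity/document), so every container on the same instance computes the same value without storing anything. Anyone who can read that metadata can recompute such a nonce, so it gives up much of the protection the nonce is meant to add. The sketch deliberately leaves out `pendingTime`, which changes after a stop/start and would trigger the nonce mismatch check.

```python
import hashlib
import json


def nonce_from_identity_document(document_json: str) -> str:
    """Derive a deterministic nonce from the instance ID in an identity document."""
    instance_id = json.loads(document_json)["instanceId"]
    return hashlib.sha256(instance_id.encode("utf-8")).hexdigest()
```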


Vishal Nayak

Sep 15, 2016, 1:51:02 PM
to Vault
Hi,

Just FYI, clients can now rely on the backend to generate strong nonces by omitting the nonce key during login.
The generated nonce is returned as part of the auth metadata.

Note that explicitly setting nonce to "" has a special meaning!
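
On the client side, that means leaving the `nonce` key out of the login body entirely (which is not the same as sending an empty string) and reading the generated value back from the response. A minimal sketch; the metadata key name `nonce` is an assumption, so check an actual login response from your Vault version.

```python
from __future__ import annotations

import json


def login_body(pkcs7: str, nonce: str | None = None) -> bytes:
    """JSON body for an aws-ec2 login; omit `nonce` to let the backend generate one."""
    body: dict[str, str] = {"pkcs7": pkcs7.replace("\n", "")}
    if nonce is not None:
        # Only include the key when we have a nonce: an explicit "" is not
        # the same as leaving it out.
        body["nonce"] = nonce
    return json.dumps(body).encode()


def generated_nonce(login_response: dict) -> str:
    """Pull the backend-generated nonce out of the login response's auth metadata."""
    # Key name assumed; inspect a real response from your Vault version.
    return login_response["auth"]["metadata"]["nonce"]
```

The returned nonce still has to be kept and sent on every later login from that instance, just like a client-chosen one.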


Best,
Vishal