Crazy amount of Vault data in Consul...

Fishstick Kitty

Feb 12, 2016, 8:12:31 AM
to Vault
Hi Vault Peeps, I am trying to figure out why my AWS cross AZ traffic between Consul servers is about 3TB/day (yes that's a "T") so naturally that raised some flags :).

When I did a KV export from Consul, the file size (unformatted) was about 55MB.  99.9% of that contained entries like this:

{
    "LockIndex": 0,
    "Key": "vault/sys/expire/id/auth/app-id/login/004a096cb2f0c5a15ee4854965aaac41f4f4c9f8",
    "Flags": 0,
    "Value": "AAAAAQJyUHQb0Y7orMhxB1biUh1aWaSdmdoF1xBinyFwaY+YYkPK498ZIdJb9CXnKKsHXo0oR+bBt5PYrmQc4FadOnXwPjszRO0Z2orTk8lxwDtiKnGby+vU22WD73F9kkoHp2mNZdTPfhNh672ib7C1eUCpdSM3uYW7Wvu0eO49+IEDLNv3lF/zbsuS4Y10DjM5EkuVmb7gKEGrEcR62xB/j4OXvwEeRcMgMJKmaHAQwN5XoF0Qo6WPTvRmoJ0xxlb2rppkPVGyc/yEMQjjcu+Koe/h49Ulb4zZ47pO78sdPOrcOL6XejAbjyuKiXQA64+Zwcc5RPXnXXG5eYb5mymtTsrFB8UoByJXUuyhxQ6gSNHBlqdSDJbIczTF+hSNpNLOOSntItFszZjBTzTeJP/HF0tl8HgrvvMOg2McfJCTvkI9ckG7JwGjRzKWIvSq/UMw4nusA17oDl5yoSZEHxUqZdFLL/jP4m0OBYxHhRgfPMabEU4R82Ce7RfEtX0M9de35oLS6aX8hKkrfyGjJYu2AEo9WTXNsBPwBy/wKhS7dxnxBd9RLWVEN8sgD1FE/8eRhWX/lrUIq+OZMc01FzPM7PjxeQ/4zFh9DfsEz0hJ+uQU7illu3zUG8S6RbRJJGIdUz91tqDE14qRgyC5c++ghnm3CFUkDpMF4nnPUtNZivmy0DqV/f0ZYZGD8tm6zbsodJ/UB7qppJ/IyN1lETlJe4YRnx6WiWfZ0wXwHlCsN2D81L+PQGLgb/W50lR+4tmDOVlIAkKf4npiK1WMWL0OUXrGLwlHrNwfCXTRHiDRcg==",
    "CreateIndex": 364357,
    "ModifyIndex": 364357
  },

I am running Vault with the following configuration:

backend "consul" {
  address = "localhost:8500"
  path = "vault"
}

listener "tcp" {
  address = ":8200"
  tls_disable = 1
}

disable_mlock = true
default_lease_ttl = "10s"
max_lease_ttl = "1m"


I modified the default TTL settings about 1 month ago to what you see above.  For many hours, our system is for the most part idle...but the traffic remains consistent among Consul servers.


Question:  I can't tell from the KV data when those entries were created.  Is there a way to just clear all of that out without affecting other data?  Also, is there a way to dramatically reduce this type of activity?

Thanks!!!

Jeff Mitchell

Feb 12, 2016, 10:03:03 AM
to vault...@googlegroups.com
Hi,

Entries like that (under vault/sys/expire) are dynamic leases. Those
are the entries containing the information necessary for Vault to
renew and/or revoke the lease.

You didn't make it clear whether 99.9% was in terms of items or in
terms of file size. You also didn't say how many entries there are
under that Consul prefix. However, based on the fact that you modified
the max_lease_ttl to one minute, my guess is that rather than having
services log in and store a token and re-use it for many Vault calls
you are just having them fetch a token every time they need to call in
to Vault (or maybe your services really are coming and going that
fast). The default lease in Vault is 30 days, so it's quite reasonable
to see how that would add up very quickly.
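To illustrate the log-in-once-and-reuse pattern described above, here is a minimal Python sketch. The `TokenCache` class and the `login` callable are hypothetical names made up for this example and are not part of any Vault client library; the real login step would be whatever your auth backend requires.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches one token and re-uses it until shortly before it expires."""

    def __init__(self,
                 login: Callable[[], Tuple[str, float]],
                 refresh_margin: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # login() is a hypothetical callable returning (token, ttl_seconds).
        self._login = login
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self.logins = 0  # number of times we actually authenticated

    def get(self) -> str:
        """Return the cached token, logging in again only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, ttl = self._login()
            self._token = token
            self._expires_at = now + ttl
            self.logins += 1
        return self._token
```

With something like this in front of the Vault calls, many reads share one token and one lease entry instead of creating a new pair on every request.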

That all said, it's impossible to really know from the information you
gave whether 3TB sounds reasonable, rather than just a lot. But as a
thought exercise, let's say that you are generating 1000 leases per
hour. At a bare minimum that's 1000 writes (one per lease) when the
leases are generated and another 1000 writes when they are deleted. The
entry you pasted above is about 1k, giving you 2MB per hour * 24 hours
* 4 followers (let's go for broke and assume a five-node cluster), or
roughly 192MB per day. Not exactly a high number.
In reality there will be a lot more traffic, I'm assuming, because of
Raft replication/leadership handling/etc. But that said, without
knowing more about your usage patterns, how many leases you actually
have, etc., I'm hesitant to put the blame on Vault here.
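The back-of-the-envelope estimate above can be written out as a short Python sketch. All of the constants are the assumed figures from the thought exercise, not measurements, and the factor of 4 is read here as the four followers in a five-node cluster.

```python
LEASES_PER_HOUR = 1000   # assumed rate from the thought exercise
WRITES_PER_LEASE = 2     # one write on creation, one on deletion
ENTRY_BYTES = 1000       # "about 1k" per entry
HOURS_PER_DAY = 24
FOLLOWERS = 4            # five-node cluster: one leader plus four followers


def estimated_bytes_per_day() -> int:
    """Minimum replicated KV payload per day under the assumptions above."""
    per_hour = LEASES_PER_HOUR * WRITES_PER_LEASE * ENTRY_BYTES
    return per_hour * HOURS_PER_DAY * FOLLOWERS


total = estimated_bytes_per_day()
print(f"{total / 1e6:.0f} MB/day")                  # prints "192 MB/day"
print(f"3 TB/day is {3e12 / total:.0f}x larger")    # prints "3 TB/day is 15625x larger"
```

Even at that assumed rate, the observed 3TB/day is four orders of magnitude above the estimate, which is why the lease payload alone does not explain the traffic.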

I'll ask one of the Consul guys about it, but you may just want to
start a thread on the Consul list. They at least will be better able
to help you identify what is actually causing the traffic, as it may
be a pathological condition within Consul rather than anything related
to the number or size of your KV entries.

Best,
Jeff
> --
> This mailing list is governed under the HashiCorp Community Guidelines -
> https://www.hashicorp.com/community-guidelines.html. Behavior in violation
> of those guidelines may result in your removal from this mailing list.
>
> GitHub Issues: https://github.com/hashicorp/vault/issues
> IRC: #vault-tool on Freenode
> ---
> You received this message because you are subscribed to the Google Groups
> "Vault" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to vault-tool+...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/vault-tool/92c1f1ca-166e-4a05-9c8a-cfa8a6bc0cd2%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Fishstick Kitty

Feb 12, 2016, 10:46:19 AM
to Vault
Hi Jeff, thanks for the response.  I did have a post over on the Consul forum a couple days ago (https://groups.google.com/forum/#!topic/consul-tool/6QaAN-dMpRo) and they pointed me over here :).  

For our environment, we are using the latest release of Consul and Vault.  A 5 node cluster over 3 AZs.  The environment has about 15 nodes with around 10 services and somewhere around 50 non-vault KV entries.  I have turned Vault auditing on and our requests to Vault are very small....under 10 in the last 2 hours.

You are correct about our app...we request a new lease when we want something from Vault.  I was under the impression that changing the default TTLs would mean they wouldn't hang around as long in Consul.

I did a count on the number of Vault entries containing "vault/sys/token" (52887 occurrences) and "vault/sys/expire" (52885 occurrences).  I can't tell from the data when these were created...is there a way to determine that?
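For anyone wanting to repeat this kind of count, here is a minimal Python sketch that groups exported KV entries by key prefix. It assumes the export is a JSON array of objects with a "Key" field, as in the entry pasted at the top of the thread; the second and third sample keys below are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_prefix(entries: Iterable[Mapping], depth: int = 3) -> Counter:
    """Count KV entries grouped by the first `depth` path segments of "Key"."""
    counts: Counter = Counter()
    for entry in entries:
        prefix = "/".join(entry["Key"].split("/")[:depth])
        counts[prefix] += 1
    return counts


# Small inline sample; a real run would pass json.load() of the export file.
sample = [
    {"Key": "vault/sys/expire/id/auth/app-id/login/"
            "004a096cb2f0c5a15ee4854965aaac41f4f4c9f8"},
    {"Key": "vault/sys/token/id/example"},   # hypothetical key
    {"Key": "myapp/config/example"},         # hypothetical key
]

for prefix, n in count_by_prefix(sample).most_common():
    print(f"{n:8d}  {prefix}")
```

Raising or lowering `depth` gives a finer or coarser breakdown of where the entries live.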

Jeff Mitchell

Feb 12, 2016, 11:13:55 AM
to vault...@googlegroups.com
On Fri, Feb 12, 2016 at 10:46 AM, Fishstick Kitty <samp...@gmail.com> wrote:
> Hi Jeff, thanks for the response. I did have a post over on the Consul
> forum a couple days ago
> (https://groups.google.com/forum/#!topic/consul-tool/6QaAN-dMpRo) and they
> pointed me over here :).
>
> For our environment, we are using the latest release of Consul and Vault. A
> 5 node cluster over 3 AZs. The environment has about 15 nodes with around
> 10 services and somewhere around 50 non-vault KV entries. I have turned
> Vault auditing on and our requests to Vault are very small....under 10 in
> the last 2 hours.
>
> You are correct about our app...we request a new lease when we want
> something from Vault. I was under the impression that changing the default
> TTLs would mean they wouldn't hang around as long in Consul.

They wouldn't, but there will be a lot more traffic overall, because
rather than just fetching the encrypted value, you need to do a lookup
in the auth backend, generate a token entry, store the expiration
information, then eventually that expiration will hit which will
require revocation.

Generally speaking, getting a new lease any time you want a value out
of Vault is a recipe for trouble. That's why we took lease generation
out of the generic backend in 0.3: people querying values very often
were running up huge numbers of leases.

> I did a count on the number of Vault entries containing "vault/sys/token"
> (52887 occurrences) and "vault/sys/expire" (52885 occurrences). I can't
> tell from the data when these were created...is there a way to determine
> that?

You could use the /sys/raw endpoint with a root(/sudo) token and your
knowledge of the key values to decode them and get an idea of the
times. Or you could backdoor your Vault installation with some code to
print them out. You can't just list tokens and leases, for the reasons
you'd expect :-)

From the other thread, you said that you have about a million entries,
but this adds up to 100k. What/where are the other 900k entries? (Or
have a large number timed out since that thread?)

I'm still overall rather baffled, especially given your request count
of 10 in 2 hours. Any chance that at some point, something got in a
bad state and spun in a loop of
authing-to-vault-trying-to-get-entry-something-failed-authing-to-vault....?

--Jeff

Fishstick Kitty

Feb 12, 2016, 11:31:11 AM
to Vault
Thanks Jeff, great suggestion RE /sys/raw.  I decrypted a few (random sample) and they look like below...created in November and December.

Also, my estimate of a million was off...by an order of magnitude :).

---------------------

vault read /sys/raw/sys/expire/id/auth/app-id/login/000e3a99148a431554f56626ac9741d46f249183 
Key     Value
value   {
   "lease_id":"auth/app-id/login/000e3a99148a431554f56626ac9741d46f249183",
   "client_token":"c5004eef-5172-f0f5-577b-361bc1c89395",
   "path":"auth/app-id/login",
   "data":null,
   "secret":null,
   "auth":{  
      "lease":0,
      "lease_grace_period":0,
      "renewable":false,
      "InternalData":null,
      "DisplayName":"app-id-omaha-job",
      "Policies":[  
         "root"
      ],
      "Metadata":{  
         "app-id":"sha1:e0840350d8725c79a1ac9b2e4dc61ed2893d1a3a",
         "user-id":"sha1:a83ded9f376214f4ac71af27ce0efb87cd3ff813"
      },
      "ClientToken":"c5004eef-5172-f0f5-577b-361bc1c89395"
   },
   "issue_time":"2015-12-04T15:30:02.263404437Z",
   "expire_time":"0001-01-01T00:00:00Z"
}

vault read /sys/raw/sys/expire/id/auth/app-id/login/001dbd56aaf4e87a1003fccd90d1566264c9d251 
Key     Value
value   {
   "lease_id":"auth/app-id/login/001dbd56aaf4e87a1003fccd90d1566264c9d251",
   "client_token":"e12d4824-c896-4e46-05af-2c4cbb6858be",
   "path":"auth/app-id/login",
   "data":null,
   "secret":null,
   "auth":{  
      "lease":0,
      "lease_grace_period":0,
      "renewable":false,
      "InternalData":null,
      "DisplayName":"app-id-omaha-job",
      "Policies":[  
         "root"
      ],
      "Metadata":{  
         "app-id":"sha1:e0840350d8725c79a1ac9b2e4dc61ed2893d1a3a",
         "user-id":"sha1:1f9ce74ff5b91a5142392c774b2e4a12e710817a"
      },
      "ClientToken":"e12d4824-c896-4e46-05af-2c4cbb6858be"
   },
   "issue_time":"2015-11-26T22:59:53.302001501Z",
   "expire_time":"0001-01-01T00:00:00Z"
}

vault read /sys/raw/sys/expire/id/auth/app-id/login/003b3257c00e71ea42703f3141296a98f1cd99db 
Key     Value
value   {
   "lease_id":"auth/app-id/login/003b3257c00e71ea42703f3141296a98f1cd99db",
   "client_token":"d18b6c69-e8d6-4095-a529-da75c91f03e4",
   "path":"auth/app-id/login",
   "data":null,
   "secret":null,
   "auth":{  
      "lease":0,
      "lease_grace_period":0,
      "renewable":false,
      "InternalData":null,
      "DisplayName":"app-id-omaha-job",
      "Policies":[  
         "root"
      ],
      "Metadata":{  
         "app-id":"sha1:e0840350d8725c79a1ac9b2e4dc61ed2893d1a3a",
         "user-id":"sha1:1f9ce74ff5b91a5142392c774b2e4a12e710817a"
      },
      "ClientToken":"d18b6c69-e8d6-4095-a529-da75c91f03e4"
   },
   "issue_time":"2015-11-25T21:39:55.092237971Z",
   "expire_time":"0001-01-01T00:00:00Z"
}

vault read /sys/raw/sys/expire/id/auth/app-id/login/7fca46f8c611ecf0ce48d71486b264fa35362646 
Key     Value
value   {
   "lease_id":"auth/app-id/login/7fca46f8c611ecf0ce48d71486b264fa35362646",
   "client_token":"8e5a884e-bb12-6d94-3652-514f1c794a09",
   "path":"auth/app-id/login",
   "data":null,
   "secret":null,
   "auth":{  
      "lease":0,
      "lease_grace_period":0,
      "renewable":false,
      "InternalData":null,
      "DisplayName":"app-id-omaha-job",
      "Policies":[  
         "root"
      ],
      "Metadata":{  
         "app-id":"sha1:e0840350d8725c79a1ac9b2e4dc61ed2893d1a3a",
         "user-id":"sha1:1f9ce74ff5b91a5142392c774b2e4a12e710817a"
      },
      "ClientToken":"8e5a884e-bb12-6d94-3652-514f1c794a09"
   },
   "issue_time":"2015-11-23T23:14:57.511132204Z",
   "expire_time":"0001-01-01T00:00:00Z"
}
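Once entries like these are decoded, a small Python sketch can summarize when they were issued. It uses only the `issue_time` and `expire_time` fields shown above; treating the zero timestamp `0001-01-01T00:00:00Z` as "no expiry set" is an assumption made for this example.

```python
from collections import Counter
from datetime import datetime
from typing import Mapping, Sequence, Tuple

ZERO_TIME = "0001-01-01T00:00:00Z"  # zero timestamp seen in the entries above


def issue_month(entry: Mapping) -> str:
    """Return the issue month as YYYY-MM."""
    # Keep only the first 19 characters; strptime cannot parse nanoseconds.
    issued = datetime.strptime(entry["issue_time"][:19], "%Y-%m-%dT%H:%M:%S")
    return issued.strftime("%Y-%m")


def summarize(entries: Sequence[Mapping]) -> Tuple[Counter, int]:
    """Count entries per issue month and how many carry the zero expire_time."""
    by_month = Counter(issue_month(e) for e in entries)
    no_expiry = sum(1 for e in entries if e["expire_time"] == ZERO_TIME)
    return by_month, no_expiry


# The four timestamps from the sample entries above.
sample = [
    {"issue_time": "2015-12-04T15:30:02.263404437Z", "expire_time": ZERO_TIME},
    {"issue_time": "2015-11-26T22:59:53.302001501Z", "expire_time": ZERO_TIME},
    {"issue_time": "2015-11-25T21:39:55.092237971Z", "expire_time": ZERO_TIME},
    {"issue_time": "2015-11-23T23:14:57.511132204Z", "expire_time": ZERO_TIME},
]

by_month, no_expiry = summarize(sample)
print(dict(by_month), no_expiry)
```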

Jeff Mitchell

Feb 12, 2016, 4:40:05 PM
to vault...@googlegroups.com
Hi Fishstick,

Thanks for posting that; now I can tell you exactly why those entries
are still there!

The reason is that they are root tokens, and root tokens do not expire
(we recommend you never give out root tokens outside of the initial
token, and with 0.5 you can generate new root tokens so you can even
revoke the initial one). So they will linger in there forever, until
revoked.

You can revoke them through the Vault API using
https://www.vaultproject.io/docs/http/sys-revoke-prefix.html or, since
you have the ability to walk through Consul and use the /sys/raw
endpoint, you can simply pull the client token out of each entry (be
sure that the lease_id starts with auth/app-id so you don't delete
your initial root token) and then make a revoke call on that token
directly.
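The selection step of the second approach might look like the following Python sketch. It only picks out the client tokens to revoke from already-decoded entries and deliberately leaves the revoke call itself out; the function name, the trailing slash on the prefix, and the third sample entry are assumptions made for this example.

```python
from typing import Iterable, List, Mapping


def tokens_to_revoke(entries: Iterable[Mapping],
                     lease_prefix: str = "auth/app-id/") -> List[str]:
    """Return unique client tokens whose lease_id starts with lease_prefix."""
    seen = set()
    tokens: List[str] = []
    for entry in entries:
        lease_id = entry.get("lease_id") or ""
        token = entry.get("client_token")
        if not lease_id.startswith(lease_prefix) or not token:
            continue  # skip anything outside app-id, e.g. the initial root token
        if token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


sample = [
    {"lease_id": "auth/app-id/login/000e3a99148a431554f56626ac9741d46f249183",
     "client_token": "c5004eef-5172-f0f5-577b-361bc1c89395"},
    {"lease_id": "auth/app-id/login/001dbd56aaf4e87a1003fccd90d1566264c9d251",
     "client_token": "e12d4824-c896-4e46-05af-2c4cbb6858be"},
    # Hypothetical entry standing in for a token that must be kept.
    {"lease_id": "auth/token/create/example",
     "client_token": "hypothetical-token-to-keep"},
]

for token in tokens_to_revoke(sample):
    print(token)
```

Reviewing the printed list before revoking anything is a cheap safeguard, since revocation cannot be undone.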

Once you get rid of those tokens, it will be very interesting to see
if the Consul traffic does indeed go down. This is definitely not a
slam dunk in my mind, because those entries, since they don't expire,
aren't trying to be revoked. So they should really cause no traffic.

Best,
Jeff

Fishstick Kitty

Feb 22, 2016, 8:10:07 AM
to Vault
Hi Jeff, thanks for the response.  So, I am using the app-id auth scheme....when I create an app-id, I issue a command similar to this:  vault write auth/app-id/map/app-id/my-app-id value=root display_name=my-app-id.

Are you saying that the "value=root" is the culprit here?  When that app-id logs in, it is given root tokens?

Thanks!!

Fishstick Kitty

Feb 22, 2016, 8:18:48 AM
to Vault
Never mind...I see this in the docs:

Root Policy

The "root" policy is a special policy that can not be modified or removed. Any user associated with the "root" policy becomes a root user. A root user can do anything within Vault.

Fishstick Kitty

Feb 22, 2016, 1:14:27 PM
to Vault
Ok, just to close the loop...I made these changes in 2 environments (dev and test) and now a) my app is not using the root policy and b) the number of outstanding leases is down near zero, which is expected.  The traffic moving between my Consul servers basically went from crazy levels down to near zero...so that is great from a cost perspective :) (see attached graphs).

Thank you very much for your help!!!
dev_network.png
test_network.png

Jeff Mitchell

Feb 22, 2016, 1:18:39 PM
to vault...@googlegroups.com
Great!

I still find the amount of traffic surprising, but I honestly couldn't
tell you what the issue might be. I'm glad to hear that things are
solved for you, though.

--Jeff