Vault TTL expires every 30 seconds


Tanul

Aug 14, 2018, 3:06:12 PM
to Vault
Hello Team,

I have created a cluster where a Consul client is connected to Vault, with Consul servers above them.

Every 30 seconds the Vault TTL check expires, so Consul logs again and again that Vault missed its TTL and the check is now critical. This message is published every 30 seconds. Is there a procedure to set the Vault service check TTL to an unlimited time?

Thanks & Regards
Tanul

Jeff Mitchell

Aug 17, 2018, 12:37:46 PM
to Vault
Hi Tanul,

Can you share your configuration of Vault and your server logs?

Thanks,
Jeff

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/vault/issues
IRC: #vault-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Vault" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vault-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vault-tool/088c5255-65ea-4e0e-b009-7541fa859d7c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Flora Wong

Mar 7, 2019, 2:45:26 PM
to Vault
Hi Jeff, I am facing the same problem, and it happens only with HA enabled (2 Vault instances and 3 Consul instances). I have been trying out different versions hoping the problem would go away, but every combination of versions behaves the same. I am now using Vault v1.0.3 and Consul v1.4.3.

In addition to the above message "2019/03/07 12:33:30 [WARN] agent: Check "vault:10.244.2.220:8200:vault-sealed-check" missed TTL, is now critical"
I am also seeing the following errors:

from vault:

2019-03-05T22:47:06.455Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (CheckID "vault:10.244.0.49:8200:vault-sealed-check" does not have associated TTL)"

from consul:

    2019/03/05 22:37:35 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.2.97:8200:vault-sealed-check?note=Vault+Unsealed, error: CheckID "vault:10.244.2.97:8200:vault-sealed-check" does not have associated TTL from=10.244.2.97:50372

It looks like this is coming from consul.go:runEventDemuxer(). After looking at the logic, I turned service registration off, i.e. disable_registration = "true", which stops the spam. However, I would really like to understand this better and do it right. First, I would like to set the check TTL correctly, but after searching all over I couldn't find a configurable setting for check_ttl. Any help would be greatly appreciated!
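For reference, the registration and TTL-check behavior lives in Vault's consul storage stanza. A minimal sketch in Vault's native HCL (parameter names are from the Consul storage backend; the values are illustrative, not taken from this thread):

```hcl
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"

  # Stop Vault from registering itself (and its TTL checks) in Consul;
  # this silences the "missed TTL" spam at the cost of service discovery.
  disable_registration = "true"

  # Alternatively, keep registration and tune how often Vault must
  # refresh its checks (the default is a few seconds).
  # check_timeout = "30s"
}
```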

Jeff Mitchell

Mar 7, 2019, 3:07:18 PM
to Vault
Hi there,

If you are connecting Vault directly to a Consul server, this could be
causing this kind of issue -- service health checks are meant to be
run by the agent that is on the same node as the service. You should
always connect Vault to the local Consul agent. This also helps with
uptime/availability if Consul servers switch since the agent is given
information about which servers exist and which one is leader.

You may also want to stop Vault, manually ensure that all existing
checks are deleted from Consul, then restart, in case there is some
check that got stuck somehow.
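The manual cleanup described above can be done against the local agent's HTTP API. A hedged sketch (the agent address is Consul's default, and the check ID is copied from the error logs earlier in the thread; the curl calls are left commented since they need a live agent):

```shell
# Address of the local Consul agent's HTTP API (Consul's default).
CONSUL_HTTP_ADDR="${CONSUL_HTTP_ADDR:-http://127.0.0.1:8500}"
# A stuck check ID, taken from the log lines above.
CHECK_ID="vault:10.244.0.49:8200:vault-sealed-check"

# With Vault stopped, list what the agent has registered:
#   curl -s "$CONSUL_HTTP_ADDR/v1/agent/checks"
# Then deregister the stale check before restarting Vault:
#   curl -s -X PUT "$CONSUL_HTTP_ADDR/v1/agent/check/deregister/$CHECK_ID"
echo "$CONSUL_HTTP_ADDR/v1/agent/check/deregister/$CHECK_ID"
```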

Best,
Jeff

Flora Wong

Mar 7, 2019, 4:29:18 PM
to Vault
Hi Jeff, thanks for your quick response!
  • I am using Kubernetes; in this case, Vault and the Consul agent are deployed in the same pod.
  • The Consul servers are all running in separate pods.
  • The HA deployment seems to be doing the right thing, i.e. a Consul leader is always elected during deployment and remains stable.
  • The Vault instances are joined at Vault deployment time, with one active and one standby. Both Vault instances finished unsealing at the post-install stage.
  • Health checks are also successful if I run the Vault CLI inside the container.
Therefore, I can't understand how I got into this situation. It does look like the check fails when it is run against the standby Vault node. Along the code path, runCheck(sealed) calls agent.PassTTL(), which is marked as deprecated in Consul's api/agent.go (it suggests using UpdateTTL() instead). I am not sure whether there is any correlation here (the source code was fetched today and is assumed to be the latest). It would be great if you could shed some light based on the conditions I have described here. Thanks again!

Regards,
Flora

Flora Wong

Mar 11, 2019, 12:34:11 AM
to Vault
Hi Jeff,

Would you help shed some light on what mistake I have made in my configs? I can't seem to get my Vault-side Consul agent to talk to the Consul server. This is my Vault config:

consulAgent:
  repository: consul
  tag: 1.4.3
  pullPolicy: IfNotPresent
  start_join: [ "consul-consul-0", "consul-consul-1", "consul-consul-2" ]
  client_addr: "127.0.0.1"
  bind_addr: "[::]"
  server: false
  datacenter: "dc1"
  data_dir: "/etc/consul"
  addresses: { "http": "0.0.0.0" }
  ports: { "http": 8500 }
  log_level: "DEBUG"

    listener:
      tcp:
        address: '[::]:8200'
        api_addr: '[::]:8200'
        cluster_address: '[::]:8201'

    storage:
      consul:
         address: "127.0.0.1::8500"
         server: "false"
         datacenter: "dc1"
         client_addr: "0.0.0.0"
         data_dir: "/etc/consul"
         scheme: "http"
         default_lease_ttl: "768h"
         max_lease_ttl : "768h"
         disable_registration: "false"
         path: "vault/"

        env:
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: VAULT_CLUSTER_ADDR
            value: "http://$(POD_IP):8201"
          - name: VAULT_API_ADDR
            value: "http://$(POD_IP):8200"

BTW, this is the way the Consul agent is started in its own container. Is there a way to get hold of the agent debug log, or any way to test it?

            exec /bin/consul agent \
              $GOSSIP_KEY \
              -join={{- .Values.consulAgent.join }} \
              -data-dir=/etc/consul
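On the debug-log question: two standard Consul CLI options may help here (both exist in stock Consul; the exact invocations are illustrative, not taken from the chart):

```shell
# Start the agent with verbose logging (add the flag to the exec line above):
#   consul agent -data-dir=/etc/consul -log-level=debug ...

# Or stream logs from an already-running agent over its HTTP API:
consul monitor -log-level=debug
```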

The netstat result only shows my Vault instances talking to my Consul server:

tcp        0      0 vault-vault-d9dc877d7-9nf8v:59450 consul-consul-2.consul-consul.xxxxx-sso.svc.cluster.local:8500 ESTABLISHED 

tcp        0      0 vault-vault-d9dc877d7-9nf8v:59448 consul-consul-2.consul-consul.xxxxx-sso.svc.cluster.local:8500 ESTABLISHED 



Any help would be greatly appreciated!

Flora

Jeff Mitchell

Mar 11, 2019, 1:02:33 PM
to Vault
Hi Flora,

I don't really know why it would keep timing out like that. You may
want to check with the Consul team as they may have more insight.

Best,
Jeff

fwong

Mar 11, 2019, 2:27:34 PM
to Vault
Thanks Jeff! Yes, I am going to check with the Consul team as well.

I forgot to mention that I have brought Vault and Consul up many times to try different things, so your earlier comment on making sure all existing checks are deleted from Consul is a good one.

Is there any possibility that I have both Vault and the Vault-side Consul agent talking to the Consul server? I couldn't tell, but I did see two TCP connections established in the netstat output. I just want to be sure I didn't miss anything on the configuration side. Could you confirm that my configs look OK? Really appreciated!

tcp        0      0 vault-vault-d9dc877d7-9nf8v:59450 consul-consul-2.consul-consul.xxxxx-sso.svc.cluster.local:8500 ESTABLISHED 

tcp        0      0 vault-vault-d9dc877d7-9nf8v:59448 consul-consul-2.consul-consul.xxxxx-sso.svc.cluster.local:8500 ESTABLISHED 


Thank you so much!

Jeff Mitchell

Mar 11, 2019, 2:50:27 PM
to Vault
Hi there,

Vault isn't making those connections; Vault is connecting to the local
Consul agent, and the local Consul agent is making those connections.

Best,
Jeff
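One way to confirm that the agent, not Vault, owns those sockets is to list the owning process alongside each connection; a sketch using standard iproute2/net-tools commands (port 8500 is Consul's default HTTP port):

```shell
# Show established connections to port 8500 with the owning process name;
# "consul" rather than "vault" should appear in the process column.
ss -tnp state established '( dport = :8500 )'
# Equivalent with net-tools:
#   netstat -tnp | grep 8500
```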

fwong

Mar 11, 2019, 3:59:09 PM
to Vault
Thanks again!

Shubham Kumar

Mar 12, 2019, 10:25:50 AM
to vault...@googlegroups.com
Hi Jeff , 

I have enabled kubernetes auth in vault and set values like :


vault write auth/kubernetes/config \
    token_reviewer_jwt="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpX---somevalue" \
    kubernetes_ca_cert="-----BEGIN CERTIFICATE----- MIIC6jCCAdKgAwIBAgIBAT ---somevalue"


Now, when one of my init containers deployed in OpenShift tries to retrieve the Vault token, I get an error like:

Fri Mar  1 12:00:42 UTC 2019 ERROR - unable to retrieve vault login token: {

  "errors": [

    "Post https://apiserver-kube-service-catalog.apps.shubham-dev.ap-south-1.igtb.digital/apis/authentication.k8s.io/v1/tokenreviews: x509: certificate is valid for apiserver.kube-service-catalog, apiserver.kube-service-catalog.svc, apiserver.kube-service-catalog.svc.cluster.local, not apiserver-kube-service-catalog.apps.shubham-dev.ap-south-1.igtb.digital"

  ]

}


I was thinking that this error came from the Kubernetes API server side, but it looks like Vault itself is trying to validate the certificate against the SAN names present in the certificate passed as kubernetes_ca_cert under auth/kubernetes/config, and is throwing this x509 error.

If so, can I disable the certificate check in Vault to get rid of this error, rather than injecting custom certificates with my domain-name SAN values into OpenShift?


- Shubham
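One commonly suggested direction for an x509 SAN mismatch like the one above: point the auth method at a host name the API server's certificate actually covers (one of the SANs listed in the error) instead of the external route. A hedged sketch using the auth/kubernetes/config parameters; the host value and the @-file names are assumptions, not values from this thread:

```shell
vault write auth/kubernetes/config \
    kubernetes_host="https://apiserver.kube-service-catalog.svc:443" \
    token_reviewer_jwt=@reviewer.jwt \
    kubernetes_ca_cert=@ca.crt
```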
