Question on configuring Vault health checks

mlap...@newrelic.com

Mar 7, 2016, 2:18:54 PM
to Consul
Hello Consul friends!

I'm in the middle of deploying a Vault cluster that uses a Consul cluster as its storage backend. Initially I was fronting my Vault cluster with an AWS ELB, but after talking to Jeff Mitchell, I've decided to switch to using Consul for service discovery.

I'm using a custom CA (created using Certstrap by Square) for all the TLS server & client certificates.

I've configured the service definition for Vault as follows:

{
  "service": {
    "id": "vault1",
    "name": "vault",
    "tags": ["secrets", "development", "aws"],
    "address": "<IP Addr>",
    "port": 8202,
    "check": {
      "id": "api",
      "name": "HTTP API check on port 8202",
      "http": "https://<IP Addr>:8202/v1/sys/health",
      "interval": "5s",
      "timeout": "2s"
    }
  }
}

However, when I bounce my Consul node, I receive the following error:
2016/03/07 11:08:52 [WARN] agent: http request failed 'https://10.0.4.238:8202/v1/sys/health': Get https://10.0.4.238:8202/v1/sys/health: x509: certificate signed by unknown authority

In my Consul server config, I've set ca_file to point to the public certificate for my custom CA, and that appears to be working, as data is synced between the nodes in my Consul cluster.

Is there a way to specify a custom CA certificate for health checks or does Consul fully depend on the underlying OS for determining valid root certificates?

Cheers,
Matthew

Michael Fischer

Mar 7, 2016, 6:11:32 PM
to consu...@googlegroups.com
Consider using a process-based check using curl instead of the builtin http check.  That way you have complete control over the check behavior, including the CA certificate used by the client.

Best regards,

--Michael
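
For readers who would rather not shell out to curl, the same idea can be sketched in Python. This is a minimal, untested-in-production sketch: it verifies the server certificate against a CA bundle you name and reports success only for a 2xx response. The names `probe` and `main` are hypothetical helpers, and the URL and CA path are whatever your deployment uses.

```python
"""Sketch of a TLS-verifying health probe for use as a Consul script check."""
import ssl
import sys
import urllib.error
import urllib.request


def probe(url, cafile, timeout=2.0):
    """Return the HTTP status code, or None if the request itself failed."""
    try:
        # Trust only the given CA bundle (assumed to be a PEM file).
        context = ssl.create_default_context(cafile=cafile)
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except OSError:
        # Covers a missing CA file, TLS verification errors, and connection failures.
        return None


def main(argv):
    """Return 0 for a 2xx response and 2 for anything else."""
    if len(argv) != 3:
        print("usage: probe.py URL CA_FILE", file=sys.stderr)
        return 2
    status = probe(argv[1], argv[2])
    print("status:", status)
    return 0 if status is not None and 200 <= status < 300 else 2
```

To use it as a script, add `sys.exit(main(sys.argv))` at the bottom and point the check at the file with the health URL and CA path as arguments.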


James Phillips

Mar 8, 2016, 1:29:10 PM
to consu...@googlegroups.com
Hi Matthew,

Currently the ca_file configuration doesn't make it into the client that's used to perform HTTP health checks; it is only used for RPC communication between Consul agents.

If you can install your custom CA's certs in a system location, Go should respect that and start using them. Michael's suggestion of using a script check that calls curl is a good one, too. Hope that helps!

-- James

Marcin Wielgoszewski

Mar 9, 2016, 11:08:28 AM
to Consul
Matthew,

At my current company, we ran into the same issue trying to use a Consul HTTP health check for Vault. We worked around this by configuring a separate TCP listener on localhost:8201 with TLS disabled, and pointed the HTTP health check at http://localhost:8201/v1/sys/health.

Regards,
Marcin

mlap...@newrelic.com

Mar 11, 2016, 5:58:07 PM
to Consul
Thanks for the suggestions folks!

I think I'm going to pursue the curl/script approach Michael suggested. For this approach, is it sufficient to simply return the HTTP status code from the script, or does Consul need other data or expect the response in a specific format?

Thanks,
Matthew

James Phillips

Mar 11, 2016, 6:08:49 PM
to consu...@googlegroups.com
Hi Matthew,

For a script check Consul only cares about the exit code, and it just captures the output for operators to inspect. Exit code 0=passing, 1=warning, and anything else is failing, so you'll want to map your curl return into those.

-- James
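
As a rough illustration of that mapping, here is a minimal Python sketch. It assumes Vault's default behaviour of answering 200 from the active node and 429 from an unsealed standby on /v1/sys/health; the function name `consul_exit_code` and the choice to treat a standby as a warning are illustrative, not something Consul or Vault prescribes.

```python
# Exit codes as Consul interprets them for script checks.
PASSING, WARNING, CRITICAL = 0, 1, 2


def consul_exit_code(http_status):
    """Translate an HTTP status (or None for no response) into an exit code."""
    if http_status == 200:
        return PASSING
    if http_status == 429:
        # Assumption: a standby node is worth a warning, not a failure.
        return WARNING
    # Any other status, or no response at all, is reported as failing.
    return CRITICAL
```

A check script would finish with something like `sys.exit(consul_exit_code(status))`, and anything it prints is captured as the check output.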

Michael Fischer

Mar 11, 2016, 6:54:51 PM
to consu...@googlegroups.com
curl's '-f' option makes it exit non-zero (exit code 22) when the HTTP server returns an error status of 400 or above.

--Michael
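
A hedged sketch of how such a curl call might be wrapped, written in Python to match the scripts used elsewhere in this thread. The curl flags are real (`--fail`, `--silent`, `--show-error`, `--cacert`, `--max-time`), but the helper names `curl_command` and `run_check`, the 2-second timeout, and the decision to collapse every curl failure into exit code 2 are assumptions.

```python
import subprocess


def curl_command(url, cafile, timeout_s=2):
    """Build the curl argument list for a TLS-verified health probe."""
    return [
        "curl",
        "--fail",          # exit non-zero on HTTP 400 and above
        "--silent",        # no progress meter
        "--show-error",    # still print the error message on failure
        "--cacert", cafile,
        "--max-time", str(timeout_s),
        url,
    ]


def run_check(url, cafile):
    """Run curl and return 0 (passing) or 2 (failing)."""
    try:
        result = subprocess.run(curl_command(url, cafile))
    except FileNotFoundError:
        # curl is not installed or not on PATH.
        return 2
    return 0 if result.returncode == 0 else 2
```

Note that with `--fail`, a Vault standby answering 429 ends up as failing rather than warning, since the check only sees curl's exit code and not the HTTP status.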

mlap...@newrelic.com

Mar 16, 2016, 1:45:25 PM
to Consul
Hey Friends,

I'm working on getting the script check running and I'm hitting some problems. Initially I created a Ruby script to call the endpoints on Vault and return 0 for an HTTP 200, 1 for an HTTP 429, and so on. When I started Consul, it immediately reported that all 3 Vault instances were critical. To make sure I understand the workflow correctly, I created the following Python script and updated the health check to call it:

#!/usr/bin/env python
import sys


def main():
    sys.exit(0)


if __name__ == '__main__':
    main()

However, when launching Consul, it is still reporting that all Vault nodes are critical. Any ideas what I'm doing incorrectly? 
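
One way to narrow down a problem like this is to run the check command by hand and look at the exit code. The sketch below assumes the agent runs script checks through a shell, so it does the same; `run_like_a_check` and `describe` are hypothetical helper names, and the 5-second timeout is arbitrary.

```python
import subprocess


def run_like_a_check(command, timeout_s=5):
    """Run a command string through the shell and return its exit code."""
    completed = subprocess.run(command, shell=True, timeout=timeout_s)
    return completed.returncode


def describe(exit_code):
    """Name the check state for an exit code: 0 passing, 1 warning, else critical."""
    if exit_code == 0:
        return "passing"
    if exit_code == 1:
        return "warning"
    return "critical"
```

For example, `describe(run_like_a_check("health-check.py"))` comes back as critical if the shell cannot find or execute the script. POSIX shells exit with 127 for a command that is not found and 126 for one that is not executable, so a bare file name that is not on the agent's PATH would fail no matter what the script itself does.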

My service definition is:

{
        "service":{
         "id": "vault1",
         "name": "vault",
         "tags": ["secrets","development","aws"],
         "address": "<IP of Vault Node>",
         "port": 8202,
         "check": {
          "id": "script",
          "name": "Ruby health-check script on port 8202",
          "script": "health-check.py",

mlap...@newrelic.com

Mar 16, 2016, 7:19:06 PM
to Consul
Scratch that. I ended up importing the custom CA into my system's root store and using the HTTP checks. I think this is how we'll end up configuring it in production, so it made the most sense.

Thanks All!