Hi,
I'm running a single-node Consul as a Vault backend, within Docker, each in its own network namespace.
I've got my docker-compose.yml set up to simply start them, and I then have a script to unseal Vault...
On first run, everything works as expected: I can set secrets, policies, etc.
Stopping and restarting the containers via docker-compose usually works well, as they end up with the same IPs as before.
The first time, the consul container is started with (as I don't want to bake the key into the image):
docker-entrypoint.sh agent -server -dc delve1 -bootstrap-expect 1 -client=0.0.0.0 -ui -encrypt $GOSSIP_KEY
... and afterwards simply with this (as the key is now kept in Consul's data directory):
docker-entrypoint.sh agent -server -dc delve1 -bootstrap-expect 1 -client=0.0.0.0 -ui
(N.B. I know gossip encryption on a single node doesn't make much sense for now; I'm just preparing the runtime for the future.)
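For context, here is roughly how the consul service is wired up in my docker-compose.yml (this is an illustrative fragment, simplified from my real file; the image, volume paths, and variable name are just how my setup happens to look):

```yaml
# Illustrative fragment only; paths and names are simplified.
# GOSSIP_KEY is exported in the environment for the first run,
# and the -encrypt flag is dropped from the command afterwards.
consul:
  image: consul
  command: agent -server -dc delve1 -bootstrap-expect 1 -client=0.0.0.0 -ui -encrypt ${GOSSIP_KEY}
  volumes:
    - ./config.json:/consul/config/config.json
    - consul-data:/consul/data
```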
The Consul config.json is the following:
{
"key_file": "/var/project/keys/consul/consul.key",
"cert_file": "/var/project/keys/consul/consul.crt",
"ca_file": "/var/project/keys/public_keys/ec-rootCA.pem",
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": false,
"disable_remote_exec": true,
"disable_update_check": true,
"ports": {
"https": 8700,
"http": -1
}
}
Problems happen, though, if I restart the machine, or simply the daemon (systemctl restart docker).
Doing so tends to shift the containers' IPs around and, while the DNS names stay the same, it looks like Consul gets quite confused:
Sep 8 16:23:46 localhost docker/project_consul_1[32045]: 2016/09/08 20:23:46 [WARN] raft: Election timeout reached, restarting election
Sep 8 16:23:46 localhost docker/project_consul_1[32045]: 2016/09/08 20:23:46 [INFO] raft: Node at 172.17.0.3:8300 [Candidate] entering Candidate state
Sep 8 16:23:46 localhost docker/project_consul_1[32045]: 2016/09/08 20:23:46 [ERR] raft: Failed to make RequestVote RPC to 172.17.0.2:8300: dial tcp 172.17.0.2:8300: getsockopt: connection refused
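My hunch from that RequestVote error is that raft has persisted the container's old address as a peer. If I understand the layout right, the peer set lives in raft/peers.json under the data dir (e.g. /consul/data in my container), and after the IP shift I'd expect it to still hold the stale entry, something like:

```json
["172.17.0.2:8300"]
```

... while the node itself now lives at 172.17.0.3, so it keeps trying to reach a peer that no longer exists.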
I also sometimes see this:
Sep 8 16:41:01 localhost docker/project_consul_1[10197]: 2016/09/08 20:41:01 [ERR] agent: failed to sync remote state: No cluster leader
Sep 8 16:41:01 localhost docker/project_consul_1[10197]: 2016/09/08 20:41:01 [ERR] agent: failed to sync changes: No cluster leader
Meanwhile, my guess is that it refuses Vault's connections, since poking /v1/sys/init for the status returns:
{"errors":["failed to check for initialization: Unexpected response code: 500"]}
If I restart Consul again, the problem persists, but running "consul leave" inside the container makes Consul quit; the container then auto-restarts and works without issues until next time...
I'm confused as to why this happens, since I'm using "-bootstrap-expect 1".
Are there any other settings that can prevent this, besides setting a static IP in Docker or other hacks?
Also, could Vault's self-registered health checks be a problem in the long run, or will these warnings just expire?
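For completeness, the static-IP hack I'm referring to would look something like this in Compose v2 syntax (the subnet and address here are made up; I haven't committed to this approach yet):

```yaml
# Hypothetical: pin the consul container to a fixed address so the IP
# raft advertises survives daemon restarts. Subnet/address are examples.
version: '2'
services:
  consul:
    networks:
      backend:
        ipv4_address: 172.25.0.10
networks:
  backend:
    driver: bridge
    ipam:
      config:
        - subnet: 172.25.0.0/24
```

It works, but it hard-codes network details into the compose file, which is what I'd like to avoid.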
Sep 8 17:06:43 localhost docker/project_consul_1[18169]: 2016/09/08 21:06:43 [WARN] agent: Check 'vault:240.1.0.5:8200:vault-sealed-check' missed TTL, is now critical
Sep 8 17:06:43 localhost docker/project_consul_1[18169]: 2016/09/08 21:06:43 [WARN] agent: Check 'vault:240.1.0.4:8200:vault-sealed-check' missed TTL, is now critical
Sep 8 17:10:48 localhost docker/project_consul_1[18169]: 2016/09/08 21:10:48 [INFO] agent: Synced check 'vault:240.1.0.4:8200:vault-sealed-check'
Sep 8 17:14:08 localhost docker/project_consul_1[18169]: 2016/09/08 21:14:08 [INFO] agent: Synced check 'vault:240.1.0.5:8200:vault-sealed-check'
Thank you in advance! :)