Stuck in bad CA configuration of consul connect

Nicholas Kisseberth

Aug 15, 2018, 2:12:43 PM
to Consul
I tried to configure Consul Connect with the Vault CA from the start (I didn't run it with the Consul CA first). It seems to fail because of host verification problems (active.vault.service.consul vs. the real host name), so I think there needs to be a TLS skip-verify option there. OK, fine: I decided to change the address of the Vault server to the proper host name, so I ran "consul ca set-config -config-file fixedca.json" and got "Error setting CA configuration: Unexpected response code: 500 (rpc error making call: internal error: CA provider is nil)". From the Consul source code, it looks like it is trying to perform an orderly transition from one CA to another, but it doesn't handle the case where the first CA never worked. Is there any way to override the transition, or to outright delete the CA configuration in a case like this?
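The host verification failure described above is the standard TLS hostname check: a certificate is only valid for the names it lists, so a Vault certificate issued for the server's real host name does not cover active.vault.service.consul unless that name is also in its subject alternative names. The following is a minimal sketch using Go's standard crypto/x509 package (Consul and the Vault API client are written in Go, so this is presumably the same kind of check). It does not talk to Consul or Vault at all; the host name vault01.example.com is a hypothetical stand-in for the "real name".

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newServerCert creates a throwaway self-signed certificate whose only DNS
// subject alternative name is dnsName. It stands in for a Vault server cert.
func newServerCert(dnsName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsName},
		DNSNames:     []string{dnsName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	// Self-signed: the template is both the certificate and its parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newServerCert("vault01.example.com") // hypothetical real host name
	if err != nil {
		panic(err)
	}
	// The first name matches the SAN; the Consul DNS name does not.
	for _, host := range []string{"vault01.example.com", "active.vault.service.consul"} {
		if err := cert.VerifyHostname(host); err != nil {
			fmt.Printf("%s: FAIL (%v)\n", host, err)
		} else {
			fmt.Printf("%s: ok\n", host)
		}
	}
}
```

Pointing the CA config at the name the certificate was issued for, as tried above, or adding active.vault.service.consul as an extra SAN on the Vault certificate both satisfy this check without disabling verification.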

Kyle Havlovitz

Aug 15, 2018, 6:44:07 PM
to consu...@googlegroups.com
There isn't any way to do this currently. It tries to load the existing config so that it can perform a CA rotation, and if it can't, then returning an error here is correct. If the CA config never worked, though, the cluster shouldn't have even finished bootstrapping. In that case the answer isn't to forcibly reset the CA config or anything like that; it's to give it a valid configuration so that Consul can set up the CA when the leader starts up, since it hasn't even gotten that far yet.

 We will probably need something like a -force option on setting the config in case an external CA gets into a state that makes it no longer valid, though. 
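The behaviour explained above, and what a -force option might change, can be modelled in a few lines. This is a hypothetical, heavily simplified sketch: caManager, SetConfig, namedProvider, and the force parameter are invented names and are not Consul's source code or API. It only mirrors the logic described here: a rotation needs the current provider, so when the initial provider never came up, an unforced config change fails with the same message seen in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// CAProvider stands in for a configured CA backend (consul, vault, ...).
type CAProvider interface {
	Name() string
}

// namedProvider is a trivial hypothetical provider identified by name.
type namedProvider string

func (p namedProvider) Name() string { return string(p) }

// caManager models a leader that owns the active CA provider. provider
// stays nil if the initial provider failed to initialize.
type caManager struct {
	provider CAProvider
}

var errNilProvider = errors.New("internal error: CA provider is nil")

// SetConfig switches to newProvider. Without force it requires the current
// provider, because an orderly rotation hands over from the old CA (for
// example, the old root cross-signing the new one). force is the
// hypothetical escape hatch suggested in the thread.
func (m *caManager) SetConfig(newProvider CAProvider, force bool) error {
	if m.provider == nil && !force {
		return errNilProvider
	}
	if m.provider != nil {
		// In a real rotation the old provider would be used at this point.
		fmt.Printf("rotating %s -> %s\n", m.provider.Name(), newProvider.Name())
	}
	m.provider = newProvider
	return nil
}

func main() {
	m := &caManager{} // the first CA never initialized, as in this thread
	fmt.Println(m.SetConfig(namedProvider("consul"), false))
	fmt.Println(m.SetConfig(namedProvider("consul"), true))
}
```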

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
Community chat: https://gitter.im/hashicorp-consul/Lobby
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/4c4e0b12-7158-4ac0-a14f-ff87149adb5c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nicholas Kisseberth

Aug 16, 2018, 9:20:46 AM
to Consul
Bootstrapping the CA needs better documentation and error handling. Perhaps I did it wrong (I was kind of guessing), but I just added a connect.json config file with the Vault CA config and restarted the single Consul server instance. There were no obvious error messages, and API queries made directly against that server instance showed the CA configuration was set as I had specified. But at that point I think it was unable to elect a leader, and the cluster wasn't able to serve consistent reads (those were the Consul Connect errors I was getting). It should have been more obvious that Consul was in a bad state, but "consul members" looked normal.

I finally saw the host verification errors in the logs and tried to reset the CA to Consul instead of Vault on that same node, but it gave the nil CA provider error I wrote about. Restarting the bad server node didn't seem to help: I could remove the connect.json file and it would come up OK, but if I put a new one in with Consul as the CA provider, it would give the same nil CA error. In the end, shutting down all three server nodes, changing the connect.json config to the Consul provider, and bringing them back up worked.

So I've learned a bit, but it should be more straightforward to handle. And you need a TLS skip-verify option for the Vault CA config. Now on to the seemingly impossible task of using Consul Connect with Nomad running Docker tasks. I suspect I am too far on the bleeding edge.
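One reason the bad state was easy to miss: "consul members" reports gossip membership, so servers can all show as alive while Raft has no elected leader. Consul's HTTP API endpoint GET /v1/status/leader returns the leader's address as a JSON string, or an empty string when there is no leader. A minimal sketch of such a check follows; the agent address http://127.0.0.1:8500 is an assumption (8500 is Consul's default HTTP port), and the function names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// parseLeader decodes the body of GET /v1/status/leader, a JSON string such
// as "10.0.0.5:8300", or "" when no leader is currently elected.
func parseLeader(body []byte) (string, error) {
	var leader string
	if err := json.Unmarshal(body, &leader); err != nil {
		return "", err
	}
	return leader, nil
}

// fetchLeader asks an agent's HTTP API which server is the Raft leader.
func fetchLeader(agentAddr string) (string, error) {
	resp, err := http.Get(agentAddr + "/v1/status/leader")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseLeader(body)
}

func main() {
	// Hypothetical local agent address.
	leader, err := fetchLeader("http://127.0.0.1:8500")
	switch {
	case err != nil:
		fmt.Println("could not query agent:", err)
	case leader == "":
		fmt.Println("no Raft leader elected: consistent reads will fail")
	default:
		fmt.Println("Raft leader:", leader)
	}
}
```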

