Consul v1.2.0 (and v1.2.1) - service registration via Golang's API package with disabled encryption causes 100-150% CPU usage

105 views
Skip to first unread message

Stanislav N. aka pztrn

unread,
Jul 17, 2018, 6:08:53 AM7/17/18
to consul-tool
Hey list.

I've encountered $subj. That's my setup:

1. Kubernetes (minikube on my machine).
2. 3 Consul instances successfully clustered, from consul:1.2.1 image.
3. Configuration:

{
"connect": {
"enabled": false
},
"encrypt_verify_incoming": false,
"encrypt_verify_outgoing": false,
"log_level": "trace",
"performance": {
"raft_multiplier": 1
},
"verify_incoming": false,
"verify_incoming_rpc": false,
"verify_outgoing": false,
"verify_server_hostname": false
}

4. Using github.com/hashicorp/consul/api like this:

if registerHealthCheck {
serviceMajorVersion.Check = &api.AgentServiceCheck{
Name: serviceMajorVersion.ID + " HTTP check",
HTTP: "http://" + s.localIP + ":" + strconv.Itoa(port) + "/api/v1/healthCheck/",
Method: http.MethodPost,
Interval: "5s",
Timeout: "2s",
}
}

err := agent.ServiceRegister(serviceMajorVersion)
if err != nil {
return err
}

Everything works, service registers and address can be reached via HTTP or DNS requests. The problem with CA bootstraping which I want to disable, as Consul cluster runs in private environment without internet access. On service registering and de-registering it executes a request which push back this error:

2018/07/17 09:45:30 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54062

After first message appeared it starts to eat 100-150% CPU of 4-core host constantly.

consul info output from master (which eats CPU):

agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 39f93f01
version = 1.2.1
consul:
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 172.20.0.4:8300
server = true
raft:
applied_index = 146
commit_index = 146
fsm_pending = 0
last_contact = 0
last_log_index = 146
last_log_term = 2
last_snapshot_index = 0
last_snapshot_term = 0
latest_configuration = [{Suffrage:Voter ID:3634851e-5b58-d0ad-3f22-546de4d294c5 Address:172.20.0.4:8300} {Suffrage:Voter ID:ee5204e4-2a30-f83f-3dde-6c2e0d46902d Address:172.20.0.5:8300} {Suffrage:Voter ID:27a3b189-89e6-ec95-0177-003b023fdefb Address:172.20.0.6:8300}]
latest_configuration_index = 1
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 2
runtime:
arch = amd64
cpu_count = 2
goroutines = 12433
max_procs = 2
os = linux
version = go1.10.1
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 2
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 4
members = 3
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 5
members = 3
query_queue = 0
query_time = 1

Complete log for register-deregister with TRACE level:

2018/07/17 10:05:07 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54892
2018/07/17 10:05:07 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54888
2018/07/17 10:05:07 [INFO] agent: Synced service "test-192.168.137.144-0"
2018/07/17 10:05:08 [INFO] agent: Synced service "test-192.168.137.144-0-1-1"
2018/07/17 10:05:09 [INFO] agent: Deregistered service "test-192.168.137.144-0"
2018/07/17 10:05:09 [INFO] agent: Deregistered service "test-192.168.137.144-0-1-1"
2018/07/17 10:05:09 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54902
2018/07/17 10:05:10 [INFO] agent: Synced service "test-192.168.137.144-0"
2018/07/17 10:05:11 [INFO] agent: Synced service "test-192.168.137.144-0-1-1"
2018/07/17 10:05:11 [INFO] agent: Deregistered service "test-192.168.137.144-0"
2018/07/17 10:05:12 [INFO] agent: Deregistered service "test-192.168.137.144-0-1-1"
2018/07/17 10:05:12 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54888
2018/07/17 10:05:12 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54886
2018/07/17 10:05:12 [ERR] http: Request GET /v1/agent/connect/ca/leaf/test-192.168.137.144, error: cluster has no CA bootstrapped from=192.168.99.1:54908
2018/07/17 10:05:13 [INFO] agent: Synced service "test-192.168.137.144-0"
2018/07/17 10:05:13 [INFO] agent: Synced service "test-192.168.137.144-0-1-1"
2018/07/17 10:05:14 [INFO] agent: Deregistered service "test-192.168.137.144-0"
2018/07/17 10:05:14 [INFO] agent: Deregistered service "test-192.168.137.144-0-1-1"

How to fix that? Am I doing something wrong here?

-- 
With best regards,
Stanislav N. aka pztrn.
https://pztrn.name
XMPP: pz...@pztrn.name
Matrix: @pztrn:pztrn.name

pba...@hashicorp.com

unread,
Jul 17, 2018, 10:22:20 AM7/17/18
to Consul
Hey Stanislav,

This seems like a bug. Given that Connect is disabled, you shouldn't see any CA requests at all let alone 100% CPU hot loop.

I also don't see you actual service definition (e.g. serviceMajorVersion defintion) - did you skip it in the code you posted, that seems to only be the optional health check added.

If you are trying to register a service that _does_ have connect proxy or native integration while connect is disabled so no CA exists then that is likely the cause of the issue - that's logically not a valid thing, you must have connect enabled on servers to be able to use it!

That said, the fact it turns into a busy loop is obviously a bug caused (I guess) by the edge case of trying to register a connect proxy/service with connect disabled on servers. We should fix that.

In the mean time can you post your service def so we can reproduce it?

Thanks

Paul

pba...@hashicorp.com

unread,
Jul 19, 2018, 6:17:28 AM7/19/18
to Consul
Stanislav for some reason could reply to group and sent the following follow up just to me:

---

Hey, sorry for writing you privately, something very wrong with my mail provider or Google, 3 mails didn't reach list. Sending email that I've tried to send to list previously.

Here is the code:

func (s *Service) RegisterAtConsul(port int, versionMajor string, versionFull string, registerHealthCheck bool) error {
    s.versionFull = versionFull
    s.versionFullDashes = strings.Replace(s.versionFull, ".", "-", -1)
    s.versionMajor = versionMajor

    // Register with consul with only major version.
    agent := s.Client.Agent()
    serviceMajorVersion := &api.AgentServiceRegistration{
        Name:    s.Name + "-" + s.versionMajor,
        Tags:    s.Tags,
        Address: s.localIP,
        Port:    port,
    }

    serviceMajorVersion.ID = s.ID + "-" + s.versionMajor


    if registerHealthCheck {
        serviceMajorVersion.Check = &api.AgentServiceCheck{
            Name:     serviceMajorVersion.ID + " HTTP check",
            HTTP:     "http://" + s.localIP + ":" + strconv.Itoa(port) + "/api/v1/healthCheck/",
            Method:   http.MethodPost,
            Interval: "5s",
            Timeout:  "2s",
        }
    }

    err := agent.ServiceRegister(serviceMajorVersion)
    if err != nil {
        return err
    }

    // Register at consul with full version.
    serviceFullVersion := &api.AgentServiceRegistration{
        Name:    s.Name + "-" + s.versionFullDashes,
        Tags:    s.Tags,
        Address: s.localIP,
        Port:    port,
    }

    serviceFullVersion.ID = s.ID + "-" + s.versionFullDashes

    if registerHealthCheck {
        serviceFullVersion.Check = &api.AgentServiceCheck{
            Name:     serviceFullVersion.ID + " HTTP check",

            HTTP:     "http://" + s.localIP + ":" + strconv.Itoa(port) + "/api/v1/healthCheck/",
            Method:   http.MethodPost,
            Interval: "5s",
            Timeout:  "2s",
        }
    }

    err1 := agent.ServiceRegister(serviceFullVersion)
    if err1 != nil {
        return err1
    }

    return nil
}

Where s.Client - from api.NewClient().

I haven't created AgentServiceConnect structures, do I think it should be disabled by default.

pba...@hashicorp.com

unread,
Jul 19, 2018, 6:23:13 AM7/19/18
to Consul
 Stanislav,

Hmm yeas you are right, there is no `AgentServiceConnect` so nothing should be attempting to fetch certificates at all.

I have a few follow up questions to dig deeper, but at this point I think it would be really useful to move this into a GitHub issue so we can track it as a bug properly. Could you please open one including the original issue here, the full code in the followup, and answers to the following questions please:?

 1. Did you ever have connect enabled on this cluster?
 2. If yes, did you have services with connect enabled registered on this agent?
 3. Please paste (in gist maybe so don't have to worry about length) output of the client agent logs especially capturing a restart (this is where it finds old proxy definitions and attempts to start them)
 4. Please paste the output of `curl -s http://<server IP>:8500/connect/ca/roots`

One possibility is that if you played with connect, it persisted a proxy definition and didn't clean up properly and is now attempting to load that proxy and fetch it's certs even though you no longer have CA enabled.
Reply all
Reply to author
Forward
0 new messages