to Consul
When I deploy my Sandbox environment, which consists of ~150 Docker containers, I use a single Consul agent (in client mode) to use the VM's resources more efficiently.
Three Consul servers run in a cluster and work with the agent. Both the agent and the servers run in the official 0.9.3 Docker containers.
Everything was fine, but recently I started getting timeout errors when registering new services. Meanwhile the Consul agent runs and looks normal: its logs show that it performs health checks and syncs all the info with the servers.
Agent's logs:
```
2017/12/27 15:13:45 [ERR] consul: RPC failed to server 172.33.0.254:8300: rpc error: rpc error: timed out enqueuing operation
2017/12/27 15:13:45 [ERR] http: Request PUT /v1/catalog/register, error: rpc error: rpc error: timed out enqueuing operation from=172.33.1.38:32894
```
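For reference, the registrations that time out are ordinary PUTs to the catalog endpoint. A minimal sketch of such a call and its payload (the node, address, and service names below are made-up examples, not from my real setup):

```python
import json

# Hypothetical /v1/catalog/register payload; the node, address, and
# service names are illustrative placeholders.
payload = {
    "Node": "sandbox-vm",
    "Address": "172.33.1.38",
    "Service": {"ID": "web-1", "Service": "web", "Port": 8080},
    "Check": {"Name": "web alive", "Status": "passing", "ServiceID": "web-1"},
}
body = json.dumps(payload).encode()

# Uncomment to actually send it to a local agent:
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:8500/v1/catalog/register", data=body, method="PUT")
# print(urllib.request.urlopen(req, timeout=5).status)

print(json.dumps(payload, indent=2))
```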
I was wondering how many health checks a single Consul agent can handle. Could the number of registered services be related to the issue?
`consul info`
```
agent:
check_monitors = 0
check_ttls = 23
checks = 101
services = 273
build:
prerelease =
revision = 112c060
version = 0.9.3
consul:
known_servers = 3
server = false
runtime:
arch = amd64
cpu_count = 16
goroutines = 395
max_procs = 16
os = linux
version = go1.9
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 4
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 38
members = 28
query_queue = 0
query_time = 30
```
You can find Consul server leader's logs attached.
to consu...@googlegroups.com
Hi,
The "timed out enqueuing operation" error actually originates on the servers and is returned when a write is very slow. This can come from slow disks under very write-heavy workloads, or, more commonly, from high packet loss between the servers while they replicate data, which results in TCP retransmits. Do you have any insight into either of these causes on the servers in your sandbox environment?
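One quick signal for the packet-loss theory is the kernel's TCP retransmit counters on each server. A small sketch (Linux-only, and the sample text below is illustrative) that derives a retransmit ratio from /proc/net/snmp:

```python
def retransmit_ratio(snmp_text):
    """Return retransmitted TCP segments as a fraction of segments sent,
    parsed from /proc/net/snmp-style text (a header row of field names
    followed by a values row, both prefixed with "Tcp:")."""
    tcp_rows = [line.split()[1:] for line in snmp_text.splitlines()
                if line.startswith("Tcp:")]
    stats = dict(zip(tcp_rows[0], map(int, tcp_rows[1])))
    return stats["RetransSegs"] / stats["OutSegs"]

# Illustrative sample; on a real server, read open("/proc/net/snmp").read()
sample = "Tcp: OutSegs RetransSegs\nTcp: 100000 250\n"
print(f"{retransmit_ratio(sample):.2%} of segments retransmitted")
```

A persistently high ratio between the servers would point at network loss rather than disk latency.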