[WARN] serf: Event queue depth: 4096

Chris Miller

Apr 21, 2016, 5:27:41 AM
to Consul
We're seeing a situation where Consul is logging the following every second or so:

Apr 20 17:27:08 demolx1 consul: 2016/04/20 17:27:08 [WARN] serf: Event queue depth: 4096
Apr 20 17:27:08 demolx1 consul[1632]: serf: Event queue depth: 4096

This continues indefinitely and seems to cause Consul to use some CPU in the process (a constant 2%+ on an 8-core server).

Additional information:

We have just a single Consul server instance running with the following config:
{
    "bootstrap_expect": 1,
    "server": true,
    "datacenter": "test",
    "data_dir": "/var/consul",
    "encrypt": "...",
    "log_level": "INFO",
    "enable_syslog": true,
    "client_addr": "0.0.0.0",
    "start_join": ["10.112.32.97"]
}

We have 4 services that register themselves with Consul, and a couple of clients that monitor Consul for these services.
The services use /v1/event/fire/ and /v1/event/list to send messages to each other and to the clients. We did have a bug whereby /v1/event/list was using the wrong index for blocking calls, so it was being called in a tight loop; perhaps that is what triggered the initial problem. As far as I'm aware we send very few events overall, at peak maybe a few per second but usually much less. However, even if we stop every single service and client process so that nothing is talking to Consul at all, Consul keeps churning out the "Event queue depth: 4096" warnings ad infinitum. What causes this? Is it to be expected? Restarting Consul does make the warnings go away.
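
For reference, the kind of blocking pattern we should have been using looks roughly like this. It's just a sketch (it assumes a local agent on 127.0.0.1:8500 and Python's requests library; only the endpoint path comes from what I described above). The important part is feeding the previous response's X-Consul-Index back in as the index parameter, so the next call blocks instead of returning straight away:

import requests

CONSUL = "http://127.0.0.1:8500"

def watch_events(name=None):
    index = None
    while True:
        params = {"wait": "30s"}
        if name:
            params["name"] = name      # optional filter on event name
        if index is not None:
            params["index"] = index    # block until the event list changes
        resp = requests.get(CONSUL + "/v1/event/list", params=params, timeout=90)
        resp.raise_for_status()
        new_index = resp.headers.get("X-Consul-Index")
        if new_index == index:
            continue                   # wait timed out with nothing new
        index = new_index
        for event in resp.json():
            # Payload is returned base64-encoded by the agent
            print(event["Name"], event.get("Payload"))

if __name__ == "__main__":
    watch_events()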

Armon Dadgar

Apr 21, 2016, 8:35:32 PM
to consu...@googlegroups.com, Chris Miller
Chris,

Does that mean you have a single-node cluster? In the single-node case, there
is no other node to gossip with, so the event queue saturates buffering messages
that have nobody to receive them; older messages are dropped and the warning
about the saturation is logged. It's harmless, and it sounds like it is simply due to
having only a single node.

Best Regards,
Armon Dadgar

Chris Miller

Apr 22, 2016, 7:49:55 AM
to Consul, chri...@gmail.com
Hi Armon,

Yes, this is just a single-node cluster, so it sounds like your explanation is correct (we also have a completely separate three-node cluster where we don't see this issue).

It's very helpful to know that the messages are harmless, thank you. Perhaps the logging of these warnings could be suppressed in the specific case where bootstrap_expect = 1? Or does the number of agent nodes affect this too? (We have no client agents in this configuration.) The warnings are generating a lot of noise in our logs, and short of raising the log level to error, I'm not sure there's any way to disable them. This is just a test/dev system and we're unlikely to increase the cluster size above 1 any time soon.
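
To be concrete, I imagine something like the below for our single-node test config: the same config as above with the log level raised so that WARN messages are dropped (I'm assuming "ERR" is an accepted value here; the exact names may depend on the Consul version).

{
    "bootstrap_expect": 1,
    "server": true,
    "datacenter": "test",
    "data_dir": "/var/consul",
    "encrypt": "...",
    "log_level": "ERR",
    "enable_syslog": true,
    "client_addr": "0.0.0.0",
    "start_join": ["10.112.32.97"]
}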

Armon Dadgar

Apr 22, 2016, 1:23:47 PM
to consu...@googlegroups.com, Chris Miller, chri...@gmail.com
Hey Chris,

One of the reasons we don't mask the warning is that you can imagine
a multi-node cluster where a network partition has caused a node to become
isolated from the rest of the cluster, so that it looks like a one-node cluster.
In those cases, we want there to be some diagnostic message indicating that
you are potentially losing messages.

Even with a bootstrap expect of one, as long as there are other client machines
you would still be doing gossip, even if there are no other servers.

Hope that helps!

Best Regards,
Armon Dadgar

Chris Miller

Apr 22, 2016, 1:30:04 PM
to Consul
Thanks for clarifying/confirming. Given it's just a test server, I guess we'll live with the logging for now.