[WARN] serf: Event queue depth: 4096

Chris Miller

Apr 21, 2016, 5:27:41 AM
to Consul
We're seeing a situation where Consul is logging the following every second or so:

Apr 20 17:27:08 demolx1 consul: 2016/04/20 17:27:08 [WARN] serf: Event queue depth: 4096
Apr 20 17:27:08 demolx1 consul[1632]: serf: Event queue depth: 4096

This continues indefinitely and seems to cause Consul to use some CPU in the process (a constant 2%+ on an 8-core server).

Additional information:

We have just a single Consul server instance running with the following config:
{
    "bootstrap_expect": 1,
    "server": true,
    "datacenter": "test",
    "data_dir": "/var/consul",
    "encrypt": "...",
    "log_level": "INFO",
    "enable_syslog": true,
    "client_addr": "0.0.0.0",
    "start_join": ["10.112.32.97"]
}

We have 4 services that register themselves with Consul, and a couple of clients that monitor Consul for these services.
The services use /v1/event/fire/ and /v1/event/list to send messages to each other and to the clients. We did have a bug whereby /v1/event/list was using the wrong index for blocking calls, so it was being called in a tight loop; perhaps that is what triggered the initial problem. As far as I'm aware we send very few events overall, at peak maybe a few per second but usually much less. However, even if we stop every single service and client process so that nothing is talking to Consul at all, Consul keeps churning out the "Event queue depth: 4096" warnings ad infinitum. What causes this? Is it to be expected? Restarting Consul does make the warnings go away.
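
For reference, the kind of blocking pattern we should have been using looks roughly like this. It's just a sketch (it assumes a local agent on 127.0.0.1:8500 and Python's requests library; only the endpoint path comes from what I described above). The important part is feeding the previous response's X-Consul-Index back in as the index parameter, so the next call blocks instead of returning straight away:

import requests

CONSUL = "http://127.0.0.1:8500"

def watch_events(name=None):
    index = None
    while True:
        params = {"wait": "30s"}
        if name:
            params["name"] = name      # optional filter on event name
        if index is not None:
            params["index"] = index    # block until the event list changes
        resp = requests.get(CONSUL + "/v1/event/list", params=params, timeout=90)
        resp.raise_for_status()
        new_index = resp.headers.get("X-Consul-Index")
        if new_index == index:
            continue                   # wait timed out with nothing new
        index = new_index
        for event in resp.json():
            # Payload is returned base64-encoded by the agent
            print(event["Name"], event.get("Payload"))

if __name__ == "__main__":
    watch_events()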

Armon Dadgar

Apr 21, 2016, 8:35:32 PM
to consu...@googlegroups.com, Chris Miller
Chris,

Does that mean you have a single-node cluster? In the single-node case, there
is no other node to gossip with, so the event queue saturates buffering messages
that have nobody to receive them; older messages are dropped and the warning
about the saturation is logged. It's harmless, and it sounds like it is simply due to
having only a single node.

Best Regards,
Armon Dadgar

Chris Miller

Apr 22, 2016, 7:49:55 AM
to Consul, chri...@gmail.com
Hi Armon,

Yes, this is just a single-node cluster, so it sounds like your explanation is correct (we also have a completely separate three-node cluster where we don't see this issue).

It's very helpful to know that the messages are harmless, thank you. Perhaps the logging of these warnings could be suppressed in the specific case where bootstrap_expect = 1? Or does the number of agent nodes affect this too? (We have no client agents in this configuration.) The warnings are generating a lot of noise in our logs, and short of raising the log level to error, I'm not sure there's any way to disable them. This is just a test/dev system and we're unlikely to increase the cluster size above 1 any time soon.
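
To be concrete, I imagine something like the below for our single-node test config: the same config as above with the log level raised so that WARN messages are dropped (I'm assuming "ERR" is an accepted value here; the exact names may depend on the Consul version).

{
    "bootstrap_expect": 1,
    "server": true,
    "datacenter": "test",
    "data_dir": "/var/consul",
    "encrypt": "...",
    "log_level": "ERR",
    "enable_syslog": true,
    "client_addr": "0.0.0.0",
    "start_join": ["10.112.32.97"]
}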

Armon Dadgar

Apr 22, 2016, 1:23:47 PM
to consu...@googlegroups.com, Chris Miller, chri...@gmail.com
Hey Chris,

One of the reasons we don't mask the warning is that you can imagine
a multi-node cluster where a network partition has caused a node to become
isolated from the rest of the cluster, so that it looks like a one-node cluster.
In those cases, we want there to be some diagnostic message indicating that
you are potentially losing messages.

Even with a bootstrap expect of one, as long as there are other client machines
you would still be doing gossip, even if there are no other servers.

Hope that helps!

Best Regards,
Armon Dadgar

Chris Miller

Apr 22, 2016, 1:30:04 PM
to Consul
Thanks for clarifying/confirming. Given it's just a test server, I guess we'll live with the logging for now.