High Kernel CPU Usage Caused by Wazuh-Analysisd Daemon


CJK

Sep 30, 2024, 9:10:46 AM
to Wazuh | Mailing List

Hi Team,

I'm experiencing an issue with my Wazuh cluster where the wazuh-analysisd daemon is consuming a large amount of CPU, putting heavy load on my servers. As a result, many events are being dropped, and events are spending 11+ hours in the queue.

Here are the server specs:

  • Wazuh Master Server: 20 Cores, 16 GB RAM, OS: Ubuntu 22.04
  • Wazuh Worker Server: 12 Cores, 16 GB RAM, OS: Ubuntu 22.04

The average EPS is around 349.926, yet the system is struggling to keep up. I have attached htop screenshots, along with the wazuh-analysisd state output and statistics images for both servers.
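
For reference, the attached state dumps were read straight from the analysisd state file on each node (default path on Wazuh 4.x; adjust if your installation differs):

    # Counters such as events_received and events_dropped are refreshed periodically
    cat /var/ossec/var/run/wazuh-analysisd.state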

Any insights or suggestions on how to resolve this and optimize performance would be greatly appreciated!

Thanks,

wazuh-analysisd_state - Worker.txt
Htop Master node.png
Statistics.png
wazuh-analysisd_state - Master.txt
Htop worker node.png

Fabian Ruiz

Sep 30, 2024, 10:48:59 AM
to Wazuh | Mailing List
Hi,

The Wazuh Manager processes a large volume of events sent by agents. To prevent overwhelming the manager with excessive events, agents are equipped with an anti-flooding mechanism. This mechanism allows agents to buffer generated events and send them to the manager at a controlled rate, typically limited to a specified number of events per second (default is 500 EPS).
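
For reference, the agent-side buffer is controlled by the <client_buffer> block in each agent's ossec.conf. A minimal sketch follows; the values shown are the shipped defaults, not a tuning recommendation:

    <!-- Agent ossec.conf: anti-flooding buffer (defaults shown) -->
    <client_buffer>
      <disabled>no</disabled>                       <!-- keep the buffer enabled -->
      <queue_size>5000</queue_size>                 <!-- events held locally before dropping -->
      <events_per_second>500</events_per_second>    <!-- throttled send rate to the manager -->
    </client_buffer>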

I recommend reviewing the following guide for a deeper understanding of these topics: https://documentation.wazuh.com/4.0/user-manual/capabilities/antiflooding.html. Additionally, I suggest analyzing the sources generating such a high volume of events.

Regards.

CJK

Oct 6, 2024, 11:37:23 PM
to Wazuh | Mailing List
Hi Fabian,

Thanks for the reply.

The active agents currently total only 230. I have a similar instance for another client with over 230 active agents and no issues; the only difference is that the other client runs Wazuh 4.3.8, while I am facing problems on 4.7.5. Additionally, the other client's manager has lower specifications (8 cores and 15 GB of RAM), whereas the server in question has 20 cores yet is still experiencing high utilization in the wazuh-analysisd daemon.

Is there any way to pinpoint the exact cause of the queue issue with the wazuh-analysisd daemon? Your assistance would be greatly appreciated.

For reference, I am attaching the statistics pages of both servers from the GUI: SERVER A (working fine) and SERVER B (the server facing the issue). On SERVER B, a high volume of events is being dropped.

Kindly help me on this.
SERVER A.png
SERVER B.png

Fabian Ruiz

Oct 7, 2024, 10:52:59 AM
to Wazuh | Mailing List
Hi, 

As I mentioned before, the main cause of this problem is usually a high volume of events that the endpoints are sending to the Wazuh nodes. I recommend checking which events are generating this situation to identify the specific cause. If you see too many alerts generated per second by a given agent, and they are all produced by the same event, you should check whether your configuration is correct.

If no alerts appear on the manager from this agent (or fewer than expected), it means the agent is being flooded by an event that is not generating alerts at all. In that case, enable the "logall" option on the Wazuh manager so that every single event received is stored in a file (archives.log), and check that file to find the problematic event. Remember to disable logall after you finish investigating, and review the documentation I sent you to understand how the anti-flooding mechanism works; it will give you an idea of what is going on.
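
For clarity, logall is enabled in the manager's ossec.conf under the <global> section, roughly like this (events are then written to /var/ossec/logs/archives/archives.log):

    <!-- Manager ossec.conf: store every received event, alert or not -->
    <ossec_config>
      <global>
        <logall>yes</logall>
      </global>
    </ossec_config>

Restart the manager afterwards for the change to take effect, and keep an eye on disk usage, since archives.log grows quickly.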

Regards.

CJK

Nov 26, 2024, 10:34:30 PM
to Wazuh | Mailing List
Hi Fabian,

Thanks for your support, and sorry for the late reply. I have gone through the doc you provided, and it was really helpful.
As I did further analysis on my collector nodes, I found that the huge volume of logs is coming from port 514 (UDP syslog), not from port 1514 (agent communication). I have FortiGate firewall, SonicWall, and load balancer logs forwarded to the collector servers over port 514 (listener config sketched below), and these are consuming most of my CPU. Can you suggest anything to fine-tune this? The agents part is fine now.
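
For context, the syslog listener on my collectors uses a <remote> block along these lines (the allowed-ips value here is a placeholder, not my real range):

    <!-- Collector ossec.conf: syslog listener for the firewall/load balancer feeds -->
    <remote>
      <connection>syslog</connection>
      <port>514</port>
      <protocol>udp</protocol>
      <allowed-ips>192.168.0.0/24</allowed-ips>    <!-- placeholder network -->
    </remote>
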
EPS is around 4120 from both nodes over 1 hour (GUI).
Also, can you help me with resource allocation, specifically how much is required to onboard 4500 EPS?

  • Wazuh: 4.7, distributed setup
  • Collector Master: 20 Cores, 16 GB RAM
  • Collector Worker: 12 Cores, 16 GB RAM

Thanks and Regards
Clint 