Hi team,
We have a Wazuh manager worker node running Wazuh version 4.7.5. The server runs at about 90% CPU utilization almost all the time. Although we typically handle an EPS of 25,000, we sometimes experience delays in log processing, which adversely affect our monitoring.
The analysisd process remains active, but only a portion of the logs are being processed in real time. Over time, the delay in log ingestion increases. We have not identified any specific errors in ossec.log related to analysisd performance.
After we restart the wazuh-manager service, logs return to real time, but this usually results in loss of logs that were delayed during the lag period.
However, the following messages appear frequently in ossec.log; no other major errors or warnings are found:
2025/12/28 06:24:27 wazuh-modulesd:vulnerability-detector: ERROR: (5513): CVE database could not be updated.
2025/12/28 12:22:50 wazuh-modulesd:vulnerability-detector: ERROR: (5553): The allowed number of failed pages (5) has been exhausted. The feed will not be updated.
Questions:
Before diving into a specific root cause, it would be helpful to gather more context about the environment in order to better guide the troubleshooting.
1. Architecture and load distribution
How many manager workers are running in the cluster?
Does the issue occur on all workers, or only on this specific node?
Does this worker handle any uneven or special load (for example, more agents, noisier agents, or additional responsibilities)?
Have you tried moving agents between workers to check whether the issue follows the worker or follows the agents?
2. Server resources and sizing
Are CPU, memory, and disk resources continuously monitored on the server?
Is there any memory pressure (swap usage, cache reclaim activity) or disk I/O contention?
Is the hardware properly sized for the target EPS you expect the node to handle?
These symptoms may indicate that the node is operating very close to, or above, its effective capacity.
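As a quick way to answer the resource questions above, here is a minimal Python sketch that reads swap usage, available memory, and load per core on a Linux host. It assumes the standard /proc/meminfo and /proc/loadavg files are present; the function names (parse_meminfo, memory_pressure, load_per_core) are my own and not part of Wazuh. It does not cover disk I/O, which is easier to check with your existing monitoring tools.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure(meminfo):
    """Return swap in use (kB) and the fraction of RAM still available."""
    swap_used = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    total = meminfo.get("MemTotal", 0)
    available = meminfo.get("MemAvailable", 0)
    ratio = available / total if total else 0.0
    return {"swap_used_kb": swap_used, "mem_available_ratio": ratio}


def load_per_core(loadavg_text, cores):
    """Return the 1-minute load average divided by the number of cores."""
    return float(loadavg_text.split()[0]) / max(cores, 1)


# Only read the live files when they exist (Linux hosts).
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(memory_pressure(parse_meminfo(f.read())))
    with open("/proc/loadavg") as f:
        print("load per core:", load_per_core(f.read(), os.cpu_count() or 1))
```

Running this periodically (for example from cron) and keeping the output gives a rough timeline. Sustained load per core above 1.0 together with growing swap usage would support the idea that the node is undersized for its EPS.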
3. Agent types and analysis workload
Are the agents connected to this worker predominantly Windows or Linux?
Does the behavior remain the same even if the agents assigned to the worker are changed?
In environments with a high number of Windows agents, the volume and complexity of events (Security logs, Sysmon, PowerShell, etc.) typically increase the load on analysisd significantly.
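To see whether analysisd itself is saturated, it can help to look at its own counters rather than only at ossec.log. The sketch below is an assumption-heavy example: it assumes the manager writes a state file at /var/ossec/var/run/wazuh-analysisd.state in key='value' form, and that it contains fields named events_received, events_dropped, and event_queue_usage. Please verify the path and field names on your installation before relying on it. The helper names are hypothetical.

```python
import os

# Assumed default location on a standard install; adjust if yours differs.
STATE_FILE = "/var/ossec/var/run/wazuh-analysisd.state"


def parse_state(text):
    """Parse key='value' lines, skipping comments and blank lines."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip().strip("'\"")
    return state


def saturation_report(state):
    """Summarize drop ratio and event queue usage (field names assumed)."""
    received = float(state.get("events_received", 0))
    dropped = float(state.get("events_dropped", 0))
    queue = float(state.get("event_queue_usage", 0))
    drop_ratio = dropped / received if received else 0.0
    return {"drop_ratio": drop_ratio, "event_queue_usage": queue}


if __name__ == "__main__" and os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        print(saturation_report(parse_state(f.read())))
```

If the queue usage stays high or dropped events keep growing while the lag builds up, that would suggest analysisd cannot keep up with the incoming event rate on this worker.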
4. About the vulnerability-detector errors
The vulnerability-detector messages do not appear to be the direct cause of the analysisd delay, but they do indicate overall system stress. Repeated feed update failures are commonly associated with:
Resource exhaustion
CPU- or I/O-related timeouts
Intermittent network issues
It would be useful to confirm whether these errors correlate in time with CPU spikes and increasing ingestion lag.
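One way to check that correlation is to count the vulnerability-detector errors per hour and compare the result with your CPU graphs. Below is a small Python sketch that does this. It assumes log lines follow the format shown in your post (timestamp, component, level, message) and that the log is at /var/ossec/logs/ossec.log; the function name errors_per_hour is my own.

```python
import os
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# 2025/12/28 06:24:27 wazuh-modulesd:vulnerability-detector: ERROR: (5513): ...
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+): (\w+): (.*)$"
)

LOG_FILE = "/var/ossec/logs/ossec.log"  # assumed default path


def errors_per_hour(lines, component="wazuh-modulesd:vulnerability-detector"):
    """Count ERROR lines from one component, grouped by hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, source, level, _message = match.groups()
        if source == component and level == "ERROR":
            ts = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
            counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


if __name__ == "__main__" and os.path.exists(LOG_FILE):
    with open(LOG_FILE, errors="replace") as f:
        for hour, count in sorted(errors_per_hour(f).items()):
            print(hour, count)
```

If the hours with the most errors line up with the periods where CPU is pinned and the ingestion lag grows, resource exhaustion becomes the more likely explanation. If they do not line up, an intermittent network issue is worth investigating separately.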