Wazuh 4.10.1 Agent Buffer Full & Log Loss from AWS S3 Polling


WENWEN H

Jun 16, 2026, 8:17:30 AM
to Wazuh | Mailing List
Hello all,
Background:
Wazuh 4.10.1 cluster with two agents collecting logs from multiple AWS accounts via S3 polling every 10 minutes.

Current status of both agents:
### 1. wazuh-agent (SGP)
- **Location**: Singapore (SGP)
- **Buffer Full incidents today**: 97 times (triggered in every collection round)
- **Processing time per round**: 6–7 minutes
- **Collection sources**:
  - 10 WAF buckets
  - ALB logs
  - CloudTrail
  - GuardDuty
  - CloudWatch
- **Severity & Behavior**:
  - Recovers automatically, but logs are already lost before each recovery, so drops are continuous
  - Buffer-full events occur in every processing round and currently cannot be avoided

### 2. wazuh-agent-linux3 (HK)
- **Location**: Hong Kong (HK)
- **Buffer Full incidents today**: 47 times
- **Processing time per round**: Unknown (no exact duration data provided)
- **Collection sources**:
  - 2 WAF buckets
  - 5 ALBs
  - CloudTrail
- **Severity & Behavior**:
  - Since the afternoon it has not recovered automatically; the buffer stays full
  - Unlike the SGP agent, it does not recover, so the impact is more severe and sustained

Root cause:
- events_per_second appears to have a hard limit (setting 5000 returns `ERROR: (1235): Invalid value for element 'events_per_second': 5000`)
- S3 polling delivers large batches of logs all at once every 10 minutes, overwhelming the agent buffer
- The SGP agent overflows its buffer in every polling cycle, so logs are lost continuously
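To see why a burst overflows the buffer even when the average rate is modest, here is a simplified leaky-bucket model of one polling cycle, in one-second steps: events arrive, only what fits in `queue_size` is admitted (the rest is dropped), and at most `events_per_second` are forwarded per second. This is an illustrative sketch under those assumptions, not Wazuh's actual buffer code; the function name `simulate_cycle` and the traffic figures are hypothetical.

```python
def simulate_cycle(arrivals_per_second, queue_size, events_per_second):
    """Leaky-bucket model of one polling cycle (illustrative, not Wazuh code).

    arrivals_per_second: events handed to the buffer in each second.
    Returns (forwarded, dropped, backlog_left_at_end_of_cycle).
    """
    backlog = forwarded = dropped = 0
    for arriving in arrivals_per_second:
        # Admit only what fits in the remaining queue space; the rest is dropped.
        admitted = min(arriving, queue_size - backlog)
        dropped += arriving - admitted
        backlog += admitted
        # Forward at most events_per_second events during this second.
        sent = min(backlog, events_per_second)
        backlog -= sent
        forwarded += sent
    return forwarded, dropped, backlog


if __name__ == "__main__":
    # Hypothetical 10-minute cycle carrying 300,000 events.
    burst = [5000] * 60 + [0] * 540  # everything arrives in the first minute
    even = [500] * 600               # the same volume spread over the cycle
    print(simulate_cycle(burst, queue_size=5000, events_per_second=1000))  # (64000, 236000, 0)
    print(simulate_cycle(even, queue_size=5000, events_per_second=1000))   # (300000, 0, 0)
```

In this model the same 300,000 events lose nothing when they arrive evenly, so the shape of the burst matters as much as the configured limits.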

Questions:
1. What is the actual maximum value for events_per_second in Wazuh 4.10.1?
2. For high-volume, multi-account, multi-region AWS log ingestion, what architecture is recommended?
3. Would moving the aws-s3 wodle from agents to the Wazuh Manager resolve the buffer limitation?
4. Is Kinesis Data Firehose recommended for this use case? How should it be configured in a cluster environment?

Farouk Musa

Jun 16, 2026, 5:19:57 PM
The maximum value for events_per_second is 1000. You can see more here. Also, since your log sources produce a large volume of logs, it might be better to use a shorter polling interval, so that logs don't accumulate for long and the agent doesn't receive so many at once. Another thing to consider is increasing the agent's buffer size (queue_size), which lets the agent hold more events for processing instead of dropping them. You can see information here.
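A shorter interval and a larger queue_size help only if the average ingest rate stays below the events_per_second ceiling; otherwise the backlog grows every cycle regardless of how the batches are shaped. The back-of-the-envelope sketch below assumes the 1000 EPS maximum stated above and a per-cycle event count you would measure yourself. The helper names and all figures are hypothetical, and `burst_queue_needed` is only a rough continuous approximation of peak backlog.

```python
import math

EPS_MAX = 1000  # events_per_second ceiling stated in this thread


def average_rate(events_per_cycle, interval_seconds):
    """Average events/second an agent must forward to keep up."""
    return events_per_cycle / interval_seconds


def agents_needed(events_per_cycle, interval_seconds, eps=EPS_MAX):
    """Minimum agents if the load is split evenly and each forwards at most `eps`."""
    return math.ceil(average_rate(events_per_cycle, interval_seconds) / eps)


def burst_queue_needed(events_per_cycle, burst_seconds, eps=EPS_MAX):
    """Rough queue size to absorb a burst delivered over `burst_seconds`
    while draining at `eps` (continuous approximation of the peak backlog)."""
    excess_rate = max(0.0, events_per_cycle / burst_seconds - eps)
    return math.ceil(excess_rate * burst_seconds)


if __name__ == "__main__":
    # Hypothetical volume: 900,000 events per 10-minute cycle.
    print(average_rate(900_000, 600))       # 1500.0, above the ceiling
    print(average_rate(450_000, 300))       # 1500.0: halving the interval halves the batch, not the rate
    print(agents_needed(900_000, 600))      # 2
    print(burst_queue_needed(300_000, 60))  # 240000
```

If the average rate already exceeds the ceiling, neither a shorter interval nor a bigger queue prevents loss in this model; the load itself has to be split or reduced.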

Moving the configuration from the agent to the manager means you no longer deal with the agent's queue and buffer size limits. However, that does not solve the problem; it just moves the bottleneck to a different location. It can also cause performance issues on the manager, affecting analysisd and other modules.

Will Kinesis help? I don't think so, since it does not change how the logs are collected and transmitted from the buckets to the agents.

As for your final question, how it should be configured in a cluster environment, I would recommend this setup:
1. Replace S3 polling with SQS subscriber mode. The Wazuh AWS wodle supports SQS, so logs are processed as they become available rather than accumulating between long polling runs. Your logs then flow S3 > SNS > SQS > Wazuh.
2. Share the configuration across the worker nodes but keep it off the master node, so that heavy resource usage does not affect analysisd there.
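With S3 > SNS > SQS, each queue message names the specific objects that were just written, so a consumer fetches only new objects instead of re-reading buckets in large batches. The sketch below is not Wazuh's code; it only illustrates the notification format involved, using the standard library. It assumes AWS's documented S3 event structure (a `Records` list with URL-encoded object keys) wrapped in an SNS envelope whose `Message` field holds the event as a JSON string; the function name `objects_from_sqs_body` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body):
    """Return (bucket, key) pairs for newly created objects in one SQS message body.

    Handles both an SNS envelope (raw message delivery disabled) and a bare
    S3 event (raw message delivery enabled).
    """
    payload = json.loads(body)
    # SNS wraps the S3 event as a JSON *string* in the "Message" field.
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])
    pairs = []
    # S3 test events ("s3:TestEvent") carry no "Records" list, so they yield nothing.
    for record in payload.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


if __name__ == "__main__":
    # Hypothetical bucket and key, wrapped the way SNS delivers it to SQS.
    event = {"Records": [{"eventName": "ObjectCreated:Put",
                          "s3": {"bucket": {"name": "example-waf-logs"},
                                 "object": {"key": "AWSLogs/2026/06/16/log+1.gz"}}}]}
    body = json.dumps({"Type": "Notification", "Message": json.dumps(event)})
    print(objects_from_sqs_body(body))  # [('example-waf-logs', 'AWSLogs/2026/06/16/log 1.gz')]
```

Because each message is consumed by only one reader of the queue, several worker nodes can share the same queue without collecting the same object twice.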

WENWEN H

Jun 17, 2026, 5:20:06 AM
Hello Farouk,
Thank you for your explanation. I will try your method later and give you a reply.