Environment: Wazuh Cluster (Master + Workers), large-scale agent deployment (~20,000 agents)
Symptoms: Agents frequently disconnect from the server. Restarting the agent temporarily restores the connection, but it soon degrades and drops again, leading to log loss.
The agent's ossec.log shows the following loop:
```
wazuh-agent: WARNING: Target 'agent' message queue is full (1024). Log lines may be lost
wazuh-agent: INFO: Closing connection to server ([ip-address]:port/tcp)
wazuh-agent: INFO: Trying to connect to server ([ip-address]:port/tcp)
...
wazuh-agent: INFO: Requesting key from server
```
Troubleshooting Performed:
Disabled client_buffer (<disabled>yes</disabled>) on the agent: no effect, same logs.
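For reference, the agent-side buffer is configured in the agent's ossec.conf as shown below (the values are the usual defaults, shown for illustration only; disabling the buffer makes the agent send events directly rather than queue them, which does not help when the manager side is the bottleneck):
```
<client_buffer>
  <disabled>no</disabled>
  <queue_size>5000</queue_size>
  <events_per_second>500</events_per_second>
</client_buffer>
```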
Inspected Worker nodes: The /var/ossec/queue/db/ directory exceeds 150GB on each worker.
Hypothesis: Heavy FIM (Syscheck) configurations without proper <ignore> rules caused the agents' local SQLite databases to grow massively. This creates a severe I/O bottleneck and SQLite locking on the worker nodes. As a result, wazuh-remoted fails to process incoming data in time and stops responding. This causes the agent's buffer to overflow (queue is full) and the TCP connection to drop due to timeout. During reconnection attempts, a Master-Worker synchronization lag occurs, prompting the agent to request a new key.
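One way to sanity-check this hypothesis on a worker node (paths assume a default install; the state file name and its fields can vary between Wazuh versions):
```
# Total size of the agent databases on the worker
du -sh /var/ossec/queue/db/

# wazuh-remoted publishes runtime counters (queue usage, bytes received) here
cat /var/ossec/var/run/wazuh-remoted.state

# Watch disk I/O pressure while remoted is under load
iostat -x 5
```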
Questions:
Is the hypothesis correct that I/O degradation caused by massive SQLite DBs in /queue/db/ leads to forced TCP drops by the worker and buffer overflows on the agents?
Is it completely safe to stop wazuh-manager on the workers and execute rm -f /var/ossec/queue/db/*.db to force a Full Sync and rebuild the databases from scratch as an emergency recovery step?
Aside from adding <ignore> rules for dynamic directories in <syscheck>, what other parameters should be tuned on Worker nodes to prevent SQLite locking during high-volume FIM traffic?
```
wazuh-agent: WARNING: Target 'agent' message queue is full (1024). Log lines may be lost
```
You can address this warning by increasing logcollector.queue_size (valid range: 128 to 220000). Edit the local overrides file:
```
nano /var/ossec/etc/local_internal_options.conf
```
Add the following to local_internal_options.conf; unlike internal_options.conf, this file is not overwritten on upgrade:
```
# Logcollector - Output queue size [128..220000]
logcollector.queue_size=100000
```
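After changing local_internal_options.conf, restart the agent so the new queue size takes effect:
```
systemctl restart wazuh-agent
```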
1. Large DB files in /var/ossec/queue/db/ are not a problem by themselves, as long as the storage has enough free space.
2. I do not recommend removing the agent databases: it can leave the server unable to start, or cause data loss.
3. The /var/ossec/queue/db/ directory is used by FIM, Syscollector, and SCA on the Wazuh manager to store agent data. As you’ve seen, the size of this directory grows with the number of agents, the number of monitored files, and the data collected by Syscollector. If syscheck is causing high disk usage, it’s worth reviewing the configuration on the agent. Monitoring large directories or paths with many subfolders can quickly increase the number of tracked files. It’s better to limit monitoring to only what’s necessary, use the <ignore> option to skip files or directories that don’t add value, and use registry_ignore to exclude registry entries.
The same applies to registry monitoring. Keep only what’s needed and exclude the rest to keep storage usage under control.
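For example, in the agent's <syscheck> block (the paths below are placeholders; adapt them to the dynamic directories and registry keys that are noisy in your environment):
```
<syscheck>
  <!-- Skip frequently changing paths that add little value -->
  <ignore>/var/log</ignore>
  <ignore type="sregex">.tmp$|.swp$</ignore>
  <!-- Windows: exclude volatile registry entries the same way -->
  <registry_ignore>HKEY_LOCAL_MACHINE\Security\Policy\Secrets</registry_ignore>
</syscheck>
```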
You can also set file and registry limits on the Wazuh agent to cap the number of files and registry entries monitored by FIM. For that, you can refer to the Wazuh documentation.
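A minimal sketch of those limits in the agent's <syscheck> block (the entry counts are illustrative; check the documentation for the defaults and availability in your Wazuh version):
```
<syscheck>
  <!-- Cap the number of files FIM will track -->
  <file_limit>
    <enabled>yes</enabled>
    <entries>100000</entries>
  </file_limit>
  <!-- Windows only: cap the number of registry entries -->
  <registry_limit>
    <enabled>yes</enabled>
    <entries>100000</entries>
  </registry_limit>
</syscheck>
```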
At the moment, there isn’t a built-in way to set a size limit for this database or selectively clean it up. The practical approach is tuning what gets collected at the agent level.
Also, keep in mind that this directory can still contain data from old or disconnected agents. Those don’t get cleaned up automatically. If you have agents that are no longer in use, remove them from the manager. That will also clear their data from this directory and free up space.
You can remove an old agent using:
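For example, with the command-line tools on the manager (paths assume a default install; replace 001 with the old agent's ID):
```
# List registered agents to find the ID of the stale one
/var/ossec/bin/agent_control -l

# Remove the agent with ID 001
/var/ossec/bin/manage_agents -r 001
```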
Let us know if you need any further information.