Hi Veera,
In general, seeing the FIM database (fim.db) being re-initialized after Syscheck/FIM configuration changes can be expected, as the baseline may need to be rebuilt when the monitored scope or settings change. However, the exact behavior and the best approach depend heavily on the environment and configuration.
To provide more accurate guidance, could you please share a bit more context?
Which Wazuh version (manager and agent) are you using?
Are all the NFS filesystems being scanned from a single agent, or is it possible to distribute them across multiple agents/hosts?
Approximately how large are the filesystems (number of files / total size)?
What scan frequency do you need for each volume (e.g. daily, weekly, near-real-time)?
Are you using (or considering) realtime or whodata monitoring for any of these paths?
What operating system and mount options are used for the NFS volumes?
With this information, we can better determine whether rebuilding the FIM database is expected in your case and suggest a more suitable design (scope, scheduling, or agent distribution).
There is no specific Wazuh documentation focused on File Integrity Monitoring configurations for large NFS filesystems, mainly because this is not the most common FIM use case. This does not mean it is not possible or that it is “unsupported”; rather, it requires carefully defining the most appropriate configuration based on the size of the environment, the number of files, and the actual monitoring objective.
For volumes of the scale you mentioned (tens of terabytes and millions of files), it is critical to clearly understand what needs to be monitored in order to achieve a stable and manageable implementation.
To better assist you, could you please clarify a few points?
* What types of changes do you actually need to detect?
  * File creation or deletion
  * File content modifications
  * Other types of changes
* Do these controls need to apply to the entire NFS filesystem, or can they be limited to specific directories or file types?
* Is the primary goal security/auditing or inventory/visibility?
If possible, it would also be very helpful if you could share your current Syscheck/FIM configuration.
Please review it before sharing and obfuscate any sensitive or personal data if necessary.
With this information, we will be able to evaluate how to properly tune FIM for this type of implementation.
Do these controls need to apply to the entire NFS filesystem, or can they be limited to specific directories or file types?
Yes, at the moment we cannot limit to specific directories or file types; we need to monitor the entire NFS filesystem for all types of changes listed above.
Also, can you provide examples of how to configure Wazuh to monitor NFS mounts, both for a full mount and limited to specific directories or file types?
That will help us better understand the available configuration options.
Is the primary goal security/auditing or inventory/visibility?
I would say both are equally important.
We are looking for data visibility/inventory, compliance/audit, threat detection/response, and operational/performance insights.
Attached is the syscheck configuration we are currently using for NFS mount monitoring.
Please review it and suggest any changes required to meet the above requirements.
Please also refer to https://groups.google.com/g/wazuh/c/Wfyk5uU-apY/m/9T3OcJXNBAAJ.
Thanks,
Based on the requirements you described, what you are looking for goes beyond a File Integrity Monitoring (FIM) use case. Most of the events you expect (such as READDIR, GETATTR, and detailed access auditing) fall under file activity and access monitoring, not FIM. In addition, on NFS mounts the agent cannot reliably observe or attribute these types of access events in the same way it can on local filesystems, so features such as realtime monitoring or who-data attribution do not provide the audit semantics you are expecting.
From a scalability perspective, asking a single agent to scan tens of terabytes and millions of files is not realistic. This not only impacts local resources, but also forces massive metadata and content reads over the network. Even with scans every 12 hours, it is very unlikely that a full scan could complete consistently, and FIM also has practical design limits that make this scenario unsuitable.
For these reasons, the recommended approach is to split responsibilities, while still keeping Wazuh as the central platform for management and correlation:
Access auditing: use mechanisms such as auditd (or equivalent solutions) on the NFS server or on the hosts mounting the NFS to capture file activity and access events.
This information can be sent to Wazuh using an agent or, where appropriate, by ingesting logs via localfile, allowing it to be correlated and managed centrally.
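As an illustration only (the mount path, audit key, and rule below are placeholder assumptions, not values from your environment), an auditd watch plus a Wazuh localfile entry to ingest the resulting audit log could look roughly like this:

  # auditd rule on the host mounting the NFS volume (hypothetical path and key)
  auditctl -w /mnt/nfs_volume1 -p rwxa -k nfs_watch

  <!-- ossec.conf on the same host: forward the local audit log to Wazuh -->
  <localfile>
    <log_format>audit</log_format>
    <location>/var/log/audit/audit.log</location>
  </localfile>

Keep in mind that audit coverage for NFS traffic depends on where the rule is placed (server side versus client side), so this should be validated on a test host first.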
Inventory / data visibility: rely on capabilities provided by the storage system itself or on external batch processes.
For example, in CephFS, this is supported by the Metadata Server (MDS), which maintains global filesystem metadata and allows inventory and statistics to be generated without traversing the entire filesystem from a client.
The resulting inventory or summary data can also be forwarded to Wazuh for analysis and tracking.
File integrity: use Wazuh FIM strictly for file integrity monitoring, applied to a reduced scope that is relevant from a security perspective, rather than as a full NFS scanner.
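To illustrate the earlier question about monitoring a full mount versus a limited scope, a minimal syscheck sketch could look like the following (the /mnt/... paths and the restrict pattern are hypothetical placeholders, not recommendations for your environment):

  <syscheck>
    <!-- Full mount: every file under the mount point counts toward file_limit -->
    <directories check_all="yes" realtime="no">/mnt/nfs_volume1</directories>

    <!-- Limited scope: a specific subdirectory, restricted to certain file name patterns -->
    <directories check_all="yes" realtime="no" restrict=".conf$|.key$">/mnt/nfs_volume2/app/config</directories>

    <!-- Optionally exclude noisy paths or extensions -->
    <ignore type="sregex">.tmp$</ignore>
  </syscheck>

The narrower the scope, the more predictable scan time and database size become.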
This approach aligns better with both the functional requirements and the scale of the environment, while allowing Wazuh to remain the central point for ingestion, correlation, and alerting without forcing FIM to cover use cases it was not designed for.
If you have any questions, need further clarification, or would like us to review how this set of components could work in your environment, please let us know and we can go into more detail.
Regarding your questions, there are a few key points to clarify first. The main limiting factor for FIM is not the filesystem size (in TB), but the total number of monitored files. Wazuh includes the file_limit parameter as a safeguard, with a default value of 100,000 files, specifically to prevent severe performance impact.
Although this limit can be increased, FIM maintains a local database and performs full filesystem traversal, which introduces practical scalability limits by design, especially in large NFS environments.
A good practice is to define the scan interval based on the actual time required to complete a full scan. As a general rule, the interval should be at least twice the duration of a full scan, to avoid overlapping executions.
In this scenario, we recommend starting with a long test scan (for example, around 48 hours) and then adjusting the schedule based on how long the process actually takes in your specific environment.
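As a reference sketch (the values are placeholders to adjust after measuring your real scan time, and the exact option layout can vary slightly between Wazuh versions), the relevant settings live in the syscheck block:

  <syscheck>
    <!-- Scan interval in seconds; 172800 = 48 hours. Set it to at least twice the measured scan duration -->
    <frequency>172800</frequency>

    <!-- Default limit is 100000 entries; raise it only after confirming resource usage -->
    <file_limit>
      <enabled>yes</enabled>
      <entries>1000000</entries>
    </file_limit>
  </syscheck>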
For this use case:
Using check_all="yes" is not recommended unless all checks are strictly required, as it significantly increases the per-file cost.
It is preferable to enable only the required attributes (mtime, size, permissions, ownership, etc.).
If the system runs other resource-intensive processes, it may be advisable to disable scan_on_start, to avoid resource contention when the agent restarts.
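For example, a directories entry that enables only selected attributes instead of check_all could look like this (the path is a placeholder):

  <syscheck>
    <!-- Only metadata checks; file contents are not read because checksums are not enabled -->
    <directories check_all="no" check_mtime="yes" check_size="yes" check_perm="yes" check_owner="yes" check_group="yes" realtime="no">/mnt/nfs_volume1</directories>

    <!-- Keep enabled on a dedicated scanner; set to "no" if the host runs other heavy workloads -->
    <scan_on_start>yes</scan_on_start>
  </syscheck>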
If you have any questions or need further clarification, please let us know and we can go into more detail.
If the system runs other resource-intensive processes, it may be advisable to disable scan_on_start, to avoid resource contention when the agent restarts. - Since this host is a dedicated scanner only (no other resource-intensive workloads), can scan_on_start remain enabled? Also, please suggest any recommended tuning options to better utilize system resources and optimize scan performance.
we recommend starting with a long test scan (for example, around 48 hours) and then adjusting the schedule based on how long the process actually takes in your specific environment. - Please clarify how we can determine the real scan time for a specific endpoint (for example, a scanner host with 3 NFS volumes), so that the schedule can be set appropriately.
From the syscheck snippet shared earlier: for an NFS-dedicated scanner, is it acceptable/recommended to exclude local paths like /etc from FIM, so the scanner focuses only on the required NFS mount paths as reported in FIM events?
With all your recommendations in place, is it OK to schedule two or more volumes on a single scanner/agent?
Below are our answers point by point.
1) scan_on_start on a dedicated scanner:
On a host dedicated exclusively to FIM, it is acceptable to keep scan_on_start enabled. This ensures the baseline is rebuilt after a restart, as long as restarts are infrequent and controlled.
2) Determining the real scan time: The actual scan duration should be determined by reviewing the Wazuh agent logs:
identify when the syscheck/FIM scan starts,
identify when the scan finishes,
the difference between those timestamps is the real scan time.
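A simple way to extract those timestamps (the exact log wording can vary slightly between versions, so treat the pattern as an approximation) is to grep the agent log:

  # List FIM scan start/end messages with their timestamps on the agent
  grep -iE "file integrity monitoring scan (started|ended)" /var/ossec/logs/ossec.log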
As a general rule, the scan interval should be at least twice the duration of a full scan, to avoid overlapping executions. A good starting point is to run a long test scan (for example, around 48 hours) and then adjust the schedule based on the observed behavior.
3) Excluding local paths on an NFS-dedicated scanner: The file_limit parameter applies at the agent level, not per filesystem or per path. All files from all monitored paths count toward the same limit.
On a scanner dedicated to NFS:
it is valid and recommended to exclude most local operating system paths, as they do not add value to the use case and only consume part of the file_limit,
however, keeping /etc monitored is often a good practice, since it contains critical system configuration and usually has a low file count, resulting in minimal performance impact.
In short:
exclude /bin, /usr, /lib, etc.,
keep /etc if you want to preserve basic OS integrity without significantly affecting NFS scanning.
4) Multiple volumes per agent: It is possible to monitor more than one NFS volume per agent, but the key factor is not the number of volumes; it is the total number of files being monitored.
If you don't have a reliable estimate of the file count per volume in advance, the safest approach is to start with one volume per agent, measure scan duration and stability, and only then evaluate adding more volumes if the behavior remains consistent.
For this scenario, we recommend:
avoiding check_all="yes" unless it is strictly required,
keeping realtime="no",
adjusting the scan frequency based on real measurements from your environment.
Hi Pablo,
I have configured the NFS volumes using the recommended FIM settings:
check_sum="yes", check_owner="yes", check_group="yes", check_perm="yes", check_size="yes", realtime="no", and recursion_level="100".
The Wazuh agents are configured to monitor single or multiple volumes, each maintained within the defined file limit, with a scan schedule of 48 hours. However, even after 144 hours, FIM events are being reported from only one server, while no events are received from the other servers.
Additionally, the server that is reporting events has three configured volumes, but FIM activity is observed from only one of those volumes.
Could you please advise which logs should be reviewed for troubleshooting and suggest the next steps to investigate this behavior?
Thanks ...
First, with check_sum="yes" enabled, FIM must read the full contents of every file in order to calculate checksums. On large NFS volumes, this significantly increases scan time and I/O cost, and it is one of the main reasons scans can take much longer than expected.
Given that, it is very likely that:
the scan has only completed the first volume so far, and
the other volumes have not been reached yet, so no events are generated from them at this stage.
Additionally, if the agent reaches the configured file_limit before all volumes are scanned, FIM will stop registering new files. When this happens, you should see corresponding warnings in the agent log (/var/ossec/logs/ossec.log) indicating that the file limit has been reached or that files are being skipped.
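To check for this on the agent, a loose pattern is safer than matching an exact message, since the wording can differ between versions:

  # Look for file limit related warnings in the agent log
  grep -iE "file limit|maximum number" /var/ossec/logs/ossec.log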
Recommended next step: To validate this and get a clear baseline, we recommend a partial test:
Configure the agent to monitor only a single NFS volume.
Let the scan run until it fully completes (confirm start and end messages in ossec.log).
Measure how long that scan actually takes.
Once confirmed, gradually add additional volumes and observe how scan time and behavior change.
This approach helps ensure that scans can complete reliably and provides concrete data to size the number of volumes per agent and the scan interval appropriately.
If you’d like, you can share the relevant ossec.log excerpts (scan start/end and any file limit warnings), and we can help review them.
The behavior you are observing (multiple “scan started” messages and very few “scan ended” messages) is expected in this scenario and is directly related to the cost of the scan. With check_sum="yes" enabled, FIM must read the full contents of every file in order to calculate checksums. On large NFS volumes, this makes the scan extremely slow and it can take several days to complete.
FIM performs scans sequentially and does not store the traversal progress. This means that:
if a scan starts but does not complete (for example due to an agent restart, configuration reload, or frequency change),
the next scan will start again from the beginning of the traversal,
and volumes that have not yet been reached will not generate events because they are not yet part of the baseline.
This explains why you may see:
activity only from a single volume,
events concentrated on a specific date,
and repeated scan restarts without consistent progress.
This behavior does not indicate a malfunction; it means the scan is operating at the edge of what is feasible with the current configuration.
As a practical way to move forward and demonstrate progress, we recommend a temporary adjustment:
Disable check_sum temporarily to significantly reduce scan cost.
Keep the remaining checks enabled (owner, group, permissions, size).
Configure a long scan interval (for example, 4 days).
Avoid agent restarts or configuration changes while the scan is running.
This allows you to confirm whether the scan can complete successfully and whether all volumes are reached. Once that is validated, you can continue working toward a more refined final solution based on measured behavior.
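A sketch of that temporary adjustment, reusing the attributes from your current configuration (the path is a placeholder):

  <syscheck>
    <!-- Checksums disabled temporarily; metadata checks kept -->
    <directories check_sum="no" check_owner="yes" check_group="yes" check_perm="yes" check_size="yes" realtime="no" recursion_level="100">/mnt/nfs_volume1</directories>

    <!-- Scan interval in seconds; 345600 = 4 days -->
    <frequency>345600</frequency>
  </syscheck>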
Apologies if my previous message was confusing — sharing the full grep logs may have been misleading. I am able to clearly track the exact 48-hour FIM scan start and end timestamps, so that part is confirmed.
However, my concern is different. I realize I may be asking a lot of questions, but I’ve had to test multiple scenarios and want to ensure I understand the behavior correctly.
At the moment, I have not yet disabled check_sum="yes".
Case 1:
One Wazuh agent is scanning a single volume larger than 50 TB. The file count is roughly half of the configured file_limit. The FIM scan took 146 hours and 10 minutes to complete. A second scan has just started today and is still in progress. The configured scan frequency is 48 hours, but so far, no events have been generated.
Case 2:
Another Wazuh agent is scanning four volumes with a combined size of about 4 TB and a file count of up to 50 million. The FIM scan completed in approximately three days, but events were generated for only one of the volumes.
My question is: If a FIM scan for an individual volume completes successfully, when should the related events be expected?
Additionally, I reconfigured the multiple volumes to be scanned one by one and was able to successfully receive alerts for the second volume on the same agent. However, after that, events from the first volume stopped appearing.
So in a multi-volume setup on a single agent, are events generated and reported independently per volume, rather than all at the same time?
Thanks in advance for your guidance.
Thanks for the detailed explanation of both cases.
When should FIM events be expected after a scan completes? A key point to clarify is that FIM does not generate events simply because a scan finishes. The message “File integrity monitoring scan ended” only indicates that the filesystem traversal has completed and the baseline is consistent. FIM events are generated only when detectable changes occur, such as file creation, deletion, metadata changes, or content changes based on the enabled checks.
These events can be generated:
during a scan, as directories are traversed, or
after a scan, when a subsequent scan compares the current state against the baseline.
It is therefore normal for a scan to complete successfully and still produce no events, if no relevant changes occurred. To confirm coverage for a specific volume, the correct approach is to perform a controlled change (for example, create or modify a file) and verify that the event is detected.
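For example, a quick controlled check on one of the monitored volumes could look like this (the test path is a placeholder, and the alert log location assumes a default manager installation):

  # On the host mounting the NFS volume: create a test file inside a monitored path
  touch /mnt/nfs_volume1/fim_canary_test.txt

  # With realtime="no", the change is detected on the next scheduled scan; then check the manager
  grep "fim_canary_test.txt" /var/ossec/logs/alerts/alerts.json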
Regarding the case with ~50 million files: while increasing file_limit makes this technically possible, it is well beyond the default and recommended operating range of FIM. In such scenarios, long scan times, uneven event visibility across volumes, and sensitivity to scan order are expected behaviors rather than errors.