Hi Wazuh devs!
I dug into this issue yesterday and this morning, and have come to the conclusion that there is a serious bug in Wazuh's AWS CloudTrail ingestion. This email might get a little long, but I feel the info is necessary to document and explain the problem.
Environment:
Wazuh Version: 4.12.0 (x86_64)
OS: Amazon Linux 2
Yesterday, I set wazuh_modules.debug = 2 on our running instance, then waited a little while for debug logs to come in. I decided to focus my investigation on one specific set of logs: the CloudTrail logs for a single account (464811824699) in the us-east-1 region. This is our most heavily used account, generating thousands of CloudTrail entries every few minutes.
First, the relevant ossec.log entries with debug enabled, with irrelevant lines cut out:
2025/11/06 18:20:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:84 at wm_aws_main(): INFO: Starting fetching of logs.
2025/11/06 18:20:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:136 at wm_aws_main(): INFO: Executing Bucket Analysis: (Bucket: cloudtrail-logs-5a3ea689, Path: Cloudtrail/, Type: cloudtrail)
2025/11/06 18:20:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:379 at wm_aws_run_s3(): DEBUG: Create argument list
2025/11/06 18:20:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:494 at wm_aws_run_s3(): DEBUG: Launching S3 Command: wodles/aws/aws-s3 --bucket cloudtrail-logs-5a3ea689 --trail_prefix Cloudtrail/ --only_logs_after 2024-JAN-01 --type cloudtrail --debug 2
...
DEBUG: +++ Marker: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1815Z_mT5isOkzuCsvdT5N.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1815Z_nN57He9CcCZI0atM.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1815Z_quTqxDRfWIKewbhi.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1815Z_xNLYCe5lOhLx9iL4.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_3YHZBJUq7RjnXVpa.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_7ekYStSqGjmlhLme.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_8xDhhJkwDX3fl9Bp.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_A4M6Q8yoBrMtPvuv.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_N2tOEzzp8W4IciuF.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_RlOLdU0pRkOjC0HW.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_TaFUOtg6dvR20792.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_V4SDZUzTLKf4sixS.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_XRSWN9BACzkHYNBl.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_c7jYmEd6afCn0Aup.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_jfNx1jvrIxTyTKid.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_mqF9rwRfdkgF3vcd.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_n2OXQS2jyDE7cmZf.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_wntHuU3rAE1GJKue.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1820Z_zNO4a2mNGPPmJhpm.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_VVX7lWU4WAcepzRk.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_ceZmYcMYFd5XXP1n.json.gz
DEBUG: +++ DB Maintenance
...
2025/11/06 18:23:38 wazuh-modulesd:aws-s3[1426] wm_aws.c:201 at wm_aws_main(): INFO: Fetching logs finished.
...
2025/11/06 18:25:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:84 at wm_aws_main(): INFO: Starting fetching of logs.
2025/11/06 18:25:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:136 at wm_aws_main(): INFO: Executing Bucket Analysis: (Bucket: cloudtrail-logs-5a3ea689, Path: Cloudtrail/, Type: cloudtrail)
2025/11/06 18:25:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:379 at wm_aws_run_s3(): DEBUG: Create argument list
2025/11/06 18:25:46 wazuh-modulesd:aws-s3[1426] wm_aws.c:494 at wm_aws_run_s3(): DEBUG: Launching S3 Command: wodles/aws/aws-s3 --bucket cloudtrail-logs-5a3ea689 --trail_prefix Cloudtrail/ --only_logs_after 2024-JAN-01 --type cloudtrail --debug 2
...
DEBUG: +++ Marker: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_ceZmYcMYFd5XXP1n.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_eV7ZTAbrNkqqV7s8.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_hCxmodiASno13OOK.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_utekoji3oD5VOsnh.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_wEMhQYC1iO5xpz9m.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1825Z_yQ0kNGbZKabKNgBd.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1830Z_X5Cd1Pguwp2C8nfw.json.gz
DEBUG: ++ Found new log: Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/464811824699_CloudTrail_us-east-1_20251106T1830Z_mC9GAucLKKJOomZb.json.gz
DEBUG: +++ DB Maintenance
So from this, we can see the following:
18:20:46 - S3 wodle starts the log fetching process for this account.
Once it gets to the us-east-1 region, it uses the file 464811824699_CloudTrail_us-east-1_20251106T1815Z_mT5isOkzuCsvdT5N.json.gz as the marker, which has a last-modified time of 2025-11-06T18:10:37.000Z. It proceeds to download a number of files - 3 with a "T1815Z" timestamp, 15 with "T1820Z", and 2 with "T1825Z".
18:25:46 - S3 wodle starts the next log fetch cycle.
At this point, it uses the file 464811824699_CloudTrail_us-east-1_20251106T1825Z_ceZmYcMYFd5XXP1n.json.gz as the marker, with a last-modified time of 2025-11-06T18:20:26.000Z. It proceeds to download 5 files with "T1825Z", and 2 with "T1830Z".
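The per-timestamp counts above can be reproduced from the debug output with a small script along these lines. This is only a sketch: the function name and the regex are illustrative, and it assumes the "Found new log" lines look exactly like the ones pasted above. The sample uses a few lines from the 18:25:46 cycle.

```python
import re
from collections import Counter

# Matches the timestamp label (e.g. "T1820Z") that follows the 8-digit date
# in a CloudTrail object name.
WINDOW_RE = re.compile(r"_\d{8}(T\d{4}Z)_")

def count_new_logs_by_window(debug_lines):
    """Count 'Found new log' debug lines per timestamp label."""
    counts = Counter()
    for line in debug_lines:
        # Skip marker lines, DB maintenance lines, and anything else.
        if "Found new log:" not in line:
            continue
        match = WINDOW_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Common key prefix taken from the log excerpt above.
PREFIX = ("Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/"
          "464811824699_CloudTrail_us-east-1_")

sample = [
    "DEBUG: +++ Marker: " + PREFIX + "20251106T1825Z_ceZmYcMYFd5XXP1n.json.gz",
    "DEBUG: ++ Found new log: " + PREFIX + "20251106T1825Z_eV7ZTAbrNkqqV7s8.json.gz",
    "DEBUG: ++ Found new log: " + PREFIX + "20251106T1825Z_yQ0kNGbZKabKNgBd.json.gz",
    "DEBUG: ++ Found new log: " + PREFIX + "20251106T1830Z_X5Cd1Pguwp2C8nfw.json.gz",
]

for window, count in sorted(count_new_logs_by_window(sample).items()):
    print(window, count)
```

Feeding it the full set of debug lines for one fetch cycle gives the number of files downloaded per timestamp label for that cycle.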
Now here comes the bug! When I compare those 15 "T1820Z" files against the listing in S3, S3 actually has 18 files with that timestamp. Diffing the two lists and checking the last-modified time of the 3 missing files shows the following:
464811824699_CloudTrail_us-east-1_20251106T1820Z_1kg87Tbe9k8bmKlp.json.gz - last modified 2025-11-06T18:22:22.000Z
464811824699_CloudTrail_us-east-1_20251106T1820Z_99MbCyo86R00FiBR.json.gz - last modified 2025-11-06T18:21:57.000Z
464811824699_CloudTrail_us-east-1_20251106T1820Z_cJ0PSMsYmdeR07VL.json.gz - last modified 2025-11-06T18:24:20.000Z
So each of those 3 missing files was actually written AFTER 464811824699_CloudTrail_us-east-1_20251106T1825Z_ceZmYcMYFd5XXP1n.json.gz (last modified 18:20:26), yet all 3 sort before it by name. The polling cycle that started at 18:25:46 completely ignored them! The root cause appears to be that the marker advances based on the last processed filename (lexicographically), not on last-modified time. Once the marker moves to a T1825Z file (ceZmYcMYFd5XXP1n.json.gz at 18:20:26), T1820Z files that arrive later are never reconsidered, even though CloudTrail can deliver files to S3 several minutes after the timestamp in their name.
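To make the suspected mechanism concrete, here is a minimal Python sketch. It is not the wodle's actual code. It assumes the listing only returns keys that sort strictly after the marker (the way S3's ListObjectsV2 StartAfter parameter behaves), and the function name is hypothetical. The keys and last-modified times are the ones quoted above.

```python
# Common key prefix taken from the log excerpt above.
PREFIX = ("Cloudtrail/AWSLogs/464811824699/CloudTrail/us-east-1/2025/11/06/"
          "464811824699_CloudTrail_us-east-1_")

# Marker used by the 18:25:46 cycle, and its last-modified time.
MARKER = PREFIX + "20251106T1825Z_ceZmYcMYFd5XXP1n.json.gz"
MARKER_LAST_MODIFIED = "2025-11-06T18:20:26.000Z"

# The three files that were never ingested, with their last-modified times.
LATE_FILES = {
    PREFIX + "20251106T1820Z_1kg87Tbe9k8bmKlp.json.gz": "2025-11-06T18:22:22.000Z",
    PREFIX + "20251106T1820Z_99MbCyo86R00FiBR.json.gz": "2025-11-06T18:21:57.000Z",
    PREFIX + "20251106T1820Z_cJ0PSMsYmdeR07VL.json.gz": "2025-11-06T18:24:20.000Z",
}

# One file the 18:25:46 cycle did pick up, for contrast.
FOUND_FILE = PREFIX + "20251106T1825Z_eV7ZTAbrNkqqV7s8.json.gz"

def listed_after_marker(keys, marker):
    """Assumed listing behaviour: only keys sorting strictly after the marker."""
    return sorted(key for key in keys if key > marker)

bucket_keys = list(LATE_FILES) + [MARKER, FOUND_FILE]
visible = listed_after_marker(bucket_keys, MARKER)

for key, last_modified in LATE_FILES.items():
    # Same-format ISO 8601 strings compare correctly as plain strings.
    newer = last_modified > MARKER_LAST_MODIFIED
    print(key.rsplit("_", 1)[-1], "newer than marker:", newer,
          "| visible to listing:", key in visible)
```

Under that assumption, all three late files are newer than the marker by last-modified time but invisible to the listing, because "T1820Z" sorts before "T1825Z" regardless of when the object was written.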
I scanned the entire log file for Nov 6 this morning, and I see no evidence that those files were ever pulled in yesterday.
I also ran some comparison scripts between the full list of files in S3 and the data in OpenSearch (referencing the data.aws.log_info.log_file field). In our environment, when I ran this test yesterday against roughly 18 hours of data, somewhere around 40% of my CloudTrail log files were missed, and I suspect it's all due to this bug.
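The core of that comparison is just a set difference. A rough sketch of the idea follows, not the exact scripts used. It assumes the S3 keys and the distinct data.aws.log_info.log_file values have already been exported to two lists, and that the field holds the same string as the S3 object key. The function name is hypothetical and the keys in the example are made-up placeholders, not real data.

```python
def find_missed_files(s3_keys, ingested_log_files):
    """Return (sorted keys present in S3 but never ingested, percent missed)."""
    in_s3 = set(s3_keys)
    ingested = set(ingested_log_files)
    missed = sorted(in_s3 - ingested)
    # Guard against an empty listing to avoid dividing by zero.
    percent_missed = 100.0 * len(missed) / len(in_s3) if in_s3 else 0.0
    return missed, percent_missed

# Placeholder inputs purely for illustration.
example_s3_keys = ["a.json.gz", "b.json.gz", "c.json.gz", "d.json.gz"]
example_ingested = ["a.json.gz", "b.json.gz", "d.json.gz"]

missed, percent = find_missed_files(example_s3_keys, example_ingested)
print(missed, percent)
```

Sorting the missed keys makes it easy to see which timestamp labels they cluster in.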
I think this needs to be treated as a very serious bug. For our part, I'm going to test the subscriber="buckets" option (S3 Data events + SQS) I mentioned in my previous email. Assuming the data structure is similar enough that our existing rules still work properly when ingesting the data that way, I think that's a safer solution.
Jeremy Utley