First off, just want to say thanks for the amazing product. I have this rolled out to about 60 servers so far and this is the first hitch I've run into that I couldn't figure out myself. I'm hoping someone can point me in the right direction to troubleshoot this. My company hosts many applications in Amazon Linux 2 ARM64 ECS clusters. I have the Wazuh agents installed on the host EC2 instances and it works fine everywhere except for one server. All the systems use the same base AMI, so they should be identical, but here we are :) On this one server it works fine for a few days and then something changes...
If I log in to the box and view the running processes, the log collector is spinning at 100% CPU.
If I reboot the server or restart the wazuh-agent service, it fixes itself for a few days but eventually breaks again. I've even tried redeploying the server and got the same result.
I checked the ossec.log file already and it looks identical to the one on servers that don't have this issue. I looked at the log files Wazuh is monitoring and they don't seem overly large or active, no more so than on any of my other systems anyway. The only thing I can think of is that the specific logs generated by the containers being run aren't playing nice with Wazuh, but I don't know where to start troubleshooting what that might be. On these particular servers the only containers being run are Teleport (https://goteleport.com/) processes. Any help would be appreciated!
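One way to put numbers on "not overly large or active" would be something like the sketch below. It is only an illustration under a few assumptions: it relies on the Linux /proc filesystem, it only sees regular files the process currently holds open (so command or journal inputs would not show up), it needs to run as root to inspect another user's process, and the function names are made up for this example rather than taken from any Wazuh tooling.

```python
import os
import sys
import time


def open_regular_files(pid):
    """Return the set of regular-file paths that process `pid` has open."""
    fd_dir = f"/proc/{pid}/fd"
    paths = set()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were listing
        if os.path.isfile(target):  # skips sockets, pipes, deleted files
            paths.add(target)
    return paths


def snapshot_sizes(paths):
    """Map each path to its current size in bytes, skipping unreadable ones."""
    sizes = {}
    for path in paths:
        try:
            sizes[path] = os.path.getsize(path)
        except OSError:
            pass
    return sizes


def growth_between(before, after):
    """Bytes added per path between two {path: size} snapshots."""
    return {p: after[p] - before[p] for p in after if p in before}


def measure_growth(pid, interval=10.0):
    """Sample the sizes of the files open by `pid` twice, `interval` seconds apart."""
    paths = open_regular_files(pid)
    before = snapshot_sizes(paths)
    time.sleep(interval)
    return growth_between(before, snapshot_sizes(paths))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the PID of the log collector process.
    deltas = measure_growth(int(sys.argv[1]))
    for path, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
        print(f"{delta:>12} bytes  {path}")
```

Running this against the log collector's PID on the affected server and on a healthy one, and comparing the busiest files, might show whether one particular container log is growing much faster than expected. A negative number would suggest the file was truncated or rotated during the sample.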
My specs are:
Wazuh Agent and Manager: 4.3.10
OS: Linux 4.14.296-222.539.amzn2.aarch64 #1 SMP Wed Oct 26 20:36:51 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Server type: t4g.small