Hello Wazuh experts,
I have a few questions about the Wazuh architecture. I am unsure about some of Wazuh's capabilities and have some doubts, so please correct me if I am wrong. I have listed my points below; could you explain them in more detail? I will focus on RTO, RPO, MTD, MTO, scalability, and stability.
1/ Core Components
Filebeat --> lightweight forwarder
Elasticsearch --> Ingest and index logs.
Kibana --> Visualize ingested log data.
As I can see in the distributed deployment shown in the image below, the architecture includes a manager node (Wazuh), one or more forwarder nodes (Filebeat) running network sensor components, and one or more search nodes running Elasticsearch (ES). As far as I understand, this architecture may cost more upfront, but it provides greater scalability and performance, since you can simply add more nodes to handle more traffic or more log sources.
First, I want to point out that this architecture has single points of failure. If you use it as an MSSP, services can become unavailable, causing significant damage to the business.
- NGINX is a single point of failure, with no failover yet.
- Kibana is a single point of failure, with no failover yet.
==> Do you have any advice on how to eliminate these single points of failure?
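One way to quantify what removing a single point of failure buys: if each NGINX or Kibana instance is independently available a fraction `a` of the time, running `n` redundant instances behind some failover mechanism (for example, a virtual IP or an external load balancer) gives a combined availability of 1 - (1 - a)^n. This is a simplified sketch that assumes independent failures and instant failover; the 99% per-instance figure is hypothetical, not a measured value.

```python
def redundant_availability(single: float, instances: int) -> float:
    """Availability of `instances` redundant copies, assuming independent
    failures and instantaneous failover (a simplification)."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("need at least one instance")
    # The service is down only when every instance is down at the same time.
    return 1.0 - (1.0 - single) ** instances


def yearly_downtime_hours(availability: float) -> float:
    """Expected downtime per year (8760 hours) for a given availability."""
    return (1.0 - availability) * 8760


if __name__ == "__main__":
    # Hypothetical per-instance availability of 99%.
    for n in (1, 2):
        a = redundant_availability(0.99, n)
        print(f"{n} instance(s): availability={a:.4f}, "
              f"downtime={yearly_downtime_hours(a):.1f} h/year")
```

In practice, correlated failures (same host, same data center, same misconfiguration) make the real gain smaller, so redundant instances should not share those dependencies.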
Second, for the Wazuh cluster, the Elasticsearch cluster, and Filebeat, we can significantly increase the number of agents as long as we add worker nodes whenever necessary. But how many agents can this setup handle? I understand that it depends on how many agents you monitor, the number of events per second generated, whether you also monitor other types of devices (such as network devices), and how long the data must stay online.
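A rough way to turn "it depends" into a number is to estimate the total event rate and divide it by the rate one worker node can sustain, then add spare nodes so the cluster survives a node failure. Both the per-agent event rate and the per-worker capacity below are hypothetical placeholders; they must come from your own measurements, since throughput varies with the enabled rules, decoders, and modules.

```python
import math


def workers_needed(agents: int, eps_per_agent: float,
                   eps_per_worker: float, spare_nodes: int = 1) -> int:
    """Estimate worker nodes for a given load (rough sizing sketch).

    eps_per_agent and eps_per_worker are assumptions to be replaced with
    values measured in your own environment.
    """
    if eps_per_worker <= 0:
        raise ValueError("eps_per_worker must be positive")
    total_eps = agents * eps_per_agent
    # Nodes needed to carry the load, plus spares for N+1 (or N+k) redundancy.
    return math.ceil(total_eps / eps_per_worker) + spare_nodes


if __name__ == "__main__":
    # Hypothetical figures: 20,000 agents at 2 EPS each, 10,000 EPS per worker.
    print(workers_needed(20_000, 2.0, 10_000.0))
```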
Assume that we have 300 GB of logs per day and 20k agents, and that we need to keep at least the last 6 months of logs before deleting them.
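The retention requirement can be turned into a first storage estimate. The sketch below uses hypothetical parameters: indexed data takes about the same space as the raw logs (index overhead and compression vary widely, so measure this ratio on a sample of your own data), one replica per shard, about 25% of disk kept free so Elasticsearch's disk watermarks are not reached, and 6 months approximated as 180 days.

```python
def storage_needed_gb(daily_gb: float, retention_days: int,
                      index_ratio: float = 1.0, replicas: int = 1,
                      free_headroom: float = 0.25) -> float:
    """Rough Elasticsearch disk estimate for a retention period.

    index_ratio: indexed size / raw size (assumption; measure on real data).
    replicas: number_of_replicas per shard; each replica is a full copy.
    free_headroom: fraction of disk kept free to stay below disk watermarks.
    """
    primary = daily_gb * retention_days * index_ratio
    with_replicas = primary * (1 + replicas)
    # Size the raw disk so that the data occupies only (1 - headroom) of it.
    return with_replicas / (1.0 - free_headroom)


if __name__ == "__main__":
    # 300 GB/day kept for ~6 months (approximated as 180 days).
    total = storage_needed_gb(300, 180)
    print(f"~{total / 1000:.0f} TB of disk across the cluster")
```

Dividing the total by the usable disk per data node gives a first estimate of the number of Elasticsearch data nodes.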
Our biggest problem is handling the volume of logs from firewalls and network devices.
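For firewall and network-device traffic, it helps to convert daily volume into a sustained event rate, because ingest capacity is usually discussed in events per second (EPS). The average event size (500 bytes) and the peak-to-average factor (3x) below are hypothetical; firewall logs tend to be bursty, so sizing for peaks rather than averages is the safer assumption.

```python
def events_per_second(daily_bytes: float, avg_event_bytes: float,
                      peak_factor: float = 1.0) -> float:
    """Average (or peak, via peak_factor) events per second from daily volume."""
    seconds_per_day = 24 * 60 * 60
    return daily_bytes / avg_event_bytes / seconds_per_day * peak_factor


if __name__ == "__main__":
    daily = 300 * 10**9  # 300 GB/day, decimal gigabytes
    avg = events_per_second(daily, 500)
    peak = events_per_second(daily, 500, peak_factor=3.0)
    print(f"average ~{avg:,.0f} EPS, peak ~{peak:,.0f} EPS")
```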
==> Could you show an example of a working architecture, with hardware recommendations, for this situation?
As I understand it, since Elasticsearch nodes keep the same information, adding more Elasticsearch nodes will not make the cluster better suited to larger amounts of data, and it will not prevent data loss if ES crashes. How can we minimize risk with respect to RPO/RTO?
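It may help to separate two mechanisms here. In Elasticsearch, each index is split into primary shards spread across the data nodes, and the `number_of_replicas` setting controls how many extra copies of each shard exist; Elasticsearch does not place a replica on the same node as its primary. Snapshots to an external repository then cover losses that replicas cannot, such as losing the whole cluster. The sketch below, with hypothetical numbers, estimates how many node failures a shard survives and how much data a snapshot-based recovery can put at risk.

```python
def tolerated_node_failures(data_nodes: int, replicas: int) -> int:
    """Node failures a shard can survive without losing data.

    Elasticsearch does not allocate two copies of the same shard on one node,
    so a shard has at most min(replicas + 1, data_nodes) copies in the cluster.
    This ignores master-node quorum, which must be planned separately.
    """
    copies = min(replicas + 1, data_nodes)
    return copies - 1


def data_at_risk_gb(daily_gb: float, snapshot_interval_hours: float) -> float:
    """Worst-case data lost if the whole cluster is restored from the last
    snapshot: everything ingested since that snapshot (the effective RPO),
    ignoring logs that could be re-shipped from sources or forwarders."""
    return daily_gb / 24 * snapshot_interval_hours


if __name__ == "__main__":
    print(tolerated_node_failures(data_nodes=6, replicas=1))
    print(tolerated_node_failures(data_nodes=1, replicas=1))
    # Hypothetical: 300 GB/day with snapshots every 4 hours.
    print(f"{data_at_risk_gb(300, 4):.1f} GB at risk between snapshots")
```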
Finally, do you have any suggestions for building a multi-cloud solution through a service provider that would help us minimize risk with respect to MTD/MTO?
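Whatever the multi-cloud design, a useful check is whether the estimated recovery time fits within the maximum tolerable downtime (MTD). The sketch below treats RTO as detection time plus restore time plus failover time, with restore time computed as data volume divided by restore throughput. The `RecoveryPlan` structure and every figure in the example are hypothetical placeholders, to be replaced with results from an actual recovery drill.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    detection_hours: float        # time to notice the outage and decide to fail over
    data_to_restore_gb: float     # data that must be restored before service resumes
    restore_gb_per_hour: float    # restore throughput (assumption; measure in a drill)
    failover_hours: float = 0.0   # traffic switch-over, service start-up, etc.

    def rto_hours(self) -> float:
        restore = self.data_to_restore_gb / self.restore_gb_per_hour
        return self.detection_hours + restore + self.failover_hours


def fits_mtd(plan: RecoveryPlan, mtd_hours: float) -> bool:
    """True if the estimated RTO does not exceed the maximum tolerable downtime."""
    return plan.rto_hours() <= mtd_hours


if __name__ == "__main__":
    # Hypothetical: restore only the last 7 days (2.1 TB) at 500 GB/h first.
    plan = RecoveryPlan(detection_hours=0.5, data_to_restore_gb=2100,
                        restore_gb_per_hour=500, failover_hours=0.5)
    print(f"RTO ~{plan.rto_hours():.1f} h, fits 8 h MTD: {fits_mtd(plan, 8)}")
```

Restoring recent data first and older indices afterwards is one way to keep the RTO small while still meeting the 6-month retention goal.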
Could you advise us on the architecture and sizing needed to support log ingestion for this infrastructure?
I'm looking forward to hearing from you soon.
Regards,