Question - The Wazuh architecture - Distributed


Eric Vu

Oct 13, 2021, 12:52:41 AM
to Wazuh mailing list
Hello Wazuh experts,

I have a few questions regarding the Wazuh architecture. I am unsure about some of Wazuh's capabilities, so please correct me if I am wrong. I will list my points here; could you explain them in more detail? My focus is on RTO, RPO, MTD, MTO, scalability, and stability.

1/ Core Components
Filebeat --> Lightweight log forwarder.
Elasticsearch --> Ingest and index logs.
Kibana --> Visualize ingested log data.

As I can see in the image below, the distributed deployment includes a manager node (Wazuh), one or more forward nodes (Filebeat) running network sensor components, and one or more search nodes running Elasticsearch (ES) components. As far as I understand, this architecture may cost more upfront, but it provides greater scalability and performance, since you can simply add more nodes to handle more traffic or log sources.

[Attached image: wazuh.png - distributed deployment diagram]

Firstly, I would point out that this architecture contains single points of failure. If you are using it for an MSSP, services can become unavailable, causing significant impact to the business.

- NGINX is a single point of failure, with no failover yet.
- Kibana is a single point of failure, with no failover yet.

==> Do you have any advice on how to address these single points of failure?

Secondly, for the Wazuh cluster, Elasticsearch cluster, and Filebeat, we can significantly increase the number of agents as long as we add worker nodes whenever necessary. But how many agents can we realistically handle? I understand that it depends on how many agents you are monitoring, the number of events per second generated, whether you will monitor other types of devices (e.g. network devices), and how long you need the data to stay online.

Assume that we have 300 GB of logs a day and 20k agents, and I need to store at least the last 6 months of logs before deletion.
Our big problem is dealing with the volume of logs from firewalls and network devices.

==> Would you please show an example of a working architecture, with hardware recommendations, for this situation?

My assumption is that, since Elasticsearch nodes keep the same information, adding more Elasticsearch nodes will not make the cluster more suitable for handling larger amounts of data, and it will also not prevent data loss if ES crashes. How can we minimize risk with respect to RPO/RTO?

Finally, do you have any suggestions for building a multi-cloud solution through a service provider that would help us minimize risk with respect to MTD/MTO?

Could you advise us on architecture and sizing to support the ingestion of the logs of this infrastructure?

I'm looking forward to hearing from you soon.

Regards, 

Eric Vu

Oct 18, 2021, 11:17:23 AM
to Wazuh mailing list
Hello Wazuh experts, 

I'm still looking for a workable solution, but I have not found the answer to my questions yet. Life is an echo: what you send out comes back.

I'm hoping that I will receive an answer. 

Regards, 

Miguel Casares

Nov 25, 2021, 10:50:17 AM
to Wazuh mailing list
Hello Eric,

Sorry for the late response here.

Regarding your questions, let me answer below:

1/ Core Components
Filebeat --> Lightweight log forwarder.
Elasticsearch --> Ingest and index logs.
Kibana --> Visualize ingested log data.
Wazuh server --> Analyzes the data received from the agents, triggering alerts when threats or anomalies are detected. It is also used to manage the agents' configuration remotely and to monitor their status.
Wazuh agent --> Provides threat prevention, detection, and response capabilities. It is also used to collect different types of system and application data, which it forwards to the Wazuh server through an encrypted and authenticated channel.


Regarding the single points of failure:

- NGINX is a single point of failure, with no failover yet.
- Kibana is a single point of failure, with no failover yet.

The Wazuh agent does support a failover configuration, so in case NGINX fails, the agents can report directly to the Wazuh servers: https://documentation.wazuh.com/current/user-manual/configuring-cluster/advanced-settings.html
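
For reference, here is a minimal sketch of what the <client> section of an agent's ossec.conf could look like with several managers listed in order of preference. The hostnames, port, and protocol values below are placeholders, so please adapt them to your deployment and check them against the documentation linked above:

    <ossec_config>
      <client>
        <!-- First choice: the NGINX load balancer (placeholder hostname) -->
        <server>
          <address>lb.example.com</address>
          <port>1514</port>
          <protocol>tcp</protocol>
        </server>
        <!-- Fallbacks: Wazuh worker nodes the agent can reach directly if NGINX is down -->
        <server>
          <address>wazuh-worker-1.example.com</address>
          <port>1514</port>
          <protocol>tcp</protocol>
        </server>
        <server>
          <address>wazuh-worker-2.example.com</address>
          <port>1514</port>
          <protocol>tcp</protocol>
        </server>
      </client>
    </ossec_config>

With this, the agent tries the entries in order and moves to the next one when the current manager stops responding.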

Regarding Kibana, it is possible to add more Kibana instances pointing to the same Elasticsearch cluster, so the data can still be visualized if one of them crashes. We do recommend installing them on different servers.

Regarding:

 Secondly, for the Wazuh cluster, Elasticsearch cluster, and Filebeat, we can significantly increase the number of agents as long as we add worker nodes whenever necessary. But how many agents can we realistically handle?

Yes, you can increase the number of worker nodes as you go, but as a general rule I would recommend around 2,000 agents per node. Regarding your specific architecture, this guideline is based on our experience and internal testing with EPS. You may need to start adding a new node when the `remoted` and `analysisd` daemons start to drop events, and you can see that here:
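
For a rough way to keep an eye on those drops, here is a minimal Python sketch that parses the analysisd state file. The path (/var/ossec/var/run/wazuh-analysisd.state) and the events_received / events_dropped counter names are assumptions based on a typical Wazuh 4.x install, so please double-check them against your version:

    # Minimal sketch: check whether analysisd is dropping events.
    # Path and counter names are assumptions for a typical Wazuh 4.x install;
    # verify them against your own version before relying on this.
    from pathlib import Path

    STATE_FILE = Path("/var/ossec/var/run/wazuh-analysisd.state")

    def read_state(path: Path) -> dict:
        """Parse the key='value' pairs in a Wazuh daemon state file."""
        stats = {}
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            stats[key.strip()] = value.strip().strip("'")
        return stats

    if __name__ == "__main__":
        stats = read_state(STATE_FILE)
        received = int(stats.get("events_received", 0))
        dropped = int(stats.get("events_dropped", 0))
        print(f"events_received={received} events_dropped={dropped}")
        if dropped > 0:
            print("analysisd is dropping events: consider adding a worker node "
                  "or reviewing the analysisd queue settings.")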

It would be more accurate if you could provide the number of EPS (events per second) rather than the volume of logs. Additionally, it would help to know the number of network devices and, for those 20k agents, what their operating systems are. Also, is the 300 GB of logs the total for both network devices and agents?

If 300 GB per day is the total and you need to keep the data for 180 days, in Elasticsearch you will need at least 54 TB, summed across all the Elasticsearch nodes you will have. The Wazuh server compresses the data with a ratio of about 20, so it will need around 2.7 TB if you want to keep the raw data. If you only keep the alerts, it will be roughly half of that (again summed across all the Wazuh server nodes).
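
To make that arithmetic easy to replay with your own numbers, here is a small sketch. The 300 GB/day, 180-day retention, and ~20:1 compression ratio come straight from the figures above; the replica count is an assumption (0 reproduces the numbers quoted above, and each Elasticsearch replica roughly doubles the index footprint):

    # Sketch of the storage sizing arithmetic above.
    GB_PER_DAY = 300                 # daily log volume from the thread
    RETENTION_DAYS = 180             # 6 months of retention
    WAZUH_COMPRESSION_RATIO = 20     # approximate Wazuh server compression
    ES_REPLICAS = 0                  # assumption: set to 1 if you keep one replica per shard

    es_primary_tb = GB_PER_DAY * RETENTION_DAYS / 1000           # 54.0 TB
    es_total_tb = es_primary_tb * (1 + ES_REPLICAS)              # grows with replicas
    wazuh_archives_tb = es_primary_tb / WAZUH_COMPRESSION_RATIO  # ~2.7 TB (raw data)
    wazuh_alerts_tb = wazuh_archives_tb / 2                      # ~1.35 TB (alerts only)

    print(f"Elasticsearch (all nodes combined): {es_total_tb:.1f} TB")
    print(f"Wazuh servers, raw data kept:       {wazuh_archives_tb:.2f} TB")
    print(f"Wazuh servers, alerts only:         {wazuh_alerts_tb:.2f} TB")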

Regarding:


Since Elasticsearch nodes keep the same information, adding more Elasticsearch nodes will not make the cluster more suitable for handling larger amounts of data, and it will also not prevent data loss if ES crashes. How can we minimize risk with respect to RPO/RTO?

Actually, for Elasticsearch it is better to add more nodes as the amount of stored data grows, because the indices are sharded across the nodes. In addition to that, shards and replicas help to prevent data loss in the architecture.


Additionally, Elasticsearch snapshots are available if you need point-in-time backups (see the sketch below).
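
As a rough illustration of both points, here is a Python sketch that adds one replica to the Wazuh alert indices and registers a filesystem snapshot repository through the standard Elasticsearch REST API. The endpoint, credentials, index pattern, repository name, and the /mnt/es-backups path are placeholders for your environment:

    # Sketch: add a replica to the Wazuh alert indices and register a snapshot
    # repository via the Elasticsearch REST API. Host, credentials, index pattern
    # and the /mnt/es-backups path are placeholders; adapt them to your cluster.
    import requests

    ES = "https://localhost:9200"      # placeholder Elasticsearch endpoint
    AUTH = ("elastic", "changeme")     # placeholder credentials
    VERIFY_TLS = False                 # point this at your CA bundle in production

    # 1) Keep one replica of every wazuh-alerts-* shard so a single node failure
    #    does not lose indexed data.
    requests.put(
        f"{ES}/wazuh-alerts-*/_settings",
        json={"index": {"number_of_replicas": 1}},
        auth=AUTH, verify=VERIFY_TLS,
    ).raise_for_status()

    # 2) Register a shared-filesystem snapshot repository for periodic backups
    #    (the location must be listed in path.repo on every Elasticsearch node).
    requests.put(
        f"{ES}/_snapshot/wazuh_backups",
        json={"type": "fs", "settings": {"location": "/mnt/es-backups"}},
        auth=AUTH, verify=VERIFY_TLS,
    ).raise_for_status()

    # 3) Take a snapshot of the alert indices.
    requests.put(
        f"{ES}/_snapshot/wazuh_backups/snapshot-1?wait_for_completion=true",
        json={"indices": "wazuh-alerts-*"},
        auth=AUTH, verify=VERIFY_TLS,
    ).raise_for_status()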

Reference: 

Regarding:
Finally, do you have any suggestions for building a multi-cloud solution through a service provider that would help us minimize risk with respect to MTD/MTO?


Using different environments and managing all of them from a single Kibana instance is possible with the aforementioned approach.

I hope everything here helps. Don't hesitate to contact us should you have more questions. 

You can also try our cloud service if you would rather not manage the architecture of the environment yourselves:

Regards,

Miguel Casares