Inquiries about wazuh clusters


杨海

Aug 23, 2019, 11:13:59 AM
to wa...@googlegroups.com
Hi 

Currently we have Wazuh agents on a number of hosts and a single server running the Wazuh manager, Elasticsearch, and Kibana. We know the Wazuh manager can be deployed as a cluster, with one master node and several worker nodes; Elasticsearch can have master and data nodes as well. We understand that clustering brings better scalability and stability. We have several questions and are looking for best practices on how the clusters can be configured and optimized.
As to the Wazuh manager cluster:
1) It's said that each worker needs an event forwarder (Filebeat for ELK) to send data. Does the master node need a forwarder as well?
2) I suppose the worker nodes can keep working if one of them fails, but what if the master node fails? Can the worker nodes elect a new master?
As to the Elasticsearch cluster:
I can configure the ES server IP as the output for each Filebeat, and I assume each Filebeat corresponds to a worker node. How many Filebeat instances (worker nodes) can one ES server accept connections from?

Regards
Hai




Jesús Sánchez de Lechina Tejada

Aug 23, 2019, 1:39:59 PM
to Wazuh mailing list
Hi Hai,

Yes, you will need a forwarder such as Filebeat on the master node. Every node in the cluster generates alerts, so every node needs to send those alerts to Elasticsearch.

The master node is in charge of agent registration and of pushing configuration to the workers. So if the master fails, workers will keep generating alerts from the previously registered agents and forwarding them to Elasticsearch, but the cluster won't be able to register new agents until the master is up again. We are currently developing a new feature in which, if the master node fails, a worker can be elected as the new master.

Elasticsearch does not define a logical limit on the number of Filebeat connections. You may add as many as needed, subject to the physical limitations of the environment (network, the Elasticsearch node's RAM, and so on). Check this link for more information about Filebeat's configuration: https://www.elastic.co/guide/en/beats/filebeat/current/load-balancing.html
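For illustration, a minimal sketch of the Filebeat output section on a Wazuh node might look like this (the Elasticsearch host names are placeholders, not actual recommended values):

```yaml
# filebeat.yml fragment on each Wazuh node -- host names are placeholders
output.elasticsearch:
  hosts: ["es-node-1:9200", "es-node-2:9200"]
  # Distribute events across the listed hosts instead of
  # always sending to the first reachable one
  loadbalance: true
```

Each Wazuh node (master or worker) runs its own Filebeat with a section like this, so the fan-in at Elasticsearch is bounded only by the hardware, as noted above.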

Best Regards,
Jesús

杨海

Aug 26, 2019, 8:29:25 PM
to jesús sánchez de lechina tejada, Wazuh mailing list
Hi Jesús

Imagine we have over 20K agents registered with the Wazuh master node and served by a cluster of worker nodes. We intend to dedicate the master node to managing agents and worker nodes; is that workable?
We also suppose an nginx load balancer is a good choice for the worker nodes to serve 20K agents efficiently and stably.
I am not sure how much data will be collected from each agent per day. What would be the desired architecture to serve 20K agents, for example, 20 worker nodes?

Regards
Hai



杨海

Aug 27, 2019, 11:39:17 PM
to 杨海, jesús sánchez de lechina tejada, Wazuh mailing list
Hi Jesús,

One more question: since each node (master or worker) has a SQLite database, is there any mechanism to prevent data loss in the database when a worker node fails?

As for the data volume from each agent, I saw somewhere a figure of 400K-700K bytes per second, so it would be infeasible for 10 worker nodes to serve 20K agents, right?

Regards
Hai
 

Jesús Sánchez de Lechina Tejada

Aug 28, 2019, 12:14:07 PM
to Wazuh mailing list
Hi Hai,

Yes, the master node is the one in charge of managing agent registration and deletion, agent grouping, and the workers' configuration (rules, decoders, and CDB list synchronization). Check it here: https://documentation.wazuh.com/current/user-manual/manager/wazuh-cluster.html#master

Nginx is known to be an appropriate load balancer for a Wazuh cluster. Take a look at this blog post for a deeper look into the topic: https://wazuh.com/blog/nginx-load-balancer-in-a-wazuh-cluster/
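As a rough sketch of the idea (worker addresses are placeholders), nginx's stream module can balance the agents' TCP connections on port 1514 across the workers:

```nginx
# /etc/nginx/nginx.conf fragment -- worker addresses are placeholders
stream {
    upstream wazuh_workers {
        # Keep each agent pinned to the same worker across reconnects
        hash $remote_addr consistent;
        server worker-1.example.com:1514;
        server worker-2.example.com:1514;
    }
    server {
        listen 1514;
        proxy_pass wazuh_workers;
    }
}
```

The agents are then pointed at the load balancer's address instead of at any individual worker; the blog post above covers the full setup.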

About the number of managers needed: it depends on the amount of traffic generated (for instance, Windows agents generate far more logs than Linux ones) and on the hardware of the managers themselves. We can show you an example of a working architecture:

Wazuh manager, Elasticsearch+Kibana and 2000 agents. Willing to retain data for three months.

• Wazuh manager node:
   - Wazuh manager
   - Filebeat
   - 4 cores, 16 GB of RAM, 200GB SSD disk

• Elasticsearch + Kibana node:
   - Elastic Stack
   - 8 cores, 32 GB of RAM minimum (64 GB max), 400-500 GB SSD disk

• Agents:
   - 2000 Agents using a load balancer as manager IP

This architecture should support 2000 agents, but there are several ways to improve the performance of this cluster: adding more manager nodes (scaling horizontally) or improving their specifications (scaling vertically: increasing the amount of RAM, adding CPUs, and using solid-state drives). One of the Wazuh cluster's advantages is that it can scale easily this way, adjusting to your needs. Please note that these figures are an approximation; the actual number of agents supported depends on factors such as the kind of agent (Linux/Windows), the events per second generated, and their configuration.

About data loss prevention: if the failure didn't affect the database, there shouldn't be a problem. To prevent data loss in case the database is affected, you can keep a backup of the database on a separate machine. Agents will keep sending information to a new manager node if the previous one fails, so some components like Syscheck might duplicate alerts when an agent connects to a new manager.

Hope this helps you out, ask again if you have any more doubts.

Regards,
Jesús




杨海

Aug 28, 2019, 8:46:43 PM
to jesús sánchez de lechina tejada, Wazuh mailing list
Hi Jesús,

About the reference configuration of one manager, one Elasticsearch+Kibana node, and 2000 agents (Linux only in our case): as you said, we could add 10 more worker nodes, 10 more Elasticsearch nodes, and a dedicated Kibana server to serve 20K agents.

About the traffic from each agent, the default leaky bucket and buffer seem to limit it to 500 events per second. I assume the reference configuration for 2000 agents is based on this traffic. Correct?

As for the SQLite backup, as you said, it may not be necessary, since the agents keep sending new events and a new database will then be populated. The only exception here might be the file hashes of the monitored folders.

Regards
Hai

Jesús Sánchez de Lechina Tejada

Aug 29, 2019, 1:07:54 PM
to Wazuh mailing list
Hi Hai,

First, we must distinguish between the Wazuh cluster and the Elasticsearch cluster. The Wazuh cluster scales horizontally, becoming able to manage more agents as more managers are added to the cluster. The architecture we provided previously should be taken as a reference, as there are many factors that may affect the eventual number of managers your system will require.

On the other hand, Elasticsearch clusters behave differently. Every node in an Elasticsearch cluster keeps the same information, so adding more Elasticsearch nodes will not make it more suitable for dealing with larger amounts of data. What is recommended instead is to have several Elasticsearch clusters. You will be able to query all of your Elasticsearch clusters using the cross-cluster search feature, and you can distribute the information across the instances in your clusters using Filebeat.
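For reference, a remote cluster can be registered for cross-cluster search with a settings update along these lines (the cluster alias and seed host are placeholders; see the cross-cluster search link below for details):

```console
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_two": {
          "seeds": ["es-cluster-two:9300"]
        }
      }
    }
  }
}
```

Searches can then address the remote cluster with the `cluster_two:index-name` syntax.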

You are right about the leaky bucket: by default it is set to 500 EPS. This amount can be modified if needed, but be aware that this may affect your network traffic. Those calculations were based on that limit rather than on the average EPS; the actual EPS does not tend to reach the limit (in fact, hitting it is a sign of flooding).
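For reference, the limit lives in the agent-side anti-flooding settings; a sketch of the relevant ossec.conf fragment on the agent (these are the documented defaults):

```xml
<!-- Agent-side ossec.conf fragment: the leaky-bucket buffer -->
<client_buffer>
  <disabled>no</disabled>
  <!-- Events queued locally while the bucket drains -->
  <queue_size>5000</queue_size>
  <!-- Throughput cap; raising it increases network traffic -->
  <events_per_second>500</events_per_second>
</client_buffer>
```

The anti-flooding documentation linked later in this thread describes how the queue and the EPS cap interact.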

The SQLite database stores information about agents, FIM data, rootcheck findings, and static core settings (you can check it in detail here: https://documentation.wazuh.com/current/user-manual/reference/daemons/wazuh-modulesd.html#wazuh-modulesd). If you want to keep that information, you can back it up. Alerts are stored on the Wazuh manager (/var/ossec/logs/alerts) and subsequently sent to Elasticsearch; if you want to keep any of those, you can back them up as well.

Here I leave some links that may be of interest:

- Cross-cluster search: https://www.elastic.co/guide/en/elasticsearch/reference/7.3/modules-cross-cluster-search.html
- Sending data to different elasticsearch hosts with Filebeat: https://www.elastic.co/guide/en/beats/filebeat/5.4/elasticsearch-output.html#hosts-option

Please keep trusting us with your inquiries.

Kind regards,
Jesús

杨海

Sep 1, 2019, 2:12:03 AM
to jesús sánchez de lechina tejada, Wazuh mailing list
Hi Jesus

Thanks for the prompt reply. That is definitely a great help.

Additionally, we need to automate the upgrade of the Wazuh software on 20K agents. How can the master node manage that elegantly?

Also, considering the large scale of the deployment, does SQLite offer the performance needed to serve 20K agents? Or are there other options that would outperform it, such as MariaDB or PostgreSQL?

We have used Kafka as a message queue before. Between the agent and the manager, is there such an option? If not, why?

Regards
Hai


杨海

Sep 2, 2019, 10:11:45 AM
to 杨海, jesús sánchez de lechina tejada, Wazuh mailing list
Hi Jesús,

I'd like to explain a little. Say we upgrade 20K agents; we should avoid doing them all at the same time, which could lead to a network storm.

Here I tried to figure out how much storage the ES data nodes need to hold alerts and events for 30 days from 20K agents.
30 * 24 * 3,600 * 500 * 20,000 is about 26 trillion events, based on the assumption of 500 EPS per agent.
Obviously the calculation is misleading, as not all data from the agents is transferred to ES: some is stored in SQLite, and some is digested by analysisd.
So what would be a realistic estimate, so that we can plan ES shards and replicas?
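The worst-case arithmetic above can be written out as follows; this is an upper bound only, since agents rarely sustain the 500 EPS buffer cap:

```python
# Worst-case event volume: 30 days, 20,000 agents, each at the 500 EPS cap
seconds_per_month = 30 * 24 * 3600           # 2,592,000 seconds
total_events = seconds_per_month * 500 * 20_000
print(total_events)                          # 25,920,000,000,000 (~26 trillion)
```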

Regards
Hai


Jesús Sánchez de Lechina Tejada

Sep 9, 2019, 2:17:39 PM
to Wazuh mailing list
Hi Hai,

Sorry we could not reply earlier. Here I will address your recent inquiries:

You can upgrade an agent from the command line with /var/ossec/bin/agent_upgrade on your manager; check the -h option for a further explanation. Using it in a script should allow you to automate the process, but be aware that every agent restarted after the upgrade will increase the traffic on your network, so you should regulate the throughput of the updates (e.g. a sleep after every 15 agents upgraded). You could also use Kibana for this, thanks to the Wazuh API, but for such a large number it is not recommended, as agents can only be upgraded one at a time there.
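The throttling idea above can be sketched as a small driver script. This is a hypothetical example, not a Wazuh tool: the batch size, pause, and agent ID list are assumptions, and the injectable `runner` exists only so the logic can be exercised without the real binary:

```python
"""Throttled agent upgrades: a hypothetical sketch, not an official tool."""
import subprocess
import time


def upgrade_in_batches(agent_ids, batch_size=15, pause_seconds=60, runner=None):
    """Run one upgrade per agent, pausing between batches to limit
    the post-upgrade reconnection traffic.

    `runner` defaults to invoking /var/ossec/bin/agent_upgrade -a <id>;
    a stub can be injected for testing.
    """
    if runner is None:
        def runner(agent_id):
            subprocess.run(
                ["/var/ossec/bin/agent_upgrade", "-a", agent_id], check=False)
    upgraded = []
    for i, agent_id in enumerate(agent_ids, start=1):
        runner(agent_id)
        upgraded.append(agent_id)
        # Sleep after every full batch, except after the final agent
        if i % batch_size == 0 and i < len(agent_ids):
            time.sleep(pause_seconds)
    return upgraded
```

Tuning `batch_size` and `pause_seconds` to your network is the whole point; the values here are placeholders.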

SQLite has been proven to work in high-scale deployments with Wazuh. Issues related to the size of the deployment tend to concern Elastic Stack storage and server specifications rather than Wazuh's SQLite database.

The message queue between the manager and its agents is the one native to Wazuh-OSSEC; it has some convenient features such as the anti-flooding system (https://documentation.wazuh.com/3.9/user-manual/capabilities/antiflooding.html). Therefore, we do not recommend other tools, as they have not been tested and integrated with Wazuh-OSSEC. It is possible that the producer-consumer architecture of Kafka would result in a manager overload, with unpredictable results. If you wish, you could use Filebeat on your Wazuh manager instance(s) to send the events to Kafka (https://www.elastic.co/guide/en/beats/packetbeat/5.1/kafka-output.html), but using Filebeat with Elasticsearch is the most common configuration.
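For illustration, the Filebeat-to-Kafka option mentioned above is a sketch along these lines (broker address and topic name are placeholders; see the linked Kafka output documentation for the full option list):

```yaml
# filebeat.yml fragment on a Wazuh manager -- broker and topic are placeholders
output.kafka:
  hosts: ["kafka-broker-1:9092"]
  topic: "wazuh-alerts"
  # Wait for the partition leader's ack before considering an event delivered
  required_acks: 1
```

Note that a Filebeat instance ships to a single output at a time, so this would replace the Elasticsearch output rather than run alongside it.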

By default, an Elasticsearch index uses 3 shards and no replicas. The amount of data a shard can handle safely is about 30 GB or less; a higher amount may lead to faulty behaviour. Wazuh creates an index every day, so you can use this information to adapt the number of shards per index to your requirements. As we mentioned before, this is hard to estimate, since two systems with the same number of agents may differ widely. A healthy cluster should stay under that upper bound, so it may be a good idea to prepare the system for the worst possible case.

I hope this helps with your planning.

Regards,
Jesús


杨海

Sep 10, 2019, 2:27:00 AM
to Jesús Sánchez de Lechina Tejada, Wazuh mailing list
Thanks, Jesús. This has been a really big help. Regarding the agent upgrade for 20K hosts, I will submit a new ticket for further discussion.

Regards
Hai
 

Jesús Sánchez de Lechina Tejada

Sep 10, 2019, 10:14:42 AM
to Wazuh mailing list
Hi,

It has been a pleasure to help you. I hope you can take advantage of these recommendations. Best wishes for your upgrade.

Feel free to ask at any time if you have any doubts.

Best regards,
Jesús


