Wazuh Cluster Indexer


Nepolean

May 22, 2023, 8:09:01 AM
to Wazuh mailing list
Hi All,

I have a small question about Wazuh cluster setup. Let's say I have two indexers (one master and one worker) in my cluster.

1. Will these two indexers be mirrored or not?
2. How do the two indexer nodes work at a given point in time? Will they work at the same time?

Thanks
Nepolean

Marcos Darío Buslaiman

May 22, 2023, 9:49:09 AM
to Wazuh mailing list

Hi
Thanks for using Wazuh!

First of all, I would like to briefly comment on what Wazuh's architecture is like.
The Wazuh agents running on endpoints send events to the Wazuh manager, which analyzes them, applies rules, and triggers alerts, among other things. Filebeat then forwards these events to the Wazuh indexer, which stores them.
The Wazuh indexer cluster is a collection of one or more nodes that communicate with each other to perform read and write operations on indices.
https://documentation.wazuh.com/current/getting-started/architecture.html
https://documentation.wazuh.com/current/getting-started/components/wazuh-indexer.html

[Attached image: deployment-architecture1.png]

Regarding your first question, the indexes that store the information can be split into multiple segments called shards.
Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster. 
The splitting is important for two main reasons:

* Horizontal scaling.
* Distribute and parallelize operations across shards, increasing the performance and throughput.

In addition, you can make one or more copies of the index shards in what are called replica shards, or replicas for short. 
Replication is important for two main reasons:

* It provides high availability in case a shard or node fails.
* It allows search volume and throughput to scale since searches can be executed on all replicas in parallel.

Here it is important to mention that:
The number of shards and replicas can be defined per index at the time of its creation. Once the index is created, the number of replicas can still be changed dynamically, whereas the number of shards cannot be changed afterward.

How many shards should an index have?
As it is not possible to "reshard" (change the number of shards) without reindexing, careful consideration should be given to how many shards will be needed before creating the first index. The number of nodes in the installation will influence the number of shards to plan for. In general, the best performance is achieved by using the same number of shards as nodes. Thus, a cluster with three nodes should have three shards, while a cluster with one node would only need one shard.
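
As a rough sketch of fixing the shard count at creation time, the helper below builds a settings body and sends it with the create-index API. The `INDEXER_URL` variable, the function names, and the index name used in the usage note are illustrative assumptions, not something from a real deployment. A default Wazuh indexer installation usually listens on HTTPS with authentication, so you would likely need to add credentials to the curl call.

```shell
# Base URL of the indexer; an assumption, adjust it to your environment.
INDEXER_URL="${INDEXER_URL:-http://localhost:9200}"

# Print a JSON settings body with the given primary shard and replica counts.
index_settings_body() {
  local shards="$1" replicas="$2"
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}\n' \
    "$shards" "$replicas"
}

# Create an index named $1 with $2 primary shards and $3 replicas.
create_index() {
  local name="$1" shards="$2" replicas="$3"
  curl -X PUT "${INDEXER_URL}/${name}?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(index_settings_body "$shards" "$replicas")"
}
```

For a three-node cluster, `create_index my-test-index 3 1` would request three primary shards with one replica each (`my-test-index` is a hypothetical name). The daily `wazuh-alerts-*` indices are typically created from an index template, so for those the shard count would be set in the template rather than with a manual call like this one.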

How many replicas should an index have?
Here is an example of how a cluster with three nodes and three shards could be set up:

No replica: Each node has one shard. If a node goes down, an incomplete index of two shards will remain.

One replica: Each node has one shard and one replica. If a node goes down, a full index will remain.

Two replicas: Each node has one shard and two replicas (the full index). With this setup, the cluster can continue to operate even if two nodes go down. Although this seems to be the best solution, it increases the storage requirements.
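
The arithmetic behind these three layouts can be sketched as follows. The function names are made up for illustration, and the per-node figure assumes the shard copies are spread evenly across the nodes.

```shell
# Total shard copies stored in the cluster: primaries * (1 + replicas).
total_shard_copies() {
  local primaries="$1" replicas="$2"
  echo $(( primaries * (1 + replicas) ))
}

# Shard copies held by each node, assuming an even spread across nodes.
copies_per_node() {
  local primaries="$1" replicas="$2" nodes="$3"
  echo $(( primaries * (1 + replicas) / nodes ))
}

# Print the three layouts for a cluster with 3 nodes and 3 primary shards.
for replicas in 0 1 2; do
  echo "replicas=$replicas total=$(total_shard_copies 3 "$replicas") per_node=$(copies_per_node 3 "$replicas" 3)"
done
```

With two replicas each node ends up holding three shard copies, which is the full index, and that is also why the storage requirement triples compared with the no-replica case.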

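So, to have the indices mirrored across your two nodes, you will need to set the number of replicas to 1.

Changing the number of replicas
The number of replicas can be changed dynamically using the Wazuh indexer API (the same settings API as in Elasticsearch and OpenSearch). In a cluster with one node, the number of replicas should be set to zero, as in the following command; for a two-node cluster like yours, use 1 instead: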

curl -X PUT "http://localhost:9200/wazuh-alerts-*/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_replicas" : 0
  }
}'
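
For the two-node case from the question, a minimal sketch of the same call with one replica, plus a way to check where the copies ended up, could look like this. The `INDEXER_URL` variable and the function names are assumptions for illustration, and a secured installation would likely also need HTTPS and credentials on each curl call.

```shell
# Base URL of the indexer; an assumption, adjust it to your environment.
INDEXER_URL="${INDEXER_URL:-http://localhost:9200}"

# Set the replica count ($2) on every index matching the pattern ($1).
set_replicas() {
  local pattern="$1" replicas="$2"
  curl -X PUT "${INDEXER_URL}/${pattern}/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"settings\":{\"number_of_replicas\":${replicas}}}"
}

# List each shard copy with its type (p = primary, r = replica), state and node.
list_shards() {
  local pattern="$1"
  curl "${INDEXER_URL}/_cat/shards/${pattern}?v"
}
```

Running `set_replicas 'wazuh-alerts-*' 1` and then `list_shards 'wazuh-alerts-*'` should show each primary on one node and its replica on the other once allocation has finished. Note that this only affects existing indices; indices created later take their replica count from the index template.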


Regarding your second question, the indexer nodes can be configured with different roles, which you can define yourself. By default they work in parallel, and each node can take on both of the following roles:

* Cluster manager (master): manages the overall operation of the cluster and keeps track of the cluster state. This includes creating and deleting indexes and allocating shards to nodes.
* Data: stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards.
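
To see that both nodes are active at the same time, you could query the cluster itself, for example with the sketch below. The `INDEXER_URL` variable and the function names are assumptions for illustration; the two endpoints are the standard `_cat/nodes` and `_cluster/health` APIs.

```shell
# Base URL of the indexer; an assumption, adjust it to your environment.
INDEXER_URL="${INDEXER_URL:-http://localhost:9200}"

# List the nodes in the cluster, one row per node, with a header row.
list_nodes() {
  curl "${INDEXER_URL}/_cat/nodes?v"
}

# Show the overall cluster status together with node and shard counts.
cluster_health() {
  curl "${INDEXER_URL}/_cluster/health?pretty"
}
```

`list_nodes` should print one row per indexer node, including its roles, and `cluster_health` reports fields such as `number_of_nodes`, `active_shards` and `unassigned_shards`.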

Here you will have more information about this:
 https://opensearch.org/docs/latest/tuning-your-cluster/index/

Please let us know if you have any doubts or comments.

Best regards,
Marcos 

Nepolean

May 23, 2023, 8:12:48 AM
to Wazuh mailing list
Thank you, Marcos, for a very detailed answer. I got some idea about the setup and would like to know more. You mentioned sharding: does it happen automatically as more and more data comes in, or is the number of shards fixed? From your answer it is fixed by us, but I have seen the shard count grow up to 1000 in a single-node Wazuh indexer. What is that 1000 limit? Were you saying that I can reduce the number of shards from 1000 to 1?

Thanks
Nepolean

Marcos Darío Buslaiman

May 30, 2023, 11:34:18 AM
to Wazuh mailing list
Hi Nepolean,
Sorry for the delay. Regarding your questions: the limit on the number of shards is enforced by the indexer (it is inherited from Elasticsearch) in order to prevent the server from failing. By default, the limit is 1000 shards per node, which means an infrastructure with 1 node can handle 1000 shards, while 3 nodes can handle up to 3000 shards. It is a cap on the total number of shards across all indices, not the number of shards of a single index.
This limit can be reached, for example, when you have 2 shards per index and 500 indices. The recommendation to avoid reaching it is therefore to configure index retention policies; you will find more information about this here: https://wazuh.com/blog/wazuh-index-management/
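
The numbers above can be checked with a small sketch like this one. The function names are made up for illustration, and the calculation assumes that replica copies count toward the limit in the same way as primary shards.

```shell
# Cluster-wide shard limit: per-node limit (default 1000) times the node count.
cluster_shard_limit() {
  local nodes="$1" per_node="${2:-1000}"
  echo $(( nodes * per_node ))
}

# Shards consumed by a set of indices: indices * primaries * (1 + replicas).
shards_used() {
  local indices="$1" primaries="$2" replicas="$3"
  echo $(( indices * primaries * (1 + replicas) ))
}

echo "limit with 1 node:  $(cluster_shard_limit 1)"
echo "limit with 3 nodes: $(cluster_shard_limit 3)"
echo "500 indices x 2 shards, no replicas: $(shards_used 500 2 0)"
```

Since Wazuh creates new indices every day, the number of indices keeps growing until old ones are deleted, which is why a retention policy keeps the total below the limit.
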
Also, I will recommend these articles about these topics:
Please, let me know if you have any other questions.
Regards