Hello Fawwas,
I apologize for the delay.
It seems the issue is related to reaching the maximum shards limit. Based on that, there are two possible solutions:
- Increase the shards limit.
- Reduce the number of shards.
Option 1 will quickly solve the issue, but it is not advisable in the long run, as it can bring more problems in the future. However, this guide explains how to do it in case it is needed.
The following setting is the one responsible for this limit:
cluster.max_shards_per_node
It is possible to change this setting using the Wazuh indexer API. You can either use the Dev Tools option within the Management section of the Wazuh dashboard:
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1200
  }
}
or curl the API directly from a terminal:
curl -k -u <username>:<password> -X PUT "https://<indexer-ip>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.max_shards_per_node": 1200
  }
}
'
Querying the API from a terminal requires using credentials to authenticate. It is also necessary to specify the IP address of the Wazuh indexer service in place of the placeholders above.
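In both cases, a successful request is acknowledged with a response similar to the following:
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "1200"
    }
  },
  "transient" : { }
}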
Bear in mind that this setting imposes a hard limit on the cluster; once the new limit is reached, operations that create shards will fail again. Use it with caution.
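You can check the value currently in effect (it defaults to 1000 per data node when it has never been set explicitly) with:
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node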
However, I would recommend Option 2:
Reaching the shards limit usually means that no retention policy is applied to the environment. This could lead to data being stored forever and eventually cause the system to fail.
To reduce the number of shards, it is necessary to delete old indices. First, check which indices are stored in the environment; the following API call can help:
GET _cat/indices
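From a terminal, the equivalent call would be (the ?v parameter adds column headers to the output):
curl -k -u <username>:<password> "https://<indexer-ip>:9200/_cat/indices?v"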
Then, delete the indices that are no longer needed, starting with the oldest ones. Bear in mind that deleted data cannot be recovered unless there are backups, either in the form of snapshots or Wazuh alerts backups.
The API call to delete indices is:
DELETE <index_name>
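For example, assuming an old daily alerts index named wazuh-alerts-4.x-2024.01.01 that is no longer needed, the call from a terminal would be:
curl -k -u <username>:<password> -X DELETE "https://<indexer-ip>:9200/wazuh-alerts-4.x-2024.01.01"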
By deleting indices, you will free up shards, and the cluster will have room to continue allocating new indices.
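You can confirm that shards were released by checking the cluster's shard count afterwards:
GET _cluster/health?filter_path=active_shards,active_primary_shards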
Prevention
The next step is to prevent this from happening again. For that reason, the points below walk through the complete resolution of the issue.
Normally, this can happen in a single-node installation, as the Wazuh template is configured to use 3 shards per index by default. The first step should be to clarify and understand the architecture and the retention policy of the environment.
1. Change the number of shards according to the infrastructure. Although the optimal shard configuration per index will be addressed in a different article, a good rule of thumb is 1 shard per node. In any case, 3 shards per index should be the maximum; beyond that point, the number of shards needs to be analyzed case by case.
To change the number of shards, edit the /etc/filebeat/wazuh-template.json file.
"settings": {
"index.refresh_interval": "5s",
"index.number_of_shards": "3",
Then, it is necessary to reload the template and restart the Filebeat service:
# filebeat setup --index-management
# systemctl restart filebeat
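Note that the new value only applies to indices created after the change; existing indices keep their original shard count. You can verify that the updated template was loaded by querying it from Dev Tools (the Wazuh template is typically registered under the name wazuh):
GET _template/wazuh?filter_path=wazuh.settings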
2. Reduce the number of shards and replicas, if needed, of the rest of the indices. For instance, it is possible to reduce the shards and replicas from the Wazuh UI settings. Each case will need to be analyzed individually.
3. Set up a retention policy. This can be achieved using index policies, as explained here: https://wazuh.com/blog/wazuh-index-management/ (a minimal example is sketched below).
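As a brief illustration, here is a minimal sketch of such a policy, assuming a 90-day retention period for alerts indices (the policy name and the period are illustrative; the article above covers the details):
PUT _plugins/_ism/policies/wazuh_retention_policy
{
  "policy": {
    "description": "Delete wazuh-alerts indices older than 90 days (illustrative values)",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": { "min_index_age": "90d" }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [{ "delete": {} }],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["wazuh-alerts-*"],
      "priority": 100
    }
  }
}
The ism_template block attaches the policy automatically to new wazuh-alerts-* indices as they are created.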
I hope this information helps. Please let me know how it goes!