Hello Filip,
I apologize for the delay in replying.
Answering your questions:
1.- "That still does not clarifies to me where the limit of 1000 shards comes from. I guess it has something to do with java heap and memory limit."
To start, here is the definition from the blog I shared: "Data in Elasticsearch is organized into indices. Each index is made up of one or more shards. Each shard is an instance of a Lucene index, which you can think of as a self-contained search engine that indexes and handles queries for a subset of the data in an Elasticsearch cluster." An index can be split into several shards so that its data is distributed across nodes, which only matters when there is more than one Elasticsearch node. Since you have a single node, this is not necessary. The shard limit exists to avoid performance problems in the Elasticsearch cluster.
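To see how many shards a node currently holds, you can count the lines of a shard listing. This is only a minimal sketch: it assumes the text has the column layout of the listing quoted in this thread (index, shard, primary/replica, state, docs, size, ip, node) and no header row. The function name count_shards is hypothetical.

```python
from collections import Counter

# Sample shard listing copied from this thread.
# Assumed columns: index, shard, prirep, state, docs, store, ip, node.
SAMPLE = """\
wazuh-alerts-4.x-2021.07.07 1 p STARTED 27854 42.6mb 127.0.0.1 node-1
wazuh-alerts-4.x-2021.07.07 2 p STARTED 27706 42mb 127.0.0.1 node-1
wazuh-alerts-4.x-2021.07.07 0 p STARTED 28031 43mb 127.0.0.1 node-1
"""


def count_shards(listing_text):
    """Return (total shard count, Counter of shards per index).

    Hypothetical helper: every non-empty line is treated as one shard,
    so a header row (if present) must be removed beforehand.
    """
    per_index = Counter()
    for line in listing_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        per_index[fields[0]] += 1  # first column is the index name
    return sum(per_index.values()), per_index


total, per_index = count_shards(SAMPLE)
print(total, dict(per_index))
```

Comparing the total against the limit shows how much room is left on the node.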
2.- "modify the value from 3 (By default) to 1: ""index.number_of_shards ":"1""
-> If I understand correctly, this will change from 3 shards to one per day which will limit my total per year to 365 shards for wazuh alerts :
wazuh-alerts-4.x-2021.07.07 1 p STARTED 27854 42.6mb 127.0.0.1 node-1
wazuh-alerts-4.x-2021.07.07 2 p STARTED 27706 42mb 127.0.0.1 node-1
wazuh-alerts-4.x-2021.07.07 0 p STARTED 28031 43mb 127.0.0.1 node-1
Did I get it right ?" Yes. With this change, each daily index is created with 1 shard instead of 3, so fewer shards accumulate over time.
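The arithmetic can be sketched as follows. This is a rough estimate only: it assumes one wazuh-alerts index per day, no replica shards, no other indices on the node, and that no old indices are deleted. The function names are hypothetical, and the limit of 1000 is the one mentioned in your question.

```python
SHARD_LIMIT = 1000  # the limit discussed in this thread


def shards_after(days, shards_per_day):
    """Shards accumulated after `days` daily indices (assumes nothing is deleted)."""
    return days * shards_per_day


def days_until_limit(shards_per_day, limit=SHARD_LIMIT):
    """Number of full days of daily indices that fit within the limit."""
    return limit // shards_per_day


for shards_per_day in (3, 1):
    print(
        shards_per_day,
        shards_after(365, shards_per_day),
        days_until_limit(shards_per_day),
    )
```

Under these assumptions, 3 shards per day gives 1095 shards after a year and reaches the limit in about 333 days, while 1 shard per day gives 365 shards per year.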
3.- "Do you know the reasoning behind "index.number_of_shards": "3" as default ?" Because the default configuration is intended for a three-node Elasticsearch cluster, which is the most common setup for cluster functionality reasons.
4.- "Is it needed for huge deployments to keep the shard size smaller ? or it is required when you have more nodes ?"
The blog covers this point: the larger the size or number of shards, the longer the cluster takes to process the information, which affects performance.
Here is the relevant passage from the blog:
How does shard size affect performance?
In Elasticsearch, each query is executed in a single thread per shard. Multiple shards can however be processed in parallel, as can multiple queries and aggregations against the same shard.
This means that the minimum query latency, when no caching is involved, will depend on the data, the type of query, as well as the size of the shard. Querying lots of small shards will make the processing per shard faster, but as many more tasks need to be queued up and processed in sequence, it is not necessarily going to be faster than querying a smaller number of larger shards. Having lots of small shards can also reduce the query throughput if there are multiple concurrent queries.
I hope this information helps. Please let me know if you have any other questions!
Regards,
Alexander Bohorquez