Wazuh Index management - AWS S3


Eric

Aug 19, 2021, 12:08:30 PM
to Wazuh mailing list
Hello Wazuh Experts, 

I have a few questions about index backup management on Open Distro. Could you explain the following points in more detail?

We are running a Wazuh master and an Elasticsearch cluster on Ubuntu 20.04 LTS. Since the data grows quickly and is retained for a long time, we are facing slow response times when retrieving data, and storage is both costly and limited, so AWS S3 looked like the best solution to implement.

I have configured all of the above successfully, but I am a bit confused by this guide.


1/ I have created a policy and a backup to AWS S3, and restored an index successfully.

[screenshot: es.png]

2/ Now, back to the question that I am unable to answer myself.

I am facing the error: [1000]/[1000] maximum shards open

Prod env: we are planning to triple the number of agents very soon, reaching 1200 agents within six months.
Wazuh 4.1.5 (Wazuh cluster: 1 master node + 1 worker node) + Open Distro for Elasticsearch cluster (1 master node + 1 data node)
400 agents + 20 log files from routers
RAM used: 16/24 GB
CPU load average: low
HDD used: 700 GB / 1 TB

3/ As far as I understand, we can fix this issue by adding more nodes to the Elasticsearch cluster and increasing the maximum shards per node. Is there a formula or calculator to estimate the hardware requirements and the number of data nodes, and to raise the shard limit per node safely with business requirements in mind? If not, could you please suggest a maximum number of shards per node and how many data nodes I should run?

4/ Is there a solution or command/query to monitor shards per node in the ES cluster? I am considering Grafana + Prometheus, but I am not sure whether this can be done with a few simple commands; it would be nice if you could point me in the right direction.

5/ How can I query GET snapshot and grep for an index in the Dev Console? I am looking for a way to do that.

[screenshot: s3.png]


I'm looking forward to hearing from you soon!

Regards,


mayte...@wazuh.com

Aug 20, 2021, 3:52:06 AM
to Wazuh mailing list
Hi!

Unfortunately, there is no single perfect formula for estimating the number of shards per node. However, a few guidelines can help:
  • Aim for an average shard size between a few GB and a few tens of GB in order to reduce overhead.
  • Avoid very large shards, as they can hurt the cluster's ability to recover from failure; a shard size of 50 GB is often quoted as a practical limit.
  • Keep the number of shards per node below 20 per GB of heap; the number of shards a node can hold is proportional to the amount of heap available.
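As a rough sketch of the last rule, the 20-shards-per-GB-of-heap ceiling can be computed for a node (the 8 GB heap below is an assumed example value, not taken from your setup):

```shell
#!/bin/sh
# Rule of thumb: stay below 20 shards per GB of JVM heap on a data node.
# heap_gb is an assumed example value; substitute your node's actual heap.
heap_gb=8
shard_ceiling=$((heap_gb * 20))
echo "Soft shard limit for a node with ${heap_gb} GB heap: ${shard_ceiling}"
# → Soft shard limit for a node with 8 GB heap: 160
```

Comparing this ceiling against the live shard count per node tells you roughly how many data nodes you need as the agent count grows.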

I would keep shards between 20 GB and 40 GB in size. If your shards are smaller than this, applying this rule will also help you reduce the number of open shards.


You can use the following query to monitor the shards per node: curl -k -u <user>:<pass> "https://localhost:9200/_cat/allocation?pretty&v&h=shards,node&s=node"
(GET _cat/allocation?pretty&v&h=shards,node&s=node on the DevConsole)
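As a stopgap while you add data nodes, the per-node shard limit can also be raised through the cluster settings API. This is only a sketch: the value 2000 is an arbitrary example, and raising the limit without adding heap just postpones the problem.

```shell
# Raise the cluster-wide shard limit from the default of 1000 per data node.
# The value 2000 is an example; prefer adding nodes or shrinking indices.
curl -k -u <user>:<pass> -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "cluster.max_shards_per_node": 2000 } }'
```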

Regarding your question about GET snapshot to grep an index in the Dev Console, I am not aware of any query parameter that allows filtering by index name (I have checked the GET snapshot API and the cat snapshots API).
Perhaps it can be achieved in some other way; the Elastic team might be able to advise you further on this subject.
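One client-side workaround (a sketch; "s3-backup" is a hypothetical repository name and the credentials are placeholders): fetch all snapshots and filter the listed indices with grep.

```shell
# GET _snapshot has no index filter, so pull everything and grep the output.
# "s3-backup" is a placeholder repository name; adjust credentials as needed.
curl -sk -u <user>:<pass> \
  "https://localhost:9200/_snapshot/s3-backup/_all?pretty" \
  | grep '"wazuh-alerts'
```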

I hope this helps.

Best regards,
Mayte Ariza.

Eric

Aug 20, 2021, 4:19:11 AM
to mayte...@wazuh.com, Wazuh mailing list

Thank you for the information on this case; I have gone through all of your answers.

Your insights and summary are very helpful.

 

Best regards,

 


