Retention policies not applied anymore


Franck Ehret

Jun 13, 2024, 3:18:39 AM
to Wazuh | Mailing List
Hi there,

This morning, I found a crashed Wazuh 😁
I tried to restart the Dashboard service, but it told me the maximum number of shards had been reached (which is strange, it had been set to 1500 for a while).

{"type":"log","@timestamp":"2024-06-13T05:39:32Z","tags":["error","opensearch","data"],"pid":5442,"message":"[validation_exception]: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1499]/[1500] maximum shards open;"}

I checked the cluster health and active shards and noticed that I had a lot of shards from 2022 indices.
I temporarily increased the max shards setting to regain access to the GUI, then deleted all the 2022 indices.
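For reference, I did that roughly like this (the limit and index pattern below are examples, not my exact values; wildcard deletes may also require action.destructive_requires_name to be disabled):

PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}

DELETE /wazuh-alerts-4.x-2022*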

Now I'm back to a normal number of shards and have lowered the max back to the initial 1000, but I still have a problem: the retention policies don't seem to have been applied for quite a while.

I have 119 policy-managed indices out of 690 indices in total. Apparently, the policies stopped applying on April 1, 2023 (good joke!).
I can't relate this to any crash.

Here is one of my policies (I have a similar one for each kind of index):

{
    "id": "xxxxx_statistics_retention",
    "seqNo": 1595,
    "primaryTerm": 6,
    "policy": {
        "policy_id": "xxxxx_statistics_retention",
        "description": "Wazuh index state management for OpenDistro to move indices into a cold state after 3 months and delete them after a year.",
        "last_updated_time": 1656601320673,
        "schema_version": 1,
        "error_notification": null,
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "replica_count": {
                            "number_of_replicas": 0
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "cold",
                        "conditions": {
                            "min_index_age": "92d"
                        }
                    }
                ]
            },
            {
                "name": "cold",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "read_only": {}
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "366d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": [
                    "wazuh-statistics*"
                ],
                "priority": 100,
                "last_updated_time": 1656229281151
            }
        ]
    }
}

Where can I start looking?
Thanks in advance for your help!

PS: my system is back in business, but it would be better to fix this, no? 😊


John E

Jun 13, 2024, 6:32:53 AM
to Wazuh | Mailing List
Hi Franck,
I would vote for understanding the root cause first; then deciding whether to fix it is up to you 😊.

First, we need to ensure the cluster has enough resources (CPU, memory, disk space) to handle ISM tasks. Overloaded resources can prevent ISM policies from executing properly.
It would also be great to check the overall health of your cluster. You can do that with:

GET _cluster/health
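To get a quick view of resource usage per node, something along these lines should work (these are standard _cat/nodes columns):

GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,disk.used_percent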

Let me know what you find.

Regards.

Franck Ehret

Jun 13, 2024, 6:55:25 AM
to Wazuh | Mailing List
Hi there,

So I have 3 VMs: one for the dashboard, one for the manager, and one for the indexer.
The indexer has 16 GB of memory, 4 CPUs, and more than 100 GB of free disk, so hopefully that's not the issue.
I saw a big CPU peak earlier this morning, but that was probably the upgrade to 4.8 (that's when I lost the service); usually the CPU is pretty calm (it's my own private infra/lab).

This is the result of GET _cluster/health:
{
  "cluster_name" : "wazuh-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 721,
  "active_shards" : 721,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Now I have 721 shards, when I had 1499 this morning (the limit was 1500, which is why the dashboard didn't restart).
I put all the policies in place to keep "only" one year of data and avoid those crashes 😊

If you need any log, let me know which ones. Thanks in advance 😉
Franck

John E

Jun 13, 2024, 7:59:57 AM
to Wazuh | Mailing List
Hello Franck,

Those are solid specs, so now I'm convinced resources aren't the problem.

Now we can take a look at the OpenSearch logs, especially around April 1, 2023:

grep "2023-04-01" opensearch.log | grep -i "error"
grep "2023-04-01" opensearch.log | grep -i "warning"
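Since the policies are managed by ISM, it may also be worth asking ISM itself about the managed indices; something along these lines (the index pattern is just an example, adapt it to yours):

GET _plugins/_ism/explain/wazuh-statistics*

This should report the current state, action, and any failure information for each managed index.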

Regards

Franck Ehret

Jun 13, 2024, 8:23:01 AM
to Wazuh | Mailing List
Hi,

Sorry, I couldn't spot the opensearch.log file on my system; could you quickly tell me where it should be?

Thanks 😉

John E

Jun 13, 2024, 2:39:07 PM
to Wazuh | Mailing List
Hello Franck,

It simply means you don't have it as a separate component.
I will escalate this to the dashboard team and come back with their response.

Regards.

John E

Jun 18, 2024, 1:00:06 PM
to Wazuh | Mailing List
Hello Franck,

So sorry for the late reply; I had issues with my computer.
I tried testing your retention policy, but it's plagued with errors, so I would suggest recreating it.
Alternatively, you can achieve a similar result for the Wazuh log files using cron jobs:

# crontab -e
0 0 * * * find /var/ossec/logs/alerts/ -type f -mtime +365 -exec rm -f {} \;
0 0 * * * find /var/ossec/logs/archives/ -type f -mtime +365 -exec rm -f {} \;

Regards.

Franck Ehret

Jun 20, 2024, 5:00:29 AM
to Wazuh | Mailing List
Hello,

I've hit an issue: I can't remove the policies.
When I try to delete a policy, I get this message (which makes sense if the policy is still assigned):

Failed to delete the policy, [cluster_block_exception] index [.opendistro-ism-config] blocked by: [FORBIDDEN/8/index write (api)];

And if I try to remove the policy from the indices, I get this:

[index_management_exception] Failed to clean metadata for remove policy indices.

How should I proceed? Thanks in advance.

Kind regards
Franck

John E

Jun 20, 2024, 6:55:36 AM
to Wazuh | Mailing List
Hello Franck,

There are several possible reasons for this, such as permission issues or a locked index.
My guess is the latter, so to unlock the index you can follow the steps below.

[attached screenshots: dev-tools.png, setting.png]
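In case the screenshots don't come through, the gist is a Dev Tools request along these lines (applied to all indices here as an example; target a specific index if you know which one is blocked):

PUT _all/_settings
{
  "index.blocks.write": null,
  "index.blocks.read_only_allow_delete": null
}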

Regards.

Franck Ehret

Jun 21, 2024, 8:11:12 AM
to Wazuh | Mailing List
Hello John,

Even after running the command (I got the same result as yours), I still get the following when I try to remove the policy from the indices:
[index_management_exception] Failed to clean metadata for remove policy indices.

If I try to remove the policy itself, I get this:
Could not delete the policy "xxx_statistics_retention" : [cluster_block_exception] index [.opendistro-ism-config] blocked by: [FORBIDDEN/8/index write (api)];

And if I try to create a new policy from scratch (using the same values), I get this:
Failed to create policy: [cluster_block_exception] index [.opendistro-ism-config] blocked by: [FORBIDDEN/8/index write (api)];

Anything else I can try? Thanks in advance.
PS: I'm using the admin user.

Kind regards
Franck

John E

Jun 24, 2024, 8:54:15 AM
to Wazuh | Mailing List
Hello Franck,

I was able to find a similar issue being discussed here that I think might be helpful.

Regards.

Franck Ehret

Jun 24, 2024, 9:09:44 AM
to Wazuh | Mailing List
Hi John,

That might be the issue, because my disk was almost full since the policies stopped working. :-)
I increased the partition on my VM and deleted some indices in the meantime.

Can you help me spot the ones that are locked?
(Is there a command to list them?)

Thanks and kind regards
Franck

John E

Jun 25, 2024, 6:58:43 AM
to Wazuh | Mailing List
Hello Franck,

Below are some OpenSearch commands to help troubleshoot.

Get all indices and their settings (block status is included):
GET _all/_settings

Or get the block status directly:
GET _all/_settings/index.blocks.write?pretty

Regards.

Franck Ehret

Jul 19, 2024, 9:26:16 AM
to Wazuh | Mailing List
Hello,

The second command showed me one locked index: .opendistro-job-scheduler-lock
After running the following command, I could confirm that none were locked anymore:

PUT /.opendistro-job-scheduler-lock/_settings
{
  "index.blocks.read_only_allow_delete": null,
  "index.blocks.write": null
}


But unfortunately, this didn't change the result. Trying to create a very simple policy failed the same way, and the same goes for removing a policy from an existing index.
Any clue what to do next?

Thanks in advance
Franck

Franck Ehret

Jul 19, 2024, 10:03:14 AM
to Wazuh | Mailing List
PS: I tried deleting all the indices still managed by policies. No problem deleting them...
Normally I keep a year of data, but in this case I wanted to see if it would solve the issue.

Unfortunately, it didn't solve anything: I still can't remove a policy or create a new one.
The error messages are the same as before:

Create:
Failed to create policy: [cluster_block_exception] index [.opendistro-ism-config] blocked by: [FORBIDDEN/8/index write (api)];

Delete:

Could not delete the policy "xxx_statistics_retention" : [cluster_block_exception] index [.opendistro-ism-config] blocked by: [FORBIDDEN/8/index write (api)];

Thanks 🙏 

Franck Ehret

Jul 19, 2024, 12:23:53 PM
to Wazuh | Mailing List
Well, I had the answer written in the error message. I ran the same command against the .opendistro-ism-config index and it solved the issue: my policies started applying again without any further action (so they were fine from the beginning, I guess).

PUT /.opendistro-ism-config/_settings
{
  "index.blocks.read_only_allow_delete": null,
  "index.blocks.write": null
}


The weird thing is that the command used to display blocked indices did not list this one, so if you have an answer...
This issue is closed. Thanks for the help, you put me on the right track to solve it myself! 😉

K.R.
Franck