No new data for almost a week


Franck Ehret

Jun 25, 2022, 6:12:41 PM
to Wazuh mailing list
Hi there,

Please help as soon as you can: I have had no data in my system for a few days now (it had been working well for months):

[Screenshot: Capture d’écran 2022-06-26 à 00.09.20.png]

The problem might be due to a reboot or to the index policies (see below).

My system has 3 servers, as follows:
- srv01 - Frontend
- srv03 - Wazuh Manager
- srv05 - Elasticsearch (Open Distro)

I know alerts are being sent to the manager: I've checked the logs and the agents are active.

I started looking on the Elasticsearch side and found the following:

I have unassigned shards:
GET _cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 810,
  "active_shards" : 810,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 189,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 81.08108108108108
}


It's probably due to my policies, which I based on the sample. I have (or rather had; I have since corrected it to 0):
"name": "hot", "actions": [ { "replica_count": { "number_of_replicas": 0 } }

Something else I did wrong: I assigned the alerts policy to other indices. That probably didn't help, and I see duplicates when I run:

GET _cat/shards?h=index,shard,prirep,state,unassigned.reason,node

I get the following (extract):
security-auditlog-2022.02.07                            0 p STARTED
security-auditlog-2022.02.07                            0 r UNASSIGNED
security-auditlog-2022.06.17                            0 p STARTED
security-auditlog-2022.06.17                            0 r UNASSIGNED
security-auditlog-2022.01.10                            0 p STARTED
security-auditlog-2022.01.10                            0 r UNASSIGNED
security-auditlog-2022.06.09                            0 p STARTED
security-auditlog-2022.06.09                            0 r UNASSIGNED
security-auditlog-2022.03.20                            0 p STARTED
security-auditlog-2022.03.20                            0 r UNASSIGNED
wazuh-alerts-4.x-2022.04.06                             2 p STARTED
wazuh-alerts-4.x-2022.04.06                             1 p STARTED
wazuh-alerts-4.x-2022.04.06                             0 p STARTED
wazuh-alerts-4.x-2022.03.13                             2 p STARTED
wazuh-alerts-4.x-2022.03.13                             1 p STARTED
wazuh-alerts-4.x-2022.03.13                             0 p STARTED
security-auditlog-2022.03.17                            0 p STARTED
security-auditlog-2022.03.17                            0 r UNASSIGNED
wazuh-alerts-4.x-2022.01.28                             2 p STARTED
wazuh-alerts-4.x-2022.01.28                             1 p STARTED
wazuh-alerts-4.x-2022.01.28                             0 p STARTED
wazuh-alerts-4.x-2022.03.04                             2 p STARTED
wazuh-alerts-4.x-2022.03.04                             1 p STARTED
wazuh-alerts-4.x-2022.03.04                             0 p STARTED
wazuh-monitoring-2021.52w                               0 p STARTED
security-auditlog-2022.04.11                            0 p STARTED
security-auditlog-2022.04.11                            0 r UNASSIGNED
security-auditlog-2022.03.15                            0 p STARTED
security-auditlog-2022.03.15                            0 r UNASSIGNED
security-auditlog-2022.04.21                            0 p STARTED
security-auditlog-2022.04.21                            0 r UNASSIGNED 


So now, how can I get rid of those and solve the problem?

Thanks in advance. I've already spent a few hours gathering this information, but I don't know what the next steps are.

Kind regards

Franck Ehret

Jun 26, 2022, 9:29:32 AM
to Wazuh mailing list
An update: at some point I managed to get it back in business.
If there is a way to retrieve the missing data, let me know.

To solve it (I hope this is the right way), I set up 3 policies for the 3 "main" index patterns wazuh-alerts*, security-auditlog* and wazuh-statistics*, all 3 based on this one (only the index pattern changes):

{
    "policy": {
        "description": "Wazuh index state management for OpenDistro to move indices into a cold state after 2 months and delete them after a year.",
        "default_state": "hot",
        "states": [
            {

                "name": "hot",
                "actions": [
                    {
                        "replica_count": {
                            "number_of_replicas": 1
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "cold",
                        "conditions": {
                            "min_index_age": "61d"
                        }
                    }
                ]
            },
            {
                "name": "cold",
                "actions": [
                    {
                        "read_only": {}
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "366d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": {
            "index_patterns": ["wazuh-alerts*"],
            "priority": 100
        }
    }
}


First: I would like your opinion. Are those policies "OK" for all 3 indices? I'm open to all suggestions (except on retention time).

Second: I still have some unassigned shards (9), and when I look at the list, almost all the .opendistro-* indices have an unassigned replica:

.opendistro-reports-definitions                         0 p STARTED
[...]
.opendistro-ism-managed-index-history-2022.06.16-000003 0 p STARTED
.opendistro-ism-managed-index-history-2022.06.16-000003 0 r UNASSIGNED
[...]
.opendistro-ism-managed-index-history-2022.06.19-000006 0 p STARTED
.opendistro-ism-managed-index-history-2022.06.19-000006 0 r UNASSIGNED
[...]
.opendistro-ism-config                                  0 p STARTED
.opendistro-ism-config                                  0 r UNASSIGNED
[...]
.opendistro-ism-managed-index-history-2022.06.17-000004 0 p STARTED
.opendistro-ism-managed-index-history-2022.06.17-000004 0 r UNASSIGNED
[...]
.opendistro-ism-managed-index-history-2022.06.18-000005 0 p STARTED
.opendistro-ism-managed-index-history-2022.06.18-000005 0 r UNASSIGNED
[...]
.opendistro-ism-managed-index-history-2022.06.26-000007 0 p STARTED
.opendistro-ism-managed-index-history-2022.06.26-000007 0 r UNASSIGNED
[...]
.opendistro-job-scheduler-lock                          0 p STARTED
.opendistro-job-scheduler-lock                          0 r UNASSIGNED
[...]
.opendistro-ism-managed-index-history-2022.06.15-000002 0 p STARTED
.opendistro-ism-managed-index-history-2022.06.15-000002 0 r UNASSIGNED
[...]
.opendistro-ism-managed-index-history-2022.06.14-1      0 p STARTED
.opendistro-ism-managed-index-history-2022.06.14-1      0 r UNASSIGNED

How can I "fix" that? If there is something to fix, of course...

Thanks in advance & kind regards
Franck

Franck Ehret

Jun 28, 2022, 7:31:20 AM
to Wazuh mailing list
Hi there,

I think this topic was left behind. I'd like some advice about the index policies (2nd message) and a cross-check that I did everything right.

Thx in advance! 😉

Kind regards
Franck

Juan Carlos Tello

Jul 4, 2022, 7:08:16 AM
to Franck Ehret, Wazuh mailing list
Hello Franck,
Sorry we missed your message last week.
From what I understand, you only have one machine running Elasticsearch. In that case no replicas should be configured, because a replica cannot be allocated to the same node as its primary and there is no other node to hold it. You may send a request that sets the number of replicas to 0 on those indices; note, however, that this will not change how your environment operates, since either way no replicas exist.
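
For reference, such a request could be prepared roughly as follows. This is only a sketch: the helper name build_replica_request, the https://localhost:9200 base URL and the security-auditlog-* pattern are assumptions chosen for illustration, and the request is built but not sent, so authentication and TLS handling are left out.

```python
import json
import urllib.request


def build_replica_request(base_url, index_pattern, replicas=0):
    """Build (but do not send) a PUT <index_pattern>/_settings request.

    base_url and index_pattern are illustrative values, not taken from
    the cluster discussed in this thread.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_replica_request("https://localhost:9200", "security-auditlog-*")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The same method, path and body can be pasted into Dev Tools as a PUT request. Whether the .opendistro-* system indices accept this change depends on the security plugin configuration, so that part would need to be checked on the cluster itself.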

As for the policy, it is a good policy to achieve your goal.

Finally, you asked how to recover the information that was not indexed during the indexer's downtime. Yes, this is possible; you may follow this guide: https://wazuh.com/blog/recover-your-data-using-wazuh-alert-backups/

In it you copy the recovery.py script to your Wazuh manager, configure Filebeat to read the /tmp/recovery.json file and execute the script to gradually fill the file with the events observed during the downtime.

I hope this helps,
Best Regards,
Juan C. Tello


Franck Ehret

Aug 9, 2022, 2:21:45 PM
to Wazuh mailing list
Hi,

HELP, it happened again but I don't know where to start... :-/

The Elasticsearch log is full of these errors (I replaced the IPs with generic strings):

[2022-08-09T20:17:20,252][ERROR][c.a.o.s.a.s.InternalESSink] [srv05] Unable to index audit log {"audit_cluster_name":"elasticsearch","audit_transport_headers":{"_system_index_access_allowed":"false"},"audit_node_name":"srv05","audit_trace_task_id":"oo0vvwkjQQqy6uUNeI9C9w:5797","audit_transport_request_type":"CreateIndexRequest","audit_category":"INDEX_EVENT","audit_request_origin":"REST","audit_request_body":"{}","audit_node_id":"oo0vvwkjQQqy6uUNeI9C9w","audit_request_layer":"TRANSPORT","@timestamp":"2022-08-09T18:17:20.211+00:00","audit_format_version":4,"audit_request_remote_address":"srv03-IP","audit_request_privilege":"indices:admin/auto_create","audit_node_host_address":"srv05-IP","audit_request_effective_user":"admin","audit_trace_indices":["<wazuh-alerts-4.x-{2022.08.09||/d{yyyy.MM.dd|UTC}}>"],"audit_node_host_name":"srv05-IP"} due to org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
        at org.elasticsearch.indices.ShardLimitValidator.validateShardLimit(ShardLimitValidator.java:80) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.aggregateIndexSettings(MetadataCreateIndexService.java:765) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequestWithV1Templates(MetadataCreateIndexService.java:489) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:370) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:377) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.action.admin.indices.create.AutoCreateAction$TransportAction$1.execute(AutoCreateAction.java:137) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.10.2.jar:7.10.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]

The command curl -XGET http://localhost:9200/_status on the Elasticsearch node gives me:
curl: (52) Empty reply from server

Where do we start? Thx in advance

Kind regards
Franck

Juan Carlos Tello

Aug 11, 2022, 4:34:19 AM
to Franck Ehret, Wazuh mailing list
Hello Franck,
The key message is:
this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open
Although it is configurable, the default (and recommended) maximum number of shards per Elasticsearch or Wazuh indexer node is 1000.

The way to resolve this is to close or delete older indices, and to make sure an index lifecycle policy is in place that takes your log retention needs into account so that this does not happen again.

If you're still able to log in to the web interface, you may close old indices, which will allow the most recent events to be indexed again, either from Index Management or from Dev Tools by running a request such as:
POST wazuh-alerts-4.x-2021.08.*/_close
(This would close all alerts indices from August 2021).
You may also directly delete indices by running
DELETE wazuh-alerts-4.x-YYYY.MM.DD
Where YYYY.MM.DD must be replaced by the day(s) you wish to delete.

By default the wazuh-alerts are created daily with 3 shards, wazuh-monitoring and wazuh-statistics are created weekly with 1 shard each, so after roughly 2 years the 1000 shards limits will be reached in an environment with a single indexer node. If you wish to have a longer retention period then it is important to modify the configuration for the number of shards or configure a lifecycle policy which rolls over indices into weekly or monthly indices.
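
As a rough cross-check of that estimate, the arithmetic can be sketched as below. The function name days_until_shard_limit is made up for illustration, and the calculation assumes no replicas and that no index is ever closed or deleted. With the defaults quoted above it comes out at about 304 days, which is noticeably sooner than two years, so it is worth re-running with the figures from your own cluster.

```python
def days_until_shard_limit(shards_per_day, limit=1000, existing=0):
    """Return how many whole days pass before `limit` open shards is reached."""
    if shards_per_day <= 0:
        raise ValueError("shards_per_day must be positive")
    return int((limit - existing) // shards_per_day)


# Defaults quoted above: alerts daily with 3 shards, monitoring and
# statistics weekly with 1 shard each, no replicas, nothing ever deleted.
daily_rate = 3 + 1 / 7 + 1 / 7
print(round(daily_rate, 2))                # 3.29
print(days_until_shard_limit(daily_rate))  # 304
```

Any extra daily index (for example an audit log index with an unassigned replica) adds to shards_per_day and shortens the time further, since unassigned replicas of open indices also count towards the limit.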

Let us know if you have any more questions,
Best Regards,
Juan Carlos Tello

Franck Ehret

Aug 11, 2022, 6:01:28 AM
to Wazuh mailing list
Hi Juan,

Thanks a lot. For now, I've increased the shard limit to 1500 to unblock the situation and events are coming in again, but I'd like a permanent solution :-)
My environment is a "big" home lab; I have approx. 60 agents across all machines.

But there are a few things I don't get:
- The retention: "By default the wazuh-alerts are created daily with 3 shards, wazuh-monitoring and wazuh-statistics are created weekly with 1 shard each, so after roughly 2 years the 1000 shards limits will be reached in an environment with a single indexer node. If you wish to have a longer retention period then it is important to modify the configuration for the number of shards or configure a lifecycle policy which rolls over indices into weekly or monthly indices."

Since the last issue (end of June), I have had the following policies in place for the Wazuh indices, one for each (statistics, security auditlog, monitoring and alerts). All 4 share the same settings, as follows:

{
    "policy_id": "statistics_retention",
    "description": "Wazuh index state management for OpenDistro to move indices into a cold state after 3 months and delete them after a year.",

    "default_state": "hot",
    "states": [
        {
            "name": "hot",
            "actions": [
                {
                    "replica_count": {
                        "number_of_replicas": 0

                    }
                }
            ],
            "transitions": [
                {
                    "state_name": "cold",
                    "conditions": {
                        "min_index_age": "92d"

                    }
                }
            ]
        },
        {
            "name": "cold",
            "actions": [
                {
                    "read_only": {}
                }
            ],
            "transitions": [
                {
                    "state_name": "delete",
                    "conditions": {
                        "min_index_age": "366d"
                    }
                }
            ]
        },
        {
            "name": "delete",
            "actions": [
                {
                    "delete": {}
                }
            ],
            "transitions": []
        }
    ],
    "ism_template": {
        "index_patterns": [
            "wazuh-statistics*"
        ],
        "priority": 100
    }
}

As you can see, I'd like to keep a year of data with only the last 3 months as "hot data", and 1000 shards should be more than enough for that. But maybe those policies are not OK and create too many shards.
Would you have a policy suggestion to reduce the number of shards?

- The "system" indices: I've seen that .opendistro-job-scheduler-lock and .opendistro-ism-managed-index-history have replicas, but I don't know how to set a policy to remove them and, if I remember right, I wasn't able to create one.
What would you suggest for them? I don't plan to add a second node; I think it's overkill for my environment.

In advance, many thanks :-)

Juan Carlos Tello

Aug 12, 2022, 6:23:37 AM
to Franck Ehret, Wazuh mailing list
Hi Franck,

It may be that the version installed in your environment had a different default number of shards for the statistics, monitoring and audit indices, resulting in a total of more than 1000 shards. I recommend checking the distribution of shards across your existing indices through the Index Management app in the web interface.
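
If the web interface is not convenient, the same distribution can be approximated from the _cat/shards output shared earlier in this thread. The sketch below is only an illustration: count_shards_by_family is a made-up helper, and it assumes that the first column is the index name and that the date-like suffix follows the last hyphen (which does not hold for the .opendistro-*-00000N history indices).

```python
from collections import Counter


def count_shards_by_family(cat_shards_text):
    """Count shards per index family from `_cat/shards` text output.

    Assumes the first column is the index name and that daily or weekly
    indices end in a date-like suffix after the last hyphen.
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        family = fields[0].rsplit("-", 1)[0]
        counts[family] += 1
    return counts


# A few lines copied from the extract posted at the start of the thread.
sample = """\
security-auditlog-2022.02.07 0 p STARTED
security-auditlog-2022.02.07 0 r UNASSIGNED
wazuh-alerts-4.x-2022.04.06  2 p STARTED
wazuh-alerts-4.x-2022.04.06  1 p STARTED
wazuh-alerts-4.x-2022.04.06  0 p STARTED
wazuh-monitoring-2021.52w    0 p STARTED
"""
for family, total in count_shards_by_family(sample).most_common():
    print(family, total)
```

Run against the full output, this shows at a glance which index family contributes most of the shards.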

The policy you have shared does not reduce the number of shards after the transition to the cold state. This can be achieved with the shrink action. Furthermore, with a rollover policy you can have the data of multiple daily indices stored in a single index with an appropriate number of shards. A good guide on how to determine the recommended number of shards can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html#shard-size-recommendation

I hope you find this helpful, be sure to let us know if you have any other questions.
Best Regards,
Juan C. Tello

Franck Ehret

Aug 16, 2022, 2:31:22 AM
to Wazuh mailing list
Hi Juan,

I've tried to add a warm state because you are right: most of my indices use 3 shards, so my idea was to add a warm state in between to reduce the number of shards after 30 days.
But it gives me the error [illegal_argument_exception] Invalid field: [shrink] found in Action.
I don't understand, because shrink should be an action, right?

The state code:
{
    "policy_id": "statistics_retention",
    "description": "Wazuh index state management for OpenDistro to move indices into a cold state after 3 months and delete them after a year.",

    "default_state": "hot",
    "states": [
        {
            "name": "hot",
            "actions": [
                {
                    "replica_count": {
                        "number_of_replicas": 0

                    }
                }
            ],
            "transitions": [
                {
                    "state_name": "warm",
                    "conditions": {
                        "min_index_age": "32d"

                    }
                }
            ]
        },
        {
            "name": "warm",
            "actions": [
                {
                    "shrink": {
                        "num_new_shards": 1,
                        "force_unsafe": false
                    }
                }
            ],
            "transitions": [
                {
                    "state_name": "cold",
                    "conditions": {
                        "min_index_age": "92d"
                    }
                }
            ]
        },
..... rest of the code, same as before.

Any idea why? Maybe I'm missing some parameters, but the error code doesn't really help.

PS: for now, rollover is too complex, I prefer not to play around with it 😉

Franck Ehret

Aug 16, 2022, 4:15:36 AM
to Wazuh mailing list
PS: I just read that a single-node system should only have one shard per index.
"In general, the most optimal performance will be realized by using the same number of shards as nodes."

Can you confirm that statement, as I only have one node? And how can I make that the default when new indices are created?

Juan Carlos Tello

Aug 16, 2022, 8:11:18 AM
to Franck Ehret, Wazuh mailing list
Hello Franck,

Indeed, in a single-node installation it is recommended to have only a single shard per index. To give new indices this setting, modify /etc/filebeat/wazuh-template.json so that it has "index.number_of_shards": "1", then apply the updated template by running filebeat setup --index-management -E output.logstash.enabled=false
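
The template edit itself could be scripted along these lines. This is a sketch under assumptions: set_template_shards is a hypothetical helper, and it assumes the template keeps its index settings in a top-level "settings" object with dotted keys, as in the setting quoted above. The example dictionary is a heavily trimmed stand-in, not the real file content.

```python
import json


def set_template_shards(template, shards=1):
    """Return a copy of a template dict with index.number_of_shards set.

    Assumes the template keeps its index settings in a top-level
    "settings" object using dotted keys.
    """
    updated = json.loads(json.dumps(template))  # deep copy via JSON round trip
    updated.setdefault("settings", {})["index.number_of_shards"] = str(shards)
    return updated


# Hypothetical, heavily trimmed stand-in for /etc/filebeat/wazuh-template.json
example = {
    "index_patterns": ["wazuh-alerts-4.x-*"],
    "settings": {"index.number_of_shards": "3"},
}
print(json.dumps(set_template_shards(example)["settings"]))
```

To use it on the real file, load it with json.load, pass the result through the function and write it back with json.dump, keeping a backup of the original first.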
As for the shrink action, this is only available on Wazuh installations that use an indexer based off of OpenSearch 2.x. 

However, in your case, after modifying the setting to have 1 shard by default it will not be necessary to create a policy to change this in the future. You will benefit from shrinking old indices and you may do so through the Dev Tools API with calls such as:
PUT wazuh-alerts-4.x-2022.07.20/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.blocks.write": true
  }
}

POST wazuh-alerts-4.x-2022.07.20/_shrink/wazuh-alerts-4.x-2022.07.20_shrunk
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.blocks.write": true
  }
}

And after confirming that the shrink operation has been successful, delete the old indices.

Best Regards,
Juan C. Tello


Franck Ehret

Aug 16, 2022, 10:01:04 AM
to Wazuh mailing list
Hi Juan,

Again, thanks for helping me move forward every time 😊
I edited the template, so new indices should no longer have 3 shards; that is one big step forward.
However, I'm still above 1000 shards overall (the current limit is 1500, which I'd like to bring back down someday), so I have to keep going...

I'm a bit confused about the shrink: you said that the shrink action is only available on Wazuh installations based on OpenSearch 2.x.
I have 1.13.X installed (which was the default when I started in December last year), so is it possible at all? And if I have to migrate, how do I do it?

About the commands you've just mentioned, is it possible to use a wildcard to do it month by month, for instance?
Sample:

PUT wazuh-alerts-4.x-2022.*wildcard*.20/_settings

{
  "settings": {
    "index.number_of_replicas": 0,                                
    "index.blocks.write": true                                    
  }
}


Again, thanks in advance.

Kind regards
Franck

Juan Carlos Tello

Aug 16, 2022, 10:39:12 AM
to Franck Ehret, Wazuh mailing list
Hi Franck,
Until Wazuh 4.4.0 is released, migrating to a Wazuh indexer based on OpenSearch 2.x will not be possible. However, since you won't need to implement this at the index lifecycle policy level, this will not affect your progress.

It is possible to run the first action (setting indices to read-only) with wildcards, but not the second (the shrink action). Instead, you may achieve this through a simple script, for example:

# Step 1: make every July 2022 alerts index read-only (wildcards are allowed here)
curl -k -u "<USERNAME>:<PASSWORD>" -X PUT "https://localhost:9200/wazuh-alerts-4.x-2022.07.*/_settings" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.blocks.write": true
  }
}
'

# Step 2: shrink each daily index into a single-shard copy (no wildcards, so loop over days 01 to 31)
for i in {01..31}; do
  curl -k -u "<USERNAME>:<PASSWORD>" -X POST "https://localhost:9200/wazuh-alerts-4.x-2022.07.${i}/_shrink/wazuh-alerts-4.x-2022.07.${i}_shrunk?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.blocks.write": true
  }
}
'
done

This will first set all indices for the month of July to read-only and then, one by one, request that each be shrunk into a single-shard copy.

After verifying that each shrunk index holds the same number of documents as its original, you may delete the original indices through a DELETE API request.
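
The verification step could be sketched as follows. The helper names shrink_pairs and mismatched are made up for illustration, and the document counts are invented numbers; in practice they would come from something like GET _cat/indices/wazuh-alerts-4.x-2022.07.*?h=index,docs.count. Using the calendar module also keeps the list to the days that actually exist in the month.

```python
import calendar


def shrink_pairs(prefix, year, month, suffix="_shrunk"):
    """List (source, target) index names for every real day of a month."""
    last_day = calendar.monthrange(year, month)[1]
    names = [
        f"{prefix}{year:04d}.{month:02d}.{day:02d}"
        for day in range(1, last_day + 1)
    ]
    return [(name, name + suffix) for name in names]


def mismatched(pairs, doc_counts):
    """Return sources whose shrunk copy is missing or has a different doc count."""
    return [
        src
        for src, dst in pairs
        if dst not in doc_counts or doc_counts.get(src) != doc_counts[dst]
    ]


pairs = shrink_pairs("wazuh-alerts-4.x-", 2022, 7)
counts = {  # made-up numbers, for illustration only
    "wazuh-alerts-4.x-2022.07.01": 1200,
    "wazuh-alerts-4.x-2022.07.01_shrunk": 1200,
}
print(len(pairs))                      # 31
print(mismatched(pairs[:1], counts))   # []
print(mismatched(pairs[1:2], counts))  # ['wazuh-alerts-4.x-2022.07.02']
```

Only the sources that do not appear in the mismatched list would then be candidates for the DELETE request.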

Best Regards,
Juan C. Tello