shards failed.


Odie

Apr 7, 2019, 11:28:38 PM
to Wazuh mailing list
Hi Wazuh Team,



I encountered a shards error/warning and followed a workaround: I added an entry to /etc/elasticsearch/elasticsearch.yml and restarted Elasticsearch, but then I got a new error. I reverted the configuration, and now my Dashboard shows nothing.


Error: Request to Elasticsearch failed: {"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"","phase":"fetch","grouped":true,"failed_shards":[],"caused_by":{"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@449946e8 on QueueResizingEsThreadPoolExecutor[name = KaE46e-/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 696nanos, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@65552234[Running, pool size = 7, active threads = 7, queued tasks = 3271, completed tasks = 171872]]"}},"status":503}
KbnError@https://172.16.24.31/bundles/commons.bundle.js?v=16588:1:6798
RequestFailure@https://172.16.24.31/bundles/commons.bundle.js?v=16588:1:7530
callResponseHandlers/<@https://172.16.24.31/bundles/commons.bundle.js?v=16588:1:859739





ERROR:

Routes. Error. {"data":{"statusCode":500,"error":"Internal Server Error","message":"An internal server error occurred"},"status":500,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","headers":{"Accept":"application/json, text/plain, /","kbn-version":"6.2.2"},"timeout":8000,"url":"/api/wazuh-api/apiEntries"},"statusText":"Internal Server Error"}


Here's the output of:



# curl localhost:9200/?pretty
{
  "name" : "KaE46e-",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "v3q5IeK-RdiI1mYWQvl5yA",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

# tailf /var/log/elasticsearch/elasticsearch.log

[2019-04-06T02:21:00,562][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [aggs-matrix-stats]
[2019-04-06T02:21:00,563][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [analysis-common]
[2019-04-06T02:21:00,563][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [ingest-common]
[2019-04-06T02:21:00,563][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [lang-expression]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [lang-mustache]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [lang-painless]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [mapper-extras]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [parent-join]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [percolator]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [rank-eval]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [reindex]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [repository-url]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [transport-netty4]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService     ] [KaE46e-] loaded module [tribe]
[2019-04-06T02:21:00,566][INFO ][o.e.p.PluginsService     ] [KaE46e-] no plugins loaded
[2019-04-06T02:21:27,554][INFO ][o.e.d.DiscoveryModule    ] [KaE46e-] using discovery type [zen]
[2019-04-06T02:21:28,738][INFO ][o.e.n.Node               ] initialized
[2019-04-06T02:21:28,738][INFO ][o.e.n.Node               ] [KaE46e-] starting ...
[2019-04-06T02:21:29,178][INFO ][o.e.t.TransportService   ] [KaE46e-] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2019-04-06T02:21:33,099][INFO ][o.e.c.s.MasterService    ] [KaE46e-] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {KaE46e-}{KaE46e-qQd632tmthHovlQ}{HTUOCs6cQe2uV82zkexcog}{127.0.0.1}{127.0.0.1:9300}
[2019-04-06T02:21:33,112][INFO ][o.e.c.s.ClusterApplierService] [KaE46e-] new_master {KaE46e-}{KaE46e-qQd632tmthHovlQ}{HTUOCs6cQe2uV82zkexcog}{127.0.0.1}{127.0.0.1:9300}, reason: apply cluster state (from master [master {KaE46e-}{KaE46e-qQd632tmthHovlQ}{HTUOCs6cQe2uV82zkexcog}{127.0.0.1}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2019-04-06T02:21:33,182][INFO ][o.e.h.n.Netty4HttpServerTransport] [KaE46e-] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2019-04-06T02:21:33,182][INFO ][o.e.n.Node               ] [KaE46e-] started



jesus.g...@wazuh.com

Apr 8, 2019, 3:03:27 AM
to Wazuh mailing list

Hi Odie,

As I can see, we can start with /api/wazuh-api/apiEntries; that endpoint mainly uses the .wazuh index.

Let’s check the Wazuh internal indices:

curl elastic:9200/_cat/indices/.waz*

In addition, it would be nice to know about your Elasticsearch cluster health:

curl elastic:9200/_cluster/health

Please, paste the output from the above commands, thanks.

Regards,
Jesús

Odie

Apr 8, 2019, 4:28:42 AM
to Wazuh mailing list

Hi Jesus,

See the output below.
Thanks!
 

Let’s check the Wazuh internal indices:

curl elastic:9200/_cat/indices/.waz*

# curl localhost:9200/_cat/indices/.waz*
red open .wazuh-version KsdXRSbQTwiF4INwaHnpdA 1 1
red open .wazuh         qM68YmQoSti18a-a4OiHxA 5 1



In addition, it would be nice to know about your Elasticsearch cluster health:

curl elastic:9200/_cluster/health
# curl localhost:9200/_cluster/health
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":1191,"active_shards":1191,"relocating_shards":0,"initializing_shards":4,"unassigned_shards":5389,"delayed_unassigned_shards":0,"number_of_pending_tasks":4,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":3428,"active_shards_percent_as_number":18.089307411907654}
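As a side note, the active_shards_percent_as_number in that output is simply the active shards over the total of active, initializing, and unassigned shards. A quick shell sketch reproducing the figure from the numbers above:

```shell
# Recompute active_shards_percent_as_number from the health output:
# active / (active + initializing + unassigned) * 100
active=1191
initializing=4
unassigned=5389
total=$((active + initializing + unassigned))
awk -v a="$active" -v t="$total" 'BEGIN { printf "%.2f%%\n", a * 100 / t }'
```

which prints 18.09%, matching the reported 18.089307411907654.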



Thanks!

jesus.g...@wazuh.com

Apr 8, 2019, 4:48:52 AM
to Wazuh mailing list

Hello again Odie,

You have many unassigned shards and your active shards percentage is only 18.09%. Your cluster was probably recovering and reached the limit of concurrent incoming shard recoveries.

Since you have only one node, let's force zero replicas for all indices:

curl -XPUT 'elastic:9200/*/_settings' -H 'Content-Type: application/json' -d '{ "index": { "number_of_replicas": "0" } }'
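The reason zero replicas helps here: Elasticsearch allocates primaries × (1 + replicas) shard copies per index, and a replica is never assigned to the same node as its primary, so on a one-node cluster every replica stays unassigned. A small sketch of the arithmetic (assuming the 6.x defaults of 5 primaries and 1 replica):

```shell
# Shard copies required per index: primaries * (1 + replicas).
# On a single-node cluster only the primaries can ever be assigned.
primaries=5
replicas=1
total=$((primaries * (1 + replicas)))
echo "shards per index: $total (assignable on one node: $primaries)"
```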

Enable allocation for all shards:

curl -XPUT elastic:9200/_cluster/settings -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}'

Restart the service:

systemctl restart elasticsearch

Once restarted, Elasticsearch usually takes about 15 to 20 seconds to come back up.

Now keep this watcher running for a few minutes and look at the shards percentage. If it doesn't grow over time, stop the command; otherwise, leave it running until it reaches 100%.

watch -n0 'curl -s elastic:9200/_cluster/health?pretty | grep "active_shards_percent"'

Let us know.

Regards,
Jesús

Odie

Apr 8, 2019, 7:46:01 AM
to Wazuh mailing list
Hi Jesus,


I've already applied all your suggestions, and the shards reached 100%.

Every 0.1s: curl -s localhost:9200/_cluster/health?...  Tue Apr  9 03:30:11 2019

  "active_shards_percent_as_number" : 100.0


Now I'm getting a new error: Request Timeout after 30000ms.

jesus.g...@wazuh.com

Apr 8, 2019, 7:53:39 AM
to Wazuh mailing list

Hi Odie,

Well, we managed to solve the unassigned shards issue; now let's dig into the next one: the timeout.

This is commonly caused by a heavy request to Elasticsearch; Kibana waits up to 30 seconds and then reports a timeout.

We can check whether your indices are too large for a single request:

curl "elastic:9200/_cat/indices?v=true&s=docs.count:desc" -s | head -10

What we are doing with that command is fetching the index list, which shows the number of events stored per index. We also sort that list and show just the top 10 results.

That command should help determine whether you have very large indices.

Regards,
Jesús

Odie

Apr 9, 2019, 1:37:03 AM
to Wazuh mailing list
Hi Jesus,


Here's the output of:

curl "elastic:9200/_cat/indices?v=true&s=docs.count:desc" -s | head -10
# curl "localhost:9200/_cat/indices?v=true&s=docs.count:desc" -s | head -10
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   wazuh-alerts-3.x-2018.12.11 0oFIh0tZQlyVRFK-jry2Fg   5   0    3070809            0      1.7gb          1.7gb
green  open   wazuh-alerts-2018.12.11     o6t7Lml7S9KGi3Co4o62XA   5   0    3070809            0      1.8gb          1.8gb
green  open   wazuh-alerts-2018.12.10     EX0vxAOaTpqyxKe46MrB1w   5   0    2805494            0      1.6gb          1.6gb
green  open   wazuh-alerts-3.x-2018.12.10 v5G5-S8vRdyvPQsB64Qf0w   5   0    2805494            0      1.5gb          1.5gb
green  open   wazuh-alerts-2018.11.29     s2plVTY0QmGUMC2AiEWLcw   5   0    2699063            0      1.4gb          1.4gb
green  open   wazuh-alerts-3.x-2018.11.29 PrO9iFEOR6qCyiwrXlUJ4A   5   0    2699063            0      1.3gb          1.3gb
green  open   wazuh-alerts-2018.11.28     gelsZqOQRQi77IO8-UUB4A   5   0    2691564            0      1.4gb          1.4gb
green  open   wazuh-alerts-3.x-2018.11.28 ZQU6hwv6SGe1ibxPHpLp2w   5   0    2691564            0      1.2gb          1.2gb
green  open   wazuh-alerts-2018.12.03     g8WaCm9_RPqAkGryhotmaQ   5   0    2666486            0      1.5gb          1.5gb

jesus.g...@wazuh.com

Apr 9, 2019, 3:37:07 AM
to Wazuh mailing list

Hello again Odie,

As I can see, you have a mix of indices: some of them use wazuh-alerts-YYYY.MM.DD and others use wazuh-alerts-3.x-YYYY.MM.DD, and that's pretty weird.

Please let me check your templates:

curl elastic:9200/_cat/templates

Also, I want to know if you have more than one Logstash instance running, since that may be the cause. In any case, it would be nice if you shared all your Filebeat and Logstash configuration files here, please.

To do so, we can use cat:

For Filebeat:

cat /etc/filebeat/filebeat.yml

For Logstash, you may have more than one configuration file, so please run one cat for every .conf file you find in /etc/logstash/conf.d/:

cat /etc/logstash/conf.d/*

Once we have that information, we can continue helping you. In addition, even if we solve your mix of indices, I think you are handling a huge amount of events for just one Elasticsearch node, but we can discuss that later.

Best regards,
Jesús

Odie

Apr 9, 2019, 8:58:22 PM
to Wazuh mailing list
Hi Jesus,


Here's the output as requested:


Please let me check your templates:

# curl localhost:9200/_cat/templates
wazuh                         [wazuh-alerts-3.x-*] 0
wazuh-agent                   [wazuh-monitoring*]  0
logstash                      [logstash-*]         0 60001

For Filebeat:

# cat /etc/filebeat/filebeat.yml
filebeat:
 prospectors:
  - input_type: log
    paths:
     - "/var/ossec/logs/alerts/alerts.json"
    document_type: json
    json.message_key: log
    json.keys_under_root: true
    json.overwrite_keys: true

output:
 logstash:
   # The Logstash hosts
   hosts: ["172.XX.XX.XX:5000"]
#   ssl:
#     certificate_authorities: ["/etc/filebeat/logstash.crt"]



For Logstash:  I only have one config file and one backup

#  ll /etc/logstash/conf.d/
total 8
-rw-r--r--. 1 root root 1140 Mar  9  2018 01-wazuh.conf
-rw-r--r--. 1 root root 1242 Mar  9  2018 01-wazuh.conf.bak

=======================================

# cat /etc/logstash/conf.d/01-wazuh.conf
# Wazuh - Logstash configuration file
## Remote Wazuh Manager - Filebeat input
input {
    beats {
        port => 5000
        codec => "json_lines"
        ssl => true
        ssl_certificate => "/etc/logstash/logstash.crt"
        ssl_key => "/etc/logstash/logstash.key"
    }
}
filter {
    if [data][srcip] {
        mutate {
            add_field => [ "@src_ip", "%{[data][srcip]}" ]
        }
    }
    if [data][aws][sourceIPAddress] {
        mutate {
            add_field => [ "@src_ip", "%{[data][aws][sourceIPAddress]}" ]
        }
    }
}
filter {
    geoip {
        source => "@src_ip"
        target => "GeoLocation"
        fields => ["city_name", "continent_code", "country_code2", "country_name", "region_name", "location"]
    }
    date {
        match => ["timestamp", "ISO8601"]
        target => "@timestamp"
    }
    mutate {
        remove_field => [ "timestamp", "beat", "input_type", "tags", "count", "@version", "log", "offset", "type","@src_ip"]
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "wazuh-alerts-3.x-%{+YYYY.MM.dd}"
        document_type => "wazuh"
    }
}





jesus.g...@wazuh.com

Apr 10, 2019, 6:26:56 AM
to Wazuh mailing list

Hi Odie,

Your Logstash and Filebeat configurations seem to be fine.

At this point, my suggestion is to close the indices whose names use the format wazuh-alerts-YYYY.MM.dd instead of wazuh-alerts-3.x-YYYY.MM.dd.

Closing an index doesn't remove it: closed indices are simply ignored by Elasticsearch without being deleted.
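As background on where the two name formats come from: the 3.x indices are produced by the Logstash output shown earlier, whose index => "wazuh-alerts-3.x-%{+YYYY.MM.dd}" option expands each event's @timestamp into a daily index name. A rough shell equivalent (using today's date rather than an event timestamp):

```shell
# Rough equivalent of Logstash's index => "wazuh-alerts-3.x-%{+YYYY.MM.dd}";
# Logstash substitutes each event's @timestamp, here we use the current date.
date +"wazuh-alerts-3.x-%Y.%m.%d"
```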

Close all indices whose names use the format wazuh-alerts-YYYY.MM.dd:

curl -XPOST elastic:9200/wazuh-alerts-20*/_close

That command will close indices such as wazuh-alerts-2018.12.11, wazuh-alerts-2018.12.12… but it will keep open indices such as wazuh-alerts-3.x-2018.12.11 and wazuh-alerts-3.x-2018.12.12.

Here you can see an example of some closed indices plus an opened index:

curl elastic:9200/_cat/indices/wazuh-alerts*
      close wazuh-alerts-3.x-2019.04.09 ZP1OjJcUQbetHjUxa-MHsg                             
green open  wazuh-alerts-3.x-2019.04.10 ettC85HUTSiTHmmsDr9fYw 1 0 935494 0 743.7mb 743.7mb
      close wazuh-alerts-3.x-2019.04.08 cPKnpENeQtidC_qIApMYmw                             
      close wazuh-alerts-3.x-2019.04.06 bQaE0USJRL69oakqdiTQ7A
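If you want to double-check what the wazuh-alerts-20* wildcard will match before issuing the _close call, the same prefix semantics can be exercised locally with shell globbing (the index names below are illustrative):

```shell
# wazuh-alerts-20* matches the wrongly named daily indices, but not the
# wazuh-alerts-3.x-* ones, because "3.x-" breaks the "20" prefix.
for name in wazuh-alerts-2018.12.11 wazuh-alerts-3.x-2018.12.11; do
  case "$name" in
    wazuh-alerts-20*) echo "$name -> would be closed" ;;
    *)                echo "$name -> stays open" ;;
  esac
done
```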

Now, you’ve closed the wrong named indices, you can also close more indices for increasing your Elasticsearch performance,
such as all indices from 2018:

curl -XPOST elastic:9200/wazuh-alerts-3.x-2018*/_close

This won’t affect any retention policy from your business because the data is still in disk. In addition you can re-open an index as follow:

curl -XPOST elastic:9200/wazuh-alerts-3.x-2018.04.10/_open

The above command opens the index wazuh-alerts-3.x-2018.04.10.

Brief summary:

  • Close all indices with a wrong name.
  • Optional: close all indices that won't be used, such as the 2018 indices.
  • Learn about the open and close Elasticsearch API endpoints so you can reduce the search load.

Let us know your results and whether this works for you. Remember: depending on the number of indices and the amount of data per index, your Elasticsearch performance may still suffer; the above suggestions mainly reduce the load for a single-node architecture.

I hope it helps.

Best regards,
Jesús

Odie

Apr 24, 2019, 1:31:28 AM
to Wazuh mailing list
Hi Jesus,


I've already closed all the 2018 indices, but I'm still getting errors.



rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@72815111 on QueueResizingEsThreadPoolExecutor[name = KaE46e-/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000


Request Timeout after 30000ms

Courier Fetch: 5 of 545 shards failed.



Thanks!

jesus.g...@wazuh.com

Apr 24, 2019, 3:14:26 AM
to Wazuh mailing list

Hi Odie,

Even after closing those indices, your Elasticsearch is having trouble processing all the data.

You can try to increase the queue size by adding this line to /etc/elasticsearch/elasticsearch.yml:

thread_pool.search.min_queue_size: 2000

The default value is 1000; we are increasing it to 2000. Now restart Elasticsearch:

systemctl restart elasticsearch

Remember that a restart may take about 15-20 seconds until it’s ready again.
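To see whether search tasks are still being rejected after the change, the _cat/thread_pool endpoint is handy, e.g. curl "elastic:9200/_cat/thread_pool/search?v&h=name,queue,rejected". Since that needs a live node, here is a sketch that parses a sample of such output (the numbers are illustrative, not taken from your cluster):

```shell
# Parse _cat/thread_pool-style output and flag pools with rejected tasks
# (sample data below is illustrative, not from a live node).
cat <<'EOF' | awk 'NR > 1 && $3 > 0 { print $1 " pool: " $3 " rejected tasks" }'
name   queue rejected
search 12    3271
EOF
```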

The definitive solution would be to add more nodes and replicas; then your cluster's performance won't suffer as much.

Regards,
Jesús
