Error: Request to Elasticsearch failed: {"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"","phase":"fetch","grouped":true,"failed_shards":[],"caused_by":{"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@449946e8 on QueueResizingEsThreadPoolExecutor[name = KaE46e-/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 696nanos, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@65552234[Running, pool size = 7, active threads = 7, queued tasks = 3271, completed tasks = 171872]]"}},"status":503}
KbnError@https://172.16.24.31/bundles/commons.bundle.js?v=16588:1:6798
RequestFailure@https://172.16.24.31/bundles/commons.bundle.js?v=16588:1:7530
callResponseHandlers/<@https://172.16.24.31/bundles/commons.bundle.js?v=16588:1:859739
# curl localhost:9200/?pretty
# tail -f /var/log/elasticsearch/elasticsearch.log
[2019-04-06T02:21:00,562][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [aggs-matrix-stats]
[2019-04-06T02:21:00,563][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [analysis-common]
[2019-04-06T02:21:00,563][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [ingest-common]
[2019-04-06T02:21:00,563][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [lang-expression]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [lang-mustache]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [lang-painless]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [mapper-extras]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [parent-join]
[2019-04-06T02:21:00,564][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [percolator]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [rank-eval]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [reindex]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [repository-url]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [transport-netty4]
[2019-04-06T02:21:00,565][INFO ][o.e.p.PluginsService ] [KaE46e-] loaded module [tribe]
[2019-04-06T02:21:00,566][INFO ][o.e.p.PluginsService ] [KaE46e-] no plugins loaded
[2019-04-06T02:21:27,554][INFO ][o.e.d.DiscoveryModule ] [KaE46e-] using discovery type [zen]
[2019-04-06T02:21:28,738][INFO ][o.e.n.Node ] initialized
[2019-04-06T02:21:28,738][INFO ][o.e.n.Node ] [KaE46e-] starting ...
[2019-04-06T02:21:29,178][INFO ][o.e.t.TransportService ] [KaE46e-] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2019-04-06T02:21:33,099][INFO ][o.e.c.s.MasterService ] [KaE46e-] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {KaE46e-}{KaE46e-qQd632tmthHovlQ}{HTUOCs6cQe2uV82zkexcog}{127.0.0.1}{127.0.0.1:9300}
[2019-04-06T02:21:33,112][INFO ][o.e.c.s.ClusterApplierService] [KaE46e-] new_master {KaE46e-}{KaE46e-qQd632tmthHovlQ}{HTUOCs6cQe2uV82zkexcog}{127.0.0.1}{127.0.0.1:9300}, reason: apply cluster state (from master [master {KaE46e-}{KaE46e-qQd632tmthHovlQ}{HTUOCs6cQe2uV82zkexcog}{127.0.0.1}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2019-04-06T02:21:33,182][INFO ][o.e.h.n.Netty4HttpServerTransport] [KaE46e-] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2019-04-06T02:21:33,182][INFO ][o.e.n.Node ] [KaE46e-] started
Hi Odie,
As I can see, we can start with /api/wazuh-api/apiEntries, that endpoint mainly uses the .wazuh index.
Let’s check the Wazuh internal indices:
curl elastic:9200/_cat/indices/.waz*
In addition, it would be nice to know about your Elasticsearch cluster health:
curl elastic:9200/_cluster/health
Please, paste the output from the above commands, thanks.
Regards,
Jesús
# curl localhost:9200/_cat/indices/.waz*
red open .wazuh-version KsdXRSbQTwiF4INwaHnpdA 1 1
red open .wazuh qM68YmQoSti18a-a4OiHxA 5 1
Hello again Odie,
You have many unassigned shards and your active shards percentage is 18.09%. Your cluster was probably recovering and hit the limit of concurrent incoming shard recoveries.
Since you have only one node, let’s set zero replicas for all indices:
curl -XPUT 'elastic:9200/*/_settings' -H 'Content-Type: application/json' -d '{ "index": { "number_of_replicas": "0" } }'
Enable allocation for all shards:
curl -XPUT elastic:9200/_cluster/settings -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}'
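To confirm both changes took effect, you can read the settings back. This is a sketch using the standard cluster and index settings APIs; `elastic` is assumed to resolve to your Elasticsearch host, as in the commands above:

```shell
# Show the persistent cluster settings we just applied
curl -s 'elastic:9200/_cluster/settings?pretty'

# Verify the replica count on one of the Wazuh indices
curl -s 'elastic:9200/.wazuh/_settings?pretty' | grep number_of_replicas
```

You should see `"enable": "all"` under the persistent routing allocation settings and `"number_of_replicas": "0"` for the index.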
Restart the service:
systemctl restart elasticsearch
After a restart, Elasticsearch usually takes about 15 to 20 seconds to be up again.
Now keep this watcher running for a few minutes and look at the active shards percentage. If it doesn’t grow over time, stop the command; otherwise, leave it running until it reaches 100%.
watch -n0 'curl -s elastic:9200/_cluster/health?pretty | grep "active_shards_percent"'
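If the percentage stalls, the cluster allocation explain API can tell you why a shard stays unassigned. As a sketch (this endpoint exists since Elasticsearch 5.0; called with no body, it explains the first unassigned shard it finds):

```shell
# Ask Elasticsearch why the first unassigned shard cannot be allocated
curl -s 'elastic:9200/_cluster/allocation/explain?pretty'
```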
Let us know.
Regards,
Jesús
Hi Odie,
Well, we managed to solve the unassigned shards issue; now let’s dig into the next one, the timeout.
This is commonly caused by a heavy request to Elasticsearch: Kibana waits up to 30 seconds for a response, then reports a timeout.
We can check whether your indices are too large for a single request:
curl "elastic:9200/_cat/indices?v=true&s=docs.count:desc" -s | head -10
What we are doing with that command is getting an index list where we can see the number of events stored per index. We are also sorting that list and showing just the top 10 results.
That command should help us determine whether we have very large indices or not.
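Document count is not the only dimension worth checking; as a variant of the same query, you can sort by on-disk size instead (the `s` and `h` parameters are standard _cat options for sorting and restricting columns):

```shell
# Top 10 indices by store size, showing only index name, doc count and size
curl -s "elastic:9200/_cat/indices?v=true&s=store.size:desc&h=index,docs.count,store.size" | head -10
```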
Regards,
Jesús
Here's the output of:
curl "elastic:9200/_cat/indices?v=true&s=docs.count:desc" -s | head -10
Hello again Odie,
As I can see, you have a mix of indices: some of them use the wazuh-alerts-YYYY.MM.DD naming pattern and others use wazuh-alerts-3.x-YYYY.MM.DD (3.x), which is pretty weird.
Please let me check your templates:
curl elastic:9200/_cat/templates
Also, I want to know if you have more than one Logstash instance running; that may be the cause. In any case, it would be nice if you share all your Filebeat and Logstash configuration files here, please.
To do so, we can use cat:
For Filebeat:
cat /etc/filebeat/filebeat.yml
For Logstash, you may have more than one configuration file, so please run one cat for every .conf file you find in /etc/logstash/conf.d/:
cat /etc/logstash/conf.d/*
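To rule out a duplicate Logstash instance on your own, a quick sketch using plain process and service inspection (nothing Wazuh-specific here):

```shell
# Count running Logstash processes; more than 1 suggests duplicate instances
# (the [l] trick stops grep from matching its own command line)
ps -ef | grep -c '[l]ogstash'

# Check the managed service status as well
systemctl status logstash --no-pager
```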
Once we have that information, we can continue helping you. In addition, even if we solve the mix of indices, I think you are handling a huge amount of events for just one Elasticsearch node, but we can discuss that later.
Best regards,
Jesús
Hi Odie,
Your Logstash and Filebeat configurations seem to be fine.
At this point, my suggestion is to close the indices whose names use the format wazuh-alerts-YYYY.MM.DD instead of wazuh-alerts-3.x-YYYY.MM.DD.
Closing an index doesn’t remove it; closed indices are ignored by Elasticsearch but stay on disk.
Close all indices whose names use the format wazuh-alerts-YYYY.MM.DD:
curl -XPOST elastic:9200/wazuh-alerts-20*/_close
That command will close indices such as wazuh-alerts-2018.12.11, wazuh-alerts-2018.12.12… but it will keep open indices such as wazuh-alerts-3.x-2018.12.11, wazuh-alerts-3.x-2018.12.12…
Here you can see an example of some closed indices plus an open index:
curl elastic:9200/_cat/indices/wazuh-alerts*
close wazuh-alerts-3.x-2019.04.09 ZP1OjJcUQbetHjUxa-MHsg
green open wazuh-alerts-3.x-2019.04.10 ettC85HUTSiTHmmsDr9fYw 1 0 935494 0 743.7mb 743.7mb
close wazuh-alerts-3.x-2019.04.08 cPKnpENeQtidC_qIApMYmw
close wazuh-alerts-3.x-2019.04.06 bQaE0USJRL69oakqdiTQ7A
Now that you’ve closed the wrongly named indices, you can also close more indices to improve your Elasticsearch performance, such as all indices from 2018:
curl -XPOST elastic:9200/wazuh-alerts-3.x-2018*/_close
This won’t affect any retention policy of your business because the data is still on disk. In addition, you can re-open an index as follows:
curl -XPOST elastic:9200/wazuh-alerts-3.x-2018.04.10/_open
The above command opens the index wazuh-alerts-3.x-2018.04.10.
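To double-check which indices remain open after these operations, a sketch using the _cat indices API restricted to the status and index columns (`h` and `s` are standard _cat parameters):

```shell
# List wazuh-alerts indices with their status, sorted so closed ones group together
curl -s "elastic:9200/_cat/indices/wazuh-alerts*?h=status,index&s=status"
```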
Brief summary: let us know your results and whether this works for you. Remember, depending on the number of indices and the amount of data per index, your Elasticsearch performance may suffer; the above suggestions are mainly meant to reduce the load on a single-node architecture.
I hope it helps.
Best regards,
Jesús
Hi Odie,
Even after closing those indices, your Elasticsearch is having trouble processing all the data.
You can try to increase the search queue size by adding these lines to /etc/elasticsearch/elasticsearch.yml (your error shows a max queue capacity of 1000 as well, so raising only the minimum may not be enough; I’d suggest raising both):
thread_pool.search.min_queue_size: 2000
thread_pool.search.max_queue_size: 2000
The default value is 1000; we are increasing it to 2000. Now, restart Elasticsearch:
systemctl restart elasticsearch
Remember that a restart may take about 15-20 seconds until it’s ready again.
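Once it’s back, you can watch whether search rejections keep growing with the _cat thread pool API; if the `rejected` counter stays flat under load, the larger queue is absorbing the spikes (the column names below are standard _cat/thread_pool headers):

```shell
# Show the search thread pool: active threads, queued tasks and total rejections
curl -s "elastic:9200/_cat/thread_pool/search?v=true&h=node_name,active,queue,rejected"
```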
The definitive solution would be to add more nodes and replicas; then your cluster performance won’t suffer so much.
Regards,
Jesús