Many failures on the wazuh server (API, dashboard, agents)


Arthur Henrique Oliveira Aparício

Mar 11, 2024, 8:23:01 AM
to Wazuh | Mailing List
To start: almost two weeks ago I registered some pfSense firewalls via syslog, and they generated about 10 times more logs than the other 40 agents I have. I can't say for certain that was the cause, but about 3 hours later every screen started taking longer to load. At first I thought it was fine, but as log volume grew, several agents disconnected, others went to pending status, and the API started returning request timeout errors. I tested and found no errors other than the request timeout, and whenever I restarted the manager everything returned to normal. I decided to remove the pfSenses and point them at another server, but the first server still has this problem.

When I test the API via terminal, it responds, but on the web it is frequently reported as offline. To make matters worse, I decided to do a full restart (all modules, dashboard, indexer, and manager), and now only the message saying the dashboard is not ready yet appears. The logs in /var/log/wazuh-indexer/wazuh-cluster.log point to this error:

[2024-03-11T08:49:08,245][WARN ][r.suppressed ] [node-1] path: /.kibana/_count, params: {index=.kibana} org.opensearch.action.search.SearchPhaseExecutionException: all shards failed

So, what could I do? Reinstall?

Thank you in advance.

Arthur Henrique Oliveira Aparício

Mar 11, 2024, 10:08:19 AM
to Wazuh | Mailing List

So, I managed to get the Wazuh dashboard back. The problem was that it was trying to create another .kibana index; after stopping the dashboard, deleting the existing index, and starting it again, it worked, though obviously without my dashboards. Luckily those were already saved on another server (which will be dedicated to the pfSenses because of their log volume). However, the API not only continues to time out, it seems to have gotten worse: sometimes I can't access any module without an invalid-parameters error.

Jorge Eduardo Molas

Mar 12, 2024, 7:26:27 AM
to Wazuh | Mailing List
Sorry for the delay in the response.
Indeed, events generated by network devices such as firewalls, routers, and switches can cause flooding because of their high rate of occurrence compared to other endpoints.
This affects Wazuh manager resource usage during reception (the remoted daemon) and decoding (the analysisd daemon).

In this case, you can deploy a centralized syslog server and install a Wazuh agent on it. That way you can also control the agent buffer and anti-flooding settings (in ossec.conf).
Otherwise, you will have to increase the resources of your Wazuh manager.
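The buffer Jorge mentions is configured in the ossec.conf of the agent installed on that centralized syslog server. A minimal sketch of the relevant block; the values shown are illustrative, not recommendations, and should be tuned to your syslog volume:

```xml
<!-- In the agent's ossec.conf on the centralized syslog server.           -->
<!-- Values are illustrative; raise them to absorb bursts from firewalls. -->
<client_buffer>
  <disabled>no</disabled>                      <!-- keep anti-flooding enabled -->
  <queue_size>100000</queue_size>              <!-- events held locally        -->
  <events_per_second>1000</events_per_second>  <!-- throttle toward manager    -->
</client_buffer>
```

Keeping the buffer enabled lets the agent absorb bursts locally instead of overwhelming remoted on the manager.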

If you are experiencing issues with the Wazuh API, please refer to this troubleshooting guide for assistance.
Let me know if this information is useful.
Greetings!

Arthur Henrique Oliveira Aparício

Mar 12, 2024, 8:44:09 AM
to Wazuh | Mailing List
Hello, thanks for the help. 

Regarding syslog, we are separating it into two different servers (one for network devices, the other for servers) to prevent a problem in one from affecting the other; it will also be useful for dividing who sees what. Regarding the API: as I said, I got the dashboard working again (it was trying to create another .kibana index and failing, so I deleted the previous one and it recreated it), but the API showed the same error again.

I noticed the failures coincided with a new release, so I upgraded the manager modules and ran the tests. For about half an hour after the upgrade the timeout error continued, and error code 3099 or 3021 appeared on the web, but now, after more than 12 hours, it is working correctly, apart from a general crash in host connections at 6 a.m. I've been using the troubleshooting section and it has helped me understand the errors, but since they have apparently stopped, I'll just use it to monitor logs.

Again, thanks for the help.

Arthur Henrique Oliveira Aparício

Mar 12, 2024, 11:16:48 AM
to Wazuh | Mailing List
Then the API failed again. It starts with timeouts, mainly this line in api.log:

2024/03/12 12:03:34 INFO: wazuh-wui 127.0.0.1 "GET /manager/info" with parameters {} and body {} done in 10.107s: 500 
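One way to spot these slow requests without reading the whole log is to filter for anything taking more than about 5 seconds. The path /var/ossec/logs/api.log is the Wazuh default; the here-document below reproduces the log line above plus a fast request for contrast, so the filter can be exercised offline:

```shell
# On a live manager, run the same pattern against the real log:
#   grep -E 'done in ([5-9]|[0-9]{2,})\.[0-9]+s' /var/ossec/logs/api.log
# The regex matches durations of 5 seconds or more ("10.107s" matches, "0.042s" does not).
grep -E 'done in ([5-9]|[0-9]{2,})\.[0-9]+s' <<'EOF'
2024/03/12 12:03:34 INFO: wazuh-wui 127.0.0.1 "GET /manager/info" with parameters {} and body {} done in 10.107s: 500
2024/03/12 12:03:40 INFO: wazuh-wui 127.0.0.1 "GET /agents" with parameters {} and body {} done in 0.042s: 200
EOF
```

Only the 10.107 s request is printed; the 0.042 s one is filtered out.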

Furthermore, several times it is not possible to access any Wazuh module (although the OpenSearch modules remain operational); the page shows "Wazuh API seems to be down" together with error 3099. Through curl, however, the API is reachable and responds quickly.
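For reference, this is roughly how the API can be checked via curl. Port 55000 and the wazuh-wui user are Wazuh API defaults, and the JSON sample below is illustrative rather than captured output, so the check runs offline:

```shell
# On a live manager (substitute real credentials):
#   TOKEN=$(curl -sku wazuh-wui:<PASSWORD> -X POST \
#       "https://localhost:55000/security/user/authenticate?raw=true")
#   curl -sk -H "Authorization: Bearer $TOKEN" "https://localhost:55000/manager/info"
#
# A healthy reply is JSON containing an "error": 0 field. The sample stands in
# for live output here.
response='{"data":{"title":"Wazuh API REST","hostname":"node-1"},"error":0}'
if echo "$response" | grep -q '"error":0'; then
  echo "API responding"
else
  echo "API error"
fi
```

If curl succeeds while the dashboard reports the API as down, the problem is usually between the dashboard and the API rather than the API itself.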

Jorge Eduardo Molas

Mar 13, 2024, 6:29:43 AM
to Wazuh | Mailing List
Hello Arthur, could you share with me the type of Wazuh deployment you have?
This info:
- Wazuh version:
- Deployment type: (all-in-one | OVA | Docker)
- Platform (OS):

On the other hand, I am interested in the 3099 errors, which indicate errors in the Wazuh manager. Can you send me the complete log?

This could be related to the wazuh-modulesd daemon on the Wazuh server being stopped for some reason.
You can check the status of wazuh-modulesd on the Wazuh server with:
/var/ossec/bin/wazuh-control status | grep "wazuh-modulesd"
Make sure the Wazuh server is running correctly. If there is a problem with wazuh-modulesd, debug it on the Wazuh server.
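The same check can be widened to flag any stopped daemon, not just wazuh-modulesd. A small sketch, with illustrative wazuh-control output standing in for a live server (on a real manager, pipe /var/ossec/bin/wazuh-control status into the same filter):

```shell
# Flag every Wazuh daemon reported as stopped. On a live manager:
#   /var/ossec/bin/wazuh-control status | grep "not running"
# The sample output below is illustrative.
grep "not running" <<'EOF'
wazuh-clusterd not running...
wazuh-modulesd is running...
wazuh-monitord is running...
wazuh-remoted is running...
wazuh-analysisd is running...
wazuh-apid is running...
EOF
```

Any line this prints names a daemon worth investigating before digging into API errors.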

Arthur Henrique Oliveira Aparício

Mar 13, 2024, 7:29:43 AM
to Wazuh | Mailing List
Hi. We discovered it was a network issue. For some reason the pfSenses kept sending logs but Wazuh stopped reading them, which made the network where the server is located unstable.

Just for the record, I updated all Wazuh installations to the latest version; they are all-in-one deployments on AlmaLinux 9.3 (even though it's not one of the officially listed platforms, I can say it works normally, as if it were a RHEL release). I'll also note the command to check modules. I'm posting some logs below so that if anyone finds this topic, they may help identify error 3099.

Command: cat /var/log/filebeat/filebeat | grep -i -E "error|warn"
Log: ERROR   [publisher_pipeline_output]     pipeline/output.go:154  Failed to connect to backoff(elasticsearch(https://127.0.0.1:9200)): Connection marked as failed because the onConnect callback failed: 1 error: Error loading pipeline for fileset wazuh/alerts: couldn't load pipeline: couldn't load json. Error: 503 Service Unavailable: {"error":{"root_cause":[{"type":"cluster_manager_not_discovered_exception","reason":null}],"type":"cluster_manager_not_discovered_exception","reason":null},"status":503}. Response body: {"error":{"root_cause":[{"type":"cluster_manager_not_discovered_exception","reason":null}],"type":"cluster_manager_not_discovered_exception","reason":null},"status":503}
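The cluster_manager_not_discovered_exception above means the indexer cluster had no elected manager node at that moment, so Filebeat could not load its pipeline. One way to confirm indexer health; port 9200 is the Wazuh indexer default, and the JSON sample below is illustrative, not captured output:

```shell
# On a live indexer (substitute real credentials):
#   curl -sk -u admin:<PASSWORD> "https://localhost:9200/_cluster/health?pretty"
# A discovered, healthy cluster reports "status":"green" (or "yellow" on a
# single node holding unassigned replicas). The sample stands in for live output.
health='{"cluster_name":"wazuh-indexer-cluster","status":"green","number_of_nodes":1}'
echo "$health" | grep -o '"status":"[a-z]*"'
```

A "red" status, or no response at all, matches the 503 that Filebeat logged here.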

Unfortunately, the other logs have already been overwritten, but they all pointed to API timeouts. I still have some records, though I don't remember exactly which files they came from:

AxiosError: Wazuh API error: ERR_BAD_RESPONSE - Timeout executing API request at settle (https://150.163.73.132/47202/bundles/plugin/wazuh/wazuh.plugin.js:8:20234) at XMLHttpRequest.onloadend (https://150.163.73.132/47202/bundles/plugin/wazuh/wazuh.plugin.js:8:25708)

Error: 3002 - Request failed with status code 500 at Function._callee$

illegal_argument_exception

So again, thanks for the help. I learned where to check for important information, and it even helped me plan an alternative for the network logs.