Hi team!
I have an issue similar to the one described in this message.
The agents apparently register successfully, but afterwards they keep retrying to connect:
2023/05/03 13:30:18 wazuh-agentd: INFO: Trying to connect to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:30:28 wazuh-agentd: INFO: Closing connection to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:30:28 wazuh-agentd: INFO: Trying to connect to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:30:42 wazuh-agentd: INFO: Closing connection to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:30:42 wazuh-agentd: INFO: Trying to connect to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:30:58 wazuh-agentd: INFO: Closing connection to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:30:58 wazuh-agentd: INFO: Trying to connect to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:31:01 wazuh-agentd: INFO: Requesting a key from server: 10.30.1.243
2023/05/03 13:31:01 wazuh-agentd: INFO: No authentication password provided
2023/05/03 13:31:01 wazuh-agentd: INFO: Using agent name as: myhost
2023/05/03 13:31:01 wazuh-agentd: INFO: Waiting for server reply
2023/05/03 13:31:01 wazuh-agentd: ERROR: Duplicate agent name: myhost (from manager)
2023/05/03 13:31:01 wazuh-agentd: ERROR: Unable to add agent (from manager)
2023/05/03 13:31:11 wazuh-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: '10.30.1.243'.
2023/05/03 13:31:11 wazuh-agentd: WARNING: Unable to connect to any server.
2023/05/03 13:31:11 wazuh-agentd: INFO: Closing connection to server ([10.30.1.243]:1514/tcp).
2023/05/03 13:31:11 wazuh-agentd: INFO: Trying to connect to server ([10.30.1.243]:1514/tcp)
Before jumping to conclusions about the network: yes, ports 1514, 1515 and 55000 are reachable from the agent. I tested with and without the firewall (firewalld) enabled, using netcat and telnet, and everything is OK.
This is set up on a Docker Swarm, and all ports are accessible from all nodes, since the swarm mesh network takes care of that. Different services of the swarm run on different nodes, but that should not matter because, as said, all ports are published on all nodes.
I am using a load balancer so that the agents contact a single IP and, just for testing, another floating IP with keepalived. Both exhibit the same behaviour for the same machines.
In the message I referenced above, one of the issues was the non-standard naming of the services, since the dot in the name is not accepted by Docker Compose (wazuh.manager was renamed to wazuhmanager, for example). This is not the case for me, as I deployed this manually and figured that out early on (I already reported it on GitHub
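For reference, a minimal Python sketch roughly equivalent to the netcat/telnet tests. The function name check_ports and the 3-second timeout are only illustrative; this is not the exact command I ran. Note that a successful connect only shows that something accepts the TCP connection on that port (here, the load balancer), not that the manager behind it is reachable.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection performs a full TCP handshake, like `nc -z`.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and "No route to host".
            results[port] = False
    return results
```

From the agent this would be called as check_ports("10.30.1.243", [1514, 1515, 55000]).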
here)
This is what the log displays on the cluster manager itself:
2023/05/03 13:55:02 wazuh-authd: INFO: New connection from 10.0.0.2
2023/05/03 13:55:02 wazuh-authd: INFO: Received request for a new agent (anotherhost) from: 10.0.0.2
2023/05/03 13:55:02 wazuh-authd: INFO: Agent key generated for 'anotherhost' (requested by any)
2023/05/03 13:55:05 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.
2023/05/03 13:55:05 wazuh-remoted: INFO: (1410): Reading authentication keys file.
Nothing else is displayed on the manager.
I suspect the issue is in the nginx service (the one that listens on 1514):
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:22 [warn] 30#30: *972 upstream server temporarily disabled while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "10.0.8.11:1514", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:24 [error] 30#30: *975 no live upstreams while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "mycluster", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:35 [error] 30#30: *976 connect() failed (113: No route to host) while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "10.0.8.20:1514", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:35 [warn] 30#30: *976 upstream server temporarily disabled while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "10.0.8.20:1514", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:35 [error] 30#30: *976 no live upstreams while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "mycluster", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:38 [error] 30#30: *978 connect() failed (113: No route to host) while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "10.0.8.11:1514", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:38 [warn] 30#30: *978 upstream server temporarily disabled while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "10.0.8.11:1514", bytes from/to client:0/0, bytes from/to upstream:0/0
wazuh_nginx.1.y469rcidrlz8@myhostxxx | 2023/05/03 13:56:38 [error] 30#30: *978 no live upstreams while connecting to upstream, client: 10.0.0.2, server: 0.0.0.0:1514, upstream: "mycluster", bytes from/to client:0/0, bytes from/to upstream:0/0
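To make the pattern in these nginx lines easier to see, here is a small Python sketch that tallies the failures per upstream. The regular expression is an assumption based only on the lines quoted above, and summarize_upstream_errors is a hypothetical helper name, so it may need adjusting for other nginx log formats.

```python
import re
from collections import Counter

# Pulls the error message and the upstream address out of nginx log lines
# shaped like the ones quoted above.
LINE_RE = re.compile(
    r'\[(?P<level>warn|error)\] \d+#\d+: \*\d+ (?P<message>.+?) '
    r'while connecting to upstream,.*?upstream: "(?P<upstream>[^"]+)"'
)


def summarize_upstream_errors(lines):
    """Count (upstream, message) pairs found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("upstream"), match.group("message"))] += 1
    return counts
```

Feeding it the output of the nginx service log (one line per list item) groups the "No route to host" and "no live upstreams" entries by upstream address, which shows whether every manager IP in the upstream block is failing or only some of them.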
Thanks in advance!