Agents disconnected after cluster implementation

Stefano Serano

Feb 17, 2020, 9:41:56 AM
to Wazuh mailing list
Hi all,
I recently added a second node and a load balancer (LB) to production.
To avoid changing all my agent configurations, I moved my master's IP to the LB, so the LB is now exposed to the network and sends the logs to my 2 nodes correctly, except for a portion of my agents.

50 of my 250 agents are now unable to contact the servers. I checked the agent logs and here is what they say:

2020/02/17 15:37:26 ossec-agent: WARNING: (4101): Waiting for server reply (not started). Tried: '1**.***.**5'.
2020/02/17 15:37:37 ossec-agent: INFO: Trying to connect to server (1**.***.**5:1514/udp).


I don't know how to solve the problem, or why all the other agents behave as if nothing changed for them.

Please tell me what you need to help me solve this problem.

Have a nice day.

Antonio PV

Feb 17, 2020, 10:02:17 AM
to Wazuh mailing list
Hi Stefano,

Just to check that I understood properly: you now have two managers and a LB, right?

You said the LB IP is the same as the manager IP. Which manager? I think the problem is that the manager receives the agent request with the right ID and key, but the source IP it sees is the LB IP, not the agent IP. To avoid this problem you have to register your agents with ip=any.

To do that, set the <use_source_ip> tag to no on your manager, and then register the agents again.

To check whether this is the case, please look at the ossec.log of the managers and see whether the manager says that it doesn't recognise those agents or that they are not allowed.

It would help if you pasted the manager logs, both ossec.log and cluster.log, to see what's going on.

I hope this information is helpful to you. Regards.
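
For readers following along: the setting mentioned above lives in the <auth> section of the manager's ossec.conf. A minimal sketch, assuming the default path /var/ossec/etc/ossec.conf and leaving every other <auth> option as it is:

```xml
<!-- /var/ossec/etc/ossec.conf on the manager (assumed default path) -->
<ossec_config>
  <auth>
    <!-- no: agents enrolled through the registration service are stored
         with IP "any" instead of the source address the manager sees -->
    <use_source_ip>no</use_source_ip>
  </auth>
</ossec_config>
```

As the thread goes on to note, agents enrolled before this change keep whatever IP is already stored for them until they are registered again.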

Stefano Serano

Feb 17, 2020, 10:29:48 AM
to Antonio PV, Wazuh mailing list
Hi Antonio,
Thanks for your fast reply.
Here is my configuration on the local network:
Node 1: Wazuh-Master (10.0.0.7) -> Cluster Manager
Node 2: Wazuh-Node2 (10.0.0.6)
Node 3: Nginx Load Balancer (10.0.0.8)

I changed <use_source_ip> to "no" on nodes 1 and 2 and restarted using "./ossec-control restart". Here is a portion of the ossec.log:

2020/02/17 16:18:01 ossec-remoted[37468] secure.c:329 at HandleSecureMessage(): WARNING: (1408): Invalid ID 216 for the source ip: '10.0.0.8' (name 'unknown').
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-integratord[37401] integrator.c:131 at OS_IntegratorD(): DEBUG: sending new alert.
2020/02/17 16:18:02 ossec-remoted[37468] secure.c:344 at HandleSecureMessage(): WARNING: (1213): Message from '10.0.0.8' not allowed. Cannot find the ID of the agnt. Source agent ID is unknown.



I tried to re-register some agents by deleting and recreating them on the manager, but I still get the same error on the Cluster Manager node.

Have a nice day.



--
You received this message because you are subscribed to a topic in the Google Groups "Wazuh mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/wazuh/N1usYy2lTYA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to wazuh+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wazuh/cb10f5d2-5222-497e-bdb2-3fe806c3b668%40googlegroups.com.

Antonio PV

Feb 17, 2020, 11:03:14 AM
to Wazuh mailing list
Hi Stefano,
After changing the configuration, it would be good to restart all the manager services: `systemctl restart wazuh-manager`

Also, you have integratord in debug mode; make sure you set it back to normal after you finish your tests.

The agents that are disconnected and can't connect to the manager, are they registered as any? You can check that in your client.keys file, /var/ossec/etc/client.keys.

If they are still registered with the previous IP, you have to register them again. You can do that with the binary: '/var/ossec/bin/agent-auth -m "IP-manager"'. Depending on your configuration, you can put your LB IP there.

After that restart the agent and it should work properly.

Let me know if that works for you. Regards.

Stefano Serano

Feb 17, 2020, 11:14:40 AM
to Antonio PV, Wazuh mailing list
Hi Antonio,

The IP the agents point to never changed; I only moved that IP from the manager to the LB machine.

Before LB implementation: 

Manager :
Public Network: 152.15.22.3
Local Network: 10.0.0.7 

LB:
Public Network: none
Local Network: 10.0.0.8


After LB Implementation


Manager :
Public Network: 152.15.22.X(to reach kibana console)
Local Network: 10.0.0.7

LB:
Public Network: 152.15.22.3
Local Network: 10.0.0.8



Do I still have to run this command: '/var/ossec/bin/agent-auth -m "IP-manager"'?
Anyway, I already tried to delete and recreate some agents using the "any" option, but after a few minutes in "Pending" status they still change to "Disconnected".

Have a nice day.






Antonio PV

Feb 17, 2020, 11:25:25 AM
to Wazuh mailing list
Hi Stefano.

'2020/02/17 16:18:02 ossec-remoted[37468] secure.c:344 at HandleSecureMessage(): WARNING: (1213): Message from '10.0.0.8' not allowed. Cannot find the ID of the agnt. Source agent ID is unknown.'

That line in your logs means that the manager doesn't allow that agent to connect because that IP is not related to any ID in its client.keys.

Can you check client.keys on the manager side and on one of your agents, and verify that the ID and the key are the same in both? If they are, check whether both the manager and the agent have the IP set to any. If they don't, then you really have to run that binary to register the agent again; until you do that, the agent won't appear with ip=any in client.keys.

Can you paste the configuration of your LB? Just to check whether registration also goes through it or directly to the manager.

It's not necessary to delete any agent; registering it again should be enough.

If you can share more logs about one specific agent that is disconnecting, that would be very helpful. Regards

Stefano Serano

Feb 19, 2020, 2:55:46 AM
to Antonio PV, Wazuh mailing list
Hi Antonio.

Here is my LB Nginx configuration:

stream {
    upstream master {
        server 10.0.0.7:1515;
    }
    upstream mycluster {
        hash $remote_addr consistent;
        server 10.0.0.7:1514;
        server 10.0.0.6:1514;
    }
    server {
        listen 1515 udp;
        proxy_pass master;
    }
    server {
        listen 1514 udp;
        proxy_pass mycluster;
    }
}

I tried to run the binary from a client:
C:\Program Files (x86)\ossec-agent>agent-auth.exe -m my.public.ip
2020/02/19 08:45:43 agent-auth: INFO: Started (pid: 15120).
2020/02/19 08:45:43 agent-auth: INFO: No authentication password provided.
2020/02/19 08:45:44 agent-auth: ERROR: Unable to connect to my.public.ip:1515


What I really don't understand is why only about 60 clients out of 200 in total are unable to connect to the master. Anyway, I already tried to delete and reconfigure some clients with the any configuration. Inside client.keys the "any" setting is correct on both the client and the master.





Stefano Serano

Feb 19, 2020, 2:58:19 AM
to Antonio PV, Wazuh mailing list
Here is some output from ossec.log about some agents that are unable to connect:
 manager.c:124 at save_controlmsg(): DEBUG: Agent -EXCHANGE15 sent HC_STARTUP from 10.0.0.8.
2020/02/19 08:54:44 wazuh-modulesd:database[53122] wm_database.c:545 at wm_sync_agentinfo(): DEBUG: Empty file '/var/ossec/queue/agent-info/EXCHANGE15-any'. Agent is pending.
2020/02/19 08:54:44 ossec-remoted[53099] manager.c:124 at save_controlmsg(): DEBUG: Agent GIMATIC-LINUX sent HC_STARTUP from 10.0.0.8.
2020/02/19 08:54:44 wazuh-modulesd:database[53122] wm_database.c:545 at wm_sync_agentinfo(): DEBUG: Empty file '/var/ossec/queue/agent-info/GIMATIC-LINUX-any'. Agent is pending.

Antonio PV

Feb 19, 2020, 11:28:55 AM
to Wazuh mailing list
Hi Stefano,
I can see in your nginx configuration that you are listening on port 1515 over UDP.

Agent registration is always done over TCP, so registration uses port 1515 TCP. You must change that and restart nginx and the agent, then try to register it again.

That's why you get that "Unable to connect" log.

After that, agent-manager communication uses port 1514 UDP, unless you change it to TCP. I think that's why some agents are able to send information to the manager: they were already registered.

Try that and let me know if it helps.
Regards.
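
A sketch of how the stream block posted earlier in the thread might look with this change applied, keeping the same upstream names and addresses. In the nginx stream module a listen directive without the udp parameter accepts TCP, so only the registration listener is different; treat this as an illustration rather than a tested configuration:

```nginx
stream {
    # Registration service on the master (used by agent-auth), TCP only
    upstream master {
        server 10.0.0.7:1515;
    }
    # Agent event traffic, balanced across both nodes
    upstream mycluster {
        hash $remote_addr consistent;
        server 10.0.0.7:1514;
        server 10.0.0.6:1514;
    }
    server {
        listen 1515;          # no "udp" parameter: TCP listener
        proxy_pass master;
    }
    server {
        listen 1514 udp;      # unchanged: agents still report over UDP here
        proxy_pass mycluster;
    }
}
```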

Stefano Serano

Feb 19, 2020, 11:57:23 AM
to Antonio PV, Wazuh mailing list
Thanks a lot, Antonio.
I've made a little step forward, but unfortunately there is still a problem during authentication:


2020/02/19 17:54:30 agent-auth: ERROR: SSL error (5). Connection refused by the manager. Maybe the port specified is incorrect. Exiting.

Let me know.
Have a nice day


Antonio PV

Feb 19, 2020, 12:02:21 PM
to Wazuh mailing list
Hi Stefano,
Are we sure you can telnet to port 1515 TCP from another host and that the port is open?

The problem now seems to be a connectivity issue.

If that's not the case, please let me know. It would also be nice if you could provide more information from those logs, on both the agent and manager sides.

Regards.

Stefano Serano

Feb 24, 2020, 8:10:27 AM
to Antonio PV, Wazuh mailing list
Hi Antonio.
Sorry for the delay. Here is a recap of the situation:

1- I'm now able to authenticate the clients over port 1515 TCP; the first time I made an error in the configuration.
2- The connection problem over port 1514 is still there for some agents. Here are some logs:

AGENT SIDE:
2020/02/24 13:56:49 ossec-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: '151.1.210.45'.
2020/02/24 13:57:00 ossec-agentd: INFO: Trying to connect to server (151.1.210.45:1514/udp).

MANAGER SIDE:
2020/02/24 13:50:37 ossec-remoted[45826] manager.c:124 at save_controlmsg(): DEBUG: Agent SCORPIO sent HC_STARTUP from 10.0.0.8.
2020/02/24 13:50:37 wazuh-modulesd:database[45851] wm_database.c:545 at wm_sync_agentinfo(): DEBUG: Empty file '/var/ossec/queue/agent-info/SCORPIO-any'. Agent is pending.

NGINX SIDE:
I had this error:
2020/02/24 13:49:55 [alert] 17255#0: *1776 1024 worker_connections are not enough while connecting to upstream, udp client: IPADDRESS, server: 0.0.0.0:1514, upstream: "10.0.0.6:1514", bytes from/to client:107/0, bytes from/to upstream:0/0

I searched the internet for this error and increased the worker_connections value in the events section of nginx.conf to 5000. The error no longer seems to happen, but the agents are still not connecting.

There is NO firewall between the disconnected agents and the LB.

Have a nice day.
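
For anyone hitting the same alert, the change described above would look roughly like this in nginx.conf. The value 5000 comes from the message; the placement shown is an assumption about a typical layout:

```nginx
# nginx.conf, top level (outside the stream block)
events {
    # Maximum simultaneous connections per worker process. The limit counts
    # connections to upstream servers as well as connections from clients.
    worker_connections 5000;
}
```

As the message says, raising the limit silenced the alert but did not bring the agents back, so it should not be read as the fix for the disconnections.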




Stefano Serano

Feb 24, 2020, 10:15:21 AM
to Antonio PV, Wazuh mailing list
Hi Antonio,
Another update: on the Wazuh manager I'm receiving a lot of these logs from disconnected agents:
2020/02/24 16:09:09 ossec-remoted[30935] manager.c:124 at save_controlmsg(): DEBUG: Agent AGENTNAME sent HC_STARTUP from 10.0.0.8.
2020/02/24 16:09:09 wazuh-modulesd:database[30959] wm_database.c:545 at wm_sync_agentinfo(): DEBUG: Empty file '/var/ossec/queue/agent-info/AGENTNAME-any'. Agent is pending.

Hope this helps.

Antonio PV

Feb 24, 2020, 3:00:36 PM
to Wazuh mailing list

Hi Stefano.

That means the agent is pending. The link below explains exactly what this means:

https://documentation.wazuh.com/3.11/user-manual/agents/agent-life-cycle.html#agent-status

You can check the status of the disconnected agents with the following command:

/var/ossec/bin/agent_control -l

If the state is pending, the agent was registered but the manager hasn't received any further information from it. This usually comes from firewall problems or something similar.

You can try to register the agent again to see whether it starts working. If that doesn't work, I guess you have a connectivity issue; perhaps your proxy is truncating the information from the agents. You can point one of your disconnected agents directly at your manager, register it that way, and see whether it works without the proxy. If it does, then your proxy is truncating the information.

Let me know if the new registration worked for you.


Regards.


Antonio PV

Feb 25, 2020, 9:25:29 AM
to Wazuh mailing list
Hi Stefano,

Did you also check whether they are actually in pending status?

After pointing the agent at the manager, if you point it back at the proxy and register it again, is it still pending?

You can try that to see if it works. It might be that once an agent is pending it won't work unless you register it again to force it to restart the connection with the manager.

Let me know if you have any news.

Regards.


> Hi Antonio.
> You're right, if I point the agent directly at the manager it is able to connect.
> But what drives me crazy is that all the agents that are unable to connect are on the same local network. From outside, all the agents are able to connect.

Antonio PV

Feb 25, 2020, 1:42:28 PM
to Wazuh mailing list
Hi Stefano,

Can you check whether traffic from the disconnected agents reaches the manager?

You can use the tcpdump command to check whether traffic is arriving at the manager:
tcpdump -i "interface" udp port 1514 and src host "agent_ip"

You can find the interface that receives the traffic with the ifconfig command.

You can also run the same check on the LB. I guess the traffic reaches it with no problem, but it is better to make sure it also reaches the manager.

Also, I would compare the ossec.conf of one agent that is working with one that is not, to look for differences.

Can you tell me whether all the disconnected agents point to the same managers and whether the managers have the same configuration? Check the protocol on the agents and on the manager side to see whether both use TCP or UDP; remember that you have to enable the same protocol on the manager and the agent.

Regards.
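
The two settings to compare are the <protocol> values in the manager's <remote> block and in the agent's <client> block. A sketch of the relevant fragments, assuming a Wazuh 3.x layout; MANAGER_OR_LB_IP is a placeholder, and in a real ossec.conf each block sits inside <ossec_config>:

```xml
<!-- Manager and worker: /var/ossec/etc/ossec.conf -->
<remote>
  <connection>secure</connection>
  <port>1514</port>
  <protocol>udp</protocol>
</remote>

<!-- Agent: ossec.conf -->
<client>
  <server>
    <address>MANAGER_OR_LB_IP</address>
    <port>1514</port>
    <protocol>udp</protocol>
  </server>
</client>
```

Whatever value is chosen, it has to match on both sides, and the load balancer in between has to listen on the same protocol.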
 

> Hi Antonio
> After I point the agents back at the proxy and try to register again, they are still in pending status.
>
> On the agent the logs are still the same:
> 2020/02/24 13:56:49 ossec-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: 'LB IP'.
> 2020/02/24 13:57:00 ossec-agentd: INFO: Trying to connect to server (LB IP:1514/udp).
>
> Have a nice day

Stefano Serano

Feb 26, 2020, 3:19:23 AM
to Antonio PV, Wazuh mailing list
Hi Antonio.
I really appreciate your help.
As you said, tcpdump on the LB shows connections from the disconnected agents, but I see nothing on the manager.

I found no particular difference when comparing the agents' config files, except for the CIS policies, which are disabled on the disconnected agents.

The protocol on the Wazuh manager and the Wazuh worker is the same:
Port: 1514
Protocol: udp

Maybe nginx has a problem forwarding data received from hosts in the same network? All the disconnected agents are in the same network.
I've tried opening a ticket through the nginx mailing list.

Have a nice day.





Antonio PV

Feb 26, 2020, 11:38:08 AM
to Wazuh mailing list
Hi Stefano,

If the ossec.conf configuration on the managers and the agents is right, then I think we can blame Nginx for not working properly.

Unfortunately, without more information about your system, there is nothing I can do. I hope you understand.

If you have any updates let me know.

Regards.

Stefano Serano

Feb 26, 2020, 11:46:28 AM
to Antonio PV, Wazuh mailing list
Hi Antonio.
No problem.
I'll try to find a solution myself.

Thanks anyway for the support, have a nice day.


Stefano Serano

Feb 27, 2020, 9:15:07 AM
to Antonio PV, Wazuh mailing list
Hi Antonio.
I've figured out that the problem is the UDP protocol.
After switching port 1514 to TCP on the Wazuh manager, the worker, and nginx, the agents are able to connect.

Now my question is: is there a way to distribute the protocol change to all agents, or do I need to make this change manually on each one?
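
A sketch of what the nginx side of this change might look like, based on the stream block posted earlier in the thread. Removing the udp parameter makes the listener TCP. This is an illustration, not the exact configuration used; the <protocol> value in ossec.conf on the managers and agents would need to be tcp as well, since both sides must use the same protocol:

```nginx
stream {
    upstream mycluster {
        hash $remote_addr consistent;
        server 10.0.0.7:1514;
        server 10.0.0.6:1514;
    }
    server {
        listen 1514;          # TCP: the "udp" parameter is removed
        proxy_pass mycluster;
    }
}
```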

Antonio PV

Feb 27, 2020, 9:26:53 AM
to Wazuh mailing list
Hi Stefano,

We have agent.conf to push configuration to all the agents.

But in this case the connection settings are not among the sections it covers, so I'm afraid you must go one by one, my friend.

I'm sorry for the inconvenience.

Here is the configuration you can centralize:

https://documentation.wazuh.com/3.11/user-manual/reference/centralized-configuration.html

Regards.

Daniel Ruiz

Mar 26, 2020, 5:58:57 AM
to Wazuh mailing list
Hi,

I was reading this post and just wanted to add some useful information for anyone who lands in this thread:
I hope it helps.

Regards.

José Raeiro

Sep 5, 2022, 10:40:05 AM
to Wazuh mailing list
> 1- I'm now able to authenticate the clients over port 1515 TCP; the first time I made an error in the configuration.

What error did you make? I ask because I'm facing the same issue as you did before you corrected the error in the configuration.

Thank you in advance!

Stefano Serano

Sep 5, 2022, 10:43:55 AM
to José Raeiro, Wazuh mailing list
Hi José.
I opened this thread in 2020 and decommissioned the cluster some months later, so I'm sorry, I can't help you.
Maybe it's better if you open a new request.

Have a nice day.

José Raeiro

Sep 5, 2022, 1:13:32 PM
to Stefano Serano, Wazuh mailing list
Hi Stefano,

Of course. I solved it using the API enrollment approach. It worked! Thanks anyway!

Kind Regards,

José Raeiro