Agents get disconnected after a few hours (v3.9)


Shubham Shrivastav

Oct 16, 2019, 3:29:02 PM
to Wazuh mailing list
Hi all,

I can't find any errors or warnings in the Wazuh manager or agent log files:

Agent and Manager:

cat /var/ossec/logs/ossec.log

However, after the last keep-alive signal, the wazuh-clusterd daemon stopped writing to its log (cat /var/ossec/logs/cluster.log).

Still, I could not find any errors with:

cat /var/ossec/logs/cluster.log | grep -i -E "(error|warning|critical)"


I ran /var/ossec/bin/ossec-control status
and found that the clusterd process had been removed:

wazuh-clusterd: Process 11734 not used by Wazuh, removing...

wazuh-clusterd not running...

wazuh-modulesd is running...

ossec-monitord is running...

ossec-logcollector is running...

ossec-remoted is running...

ossec-syscheckd is running...

ossec-analysisd is running...

ossec-maild not running...

ossec-execd is running...

wazuh-db is running...

ossec-authd not running...

ossec-agentlessd not running...

ossec-integratord not running...

ossec-dbd not running...

ossec-csyslogd not running...



Restarting the manager solved the problem and the agents reconnected, but I still can't figure out why this daemon stops on its own.
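
As a rough illustration of reading the status output above programmatically, the sketch below pulls out the daemons reported as not running. The function name stopped_daemons and the regular expression are assumptions made for this example and are not part of Wazuh. Some daemons in the list (for example ossec-maild) may be off simply because they are not configured, so the result still has to be compared against what is expected to run.

```python
import re

# Matches lines such as "ossec-remoted is running..." or
# "wazuh-clusterd not running...", as they appear in the status output.
# Lines like "wazuh-clusterd: Process 11734 not used by Wazuh, removing..."
# are deliberately not matched.
STATUS_LINE = re.compile(r"^(?P<daemon>[\w-]+)(?: is running| not running)\.\.\.$")


def stopped_daemons(status_output: str) -> list[str]:
    """Return the daemons that the status output reports as not running."""
    stopped = []
    for line in status_output.splitlines():
        line = line.strip()
        match = STATUS_LINE.match(line)
        if match and line.endswith("not running..."):
            stopped.append(match.group("daemon"))
    return stopped
```

The text returned by /var/ossec/bin/ossec-control status could be passed to stopped_daemons, and an alert raised if wazuh-clusterd shows up in the result.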


Daniel Ruiz

Oct 17, 2019, 3:00:21 AM
to Wazuh mailing list
Hi Shubham,

I'm sorry to read that.

It is very strange that the cluster daemon stops without any error in its log. I can confirm that the cluster service shouldn't stop on its own unless an exception occurs or it is killed manually.
Is this a recurring problem, or did it happen just once? On which nodes did the clusterd service stop? Did you perhaps try to run the cluster manually with /var/ossec/bin/wazuh-clusterd?

In order to do some troubleshooting, I would need the following:
  • Exact version of your Wazuh managers in the cluster (all should be the same version).
  • The cluster.log file of your master and workers at the moment clusterd stopped. If the daemon stops repeatedly, you could enable debug mode to get more information in the log. To do this, just add wazuh_clusterd.debug=0
    to the /var/ossec/etc/local_internal_options.conf file.
  • Your cluster configuration in master and workers (ossec.conf).
  • Your load balancer configuration if any.

Sorry for the inconvenience.

Regards,

Shubham Shrivastav

Oct 17, 2019, 4:01:47 PM
to Wazuh mailing list
Hey,

- clusterd stops every 10 hours.
- I have a Wazuh manager master node and a Wazuh manager worker node. clusterd stops on the worker node; it works fine on the master node.
- I did not run clusterd manually. I restarted it using service wazuh-manager restart.

- Both Wazuh managers run v3.9.0.
- The daemon stopped and the agents disconnected at 15:17 (I restarted the wazuh-manager service at 17:40).
clusterd logs:
Master:

2019/10/17 15:30:03 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:03 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:03 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:03 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:30:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:30:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:30:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:30:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:30:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:30:21 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:21 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:21 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:21 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:30:26 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:30:26 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/17 15:31:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Main] Disconnected.

2019/10/17 15:31:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Main] Cancelling pending tasks.

2019/10/17 17:13:48 wazuh-clusterd: INFO: [Local 909011] [Main] Connection received in local server.

2019/10/17 17:13:49 wazuh-clusterd: INFO: [Local 909011] [Main] Disconnected.

2019/10/17 17:13:53 wazuh-clusterd: INFO: [Local 483341] [Main] Connection received in local server.

2019/10/17 17:13:53 wazuh-clusterd: INFO: [Local 483341] [Main] Disconnected.


Worker:

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:30:30 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:30:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:30:39 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:30:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:30:48 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:30:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:30:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:31:03 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Keep Alive] Sucessful response from master: keepalive

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:31:06 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:31:15 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:31:16 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:31:25 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:31:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:31:34 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Starting to send agent status files

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Permission to synchronize granted.

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Compressing files

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Sending compressed file to master

2019/10/17 15:31:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Worker files sent to master.

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Permission to synchronize granted.

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Compressing files

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Sending compressed file to master

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Worker files sent to master.

2019/10/17 15:31:43 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] The master has verified that the integrity is right.

2019/10/17 17:40:07 wazuh-clusterd: INFO: [Local Server] [Main] Serving on /var/ossec/queue/cluster/c-internal.sock

2019/10/17 17:40:08 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Main] Sucessfully connected to master.


- The load balancer config is copied as-is from the wazuh-kubernetes repo.
- Attached config files
ossec-manager-master.conf
ossec-manager-worker.conf
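
The worker log above goes quiet between 15:31:43 and 17:40:07. A minimal sketch for spotting such silent periods in a cluster.log file is shown below. It assumes every entry starts with a timestamp in the "2019/10/17 15:31:43" form seen in the logs; the function names and the five-minute default threshold are made up for this example.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M:%S"  # e.g. "2019/10/17 15:31:43"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:19], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def silent_periods(lines, threshold=timedelta(minutes=5)):
    """Yield (last_seen, next_seen) pairs where the log was quiet for longer than threshold."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # blank lines, traceback lines and similar
        if previous is not None and current - previous > threshold:
            yield previous, current
        previous = current
```

A possible way to use it is to open /var/ossec/logs/cluster.log, pass the file object to silent_periods, and print each pair to see when the daemon went quiet and when it came back.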

Daniel Ruiz

Oct 22, 2019, 3:46:39 AM
to Wazuh mailing list
Hi Shubham,

Sorry for the late response.

I made a mistake in my last response: to activate debug mode, you need to add the line wazuh_clusterd.debug=2. Sorry about that. Could you please send me the worker logs again, this time with debug information?
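
For anyone scripting this change, here is a small sketch of setting the option in the text of /var/ossec/etc/local_internal_options.conf. The helper set_internal_option is hypothetical and not part of Wazuh. It assumes the file consists of simple key=value lines and leaves commented lines untouched.

```python
def set_internal_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with key=value set, replacing any existing line for key."""
    new_line = f"{key}={value}"
    lines = config_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        # Compare only the part before the first "=" so commented-out
        # entries such as "# wazuh_clusterd.debug=0" are not modified.
        if line.split("=", 1)[0].strip() == key:
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

The returned text would be written back to the file. The manager presumably reads the option only at startup, so a restart (for example service wazuh-manager restart, as used earlier in this thread) is likely needed before debug lines appear in cluster.log.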

I would recommend using the latest Wazuh version. If you want to stay on 3.9.x, use v3.9.5, since it includes several fixes for the cluster. If you can afford to upgrade to the very latest release, you could use v3.10.2, which includes some useful new features.

I assume you are running in a Kubernetes environment. If so, check that the worker has enough memory, and check the system logs in case there is an out-of-memory error. I would check disk space too.
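
The two checks suggested above could be sketched roughly as follows. The function names and the marker strings are assumptions for this example: kernel out-of-memory messages usually contain "Out of memory", but where they are logged differs between systems (dmesg, the journal, or a file under /var/log), and in Kubernetes a container that was killed for memory is normally reported by the cluster itself, so treat this as a starting point only.

```python
import shutil


def free_space_ratio(path: str = "/var/ossec") -> float:
    """Return the fraction of free disk space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def oom_lines(log_lines):
    """Return the lines that look like kernel out-of-memory events."""
    # Assumed markers; adjust them to what the node's kernel actually logs.
    markers = ("out of memory", "oom-kill", "oom_kill")
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(marker in line.lower() for marker in markers)
    ]
```

free_space_ratio can be called on the manager to see whether the Wazuh installation directory is close to full, and oom_lines can be fed the lines of whichever system log is available on the node.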


I hope we find the root of the problem soon.

Regards,

unknown man

Oct 23, 2019, 6:11:55 AM
to Daniel Ruiz, Wazuh mailing list
Hello Daniel,

I was following the thread as I was also facing the same issue.

I have a Wazuh cluster running v3.9.5, and we see the issue reported here. In our case the interval is not 10 hours; it is irregular.
For now, we manually restart with ossec-control on the Wazuh manager and on the worker pods to reconnect the agents.

Regards,
Aravind

--
You received this message because you are subscribed to the Google Groups "Wazuh mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wazuh+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wazuh/37f069f2-0a3f-476b-9842-2e7af56d6b80%40googlegroups.com.

Daniel Ruiz

Oct 23, 2019, 6:58:27 AM
to Wazuh mailing list

Hi,

To debug the issue, I need the logs I requested in my previous messages and as much information as you can give about how your environment is configured.

Otherwise, it will be quite hard for me to do any troubleshooting. We have lots of environments running a Wazuh Cluster with Kubernetes without any interruption.

I look forward to receiving any further data that helps fix this issue.

Regards,

Shubham Shrivastav

Oct 23, 2019, 2:49:43 PM
to Wazuh mailing list
Hi Daniel,
Sorry for the delayed response.
I've upgraded my Wazuh cluster and stack to v3.9.5. The daemon stopped after 13 hours.
The logs are below. This time the worker nodes were working fine; clusterd crashed on the master node.

Master: /var/ossec/logs/cluster.log

2019/10/23 17:23:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/23 17:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'new_file'

2019/10/23 17:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_upd'

2019/10/23 17:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_end'

2019/10/23 17:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_a_w_m_e'

2019/10/23 17:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Agent info] Received file from worker: '/var/ossec/queue/cluster/wazuh-manager-worker-1/wazuh-manager-worker-1-1571851427.5509737-8248794260368851.zip'

2019/10/23 17:23:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_i_w_m_p'

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_i_w_m'

2019/10/23 17:23:49 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'new_file'

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_upd'

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_end'

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_i_w_m_e'

2019/10/23 17:23:49 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Integrity] Received file from worker: '/var/ossec/queue/cluster/wazuh-manager-worker-1/wazuh-manager-worker-1-1571851429.938105-014996066442838019.zip'

2019/10/23 17:23:49 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/23 17:23:49 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/23 17:23:50 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.

2019/10/23 17:23:54 wazuh-clusterd: DEBUG: [Master] [File integrity] Calculating

2019/10/23 17:23:54 wazuh-clusterd: DEBUG: [Master] [File integrity] Calculated.

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'sync_i_w_m_p'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'sync_i_w_m'

2019/10/23 17:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Waiting to receive zip file from worker

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'new_file'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'file_upd'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'file_end'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'sync_i_w_m_e'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Received file from worker: '/var/ossec/queue/cluster/wazuh-manager-worker-0/wazuh-manager-worker-0-1571851436.420233-3624107286434103.zip'

2019/10/23 17:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/23 17:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/23 17:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Finished integrity synchronization.

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'sync_a_w_m_p'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'sync_a_w_m'

2019/10/23 17:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Waiting to receive zip file from worker

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'new_file'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'file_upd'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'file_end'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: b'sync_a_w_m_e'

2019/10/23 17:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Agent info] Received file from worker: '/var/ossec/queue/cluster/wazuh-manager-worker-0/wazuh-manager-worker-0-1571851436.9350169-9797402420092962.zip'

2019/10/23 17:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/23 17:23:56 wazuh-clusterd: ERROR: [Worker wazuh-manager-worker-0] [Main] Error updating agent group/status (/var/ossec/queue/cluster/wazuh-manager-worker-0/vce-load-web-any): [Errno 2] No such file or directory: '/var/ossec/queue/agent-info/vce-load-web-any.tmp'

Traceback (most recent call last):

  File "/var/ossec/framework/python/lib/python3.7/shutil.py", line 563, in move

    os.rename(src, real_dst)

FileNotFoundError: [Errno 2] No such file or directory: '/var/ossec/queue/agent-info/vce-load-web-any.tmp' -> '/var/ossec/queue/agent-info/vce-load-web-any'


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/var/ossec/framework/python/lib/python3.7/site-packages/wazuh-3.9.5-py3.7.egg/wazuh/cluster/master.py", line 408, in update_file

    time=(mtime_epoch, mtime_epoch)

  File "/var/ossec/framework/python/lib/python3.7/site-packages/wazuh-3.9.5-py3.7.egg/wazuh/utils.py", line 365, in safe_move

    shutil.move(tmp_target, target, copy_function=shutil.copyfile)

  File "/var/ossec/framework/python/lib/python3.7/shutil.py", line 577, in move

    copy_function(src, real_dst)

  File "/var/ossec/framework/python/lib/python3.7/shutil.py", line 120, in copyfile

    with open(src, 'rb') as fsrc:

FileNotFoundError: [Errno 2] No such file or directory: '/var/ossec/queue/agent-info/vce-load-web-any.tmp'

2019/10/23 17:23:57 wazuh-clusterd: ERROR: [Worker wazuh-manager-worker-0] [Agent info] Errors updating worker files: /queue/agent-info/: 1

NoneType: None

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_a_w_m_p'

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_a_w_m'

2019/10/23 17:23:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Waiting to receive zip file from worker

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'new_file'

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_upd'

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_end'

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_a_w_m_e'

2019/10/23 17:23:57 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Agent info] Received file from worker: '/var/ossec/queue/cluster/wazuh-manager-worker-1/wazuh-manager-worker-1-1571851437.6211765-9242756052244206.zip'

2019/10/23 17:23:57 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Agent info] Analyzing worker files: Received 1 files to check.

2019/10/23 17:23:57 wazuh-clusterd: ERROR: [Worker wazuh-manager-worker-1] [Main] Error updating agent group/status (/var/ossec/queue/cluster/wazuh-manager-worker-1/vce-load-Weblb-any): [Errno 2] No such file or directory: '/var/ossec/queue/agent-info/vce-load-Weblb-any.tmp'

Traceback (most recent call last):

  File "/var/ossec/framework/python/lib/python3.7/shutil.py", line 563, in move

    os.rename(src, real_dst)

FileNotFoundError: [Errno 2] No such file or directory: '/var/ossec/queue/agent-info/vce-load-Weblb-any.tmp' -> '/var/ossec/queue/agent-info/vce-load-Weblb-any'


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/var/ossec/framework/python/lib/python3.7/site-packages/wazuh-3.9.5-py3.7.egg/wazuh/cluster/master.py", line 408, in update_file

    time=(mtime_epoch, mtime_epoch)

  File "/var/ossec/framework/python/lib/python3.7/site-packages/wazuh-3.9.5-py3.7.egg/wazuh/utils.py", line 365, in safe_move

    shutil.move(tmp_target, target, copy_function=shutil.copyfile)

  File "/var/ossec/framework/python/lib/python3.7/shutil.py", line 577, in move

    copy_function(src, real_dst)

  File "/var/ossec/framework/python/lib/python3.7/shutil.py", line 120, in copyfile

    with open(src, 'rb') as fsrc:

FileNotFoundError: [Errno 2] No such file or directory: '/var/ossec/queue/agent-info/vce-load-Weblb-any.tmp'

2019/10/23 17:23:57 wazuh-clusterd: ERROR: [Worker wazuh-manager-worker-1] [Agent info] Errors updating worker files: /queue/agent-info/: 1

NoneType: None

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_i_w_m_p'

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_i_w_m'

2019/10/23 17:23:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Waiting to receive zip file from worker

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'new_file'

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_upd'

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'file_end'

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Main] Command received: b'sync_i_w_m_e'

2019/10/23 17:23:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-1] [Integrity] Received file from worker: '/var/ossec/queue/cluster/wazuh-manager-worker-1/wazuh-manager-worker-1-1571851439.024201-055798795310555316.zip'

2019/10/23 17:23:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Received 15 files to check.

2019/10/23 17:23:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Analyzing worker integrity: Files checked. There are no KO files.

2019/10/23 17:23:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-1] [Integrity] Finished integrity synchronization.


And the logs stop after this

Thanks for actively replying,
S

Daniel Ruiz

Oct 24, 2019, 4:06:02 AM
to Wazuh mailing list
Hi Shubham,

That is a known issue: https://github.com/wazuh/wazuh/issues/4007. It is fixed in version 3.11, which is currently under testing.

However, if you would rather not wait for the new release, here are instructions for patching the source code yourself, provided you are comfortable doing so:
1. Upgrade to version 3.9.5 if you are not already on it.
2. On each node, assuming version 3.9.5, edit the file WAZUH_HOME/framework/python/lib/python3.7/site-packages/wazuh-3.9.5-py3.7.egg/wazuh/utils.py
3. Locate line 361:

    tmp_target = f"{target}.tmp"

and replace it with the following two lines:

    tmp_path, tmp_filename = path.split(target)
    tmp_target = path.join(tmp_path, f".{tmp_filename}.tmp")

Be careful not to alter the number of leading spaces.

4. Restart Wazuh on that node.

Although asking users to change the source code is not ideal, I assume you need the fastest fix for your problem and may not be able to wait for the 3.11 release. I apologize for the inconvenience.
Keep in mind that this fix will be lost if you upgrade to 3.10.x. In that case, you can repeat the instructions above or upgrade directly to 3.11, where the issue is already fixed.
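
To see what the change does before editing utils.py, here is a minimal sketch that compares the two temp-path expressions. The function names old_tmp_target and new_tmp_target are made up for illustration and are not part of the Wazuh code; only the two expressions are taken from the instructions. It assumes a POSIX system, where os.path joins with "/".

```python
from os import path


def old_tmp_target(target: str) -> str:
    # Expression found on line 361 of utils.py in 3.9.5.
    return f"{target}.tmp"


def new_tmp_target(target: str) -> str:
    # Replacement expression: same directory, but the temporary
    # file name gets a leading dot (a hidden file on Linux).
    tmp_path, tmp_filename = path.split(target)
    return path.join(tmp_path, f".{tmp_filename}.tmp")


if __name__ == "__main__":
    # Path taken from the traceback earlier in the thread.
    target = "/var/ossec/queue/agent-info/vce-load-web-any"
    print(old_tmp_target(target))  # /var/ossec/queue/agent-info/vce-load-web-any.tmp
    print(new_tmp_target(target))  # /var/ossec/queue/agent-info/.vce-load-web-any.tmp
```

The temporary file stays in the same directory as its target, so the final move is still a rename within one filesystem; only the temporary file's name changes.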

I hope it helps.

Regards,

Shubham Shrivastav

Oct 24, 2019, 1:46:09 PM
to Wazuh mailing list
I have made the requested changes on all my manager nodes and will observe their behavior for a few hours. I also tried upgrading my entire stack to 3.10.2, but my Grafana dashboard kept losing the Wazuh API config. Rolling back to 3.9.5 solved it. Below is the link for that, just in case:

S

Shubham Shrivastav

Oct 25, 2019, 9:28:07 PM
to Wazuh mailing list
I'm still facing the same issue. The wazuh master-worker node 1 crashed after approximately 10 hours.
I can't see any error logs.

The logs when the daemon stopped:

2019/10/25 09:23:01 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:01 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:01 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:01 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:01 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.06557583808898926 s

2019/10/25 09:23:01 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:01 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:23:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:23:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:23:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:23:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:23:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:23:02 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.008559703826904297 s

2019/10/25 09:23:10 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:10 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:10 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:10 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:10 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.011239767074584961 s

2019/10/25 09:23:10 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:10 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:23:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:23:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:23:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:23:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:23:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:23:12 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.008646249771118164 s

2019/10/25 09:23:19 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:19 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:19 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:19 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:19 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.011734962463378906 s

2019/10/25 09:23:19 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:19 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:23:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:23:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:23:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:23:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:23:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:23:22 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.06198263168334961 s

2019/10/25 09:23:28 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:28 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:28 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:28 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:28 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.010253429412841797 s

2019/10/25 09:23:28 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:28 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:23:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:23:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:23:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:23:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:23:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:23:32 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.007727622985839844 s

2019/10/25 09:23:33 wazuh-clusterd: DEBUG: [Local Server] [Keep alive] Calculating.

2019/10/25 09:23:33 wazuh-clusterd: DEBUG: [Local Server] [Keep alive] Calculated.

2019/10/25 09:23:35 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Keep Alive] Sucessful response from master: keepalive

2019/10/25 09:23:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:37 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.010514259338378906 s

2019/10/25 09:23:37 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:23:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:23:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:23:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:23:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:23:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:23:42 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.008680582046508789 s

2019/10/25 09:23:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:46 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.04662013053894043 s

2019/10/25 09:23:47 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:23:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:23:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:23:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:23:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:23:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:23:52 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.0078122615814208984 s

2019/10/25 09:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.012543439865112305 s

2019/10/25 09:23:56 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:23:56 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.


And the node crashed at 09:23:58.

2019/10/25 09:24:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:24:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:24:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:24:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:24:02 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:24:02 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.007932424545288086 s

2019/10/25 09:24:05 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:05 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:05 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:05 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:05 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:05 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:24:05 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.04628944396972656 s

2019/10/25 09:24:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:24:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:24:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:24:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:24:12 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:24:12 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.008406639099121094 s

2019/10/25 09:24:14 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:14 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:14 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:14 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:14 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.011544942855834961 s

2019/10/25 09:24:14 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:14 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:24:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:24:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:24:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:24:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:24:22 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:24:22 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.0074727535247802734 s

2019/10/25 09:24:23 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:23 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:23 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:23 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:23 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.010594367980957031 s

2019/10/25 09:24:23 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:23 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:32 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:24:32 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.07049894332885742 s

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:24:32 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:24:32 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.007704257965087891 s

2019/10/25 09:24:33 wazuh-clusterd: DEBUG: [Local Server] [Keep alive] Calculating.

2019/10/25 09:24:33 wazuh-clusterd: DEBUG: [Local Server] [Keep alive] Calculated.

2019/10/25 09:24:35 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Keep Alive] Sucessful response from master: keepalive

2019/10/25 09:24:41 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:41 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:41 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:41 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:41 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.01472163200378418 s

2019/10/25 09:24:41 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:41 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:24:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:24:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:24:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:24:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:24:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:24:42 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.013339757919311523 s

2019/10/25 09:24:50 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:50 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:50 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:50 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:50 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:50 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/10/25 09:24:50 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.07037782669067383 s

2019/10/25 09:24:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/10/25 09:24:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/10/25 09:24:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/10/25 09:24:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/10/25 09:24:52 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/10/25 09:24:52 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.007358074188232422 s

2019/10/25 09:24:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/10/25 09:24:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/10/25 09:24:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/10/25 09:24:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/10/25 09:24:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.015290260314941406 s

2019/10/25 09:24:59 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/10/25 09:24:59 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right. 

Daniel Ruiz

Oct 28, 2019, 5:31:53 AM
to Wazuh mailing list
Hi Shubham,

Judging by the logs and the behaviour you describe, I can't see any bug in wazuh-clusterd that would make it crash. I suspect your operating system is killing the process for some reason.

I suggest monitoring system metrics such as memory consumption, disk space and network activity to find a clue that leads us to the root cause.
If the OS is killing the clusterd process, it should appear in some log. Depending on your OS, utilities such as dmesg can show processes killed by the kernel.
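
As a rough sketch of that check, the following Python function scans kernel log text (for example, saved dmesg output) for lines reporting a killed process. The regular expression is an assumption based on the usual Linux OOM-killer wording ("Kill process ..." / "Killed process ..."), which varies between kernel versions. The function name find_killed_processes and the sample log lines are made up for illustration; they are not output from this cluster.

```python
import re
from typing import List, Tuple

# Assumed wording of kernel kill messages; adjust for your kernel version.
KILL_PATTERN = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_killed_processes(kernel_log: str) -> List[Tuple[int, str]]:
    """Return unique (pid, process name) pairs mentioned in kill messages."""
    seen = []
    for line in kernel_log.splitlines():
        match = KILL_PATTERN.search(line)
        if match:
            entry = (int(match.group(1)), match.group(2))
            # The kernel often logs two lines per kill; keep each pair once.
            if entry not in seen:
                seen.append(entry)
    return seen


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real system.
    sample = (
        "[100.0] Out of memory: Kill process 4242 (python3) score 900 or sacrifice child\n"
        "[100.1] Killed process 4242 (python3) total-vm:1000kB, anon-rss:500kB\n"
        "[101.0] eth0: link becomes ready\n"
    )
    print(find_killed_processes(sample))
```

In a container setup, the relevant kernel log is the host node's, so the text would have to be collected there rather than inside the pod.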

I can help you with the troubleshooting, but I need further information about your environment:
  • Operating system
  • Load balancer and network configuration
  • Number of agents
  • Type of the environment (docker, kubernetes, ...), volumes mounted if any, ports mapped
  • Memory, disk and network usage of clusterd at the moment it crashes or is killed
  • Kernel log at the moment clusterd crashes or is killed
I hope we find something useful soon.
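
One possible way to capture the memory figure is sketched below in Python. It reads the resident set size (VmRSS) of a process from /proc/<pid>/status on Linux. The function names are hypothetical, and the PID of wazuh-clusterd has to be found separately (for example, from the ossec-control status output).

```python
import re
from pathlib import Path
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident memory of a process; Linux only."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Hypothetical excerpt of a status file, for illustration only.
    sample = "Name:\tpython3\nVmPeak:\t  200000 kB\nVmRSS:\t   51200 kB\n"
    print(parse_vm_rss_kib(sample))  # 51200
```

Logging this value periodically until the daemon disappears would show whether memory grows towards whatever limit applies to the container.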

Sorry for the inconvenience.

Regards,




Shubham Shrivastav

Oct 28, 2019, 2:51:35 PM
to Wazuh mailing list
Hi Daniel,
My OS:
Wazuh Manager and worker container:
wazuh/wazuh:3.9.5_7.2.1
OS: phusion/baseimage:latest

wazuh agent:
Centos 7.4

The load balancer and network configuration:
apiVersion: v1
kind: Service
metadata:
  name: wazuh  # Don't change, unless you update the Wazuh Kibana app config
  namespace: management-np
  labels:
    app: wazuh-manager
    #dns: route53
  annotations:
    #domainName: 'nonprod.vocera.com'
spec:
  type: LoadBalancer
  selector:
    app: wazuh-manager
    node-type: master
  ports:
    - name: registration
      port: 1515
      targetPort: 1515
    - name: api
      port: 55000
      targetPort: 55000
---
apiVersion: v1
kind: Service
metadata:
  name: wazuh-cluster
  namespace: management-np
  labels:
    app: wazuh-manager
spec:
  selector:
    app: wazuh-manager
  ports:
    - name: cluster
      port: 1516
      targetPort: 1516
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: wazuh-workers
  namespace: management-np
  labels:
    app: wazuh-manager
    #dns: route53
  annotations:
    #domainName: 'wazuh-manager.nonprod.vocera.com'  # TODO: Change this for a Hosted Zone you configured in AWS Route 53
spec:
  type: LoadBalancer
  selector:
    app: wazuh-manager
    node-type: worker
  ports:
    - name: agents-events
      port: 1514
      targetPort: 1514

The number of agents:

   ID: 000, Name: wazuh-manager-master-0 (server), IP: 127.0.0.1, Active/Local

   ID: 016, Name: vce-load-web, IP: any, Active

   ID: 015, Name: VCE-LOAD-ETL, IP: any, Active

   ID: 017, Name: vce-load-Weblb, IP: any, Active

   ID: 018, Name: agent-test, IP: any, Active


Environment: Kubernetes : deployment file below
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: wazuh-manager-master
  namespace: management-np
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wazuh-manager
      node-type: master
  serviceName: wazuh-cluster
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: wazuh-manager
        node-type: master
      name: wazuh-manager-master
    spec:
      volumes:
        - name: config
          configMap:
            name: wazuh-manager-master-conf
      containers:
        - name: wazuh-manager
          image: 'wazuh/wazuh:3.9.5_7.2.1' #wazuh/wazuh:3.10.2_7.3.2
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: config
              mountPath: /wazuh-config-mount/etc/ossec.conf
              subPath: ossec.conf
              readOnly: true
            - name: wazuh-manager-master
              mountPath: /var/ossec/data
            - name: wazuh-manager-master
              mountPath: /etc/postfix
          ports:
            - containerPort: 1515
              name: registration
            - containerPort: 1516
              name: cluster
            - containerPort: 55000
              name: api
  volumeClaimTemplates:
    - metadata:
        name: wazuh-manager-master
        namespace: management-np
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: wazuh-stg-cs
        resources:
          requests:
            storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: wazuh-manager-worker-0
  namespace: management-np
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wazuh-manager
      node-type: worker
      sts-id: '0'
  serviceName: wazuh-cluster
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: wazuh-manager
        node-type: worker
        sts-id: '0'
      name: wazuh-manager-worker-0
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: sts-id
                      operator: In
                      values:
                        - '1'
                topologyKey: kubernetes.io/hostname
      volumes:
        - name: config
          configMap:
            name: wazuh-manager-worker-0-conf
      containers:
        - name: wazuh-manager
          image: 'wazuh/wazuh:3.9.5_7.2.1'
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: config
              mountPath: /wazuh-config-mount/etc/ossec.conf
              subPath: ossec.conf
              readOnly: true
            - name: wazuh-manager-worker
              mountPath: /var/ossec/data
            - name: wazuh-manager-worker
              mountPath: /etc/postfix
          ports:
            - containerPort: 1514
              name: agents-events
            - containerPort: 1516
              name: cluster
  volumeClaimTemplates:
    - metadata:
        name: wazuh-manager-worker
        namespace: management-np
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: wazuh-stg-cs
        resources:
          requests:
            storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: wazuh-manager-worker-1
  namespace: management-np
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wazuh-manager
      node-type: worker
      sts-id: '1'
  serviceName: wazuh-cluster
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: wazuh-manager
        node-type: worker
        sts-id: '1'
      name: wazuh-manager-worker-1
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: sts-id
                      operator: In
                      values:
                        - '0'
                topologyKey: kubernetes.io/hostname
      volumes:
        - name: config
          configMap:
            name: wazuh-manager-worker-1-conf
      containers:
        - name: wazuh-manager
          image: 'wazuh/wazuh:3.9.5_7.2.1'
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: config
              mountPath: /wazuh-config-mount/etc/ossec.conf
              subPath: ossec.conf
              readOnly: true
            - name: wazuh-manager-worker
              mountPath: /var/ossec/data
            - name: wazuh-manager-worker
              mountPath: /etc/postfix
          ports:
            - containerPort: 1514
              name: agents-events
            - containerPort: 1516
              name: cluster
  volumeClaimTemplates:
    - metadata:
        name: wazuh-manager-worker
        namespace: management-np
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: wazuh-stg-cs
        resources:
          requests:
            storage: 5Gi


Storage class:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: wazuh-stg-cs
  namespace: {{k8s_env.namespace}}
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp2
reclaimPolicy: Delete
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: wazuh-cluster-claim
  namespace: {{k8s_env.namespace}}
spec:
  storageClassName: wazuh-stg-cs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

I'm not able to get the information below, since I'm running Wazuh inside a Docker container:
  • Memory, disk and network usage of clusterd in the moment of crashing or being killed
  • Log of the kernel in the moment of clusterd crashing or being killed
Regards,
Shubham Shrivastava

Daniel Ruiz

Oct 29, 2019, 4:53:35 AM
to Wazuh mailing list
Hi Shubham,

you can get more info with these two commands:
kubectl -n <namespace> describe pod <pod-name>
kubectl -n <namespace> logs <pod-name>

The second one gives you the logs written by the pod's container. I would try to find any information about the PID of the clusterd service just in case we get further info about it.

Furthermore, I find the resource limits too low:
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi

We never use less than:

          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 500m
              memory: 512Mi

Give it a try and tell me if there is any change in the behaviour.
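
To see how close a manager container actually gets to its memory limit, something like the following could be run inside the pod (for example through kubectl exec). This is only a sketch: the function name mem_usage_pct is made up for illustration, and the default paths assume cgroup v1, which is an assumption about the host. On cgroup v2 the files are memory.current and memory.max, and memory.max may contain the word "max" instead of a number.

```shell
#!/bin/sh
# Print container memory usage as an integer percentage of its cgroup limit.
# Default paths assume cgroup v1 (an assumption, adjust for your host).
mem_usage_pct() {
    usage_file="${1:-/sys/fs/cgroup/memory/memory.usage_in_bytes}"
    limit_file="${2:-/sys/fs/cgroup/memory/memory.limit_in_bytes}"
    usage=$(cat "$usage_file") || return 1
    limit=$(cat "$limit_file") || return 1
    # Integer division, so the result is rounded down.
    echo $(( usage * 100 / limit ))
}
```

Calling mem_usage_pct every minute and writing the value next to a timestamp would show whether usage climbs toward 100% shortly before clusterd disappears.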

Regards,

Shubham Shrivastav

Oct 29, 2019, 7:49:32 PM
to Wazuh mailing list
I made the relevant changes in the deployment!
Also, I can't find the PID of the clusterd service using the ps -e command.
I can see:

11506 ?        00:00:00 ossec-authd

11515 ?        00:00:01 wazuh-db

11532 ?        00:00:00 ossec-execd

11540 ?        00:00:02 ossec-analysisd

11547 ?        00:00:04 ossec-syscheckd

11556 ?        00:00:03 ossec-remoted

11565 ?        00:00:00 ossec-logcollec

11590 ?        00:00:00 ossec-monitord

11596 ?        00:00:00 wazuh-modulesd


But no clusterd

Daniel Ruiz

Oct 30, 2019, 4:21:49 AM
to Wazuh mailing list
Hi Shubham,

you should see the clusterd PID with this command:

root@wazuh-master:/# ps -edf | grep clusterd
ossec      640     1  1 08:18 ?        00:00:01 /var/ossec/framework/python/bin/python3 /var/ossec/framework/scripts/wazuh-clusterd.p

I hope it helps

Regards,

Shubham Shrivastav

Oct 30, 2019, 8:05:20 PM
to Wazuh mailing list
Hey, the agents show as disconnected on the Kibana dashboard and on the Wazuh manager, but the clusterd daemon is running on all the instances. Weirdly enough, I'm still receiving logs from the agents (such as a user logging into an agent) on the Wazuh manager as well as on the Kibana dashboard. I guess the keep-alive signal is causing the problem.
/var/ossec/bin/agent_control -l shows all agents as disconnected.

Regards
S

Daniel Ruiz

Oct 31, 2019, 6:36:42 AM
to Wazuh mailing list
Hi Shubham,

Have you checked these logs again in case any errors appear?
  • Load balancer log
  • ossec.log and cluster.log of the managers
  • ossec.log of the agent
Maybe the cluster is not synchronizing files properly and the state of the agents is not up to date. In that case I think you should see some errors in the cluster.log.

Have you tried restarting all managers in the cluster? Does the problem persist? Were agents shown as connected when you first configured your cluster environment?

In order to get more info about the current status of the cluster, try these commands and send me the result:
  • Only in master node:
# /var/ossec/bin/cluster_control -a

  • In all nodes (both master and workers):
# ls -lrt /var/ossec/queue/agent-info/

# ls -lrtR /var/ossec/queue/cluster/
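
Instead of reading the ls output by eye, the agent-info listing can be reduced to the files that have not been touched recently. The sketch below assumes, as the timestamps in the listing suggest, that a file's modification time reflects the agent's last report; the function name stale_agent_info and the 30-minute default are illustrative choices, not part of Wazuh.

```shell
#!/bin/sh
# List agent-info files NOT modified within the last N minutes (default 30).
# A stale file hints that the agent has not reported to this node recently.
stale_agent_info() {
    dir="${1:-/var/ossec/queue/agent-info}"
    minutes="${2:-30}"
    find "$dir" -type f -mmin +"$minutes" -exec basename {} \; | sort
}
```

Running stale_agent_info on the master and on each worker shows which node still holds an outdated state for a given agent.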

Regards,

Shubham Shrivastav

Nov 5, 2019, 3:21:14 AM
to Wazuh mailing list
The worker-0 manager node crashed yesterday at 2019/11/03 00:20:47 and /var/ossec/bin/cluster_control -a showed the cluster as disconnected.


Worker-0 logs:


2019/11/03 00:20:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/11/03 00:20:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/11/03 00:20:27 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.023639917373657227 s

2019/11/03 00:20:27 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/11/03 00:20:27 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/11/03 00:20:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/11/03 00:20:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/11/03 00:20:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/11/03 00:20:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/11/03 00:20:36 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.018378019332885742 s

2019/11/03 00:20:36 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/11/03 00:20:36 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/11/03 00:20:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/11/03 00:20:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/11/03 00:20:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/11/03 00:20:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/11/03 00:20:37 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/11/03 00:20:37 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.008826494216918945 s

2019/11/03 00:20:42 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Keep Alive] Sucessful response from master: keepalive

2019/11/03 00:20:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Permission to synchronize granted.

2019/11/03 00:20:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Compressing files

2019/11/03 00:20:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Sending compressed file to master

2019/11/03 00:20:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] Worker files sent to master.

2019/11/03 00:20:45 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Integrity] Time synchronizing integrity: 0.11236786842346191 s

2019/11/03 00:20:45 wazuh-clusterd: DEBUG: [Worker wazuh-manager-worker-0] [Main] Command received: 'b'sync_m_c_ok''

2019/11/03 00:20:45 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Integrity] The master has verified that the integrity is right.

2019/11/03 00:20:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Starting to send agent status files

2019/11/03 00:20:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Permission to synchronize granted.

2019/11/03 00:20:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Compressing files

2019/11/03 00:20:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Sending compressed file to master

2019/11/03 00:20:47 wazuh-clusterd: INFO: [Worker wazuh-manager-worker-0] [Agent info] Worker files sent to master.

2019/11/03 00:20:47 wazuh-clusterd: DEBUG2: [Worker wazuh-manager-worker-0] [Agent info] Time synchronizing agent statuses: 0.007305622100830078 s

Daniel Ruiz

Nov 5, 2019, 5:29:41 AM
to Wazuh mailing list
Hi Shubham,

in order to perform the troubleshooting I need you to follow the steps in my previous emails and answer my questions.
Otherwise I won't be able to help you.

Please, review my two previous emails and send me the requested information.

As I explained before, I think the operating system is killing the wazuh-clusterd process, probably because of running out of memory.

We will carry on with the troubleshooting when I receive all the information.

I hope you understand.

Regards,

Shubham Shrivastav

Nov 5, 2019, 3:52:32 PM
to Wazuh mailing list
Hey,

1) You wanted me to find information about the PID of the clusterd service in the logs, but I'm not able to find anything. I just grepped the output of the "kubectl -n <namespace> log <pod-name>" command. I looked at the cluster logs too and still didn't find anything.
Am I supposed to do this in some other way?

2) How do I check for Load balancer logs?

3) The ossec.log and cluster.log of the managers showed no errors, except for https://github.com/wazuh/wazuh/issues/4007 in the cluster log. This might have caused the crash, since even after I made the relevant changes in utils.py, the file was overwritten with the default version. I've set up a config map for this file and the changes have been made.

I've redeployed this yesterday, I'm hoping it won't happen again.

Thanks,
S

Shubham Shrivastav

Nov 11, 2019, 11:46:13 PM
to Wazuh mailing list
Hi, the servers crashed again yesterday. Screenshots attached!
I did not get any output from these commands:

cat /var/ossec/logs/cluster.log | grep -i -E "error|warn"

cat /var/ossec/logs/ossec.log | grep -i -E "error|warn"
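
When grep finds no errors, it can still help to see how many lines of each level a log contains and when the last entry was written, since that timestamp marks the moment the daemon went silent. A rough sketch follows; it assumes every entry starts with "YYYY/MM/DD HH:MM:SS daemon: LEVEL:" as in the cluster.log excerpts earlier in this thread, and summarize_log is a made-up name.

```shell
#!/bin/sh
# Count log lines per level and print the timestamp of the last entry.
# Lines that do not start with a date (e.g. tracebacks) are skipped.
summarize_log() {
    log="${1:-/var/ossec/logs/cluster.log}"
    awk '
        /^[0-9]+\/[0-9]+\/[0-9]+ / {
            level = $4
            sub(/:$/, "", level)      # "INFO:" -> "INFO"
            count[level]++
            last = $1 " " $2          # date and time of this entry
        }
        END {
            for (l in count) print l, count[l]
            print "last_entry", last
        }
    ' "$log"
}
```

Comparing the last_entry value of cluster.log on each node with the time the agents went to disconnected would show whether the two events coincide.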


I'm still receiving alerts from the agents (can see it in the Discover section of the dashboard)

Thanks,
S
Screen Shot 2019-11-11 at 8.30.13 PM.png
Screen Shot 2019-11-11 at 8.31.05 PM.png
Screen Shot 2019-11-11 at 8.31.39 PM.png

Shubham Shrivastav

Nov 12, 2019, 1:50:58 PM
to Wazuh mailing list
The outputs for the commands that you gave me are :

Master

root@wazuh-manager-master-0:/# /var/ossec/bin/cluster_control -a

ID   NAME                    IP             STATUS        VERSION       NODE NAME               

000  wazuh-manager-master-0  127.0.0.1      Active        Wazuh v3.9.5  wazuh-manager-master    

015  VCE-LOAD-ETL            172.16.14.62   Disconnected  Wazuh v3.9.5  wazuh-manager-worker-0  

016  vce-load-web            172.16.14.247  Disconnected  Wazuh v3.9.5  wazuh-manager-worker-1  

017  vce-load-Weblb          172.16.15.203  Disconnected  Wazuh v3.9.5  wazuh-manager-worker-0  

018  agent-test              172.16.15.96   Disconnected  Wazuh v3.9.5  wazuh-manager-worker-0  


root@wazuh-manager-master-0:/# ls -lrt /var/ossec/queue/agent-info/

total 16

-rw-rw---- 1 ossec ossec 341 Nov  1 18:31 vce-load-Weblb-any

-rw-rw---- 1 ossec ossec 261 Nov 10 17:12 VCE-LOAD-ETL-any

-rw-rw---- 1 ossec ossec 338 Nov 10 17:12 agent-test-any

-rw-rw---- 1 ossec ossec 262 Nov 11 02:37 vce-load-web-any


Worker 1:

total 16

-rw-rw---- 1 ossecr ossec 341 Oct 30 21:18 vce-load-Weblb-any

-rw-rw---- 1 ossecr ossec 338 Nov  8 00:32 agent-test-any

-rw-rw---- 1 ossecr ossec 261 Nov  8 00:32 VCE-LOAD-ETL-any

-rw-rw---- 1 ossecr ossec 262 Nov 12 18:50 vce-load-web-any


Worker 0:

total 16

-rw-rw---- 1 ossecr ossec 341 Nov  1 18:31 vce-load-Weblb-any

-rw-rw---- 1 ossecr ossec 262 Nov  9 00:29 vce-load-web-any

-rw-rw---- 1 ossecr ossec 261 Nov 12 18:50 VCE-LOAD-ETL-any

-rw-rw---- 1 ossecr ossec 338 Nov 12 18:50 agent-test-any


Thanks,
S

Daniel Ruiz

Nov 13, 2019, 7:04:38 AM
to Wazuh mailing list
Hi Shubham,

in my email from Oct 24, I told you I believed that your clusterd process was being killed by the OS. To troubleshoot that and confirm the cause, I asked you to monitor disk and memory usage to get a clue about what is going on, but I never received that information. Furthermore, another problem arose with agents shown as disconnected, and so far we do not know whether it is related to the first one. However, I will still try to help you.

Let me sum up some ideas:
  • Problem: clusterd crashing in worker nodes
    • Let's do another test. Configure all your cluster nodes (both master and workers) as shown below. If clusterd still crashes on the workers, the cause is probably something other than running out of memory:
          resources:
            requests:
              cpu: 1000m
              memory: 4096Mi
            limits:
              cpu: 1000m
              memory: 4096Mi
    • When your clusterd process crashes, execute this command on the host where your container is running:
      sudo journalctl -xb
      Then type /killed and press Enter to search for killed processes. Type n to jump to the next occurrence. If the kernel killed your clusterd process, the reason should be shown there.

  • Problem: agents are shown as disconnected.
    • First of all, ensure that the clusterd process is running on every node.
    • Run this command on every cluster node:
      find /var/ossec/queue/agent-info/ -mmin -30 -type f -exec ls -l {} + | wc -l
      That gives you the number of agents that reported in the last 30 minutes. If you do not get anything, maybe you have a communication problem or some issue with your keys.
    • Check connectivity from the agents to rule out any network issues:
      • On linux agents:
        nc -zv YOUR_LB_IP 1514
        nc -zv YOUR_LB_IP 1515
      • On Windows agents using Powershell:
        (new-object Net.Sockets.TcpClient).Connect("YOUR_LB_IP", 1514)
        (new-object Net.Sockets.TcpClient).Connect("YOUR_LB_IP", 1515)
    • If the port is reported as open, the agent should be able to connect. Also check the LB logs to rule out any issues at that point.
    • Maybe your agent-info files have not synchronized properly because of the first problem. Let's try to restore the agent status by removing all files in /var/ossec/queue/agent-info on all cluster nodes. These files should be recreated quickly by ossec-remoted on every node where an agent is reporting.
Give it a try and let me know how it goes.
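
For the journalctl step above, the interactive search can also be replaced by a filter over the kernel messages of the current boot. This is a sketch only: the patterns are the usual wording of the Linux OOM killer, which may differ between kernel versions, and find_oom_kills is just an illustrative name.

```shell
#!/bin/sh
# Filter kernel log text read from stdin for OOM-killer activity.
# The patterns are common kernel phrasings, not an exhaustive list.
find_oom_kills() {
    grep -i -E 'out of memory|oom-kill|killed process'
}

# Usage on the host running the container (kernel messages, current boot):
#   sudo journalctl -k -b --no-pager | find_oom_kills
```

If a matching line names the python3 process that runs wazuh-clusterd, that would support the out-of-memory theory; no match would point to a different cause.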

Regarding the issue you opened, I will close it since we do not yet know what the error is. If we find a bug after all this research, we will open a new issue that explains it in full detail. I hope you understand.

Regards,