Wazuh Kubernetes - Wazuh manager worker nodes not able to communicate with Wazuh manager master


Shubham Shrivastav

Oct 8, 2019, 5:34:59 PM
to Wazuh mailing list
Hello,
I'm trying to deploy the Wazuh server in Kubernetes (I'm using your Wazuh Kubernetes repo for reference).
I've deployed the **wazuh/wazuh:3.9.0_6.7.2** Docker image in my k8s cluster hosted on AWS.
I have performed all the steps as instructed.

The problem I am facing is:
The Wazuh agent I registered shows as **never connected** in the Kibana dashboard.

On further investigation, I ran curl from the machine that hosts the Wazuh agent against my Wazuh load balancer services on ports 1515 and 1514. It connected to both of them, and got an empty reply from load-balancer:1514/tcp.
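A minimal sketch of the same reachability check without curl, assuming Python 3 is available on the agent machine. The function name `tcp_port_open` is made up for illustration. An empty reply from curl is not surprising here, since curl speaks HTTP and port 1514 does not, so a plain TCP connect is a cleaner test:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call (the hostname is a placeholder, replace it with your own):
#   for port in (1514, 1515):
#       print(port, tcp_port_open("my-nlb-url-pointing-at-1514", port))
```

A successful connect only shows that something accepts connections on that port. It does not prove that the manager behind the load balancer is processing agent traffic.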

However, my agent logs showed me this:

2019/10/08 21:07:27 ossec-agentd: WARNING: Unable to reload hostname for 'my-nlb-url-pointing-at-1514'. Using previous address.
2019/10/08 21:07:27 ossec-agentd: INFO: Trying to connect to server (my-nlb-url-pointing-at-1514/172.23.5.32:1514/tcp).
2019/10/08 21:07:31 ossec-syscheckd: INFO: (6010): File integrity monitoring scan frequency: 43200 seconds
2019/10/08 21:07:31 ossec-syscheckd: INFO: (6008): File integrity monitoring scan started.
2019/10/08 21:07:49 ossec-agentd: WARNING: Unable to reload hostname for 'my-nlb-url-pointing-at-1514'. Using previous address.
2019/10/08 21:07:49 ossec-agentd: INFO: Trying to connect to server (my-nlb-url-pointing-at-1514:1514/tcp).
2019/10/08 21:08:10 ossec-agentd: WARNING: Unable to reload hostname for 'my-nlb-url-pointing-at-1514'. Using previous address.
2019/10/08 21:08:10 ossec-agentd: INFO: Trying to connect to server (my-nlb-url-pointing-at-1514/IP_ADDRESS:1514/tcp).
2019/10/08 21:08:31 ossec-agentd: WARNING: Unable to reload hostname for 'my-nlb-url-pointing-at-1514'. Using previous address.
2019/10/08 21:08:31 ossec-agentd: INFO: Trying to connect to server (my-nlb-url-pointing-at-1514/IP_ADDRESS:1514/tcp).

My agent config file snippet:
```
  <client>
    <server>
      <address>my-nlb-url-pointing-at-1514</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
    <config-profile>centos, centos7, centos7.6</config-profile>
    <notify_time>10</notify_time>
    <time-reconnect>60</time-reconnect>
    <auto_restart>yes</auto_restart>
    <crypto_method>aes</crypto_method>
  </client>
```

I then checked the Wazuh manager master for errors using **cat /var/ossec/logs/ossec.log** and could not find any warnings.

But when I connected to the Wazuh worker pod, I saw this error:

2019/10/08 21:25:20 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:25:30 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:25:40 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:25:50 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:26:00 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:26:10 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:26:20 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:26:30 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:26:40 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.
2019/10/08 21:26:50 wazuh-clusterd: ERROR: [Worker] [Main] Could not connect to master: [Errno -2] Name or service not known. Trying again in 10 seconds.


It's not able to connect to the Wazuh master.
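For context, `[Errno -2] Name or service not known` is the resolver error that Linux reports when a hostname cannot be resolved (EAI_NONAME), so the worker is failing at DNS lookup rather than at the TCP connection. A minimal sketch of how to reproduce the lookup, assuming Python 3 is available inside the worker pod (the helper name `resolve` is made up for illustration):

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        # On Linux, an unknown name surfaces as errno -2,
        # "Name or service not known".
        print(f"cannot resolve {hostname!r}: {exc}")
        return []
    # Each entry ends with a sockaddr tuple whose first item is the IP address.
    return sorted({info[4][0] for info in infos})


# Example call, using the master address the worker is configured with:
#   resolve("wazuh-manager-master-0.wazuh-cluster.wazuh.svc.cluster.local")
```

An empty result for the master's address from inside the worker pod would point at the hostname itself (pod, service, or namespace part) rather than at the cluster port.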

Shubham Shrivastav

Oct 8, 2019, 8:01:46 PM
to Wazuh mailing list

I'm looking at the /var/ossec/etc/ossec.conf in my worker nodes and I see this:

    <nodes>
        <node>wazuh-manager-master-0.wazuh-cluster.wazuh.svc.cluster.local</node>
    </nodes>

My guess is that this hostname is composed of parts like this:

    <nodes>
        <node>name-of-master-pod.service-name.service-name.svc.cluster.local</node>
    </nodes>

Is this accurate? If so, this may be what is causing the problem.

Mayte Ariza

Oct 9, 2019, 5:33:50 AM
to Wazuh mailing list
Hello Shubham,

It seems your Wazuh cluster is not working properly. Let's start with this issue before debugging the communication with the agent. Could you run a few checks to help us understand the problem?

Run /var/ossec/bin/cluster_control -i on your Wazuh master node to check your cluster name.

Run /var/ossec/bin/cluster_control -l on your Wazuh master node to list the nodes in the current Wazuh cluster. That command will show you your Wazuh master node's name, type, version, and address.

Check the cluster block in the /var/ossec/etc/ossec.conf file:

- name should match your cluster name (shown in the first command's output).
- node_name should match the name (shown in the second command's output).
- node_type should match the type.
- node should match the address: <node>name-of-master-pod.service-wazuh-cluster-name.kubernetes-namespace.svc.cluster.local</node>
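To make the address format concrete, here is a small sketch that builds and splits a StatefulSet pod hostname of the form pod.service.namespace.svc.cluster.local. The function names are made up for illustration, and `cluster.local` is assumed as the cluster domain (the Kubernetes default):

```python
def statefulset_pod_fqdn(pod, service, namespace, cluster_domain="cluster.local"):
    """Build the stable DNS name of a StatefulSet pod behind a headless service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def split_pod_fqdn(fqdn, cluster_domain="cluster.local"):
    """Split a pod FQDN into its pod, service, and namespace parts."""
    suffix = f".svc.{cluster_domain}"
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn!r} does not end with {suffix!r}")
    parts = fqdn[: -len(suffix)].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected pod.service.namespace, got {parts!r}")
    pod, service, namespace = parts
    return {"pod": pod, "service": service, "namespace": namespace}


# The <node> value from the worker's ossec.conf quoted earlier in this thread:
configured = "wazuh-manager-master-0.wazuh-cluster.wazuh.svc.cluster.local"
print(split_pod_fqdn(configured))
```

If the namespace part printed here is not the namespace the pods actually run in, the name will not resolve from the worker.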

Run the same checks on the Wazuh worker node, plus the following ones on the cluster block in its /var/ossec/etc/ossec.conf file:

- Does the key match for both nodes? (It must be the same)
- Are they using the same port? By default, it should be 1516. Can it be reached?
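The comparison can also be done mechanically. Below is a rough sketch, assuming you have copied the cluster block from each node into a string. The sample values (cluster name, node names, keys) are invented for illustration and are not taken from a real deployment, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Illustrative cluster blocks; every value below is made up.
MASTER_BLOCK = """
<cluster>
  <name>wazuh</name>
  <node_name>wazuh-manager-master</node_name>
  <node_type>master</node_type>
  <key>0123456789abcdef0123456789abcdef</key>
  <port>1516</port>
  <nodes><node>wazuh-manager-master-0.wazuh-cluster.wazuh.svc.cluster.local</node></nodes>
</cluster>
"""

WORKER_BLOCK = MASTER_BLOCK.replace("master</node_", "worker</node_").replace(
    "0123456789abcdef0123456789abcdef", "ffffffffffffffffffffffffffffffff"
)


def cluster_settings(block: str) -> dict:
    """Extract the settings that must be identical on every cluster node."""
    root = ET.fromstring(block.strip())
    return {
        "name": root.findtext("name"),
        "key": root.findtext("key"),
        "port": root.findtext("port"),
        "nodes": [n.text for n in root.findall("nodes/node")],
    }


def shared_setting_mismatches(block_a: str, block_b: str) -> list:
    """Return the names of shared settings that differ between two blocks."""
    a, b = cluster_settings(block_a), cluster_settings(block_b)
    return [field for field in a if a[field] != b[field]]


print(shared_setting_mismatches(MASTER_BLOCK, WORKER_BLOCK))
```

node_name and node_type are left out of the comparison on purpose, since they are expected to differ between the master and a worker.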

Let us know the results you got when performing the checks so that we can continue to debug the problem.

Regards,
Mayte Ariza.

Shubham Shrivastav

Oct 23, 2019, 4:21:40 PM
to Wazuh mailing list
Hi, the issue was caused by an incorrect StatefulSet stable network ID configuration. I had used a different name for my namespace. Changing it back fixed my problem.

Thanks,
S

Mayte Ariza

Oct 24, 2019, 4:56:30 AM
to Wazuh mailing list

Hi Shubham,


I'm glad the problem is solved.


Regards,
Mayte Ariza.