Agent and database issues


Aishwarya Vinod

Jul 23, 2024, 6:39:00 AM
to Wazuh | Mailing List
Hi Team,

I have deployed Wazuh on a Kubernetes cluster following the documentation at https://documentation.wazuh.com/current/deployment-options/deploying-with-kubernetes/kubernetes-deployment.html

Agents connected without problems until two days ago, when agent enrollment started getting stuck. Agents can still enroll, but they cannot communicate with the manager. Some agents are in the 'never connected' state and others are in the 'disconnected' state. For the former, the status message reads: "Agent has never connected: The agent has been registered but has not yet connected to the manager."

Agent logs:

2024/07/23 10:20:22 wazuh-agentd: ERROR: (1208): Unable to connect to enrollment service at '[xx.xx.xx]:1515'
2024/07/23 10:20:32 wazuh-agentd: WARNING: (4101): Waiting for server reply (not started). Tried: 'xx.xx.xx'. Ensure that the manager version is 'v4.8.1' or higher.
2024/07/23 10:20:32 wazuh-agentd: WARNING: Unable to connect to any server.
2024/07/23 10:20:32 wazuh-agentd: INFO: Closing connection to server ([xx.xx.xx]:1514/tcp).
2024/07/23 10:20:32 wazuh-agentd: INFO: Trying to connect to server ([xx.xx.xx]:1514/tcp).
2024/07/23 10:21:12 wazuh-agentd: INFO: Requesting a key from server: xx.xx.xx
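
The agent log above reports failures on both the enrollment port (1515) and the communication port (1514), so a first step is to check whether those ports are reachable from the agent host at all. The following is a minimal sketch, not an official Wazuh tool; the function name `port_open` is hypothetical, and the manager address has to be supplied by the reader because it is masked in the log.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `port_open(manager_address, 1515)` and `port_open(manager_address, 1514)` from the agent machine only shows whether the TCP handshake completes. A `True` result does not prove that wazuh-authd or wazuh-remoted is healthy behind the port.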

I ran /var/ossec/bin/wazuh-control status on the master and worker nodes and confirmed that wazuh-agentlessd is not running:

wazuh-clusterd is running...
wazuh-modulesd is running...
wazuh-monitord is running...
wazuh-logcollector is running...
wazuh-remoted is running...
wazuh-syscheckd is running...
wazuh-analysisd is running...
wazuh-maild not running...
wazuh-execd is running...
wazuh-db is running...
wazuh-authd is running...
wazuh-agentlessd not running...
wazuh-integratord is running...
wazuh-dbd not running...
wazuh-csyslogd not running...
wazuh-apid is running...


In the manager logs (/var/ossec/logs/ossec.log), I am getting these errors:

2024/07/23 08:42:49 wazuh-modulesd:syscollector: ERROR: sqlite: COMMIT TRANSACTION. cannot commit - no transaction is active
2024/07/23 08:42:50 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot perform range checksum
2024/07/23 08:42:50 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot perform range checksum
2024/07/23 08:42:50 wazuh-db: ERROR: Deleting old information from 'sys_osinfo' table: database or disk is full
2024/07/23 08:42:50 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:42:50 wazuh-db: ERROR: Deleting old information from 'sys_hwinfo' table: database or disk is full
2024/07/23 08:42:50 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:42:50 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 08:42:50 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:42:50 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 08:42:50 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:43:24 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/auth.log' due to [(2)-(No such file or directory)].
2024/07/23 08:43:24 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/syslog' due to [(2)-(No such file or directory)].
2024/07/23 08:43:24 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/dpkg.log' due to [(2)-(No such file or directory)].
2024/07/23 08:43:24 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/kern.log' due to [(2)-(No such file or directory)].
2024/07/23 08:43:26 wazuh-analysisd: ERROR: FIM decoder: Bad response from database: Cannot save fim control message
2024/07/23 08:43:26 wazuh-db: ERROR: DB(000) Error updating rootcheck PM tuple on SQLite database
2024/07/23 08:43:28 wazuh-modulesd:vulnerability-scanner: ERROR: VulnerabilityScannerFacade::start: Failed to open RocksDB database. Reason: While appending to file: queue/vd/state_track/000065.dbtmp: No space left on device
2024/07/23 08:43:29 wazuh-modulesd:syscollector: ERROR: sqlite: COMMIT TRANSACTION. database or disk is full
2024/07/23 08:43:29 wazuh-modulesd:syscollector: ERROR: sqlite: database or disk is full
2024/07/23 08:43:29 wazuh-modulesd:syscollector: ERROR: sqlite: database or disk is full
2024/07/23 08:43:30 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot perform range checksum
2024/07/23 08:43:30 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot perform range checksum
2024/07/23 08:43:30 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot perform range checksum
2024/07/23 08:43:30 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 08:43:30 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:43:30 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 08:43:30 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:43:30 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 08:43:30 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 08:43:41 wazuh-syscheckd: ERROR: DB error, id: 1. sqlite: COMMIT TRANSACTION. cannot commit - no transaction is active
2024/07/23 08:43:42 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscheck

Worker logs:

2024/07/23 00:00:10 wazuh-monitord: ERROR: Compression error: logs/alerts/2024/Jul/ossec-alerts-22.log.gz: No space left on device
2024/07/23 01:09:42 wazuh-db: ERROR: Cannot set connection_status for agent 3
2024/07/23 01:39:28 wazuh-modulesd:syscollector: ERROR: sqlite: database or disk is full
2024/07/23 01:39:28 wazuh-modulesd:syscollector: ERROR: sqlite: no such column: db_status_field_dm
2024/07/23 01:39:29 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot perform range checksum
2024/07/23 01:39:30 wazuh-syscheckd: ERROR: DB error, id: 1. sqlite: no such column: db_status_field_dm
2024/07/23 01:39:30 wazuh-syscheckd: ERROR: (6719): Could not start DBSync transaction ({"table": "file_entry"})
2024/07/23 01:39:30 wazuh-analysisd: ERROR: FIM decoder: Bad response from database: Cannot save fim control message
2024/07/23 01:39:51 wazuh-db: ERROR: DB(000) Error updating rootcheck PM tuple on SQLite database
2024/07/23 01:47:11 wazuh-db: ERROR: Deleting old information from 'sys_osinfo' table: database or disk is full
2024/07/23 01:47:11 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 01:47:11 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 01:47:11 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 01:47:11 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save Syscollector
2024/07/23 01:47:12 wazuh-db: ERROR: SQLite: database or disk is full
2024/07/23 01:47:12 wazuh-analysisd: ERROR: dbsync: Bad response from database: Cannot save S
2024/07/23 08:38:40 wazuh-db: INFO: Graceful process shutdown.
2024/07/23 08:39:16 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/auth.log' due to [(2)-(No such file or directory)].

2024/07/23 08:43:25 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities-wazuh', retrying until the connection is successful.
2024/07/23 10:16:37 wazuh-modulesd: WARNING: Could not connect to socket 'queue/cluster/c-internal.sock': Connection refused
2024/07/23 10:16:40 wazuh-modulesd: ERROR: Could not send message through the cluster after '10' attempts.
2024/07/23 10:16:40 wazuh-modulesd:agent-upgrade: ERROR: (8123): There has been an error executing the request in the tasks manager.
2024/07/23 10:25:33 agent_control: ERROR: Wazuh is running in cluster mode: agent_control is not available in worker nodes. Please, try again in the master node: wazuh-manager-master-0.wazuh-cluster.wazuh.
2024/07/23 10:25:40 wazuh-remoted: WARNING: (1408): Invalid ID 010 for the source ip: 'xx.xx.x.x' (name 'unknown').


Based on the logs, I suspect that the agent problems are caused by the database errors. I have increased the disk space of the master and worker nodes to 5 Gi, but I don't see any change. Can someone suggest what can be done to eliminate the errors above?
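
To check how many of the errors above trace back to storage exhaustion, it can help to count the disk-related messages per daemon in ossec.log. The following is a minimal sketch under stated assumptions: the two marker strings are taken from the log excerpts in this post, the line pattern is inferred from the same excerpts, and the function name `count_disk_full` is hypothetical.

```python
import re
from collections import Counter

# Substrings copied from the log excerpts in this post that indicate a full volume.
DISK_FULL_MARKERS = ("database or disk is full", "No space left on device")

# Assumed line layout: "YYYY/MM/DD HH:MM:SS daemon[:module]: LEVEL: message"
LINE_RE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?P<source>\S+): (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def count_disk_full(lines):
    """Count disk-full messages per log source (daemon or daemon:module)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in DISK_FULL_MARKERS):
            counts[match.group("source")] += 1
    return counts
```

Passing the lines of /var/ossec/logs/ossec.log to `count_disk_full` shows which daemons are hitting the full volume. Lines that do not match the assumed layout, such as truncated entries, are skipped.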




Aishat Motunrayo Awujola

Jul 23, 2024, 7:30:17 AM
to Wazuh | Mailing List
Hello Aishwarya,

Can you please confirm whether you made any changes to your environment before these errors appeared?

Regards.

Aishwarya Vinod

Jul 23, 2024, 9:05:52 AM
to Wazuh | Mailing List
Hey Aishat,

Yes, I have added a few custom integrations to the Wazuh master.conf by mounting a few files at the required paths. Below is the master.conf:

<ossec_config>
  <global>
    <jsonout_output>yes</jsonout_output>
    <alerts_log>yes</alerts_log>
    <logall>no</logall>
    <logall_json>no</logall_json>
    <email_notification>no</email_notification>
    <smtp_server>localhost</smtp_server>
    <email_from>email_id</email_from>
    <email_to>email_id</email_to>
    <email_maxperhour>120</email_maxperhour>
    <email_log_source>alerts.log</email_log_source>
    <agents_disconnection_time>10m</agents_disconnection_time>
    <agents_disconnection_alert_time>0</agents_disconnection_alert_time>
  </global>


  <!-- Choose between "plain", "json", or "plain,json" for the format of internal logs -->
  <logging>
    <log_format>plain</log_format>
  </logging>

  <remote>
    <connection>secure</connection>
    <port>1514</port>
    <protocol>tcp</protocol>
    <queue_size>131072</queue_size>
  </remote>

  <!-- Policy monitoring -->
  <rootcheck>
    <disabled>no</disabled>
    <check_files>yes</check_files>
    <check_trojans>yes</check_trojans>
    <check_dev>yes</check_dev>
    <check_sys>yes</check_sys>
    <check_pids>yes</check_pids>
    <check_ports>yes</check_ports>
    <check_if>yes</check_if>

    <!-- Frequency that rootcheck is executed - every 12 hours -->
    <frequency>43200</frequency>

    <rootkit_files>etc/rootcheck/rootkit_files.txt</rootkit_files>
    <rootkit_trojans>etc/rootcheck/rootkit_trojans.txt</rootkit_trojans>

    <skip_nfs>yes</skip_nfs>
  </rootcheck>

  <wodle name="open-scap">
    <disabled>yes</disabled>
    <timeout>1800</timeout>
    <interval>1d</interval>
    <scan-on-start>yes</scan-on-start>
  </wodle>

  <wodle name="cis-cat">
    <disabled>yes</disabled>
    <timeout>1800</timeout>
    <interval>1d</interval>
    <scan-on-start>yes</scan-on-start>

    <java_path>wodles/java</java_path>
    <ciscat_path>wodles/ciscat</ciscat_path>
  </wodle>

  <!-- Osquery integration -->
  <wodle name="osquery">
    <disabled>yes</disabled>
    <run_daemon>yes</run_daemon>
    <log_path>/var/log/osquery/osqueryd.results.log</log_path>
    <config_path>/etc/osquery/osquery.conf</config_path>
    <add_labels>yes</add_labels>
  </wodle>

  <!-- System inventory -->
  <wodle name="syscollector">
    <disabled>no</disabled>
    <interval>1h</interval>
    <scan_on_start>yes</scan_on_start>
    <hardware>yes</hardware>
    <os>yes</os>
    <network>yes</network>
    <packages>yes</packages>
    <ports all="no">yes</ports>
    <processes>yes</processes>
  </wodle>

  <vulnerability-detection>
    <enabled>yes</enabled>
    <index-status>yes</index-status>
    <feed-update-interval>60m</feed-update-interval>
  </vulnerability-detection>

  <indexer>
    <enabled>yes</enabled>
    <hosts>
      <host>https://indexer:9200</host>
    </hosts>
    <ssl>
      <certificate_authorities>
        <ca>/etc/ssl/root-ca.pem</ca>
      </certificate_authorities>
      <certificate>/etc/ssl/filebeat.pem</certificate>
      <key>/etc/ssl/filebeat.key</key>
    </ssl>
  </indexer>

  <!-- File integrity monitoring -->
  <syscheck>
    <disabled>no</disabled>

    <!-- Frequency that syscheck is executed default every 12 hours -->
    <frequency>43200</frequency>

    <scan_on_start>yes</scan_on_start>

    <!-- Generate alert when new file detected -->
    <alert_new_files>yes</alert_new_files>

    <!-- Don't ignore files that change more than 'frequency' times -->
    <auto_ignore frequency="10" timeframe="3600">no</auto_ignore>

    <!-- Directories to check  (perform all possible verifications) -->
    <directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
    <directories check_all="yes">/bin,/sbin,/boot</directories>

    <!-- Files/directories to ignore -->
    <ignore>/etc/mtab</ignore>
    <ignore>/etc/hosts.deny</ignore>
    <ignore>/etc/mail/statistics</ignore>
    <ignore>/etc/random-seed</ignore>
    <ignore>/etc/random.seed</ignore>
    <ignore>/etc/adjtime</ignore>
    <ignore>/etc/httpd/logs</ignore>
    <ignore>/etc/utmpx</ignore>
    <ignore>/etc/wtmpx</ignore>
    <ignore>/etc/cups/certs</ignore>
    <ignore>/etc/dumpdates</ignore>
    <ignore>/etc/svc/volatile</ignore>
    <ignore>/sys/kernel/security</ignore>
    <ignore>/sys/kernel/debug</ignore>

    <!-- Check the file, but never compute the diff -->
    <nodiff>/etc/ssl/private.key</nodiff>

    <skip_nfs>yes</skip_nfs>

    <!-- Remove not monitored files -->
    <remove_old_diff>yes</remove_old_diff>

    <!-- Allow the system to restart Auditd after installing the plugin -->
    <restart_audit>yes</restart_audit>
  </syscheck>

  <!-- Active response -->
  <global>
    <white_list>127.0.0.1</white_list>
    <white_list>^localhost.localdomain$</white_list>
    <white_list>10.66.0.2</white_list>
  </global>

  <command>
    <name>disable-account</name>
    <executable>disable-account.sh</executable>
    <expect>user</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>restart-ossec</name>
    <executable>restart-ossec.sh</executable>
    <expect></expect>
  </command>

  <command>
    <name>firewall-drop</name>
    <executable>firewall-drop</executable>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>host-deny</name>
    <executable>host-deny.sh</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>route-null</name>
    <executable>route-null.sh</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>win_route-null</name>
    <executable>route-null.cmd</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>win_route-null-2012</name>
    <executable>route-null-2012.cmd</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>netsh</name>
    <executable>netsh.cmd</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <command>
    <name>netsh-win-2016</name>
    <executable>netsh-win-2016.cmd</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
  </command>

  <!--
  <active-response>
    active-response options here
  </active-response>
  -->

  <!-- Log analysis -->
  <localfile>
    <log_format>command</log_format>
    <command>df -P</command>
    <frequency>360</frequency>
  </localfile>

  <localfile>
    <log_format>full_command</log_format>
    <command>netstat -tulpn | sed 's/\([[:alnum:]]\+\)\ \+[[:digit:]]\+\ \+[[:digit:]]\+\ \+\(.*\):\([[:digit:]]*\)\ \+\([0-9\.\:\*]\+\).\+\ \([[:digit:]]*\/[[:alnum:]\-]*\).*/\1 \2 == \3 == \4 \5/' | sort -k 4 -g | sed 's/ == \(.*\) ==/:\1/' | sed 1,2d</command>
    <alias>netstat listening ports</alias>
    <frequency>360</frequency>
  </localfile>

  <localfile>
    <log_format>full_command</log_format>
    <command>last -n 20</command>
    <frequency>360</frequency>
  </localfile>

  <ruleset>
    <!-- Default ruleset -->
    <decoder_dir>ruleset/decoders</decoder_dir>
    <rule_dir>ruleset/rules</rule_dir>
    <rule_exclude>0215-policy_rules.xml</rule_exclude>
    <list>etc/lists/audit-keys</list>
    <list>etc/lists/amazon/aws-sources</list>
    <list>etc/lists/amazon/aws-eventnames</list>

    <!-- User-defined ruleset -->
    <decoder_dir>etc/decoders</decoder_dir>
    <rule_dir>etc/rules</rule_dir>
  </ruleset>

  <rule_test>
    <enabled>yes</enabled>
    <threads>1</threads>
    <max_sessions>64</max_sessions>
    <session_timeout>15m</session_timeout>
  </rule_test>

  <!-- Configuration for ossec-authd
    To enable this service, run:
    wazuh-control enable auth
  -->
  <auth>
    <disabled>no</disabled>
    <port>1515</port>
    <use_source_ip>no</use_source_ip>
    <force>
      <enabled>yes</enabled>
      <key_mismatch>yes</key_mismatch>
      <disconnected_time enabled="yes">1h</disconnected_time>
      <after_registration_time>30m</after_registration_time>
    </force>
    <purge>no</purge>
    <use_password>yes</use_password>
    <ciphers>HIGH:!ADH:!EXP:!MD5:!RC4:!3DES:!CAMELLIA:@STRENGTH</ciphers>
    <!-- <ssl_agent_ca></ssl_agent_ca> -->
    <ssl_verify_host>no</ssl_verify_host>
    <ssl_manager_cert>/var/ossec/etc/sslmanager.cert</ssl_manager_cert>
    <ssl_manager_key>/var/ossec/etc/sslmanager.key</ssl_manager_key>
    <ssl_auto_negotiate>no</ssl_auto_negotiate>
  </auth>

  <cluster>
    <name>wazuh</name>
    <node_name>wazuh-manager-master</node_name>
    <node_type>master</node_type>
    <key>to_be_replaced_by_cluster_key</key>
    <port>1516</port>
    <bind_addr>0.0.0.0</bind_addr>
    <nodes>
        <node>wazuh-manager-master-0.wazuh-cluster.wazuh</node>
    </nodes>
    <hidden>no</hidden>
    <disabled>no</disabled>
  </cluster>
</ossec_config>

<ossec_config>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/ossec/logs/active-responses.log</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/auth.log</location>
  </localfile>

  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/syslog</location>
  </localfile>

  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/dpkg.log</location>
  </localfile>

  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/kern.log</location>
  </localfile>
 
  <!-- CPU, memory, disk metric -->
  <localfile>
     <log_format>full_command</log_format>
     <command>echo $(top -bn1 | grep Cpu | awk '{print $2+$4+$6+$12+$14+$16}' ; free -m | awk 'NR==2{printf "%.2f\t\t\n", $3*100/$2 }' ; df -h | awk '$NF=="/"{print $5}'|sed 's/%//g')</command>
     <alias>general_health_metrics</alias>
     <out_format>$(timestamp) $(hostname) general_health_check: $(log)</out_format>
     <frequency>30</frequency>
  </localfile>

<!-- load average metrics -->
  <localfile>
     <log_format>full_command</log_format>
     <command>uptime | grep load | awk '{print $(NF-2),$(NF-1),$NF}' | sed 's/\,\([0-9]\{1,2\}\)/.\1/g'</command>
     <alias>load_average_metrics</alias>
     <out_format>$(timestamp) $(hostname) load_average_check: $(log)</out_format>
     <frequency>30</frequency>
  </localfile>

<!-- memory metrics -->
  <localfile>
     <log_format>full_command</log_format>
     <command>free --bytes| awk 'NR==2{print $3,$7}'</command>
     <alias>memory_metrics</alias>
     <out_format>$(timestamp) $(hostname) memory_check: $(log)</out_format>
     <frequency>30</frequency>
  </localfile>

<!-- disk metrics -->
  <localfile>
     <log_format>full_command</log_format>
     <command>df -B1 | awk '$NF=="/"{print $3,$4}'</command>
     <alias>disk_metrics</alias>
     <out_format>$(timestamp) $(hostname) disk_check: $(log)</out_format>
     <frequency>30</frequency>
  </localfile>
</ossec_config>
<ossec_config>
  <integration>
    <name>custom-teams</name>
    <level>9</level>
    <hook_url>webhookurl</hook_url>
    <alert_format>json</alert_format>
  </integration>
  <gcp-pubsub>
    <pull_on_start>yes</pull_on_start>
    <interval>2m</interval>
    <project_id>project_id</project_id>
    <subscription_name>su-nam</subscription_name>
    <credentials_file>/var/ossec/wodles/gcloud/auth.json</credentials_file>
  </gcp-pubsub>
  <integration>
    <name>custom-urlhaus.py</name>
    <hook_url>https://urlhaus-api.abuse.ch/v1/url/</hook_url>
    <rule_id>86601</rule_id>
    <alert_format>json</alert_format>
  </integration>
</ossec_config>

<ossec_config>
  <integration>
   <name>virustotal</name>
   <api_key>API Key</api_key>
   <group>syscheck</group>
   <rule_id>100300,100301</rule_id>
   <alert_format>json</alert_format>
  </integration>
</ossec_config>



Also, one of my team members backed up the Wazuh files following the documentation at https://documentation.wazuh.com/current/migration-guide/files-backup/creating/wazuh-central-components.html


Output of df -h on the master pod:

Filesystem      Size  Used Avail Use% Mounted on
overlay          95G  4.7G   90G   5% /
tmpfs            64M     0   64M   0% /dev
/dev/sdb        4.9G  4.9G   96K 100% /etc/filebeat
shm              64M     0   64M   0% /dev/shm
/dev/sda1        95G  4.7G   90G   5% /etc/hosts
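
The df -h output above can also be checked programmatically, for example from a periodic health script. The following is a minimal sketch, assuming the standard six-column df layout shown above; the function name `full_filesystems` and the 90% default threshold are hypothetical choices, not Wazuh settings.

```python
# Sample copied from the df -h output in this post.
DF_SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
overlay          95G  4.7G   90G   5% /
tmpfs            64M     0   64M   0% /dev
/dev/sdb        4.9G  4.9G   96K 100% /etc/filebeat
shm              64M     0   64M   0% /dev/shm
/dev/sda1        95G  4.7G   90G   5% /etc/hosts
"""


def full_filesystems(df_output: str, threshold: int = 90):
    """Return (mount_point, use_percent) pairs at or above `threshold` percent."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expect: filesystem, size, used, avail, use%, mount point.
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        use = int(fields[4].rstrip("%"))
        if use >= threshold:
            # Re-join in case the mount point itself contains spaces.
            flagged.append((" ".join(fields[5:]), use))
    return flagged


print(full_filesystems(DF_SAMPLE))  # → [('/etc/filebeat', 100)]
```

On this sample, only the 4.9G volume on /dev/sdb is flagged. df usually lists a device under a single mount point, so if the same volume is mounted at several paths in the pod (an assumption about this deployment, not something the output confirms), those paths would share the same full disk.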



Any suggestions would be appreciated.