Wazuh analysisd going down abruptly


Ranjith Kesavan

Jun 17, 2022, 3:27:31 AM
to Wazuh mailing list
We have faced an issue with wazuh-analysisd going down abruptly since upgrading to Wazuh 4.3. We are now running Wazuh 4.3.4-1 and the issue persists. After Wazuh starts, it runs for a few hours and then analysisd goes down, with a segfault reported in syslog. We had debugging enabled for analysisd at level 1. Please find the logs below; the ossec.log from before the issue is attached.


Any help on this is appreciated. 

From syslog : 

Jun 17 04:37:48 ADMM-ES-10 kernel: [725697.704027] wazuh-analysisd[13166]: segfault at 0 ip 000000000041a17b sp 00007fd8df7ec6e0 error 4 in wazuh-analysisd[400000+154000]


From ossec.log : Analysisd went down at 2022/06/17 04:37:46

  18033 2022/06/17 04:37:45 wazuh-analysisd[12943] to_json.c:197 at Eventinfo_to_jsonstr(): WARNING: Mitre Technique ID 'T1492' not found in database.
  18034 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:211 at DecodeSCA(): DEBUG: Got summary event: '{"type":"summary","scan_id":1413655502,"name":"Benchmark for Windows audit","policy_id":"sca_win_audit","file":"sca_win_audit.yml","description":"This document provides a way of ensuring the security of the Windows systems.","passed":33,"failed":7,"invalid":31,"total_checks":71,"score":82.5,"start_time":1655440662,"end_time":1655440663,"hash":"ff8606c5dad726113d4494a56f5def7a89dd58f8882b372bac19d937a83e642e","hash_file":"da409ead5682c644e5bf9b99c91fc1c1bbc439126696ed406a7c932bc9d5c499"}'
  18035 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:1070 at HandleScanInfo(): DEBUG: Retrieving sha256 hash for policy id: sca_win_audit
  18036 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:311 at FindScanInfo(): DEBUG: Find scan information for policy id: sca_win_audit
  18037 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:635 at SaveScanInfo(): DEBUG: Saving scan info for policy id 'sca_win_audit', agent id '312'
  18038 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:426 at FindPolicyInfo(): DEBUG: Find policies IDs for policy 'sca_win_audit', agent id '312'
  18039 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:462 at FindPolicySHA256(): DEBUG: Find sha256 for policy 'sca_win_audit', agent id '312'
  18040 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:349 at FindCheckResults(): DEBUG: Find check results for policy id: sca_win_audit
  18041 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:225 at DecodeSCA(): DEBUG: Got policies event: '{"type":"policies","policies":["cis_win10_enterprise","sca_win_audit"]}'
  18042 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:1506 at HandlePoliciesInfo(): DEBUG: Checking policy JSON fields.
  18043 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:1515 at HandlePoliciesInfo(): DEBUG: Retrieving policies from database.
  18044 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:388 at FindPoliciesIds(): DEBUG: Find policies IDs for agent id: 312
  18045 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:1534 at HandlePoliciesInfo(): DEBUG: Comparing policy: 'cis_win10_enterprise' 'cis_win10_enterprise'
  18046 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:1534 at HandlePoliciesInfo(): DEBUG: Comparing policy: 'cis_win10_enterprise' 'sca_win_audit'
  18047 2022/06/17 04:37:46 wazuh-analysisd[12943] security_configuration_assessment.c:1534 at HandlePoliciesInfo(): DEBUG: Comparing policy: 'sca_win_audit' 'sca_win_audit'
  18048 2022/06/17 04:37:48 wazuh-remoted: ERROR: socketerr (not available).
  18049 2022/06/17 04:37:48 wazuh-remoted: ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Connection refused'
  18050 2022/06/17 04:37:48 wazuh-remoted: ERROR: socketerr (not available).
  18051 2022/06/17 04:37:48 wazuh-remoted: ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Bad file descriptor'
  18052 2022/06/17 04:37:48 wazuh-remoted: ERROR: socketerr (not available).
  18053 2022/06/17 04:37:48 wazuh-remoted: ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Bad file descriptor'
  18054 2022/06/17 04:37:48 wazuh-remoted: ERROR: socketerr (not available).
  18055 2022/06/17 04:37:48 wazuh-remoted: ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Bad file descriptor'
  18056 2022/06/17 04:37:49 wazuh-logcollector: ERROR: socketerr (not available).
  18057 2022/06/17 04:37:49 wazuh-logcollector: ERROR: Unable to send message to 'queue/sockets/queue' (wazuh-analysisd might be down). Attempting to reconnect.
  18058 2022/06/17 04:38:26 wazuh-syscheckd: ERROR: socketerr (not available).
logs_when_analysisd_failed

Victor M Fernandez-Castro

Jun 17, 2022, 6:23:14 AM
to Ranjith Kesavan, Wazuh mailing list
Hi Ranjith,

Thank you for the detailed explanation. The most useful information is in the syslog message. However, I failed to locate the instruction at 0x41a17b, because the address depends on the exact binary file. Could you tell us which OS you are running and which Wazuh version was installed when the system produced that log?
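
As a side note for anyone reading the kernel message: the part in brackets, wazuh-analysisd[400000+154000], gives the base address and size of the mapping that contains the faulting instruction, so the offset inside that mapping is ip minus the base. A minimal sketch of that arithmetic follows. The function name segv_offset is only illustrative, and the addr2line call in the comments assumes binutils is installed, a non-PIE executable, and that /var/ossec/bin/wazuh-analysisd is exactly the build that crashed (without debug symbols it may print only ??).

```shell
# Hypothetical helper: print the offset of the faulting instruction
# inside the mapped binary, given one kernel "segfault at" line.
segv_offset() {
    local line=$1 ip base
    # Instruction pointer: the hex value after " ip ".
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    # Mapping base: the hex value between "[" and "+" at the end of the line.
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\]$/\1/p')
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

# Usage with the line from this thread:
# segv_offset "... segfault at 0 ip 000000000041a17b sp 00007fd8df7ec6e0 error 4 in wazuh-analysisd[400000+154000]"
#
# For a binary mapped at 0x400000 (as here), the absolute address can be
# looked up directly, assuming the same build and available symbols:
# addr2line -f -e /var/ossec/bin/wazuh-analysisd 0x41a17b
```

The lookup is only meaningful against the same binary that produced the log, which is why the OS and package version matter.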

On the other hand, if you could share your Analysisd configuration with us, we might be able to reproduce the problem more quickly. We would need:
  • etc/ossec.conf
  • etc/decoders/*
  • etc/rules/*
  • etc/lists/* (if you added some custom lists)

We recently fixed an issue in Analysisd, which impacted 4.3.1 to 4.3.3 (fixed in 4.3.4): #13604. This bug made Analysisd crash when there was any <active-response> stanza at ossec.conf pointing to a rule that had been ignored or overwritten. Could you try to remove your <active-response> blocks (if any)?
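
A quick way to check for such stanzas is to search the configuration file. This is only a sketch: the function name is made up for illustration, and /var/ossec/etc/ossec.conf assumes a default installation path.

```shell
# Hypothetical helper: list every line that opens an <active-response>
# block in the given config file. Exit status is 0 if any is found.
list_active_response() {
    grep -n '<active-response>' "$1"
}

# Usage, assuming the default install path:
# list_active_response /var/ossec/etc/ossec.conf
```

No output (and a non-zero exit status) means the file contains no such block.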

One last question: Which manager version did you have installed before upgrading?

I hope your answers help us to reproduce the issue and identify the cause. Otherwise, we will let you know how to generate core dumps, for you to send us one if that's possible.

Best regards,

Victor M. Fernandez-Castro
Director of engineering
vic...@wazuh.com | vikman90
wazuh.com



Ranjith Kesavan

Jun 17, 2022, 7:56:30 AM
to Wazuh mailing list
Hello Victor, 

Thanks for the response.

  • We are running 4.3.4-1.
  • The previous version was 4.2.4. In the meantime, we completely purged Wazuh, reinstalled the latest version, and restored the files.
  • There is no active-response block in ossec.conf.
  • The OS is Ubuntu 18.04.
  • Unfortunately, we are not able to share the decoder and rule files, as they contain a lot of sensitive information.
  • There are no lists configured in ossec.conf.
  • Please find the ossec.conf file attached.

Can you share the steps to generate a core dump when analysisd fails?

Juan Cabrera

Jun 17, 2022, 8:17:05 AM
to Wazuh mailing list

Hi Ranjith,

On Ubuntu, core dumps are handled by Apport and stored in /var/crash/. However, Apport is disabled by default in stable releases, so it must be enabled first (set enabled=1 in /etc/default/apport).

The Apport service must also be running to capture core dumps:

# systemctl start apport.service

Crash files are written to the /var/crash directory. Each one is a package that contains not only the core dump but also environment information about the crashed process.

To extract the core dump itself, unpack the crash report with apport-unpack:

# cd /var/crash 
# apport-unpack <dump-filename>.crash <outputdir>

<dump-filename> must be replaced with the name of the Apport crash file: the full path of the crashed executable with slashes (/) replaced by underscores (_), followed by the numeric ID of the user that ran the process.

For example, a crash of wazuh-analysisd running as root (UID 0) produces a report named _var_ossec_bin_wazuh-analysisd.0.crash

Brief information about the core dump can be obtained with the file command:

# file /var/crash/<outputdir>/CoreDump

 CoreDump: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/var/ossec/bin/wazuh-analysisd',   real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/var/ossec/bin/wazuh-analysisd', platform: 'x86_64'

To learn more about Apport, see the Apport Wiki.
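
Once the core dump is unpacked, a backtrace is usually the first thing worth extracting. The sketch below assumes gdb is installed and that the crash came from /var/ossec/bin/wazuh-analysisd; the helper name crash_prefix and the output directory /tmp/analysisd-crash are illustrative only. The helper derives the crash file prefix with the slash-to-underscore rule described above, so the report can be found with a glob regardless of its numeric suffix.

```shell
# Hypothetical helper: turn an executable path into the prefix Apport
# uses for its crash file (slashes replaced by underscores).
crash_prefix() {
    printf '%s\n' "$1" | tr '/' '_'
}

# Usage (as root), assuming a crash report exists:
# ls /var/crash/"$(crash_prefix /var/ossec/bin/wazuh-analysisd)".*.crash
# apport-unpack /var/crash/_var_ossec_bin_wazuh-analysisd.0.crash /tmp/analysisd-crash
# gdb -batch -ex 'thread apply all bt' /var/ossec/bin/wazuh-analysisd /tmp/analysisd-crash/CoreDump
```

The backtrace is far more readable when the binary was built with debug symbols.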

Best regards

Juan Cabrera

Jun 17, 2022, 8:55:40 AM
to Wazuh mailing list

Hi Ranjith,

To find out where the problem occurs, it is important to build and install Wazuh from sources in debug mode.

To do this, follow the steps below:

  • Compile Wazuh

    # cd wazuh-4.3.4/src
    # make deps
    # make TARGET=server DEBUG=yes -jN


    Replace “N” with the number of parallel jobs you want to use for the build

  • Install the manager

    # ../install.sh
    
  • When the script asks what kind of installation you want, type manager to install the Wazuh manager:

    1- What kind of installation do you want (manager, agent, local, hybrid, or help)? manager
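
If you are unsure what to use for N, the number of available CPU cores is a common choice. A small sketch, assuming GNU coreutils (nproc) is present; the helper name build_jobs is illustrative:

```shell
# Hypothetical helper: number of parallel make jobs to use.
# Falls back to 1 if nproc is unavailable.
build_jobs() {
    nproc 2>/dev/null || echo 1
}

# Usage, from wazuh-4.3.4/src:
# make TARGET=server DEBUG=yes -j"$(build_jobs)"
```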
    

Regards!

Victor M Fernandez-Castro

Jun 17, 2022, 3:36:49 PM
to Juan Cabrera, Wazuh mailing list
Hi Ranjith,

We've made some progress here, and I've opened an issue: Segmentation fault in Analysisd #13912.

In summary, I think we managed to locate the point where the crash happened, based on the information you provided. However, it seems to be caused by a condition that should not occur. We need to go back through the change history to figure out the cause of the problem.

We will keep you posted on this issue.

Thank you very much for reporting that.

Best regards,


Victor M. Fernandez-Castro

Victor M Fernandez-Castro

Jun 20, 2022, 8:32:36 AM
to Juan Cabrera, Wazuh mailing list
Hi Ranjith, 

We continue working on finding the cause of the problem. We're practically sure it crashes at this line of rules.c, due to an unexpected null pointer to the dynamic fields (<field name="...">). However, we've not managed to reproduce that yet.

I understand you're unable to send us your rules. Is there any clue about them you can give us? For instance, we recently changed the code around the overwrite="yes" tag. Do you have any rules overwriting another? Do you have any <rule> with more <field> blocks than the setting "analysisd.decoder_order_size" (256 by default, in etc/internal_options.conf or etc/local_internal_options.conf)?
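
Both questions can be checked with a couple of shell commands, without sharing the rules themselves. The following is a rough sketch, not an official tool: the function names are made up, /var/ossec/etc/rules is the assumed location of the custom rules, and the counter only looks at opening <field ...> tags found between a <rule ...> tag and its closing </rule>.

```shell
# Hypothetical helper: list rules that overwrite another rule
# (output format is file:line:matching text).
find_overwrites() {
    grep -rn 'overwrite="yes"' "$1"
}

# Hypothetical helper: print the largest number of <field ...> elements
# found inside a single <rule> across all *.xml files in a directory.
max_fields_per_rule() {
    cat "$1"/*.xml | awk '
        /<rule[ >]/ { n = 0 }                          # a new rule starts
                    { n += gsub(/<field[ >]/, "&") }   # count opening tags
        /<\/rule>/  { if (n > max) max = n }           # rule ends
        END         { print max + 0 }'
}

# Usage, assuming the default install path:
# find_overwrites /var/ossec/etc/rules
# max_fields_per_rule /var/ossec/etc/rules
# grep -h 'analysisd.decoder_order_size' /var/ossec/etc/internal_options.conf /var/ossec/etc/local_internal_options.conf
```

Comparing the number printed by max_fields_per_rule with the configured decoder_order_size answers the second question.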

I hope to find out the cause of the problem soon.

Thank you very much,

Victor M. Fernandez-Castro
