Custom Local Decoder Regex

1,200 views
Skip to first unread message

J. Desipeda

unread,
Apr 18, 2018, 2:51:21 AM4/18/18
to Wazuh mailing list
Hello again,

I'm trying to decode this line from an hmailserver log file (I changed the numbers):
"SMTPD" 1234 1234567 "2018-01-23 00:01:23.456" "12.234.45.56" "SENT: 535 Authentication failed. Restarting authentication process."

I created this entry in the local_decoder.xml file:

<decoder name="hmailserver">
    <prematch>SMTPD</prematch>
    <regex>"SMTPD" (\d+)    (\d+) "(\d+-\d+-\d+) (\d+:\d+:\d+.\d+)"       "(\d+.\d+.\d+.\d+)" "(\S+): (\w+)"</regex>
    <order>threadid_hmail,sessionid_hmail,date_hmail,time_hmail,srcip_hmail,action__hmail,content_hmail</order>
</decoder>

The result of ossec-logtest is:

**Phase 1: Completed pre-decoding.
       full event: '"SMTPD" 1234 1234567 "2018-01-23 00:01:23.456" "12.234.45.56" "SENT: 535 Authentication failed. Restarting authentication process."'
       timestamp: '(null)'
       hostname: 'ip-10-3-0-21'
       program_name: '(null)'
       log: '"SMTPD" 1234 1234567 "2018-01-23 00:01:23.456" "12.234.45.56" "SENT: 535 Authentication failed. Restarting authentication process."'

**Phase 2: Completed decoding.
       decoder: 'hmailserver'

**Phase 3: Completed filtering (rules).
       Rule id: '1002'
       Level: '2'
       Description: 'Unknown problem somewhere in the system.'

The things I tried that still resulted to the exact same ossec-logtest output as above:
1. imitating the spaces in the actual log line for the regex. I was using single spaces in between before
2. separating the regex and order line and putting them in a child decoder
3. changing the static fields to standard fields just to check if the non-standard fields were the error

From the things I've tried my conclusion is that may regex is wrong.

Sorry I've been posting a lot of questions lately. I know this might be an easy question but I think if I get the correct regex for this one I'll be able to understand how to construct regex better for other custom logs. Thank you.

(Btw I haven't created a rule yet. Idk if that's the reason but if it is please let me know.)

J. Desipeda

unread,
Apr 18, 2018, 9:34:07 PM4/18/18
to Wazuh mailing list
I tried decoding part by part and this worked:

<decoder name="hmailserver">
    <prematch>SMTPD</prematch>
    <regex> "(\w+): (\.+)"$</regex>
    <order>action,extra_data</order>
</decoder>

Output of ossec-logtest:

**Phase 2: Completed decoding.
       decoder: 'hmailserver'
       action: 'SENT'
       extra_data: '535 Authentication failed. Restarting authentication process."'


Miguelangel Freitas

unread,
Apr 18, 2018, 10:30:14 PM4/18/18
to J. Desipeda, Wazuh mailing list
Hi James,

It appears that your logline can contain tabs as a field separator, let me suggest you the following decoder:

<decoder name="hmailserver">
   <prematch>^"SMTPD"</prematch>
   <regex offset="after_prematch">^\s*\t*(\d+)\s*\t*(\d+)\s*\t*"(\S+)\s+(\S+)"\s*\t*"(\S+)"\s*\t*"(\S+):\s(\.*)."$</regex>
   <order>threadid_hmail, sessionid_hmail, date_hmail, time_hmail, srcip_hmail, action_hmail, content_hmail</order>
</decoder>

The above regex will match for spaces or tabs as a field separator (\s*\t*). In addition, here a brief rule for the above decoder:

<group name="hmailserver,">
   <rule id="120000" level="3">
      <decoded_as>hmailserver</decoded_as>
      <description>hmail events.</description>
   </rule>
</group>

And the ossec-logtest output.

**Phase 1: Completed pre-decoding.
       full event: '"SMTPD"     1234    1234567 "2018-01-23 00:01:23.456"       "12.234.45.56"  "SENT: 535 Authentication failed. Restarting authentication process."'
       timestamp: '(null)'
       hostname: 'wazuh'
       program_name: '(null)'
       log: '"SMTPD"    1234    1234567 "2018-01-23 00:01:23.456"       "12.234.45.56"  "SENT: 535 Authentication failed. Restarting authentication process."'

**Phase 2: Completed decoding.
       decoder: 'hmailserver'
       threadid_hmail: '1234'
       sessionid_hmail: '1234567'
       date_hmail: '2018-01-23'
       time_hmail: '00:01:23.456'
       srcip_hmail: '12.234.45.56'
       action_hmail: 'SENT'
       content_hmail: '535 Authentication failed. Restarting authentication process'

**Phase 3: Completed filtering (rules).
       Rule id: '120000'
       Level: '3'
       Description: 'hmail events.'
**Alert to be generated.

Here some links to our documentation that could help you to create custom rules and decoders:


Please not hesitate to contact us again, I hope this helps.

Best Regards,


Miguelangel Freitas

https://docs.google.com/uc?export=download&id=0B3pCWMHmmbAmVF9pUHlfVU5KSms&revid=0B3pCWMHmmbAmRUlDdWppaktmNVFNZnROZVNoUDVUV1RIVkJVPQ 

--
You received this message because you are subscribed to the Google Groups "Wazuh mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wazuh+unsubscribe@googlegroups.com.
To post to this group, send email to wa...@googlegroups.com.
Visit this group at https://groups.google.com/group/wazuh.
To view this discussion on the web visit https://groups.google.com/d/msgid/wazuh/9110362b-1da7-4a13-a32b-90be2cd87dd8%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

J. Desipeda

unread,
Apr 18, 2018, 11:29:29 PM4/18/18
to Wazuh mailing list
Thank you for the response :)

I've actually finished creating a successful regex:


<decoder name="hmailserver">
    <prematch>SMTPD</prematch>
    <regex>^"(\w+)"     (\d+)   (\d+)   "(\d+-\d+-\d+) (\d+:\d+:\d+.\d+)"       "(\d+.\d+.\d+.\d+)"     "(\w+): (\d+) (\.+)"$</regex>
    <order>source,threadid,sessionid,date,time,srcip,action,status,message</order>
</decoder>

Will the null timestamp affect this kind of log when it appears in the interface?

Miguelangel Freitas

unread,
Apr 19, 2018, 9:17:20 AM4/19/18
to J. Desipeda, Wazuh mailing list
Hi,

That should not present any trouble in term of the alert generation, the manager will put the time where it receives the event as the timestamp.

Best Regards,

Miguelangel Freitas

https://docs.google.com/uc?export=download&id=0B3pCWMHmmbAmVF9pUHlfVU5KSms&revid=0B3pCWMHmmbAmRUlDdWppaktmNVFNZnROZVNoUDVUV1RIVkJVPQ 

--
You received this message because you are subscribed to the Google Groups "Wazuh mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wazuh+unsubscribe@googlegroups.com.
To post to this group, send email to wa...@googlegroups.com.
Visit this group at https://groups.google.com/group/wazuh.

Jim Ivey

unread,
May 24, 2021, 6:50:11 PM5/24/21
to Wazuh mailing list

I’m in a similar situation in which my company customizes logs, including the timestamp format, and built a custom decoder. If I understand correctly, timestamps are estimated and rules with frequency= and timeframe= should work.

I’m trying to build unit tests for such rules, using ossec-logtest. Currently, these don’t work — I assume because of timestamp: '(null)' for all the events with non-standard timestamps. Is there a way to fix this — either by having the decoder read the timestamps correctly or somehow convincing ossec-logtest that the simulated log events all occurred within the timeframe?

Many thanks!

On Thursday, April 19, 2018 at 6:17:20 AM UTC-7 Miguelangel Freitas wrote:
Hi,

That should not present any trouble in term of the alert generation, the manager will put the time where it receives the event as the timestamp.

Best Regards,

Miguelangel Freitas

https://docs.google.com/uc?export=download&id=0B3pCWMHmmbAmVF9pUHlfVU5KSms&revid=0B3pCWMHmmbAmRUlDdWppaktmNVFNZnROZVNoUDVUV1RIVkJVPQ 

On Wed, Apr 18, 2018 at 10:29 PM, J. Desipeda <james...@gmail.com> wrote:
Thank you for the response :)

I've actually finished creating a successful regex:

<decoder name="hmailserver">
    <prematch>SMTPD</prematch>
    <regex>^"(\w+)"     (\d+)   (\d+)   "(\d+-\d+-\d+) (\d+:\d+:\d+.\d+)"       "(\d+.\d+.\d+.\d+)"     "(\w+): (\d+) (\.+)"$</regex>
    <order>source,threadid,sessionid,date,time,srcip,action,status,message</order>
</decoder>

Will the null timestamp affect this kind of log when it appears in the interface?

--
You received this message because you are subscribed to the Google Groups "Wazuh mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wazuh+un...@googlegroups.com.

To post to this group, send email to wa...@googlegroups.com.
Visit this group at https://groups.google.com/group/wazuh.

Juan Carlos

unread,
May 28, 2021, 3:02:58 PM5/28/21
to Wazuh mailing list
Hi Jim,

The Wazuh analysis engine uses the time it receives the events as the reference for any time dependent rule matching. If the event contains a timestamp it recognizes it will extract it and include it in a specific field of the logged event so it may be used for creating filters and visualizations in Kibana.

If you wish to test a composite rule that looks for multiple events that match a single (atomic) rule within a specific timeframe you may open ossec-logtest (or its replacement wazuh-logtest) and paste the same log as many times as necessary to trigger the composite rule.

Alternatively you may create a file with multiple lines where each line is one of the event, and then use cat to pipe that into the logtest utility:
cat multiple-events.log | /var/ossec/bin/wazuh-logtest

If you share a sanitized sample log and rule I could help you troubleshooting any issue you may be facing.
Best Regards,
Juan Carlos Tello

Jim Ivey

unread,
May 28, 2021, 5:39:35 PM5/28/21
to Wazuh mailing list

Thanks!

Here’s an example from the Ansible code we’re using to test some rules:

# mimic ten (10) 503 return codes within 240 seconds in `/var/log/squid/access.log`
- name: 'Test rule 35058'
  command:
  cmd: '/var/ossec/bin/ossec-logtest -U 35058:10:squid-accesslog'
  stdin: |
    01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:23:18    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:23:28    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:23:38    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:23:48    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:23:58    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:24:08    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:24:18    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:24:28    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:24:38    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:24:48    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
    01/Jan/2021:16:24:58    243 3681 10.10.10.10 TCP_TUNNEL/503 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -

FWIW, we use a similar task to test other rules and they work fine where we haven’t changed the log format. In addition, rules that operate on a single event for the same log (same format) also work fine with this type of task.

What’s different here is that the timestamps don’t seem to be recognized and the parent rule fires ten (10) times.

The timestamp format is %tg in the squid log format.

Thanks again!

Jim Ivey

unread,
May 28, 2021, 5:41:59 PM5/28/21
to Wazuh mailing list
P.S.  The rule we're testing is the original rule of the distribution: 35058.  We haven't changed it.  We have a separate, similar playbook for testing rules after we've changed them.  Thanks again!

Jim Ivey

unread,
May 28, 2021, 8:48:09 PM5/28/21
to Wazuh mailing list

Maybe this is relevant: in the custom decoder to recognize this log format, I tried to get clever recognizing the date with <prematch>^\d\d/\w\w\w/2\d\d\d:\d\d:\d\d:\d\d\s+\d+ \d+ \S+ TCP_\S+ .... Could that mess up the timestamp parsing? Thanks!

Juan Carlos

unread,
May 31, 2021, 10:28:10 AM5/31/21
to Wazuh mailing list
Hi Jim,

It is indeed related to the timestamp, Wazuh is expecting a %tu timestamp format and as such it is not decoding the message correctly. However the issue is not specific to the composite rule, but that it is not matching it's parent (35009) atomic rule in the first place or any other rule.

You may address this by adding the following custom decoder:
<decoder name="squid-accesslog-tu">
 <type>squid</type>
 <prematch>^\d\d/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d\s+\d+\s+\d+\s+\S+\s+\w+/\d+</prematch>
 <regex>^\d\d/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d\s+\d+\s+\d+\s+(\S+)\s+(\w+)/(\d+)\s+\d+\s+\w+\s+(\S+)</regex>
 <order>srcip,action,id,url</order>
</decoder>



It is also worth noting that the rule you are testing for 35058 requires that the URL be different:
  <rule id="35058" level="10" frequency="$SQUID_FREQ" timeframe="240">
   <if_matched_sid>35009</if_matched_sid>
   <same_source_ip />
   <different_url />
   <description>Squid: Multiple 500/600 error codes (server error).</description>
   <mitre>
     <id>T1499</id>
   </mitre>
   <group>gdpr_IV_35.7.d,</group>
 </rule>


and if the HTTP reply code is 503 it will instead match rule 35010, so you may test this event with the following logs:

01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.1:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.2:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.3:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.4:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.5:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.6:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.7:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.8:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.9:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -
01/Jan/2021:16:23:08    243 3681 10.10.10.10 TCP_TUNNEL/501 14488 CONNECT 1.2.3.10:443 url.domain.com - ORIGINAL_DST/1.2.3.4 -


Note how the value of the log's timestamp will not affect the analysis, as it is the timeframe in which the logs are analyzed that affect the result.

Let me know if you have any more questions.
Best Regards,
Juan Carlos Tello

Reply all
Reply to author
Forward
0 new messages