Trying to be more specific with decoder and regex


Jamie Navarro

Mar 30, 2023, 2:49:47 PM
to Wazuh mailing list
New message:

I have a pfSense router/firewall, and I want to get alerts for all 'DHCPACK' lines.

I temporarily turned on 'logall' for regular and json events. Checked archives.json, found the relevant entry, and the 'full_log' field contains:
Mar 13 15:19:43 dhcpd[7416]: DHCPACK on 10.10.10.77 to 56:8d:3d:8f:0c:90 via mvneta0.4091

I originally was trying archives.log and was seeing a completely different format:
2023 Mar 13 22:19:43 ubuntusrvwazuhtest1->10.10.10.253 Mar 13 15:19:43 dhcpd[7416]: DHCPACK on 10.10.10.77 to 56:8d:3d:8f:0c:90 via mvneta0.4091
And was trying to base my decoder off of that, very unsuccessfully. Anyway, once I went by the log in archives.json, that's when I eventually found success.
So a quick side question: should I always use only archives.json? I thought that somewhere in the documentation it says to look at archives.log.
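For anyone repeating the archives.json step, here is a minimal Python sketch of pulling the 'full_log' values out of the archive. It assumes one JSON object per line with a 'full_log' key, as described above; the helper name and the trimmed sample entries are hypothetical, and real entries carry many more fields.

```python
import json

def dhcpack_full_logs(lines):
    """Yield the 'full_log' value of every archived event that mentions DHCPACK."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if not isinstance(event, dict):
            continue
        full_log = event.get("full_log", "")
        if "DHCPACK" in full_log:
            yield full_log

# Hypothetical, heavily trimmed archive entries (only the field used here).
sample = [
    json.dumps({"full_log": "Mar 13 15:19:43 dhcpd[7416]: DHCPACK on 10.10.10.77 to 56:8d:3d:8f:0c:90 via mvneta0.4091"}),
    json.dumps({"full_log": "Mar 13 15:19:44 example[1]: a made-up unrelated line"}),
]

for entry in dhcpack_full_logs(sample):
    print(entry)
```

To run it against a live manager, pass an open file handle instead of `sample`; on a default install the archive is usually at /var/ossec/logs/archives/archives.json, but check your own path.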

Anyway, so following are my current decoder and rules files.

local_decoder.xml:

<decoder name="SGPfsense">
    <prematch>\w\w\w \w\w \d\d\p\d\d\p\d\d</prematch>
</decoder>

<decoder name="SGPfsense">
    <parent>SGPfsense</parent>
    <regex>(DHCP\w+) on (\d+.\d+.\d+.\d+) to (\w\w:\w\w:\w\w:\w\w:\w\w:\w\w)</regex>
    <order>dhcpreqorack,ipassigned,tomacaddress</order>
</decoder>




local_rules.xml:

  <rule id="100010" level="3">
    <decoded_as>SGPfsense</decoded_as>
    <field name="dhcpreqorack">DHCPACK</field>
    <description>SG - IP address $(ipassigned) assigned to $(tomacaddress)</description>
  </rule>




Well, this works fine, I'm getting actual alerts as expected. And just for reference, here's what I get with wazuh-logtest:

Starting wazuh-logtest v4.3.10
Type one log per line

Mar 29 15:10:56 dhcpd[3246]: DHCPACK on 10.10.10.65 to 84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091

**Phase 1: Completed pre-decoding.
        full event: 'Mar 29 15:10:56 dhcpd[3246]: DHCPACK on 10.10.10.65 to 84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091'
        timestamp: 'Mar 29 15:10:56'
        hostname: 'dhcpd[3246]:'

**Phase 2: Completed decoding.
        name: 'SGPfsense'
        dhcpreqorack: 'DHCPACK'
        ipassigned: '10.10.10.65'
        tomacaddress: '84:98:66:ca:f5:e3'

**Phase 3: Completed filtering (rules).
        id: '100010'
        level: '3'
        description: 'SG - IP address 10.10.10.65 assigned to 84:98:66:ca:f5:e3'
        groups: '['SG']'
        firedtimes: '1'
        mail: 'False'
**Alert to be generated.


Wonderful.

Now finally to my main point. The reason I'm posting this is that I want to make the regular expression for that log more specific (I recall reading somewhere in the Wazuh documentation that you should be as explicit as possible so that logs and decoders don't get mixed up).

I've tried all the below variations to match the log better:

\w\w\w \w\w \d\d\p\d\d\p\d\d dhcpd
\w\w\w \w\w \d\d\p\d\d\p\d\d dhcpd\p
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+\p
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+\p\d+
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+\p\d+\p
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+\p\d+\p:
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+\p\w+\p:
\w\w\w \w\w \d\d\p\d\d\p\d\d \w+\p\w+\p+



But they all end up giving me the following in wazuh-logtest:

Starting wazuh-logtest v4.3.10
Type one log per line

Mar 29 15:10:56 dhcpd[3246]: DHCPACK on 10.10.10.65 to 84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091

**Phase 1: Completed pre-decoding.
        full event: 'Mar 29 15:10:56 dhcpd[3246]: DHCPACK on 10.10.10.65 to 84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091'
        timestamp: 'Mar 29 15:10:56'
        hostname: 'dhcpd[3246]:'

**Phase 2: Completed decoding.
        No decoder matched.



I'm very new to regular expressions, but I feel like those should all work, according to:
https://documentation.wazuh.com/current/user-manual/ruleset/ruleset-xml-syntax/regex.html
So, I guess my question is: why don't any of the above variations work?

And a related side question: why doesn't this work:
\w\w\w \d\d
but this does:
\w\w\w \w\w

I thought that \d would match anything between 0 and 9. And from my sample log, you'll notice that what the \d\d represents is: 29 which are digits. So why doesn't that match, but \w\w does?


Thank you,
Jamie

Miguel Angel Cazajous

Mar 31, 2023, 10:03:37 PM
to Wazuh mailing list
Hi Jamie,

Could you share more logs of what you are trying to decode? I'm not very familiar with the expected pfSense log format and, if I understood correctly, you also received something like this:

2023 Mar 13 22:19:43 ubuntusrvwazuhtest1->10.10.10.253

That would make it useless to match on program_name = dhcpd.

I would like to see more logs to check whether I can make the regex more specific to them, because I think this prematch (<prematch>\w\w\w \w\w \d\d\p\d\d\p\d\d</prematch>) could collide with several other logs.

Concerning the last question about digits, you can specify how many of them you expect. For example, to match exactly 2 digits in PCRE2 you can use \d{2} (OS_Regex has no {n} quantifier, so there you repeat the class, as in \d\d).
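One possible explanation for the \w\w\w \d\d result, offered as an assumption rather than confirmed behaviour: if the prematch is evaluated against what is left after pre-decoding has consumed the timestamp and the 'hostname' shown by wazuh-logtest, then the text being matched starts at "DHCPACK on ...". The sketch below uses Python's re module as a rough stand-in for OS_Regex; the class translations and the `translate` helper are approximations made up for illustration, not Wazuh's engine.

```python
import re

# Approximate Python equivalents of the OS_Regex classes used in this thread.
W = r"[A-Za-z0-9@_-]"   # \w
D = r"[0-9]"            # \d
P = r"[.,:;()\[\]]"     # \p (only a subset of the punctuation set)

def translate(os_regex: str) -> str:
    """Turn a simple OS_Regex pattern into a Python regex (illustration only)."""
    return os_regex.replace(r"\w", W).replace(r"\d", D).replace(r"\p", P)

def first_match(os_regex: str, text: str):
    m = re.search(translate(os_regex), text)
    return m.group() if m else None

full_log = ("Mar 29 15:10:56 dhcpd[3246]: DHCPACK on 10.10.10.65 to "
            "84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091")

# Assumption: pre-decoding removed 'Mar 29 15:10:56' and 'dhcpd[3246]:'.
after_header = full_log.split(" ", 4)[4]

print(first_match(r"\w\w\w \w\w \d\d\p\d\d\p\d\d", after_header))        # ACK on 10.10.10
print(first_match(r"\w\w\w \w\w", after_header))                         # ACK on
print(first_match(r"\w\w\w \d\d", after_header))                         # None
print(first_match(r"\w\w\w \w\w \d\d\p\d\d\p\d\d dhcpd", after_header))  # None
print(first_match(r"\w\w\w \d\d", full_log))                             # Mar 29
```

Under that assumption the original prematch would be matching "ACK on 10.10.10" rather than the date, which would also explain why appending dhcpd never matched. Treat this as a hypothesis to verify with wazuh-logtest.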

I suggest you use this website: https://regex101.com/. It is very helpful when developing a regular expression.

I'm looking forward to your comments. Regards!

Jamie Navarro

Apr 3, 2023, 12:03:09 PM
to Wazuh mailing list
Hi Miguel,

Thanks for getting back to me!

Sure! Here are some more logs that I grabbed from archives.json (from the 'full_log' field):

Apr  3 08:16:09 dhcpd[3246]: DHCPACK on 10.10.10.69 to 02:ec:36:d6:3e:bb via mvneta0.4091
Apr  3 08:18:53 dhcpd[3246]: DHCPREQUEST for 10.10.10.69 from 02:ec:36:d6:3e:bb via mvneta0.4091
Apr  3 08:18:53 dhcpd[3246]: reuse_lease: lease age 1874 (secs) under 25% threshold, reply with unaltered, existing lease for 10.10.10.69
Apr  3 08:18:53 dhcpd[3246]: DHCPACK on 10.10.10.69 to 02:ec:36:d6:3e:bb via mvneta0.4091
Apr  3 08:20:36 dhcpd[3246]: DHCPREQUEST for 10.10.10.73 from 00:3d:e8:e3:6a:e8 (android-9d43c0513f181151) via mvneta0.4091
Apr  3 08:20:36 dhcpd[3246]: DHCPACK on 10.10.10.73 to 00:3d:e8:e3:6a:e8 (android-9d43c0513f181151) via mvneta0.4091
Apr  3 08:22:34 dhcpd[3246]: DHCPREQUEST for 10.10.10.65 from 84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091
Apr  3 08:22:34 dhcpd[3246]: DHCPACK on 10.10.10.65 to 84:98:66:ca:f5:e3 (Galaxy-Tab-A) via mvneta0.4091
Apr  3 08:23:50 dhcpd[3246]: DHCPREQUEST for 10.10.10.76 from d0:13:fd:56:fc:80 (android-ba8e1085186445ed) via mvneta0.4091
Apr  3 08:23:50 dhcpd[3246]: DHCPACK on 10.10.10.76 to d0:13:fd:56:fc:80 (android-ba8e1085186445ed) via mvneta0.4091
Apr  3 08:28:11 dhcpd[3246]: reuse_lease: lease age 455 (secs) under 25% threshold, reply with unaltered, existing lease for 10.10.10.73
Apr  3 08:28:11 dhcpd[3246]: DHCPREQUEST for 10.10.10.73 (10.10.10.253) from 00:3d:e8:e3:6a:e8 (android-9d43c0513f181151) via mvneta0.4091
Apr  3 08:28:11 dhcpd[3246]: DHCPACK on 10.10.10.73 to 00:3d:e8:e3:6a:e8 (android-9d43c0513f181151) via mvneta0.4091


Yes, exactly, I'm worried that my current regular expression is too generic and may pick up other things (or collide as you called it) that I don't want to, so yes, I would love some help getting it refined better.

Next,
The other format log that I mentioned that I'm getting is coming from archives.log:

2023 Mar 13 22:19:43 ubuntusrvwazuhtest1->10.10.10.253 Mar 13 15:19:43 dhcpd[7416]: DHCPACK on 10.10.10.77 to 56:8d:3d:8f:0c:90 via mvneta0.4091

Are logs always formatted differently between archives.json and archives.log like that?


Thank you for the link to the regular expression website. That brings up another question though. From what I can tell on the Wazuh documentation, I think I'm using the 'OS_Regex' syntax? If that's the case, which 'flavor' should I choose on the Regex101 website? I don't see that in the list. I only see: PCRE2, PCRE, ECMAScript, Python, GoLang, Java 8, .NET (C#), and Rust

Thank you!

Miguel Angel Cazajous

Apr 3, 2023, 5:18:14 PM
to Wazuh mailing list
Hi Jamie,

Well, I identified 3 different groups here. It is difficult to create one single regular expression that matches all those cases, but I have made some progress.

First of all, the extra information 2023 Mar 13 22:19:43 ubuntusrvwazuhtest1 is an added header that has nothing to do with the way we parse the original log. I didn't notice that at first.
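To compare an archives.log line with its archives.json 'full_log' counterpart, the header can be stripped with a small Python sketch like the one below. The header shape is an assumption inferred only from the single archives.log sample in this thread (year, date, time, then agent->location); locations containing spaces would need a different pattern, and the function name is made up.

```python
import re

# Assumed shape: "<year> <Mon> <day> <HH:MM:SS> <agent>-><location> <original log>"
ARCHIVES_HEADER = re.compile(
    r"^\d{4} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+->\S+ "
)

def strip_archives_header(line: str) -> str:
    """Return the line without the leading archives.log header, if present."""
    return ARCHIVES_HEADER.sub("", line, count=1)

archived = ("2023 Mar 13 22:19:43 ubuntusrvwazuhtest1->10.10.10.253 "
            "Mar 13 15:19:43 dhcpd[7416]: DHCPACK on 10.10.10.77 to "
            "56:8d:3d:8f:0c:90 via mvneta0.4091")

print(strip_archives_header(archived))
```

A line that does not start with this header is returned unchanged.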

Personally, I feel more confident with PCRE2 regex. OS_Regex is the legacy regex format that comes from OSSEC.

I created 3 regexes, one for each group.

\w+\s(?=on)on(?<=on)\s+.*\s+(?=to)to(?<=to)\s+.*\s+(?=via)via(?<=via)\s+.*

1.png
As you can see it matches the logs that have a format (... on ... to ... via)

\w+\s(?=for)for(?<=for)\s+.*\s+(?=from)from(?<=from)\s+.*\s+(?=via)via(?<=via)\s+.*

2.png
This is similar to the previous. I think you can merge those two into one if you want.

\w+:\s(?=lease age)lease age(?<=lease age)\s+.*\s+(?=under)under(?<=under)\s+.*\s+(?=threshold)threshold(?<=threshold),\s+.*
3.png

Finally, if you wrap each regex in () and join them with the OR operator (|), you can match them all.

(\w+\s(?=on)on(?<=on)\s+.*\s+(?=to)to(?<=to)\s+.*\s+(?=via)via(?<=via)\s+.*)|(\w+\s(?=for)for(?<=for)\s+.*\s+(?=from)from(?<=from)\s+.*\s+(?=via)via(?<=via)\s+.*)|(\w+:\s(?=lease age)lease age(?<=lease age)\s+.*\s+(?=under)under(?<=under)\s+.*\s+(?=threshold)threshold(?<=threshold),\s+.*)


4.png

Remember that for using PCRE2 you need to specify it in your decoders with type="pcre2"
https://documentation.wazuh.com/current/user-manual/ruleset/ruleset-xml-syntax/decoders.html

Here you have a brief explanation of both types:
https://documentation.wazuh.com/current/user-manual/ruleset/ruleset-xml-syntax/regex.html

If you find other logs that have a different format, you can follow the same approach I used here.

Regards!

Miguel Angel Cazajous

Apr 3, 2023, 5:20:03 PM
to Wazuh mailing list
One more thing: you do not need to consider the header, because it is processed during the pre-decoding stage.

5.png

Jamie Navarro

Apr 5, 2023, 8:43:15 AM
to Wazuh mailing list
Hi Miguel,

I hate to bother you again, but it seems that Wazuh is not liking the regexes (notice how the colors change from white to blue in the regular expression) you provided:
I'm just trying 1 out of the 3 that you gave me here:

wazuhlocal_decoderpossiblebadregex.png

If I save the above local_decoder.xml file, and restart wazuh-manager, it fails to start and gives me an error:

 wazuh-manager.service - Wazuh manager
     Loaded: loaded (/lib/systemd/system/wazuh-manager.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2023-04-05 05:26:49 PDT; 19s ago
    Process: 181558 ExecStart=/usr/bin/env /var/ossec/bin/wazuh-control start (code=exited, status=1/FAILURE)
        CPU: 1.735s

Apr 05 05:26:46 ubuntusrvwazuhtest1 systemd[1]: wazuh-manager.service: Consumed 3h 40min 53.143s CPU time.
Apr 05 05:26:46 ubuntusrvwazuhtest1 systemd[1]: Starting Wazuh manager...
Apr 05 05:26:49 ubuntusrvwazuhtest1 env[181583]: 2023/04/05 12:26:49 wazuh-analysisd: ERROR: (1226): Error reading XML file 'etc/decoders/local_decoder.xml': XMLERR: Element '=on)\s+.*\s+(?=to)to(?<=to)\s+.*\s+(?=via)via(?<=via)\s+.*</regex' not closed. (line 57).
Apr 05 05:26:49 ubuntusrvwazuhtest1 env[181583]: 2023/04/05 12:26:49 wazuh-analysisd: CRITICAL: (1202): Configuration error at 'etc/decoders/local_decoder.xml'.
Apr 05 05:26:49 ubuntusrvwazuhtest1 env[181558]: wazuh-analysisd: Configuration error. Exiting
Apr 05 05:26:49 ubuntusrvwazuhtest1 systemd[1]: wazuh-manager.service: Control process exited, code=exited, status=1/FAILURE
Apr 05 05:26:49 ubuntusrvwazuhtest1 systemd[1]: wazuh-manager.service: Failed with result 'exit-code'.
Apr 05 05:26:49 ubuntusrvwazuhtest1 systemd[1]: Failed to start Wazuh manager.
Apr 05 05:26:49 ubuntusrvwazuhtest1 systemd[1]: wazuh-manager.service: Consumed 1.735s CPU time.


I tried the other 2 you gave me and it gave me the same error when I try to (re)start wazuh-manager. Do you know why this would be?

And I really appreciate all the help you've given me so far!

Thank you,
Jamie

Miguel Angel Cazajous

Apr 5, 2023, 12:02:29 PM
to Wazuh mailing list
Hi Jamie,

Mmm, it seems the XML parser does not like the look-behind syntax: judging by the error, it treats the < in (?<=...) as the start of a new element. Anyway, the look-behind is redundant, so we can use only look-ahead.
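A quick way to convince yourself that the lookarounds add nothing here is to compare the pattern with and without them. This sketch uses Python's re module as a stand-in for PCRE2 (an assumption, although both treat these simple constructs the same way); the variable names are made up.

```python
import re

with_lookarounds = re.compile(
    r"\w+\s(?=on)on(?<=on)\s+.*\s+(?=to)to(?<=to)\s+.*\s+(?=via)via(?<=via)\s+.*"
)
# Same pattern with every lookahead and look-behind removed.
plain = re.compile(r"\w+\son\s+.*\s+to\s+.*\s+via\s+.*")

logs = [
    "Apr  3 08:16:09 dhcpd[3246]: DHCPACK on 10.10.10.69 to 02:ec:36:d6:3e:bb via mvneta0.4091",
    "Apr  3 08:18:53 dhcpd[3246]: DHCPREQUEST for 10.10.10.69 from 02:ec:36:d6:3e:bb via mvneta0.4091",
    "Apr  3 08:18:53 dhcpd[3246]: reuse_lease: lease age 1874 (secs) under 25% threshold, "
    "reply with unaltered, existing lease for 10.10.10.69",
]

def matched_text(pattern, log):
    m = pattern.search(log)
    return m.group() if m else None

for log in logs:
    same = matched_text(with_lookarounds, log) == matched_text(plain, log)
    print(same, matched_text(plain, log))
```

Because each lookaround is immediately next to the same literal it asserts, (?=on)on behaves like plain on. Dropping the look-behinds also removes the < character from the decoder file.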

Take a look at this approach.

<decoder name="SGPfsense">
   <prematch>dhcpd</prematch>
</decoder>

<decoder name="SGPfsense">
   <use_own_name>true</use_own_name>
   <parent>SGPfsense</parent>
   <regex type="pcre2">(\w+).*:\s+(\w+)\s(?=on)on\s+(.*)\s+(?=to)to\s+(.*)\s+(?=via)via\s+(.*)</regex>
   <order>var1,var2,var3,var4,var5</order>

</decoder>

<decoder name="SGPfsense">
   <parent>SGPfsense</parent>
   <regex type="pcre2">(\w+).*:\s+(\w+)\s(?=for)for\s+(.*)\s+(?=from)from\s+(.*)\s+(?=via)via\s+(.*)</regex>
   <order>var1,var2,var3,var4,var5</order>

</decoder>

<decoder name="SGPfsense">
   <parent>SGPfsense</parent>
   <regex type="pcre2">(\w+).*:\s+(\w+):\s(?=lease age)lease age\s+(.*)\s+(?=under)under\s+(.*)\s+(?=threshold)threshold,\s+(.*)</regex>
   <order>var1,var2,var3,var4,var5</order>
</decoder>


I really don't know the meaning of those logs, so modify them according to your needs. But the parsing is working.

For more information about this approach, please refer to the documentation: https://documentation.wazuh.com/current/user-manual/ruleset/ruleset-xml-syntax/sibling-decoders.html

2023-04-05_12-59.png

You can see all the logs are parsed properly, but again, modify the decoders if they are not parsing the information as you want.
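Before adjusting the decoders, it can help to see what each capture group picks up. The sketch below runs the three regexes from the decoders above with Python's re module against the raw 'full_log' text. Both choices are assumptions: Python stands in for PCRE2, and Wazuh may hand the regex a different slice of the log after pre-decoding. The dictionary keys and the `decode` helper are made up for illustration.

```python
import re

PATTERNS = {
    "on/to/via": r"(\w+).*:\s+(\w+)\s(?=on)on\s+(.*)\s+(?=to)to\s+(.*)\s+(?=via)via\s+(.*)",
    "for/from/via": r"(\w+).*:\s+(\w+)\s(?=for)for\s+(.*)\s+(?=from)from\s+(.*)\s+(?=via)via\s+(.*)",
    "reuse_lease": r"(\w+).*:\s+(\w+):\s(?=lease age)lease age\s+(.*)\s+(?=under)under\s+(.*)"
                   r"\s+(?=threshold)threshold,\s+(.*)",
}

def decode(log):
    """Return (pattern name, captured groups) for the first pattern that matches."""
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, log)
        if m:
            return name, m.groups()
    return None, ()

ack = "Apr  3 08:16:09 dhcpd[3246]: DHCPACK on 10.10.10.69 to 02:ec:36:d6:3e:bb via mvneta0.4091"
req = ("Apr  3 08:28:11 dhcpd[3246]: DHCPREQUEST for 10.10.10.73 (10.10.10.253) from "
       "00:3d:e8:e3:6a:e8 (android-9d43c0513f181151) via mvneta0.4091")
reuse = ("Apr  3 08:18:53 dhcpd[3246]: reuse_lease: lease age 1874 (secs) under 25% threshold, "
         "reply with unaltered, existing lease for 10.10.10.69")

for log in (ack, req, reuse):
    name, groups = decode(log)
    print(name, dict(zip(["var1", "var2", "var3", "var4", "var5"], groups)))
```

In this stand-in run var1 captures "Apr" and var4 includes the optional "(hostname)" text after the MAC address, so those are likely the first places to tighten if you only want the IP and MAC.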

Regards!

Jamie Navarro

Apr 10, 2023, 5:09:13 PM
to Wazuh mailing list
Hi Miguel,

Thank you so much! These are working just fine, so yeah, I'll just modify them a little bit for my use.

Thank you,
Jamie

Miguel Angel Cazajous

Apr 11, 2023, 8:35:31 AM
to Wazuh mailing list
Glad to know that helped! Have a good day!