Wodle interval and time offset config


Gen

Mar 26, 2019, 10:47:52 AM
to Wazuh mailing list
Hi,

I'm struggling to configure the correct <interval> and <time_offset> settings in my "azure-logs" wodle to avoid duplicated events in Kibana.

The problem I am having at the moment is that the file structure within the azure container that I am pulling logs from is as follows:

../../../y=2019/m=03/d=26/h=13/m=00/LOGLIFE.json
../../../y=2019/m=03/d=26/h=12/m=00/LOGLIFE.json

and so on.

My wodle is configured as follows:

  <wodle name="azure-logs">
    <disabled>no</disabled>
    <interval>1h</interval>
    <run_on_start>yes</run_on_start>

    <storage>
      <account_name>*****</account_name>
      <account_key>***************************************</account_key>
      <tag>nsg-flow</tag>

      <container name="******************">
        <blobs>.json</blobs>
        <content_type>json_file</content_type>
        <time_offset>1h</time_offset>
      </container>
    </storage>
  </wodle>

What happens is that the query first executes when ossec is restarted, and then every hour at the same minute. Because of the structure of the json file, the <time_offset> seems to be ignored and the query processes the whole LOGLIFE.json. Moreover, as <interval> is set to "1h", the query processes both the current file and the previous file in full.

For example - if the query was to be executed at 14:30 it would process events from "/m=03/d=26/h=14/m=00/LOGLIFE.json" which will contain data from 14:00 to 14:30 (the file in Azure is updated at minute intervals) as well as events from "/m=03/d=26/h=13/m=00/LOGLIFE.json" which contains events from 13:00 to 14:00.

The problem is that the next time the query is executed at 15:30, it will once again process the whole of "/m=03/d=26/h=14/m=00/LOGLIFE.json" which will result in 30 minutes of duplicated data (14:00 to 14:30). Here's the resulting chart to demonstrate this:

[Attachment: Capture.JPG — chart showing the duplicated events]


Any ideas on how to avoid this?
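To put numbers on it, here is a minimal Python model of the behaviour described above (each run reads the previous hourly blob in full plus the current one up to the run time; times taken from my example):

```python
from datetime import datetime, timedelta

def processed_window(run_time):
    """Window covered by one wodle run: the previous hourly blob in
    full, plus the current blob up to the run time."""
    top_of_hour = run_time.replace(minute=0, second=0, microsecond=0)
    return top_of_hour - timedelta(hours=1), run_time

first = processed_window(datetime(2019, 3, 26, 14, 30))   # 13:00 -> 14:30
second = processed_window(datetime(2019, 3, 26, 15, 30))  # 14:00 -> 15:30

# The overlap between consecutive runs is the duplicated data
overlap_start = max(first[0], second[0])
overlap_end = min(first[1], second[1])
print(overlap_end - overlap_start)  # 0:30:00 of duplicates (14:00 to 14:30)
```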


Gen

Mar 26, 2019, 11:05:55 AM
to Wazuh mailing list
Also, I would like to have these logs processed as close to real time as possible without putting too much of a strain on resources.

Gen

Mar 29, 2019, 5:49:18 AM
to Wazuh mailing list
Anyone?

Gen

Apr 3, 2019, 4:12:08 AM
to Wazuh mailing list
Bump.

If there's no method to configure this in real time, can someone assist me with configuring the wodle to run at a specific time every hour? For example, at 12:01, 13:01, 14:01, and so on? It would still be hourly intervals, but at a fixed minute. That way I can get the wodle to read the full log from the previous hour.

Thanks,
Gen.

Sergio Peral

Apr 3, 2019, 12:01:10 PM
to Wazuh mailing list
Hi Gen,

In the first place, sorry for the late reply.

In the wodle's directory, /var/ossec/wodles/azure, there should be a json file called last_dates.json, where we store the last generated event's date. We should check if that file is storing those dates properly. Can you attach your last_dates.json file after a few executions of the azure-logs wodle? 

About your latest post, please try the following configuration:

 
<wodle name="azure-logs">
    <disabled>no</disabled>

    <time>01:01</time>
    <interval>1h</interval>
    <run_on_start>no</run_on_start>

    <storage>
      <account_name>*****</account_name>
      <account_key>***************************************</account_key>
      <tag>nsg-flow</tag>

      <container name="******************">
        <blobs>.json</blobs>
        <content_type>json_file</content_type>
        <time_offset>1h</time_offset>
      </container>
    </storage>
  </wodle>


This way, Wazuh will get your first events at 01:01, those events being the ones which took place from 00:01 to 01:01.

Please let us know if this worked for you.

Best regards,
Sergio.

Sergio Peral

Apr 3, 2019, 12:04:30 PM
to Wazuh mailing list
You can also set the <run_on_start> flag to yes, or pick the starting <time> that fits your needs.

Gen

Apr 4, 2019, 4:10:36 AM
to Wazuh mailing list
Hi Sergio,

Thanks for your reply!

The last_dates.json file looks as follows:

{"graph": {}, "log_analytics": {"***************": "2019-03-25T15:18:17.651Z", "***************": "2019-02-07T13:10:04.471Z"}, "storage": {"***************": "2019-04-04 07:42:19+00:00"}}

Last execution time is 08:42 (BST) according to logs in Kibana.
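For anyone else checking theirs: the file is plain JSON keyed by module type, so it's easy to inspect. For example (structure as pasted above, with "myaccount" standing in for my redacted account key):

```python
import json

# last_dates.json content as shown above; "workspace-1" and "myaccount"
# are placeholders for the redacted keys
raw = ('{"graph": {}, '
       '"log_analytics": {"workspace-1": "2019-03-25T15:18:17.651Z"}, '
       '"storage": {"myaccount": "2019-04-04 07:42:19+00:00"}}')

last_dates = json.loads(raw)
# The storage entry is the checkpoint the azure-logs wodle resumes from
print(last_dates["storage"]["myaccount"])  # 2019-04-04 07:42:19+00:00
```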

Gen

Apr 5, 2019, 4:01:15 AM
to Wazuh mailing list
Hi Sergio,

I've tried your suggestion by using

    <time>01:01</time>
    <interval>1h</interval>

but when restarting ossec I get a warning:

 WARNING: At module 'azure-logs': Interval must be a multiple of one day. New interval value: 1d.

It seems you can't use both <time> and have <interval> set to 1h.

Thanks,
Gen.


Gen

Apr 5, 2019, 8:58:07 AM
to Wazuh mailing list
Is there a way to specify multiple time values? e.g. <time>01:01 02:01 03:01 etc...</time>?

Or would it work if I created a new wodle config that is identical except with the time incremented by 1 hour, repeated until all the hours are covered? That would mean 24 near-identical wodles.

Ideally I would like to solve the issue of it not recognising the time in the blob .json storage file so that I can have close to real-time logs but I'm out of ideas on how to do that.

Thanks,
Gen.

Gen

Apr 8, 2019, 8:55:31 AM
to Wazuh mailing list
For some reason the wodle has now stopped processing azure container logs completely.

I've checked /var/ossec/logs/azure_logs.log and I'm getting the following output:

04/08/2019 01:37:29 PM INFO: AZURE Azure Storage starting.
04/08/2019 01:37:29 PM ERROR: AZURE Error: The file of the last dates could not be updated: 'No JSON object could be decoded.
04/08/2019 01:37:29 PM INFO: AZURE Storage: Authenticating.
04/08/2019 01:37:29 PM INFO: AZURE Storage: Authenticated.
04/08/2019 01:37:29 PM INFO: AZURE Storage: Getting containers.
04/08/2019 01:37:29 PM ERROR: AZURE  Storage account: 'local variable 'all_dates' referenced before assignment'.
04/08/2019 01:37:29 PM INFO: AZURE Storage: End

Also, when checking the last_dates.json file now, it's empty.

Any pointers on how to fix this? I don't think I've changed anything that would have broken it.

It seems the wodle expects a specific date format in the container blob and fails if the date in the json file doesn't match it. I've found the Python script for the azure-logs wodle (/var/ossec/wodles/azure/azure-logs) but I'm not that familiar with Python. Can anyone direct me to the relevant "dates" sections?

Thanks in advance!
Gen

Gen

Apr 8, 2019, 9:11:31 AM
to Wazuh mailing list
OK, I've fixed the issue of logs not coming in by deleting the last_dates.json file.

Now just to fix the problem of not getting the correct time_offset...
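Deleting the file works because the wodle recreates it on the next run; the crash presumably came from json trying to load an empty or corrupt file. For reference, a tolerant loader along these lines (my own sketch, not the wodle's actual code) would sidestep the "No JSON object could be decoded" error:

```python
import json

def load_last_dates(path):
    # Fall back to an empty skeleton when the file is missing, empty,
    # or corrupt, instead of raising a decode error
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {"graph": {}, "log_analytics": {}, "storage": {}}
```

Pointing it at /var/ossec/wodles/azure/last_dates.json would then return the empty skeleton instead of crashing when the file is blank.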

Sergio Peral

Apr 8, 2019, 2:26:15 PM
to Wazuh mailing list
Hi Gen,

Can you try setting a 1h <interval> and a 1s <time_offset>? <time_offset> is only relevant for the first execution, since subsequent runs look inside last_dates.json to know when the last event was registered. I want to see if this procedure helps fix your duplicated events issue.

If it didn't work for you, please answer the following questions:

- How many Azure events do you have per hour?
- Could you please show me the timestamps of a couple of duplicated alerts in Kibana?

Regards,
Sergio.

Gen

Apr 9, 2019, 4:54:20 AM
to Wazuh mailing list
Hi Sergio,

I had tried that previously. <time_offset> only accepts minute, hour or day intervals. I currently have <time_offset> set to 1m with a 1h <interval>, which means it only processes the logs for the past hour, depending on when ossec was restarted. For example, if I restart ossec at 13:30, it will get the logs from 13:00 to 13:30. If I restart ossec at 14:01, it will get the logs from 13:00 to 14:00 (with no duplicated results). However, this isn't ideal for two reasons:
  1. If ossec is restarted for any reason at a time that isn't exactly 1 minute past the hour, it will mess up the log collection and create missing events (no duplicates, as the 1m time_offset only picks up the CURRENT log file).
  2. Logs get collected only once an hour, and the oldest ones are 1 hour late.
I think the problem is with the structure of the json log file in azure. A new folder structure is created inside the blob every hour (1 minute past the hour) with a new log file inside the "m=00" subfolder. Once the json file is created, it is updated every minute with new data (with timestamps), but the data gets appended to the same line (the whole hourly log file is 1 long line). An example of the log file will probably explain the issue better.

Here's an extract (note this is 1 line but there are events from 2 timestamps, 23:00:35 and 23:01:35):

{"records":[{"time":"2019-04-08T23:00:35.4315192Z","systemId":"***********","category":"NetworkSecurityGroupFlowEvent","resourceId":"***************","operationName":"NetworkSecurityGroupFlowEvents","properties":{"Version":2,"flows":[{"rule":"*******","flows":[{"mac":"********","flowTuples":["1554764375,<src.ip>,<dst.ip>,<src.port>,<dst.port>,T,O,A,B,,,,","1554764375,<src.ip>,<dst.ip>,<src.port>,<dst.port>,T,O,A,B,,,,"]}]},{"rule":"*********","flows":[{"mac":"********","flowTuples":["1554764400,<src.ip>,<dst.ip>,<src.port>,<dst.port>,55000,T,O,A,B,,,,","1554764401,<src.ip>,<dst.ip>,<src.port>,<dst.port>,T,O,A,B,,,,"]}]}]}},{"time":"2019-04-08T23:01:35.4336223Z","systemId":"***********","category":"NetworkSecurityGroupFlowEvent","resourceId":"***********","operationName":"NetworkSecurityGroupFlowEvents","properties":{"Version":2,"flows":[{"rule":"********","flows":[{"mac":"*********","flowTuples":["1554764434,<src.ip>,<dst.ip>,<src.port>,<dst.port>,T,O,A,B,,,,"]}]}]}},

More info on the structure of the log file: https://docs.microsoft.com/en-us/azure/network-watcher/network-watcher-nsg-flow-logging-overview

What would help is understanding how the script checks the time of the log file: does it look at the "Last Modified" property of the json file, or does it look inside the json file for specific time-based strings? I suspect the former.

I don't think the number of events is relevant but I'm getting around 7,000 events per hour (up to 120 per minute).
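If the script does go by blob properties rather than the embedded timestamps, record-level filtering would be one way to deduplicate: parse the blob and keep only records whose "time" field is newer than the checkpoint in last_dates.json. A rough sketch of the idea (illustrative only, not the wodle's code; field names taken from the extract above):

```python
import json
from datetime import datetime

def parse_ts(ts):
    # Trim the trailing "Z" and the 7th fractional digit Azure emits,
    # since strptime's %f only accepts up to 6 digits
    ts = ts.rstrip("Z")
    head, _, frac = ts.partition(".")
    return datetime.strptime(head + "." + (frac[:6] or "0"),
                             "%Y-%m-%dT%H:%M:%S.%f")

def new_records(blob_text, last_date):
    """Keep only NSG flow records strictly newer than the checkpoint.

    last_date and each record's "time" field are ISO 8601 UTC
    timestamps, e.g. "2019-04-08T23:00:35.4315192Z".
    """
    records = json.loads(blob_text)["records"]
    cutoff = parse_ts(last_date)
    return [r for r in records if parse_ts(r["time"]) > cutoff]
```

Re-reading the same hourly blob would then be harmless, since already-seen records fall at or before the cutoff.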

Thanks!
Gen.

Rukender attri

Apr 6, 2020, 5:04:28 AM
to Wazuh mailing list
Hi Gen,

I can see your Azure log monitoring is working fine.
Could you please help me with my setup? I have configured all the required settings, but I'm still not seeing any alerts in Kibana, nor anything in the Wazuh alert logs.
When I checked the azure logs, I found some error messages. I'm sharing the azure.log with you. Please help me resolve this issue.

04/06/2020 08:52:04 AM INFO: AZURE Azure Storage starting.
04/06/2020 08:52:04 AM INFO: AZURE Storage: Authenticating.
04/06/2020 08:52:04 AM INFO: AZURE Storage: Authenticated.
04/06/2020 08:52:04 AM INFO: AZURE Storage: Getting containers.
04/06/2020 08:52:04 AM INFO: AZURE Storage: Getting blobs.
04/06/2020 08:52:04 AM DEBUG: AZURE String_to_sign=GET

x-ms-client-request-id:**************
x-ms-date:Mon, 06 Apr 2020 08:53:06 GMT
x-ms-version:2018-03-28
comp:list
restype:container
04/06/2020 08:53:06 AM INFO: AZURE Client-Request-ID=************** Outgoing request: Method=GET, Path=/wazuh.blob.core.windows.net/insights-operational-logs, Query={'restype': 'container', 'comp': 'list', 'prefix': None, 'delimiter': None, 'marker': None, 'maxresults': None, 'include': None, 'timeout': None}, Headers={'x-ms-version': '2018-03-28', 'User-Agent': 'Azure-Storage/1.4.2-1.4.0 (Python CPython 3.7.2; Linux 3.10.0-1062.9.1.el7.x86_64)', 'x-ms-client-request-id': '**************', 'x-ms-date': 'Mon, 06 Apr 2020 08:53:06 GMT', 'Authorization': 'REDACTED'}.
04/06/2020 08:53:06 AM DEBUG: AZURE Starting new HTTPS connection (4): architecture:443
04/06/2020 08:53:06 AM INFO: AZURE Client-Request-ID=************** Operation failed: checking if the operation should be retried. Current retry count=3, , HTTP status code=Unknown, Exception=HTTPSConnectionPool(host='architecture', port=443): Max retries exceeded with url: /wazuh.blob.core.windows.net/insights-operational-logs?restype=container&comp=list (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc42330bc88>: Failed to establish a new connection: [Errno -2] Name or service not known')).
04/06/2020 08:53:06 AM ERROR: AZURE Client-Request-ID=************** Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=HTTPSConnectionPool(host='architecture', port=443): Max retries exceeded with url: /wazuh.blob.core.windows.net/insights-operational-logs?restype=container&comp=list (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc42330bc88>: Failed to establish a new connection: [Errno -2] Name or service not known')).
04/06/2020 08:53:06 AM ERROR: AZURE Error getting blobs: 'HTTPSConnectionPool(host='architecture', port=443): Max retries exceeded with url: /wazuh.blob.core.windows.net/insights-operational-logs?restype=container&comp=list (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc42330bc88>: Failed to establish a new connection: [Errno -2] Name or service not known'))'.
04/06/2020 08:53:06 AM ERROR: AZURE  Storage account: 'local variable 'blobs' referenced before assignment'.
04/06/2020 08:53:06 AM INFO: AZURE Storage: End