Best way to troubleshoot "Agent event queue is flooded"


Charl Jordan

Mar 20, 2020, 5:32:24 AM
to Wazuh mailing list
Hi All,

After the great response to my last question, I've decided to ask another one here.
What is the best way to troubleshoot "Agent event queue is flooded"?

I have about 5 or 6 agents that regularly get this.

2020/03/12 08:55:21 ossec-agent: WARNING: Agent buffer at 90 %.
2020/03/12 08:55:22 ossec-agent: WARNING: Agent buffer is full: Events may be lost.
2020/03/12 08:55:37 ossec-agent: WARNING: Agent buffer is flooded: Producing too many events.
2020/03/12 08:55:51 ossec-agent: WARNING: Agent buffer is full: Events may be lost.
2020/03/12 08:56:06 ossec-agent: WARNING: Agent buffer is flooded: Producing too many events.
2020/03/12 09:01:00 ossec-agent: WARNING: Agent buffer is full: Events may be lost.
2020/03/12 09:01:15 ossec-agent: WARNING: Agent buffer is flooded: Producing too many events.
2020/03/12 09:06:48 ossec-agent: INFO: Agent buffer is under 70 %. Working properly again.


So far I have:
  • Analysed the Windows event logs to try to find the offending logs
  • Increased the EPS to its maximum on one agent
As I have 5 or 6 different servers all doing the same thing, it makes me think that filtering out specific logs is maybe not the solution here.
Is there a better way to figure out what is causing this, or how to avoid it? Could it be server resources, considering my cluster is on AWS, as per my last issue?

Regards,
Charl

Nicolas Papp

Mar 23, 2020, 11:16:00 AM
to Charl Jordan, Wazuh mailing list
Hi Charl,
It looks like the agent queues are flooding. To troubleshoot this, I suggest taking a look at both the ossec.log and alerts.json files for the agents that are flooding. I suspect that one or more alerts are being triggered constantly.
Once we identify which rules are being triggered, we can take measures to avoid this situation.

Taking a look at the alerts in the moments just before the flooding (2020/03/12 08:55:21) should give us good insight into which types of alert might be generating it.

You could also check in the manager for messages like the following:

ossec-remoted: WARNING: Message queue is full (262144). Events may be lost.
ossec-analysisd: WARNING: Input buffer is full (1500000). Events may be lost.  
 
This could tell us if the manager is flooding too. If that is the case then we can take a look at your current setup to see if you need to scale up resources.
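One quick way to check for those messages on the manager is a sketch like the following. The real log lives at /var/ossec/logs/ossec.log on a default install; an inline sample is used here so the snippet is self-contained.

```shell
# Minimal sketch: count queue-saturation warnings in the manager's log.
# In a real install, point LOG at /var/ossec/logs/ossec.log instead of
# the inline sample below.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2020/03/12 08:55:30 ossec-remoted: WARNING: Message queue is full (262144). Events may be lost.
2020/03/12 08:55:45 ossec-analysisd: WARNING: Input buffer is full (1500000). Events may be lost.
2020/03/12 09:06:48 ossec-analysisd: INFO: Total rules enabled: 3093
EOF
# Count lines matching either manager-side flood warning
hits=$(grep -cE 'Message queue is full|Input buffer is full' "$LOG")
echo "$hits flood warnings found"
rm -f "$LOG"
```

If this prints a non-zero count against the real log, the manager side is saturating too.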

Best Regards,
Nicolas

--
You received this message because you are subscribed to the Google Groups "Wazuh mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wazuh+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wazuh/ca821058-2362-4fa8-b5c6-00a48cbc4e62%40googlegroups.com.

Charl Jordan

Mar 30, 2020, 10:43:03 AM
to Wazuh mailing list
Hi Nicolas,

Thanks for the response.

Forgive my ignorance here, but is there an alerts.json on the client as well as the master? I only see one on the master, in which case, shall I just grep out the alerts relating to this host? We have Windows clients.
As for ossec.log on the client: no indication there as to alerts.

I searched the master for the manager flood messages you provided and found nothing, so it points back to the agent.

Thanks for the time thus far!

Nicolas Papp

Mar 30, 2020, 12:27:38 PM
to Charl Jordan, Wazuh mailing list
Hi Charl,

There is no alerts.json on the client side. Yes, I recommend grepping the alerts from that particular host and trying to figure out which module is causing most of them. Both Syscheck and the Windows Filtering Platform could be causing problems, depending on your configuration. You can try ignoring folders or turning off the features that are generating the alerts, and see whether the flooding messages stop.

Here is a link to the Anti-flooding mechanism documentation: https://documentation.wazuh.com/3.12/user-manual/capabilities/antiflooding.html 

There you can see common sources of flooding. Normally the cause is either directories that are tracked by FIM and are constantly changing, or a very broad configuration of Windows Security Log events.
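For illustration, those two usual fixes (ignoring a constantly-changing directory in FIM, and narrowing a broad Windows event subscription) look roughly like this in the agent's ossec.conf. This is a sketch only; the path and event IDs are hypothetical examples, not values from this thread:

```xml
<ossec_config>
  <syscheck>
    <!-- Stop FIM from tracking a directory that changes constantly
         (hypothetical path, shown only as an example) -->
    <ignore>C:\inetpub\logs</ignore>
  </syscheck>

  <localfile>
    <!-- Narrow a broad Security channel subscription to selected
         event IDs (4624/4625 are just example logon events) -->
    <location>Security</location>
    <log_format>eventchannel</log_format>
    <query>Event/System[EventID=4624 or EventID=4625]</query>
  </localfile>
</ossec_config>
```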

Let me know if this helps,
Best Regards,
Nicolas


Charl Jordan

Apr 2, 2020, 3:50:17 AM
to Wazuh mailing list
Hi Nicolas,

I don't see a massive number of alerts for the host:

[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T05:" | wc -l
1273
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T06:" | wc -l
12832
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T07:" | wc -l
12261
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T07:02:" | wc -l
1633
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T08:" | wc -l
927
An example from the spike:

{"win":{"system":{"providerName":"Dynamics Server 01","eventID":"110","level":"2","task":"0","keywords":"0x80000000000000","systemTime":"2020-03-12T07:03:59.965448700Z","eventRecordID":"6650397","channel":"Application","computer":"AZ-PAX-AOS01.hosted.domain.local","severityValue":"ERROR","message":"Object Server 01:  User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied."},"eventdata":{"data":"Object Server 01:, User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied."}}}


However, I did notice something interesting here:

@timestamp: Mar 12, 2020 @ 09:02:59.894
_id: -k-NzXABANMpWbSJtWTT
_index: wazuh-alerts-3.x-2020.03.12
_score: 1
_type: _doc
agent.id: 006
agent.ip: 10.102.12.8
agent.name: AZ-PAX-AOS01
cluster.name: wazuh
cluster.node: wazuh-master
data.win.eventdata.data: Object Server 01:, User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied.
data.win.system.channel: Application
data.win.system.computer: AZ-PAX-AOS01.hosted.domain.local
data.win.system.eventID: 110
data.win.system.eventRecordID: 6617292
data.win.system.keywords: 0x80000000000000
data.win.system.level: 2
data.win.system.message: Object Server 01: User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied.
data.win.system.providerName: Dynamics Server 01
data.win.system.severityValue: ERROR
data.win.system.systemTime: 2020-03-12T07:01:29.620933300Z




There is a 2-hour discrepancy between systemTime and @timestamp. Perhaps this is causing confusion as to when the flooding is actually occurring?


That being said, I don't seem to find any alerts that are breaching the 500 EPS rate.
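For what it's worth, one way to check whether any single second breaches the cap is to bucket the alerts file by second. A sketch, assuming the ISO timestamps shown above; an inline sample is used so the snippet runs standalone, but ALERTS would normally point at the real ossec-alerts-12-AZPAXAOS01.log:

```shell
# Sketch: find the busiest second in an alerts file, to compare against
# the agent's events_per_second cap (500 here).
ALERTS=$(mktemp)
cat > "$ALERTS" <<'EOF'
{"systemTime":"2020-03-12T07:03:59.1Z","eventID":"110"}
{"systemTime":"2020-03-12T07:03:59.2Z","eventID":"110"}
{"systemTime":"2020-03-12T07:03:59.3Z","eventID":"110"}
{"systemTime":"2020-03-12T07:04:00.0Z","eventID":"110"}
EOF
# Extract the date-plus-second prefix of each timestamp, count per second,
# and show the second with the highest count.
busiest=$(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' "$ALERTS" \
  | sort | uniq -c | sort -rn | head -1)
echo "busiest second: $busiest"
rm -f "$ALERTS"
```

If the leading count in the output ever approaches 500 against the real file, that second is a flooding candidate.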




Nicolas Papp

Apr 7, 2020, 8:25:21 AM
to Charl Jordan, Wazuh mailing list
Hi Charl,
Yes, given the way you are grepping the alerts files, you should indeed account for that 2-hour difference. From what I can see, the corresponding time frame has a considerably higher number of alerts than the others:

[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T06:" | wc -l
12832
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T07:" | wc -l
12261

Those counts do not look high enough to flood the queue on their own, but they could be if you are also experiencing disconnections or some other network problems. Could that be the case?

Can you send me your client_buffer stanza configuration? Just in case we are missing something.
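For reference, the client_buffer stanza in the agent's ossec.conf typically looks like this. This is a sketch showing the values I would expect as defaults; your actual values may differ, which is exactly what we want to check:

```xml
<client_buffer>
  <!-- Keep the anti-flooding buffer enabled -->
  <disabled>no</disabled>
  <!-- Events held locally while the agent catches up (default 5000) -->
  <queue_size>5000</queue_size>
  <!-- Throughput cap toward the manager (default 500) -->
  <events_per_second>500</events_per_second>
</client_buffer>
```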

Best Regards,

Nicolas


