Best way to troubleshoot "Agent event queue is flooded"


Charl Jordan

Mar 20, 2020, 5:32:24 AM
to Wazuh mailing list
Hi All,

After the great response to my last question, I've decided to ask another one here.
What is the best way to troubleshoot "Agent event queue is flooded"?

I have about 5 or 6 agents that regularly get this.

2020/03/12 08:55:21 ossec-agent: WARNING: Agent buffer at 90 %.
2020/03/12 08:55:22 ossec-agent: WARNING: Agent buffer is full: Events may be lost.
2020/03/12 08:55:37 ossec-agent: WARNING: Agent buffer is flooded: Producing too many events.
2020/03/12 08:55:51 ossec-agent: WARNING: Agent buffer is full: Events may be lost.
2020/03/12 08:56:06 ossec-agent: WARNING: Agent buffer is flooded: Producing too many events.
2020/03/12 09:01:00 ossec-agent: WARNING: Agent buffer is full: Events may be lost.
2020/03/12 09:01:15 ossec-agent: WARNING: Agent buffer is flooded: Producing too many events.
2020/03/12 09:06:48 ossec-agent: INFO: Agent buffer is under 70 %. Working properly again.


So far I have:
  • Analysed the Windows event logs to try to find the offending logs
  • Increased the EPS to its maximum on one agent
As I have 5 or 6 different servers all doing the same thing, it makes me think that filtering out specific logs is maybe not the solution here.
Is there a better way to figure out what is causing this, or how to avoid it? Could it be server resources, considering my cluster is on AWS, as per my last issue?

Regards,
Charl

Nicolas Papp

Mar 23, 2020, 11:16:00 AM
to Charl Jordan, Wazuh mailing list
Hi Charl,
It looks like the agent queues are flooding. To troubleshoot this, I suggest taking a look at both the ossec.log and alerts.json files for the agents that are flooding. I suspect that one or more alerts are being triggered constantly.
Once we identify which rules are being triggered, we can take measures to avoid this situation.

Taking a look at the alerts in the moments just before the flooding (2020/03/12 08:55:21) should give us good insight into which types of alert might be generating it.

You could also check in the manager for messages like the following:

ossec-remoted: WARNING: Message queue is full (262144). Events may be lost.
ossec-analysisd: WARNING: Input buffer is full (1500000). Events may be lost.  
 
This could tell us if the manager is flooding too. If that is the case then we can take a look at your current setup to see if you need to scale up resources.
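One quick way to check for those messages on the manager is a sketch like the following. The real log lives at /var/ossec/logs/ossec.log on a default install; an inline sample is used here so the snippet is self-contained.

```shell
# Minimal sketch: count queue-saturation warnings in the manager's log.
# In a real install, point LOG at /var/ossec/logs/ossec.log instead of
# the inline sample below.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2020/03/12 08:55:30 ossec-remoted: WARNING: Message queue is full (262144). Events may be lost.
2020/03/12 08:55:45 ossec-analysisd: WARNING: Input buffer is full (1500000). Events may be lost.
2020/03/12 09:06:48 ossec-analysisd: INFO: Total rules enabled: 3093
EOF
# Count lines matching either manager-side flood warning
hits=$(grep -cE 'Message queue is full|Input buffer is full' "$LOG")
echo "$hits flood warnings found"
rm -f "$LOG"
```

If this prints a non-zero count against the real log, the manager side is saturating too.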

Best Regards,
Nicolas

--
You received this message because you are subscribed to the Google Groups "Wazuh mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wazuh+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wazuh/ca821058-2362-4fa8-b5c6-00a48cbc4e62%40googlegroups.com.

Charl Jordan

Mar 30, 2020, 10:43:03 AM
to Wazuh mailing list
Hi Nicolas,

Thanks for the response.

Forgive my ignorance here, but is there an alerts.json on the client as well as the master? I only see one on the master, in which case, shall I just grep out the alerts relating to this host? We have Windows clients.
As for ossec.log on the client: no indication there as to alerts.

I searched the master for the manager flood messages you provided and found nothing, so it points back to the agent.

Thanks for the time thus far!

Nicolas Papp

Mar 30, 2020, 12:27:38 PM
to Charl Jordan, Wazuh mailing list
Hi Charl,

There is no alerts.json on the client side. Yes, I recommend grepping the alerts from that particular host and trying to figure out which module is causing most of them. Both Syscheck and the Windows Filtering Platform could be causing problems, depending on your configuration. You can try ignoring folders or turning off the features that are generating the alerts, and see whether the flooding messages stop.

Here is a link to the Anti-flooding mechanism documentation: https://documentation.wazuh.com/3.12/user-manual/capabilities/antiflooding.html 

There you can see common sources of flooding. Normally the cause is either directories that are tracked by FIM and are constantly changing, or a very broad configuration of Windows Security Log events.
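For illustration, those two usual fixes (ignoring a constantly-changing directory in FIM, and narrowing a broad Windows event subscription) look roughly like this in the agent's ossec.conf. This is a sketch only; the path and event IDs are hypothetical examples, not values from this thread:

```xml
<ossec_config>
  <syscheck>
    <!-- Stop FIM from tracking a directory that changes constantly
         (hypothetical path, shown only as an example) -->
    <ignore>C:\inetpub\logs</ignore>
  </syscheck>

  <localfile>
    <!-- Narrow a broad Security channel subscription to selected
         event IDs (4624/4625 are just example logon events) -->
    <location>Security</location>
    <log_format>eventchannel</log_format>
    <query>Event/System[EventID=4624 or EventID=4625]</query>
  </localfile>
</ossec_config>
```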

Let me know if this helps,
Best Regards,
Nicolas


Charl Jordan

Apr 2, 2020, 3:50:17 AM
to Wazuh mailing list
Hi Nicolas,

I don't see a massive number of alerts for the host:

[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T05:" | wc -l
1273
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T06:" | wc -l
12832
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T07:" | wc -l
12261
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T07:02:" | wc -l
1633
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T08:" | wc -l
927
An example from the spike:

{"win":{"system":{"providerName":"Dynamics Server 01","eventID":"110","level":"2","task":"0","keywords":"0x80000000000000","systemTime":"2020-03-12T07:03:59.965448700Z","eventRecordID":"6650397","channel":"Application","computer":"AZ-PAX-AOS01.hosted.domain.local","severityValue":"ERROR","message":"Object Server 01:  User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied."},"eventdata":{"data":"Object Server 01:, User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied."}}}


However, I did notice something interesting here:

@timestamp: Mar 12, 2020 @ 09:02:59.894
_id: -k-NzXABANMpWbSJtWTT
_index: wazuh-alerts-3.x-2020.03.12
_score: 1
_type: _doc
agent.id: 006
agent.ip: 10.102.12.8
agent.name: AZ-PAX-AOS01
cluster.name: wazuh
cluster.node: wazuh-master
data.win.eventdata.data: Object Server 01:, User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied.
data.win.system.channel: Application
data.win.system.computer: AZ-PAX-AOS01.hosted.domain.local
data.win.system.eventID: 110
data.win.system.eventRecordID: 6617292
data.win.system.keywords: 0x80000000000000
data.win.system.level: 2
data.win.system.message: Object Server 01: User 'MauP' is not authorised to select a record in table 'CustPackingSlipSalesLink'. Request denied.
data.win.system.providerName: Dynamics Server 01
data.win.system.severityValue: ERROR
data.win.system.systemTime: 2020-03-12T07:01:29.620933300Z




There is a 2-hour discrepancy between systemTime and @timestamp. Perhaps this is causing confusion as to when the flooding is actually occurring?


That being said, I don't seem to find any alerts that are breaching the 500 EPS rate.
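For what it's worth, one way to check whether any single second breaches the cap is to bucket the alerts file by second. A sketch, assuming the ISO timestamps shown above; an inline sample is used so the snippet runs standalone, but ALERTS would normally point at the real ossec-alerts-12-AZPAXAOS01.log:

```shell
# Sketch: find the busiest second in an alerts file, to compare against
# the agent's events_per_second cap (500 here).
ALERTS=$(mktemp)
cat > "$ALERTS" <<'EOF'
{"systemTime":"2020-03-12T07:03:59.1Z","eventID":"110"}
{"systemTime":"2020-03-12T07:03:59.2Z","eventID":"110"}
{"systemTime":"2020-03-12T07:03:59.3Z","eventID":"110"}
{"systemTime":"2020-03-12T07:04:00.0Z","eventID":"110"}
EOF
# Extract the date-plus-second prefix of each timestamp, count per second,
# and show the second with the highest count.
busiest=$(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' "$ALERTS" \
  | sort | uniq -c | sort -rn | head -1)
echo "busiest second: $busiest"
rm -f "$ALERTS"
```

If the leading count in the output ever approaches 500 against the real file, that second is a flooding candidate.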




Nicolas Papp

Apr 7, 2020, 8:25:21 AM
to Charl Jordan, Wazuh mailing list
Hi Charl,
Yes, given the way you are grepping the alerts files, you should indeed account for that 2-hour difference. From what I can see, the corresponding time frame has a considerably higher number of alerts than the others:

[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T06:" | wc -l
12832
[root@ip-10-0-0-72 Mar]# cat ossec-alerts-12-AZPAXAOS01.log | grep "2020-03-12T07:" | wc -l
12261

Those counts do not look high enough to flood the queue on their own, but they could be if you are also experiencing disconnections or some other network problems. Could that be the case?

Can you send me your client_buffer stanza configuration? Just in case we are missing something.
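For reference, the client_buffer stanza in the agent's ossec.conf typically looks like this. This is a sketch showing the values I would expect as defaults; your actual values may differ, which is exactly what we want to check:

```xml
<client_buffer>
  <!-- Keep the anti-flooding buffer enabled -->
  <disabled>no</disabled>
  <!-- Events held locally while the agent catches up (default 5000) -->
  <queue_size>5000</queue_size>
  <!-- Throughput cap toward the manager (default 500) -->
  <events_per_second>500</events_per_second>
</client_buffer>
```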

Best Regards,

Nicolas


