Getting "Fluent::Plugin::Parser::ParserError" when Elasticsearch cluster runs out of space


sges

Oct 26, 2018, 1:45:02 PM
to Fluentd Google Group
I'm relatively new to fluentd.

I set up a fluentd aggregator to send logs simultaneously to ES (Elasticsearch Service managed by AWS), S3, and Splunk. While testing different failure scenarios, when the ES cluster ran out of space I started getting errors similar to the following in the fluentd agent logs:

2018-10-25 17:32:05 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not match with data 'level=DEBUG hostIP= correlationId= thread=-kinesis-consumer-1 component=com.amazonaws.requestId message=\"x-amzn-RequestId: f56f4f15-d357-ed74-a123-4f43886ebec0\"'" location=nil tag="raw.syslog" time=2018-10-25 17:32:06.000000000 +0000 record={"host"=>"xxxxxx-xxxxxxx-6bbb864b4f-l9ck4", "ident"=>"BillingManagement", "@timestamp"=>"2018-10-25T17:32:06.566Z", "log"=>"level=DEBUG hostIP= correlationId= thread=-kinesis-consumer-1 component=com.amazonaws.requestId message=\"x-amzn-RequestId: f56f4f15-d357-ed74-a123-4f43886ebec0\""}

While ES stopped accepting logs, S3 and Splunk continued receiving logs from the fluentd aggregator without a glitch. As soon as the disk space issue was resolved, the parser errors in the fluentd aggregator stopped, ES resumed accepting logs, and I was able to see them in Kibana again. None of the logs received while the ES cluster was out of space were buffered to disk by the ES plugin; they were lost.

What I was expecting in this scenario was that the out_elasticsearch plugin would fail to flush the buffer and keep buffering the logs to disk until the issue is resolved. Why was I getting a parser error instead that seemed to only affect the ES output plugin? What am I missing? 

Thanks,
(attached: td-agent.conf)

Mr. Fiber

Oct 30, 2018, 5:50:21 PM
to flu...@googlegroups.com
> the ES cluster was out of space were not buffered to disk by the ES plugin and were lost.

Fluentd stores logs in the buffer as staged/queued chunks. When the number of retries reaches the retry_max_times value during a buffer flush, fluentd clears the queued chunks before the next flush. If you want to change this behaviour, set "retry_forever true" in the buffer configuration.
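For reference, a minimal sketch of what that might look like in td-agent.conf (the match tag, host settings, and buffer path are illustrative, not taken from the original configuration):

```conf
<match raw.syslog.**>
  @type elasticsearch
  # host/port settings omitted; illustrative only
  <buffer>
    @type file
    path /var/log/td-agent/buffer/es   # hypothetical buffer path
    retry_forever true                 # keep retrying instead of dropping queued chunks
    flush_interval 5s
  </buffer>
</match>
```

With retry_forever enabled, retry_max_times is ignored and chunks stay queued on disk until the destination accepts them again, at the cost of unbounded buffer growth during a long outage.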

> Why was I getting a parser error instead that seemed to only affect the ES output plugin? What am I missing? 

Hmm... judging from your configuration, the error happens in the parser filter plugin. Someone seems to be sending unexpected messages to the instance. I'm not sure this is related to the ES issue, because your regexp setting doesn't match the "log" field value.
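(The original td-agent.conf is not shown in this thread, but a "pattern not match" error like the one quoted typically comes from a filter parser along these lines; the tag, field names, and expression below are hypothetical reconstructions from the logged record, not the poster's actual config:)

```conf
<filter raw.syslog.**>
  @type parser
  key_name log          # re-parse the "log" field of each record
  reserve_data true     # keep the other fields (host, ident, @timestamp)
  <parse>
    @type regexp
    # key=value pairs; note \S* (not \S+) so that empty values such as
    # "hostIP=" and "correlationId=" still match -- using \S+ here would
    # produce exactly the "pattern not match" error quoted above
    expression /^level=(?<level>\S*) hostIP=(?<hostIP>\S*) correlationId=(?<correlationId>\S*) thread=(?<thread>\S+) component=(?<component>\S+) message="(?<message>.*)"$/
  </parse>
</filter>
```

The error message includes the exact input string that failed, so the quickest check is to test that string against the configured expression (e.g. at fluentular or with Ruby's Regexp#match).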


Masahiro


sges

Nov 19, 2018, 12:14:21 PM
to Fluentd Google Group
Thank you Masahiro for the explanation of the buffer behavior. I'll try setting retry_forever true.

Regarding the parser error, I was as puzzled as you are. The only reasons I assumed it was related to ES are:
  1. The log messages mentioned in the parser errors somehow made it into Splunk and S3, and they looked correctly parsed.
  2. The parser errors disappeared as soon as the ES storage issue was resolved.
I was able to reproduce it: as soon as the ES cluster ran out of space, the parser errors reappeared, and they disappeared again as soon as I deleted some indexes in ES to free up space.

Could this be a bug? Has anybody else experienced the same issue?

Thank you,