grok plugin performance

pkae...@launchdarkly.com

Apr 25, 2016, 2:19:15 AM
to Fluentd Google Group
Hello

I'm trying to get started with fluentd (td-agent, actually) to ferry logs around (currently just to S3 and Elasticsearch, but I'd like to be able to trigger alerts on certain log messages as well).

The solution I am trying to replace used grok to define the log format, so I figured I could just port those rules to fluentd using the grok parser plugin [1].  Everything looked good in my low-volume test environment, but when I started sending production logs through it, the CPU pegged.  At this point, I am really only sending HTTP access logs from one of our load balancers, so this is not even full production load.

I have read that fluentd is written in Ruby, except for the performance-critical parts, but I fear that I may have lost that performance tuning by depending on a Ruby plugin for such a critical piece of the pipeline.

Is this something that others have seen too? Is anyone using the grok plugin in a high-volume environment (~25k messages/sec)?  Would I be better off writing a regex from scratch and using the built-in fluentd parsers?  Would I be better off getting haproxy to log JSON messages (it can't do this natively, but I found someone's idea [2] that is pretty clever for getting JSON logs)?

Thanks for any ideas!
-Patrick

Mr. Fiber

Apr 25, 2016, 7:55:43 AM
to Fluentd Google Group
Which Grok pattern do you use?
I tried nginx logs at 25k msgs/sec on my Mac, and Fluentd's CPU usage was around 30 - 50%.

Could you show me your configuration and an actual log example?


Masahiro


pkae...@launchdarkly.com

Apr 25, 2016, 1:46:35 PM
to Fluentd Google Group
Thanks Masahiro, here is the grok pattern I am using:

%{NOTSPACE} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request}/%{INT:time_queue}/%{INT:time_backend_connect}/%{INT:time_backend_response}/%{NOTSPACE:time_duration} %{INT:http_status_code} %{NOTSPACE:bytes_read} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn}/%{INT:feconn}/%{INT:beconn}/%{INT:srvconn}/%{NOTSPACE:retries} %{INT:srv_queue}/%{INT:backend_queue} (\{%{LDHAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )? "(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:uri_path})?( HTTP/%{NUMBER:http_version})?))?"


The custom patterns referenced are:

LDHAPROXYCAPTUREDREQUESTHEADERS %{DATA:user_agent}\|%{DATA:request_id}\|%{DATA:account_id}\|%{DATA:user}\|%{DATA:origin}\|%{DATA:auth_kind}\|%{DATA:environment_id}


At this point, I just have this, so I'm pretty sure that the input parsing is what is pegging the CPU (as opposed to processing the events or outputting, though I could have issues there as well):

<source>
  @type syslog
  port 2514
  protocol_type udp
  tag haproxy

  format grok

  grok_pattern %{NOTSPACE} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request}/%{INT:time_queue}/%{INT:time_backend_connect}/%{INT:time_backend_response}/%{NOTSPACE:time_duration} %{INT:http_status_code} %{NOTSPACE:bytes_read} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn}/%{INT:feconn}/%{INT:beconn}/%{INT:srvconn}/%{NOTSPACE:retries} %{INT:srv_queue}/%{INT:backend_queue} (\{%{LDHAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )? "(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:uri_path})?( HTTP/%{NUMBER:http_version})?))?"
  custom_pattern_path /etc/td-agent/grok/haproxy_grok_patterns
</source>


<match haproxy.**>
  @type null
</match>

Here is a sample of a log message (I commented out the grok bits, and changed the output to stdout):

2016-04-25 17:44:26 +0000 haproxy.local1.info: {"host":"ip-10-10-1-251","ident":"haproxy","pid":"10627","message":"10.10.3.85:52610 [25/Apr/2016:17:44:26.573] attribute-recorder-3030-in attribute-recorder-3030-out/attribute-recorder-10.10.3.62 0/0/1/0/1 202 132 - - ---- 27/20/3/0/0 0/0 {Go-http-client/1.1|571E577A3A099577AB|560d7010f6281a925f604343||event-recorder|token|560d7010f1e8582fe3000006}

It looks like my grok rule is quite a bit more complex than the one you are using for nginx.  Is that the source of my troubles?

Thanks!

Mr. Fiber

Apr 26, 2016, 4:49:09 AM
to Fluentd Google Group
Thanks for the detailed information.
I think your complicated regexp is causing the slow performance.
I will check whether grok's built-in regexps expand into slow patterns, using your example.

Mr. Fiber

Apr 26, 2016, 11:06:36 PM
to Fluentd Google Group
I compared the CPU usage between fluentd's built-in regexp and the grok pattern at 25k apache log msgs/sec.

- format apache2

CPU usage is around 30% (25 - 33%) on my Mac.

- format grok and grok_pattern %{COMBINEDAPACHELOG}

CPU usage is around 60% (52 - 65%) on my Mac.

I assume the cause is grok's complicated regexp patterns.
fluentd's apache2 uses the following pattern:

/^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

grok's %{COMBINEDAPACHELOG} uses the following pattern:

/(?<clientip>(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?|(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))|\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))) (?<ident>[a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)|[a-zA-Z0-9._-]+) (?<auth>[a-zA-Z0-9._-]+) \[(?<timestamp>(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])/\b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b/(?>\d\d){1,2}:(?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9]) (?:[+-]?(?:[0-9]+)))\] "(?:(?<verb>\b\w+\b) (?<request>\S+)(?: HTTP/(?<httpversion>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))))?|(?<rawrequest>.*?))" (?<response>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?:(?<bytes>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))|-) (?<referrer>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<agent>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))/

If you don't need these strict patterns, writing a simpler pattern is better for performance.
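
If you want to reproduce the difference outside Fluentd, a minimal benchmark sketch along these lines works (this is an illustration, not the exact test I ran above; paste the expanded %{COMBINEDAPACHELOG} regexp as GROK for the second case):

require 'benchmark'

# fluentd's apache2 pattern, copied from above
SIMPLE = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

LINE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://example.com/start.html" "Mozilla/4.08"'

n = 100_000
Benchmark.bm(8) do |x|
  x.report('apache2:') { n.times { SIMPLE.match(LINE) } }
  # x.report('grok:')  { n.times { GROK.match(LINE) } }  # paste the expanded grok regexp as GROK
end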


Patrick Kaeding

Apr 26, 2016, 11:58:58 PM
to flu...@googlegroups.com
Oh, wow, this is immensely helpful!  Thanks for running the benchmarks; I will see if I can write a simple regex by hand. I'm sure I can come up with something simpler than what is generated by the grok rule.

Thanks again!

pkae...@launchdarkly.com

Apr 29, 2016, 8:45:18 PM
to Fluentd Google Group
Okay, I got some improvement from using the following regex pattern:

(?<message>\S*?+ \[(?<time>\d{2}+/\w+/\d{4}+:\d{2}+:\d{2}+:\d{2}+\.\d{3}+)\] (?<frontend_name>\S+) (?<backend_name>\S+)/(?<server_name>\S+) (?<time_request>\d+)/(?<time_queue>\d+)/(?<time_backend_connect>\d+)/(?<time_backend_response>\d+)/(?<time_duration>\d+) (?<http_status_code>\d+) (?<bytes_read>\S++) \S++ \S++ (?<termination_state>\S+) (?<act_conn>\d+)/(?<fe_conn>\d+)/(?<be_conn>\d+)/(?<srv_conn>\d+)/(?<retries>\S+) (?<srv_queue>\d+)/(?<backend_queue>\d+) {(?<user_agent>[^|]*+)\|(?<request_id>[^|]*+)\|(?<account_id>[0-9a-fA-f]*?)\|(?<user>[^|]*?)\|(?<origin>[^|]*?)\|(?<auth_kind>[^|]*?)\|(?<environment_id>[0-9a-fA-f]*+)}(?: )? "(?<http_verb>\w+) (?:(?<http_proto>\w+)://)?(?<http_host>[^/]+)?(?<uri_path>\S+)(?: HTTP/(?<http_version>[^"]*))?")

However, the CPU usage is still quite high (80-95%), and I'm only doing about 3k messages per second at this point.  I'm not using the beefiest of boxes (EC2 m3.medium), so maybe that is the problem?

Is there anything else I can try?

Thanks!

Kiyoto Tamura

Apr 29, 2016, 9:23:29 PM
to flu...@googlegroups.com
Hi Patrick (I'm the original author of the Grok parser for Fluentd)

I see. One idea that I've been eager to try: using the re2 binding instead of the native Ruby regex engine (re2 is known to have more stable performance when matching long sequences). I will look into this over the weekend and see if it helps at all with regexp parsing (it's possible that it won't).
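
Roughly what I mean, as a sketch (untested; I'm assuming the mudge/re2 gem's RE2::Regexp API here, and note that re2 intentionally drops constructs like backreferences and atomic grouping, so not every grok-generated pattern will compile under it):

require 're2'  # gem install re2 (needs the re2 C++ library installed)

# Hypothetical, deliberately simplified pattern; re2 uses (?P<name>...) syntax
# for named captures.
pattern = RE2::Regexp.new('(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] (?P<frontend>\S+)')
m = pattern.match('10.10.3.85:52610 [25/Apr/2016:17:44:26.573] attribute-recorder-3030-in')
puts m["frontend"] if m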

Kiyoto


Kiyoto Tamura

Apr 30, 2016, 5:54:28 PM
to flu...@googlegroups.com
Patrick,

Can you give me an example of the string that the above regex matches against?

Patrick Kaeding

Apr 30, 2016, 10:40:47 PM
to flu...@googlegroups.com
Hi Kiyoto

Here is an example log message that I'm trying to parse:

23.235.46.38:48438 [29/Apr/2016:22:05:56.009] gonfalon-3000-fastly-in~ gonfalon-3000-out/gonfalon-10.10.4.225 25/0/0/3/28 200 404 - - ---- 216/213/0/1/0 0/0 {PythonClient/0.17.0|387404993|||||} "POST /api/events/bulk HTTP/1.1"

Thanks!

Kiyoto Tamura

Apr 30, 2016, 10:42:09 PM
to flu...@googlegroups.com
Patrick,

Thanks!

For the leading "23.235.46.38:48438", can you assume that it's \S+?

Kiyoto

Patrick Kaeding

Apr 30, 2016, 11:13:16 PM
to flu...@googlegroups.com

Yeah, that would be a safe assumption.

Kiyoto Tamura

May 1, 2016, 3:00:04 AM
to flu...@googlegroups.com
Hi Patrick,

I did some benchmarking, and unfortunately it looks like you are hitting Ruby's regex performance limit: https://gist.github.com/anonymous/5a05e5f45e206e47bb82ea8d91cb57f9 My machine was similar to yours (n1-standard-1 on Google Cloud Platform: 1 vCPU + 3.75GB memory).

With Ruby's native regexp, you should still be able to get 100k-200k.

There are a couple of ways to improve performance:

1. Parse less: instead of parsing all fields, performance may improve if you bundle fields together (such as the ones separated by /'s) and delegate further parsing to your backend (see the sketch after this list).
2. Test on a beefier machine.
3. Write a custom parser, although your mileage may vary: when I tested naive whitespace-based splitting, it was only 2x as fast as the above regexps.
4. Instead of tailing a log file, create an nginx module and send data directly to Fluentd. I see an old module here (https://github.com/fluent/nginx-fluentd-module) and I'm not sure if it still works today.
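
For option 1, here is a sketch of what "parse less" could look like against your HAProxy line: keep the /-separated clusters as single fields and split them downstream. The field names are just illustrative, and the numeric classes allow '+' and '-' because haproxy can emit things like +retries and -1 timers:

(?<client>\S+) \[(?<accept_date>[^\]]+)\] (?<frontend_name>\S+) (?<backend_name>[^/ ]+)/(?<server_name>\S+) (?<timers>[\d/+-]+) (?<http_status_code>\S+) (?<bytes_read>\S+) \S+ \S+ (?<termination_state>\S+) (?<conn_counts>[\d/+-]+) (?<queues>[\d/]+) (?<rest>.*)

You would then split timers and conn_counts in your backend (or a later filter), and only for the records you actually query.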

Kiyoto

pkae...@launchdarkly.com

May 3, 2016, 7:09:33 PM
to Fluentd Google Group
Hi Kiyoto

Thanks for your help on this!

I think I'm going to try going with something like option 4 that you mentioned.  I'm using HAProxy, but it seems there is a way to trick HAProxy into logging most fields as JSON [1].  However, my captured request headers would come in as a single string field (e.g.: "request_headers": "{Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36||||||}")

Is there a way to parse individual fields in fluentd, after they are ingested into the pipeline?  I looked briefly through http://www.fluentd.org/plugins, but I'm not really sure what such a thing would be called.  I imagine that splitting that field into each header field would be much easier than using that complex regex on the whole line.  I do want to be sure the field is split before it goes into Elasticsearch, though.

Thanks,
Patrick

Kiyoto Tamura

May 3, 2016, 10:40:01 PM
to flu...@googlegroups.com
Patrick,

Yes, you can use tagomoris's fluent-plugin-parser. IIRC there are both output plugin and filter plugin versions, but the filter one is more performant (unless you need to change the tag of the message).


You would use something like:

<filter ha_proxy.raw>
  @type parser
  key_name request_headers
  format /^{...}$/ # your regex to parse just the request_headers field here
  reserve_data yes # this keeps fields other than the one specified in key_name
</filter>

Kiyoto

pkae...@launchdarkly.com

May 4, 2016, 1:17:25 PM
to Fluentd Google Group
Thanks, that looks promising.

My next problem seems to be that HAProxy prepends the log string with some text before the log_format parameter takes over.  I posted a question about this in the HAProxy forum, but I wonder if there is something I can do on the fluentd side?  Like a simple regex that drops the heading and then parses the rest as JSON?  Or some other idea?
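
Something like this is what I have in mind, totally untested (and I'm not even sure the syslog input accepts an arbitrary regexp format):

<source>
  @type syslog
  port 1514
  tag haproxyjson

  # hypothetical: ignore everything up to the first '{' and keep the JSON tail
  format /^[^{]*(?<json_body>\{.*\})$/
</source>

I'd still need a parser filter on json_body afterwards to expand it, I think.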

Thanks!

pkae...@launchdarkly.com

May 4, 2016, 1:45:36 PM
to Fluentd Google Group
Reading more about syslog, it seems that the HEADER is the part that is getting in the way. The MESSAGE part is proper JSON.

This is my fluentd input config:

<source>
  @type syslog
  port 1514
  tag haproxyjson

  format json 
  time_key timestamp
</source>

I see this in the fluentd logs:

2016-05-04 17:29:51 +0000 [warn]: pattern not match: "May  4 17:29:51 haproxy[13]: {\"message\":\"192.168.59.3:57366 [04/May/2016:17:29:51.437] app1 app1/app1_1 0/0/1/12/13 200 7872 - - ---- 1/1/0/1/0 0/0 {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36||||||} \"GET / HTTP/1.1\"\",\"timestamp\":1462382991,\"frontend_name\":\"app1\",\"backend_name\":\"app1\",\"server\":\"app1_1\",\"time_request\":0,\"time_queue\":0,\"time_backend_connect\":1,\"time_backend_response\":12,\"time_duration\":13,\"http_status_code\":200}"

Can I specify that the header should be discarded, and just use the message as the JSON?

pkae...@launchdarkly.com

May 4, 2016, 2:44:50 PM
to Fluentd Google Group
I think I am getting somewhere now... of course, as soon as I sent that last question, I realized that I can use the parser plugin to do more than just parse subfields with a regex: it can also take a subfield and parse it as the JSON body.  I haven't tried this under load yet, but it seems to be doing the right thing, at least:

<source>
  @type syslog
  port 1514
  tag haproxyjson

  format syslog 
</source>

<filter haproxyjson.**>
  @type parser
  format json
  time_key timestamp
  key_name message
</filter>

I wonder if I can save some bandwidth by not sending the field names in every JSON log message.  How does CSV/TSV format parsing compare to JSON, from a performance perspective?

Mr. Fiber

May 4, 2016, 5:22:57 PM
to Fluentd Google Group
> I haven't tried this under load yet, but this seems to be doing the right thing

It seems to work.
If you want better performance, writing a custom parser and using it in the syslog input is an alternative approach.
Parsers are also pluggable.
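
For example, a minimal sketch against the 0.12 plugin API (from memory, so check the current docs; the names are hypothetical, real field handling and the syslog wiring are omitted — drop the file under /etc/td-agent/plugin/ and refer to it with `format my_haproxy`):

require 'fluent/parser'

module Fluent
  class TextParser
    class MyHAProxyParser < Parser
      Plugin.register_parser("my_haproxy", self)

      def parse(text)
        # HAProxy HTTP log fields are space-separated, so String#split
        # avoids the regexp engine for the fixed-position fields.
        parts = text.split(' ')
        backend, server = parts[3].split('/')
        record = {
          "frontend_name"    => parts[2],
          "backend_name"     => backend,
          "server_name"      => server,
          "http_status_code" => parts[5].to_i,
        }
        yield Engine.now, record
      end
    end
  end
end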

> How does the CSV/TSV format parsing compare to JSON, from a performance perspective?

Maybe TSV is better, because it only calls `split` to parse fields, but you should benchmark with your input.
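
A quick sketch for checking it on your own lines:

require 'benchmark'
require 'json'

# Compare parsing the same three fields as JSON vs. a plain tab split.
json_line = '{"frontend_name":"app1","http_status_code":200,"time_duration":13}'
tsv_line  = "app1\t200\t13"
keys      = %w[frontend_name http_status_code time_duration]

n = 200_000
Benchmark.bm(6) do |x|
  x.report('json:') { n.times { JSON.parse(json_line) } }
  x.report('tsv:')  { n.times { Hash[keys.zip(tsv_line.split("\t"))] } }
end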



pkae...@launchdarkly.com

May 6, 2016, 12:08:49 AM
to Fluentd Google Group
Okay, I went the TSV route, and then used the parser plugin to parse one of the fields.  It seems to work well so far, at least in the low-volume staging environment.  However, I see a lot of these log messages:

2016-05-06 03:59:22 +0000 [warn]: req_hdrs does not exist


req_hdrs is the combined field that I need to parse with the smaller regex.  
Here is the relevant snippet from my config:


<source>
  @type syslog
  port 2514
  protocol_type udp
  tag haproxy

  format tsv
  time_format %s
  time_key timestamp
  keys syslog_header,timestamp,frontend_name,backend_name,server,time_request,time_queue,time_backend_connect,time_backend_response,time_duration,http_status_code,bytes_read,termination_state,act_conn,fe_conn,be_conn,srv_conn,retries,srv_queue,backend_queue,req_hdrs
  types timestamp:time,frontend_name:string,backend_name:string,server:string,time_request:integer,time_queue:integer,time_backend_connect:integer,time_backend_response:integer,time_duration:integer,http_status_code:integer,bytes_read:integer,termination_state:string,act_conn:integer,fe_conn:integer,be_conn:integer,srv_conn:integer,retries:integer,srv_queue:integer,backend_queue:integer
</source>

<filter haproxy.**>
  @type parser
  format /{(?<user_agent>[^|]*+)\|(?<request_id>[^|]*+)\|(?<account_id>[0-9a-fA-f]*?)\|(?<user>[^|]*?)\|(?<origin>[^|]*?)\|(?<auth_kind>[^|]*?)\|(?<environment_id>[0-9a-fA-f]*+)}"(?<http_verb>\w+) (?:(?<http_proto>\w+)://)?(?<http_host>[^/]+)?(?<uri_path>\S+)(?: HTTP/(?<http_version>[^"]*))?"/
  key_name req_hdrs
  reserve_data yes
</filter>

Is it possible to get the parser plugin to ignore records that are missing the key field?  Or something like that?  I'm actually pretty surprised that there are so many messages matching 'haproxy.**' that are missing that field, so maybe something else is going on here?

Any ideas?

Thanks!

Kiyoto Tamura

May 23, 2016, 6:29:56 PM
to flu...@googlegroups.com
Patrick,

Have you been able to resolve this issue? I believe what you are looking for is the @ERROR label: http://docs.fluentd.org/articles/config-file#error-label
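
As a sketch of the shape (untested against your setup), records that error out during filtering should be routed to the built-in @ERROR label, where you can catch and keep them:

<label @ERROR>
  # keep unparseable records somewhere you can inspect them
  <match haproxy.**>
    @type file
    path /var/log/td-agent/haproxy-parse-errors
  </match>
</label>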