keep the "pattern not matched" records

Alexandre Thomas

Mar 9, 2015, 6:15:20 AM

to flu...@googlegroups.com, alexandr...@pi.esisar.grenoble-inp.fr

Hi everyone,

I'm trying to do the following with Fluentd:

I want to read CSV files in a specific format:

"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","1"

My first problem:

I use tail to read them as follows:

<source>
type tail
read_from_head true
path /home/toto/fluentd/csvin/*/*/*.csv
tag csvin.*
pos_file /home/toto/fluentd/csvin/position
format /^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$/
</source>

<match csvin.**>
type file
path /home/toto/fluentd/fileout/fluentd_output
format json
include_time_key true
</match>

It works very well (I actually send the rows to a MySQL database, etc.), but I also have to keep the rows that do not match the pattern.

For example, if I get the following record:

"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","BANANA"

I want to send this unmatched row to a different database.

For example, look at this:

<source>
type tail
read_from_head true
path /home/toto/fluentd/csvin/*/*/*.csv
tag raw.csvin.*
pos_file /home/toto/fluentd/csvin/position
format none
</source>

<match raw.csvin.**>
type parser
remove_prefix raw
key_name message
format /^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$/
</match>

<match csvin.**>
type file
path /home/toto/fluentd/fileout/fluentd_output
format json
include_time_key true
</match>

<match failed.raw.csvin.**>
type file
path /home/toto/fluentd/fileout/different_output
format json
include_time_key true
</match>

Do you know of any way or any plugin that can do that?

I have tried parser, grep, etc.

I would need something like this:

<match raw.csvin.**>
type ?????
remove_prefix raw
format /^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$/
reemit_not_matched true
reemit_tag_prefixe failed
</match>

I found "assert", but it no longer works.

Thanks for your time, and sorry for my English.

NB: I know I can do it by capturing my logs with fluent.** and grepping them, but I want to find a proper way.

2015-03-09 10:42:56 +0100 [warn]: plugin/out_parser.rb:82:block (2 levels) in emit: pattern not match with data '"2015/03/06","04:45:06","005067A3D837","cacao","1","1","2","1"'
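The routing asked for above can be sketched outside Fluentd. This is only an illustration in Python, not a Fluentd plugin: the function name `route` is hypothetical, the named groups use Python's `(?P<name>...)` syntax instead of Ruby's `(?<name>...)`, and the `failed.` prefix and the removal of `raw.` are taken from the wished-for config in this post. Unmatched lines are kept whole under a `message` key, as `format none` does.

```python
import re

# The pattern from the post, translated to Python named-group syntax.
ROW = re.compile(
    r'^(?P<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])",'
    r'"([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),'
    r'"(?P<mac>[0-9A-F]{12})","(?P<kwh>\d+)","(?P<site>\d+)",'
    r'"(?P<terminal>\d+)","(?P<zone>\d+)","(?P<ballast>\d+)"$'
)

def route(tag, line):
    """Return (new_tag, record) for one raw line (hypothetical helper).

    Matched lines lose the 'raw.' prefix and become structured records;
    unmatched lines keep the raw text and get a 'failed.' prefix.
    """
    m = ROW.match(line)
    if m:
        new_tag = tag[len("raw."):] if tag.startswith("raw.") else tag
        return new_tag, m.groupdict()
    return "failed." + tag, {"message": line}

good = '"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","1"'
bad = '"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","BANANA"'

print(route("raw.csvin.a", good))
print(route("raw.csvin.a", bad))
```

With this split, the first row would go to the normal output and the second to whatever output matches the `failed.` tag.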

Mr. Fiber

Mar 10, 2015, 7:59:39 AM

to flu...@googlegroups.com

Hi,

> NB: I know I can do it by capturing my logs with fluent.** and grepping them, but I want to find a proper way.

Yes, some users do it this way, and others use fluent-plugin-multi-format-parser.

https://github.com/repeatedly/fluent-plugin-multi-format-parser

multi-format-parser can't assign separate tags, so rewrite-tag-filter or a similar plugin is needed.

Hmm... a re-emit approach would resolve this problem, but I can't judge whether this use case is common or not.

In the re-emit case, the emitted record would be in the none format.

This case is covered by the combination of multi-format-parser and rewrite-tag-filter.
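The idea behind that combination can be sketched in Python. This is not the plugins' code: `parse_multi` and `retag` are hypothetical names, and `CSV` is a shortened stand-in for the full pattern from the question. The first step tries each pattern in order and falls back to the none format, which stores the whole line under `message`. The second step plays the role of the tag rewrite by looking at which keys the record carries.

```python
import re

# Hypothetical, shortened stand-in for the CSV pattern in the question.
CSV = re.compile(r'^"(?P<mac>[0-9A-F]{12})","(?P<kwh>\d+)"$')

def parse_multi(line, patterns):
    """Try each pattern in order; fall back to the 'none' format."""
    for pattern in patterns:
        m = pattern.match(line)
        if m:
            return m.groupdict()
    # 'format none' keeps the raw line under the 'message' key.
    return {"message": line}

def retag(tag, record):
    """Tag rewrite step: a record with only 'message' came from the fallback."""
    if set(record) == {"message"}:
        return "failed." + tag
    return tag

for line in ['"005067A3D837","65532"', '"005067A3D837","cacao"']:
    record = parse_multi(line, [CSV])
    print(retag("csvin.a", record), record)
```

The parser alone cannot change the tag, which is why a second, tag-rewriting step is needed to send the fallback records somewhere else.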

If you want to extend the in_tail plugin, override the convert_line_to_event method.

https://github.com/fluent/fluentd/blob/f2a0afa28a98bbd9c0b77a80a580923c2c2b8887/lib/fluent/plugin/in_tail.rb#L233

Replacing log.warn with emit code resolves this problem.

Masahiro

--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexandre Thomas

Mar 11, 2015, 5:16:04 AM

to flu...@googlegroups.com

Thanks a lot Masahiro, I managed to do it with the fluent-plugin-multi-format-parser.

Ji Tran

Jan 9, 2017, 6:18:00 PM

to Fluentd Google Group

Hi Alexandre and Masahiro,

Does the multi-format-parser plugin work with the multiline format? For example, I have a config like this:

<source>
@type tail
format multi_format
<pattern>
format multiline
format_firstline /^(\d+-\d+-\d+\s)/
format1 /(?<time>\d+-\d+-\d+\s+\d+:\d+:\d+,\d+:\s+)(?<event>.*)/
</pattern>
<pattern>
format none
</pattern>
...
</source>

I want to match a multiline log event and, if the match fails, fall back to the none format. However, the behaviour with the current 0.0.2 version is that it captures only the first line of a multiline log event, i.e.

2016-12-15 16:00:52,395: Starting service tomcat
Starting Tomcat (tomcat)

is captured into two JSON records instead of one.

The multiline regular expressions work as expected when I don't use the multi-format-parser.

Thanks

Mr. Fiber

Jan 9, 2017, 6:27:52 PM

to Fluentd Google Group

multi_format_parser doesn't work with multiline because multiline capture is handled by the tail plugin.

I will add a note about this limitation.
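A rough Python sketch of why this matters, under stated assumptions: `group_events` is a hypothetical name, and it only imitates the grouping step with the `format_firstline` pattern from the config above. It is not the tail plugin's actual code. A line that matches the first-line pattern starts a new event, and following lines are appended to it. A parser that is handed one line at a time never sees the grouped text.

```python
import re

# format_firstline from the config in the question.
FIRSTLINE = re.compile(r'^(\d+-\d+-\d+\s)')

def group_events(lines):
    """Group physical lines into logical events (simplified sketch)."""
    events, buffer = [], None
    for line in lines:
        if FIRSTLINE.match(line):
            # A new first line closes the event collected so far.
            if buffer is not None:
                events.append(buffer)
            buffer = line
        elif buffer is not None:
            buffer += "\n" + line
        else:
            # No first line seen yet: emitted alone (a simplification of this sketch).
            events.append(line)
    if buffer is not None:
        events.append(buffer)
    return events

lines = [
    "2016-12-15 16:00:52,395: Starting service tomcat",
    "Starting Tomcat (tomcat)",
]

print(len(group_events(lines)))  # events after grouping
print(len(lines))                # events when each line is parsed on its own
```

With grouping, the two sample lines form a single event. Without it, each line is parsed separately, which matches the two records reported above.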


Ji Tran

Jan 9, 2017, 7:42:21 PM

to Fluentd Google Group

Thanks for the clarification, Masahiro.
