keep the "pattern not matched" records

Alexandre Thomas

Mar 9, 2015, 6:15:20 AM
to flu...@googlegroups.com, alexandr...@pi.esisar.grenoble-inp.fr
Hi everyone,

I'm trying to do the following with Fluentd:

I want to read CSV files in a specific format:

"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","1"

My first problem:

I use tail to read them as follows:


<source>
  type tail
  read_from_head true
  path /home/toto/fluentd/csvin/*/*/*.csv
  tag  csvin.*
  pos_file /home/toto/fluentd/csvin/position
  format /^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$/
</source>
 

<match csvin.**>
  type file
  path /home/toto/fluentd/fileout/fluentd_output
  format json
  include_time_key true
</match>




It works very well (I actually send the rows to a MySQL database, etc.), but I also have to keep the rows that do not match.

For example, if I get the following record:

"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","BANANA"

I want to send this unmatched row to a different database.

For example, something like this:


<source>
  type tail
  read_from_head true
  path /home/toto/fluentd/csvin/*/*/*.csv
  tag raw.csvin.*
  pos_file /home/toto/fluentd/csvin/position
  format none
</source>


<match raw.csvin.**>
  type parser
  remove_prefix raw
  key_name message
  format /^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$/
</match>



<match csvin.**>
  type file
  path /home/toto/fluentd/fileout/fluentd_output
  format json
  include_time_key true
</match>


<match failed.raw.csvin.**>
  type file
  path /home/toto/fluentd/fileout/different_output
  format json
  include_time_key true
</match>




Do you know of any way, or any plugin, that can do that?

I have tried parser, grep, etc.

I would need something like this:

<match raw.csvin.**>
  type ?????
  remove_prefix raw
  format /^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$/
  reemit_not_matched true
  reemit_tag_prefix failed
</match>

I found "assert", but it no longer works.

Thanks for your time, and sorry for my English.




NB: I know I can do it by capturing Fluentd's own logs with fluent.** and grepping them, but I want to find a proper way. The warning I would have to grep for looks like this:

2015-03-09 10:42:56 +0100 [warn]: plugin/out_parser.rb:82:block (2 levels) in emit: pattern not match with data '"2015/03/06","04:45:06","005067A3D837","cacao","1","1","2","1"'
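
The split being asked for can be checked outside Fluentd with a few lines of Ruby. This is only a sketch: `csv_pattern` and `partition_rows` are hypothetical helper names, the pattern is the one from the config above copied into a Ruby regexp literal, and the three rows are the samples quoted in this post.

```ruby
# The format pattern from the <source> block, as a Ruby regexp.
# %r{...} is used so the "/" characters in the date need no escaping.
def csv_pattern
  %r{^(?<time>"20\d{2}/(1[0-2]|0\d)/([0-2]\d|3[01])","([01]\d|2[0-3]):[0-5]\d:[0-5]\d"),"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)","(?<site>\d+)","(?<terminal>\d+)","(?<zone>\d+)","(?<ballast>\d+)"$}
end

# Hypothetical helper: split rows into parsed records and raw leftovers.
def partition_rows(rows, pattern)
  matched = []
  unmatched = []
  rows.each do |row|
    m = pattern.match(row)
    if m
      # With named groups present, only the named groups are captured,
      # so names and captures line up one to one.
      matched << m.names.zip(m.captures).to_h
    else
      unmatched << row
    end
  end
  [matched, unmatched]
end

rows = [
  '"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","1"',
  '"2015/03/06","10:15:06","005067A3D837","65532","1","1","2","BANANA"',
  '"2015/03/06","04:45:06","005067A3D837","cacao","1","1","2","1"',
]

matched, unmatched = partition_rows(rows, csv_pattern)
puts "matched:   #{matched.length}"
puts "unmatched: #{unmatched.length}"
```

The first row parses into a record; the "BANANA" and "cacao" rows end up in the leftover list, which is the set that should be routed to the second output.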



Mr. Fiber

Mar 10, 2015, 7:59:39 AM
to flu...@googlegroups.com
Hi,

> NB: I know I can do it by capturing my logs with fluent.** and grepping them, but I want to find a proper way.

Yes, some users do it that way, or use fluent-plugin-multi-format-parser.


multi-format-parser can't assign different tags by itself, so rewrite-tag-filter or a similar plugin is needed.

Hmm... a re-emit approach would solve this problem, but I can't judge whether this use case is common or not.
With re-emit, the emitted record would be in the none format.
That case is already covered by combining multi-format-parser and rewrite-tag-filter.
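
The idea behind that combination can be simulated in plain Ruby. This is a sketch of the logic only, not the code of either plugin: `parse_with_fallback` and `retag` are hypothetical names, the pattern is shortened to two fields for readability, and the "failed." prefix is borrowed from the question.

```ruby
# Hypothetical stand-in for a multi-format parser: try each pattern in order
# and return the first record that parses. If none matches, fall back to a
# "none"-style record that keeps the raw line under the "message" key.
def parse_with_fallback(line, patterns)
  patterns.each do |pattern|
    m = pattern.match(line)
    return m.names.zip(m.captures).to_h if m
  end
  { "message" => line }
end

# Hypothetical stand-in for a tag-rewriting step: records that carry the raw
# "message" key get a "failed." prefix, all others keep their tag.
# (Assumes no real pattern uses "message" as a capture name.)
def retag(tag, record)
  record.key?("message") ? "failed.#{tag}" : tag
end

# Shortened two-field pattern, for illustration only.
patterns = [/^"(?<mac>[0-9A-F]{12})","(?<kwh>\d+)"$/]

[
  '"005067A3D837","65532"',
  '"005067A3D837","cacao"',
].each do |line|
  record = parse_with_fallback(line, patterns)
  puts "#{retag('csvin.sample', record)} #{record}"
end
```

The parser never drops a line, and the second step separates the two kinds of record by tag, so each kind can be matched by its own output section.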

If you want to extend the in_tail plugin, override the convert_line_to_event method.


Replacing the log.warn call with emit code solves this problem.
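
The shape of such an override might look like the sketch below. It runs on its own and does not load Fluentd: `StubParser`, `StubTail` and `KeepUnmatchedTail` are hypothetical stand-ins, and the two-argument `convert_line_to_event(line, es)` signature is an assumption that should be checked against the in_tail source of the Fluentd version in use.

```ruby
# Stand-in for a line parser: yields (time, record) on success and
# (nil, nil) when the pattern does not match. Not Fluentd's real class.
class StubParser
  def initialize(pattern)
    @pattern = pattern
  end

  def parse(line)
    m = @pattern.match(line)
    if m
      yield Time.now.to_i, m.names.zip(m.captures).to_h
    else
      yield nil, nil
    end
  end
end

# Stand-in for the tail input: unmatched lines only produce a warning.
class StubTail
  attr_reader :warnings

  def initialize(parser)
    @parser = parser
    @warnings = []
  end

  def convert_line_to_event(line, es)
    @parser.parse(line.chomp) do |time, record|
      if time && record
        es << [time, record]
      else
        @warnings << "pattern not match: #{line.inspect}"
      end
    end
  end
end

# The override: where the parent warned, keep the raw line as a record
# so it can be emitted under a different tag later.
class KeepUnmatchedTail < StubTail
  attr_reader :unmatched

  def initialize(parser)
    super
    @unmatched = []
  end

  def convert_line_to_event(line, es)
    @parser.parse(line.chomp) do |time, record|
      if time && record
        es << [time, record]
      else
        @unmatched << [Time.now.to_i, { "message" => line.chomp }]
      end
    end
  end
end

tail = KeepUnmatchedTail.new(StubParser.new(/^"(?<kwh>\d+)"$/))
events = []
tail.convert_line_to_event("\"65532\"\n", events)
tail.convert_line_to_event("\"cacao\"\n", events)
puts "events: #{events.length}, unmatched: #{tail.unmatched.length}"
```

In a real plugin the unmatched records would be emitted with their own tag rather than collected in an array; that part depends on the emit API of the Fluentd version and is left out here.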


Masahiro

--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexandre Thomas

Mar 11, 2015, 5:16:04 AM
to flu...@googlegroups.com



Thanks a lot Masahiro, I managed to do it with the fluent-plugin-multi-format-parser.

Ji Tran

Jan 9, 2017, 6:18:00 PM
to Fluentd Google Group
Hi Alexandre and Masahiro,

Does the multi-format-parser plugin work with the multiline format? For example, I have a config like this:

<source>
  @type tail
  format multi_format
  <pattern>
    format multiline
    format_firstline /^(\d+-\d+-\d+\s)/
    format1 /(?<time>\d+-\d+-\d+\s+\d+:\d+:\d+,\d+:\s+)(?<event>.*)/
  </pattern>
  <pattern>
    format none
  </pattern>
  ...
</source>

I want to match a multiline log event and if the match fails, fall back to the none format. However the behaviour with the current 0.0.2 version is that it only captures the first line of a multiline log event, i.e.

2016-12-15 16:00:52,395: Starting service tomcat
Starting Tomcat (tomcat)

is captured as two JSON records instead of one.

The multiline regular expressions work as expected when I don't use the multi-format-parser.

Thanks

Mr. Fiber

Jan 9, 2017, 6:27:52 PM
to Fluentd Google Group
multi_format_parser doesn't work with multiline because multiline capture is handled by the tail plugin.
I will add a note about this limitation.
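
A simplified picture of why this matters, as a Ruby sketch: a parser that is handed one line at a time cannot join lines, so something upstream has to group them first. `group_multiline` is a hypothetical helper and not in_tail's code; the first-line pattern and the two log lines are taken from the question above.

```ruby
# Hypothetical helper: a line matching first_line starts a new event,
# any other line is appended to the event currently being built.
def group_multiline(lines, first_line)
  events = []
  lines.each do |line|
    if events.empty? || first_line.match(line)
      events << line.dup
    else
      events.last << "\n" << line
    end
  end
  events
end

lines = [
  "2016-12-15 16:00:52,395: Starting service tomcat",
  "Starting Tomcat (tomcat)",
]
first_line = /^(\d+-\d+-\d+\s)/

# Handing the lines to a parser one by one gives two separate inputs.
puts "per-line inputs: #{lines.length}"
# Grouping before parsing gives a single two-line event.
puts "grouped events:  #{group_multiline(lines, first_line).length}"
```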


Ji Tran

Jan 9, 2017, 7:42:21 PM
to Fluentd Google Group
Thanks for the clarification, Masahiro.