Fluentd regular expression meaning? | How to skip a format1 or format2 in multiline?


Shubhra Garg

Apr 1, 2018, 5:42:17 PM
to flu...@googlegroups.com
Hi Team,

1. Can anybody explain what the regular expressions used in Fluentd parsing mean?

I do not understand the regular expression below:

format1 /Started (?<method>[^ ]+) "(?<path>[^"]+)"
There are quotes inside the [ ] brackets. What do they mean?

2. Does multiline work with TCP? According to the link below, multiline works with TCP, but according to many articles the multiline parser doesn't work with TCP. Please clarify!


3. If I have irregular sets of logs to parse, how do I skip the fields that are missing?
The first set of logs has (timestamp, code-type), while the second set has (timestamp, code-type, username, IP address, identifier), etc. The third set of logs will again have different fields.

I wrote a multiline parser for these logs, but I am stuck on how to skip format1 when it is not present in a set of logs, and how to skip format2 when it is not present.
For example, the first set of logs doesn't have the username and IP address fields, but if I write a format for them, the parser throws an error, as shown below:

Could you please help!

For example, the logs look like this:

2018-03-25 06:25:26.880542
        Code-Timestamp = Sun Mar 25 06:25:26 2018
        Code-Type = Access-Reject

2018-03-25 06:25:41.452657
        Request-Timestamp = Sun Mar 25 06:25:41 2018
        Code -Type = Access-Request
        User-Name = "testuser"
        IP-Address = X.X.X.X
        Identifier = "google.com"


vim /etc/td-agent/td-agent.conf

<source>
  @type tail
  @log_level debug
  read_from_head true
  path /var/log/detail7.log
  tag multi.test
  key_name sales
  format multiline
  format_firstline /^(?<timestamp>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})/
  format1 /^\s+Code-Timestamp\s+=\s+(?<Request-Time>(\w+)\s+(\w+)\s+(\d{1,2})\s+(\d{1,2}):(\d{1,2}):(\d{1,2})\s+(\d{4}))\n/
  format2 /^\tCode-Type\s+=\s+(?<codetype>([^ ]+))\n/
  format3 /^\tUser-Name\s+=\s+"(?<user1>([^ ]+))"\n/
</source>

<match multi.test>
  @type stdout
</match>


Error:

2018-04-01 21:29:52 +0000 [info]: plugin/in_tail.rb:578:initialize: following tail of /var/log/detail7.log
2018-04-01 21:29:52 +0000 [warn]: plugin/in_tail.rb:336:block in convert_line_to_event: pattern not match: "2018-03-25 06:25:26.880542\n\tCode-Timestamp = Sun Mar 25 06:25:26 2018\n\tCode-Type = Access-Reject\n"
2018-04-01 21:29:52 +0000 multi.test: {"Request-Time":"Sun Mar 25 06:25:26 2018","codetype":"Code-Request","user1":"testuser"}
2018-04-01 21:29:52 +0000 multi.test: {"Request-Time":"Sun Mar 25 06:25:41 2018","codetype":"Access-Request","user1":"testuser"}



--
Shubhra Garg :-)

Mr. Fiber

Apr 2, 2018, 1:24:19 AM
to Fluentd Google Group
> 1.
>  The quotes present inside [ ] brackets. What does it mean ?
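
For reference, `[^"]+` is a negated character class: inside `[ ]`, a leading `^` means "any character except the ones listed", so `[^"]+` matches one or more characters that are not a double quote, and `[^ ]+` matches one or more characters that are not a space. Fluentd format patterns are Ruby regular expressions, so this can be tried in plain Ruby. A minimal sketch follows; the sample line is made up for illustration and is not from the thread:

```ruby
# Hypothetical sample line (an assumption, not taken from the original post).
line = 'Started GET "/users/123" for 127.0.0.1'

# [^ ]+  one or more characters that are not a space
# [^"]+  one or more characters that are not a double quote
pattern = /Started (?<method>[^ ]+) "(?<path>[^"]+)"/

m = pattern.match(line)
puts m[:method]
puts m[:path]
```

The quotes outside the brackets match the literal quotes around the path, and the class inside them stops the capture at the closing quote.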


> 2.
> Does multiline work with TCP?

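Unlike in_tail, in_tcp applies the parser to one received message at a time; it does not collect several lines into one record.
This means multiline with in_tcp is for matching a message that itself contains `\n`.
in_tcp uses `\n` as its default delimiter, so you need to change the delimiter for that to work.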
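
The effect of the delimiter can be illustrated in plain Ruby. This is only a sketch of the idea, not in_tcp's actual code: the stream below is shortened from the logs in the question, and the blank-line delimiter is an assumption chosen for the example.

```ruby
# Two records as they might arrive over TCP, separated by a blank line.
stream = "2018-03-25 06:25:26.880542\n\tCode-Type = Access-Reject\n\n" \
         "2018-03-25 06:25:41.452657\n\tCode-Type = Access-Request\n\n"

# A regexp that spans both lines of one record.
record = /^(?<time>\d{4}-\d{2}-\d{2} [\d:.]+)\n\tCode-Type = (?<codetype>[^\n]+)/

# Splitting on "\n" yields single lines, so the two-line regexp never matches.
by_newline = stream.split("\n").reject(&:empty?)

# Splitting on a blank line (assumed delimiter) yields whole records.
by_blank_line = stream.split("\n\n")

newline_matches = by_newline.count { |chunk| record.match?(chunk) }
blank_line_matches = by_blank_line.count { |chunk| record.match?(chunk) }

puts "matches when split on newline:    #{newline_matches}"
puts "matches when split on blank line: #{blank_line_matches}"
```

With the newline split no chunk contains a full record, which is why a multi-line pattern only makes sense once the delimiter is something other than `\n`.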

> 3.
> how to skip the format1 if not present in the set of logs, how to skip format2 if not present in the set of logs

Maybe you need to write a complex regular expression for it.
If you want to avoid that, you can write your own multiline parser or change the output format of your log.
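
One way such a "complex regular expression" could look is to wrap each line that may be missing in an optional non-capturing group, `(?:...)?`. The sketch below tests the idea in plain Ruby against the two records from the question. The group names (`request_time`, `user`, `ip`, `identifier`) and the constant `RECORD` are illustrative choices, and the pattern has not been tried inside a Fluentd config:

```ruby
# Extended (/x) mode: whitespace in the pattern is ignored, so the literal
# space in the timestamp is escaped as "\ ".
RECORD = /
  \A(?<timestamp>\d{4}-\d{1,2}-\d{1,2}\ \d{1,2}:\d{1,2}:\d{1,2}\.\d+)\n
  \s+(?:Code|Request)-Timestamp\s+=\s+(?<request_time>[^\n]+)\n
  \s+Code\s?-Type\s+=\s+(?<codetype>[^\n]+)\n
  (?:\s+User-Name\s+=\s+"(?<user>[^"]+)"\n)?
  (?:\s+IP-Address\s+=\s+(?<ip>[^\n]+)\n)?
  (?:\s+Identifier\s+=\s+"(?<identifier>[^"]+)"\n)?
/x

short_record = "2018-03-25 06:25:26.880542\n" \
               "\tCode-Timestamp = Sun Mar 25 06:25:26 2018\n" \
               "\tCode-Type = Access-Reject\n"

long_record = "2018-03-25 06:25:41.452657\n" \
              "\tRequest-Timestamp = Sun Mar 25 06:25:41 2018\n" \
              "\tCode -Type = Access-Request\n" \
              "\tUser-Name = \"testuser\"\n" \
              "\tIP-Address = X.X.X.X\n" \
              "\tIdentifier = \"google.com\"\n"

short_match = RECORD.match(short_record)
long_match = RECORD.match(long_record)

# Optional groups that did not take part in the match come back as nil.
p short_match.named_captures
p long_match.named_captures
```

Both records match the same pattern; the missing fields are simply nil in the short one, so they can be dropped or defaulted afterwards.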


Masahiro

