Parse multiline with docker metadata


thomas

Jul 26, 2016, 10:22:20 AM
to Fluentd Google Group
Hi, 

I'm trying to parse some logs from Kubernetes, with Docker and Kubernetes metadata, in multiline mode (it works with type JSON, but I have problems with log lines like Python tracebacks).

I am able to parse the log messages, but when I do, my log message takes in too much and drags the Docker metadata along with it...


Log lines look like this:
2016-07-26 15:58:08,985 - control.hello[INFO] sync with cel...@beat.celery1


The fields are parsed, but the log field contains the following:

time : 2016-07-26 15:58:08,985
pre_log : control.hello
loglevel : INFO
log : sync with cel...@beat.intcelery1\n","stream":"stdout","time":"2016-07-26T13:58:08.986683043Z"}

As you can see, the stream and time metadata end up in the log message instead of being used to create those two fields.


Below is my fluentd configuration:

<source>
  @type tail
  @log_level debug
  path /var/log/containers/celery*.log
  pos_file /tmp/es-containers_celery.log.pos
  time_format %Y-%m-%d %H:%M:%S,%N
  tag kubernetes.*
  format multiline
  multiline_flush_interval 5s
  format_firstline /\[*\d{4}-\d{1,2}-\d{1,2}/
  format1 /(?<time>\[*\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})(?::| -) (?:(?<loglevel>[A-Z]+)\/(?<process>[A-Za-z0-9-]+)\] |(?<pre_log>[a-z._]*)\[(?<log_level>[A-Z]+)\] )(?<log>(?:.|\s)*)/
  read_from_head true
  keep_time_key true
</source>


<filter kubernetes.**>
  @type kubernetes_metadata
  preserve_json_log true
</filter>


<match **>
  @type elasticsearch 
  @log_level debug
...

When I use a Fluentd regex tester or regex101 I can parse the line correctly, but with the metadata at the end of the line there is no way to succeed.

I don't know if it is my conf or my regex that is wrong, maybe both :)

I tried to change the end of the regex from (?<log>(?:.|\s)*) to (?<log>.*), or to (?<log>(?:.|\s)*)\\n to match the newline (works great on regex101), but with no success in fluentd...


Mr. Fiber

Jul 26, 2016, 11:05:52 AM
to Fluentd Google Group
You said your log "2016-07-26 15:58:08,985 - control.hello[INFO] sync with cel...@beat.celery1"
but you also said parsed result is "time : 2016-07-26 15:58:08,985
pre_log control.hello
loglevel : INFO
log : sync with cel...@beat.intcelery1\n","stream":"stdout","time":"2016-07-26T13:58:08.986683043Z"}'

Where does '\n","stream":"stdout","time":"2016-07-26T13:58:08.986683043Z"}' part come from?


Masahiro

--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

thomas

Jul 27, 2016, 3:37:03 AM
to Fluentd Google Group
I guess it is added by the kubernetes filter; it is Docker/Kubernetes metadata.

Mr. Fiber

Jul 27, 2016, 7:23:58 AM
to Fluentd Google Group
Hmm... that's weird.
The kubernetes metadata filter is applied after in_tail.
Why does it affect the in_tail result?


thomas

Jul 27, 2016, 8:43:11 AM
to Fluentd Google Group
I don't know ...
In case it can help, I'm using this project: https://github.com/fabric8io/docker-fluentd-kubernetes , and I just changed <source> to be able to parse the logs I get.

thomas

Jul 27, 2016, 8:58:48 AM
to Fluentd Google Group
And the fluentd logs look like this:

2016-07-27 12:56:56 +0000 [warn]: [Fluent::TailInput] pattern not match: "{\"log\":\"/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value '16'. (this warning may be suppressed after 10 occurrences)\\n\",\"stream\":\"stderr\",\"time\":\"2016-07-27T12:56:55.615370913Z\"}"

thomas

Jul 27, 2016, 9:43:39 AM
to Fluentd Google Group
I know why the problem occurs: Docker logs are automatically written in JSON format like this:

{"log":"159.180.255.62 - - [27/Jul/2016:15:22:19 +0200] \"GET /marketplace_management HTTP/1.1\" 200 15703 \"https://int-tools.neteven.com/account_management\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36\"\n","stream":"stdout","time":"2016-07-27T13:22:19.445258442Z"}

And so I was trying to parse this line with my in_tail multiline format, which can't work because it isn't compatible with the JSON entries.

Now I have to find out how to deal with this ...
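One way around this (a sketch, not tested against this exact setup; format json and time_key are standard in_tail options) would be to let in_tail decode the Docker JSON wrapper first, so that the application line lands in the log field and any multiline handling happens downstream:

```
<source>
  @type tail
  path /var/log/containers/celery*.log
  pos_file /tmp/es-containers_celery.log.pos
  format json
  time_key time
  tag kubernetes.*
  read_from_head true
</source>
```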

Dani C

Oct 26, 2016, 8:39:29 AM
to Fluentd Google Group

Any chance you can share how you fixed this?

thomas

Oct 26, 2016, 9:00:57 AM
to Fluentd Google Group
Hi Dani,

First, I use the concat plugin to concatenate my log entries (fluent-plugin-concat):

<filter kubernetes.specific.celery.**>
  @type concat
  @log_level error
  key log
  # date format
  multiline_start_regexp /\[*\d{4}-\d{1,2}-\d{1,2}/
</filter>
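To illustrate what this filter does, here is a toy Python model of it (not the plugin itself): any record whose log field does not match multiline_start_regexp is glued onto the previous record's log.

```python
import re

# The multiline_start_regexp from the concat filter above.
START = re.compile(r"\[*\d{4}-\d{1,2}-\d{1,2}")

def concat_logs(records):
    """Toy model of fluent-plugin-concat on key 'log': a record whose
    log does not start like a dated line is appended to the previous one."""
    out = []
    for rec in records:
        if out and not START.match(rec["log"]):
            out[-1]["log"] += "\n" + rec["log"]
        else:
            out.append(dict(rec))
    return out

records = [
    {"log": "2016-07-26 15:58:08,985 - control.hello[ERROR] boom"},
    {"log": "Traceback (most recent call last):"},
    {"log": "  ValueError: bad value"},
]
merged = concat_logs(records)
print(len(merged))  # 1: the traceback lines are folded into the first record
print(merged[0]["log"])
```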


and then I use the parser plugin on it (fluent-plugin-parser):

<filter kubernetes.specific.celery.**>
  @type parser
  key_name log
  reserve_data yes
  format /\[*\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3}(?::| -) (?:(?<app_loglevel>[A-Z]+)\/(?<worker>[A-Za-z0-9.-]+)\] |(?<pre_log>[a-z-._]*)\[(?<app_loglevel>[A-Z]+)\] )(?<message>.*)/
</filter>
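For illustration, here is the same regex applied to the sample line from the start of the thread, translated to Python syntax (Python forbids the duplicate app_loglevel group name that Fluentd's Oniguruma engine accepts, so the two occurrences are renamed, and the hyphens are moved to the end of the character classes to keep them literal):

```python
import re

pattern = re.compile(
    r"\[*\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3}(?::| -) "
    r"(?:(?P<app_loglevel_a>[A-Z]+)/(?P<worker>[A-Za-z0-9.-]+)\] "
    r"|(?P<pre_log>[a-z._-]*)\[(?P<app_loglevel_b>[A-Z]+)\] )"
    r"(?P<message>.*)",
    re.DOTALL,  # let message span the newlines re-joined by the concat filter
)

sample = "2016-07-26 15:58:08,985 - control.hello[INFO] sync with workers"
m = pattern.match(sample)
print(m.group("pre_log"))         # control.hello
print(m.group("app_loglevel_b"))  # INFO
print(m.group("message"))         # sync with workers
```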

And now I can parse the multiline format from Docker / Kubernetes.

Hope it helps
Thomas