Parse multiline with docker metadata


thomas

Jul 26, 2016, 10:22:20 AM
to Fluentd Google Group
Hi, 

I'm trying to parse some logs from Kubernetes, with Docker and Kubernetes metadata, in multiline mode (it works with type JSON, but I have problems with log lines like Python tracebacks).

I am able to parse the log messages, but when I do, my log message takes in too much and drags the Docker metadata along with it...


Log lines look like this:
2016-07-26 15:58:08,985 - control.hello[INFO] sync with cel...@beat.celery1


The fields are parsed, but the log field contains the following:

time : 2016-07-26 15:58:08,985
pre_log : control.hello
loglevel : INFO
log : sync with cel...@beat.intcelery1\n","stream":"stdout","time":"2016-07-26T13:58:08.986683043Z"}

As you can see, the stream and time metadata end up in the log message instead of being used to create those two fields.


Below is my fluentd configuration:

<source>
  @type tail
  @log_level debug
  path /var/log/containers/celery*.log
  pos_file /tmp/es-containers_celery.log.pos
  time_format %Y-%m-%d %H:%M:%S,%N
  tag kubernetes.*
  format multiline
  multiline_flush_interval 5s
  format_firstline /\[*\d{4}-\d{1,2}-\d{1,2}/
  format1 /(?<time>\[*\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})(?::| -) (?:(?<loglevel>[A-Z]+)\/(?<process>[A-Za-z0-9-]+)\] |(?<pre_log>[a-z._]*)\[(?<log_level>[A-Z]+)\] )(?<log>(?:.|\s)*)/
  read_from_head true
  keep_time_key true
</source>


<filter kubernetes.**>
  @type kubernetes_metadata
  preserve_json_log true
</filter>


<match **>
  @type elasticsearch 
  @log_level debug
...

When I use a Fluentd regex tester or regex101 I can parse the line correctly, but with the metadata at the end of the line there is no way to succeed.

I don't know if it is my conf or my regex that is wrong, maybe both :)

I tried to change the end of the regex from (?<log>(?:.|\s)*) to (?<log>.*), or to (?<log>(?:.|\s)*)\\n to match the newline (works great on regex101), but with no success in fluentd...


Mr. Fiber

Jul 26, 2016, 11:05:52 AM
to Fluentd Google Group
You said your log "2016-07-26 15:58:08,985 - control.hello[INFO] sync with cel...@beat.celery1"
but you also said parsed result is "time : 2016-07-26 15:58:08,985
pre_log control.hello
loglevel : INFO
log : sync with cel...@beat.intcelery1\n","stream":"stdout","time":"2016-07-26T13:58:08.986683043Z"}'

Where does '\n","stream":"stdout","time":"2016-07-26T13:58:08.986683043Z"}' part come from?


Masahiro

--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

thomas

Jul 27, 2016, 3:37:03 AM
to Fluentd Google Group
I guess it is added by the kubernetes filter; it is Docker/Kubernetes metadata.

Mr. Fiber

Jul 27, 2016, 7:23:58 AM
to Fluentd Google Group
Hmm... that's weird.
The kubernetes metadata filter is applied after in_tail.
Why does it affect the in_tail result?


thomas

Jul 27, 2016, 8:43:11 AM
to Fluentd Google Group
I don't know ...
In case it can help, I'm using this project: https://github.com/fabric8io/docker-fluentd-kubernetes , and I just changed <source> to be able to parse the logs I get.

thomas

Jul 27, 2016, 8:58:48 AM
to Fluentd Google Group
And the fluentd logs look like this:

2016-07-27 12:56:56 +0000 [warn]: [Fluent::TailInput] pattern not match: "{\"log\":\"/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value '16'. (this warning may be suppressed after 10 occurrences)\\n\",\"stream\":\"stderr\",\"time\":\"2016-07-27T12:56:55.615370913Z\"}"

thomas

Jul 27, 2016, 9:43:39 AM
to Fluentd Google Group
I know why the problem occurs: Docker logs are automatically written in JSON format like this:

{"log":"159.180.255.62 - - [27/Jul/2016:15:22:19 +0200] \"GET /marketplace_management HTTP/1.1\" 200 15703 \"https://int-tools.neteven.com/account_management\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36\"\n","stream":"stdout","time":"2016-07-27T13:22:19.445258442Z"}

And so I was trying to parse this line with my in_tail multiline format, which can't work because it isn't compatible with the JSON entries.

Now I have to find out how to deal with this ...
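One way around this (a sketch, not tested against this exact setup; format json and time_key are standard in_tail options) would be to let in_tail decode the Docker JSON wrapper first, so that the application line lands in the log field and any multiline handling happens downstream:

```
<source>
  @type tail
  path /var/log/containers/celery*.log
  pos_file /tmp/es-containers_celery.log.pos
  format json
  time_key time
  tag kubernetes.*
  read_from_head true
</source>
```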

Dani C

Oct 26, 2016, 8:39:29 AM
to Fluentd Google Group

Any chance you can share how you fixed this?

thomas

Oct 26, 2016, 9:00:57 AM
to Fluentd Google Group
Hi Dani,

First, I use the concat plugin to concatenate my log entries (fluent-plugin-concat):

<filter kubernetes.specific.celery.**>
  @type concat
  @log_level error
  key log
  # date format
  multiline_start_regexp /\[*\d{4}-\d{1,2}-\d{1,2}/
</filter>
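To illustrate what this filter does, here is a toy Python model of it (not the plugin itself): any record whose log field does not match multiline_start_regexp is glued onto the previous record's log.

```python
import re

# The multiline_start_regexp from the concat filter above.
START = re.compile(r"\[*\d{4}-\d{1,2}-\d{1,2}")

def concat_logs(records):
    """Toy model of fluent-plugin-concat on key 'log': a record whose
    log does not start like a dated line is appended to the previous one."""
    out = []
    for rec in records:
        if out and not START.match(rec["log"]):
            out[-1]["log"] += "\n" + rec["log"]
        else:
            out.append(dict(rec))
    return out

records = [
    {"log": "2016-07-26 15:58:08,985 - control.hello[ERROR] boom"},
    {"log": "Traceback (most recent call last):"},
    {"log": "  ValueError: bad value"},
]
merged = concat_logs(records)
print(len(merged))  # 1: the traceback lines are folded into the first record
print(merged[0]["log"])
```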


and then I use the parser plugin on it (fluent-plugin-parser):

<filter kubernetes.specific.celery.**>
  @type parser
  key_name log
  reserve_data yes
  format /\[*\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3}(?::| -) (?:(?<app_loglevel>[A-Z]+)\/(?<worker>[A-Za-z0-9.-]+)\] |(?<pre_log>[a-z-._]*)\[(?<app_loglevel>[A-Z]+)\] )(?<message>.*)/
</filter>
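For illustration, here is the same regex applied to the sample line from the start of the thread, translated to Python syntax (Python forbids the duplicate app_loglevel group name that Fluentd's Oniguruma engine accepts, so the two occurrences are renamed, and the hyphens are moved to the end of the character classes to keep them literal):

```python
import re

pattern = re.compile(
    r"\[*\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3}(?::| -) "
    r"(?:(?P<app_loglevel_a>[A-Z]+)/(?P<worker>[A-Za-z0-9.-]+)\] "
    r"|(?P<pre_log>[a-z._-]*)\[(?P<app_loglevel_b>[A-Z]+)\] )"
    r"(?P<message>.*)",
    re.DOTALL,  # let message span the newlines re-joined by the concat filter
)

sample = "2016-07-26 15:58:08,985 - control.hello[INFO] sync with workers"
m = pattern.match(sample)
print(m.group("pre_log"))         # control.hello
print(m.group("app_loglevel_b"))  # INFO
print(m.group("message"))         # sync with workers
```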

And now I can parse the multiline format from Docker / Kubernetes.

Hope it helps
Thomas