Help parsing systemd logs


Amit Saha

May 13, 2019, 3:30:49 AM
to Fluent-Bit
Hi all,

I am trying to set up `fluent-bit` with the systemd input plugin. The part I am struggling with is as follows. My nginx server is configured to emit JSON-formatted logs, so the journalctl entries (exported as JSON) look like this:

```
{ "__CURSOR" : "s=52d49fb5ab724e1595924642a2ac62a9;i=2887;b=4f589b850a00441ba5009f1b9fa0ad52;m=61dfc6808;t=588bf0a8c673c;x=b5b19a9e5291639d", "__REALTIME_TIMESTAMP" : "1557728980657980", "__MONOTONIC_TIMESTAMP" : "26272884744", "_BOOT_ID" : "4f589b850a00441ba5009f1b9fa0ad52", "_MACHINE_ID" : "0accd057bf254250b441ccb48fcdb026", "PRIORITY" : "6", "SYSLOG_FACILITY" : "3", "_UID" : "0", "_GID" : "0", "_SYSTEMD_SLICE" : "system.slice", "_TRANSPORT" : "stdout", "_CAP_EFFECTIVE" : "3fffffffff", "_SELINUX_CONTEXT" : "system_u:system_r:container_runtime_t:s0", "_HOSTNAME" : "ip-192-168-12-243.ap-southeast-2.compute.internal", "_STREAM_ID" : "b2b290ee83574af0a75c1dd450a76420", "SYSLOG_IDENTIFIER" : "docker", "_PID" : "3336", "_COMM" : "docker-current", "_EXE" : "/usr/bin/docker-current", "_CMDLINE" : "/usr/bin/docker-current run --rm amitsaha/nginx", "_SYSTEMD_CGROUP" : "/system.slice/demo-nginx.service", "_SYSTEMD_UNIT" : "demo-nginx.service", "_SYSTEMD_INVOCATION_ID" : "fc1d892db3b44bb39bb8c1fc0aabe3a1", "MESSAGE" : "{\"time_local\":\"13/May/2019:06:29:40 +0000\",\"remote_addr\":\"127.0.0.1\",\"remote_user\":\"\",\"request_method\":\"GET\",\"request\":\"GET / HTTP/1.1\",\"status\": \"200\",\"body_bytes_sent\":\"12\",\"request_time\":\"0.000\",\"http_referrer\":\"\",\"http_user_agent\":\"curl/7.29.0\",\"http_x_forwarded_for\": \"10.1.1.1\"}" }
```

Given the above, `fluent-bit`'s output contains all of the fields above, with the MESSAGE field's value as a plain string.

However, I want to parse the MESSAGE field and emit only its parsed object as the top-level record from `fluent-bit`, discarding everything else. How can I do that? I looked at the parsers/filters documentation, but couldn't quite figure it out. That is, the output from `fluent-bit` should be:

```
{
"time_local":"13/May/2019:06:29:40 +0000",
"remote_addr":"127.0.0.1",
"remote_user":"",
"request_method":"GET",
"request":"GET / HTTP/1.1",
"status": "200",
"body_bytes_sent":"12",
"request_time":"0.000",
"http_referrer":"",
"http_user_agent":"curl/7.29.0",
"http_x_forwarded_for": "10.1.1.1"
}
```

Thanks,
Amit.



Eduardo Silva

May 14, 2019, 2:05:55 AM
to Amit Saha, Fluent-Bit
Hi Amit, 

Here is an example. For simplicity I am using tail with the content you provided saved to a log file; just replace it with the systemd input (or apply systemd-json through a FILTER of type parser):

-- fluent-bit.conf --

[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]
    Name    tail
    Path    test.log
    Parser  systemd-json

[FILTER]
    Name       record_modifier
    Match      *
    Remove_Key __CURSOR
    Remove_Key __REALTIME_TIMESTAMP
    Remove_Key __MONOTONIC_TIMESTAMP 
    Remove_Key _BOOT_ID
    Remove_Key _MACHINE_ID
    Remove_Key PRIORITY
    Remove_Key MESSAGE
    # add any other Remove_Key that you need

[OUTPUT]
    Name       stdout
    Match      *
    Format     json_lines

--- end of file ---

the parsers.conf file:

--- start ---
[PARSER]
    Name             systemd-json
    Format           json
    Decode_Field_As  escaped  MESSAGE  do_next
    Decode_Field     json     MESSAGE
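    # 'escaped' first undoes the journal's backslash escaping of MESSAGE,
    # do_next then hands the result to the 'json' rule, which parses the
    # payload and appends its keys to the record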

--- end of file --- 
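And if you want to keep the real systemd input instead of tail, the parenthetical above would look roughly like this sketch (the Systemd_Filter value comes from your journal entry; MESSAGE arrives from the journal as a plain string, so the escaped decoder is effectively a no-op there):

[INPUT]
    Name            systemd
    Systemd_Filter  _SYSTEMD_UNIT=demo-nginx.service

[FILTER]
    Name      parser
    Match     *
    Key_Name  MESSAGE
    Parser    systemd-json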

Let me know if that works.

regards, 




--

Eduardo Silva
Principal Engineer  | Arm
. . . . . . . . . . . . . . . . . . . . . . . . . . . 
m. +506 70138007
Arm.com
Treasuredata.com


http://twitter.com/edsiper  http://www.linkedin.com/in/edsiper 


Eduardo Silva

May 15, 2019, 3:35:04 AM
to Amit Saha, Fluent-Bit
that's a better config! 

On Wed, May 15, 2019 at 8:31 AM Amit Saha <amits...@gmail.com> wrote:
Hi Eduardo,

Thanks for the reply. I got it to work with the following (just before your reply):

[SERVICE]
    Flush        1
    Log_Level    info
    Parsers_file parsers.conf

[INPUT]
    Name            systemd
    Systemd_Filter  _SYSTEMD_UNIT="demo-nginx.service"

[OUTPUT]
    Name  es
    Match  *

[FILTER]
    Name parser
    Match *
    Key_Name MESSAGE
    Parser json

[FILTER]
    Name nest
    Match *
    Operation lift
    Nested_under MESSAGE
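
The parser filter parses the MESSAGE string as JSON, and the nest filter then lifts the resulting map to the top level. Roughly, with most journal fields omitted, a record like:

```
{"MESSAGE": {"status": "200", "request": "GET / HTTP/1.1"}}
```

comes out of the lift as:

```
{"status": "200", "request": "GET / HTTP/1.1"}
```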


Your configuration is more human-friendly, I think. Other than that, what do you think of my approach?

Cheers,
Amit.



Raul Macian

Jul 25, 2019, 7:26:20 AM
to Fluent-Bit
I am in the same boat, but in my case I am reading Kubernetes logs, so my config is:

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Merge_Log_Key       log_processed
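        # note: Merge_Log only merges when 'log' is pure JSON; my lines carry a
        # "HH:MM:SS 0|alexa  | " prefix, so the merge never fires (see below)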
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
  fluent-bit.conf: |-
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        Streams_File stream_processor.conf
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-stdout.conf
  input-kubernetes.conf: |-
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*_gvp_*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
  output-stdout.conf: |-
    [OUTPUT]
        Name   stdout
        Match  *
        Format json_lines
    [OUTPUT]
        Name        kafka
        Match       *
        Brokers     1.2.3.4:24224
        Topics      raw_logs
        Message_Key gvp_ns_key
        Tag_Key     gvp_tag
        Include_Tag_Key On
  parsers.conf: |-
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        Decode_Field   escaped    log



this is the output from the application:
08:22:03 0|alexa  | {"request_id":"983952fc1e","component":"tvopenplatform.alexa.api","timestamp":"2019-07-25T08:22:03.816Z","level":"INFO","message":{"method":"PUT","url":"/v1/check","query_string":null,"response_time":2,"status_code":200,"response_length":16}}

and this is the output I get with the above config:

{"date":1564052210.176713,"log":"10:56:50 0|alexa  | {\"request_id\":\"a100bb14-f0e2-497d-86c4-fa7a48cce773\",\"elapsed_time\":\"1ms\",\"component\":\"tvopenplatform.alexa.api\",\"timestamp\":\"2019-07-25T10:56:50.176Z\",\"level\":\"INFO\",\"message\":{\"method\":\"GET\",\"url\":\"/healthcheck\",\"query_string\":null,\"response_time\":\"1ms\",\"status_code\":200,\"response_length\":106}}\n","stream":"stdout","time":"2019-07-25T10:56:50.176713536Z","kubernetes":{"pod_name":"alexa-7-2xjsp","namespace_name":"gvp","pod_id":"178143e0-8e94-11e9-8cc8-005056824f1c","labels":{"app":"alexa","deployment":"alexa-7","deploymentconfig":"alexa"},"annotations":{"openshift.io/deployment-config.latest-version":"7","openshift.io/deployment-config.name":"alexa","openshift.io/deployment.name":"alexa-7","openshift.io/scc":"anyuid"},"host":"vepboanvllbo002.domain.com","container_name":"alexa","docker_id":"8491b8b8d28acd6a6af086f7ce138f0eb666cd015f111356e9b8ed4f5ba7fc18"}}


I need to extract fields from the 'log' field, but if you look closely it is a string with JSON after a timestamp and a key between pipes. How do I parse the record again after the docker parser and kubernetes filter have enriched it?

I have a regex that I use in td-agent to parse it for my needs outside the Kubernetes platform:

format /^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$/
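
I suppose the fluent-bit equivalent would be a regex [PARSER] applied through a second FILTER of type parser on the 'log' key, declared after the kubernetes filter. A rough sketch, untested: the parser name is made up, the hyphenated group names are switched to underscores (fluent-bit's Onigmo regex engine rejects hyphens in group names), and Reserve_Data On keeps the kubernetes metadata:

[PARSER]
    Name    gvp-app-log
    Format  regex
    Regex   ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":\{"method":"(?<method>.*?)","url":"(?<uri_stem>.*?)","query_string":(?<uri_query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*)\}\}$

[FILTER]
    Name          parser
    Match         kube.*
    Key_Name      log
    Parser        gvp-app-log
    Reserve_Data  On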

