Parsing Apache custom log with fluent-plugin-grok-parser


bmla...@bandsintown.com

Feb 24, 2016, 11:11:09 AM
to Fluentd Google Group
Hi,

We are using fluent-plugin-grok-parser to parse custom Apache access logs with td-agent. Once a log is ingested into Elasticsearch and shown in Kibana, the parsed Apache log line appears as a single string in a field named message, while hostname and log_type show up as separate fields. What is the correct way to get all the fields parsed by the grok parser into Elasticsearch/Kibana, or to reformat the records before ingestion?

Here is the info about td-agent:

Installed Packages

Name        : td-agent

Arch        : x86_64

Version     : 2.3.0

Release     : 0.el2015


This is current configuration:


<source>

  @type tail

  path /var/log/httpd/apache-access_log/<some_file>.log

  format grok

  grok_pattern %{IPORHOST:host} %{IP:clientip} (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:timestamp}\] \"%{WORD:httpmethod} %{URIPATHPARAM:request} %{WORD:httpprotocol}/%{NUMBER:httpversion}\" %{NUMBER:httpresponsecode:integer} (?:%{NUMBER:bytes:integer}|-) %{QS:referrer} %{GREEDYDATA:user_agent} %{INT:response_time:integer}

  tag raw.apachedev.apachedev

</source>


<match raw.**>

  type record_reformer

  tag elasticsearch.${tag_parts[1]}.${tag_parts[2]}

  hostname ${hostname}

  log_type ${tag_parts[1]}

</match>


<match elasticsearch.**>

  type forest

  subtype elasticsearch

  remove_prefix elasticsearch

  <template>

    logstash_format true

    buffer_type file

    buffer_path /opt/td-agent/buffer/elasticsearch.buffer.${tag_parts[0]}.${tag_parts[1]}

    buffer_queue_limit 1024

    flush_interval 10

    retry_limit 17

    retry_wait 1.0

    num_threads 1

    hosts elasticsearch:9200

    logstash_prefix __TAG_PARTS[1]__

    index_name __TAG_PARTS[1]__

    type_name ${tag_parts[0]}

  </template>

</match>

Mr. Fiber

Feb 24, 2016, 1:09:33 PM
to Fluentd Google Group
  type record_reformer

First, you can use the record_transformer filter instead of record_reformer.
Using record_transformer avoids tag rewriting in the data pipeline.
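
A sketch of the suggested change, mirroring the field assignments from the original record_reformer section. Note one caveat: a filter does not rewrite the tag, so events keep their original raw.** tag and any downstream match/filter sections must be adjusted to match it:

```
<filter raw.**>
  @type record_transformer
  <record>
    hostname ${hostname}
    log_type ${tag_parts[1]}
  </record>
</filter>
```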

parsed line of apache log file appears as one string in the field named message while hostname and log_type are separate fields.

Could you show me the result of following filter plugin?

<filter elasticsearch.**>
  @type stdout
</filter>


Masahiro


--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bratislav Mladjic

Feb 24, 2016, 2:02:37 PM
to flu...@googlegroups.com
Thank you Masahiro for the prompt reply. This is how the config looks now:

<source>

  type monitor_agent

  bind 0.0.0.0

  port 24220

</source>


#Apache access log parsing with grok

<source>

  @type tail

  path /var/log/httpd/apache-access_log/<file>.log

  format grok

  grok_pattern %{IPORHOST:clientip} %{IP:clientip} (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{GREEDYDATA:user_agent} %{INT:response_time}

  tag raw.apachedev.apachedev

</source>


#<match raw.**>

#  type record_reformer

#  tag elasticsearch.${tag_parts[1]}.${tag_parts[2]}

#  hostname ${hostname}

#  log_type ${tag_parts[1]}

#</match>


<filter raw.**>

   @type record_transformer

   <record>

    hostname ${hostname}

    log_type ${tag_parts[1]}

   </record>

</filter>


<filter elasticsearch.**>

  @type stdout

</filter>


<match elasticsearch.**>

  type forest

  subtype elasticsearch

  remove_prefix elasticsearch

  <template>

    logstash_format true

    buffer_type file

    buffer_path /opt/td-agent/buffer/elasticsearch.buffer.${tag_parts[0]}.${tag_parts[1]}

    buffer_queue_limit 1024

    flush_interval 10

    retry_limit 17

    retry_wait 1.0

    num_threads 1

    hosts elasticsearch:9200

    logstash_prefix __TAG_PARTS[1]__

    index_name __TAG_PARTS[1]__

    type_name ${tag_parts[0]}

  </template>

</match>


In the log file I am getting lines from the Apache log that should have been parsed; here is an example of one:

x.x.x.com - - - [24/Feb/2016:13:53:34 -0500] "GET /static/css/adsportal/images/ui-bg_inset-hard_100_fcfdfd_1x100.png HTTP/1.1" 304 - "http://x.x.x.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36" 249


Hm, it seems the grok pattern is not parsing correctly.
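
The mismatch can be reproduced outside Fluentd with a plain Python regex. Below, just the first two grok tokens of the pattern, %{IPORHOST:clientip} %{IP:clientip}, are hand-expanded into a simplified regex (the real grok definitions are more elaborate, and the group names are changed here because Python's re forbids duplicates; the original pattern also names two different fields clientip, which is likely unintended). The simplification is enough to show why the sample line cannot match:

```python
import re

# Simplified hand-expansion of %{IPORHOST:clientip} %{IP:clientip}:
# a host-or-IP token, a space, then a dotted-quad IP, then a space.
prefix = re.compile(r'^(?P<iporhost>\S+) (?P<ip>(?:\d{1,3}\.){3}\d{1,3}) ')

# The sample line from the log (truncated: the match already fails
# at the second field, where the log has "-" instead of an IP):
line = 'x.x.x.com - - - [24/Feb/2016:13:53:34 -0500] "GET /static/... HTTP/1.1" 304 -'

print(prefix.match(line))  # None: "-" is not an IP, so the whole pattern fails
```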


Bratislav Mladjic
DevOps
BANDSINTOWN GROUP



Mr. Fiber

Feb 24, 2016, 2:09:25 PM
to Fluentd Google Group
What is the result of the stdout filter?

Bratislav Mladjic

Feb 24, 2016, 2:47:37 PM
to flu...@googlegroups.com

Sorry, here it is:

2016-02-24 14:44:38 -0500 elasticsearch.apachedev.apachedev: {"message":"x.x.x.com - - - [24/Feb/2016:14:44:38 -0500] \"GET /static/images/favicon.ico?20160219 HTTP/1.1\" 200 1406 \"http://x.x.x.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36\" 296","hostname":"x.x.x.x.x","log_type":"apachedev"}


Bratislav Mladjic
DevOps
BANDSINTOWN GROUP


Mr. Fiber

Feb 24, 2016, 5:24:34 PM
to Fluentd Google Group
https://github.com/kiyoto/fluent-plugin-grok-parser/blob/fad32090d9bc5fd075dcd14e95048680f49c0cae/lib/fluent/plugin/parser_grok.rb#L63

From the code, the grok parser falls back to the none parser if the pattern doesn't match.
So your grok pattern doesn't match your logs.

I checked your pattern using fluentd-ui, and it also says your pattern is wrong.


There is no highlighting. If the pattern were correct, fluentd-ui would highlight the matched fields (screenshots omitted).
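
The fallback behavior the linked code implements can be sketched roughly like this (a simplification in Python, not the plugin's actual Ruby code; the prefix pattern is an illustrative stand-in, not the real grok expansion):

```python
import re

def parse_with_fallback(line, patterns):
    """Try each compiled grok regex in turn; if none matches, fall back
    to the 'none' parser, which stores the whole line under 'message'."""
    for regex in patterns:
        m = regex.match(line)
        if m:
            return m.groupdict()
    return {"message": line}  # unparsed line kept as a single string

# A toy pattern expecting "host ident user [" at the start of the line:
apache_prefix = re.compile(r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[')

print(parse_with_fallback('x.x.x.com - frank [24/Feb/2016', [apache_prefix]))
# matched: {'host': 'x.x.x.com', 'ident': '-', 'user': 'frank'}
print(parse_with_fallback('completely unparseable line', [apache_prefix]))
# fell back: {'message': 'completely unparseable line'}
```

This is why the stdout output earlier in the thread shows the whole Apache line in a single message field.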


bmla...@bandsintown.com

Feb 25, 2016, 1:03:36 PM
to Fluentd Google Group
Thanks a lot, I'll fix the parsing and check again. I did not expect the log to be imported into Elasticsearch if the parser fails to match.

Mr. Fiber

Feb 26, 2016, 4:02:44 PM
to Fluentd Google Group
If you have an idea to improve the plugin, please open an issue on the plugin repository.

On Fri, Feb 26, 2016 at 3:03 AM, <bmla...@bandsintown.com> wrote:
Thanks a lot, I'll fix the parsing and check again. I did not expect the log to be imported into Elasticsearch if the parser fails to match.

--