Hi,
Scenario
I have a Kafka topic that receives Apache logs in their original, untransformed format.
On one side, these logs are read and stored in HDFS, so that all the Apache logs are kept for a while in their original format.
On the other side, I want to transform them to JSON and upload them to Solr to be indexed and queried there.
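To make the desired transformation concrete, here is a minimal sketch in Python of turning one Apache log line into JSON. This is not Fluentd code: the regular expression, the field names and the function name apache_line_to_json are assumptions chosen for illustration, and the sketch only targets the common/combined log layout.

```python
import json
import re

# Assumed pattern for the Apache "common"/"combined" log layout.
# Field names are illustrative, not taken from any Fluentd plugin.
APACHE_COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def apache_line_to_json(line):
    """Return a JSON string for one log line, or None if it does not match."""
    match = APACHE_COMBINED.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert numeric fields; Apache writes "-" when no body was sent.
    record["code"] = int(record["code"])
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    sample = (
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
    )
    print(apache_line_to_json(sample))
```

Documents of this shape are what I would like to end up sending to Solr.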
Problem
The problem is that the Kafka plugin seems to accept only the following formats:
format <input text type (text|json|ltsv|msgpack)> :default => json
I wanted to use an "apache" format, as in_tail offers.
But this is not an option.
I then read the following in the Fluentd documentation:
To address such cases, for v0.10.46 and above, Fluentd has a pluggable system that enables the user to create their own parser formats.
How To Use
- Write a custom format plugin. See here for more information.
- From any input plugin that supports the “format” field, call the custom plugin by its name. Here is an example with in_tail.
I added the Apache parser to /etc/td-agent/plugin/:
But the Kafka plugin is not picking it up.
Looking at the source code (fluent-plugin-kafka-0.3.1/lib/fluent/plugin/in_kafka.rb), the plugin does not seem to support other formats, contrary to what the Fluentd documentation suggests.
Questions
- Am I doing something wrong or missing something? Can what I want be done easily?
- On a personal level, I don't understand the Fluentd design. Why would each plugin (kafka, in_tail, etc.) need to implement its own parser for the same type of data? Wouldn't it make more sense to define an "apache" type once, so that any plugin could read it?
Thanks a lot!
Carlos