Implementing a log event sequence with fluentd versions


Ye Deng

Jan 13, 2015, 6:08:25 PM
to flu...@googlegroups.com
Hi,

Several days ago, I followed the official guide to install fluentd (actually td-agent) on Ubuntu 14.04.
The version shown is td-agent 0.10.58.

I plan to implement a sequence number for log events that are generated from the same log file.
For example, if all log events are tailed and generated from "server1.nginx.logfile1", I may want all of those log events to have the fields below:
{
  "tag": "server1.nginx.logfile1",
  "sequence": n,
  "original_log_txt": "original txt msg in tailed file",
  "parsed_log_field1": "parsed log field1 like HTTP GET/POST method",
  "parsed_log_field2": "parsed log field2 like browser agent types such as Safari/Chrome",
  "parsed_log_field3": "... ...",
  ... ...
  "timestamp": "epoch time, or logstash format time, or whatever time format"
}
The sequence number n is a strictly increasing number used as a logical clock.
With such a precise logical clock, I can always restore the exact log event order in the log database (Elasticsearch or MongoDB). The sequence is useful because log events sometimes do not reach the log database in order, or several events happen at the same or very close timestamps. A sequence number is also efficient and precise when I build queries on the log database.

To do this job with fluentd, I searched and found some posts related to my requirements.

With td-agent 0.10.58, I may need to use this plugin: https://github.com/repeatedly/fluent-plugin-record-modifier
It seems I would even need to insert some Ruby code into the config file to implement such a sequence number?
I think this is the right direction. Any suggestions?

But I also found an official blog post saying that "v0.12 is Released".
According to the post, the new "filter" feature changes the way previous plugins (like fluent-plugin-record-reformer) work.

My questions about the versions are:
The new "filter" feature in v0.12 has no detailed documentation or samples yet. Is it sufficient to implement the log event sequence number I want?
Will v0.12 replace 0.10.58, or will they be maintained concurrently? I sometimes see mentions of the "V1 format" ( http://docs.fluentd.org/articles/config-file#v1-format ). Is this "V1 format" related to the differences between v0.12 and 0.10.58?

Many questions.
Thanks a lot in advance!


Ye





Ye Deng

Jan 13, 2015, 6:33:15 PM
to flu...@googlegroups.com
On Tuesday, January 13, 2015 at 6:08:25 PM UTC-5, Ye Deng wrote:
With td-agent 0.10.58, I may need to use this plugin: https://github.com/repeatedly/fluent-plugin-record-modifier
There is a typo above; I meant I may need to use: https://github.com/sonots/fluent-plugin-record-reformer

Lance N.

Jan 14, 2015, 8:36:54 PM
to flu...@googlegroups.com
Does the sequence number have to be continuous?  A sequence number can just be a timestamp that is altered to "skip ahead" to maintain uniqueness.
 
Suppose the timer is in milliseconds, and the system timer has a resolution of 10ms. The timestamps might be X.000, X.001, X.002, then skip to X.010 because the system timer jumped forward. 
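
For illustration, a minimal Ruby sketch of that idea (plain Ruby, not tied to any fluentd API; the class name is made up):

class SkipAheadSequence
  def initialize
    @last = 0
  end

  # Returns a strictly increasing millisecond value: use the current clock
  # reading if it has advanced, otherwise bump the previous value by one.
  def next
    now_ms = (Time.now.to_f * 1000).to_i
    @last = now_ms > @last ? now_ms : @last + 1
  end
end

seq = SkipAheadSequence.new
5.times { puts seq.next }

The values are unique and strictly increasing, but not necessarily continuous.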

Mr. Fiber

Jan 15, 2015, 5:06:34 AM
to flu...@googlegroups.com
Sorry for the delay. Gmail judged this thread as spam...

The new "filter" feature in v0.12 has no detailed documentation or samples yet. Is it sufficient to implement the log event sequence number I want?

Yes. I think the filter mechanism fits this case.
Kiyoto and I are writing the v0.12 documentation now; it will be published.

On the other hand, extending a parser is an alternative approach.
Parsers are pluggable, so implementing an xxx_with_seq parser is more efficient in this case.
Of course, the filter approach is more general.
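
For example, here is a minimal sketch of a v0.12 filter plugin that keeps a counter and stamps it on each record. The plugin name "sequence" and the seq_key parameter are placeholders, not a published plugin:

require 'fluent/filter'

module Fluent
  class SequenceFilter < Filter
    Fluent::Plugin.register_filter('sequence', self)

    config_param :seq_key, :string, :default => 'sequence'

    def configure(conf)
      super
      @seq = 0   # one counter per filter instance, assuming a single worker process
    end

    # Called once per event; returning the record keeps the event in the stream.
    def filter(tag, time, record)
      record[@seq_key] = (@seq += 1)
      record
    end
  end
end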

Will v0.12 replace 0.10.58, or will they be maintained concurrently?

We maintain both versions, but v0.10.58 is in maintenance mode.
Its main maintainer is seo-san, a.k.a. sonots on GitHub.

v0.12 uses the v1 format by default, and v0.10 can also use the v1 format by specifying --use-v1-config.
If you use td-agent 2, --use-v1-config is enabled.
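
As a rough illustration of the v1 format together with a filter section, using the sketched "sequence" filter above (paths and tags are placeholders, and the elasticsearch output requires fluent-plugin-elasticsearch):

<source>
  @type tail
  path /var/log/nginx/logfile1
  pos_file /var/log/td-agent/logfile1.pos
  tag server1.nginx.logfile1
  format nginx
</source>

<filter server1.nginx.**>
  @type sequence
  seq_key sequence
</filter>

<match server1.nginx.**>
  @type elasticsearch
  # ...
</match>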


Masahiro



Ye Deng

Jan 15, 2015, 10:47:56 AM
to flu...@googlegroups.com
@Lance
I really want a continuous sequence number.
With it, I can easily tell whether some events in the middle are missing (e.g. events that are not yet queryable on Elasticsearch). It also helps me precisely define a "range/scope of context" around a matched log event when querying my log database.


@repeatedly
So v0.12 will become the v1? Awesome.

I am a little confused by this: "Parsers are pluggable, so implementing an xxx_with_seq parser is more efficient in this case. Of course, the filter approach is more general."

Since there is no doc telling me how to implement an increasing counter with the v0.12 filter feature, I guess I may let the filter remember a variable (not in some inserted Ruby code?) that is attached to each log event, and increase that variable each time the filter sees a new log event?
I don't understand why this approach is less efficient than the "Parser"-based solution.


Ye




Mr. Fiber

Jan 16, 2015, 3:54:30 PM
to flu...@googlegroups.com
So v0.12 will become the v1? Awesome.


Here is a roadmap; v0.12 is the first step toward the v1 release.

I guess I may let the filter remember a variable (not ...)

Your implementation approach is correct.


This is a prototype document for filter development; it still needs improvement.

> I don't understand why this approach is less efficient than the "Parser"-based solution.

It means that adding the sequence in a Parser avoids an extra iteration.
With a filter, the filter plugin iterates over the events to apply its own routine.
With a parser, you can apply your routine in the record-creation phase, e.g. while parsing an apache log.
That approach is not reusable for other input sources, so it's a trade-off.
In almost all cases, the Filter approach is enough.
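
For comparison, a rough sketch of the parser variant, assuming the v0.12 parser API (a Fluent::Parser subclass registered with Fluent::Plugin.register_parser, with parse yielding time and record; exact method names can differ between releases). The plugin name, field names, and the simplified access-log regexp are only placeholders:

require 'time'
require 'fluent/parser'

module Fluent
  class AccessLogWithSeqParser < Parser
    Fluent::Plugin.register_parser('access_log_with_seq', self)

    # Simplified access-log pattern, for illustration only.
    PATTERN = /^(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<code>\d+)/
    TIME_FORMAT = '%d/%b/%Y:%H:%M:%S %z'

    def parse(text)
      m = PATTERN.match(text)
      unless m
        yield nil, nil   # unparsable line
        return
      end
      @seq = (@seq || 0) + 1
      record = {
        'original_log_txt'  => text,
        'parsed_log_field1' => m['method'],
        'code'              => m['code'],
        'sequence'          => @seq,   # stamped while the record is built
      }
      yield Time.strptime(m['time'], TIME_FORMAT).to_i, record
    end
  end
end

The sequence is added in the same pass that builds the record, so there is no second iteration, but the parser is tied to this one log format.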


Masahiro


sephy shen

Aug 29, 2017, 9:54:42 PM
to Fluentd Google Group
Hi Deng,

I have the same requirement as you.

How did you implement the requirement of outputting an incrementing value, like a line number, for each log event?

Which type of plugin did you use? Did you embed raw Ruby code in the filter section?

Thanks a lot.

BR