Implementing a log event sequence with fluentd versions


Ye Deng

Jan 13, 2015, 6:08:25 PM
to flu...@googlegroups.com
Hi,

Several days ago, I followed the official guide to install fluentd (actually td-agent) on Ubuntu 14.04.
The version shown is td-agent 0.10.58.

I plan to implement a sequence number for log events that are generated from the same log file.
For example, if all log events are tailed and generated from "server1.nginx.logfile1", I may want all of those log events to have the fields below:
{
  "tag": "server1.nginx.logfile1",
  "sequence": n,
  "original_log_txt": "original txt msg in tailed file",
  "parsed_log_field1": "parsed log field1 like HTTP GET/POST method",
  "parsed_log_field2": "parsed log field2 like browser agent types such as Safari/Chrome",
  "parsed_log_field3": "... ...",
  ... ...
  "timestamp": "epoch time, or logstash format time, or whatever time format"
}
The sequence number n is a strictly increasing number used as a logical clock.
With such a precise logical clock, I can always restore the exact log event order in the log database (Elasticsearch or MongoDB). The sequence is useful because log events sometimes do not reach the log database in order, or several events happen at the same or very close timestamps. A sequence number is also efficient and precise when I build queries on the log database.

To do this job with fluentd, I searched and found some posts related to my requirements.

With td-agent 0.10.58, I may need to use this plugin: https://github.com/repeatedly/fluent-plugin-record-modifier
It seems I would even need to insert some Ruby code into the config file to implement such a sequence number?
I think this is the right direction. Any suggestions?

But I also found an official blog post saying that "v0.12 is Released".
According to the post, the new "filter" feature changes the way previous plugins (like fluent-plugin-record-reformer) work.

My questions about the versions are:
The new "filter" feature in v0.12 has no detailed documentation or samples yet. Is it sufficient to implement the log event sequence number I want?
Will v0.12 replace 0.10.58, or will they be maintained concurrently? I sometimes see mentions of the "V1 format" ( http://docs.fluentd.org/articles/config-file#v1-format ). Is this "V1 format" related to the differences between v0.12 and 0.10.58?

Many questions.
Thanks a lot in advance!


Ye





Ye Deng

Jan 13, 2015, 6:33:15 PM
to flu...@googlegroups.com
On Tuesday, January 13, 2015 at 6:08:25 PM UTC-5, Ye Deng wrote:
With td-agent 0.10.58, I may need to use this plugin: https://github.com/repeatedly/fluent-plugin-record-modifier
There is a typo above; I meant I may need to use: https://github.com/sonots/fluent-plugin-record-reformer

Lance N.

Jan 14, 2015, 8:36:54 PM
to flu...@googlegroups.com
Does the sequence number have to be continuous?  A sequence number can just be a timestamp that is altered to "skip ahead" to maintain uniqueness.
 
Suppose the timer is in milliseconds, and the system timer has a resolution of 10ms. The timestamps might be X.000, X.001, X.002, then skip to X.010 because the system timer jumped forward. 
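
For illustration, a minimal Ruby sketch of that idea (plain Ruby, not tied to any fluentd API; the class name is made up):

class SkipAheadSequence
  def initialize
    @last = 0
  end

  # Returns a strictly increasing millisecond value: use the current clock
  # reading if it has advanced, otherwise bump the previous value by one.
  def next
    now_ms = (Time.now.to_f * 1000).to_i
    @last = now_ms > @last ? now_ms : @last + 1
  end
end

seq = SkipAheadSequence.new
5.times { puts seq.next }

The values are unique and strictly increasing, but not necessarily continuous.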

Mr. Fiber

Jan 15, 2015, 5:06:34 AM
to flu...@googlegroups.com
Sorry for the delay. Gmail judged this thread as spam...

The new "filter" feature in v0.12 has no detailed documentation or samples yet. Is it sufficient to implement the log event sequence number I want?

Yes. I think the filter mechanism fits this case.
Kiyoto and I are writing the v0.12 documentation now; it will be published.

On the other hand, extending a parser is an alternative approach.
Parsers are pluggable, so implementing an xxx_with_seq parser is more efficient in this case.
Of course, the filter approach is more general.
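
For example, here is a minimal sketch of a v0.12 filter plugin that keeps a counter and stamps it on each record. The plugin name "sequence" and the seq_key parameter are placeholders, not a published plugin:

require 'fluent/filter'

module Fluent
  class SequenceFilter < Filter
    Fluent::Plugin.register_filter('sequence', self)

    config_param :seq_key, :string, :default => 'sequence'

    def configure(conf)
      super
      @seq = 0   # one counter per filter instance, assuming a single worker process
    end

    # Called once per event; returning the record keeps the event in the stream.
    def filter(tag, time, record)
      record[@seq_key] = (@seq += 1)
      record
    end
  end
end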

Will v0.12 replace 0.10.58, or will they be maintained concurrently?

We maintain both versions, but v0.10.58 is in maintenance mode.
Its main maintainer is seo-san, a.k.a. sonots on GitHub.

v0.12 uses the v1 format by default, and v0.10 can also use the v1 format by specifying --use-v1-config.
If you use td-agent 2, --use-v1-config is enabled.
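
As a rough illustration of the v1 format together with a filter section, using the sketched "sequence" filter above (paths and tags are placeholders, and the elasticsearch output requires fluent-plugin-elasticsearch):

<source>
  @type tail
  path /var/log/nginx/logfile1
  pos_file /var/log/td-agent/logfile1.pos
  tag server1.nginx.logfile1
  format nginx
</source>

<filter server1.nginx.**>
  @type sequence
  seq_key sequence
</filter>

<match server1.nginx.**>
  @type elasticsearch
  # ...
</match>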


Masahiro



Ye Deng

Jan 15, 2015, 10:47:56 AM
to flu...@googlegroups.com
@Lance
I really want a continuous sequence number.
With it, I can easily tell whether some events in the middle are missing (e.g. events that are not yet queryable on Elasticsearch). It also helps me precisely define a "range/scope of context" around a matched log event when querying my log database.


@repeatedly
So v0.12 will become the v1? Awesome.

I am a little confused by this: "Parsers are pluggable, so implementing an xxx_with_seq parser is more efficient in this case. Of course, the filter approach is more general."

Since there is no doc telling me how to implement an increasing counter with the v0.12 filter feature, I guess I may let the filter remember a variable (not in some inserted Ruby code?) that is attached to each log event, and increase that variable each time the filter sees a new log event?
I don't understand why this approach is less efficient than the "Parser"-based solution.


Ye




Mr. Fiber

Jan 16, 2015, 3:54:30 PM
to flu...@googlegroups.com
So v0.12 will become the v1? Awesome.


Here is a roadmap; v0.12 is the first step toward the v1 release.

I guess I may let the filter remember a variable (not ...)

Your implementation approach is correct.


This is a prototype document for filter development; it still needs improvement.

> I don't understand why this approach is less efficient than the "Parser"-based solution.

It means that adding the sequence in a Parser avoids an extra iteration.
With a filter, the filter plugin iterates over the events to apply its own routine.
With a parser, you can apply your routine in the record-creation phase, e.g. while parsing an apache log.
That approach is not reusable for other input sources, so it's a trade-off.
In almost all cases, the Filter approach is enough.
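
For comparison, a rough sketch of the parser variant, assuming the v0.12 parser API (a Fluent::Parser subclass registered with Fluent::Plugin.register_parser, with parse yielding time and record; exact method names can differ between releases). The plugin name, field names, and the simplified access-log regexp are only placeholders:

require 'time'
require 'fluent/parser'

module Fluent
  class AccessLogWithSeqParser < Parser
    Fluent::Plugin.register_parser('access_log_with_seq', self)

    # Simplified access-log pattern, for illustration only.
    PATTERN = /^(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<code>\d+)/
    TIME_FORMAT = '%d/%b/%Y:%H:%M:%S %z'

    def parse(text)
      m = PATTERN.match(text)
      unless m
        yield nil, nil   # unparsable line
        return
      end
      @seq = (@seq || 0) + 1
      record = {
        'original_log_txt'  => text,
        'parsed_log_field1' => m['method'],
        'code'              => m['code'],
        'sequence'          => @seq,   # stamped while the record is built
      }
      yield Time.strptime(m['time'], TIME_FORMAT).to_i, record
    end
  end
end

The sequence is added in the same pass that builds the record, so there is no second iteration, but the parser is tied to this one log format.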


Masahiro


sephy shen

Aug 29, 2017, 9:54:42 PM
to Fluentd Google Group
Hi Deng,

I have the same requirement as you.

How did you implement the requirement of outputting an incrementing value, like a line number, for each log event?

Which type of plugin did you use? Did you embed raw Ruby code in the filter section?

Thanks a lot.

BR