File Size in prometheus_tail_monitor input plugin


Suvelee Sarpotdar

Dec 18, 2020, 3:49:04 PM
to Fluentd Google Group
We are using the Fluentd Prometheus plugin to monitor all of our Fluentd forwarders. Currently, the Prometheus monitor for in_tail reports the file position and the file inode. However, this doesn't help us figure out how far behind the forwarder is.

For example, say the forwarder was down for 30 minutes while the application kept producing logs. When the forwarder is brought back up, we would like to know how far behind it is by taking the difference between the file position and the total file size so far.
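
The lag calculation described above can be sketched in a few lines of Ruby. This is only an illustration, not code from the plugin: `tail_lag_bytes` is a hypothetical helper, and `position` stands in for the offset that in_tail has recorded for the file.

```ruby
require "tempfile"

# Hypothetical helper: bytes the forwarder still has to read for one file.
# `position` is assumed to be the offset in_tail has recorded for that file.
def tail_lag_bytes(path, position)
  size = File.size(path)
  # If the file was truncated, the recorded position can exceed the size,
  # so clamp the result at zero instead of reporting a negative lag.
  [size - position, 0].max
end

# Usage: a 1000-byte file read up to offset 400 leaves 600 bytes of lag.
Tempfile.create("app.log") do |f|
  f.write("x" * 1000)
  f.flush
  puts tail_lag_bytes(f.path, 400) # prints 600
end
```

With both values exported as gauges, the same subtraction could also be done on the Prometheus side instead of in the plugin.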

Would it be possible to add a fluentd_tail_file_size metric to the prometheus_tail_monitor plugin? I forked the repo, added this new metric, and tested it locally; it seemed to work fine. I would be happy to contribute it to the plugin.

Thanks,
Suvelee

Kentaro Hayashi

Dec 20, 2020, 7:59:10 PM
to Fluentd Google Group
Hi,

Generally speaking, if you have improved the plugin itself and it works fine with test code, sending it upstream is a good thing to do, because it keeps other people from suffering from the same issue.
I hope you can go forward.

Regards,

On Saturday, December 19, 2020 at 5:49:04 UTC+9, suvelee....@gmail.com wrote:

Suvelee Sarpotdar

Dec 21, 2020, 9:20:31 AM
to Fluentd Google Group
Hi Kentaro,

Thank you so much. I will submit a PR soon.

Regards,
Suvelee

Suvelee Sarpotdar

Jan 14, 2021, 12:19:46 PM
to Fluentd Google Group
While testing the changes locally, we found a peculiar scenario: the in_tail_prometheus monitoring plugin also reports metrics for files that the in_tail input plugin is no longer watching (rotated/deleted/moved).

Scenario:
Fluentd's in_tail is monitoring files A, B and C. The metrics endpoint returns metrics for A, B and C correctly. Now we delete file A. in_tail correctly identifies that file A has rotated, sets a 5-second timer, and eventually detaches the watcher for file A and cleans up the pos file. However, the in_tail_prometheus plugin looks at all watchers and publishes metrics for all the files, including the ones held by the detached watcher.

Possible Solution:
A solution that we can think of is to access the pos_file instance variable (@pf) and check whether the path in the watcher is still present in @pf. If the path is missing, or if it is present but its value in the pos file is UNWATCHED_POSITION (0xffffffffffffffff), then we don't add metrics for it.

However, this is a drastic change to the in_tail_prometheus plugin, and it also adds a fragile dependency on another plugin's internal variable (@pf). Please let us know your thoughts if there is a better way to implement it.
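
To make the pos-file check described above concrete, here is a rough sketch that works on the text of a pos file instead of reaching into @pf. It assumes each pos-file line has the form "path, tab, hexadecimal position, tab, hexadecimal inode"; `watched_paths` is a hypothetical helper name, not something that exists in Fluentd or the plugin.

```ruby
# Marker value mentioned above for entries that are no longer watched.
UNWATCHED_POSITION = 0xffffffffffffffff

# Hypothetical helper. Assumption: every line of the pos file looks like
# "<path>\t<position in hex>\t<inode in hex>".
# Returns a Hash of path => position for entries that are still watched.
def watched_paths(pos_file_text)
  pos_file_text.each_line.each_with_object({}) do |line, live|
    path, pos_hex, _inode_hex = line.chomp.split("\t")
    next if path.nil? || pos_hex.nil? # skip blank or malformed lines
    position = pos_hex.to_i(16)
    next if position == UNWATCHED_POSITION # entry was marked as unwatched
    live[path] = position
  end
end

# Usage: a.log has been marked unwatched, b.log is still being tailed.
sample = "/var/log/a.log\tffffffffffffffff\t0000000000000000\n" \
         "/var/log/b.log\t00000000000004d2\t000000000001e240\n"
p watched_paths(sample).keys
```

A monitor could then skip any watcher whose path is not among the returned keys, at the cost of depending on the pos-file format.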

Suvelee Sarpotdar

Jan 14, 2021, 3:54:11 PM
to Fluentd Google Group
One more thing to note here: once an entry for a file is added to the metrics, it is never cleaned up. So the number of Prometheus metrics will keep growing over time.

Suvelee Sarpotdar

Jan 15, 2021, 4:53:34 PM
to Fluentd Google Group
Hi Kentaro,

I did some more digging into the Fluentd and Prometheus client code. Here is a solution I came up with.

Fluentd has an unwatched flag in TailWatcher that is set to true when file rotation (deletion/move) is detected. It waits for 5 seconds, then sets the io_handler to nil and marks the file position as UNWATCHED. This logic can be seen in the detach_watcher method - https://github.com/fluent/fluentd/blob/677e5a0c445bb413958305c5553951e3e68e27ae/lib/fluent/plugin/in_tail.rb#L474. Unfortunately, I couldn't access the unwatched flag.

So in the plugin I implemented the following logic:
1) Check if the io_handler for the watcher is nil.
2) If yes, unregister all the metrics from the registry and initialize @metric again; otherwise we cannot add the metrics again for the other watchers. This means that even if only one file is rotated, the metrics will be unavailable for a brief moment.
3) In the next iteration of the tails loop, a watcher that is still being tracked by Fluentd will add its metrics back, and at the end of the tails.clone.each loop we will have metrics for all the watchers that are still being tracked by Fluentd.
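
The three steps above can be sketched roughly as follows. Everything here is a stand-in for illustration: `Watcher` and `TailMetrics` are hypothetical classes, and a plain Hash plays the role of the Prometheus registry, so this is not the plugin's actual code. As a small variation, the sketch detects detached watchers before the loop, so the reset and the refill happen in the same pass.

```ruby
# Stand-ins for illustration only; these are NOT the Fluentd or
# Prometheus client classes.
Watcher = Struct.new(:path, :io_handler, :position)

class TailMetrics
  attr_reader :gauges

  def initialize
    @gauges = {} # path => last reported position
  end

  # Called once per monitoring interval with the current list of watchers.
  def update(watchers)
    # Steps 1-2: a nil io_handler means the watcher was detached, so drop
    # every stored metric and start from an empty set.
    @gauges = {} if watchers.any? { |w| w.io_handler.nil? }

    # Step 3: only watchers that are still tracked add their metrics back.
    watchers.each do |w|
      next if w.io_handler.nil?
      @gauges[w.path] = w.position
    end
    @gauges
  end
end

# Usage: file A is detached between two monitoring intervals.
metrics = TailMetrics.new
metrics.update([Watcher.new("/var/log/a.log", :handler, 10),
                Watcher.new("/var/log/b.log", :handler, 20)])
metrics.update([Watcher.new("/var/log/a.log", nil, 10),
                Watcher.new("/var/log/b.log", :handler, 25)])
p metrics.gauges.keys
```

Rebuilding the whole set whenever a detached watcher is seen also keeps entries for removed files from accumulating over time.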

Please let me know if this solution sounds reasonable to you. Also, I am not very proficient with Ruby, so I couldn't figure out why the :unwatched flag is not available.
Thanks,
Suvelee