errors when monitoring _large_ numbers of files

1,661 views
Skip to first unread message

Matthew Hollick

unread,
Jun 10, 2014, 5:34:35 AM6/10/14
to flu...@googlegroups.com
Hi,

due to a configuration error our fluentd implementation attempted to follow over 1025 log files. The user that fluentd runs as has a file handle limit of 1024.
I don't expect this to ever work, but I thought that what happens when fluentd hits this scenario might be of interest or even worthy of a bug report. I am using the td-agent package.

Fluentd is being used to monitor Stingray traffic manager vserver logs. Format is set to none. For each vserver a new log file is created every hour. There is no clean up script. Fluentd was accidentally configured with a path ending in a '*' rather than a pattern to match the current hour's log file.

In a little over 7 hours 13GB of errors were written by the fluentd agent.
 
when the errors start we get the following in the log file:

2014-06-09 09:01:17 +0100 [error]: Too many open files - /zlogs/access/cc-2014-06-06-20.log
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:529:in `initialize'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:529:in `open'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:529:in `on_notify'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:321:in `on_notify'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `call'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `on_timer'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run_once'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:203:in `run'
2014-06-09 09:01:17 +0100 [error]: Too many open files - /zlogs/access/cd-443-2014-06-05-22.log
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:529:in `initialize'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:529:in `open'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:529:in `on_notify'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:321:in `on_notify'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `call'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `on_timer'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run_once'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run'
  2014-06-09 09:01:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:203:in `run'

This continues at a rate of approx 100 events (of 10 lines each) until:

2014-06-09 10:03:17 +0100 [error]: bignum too big to convert into `long'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:356:in `seek'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:356:in `on_rotate'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:542:in `call'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:542:in `on_notify'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:321:in `on_notify'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:304:in `attach'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:133:in `setup_watcher'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:147:in `block in start_watchers'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:138:in `each'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:138:in `start_watchers'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:128:in `refresh_watchers'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `call'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `on_timer'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run_once'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:203:in `run'
2014-06-09 10:03:17 +0100 [error]: bignum too big to convert into `long'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:356:in `seek'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:356:in `on_rotate'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:542:in `call'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:542:in `on_notify'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:321:in `on_notify'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:304:in `attach'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:133:in `setup_watcher'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:147:in `block in start_watchers'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:138:in `each'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:138:in `start_watchers'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:128:in `refresh_watchers'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `call'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:407:in `on_timer'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run_once'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run'
  2014-06-09 10:03:17 +0100 [error]: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb:203:in `run'

Which continued at the same rate until operational staff stopped fluentd.

As I said, I don't expect this to work, I just thought that it might be an interesting edge case for developers to consider. As such, I am not even sure that it merits a bug report.


Regards,

Matthew.

Masahiro Nakagawa

unread,
Jun 10, 2014, 6:48:33 AM6/10/14
to flu...@googlegroups.com
Could you set ulimit to large number?


And I recommend time format instead of '*' if your log file has fixed time format.





--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthew Hollick

unread,
Jun 10, 2014, 7:04:07 AM6/10/14
to flu...@googlegroups.com
Hi, thanks for the response and suggestions.

I have not changed the ulimit for the td-user. I would rather not unless I really need to. If the configuration had been correct (using the time format instead of a dumb wild card, just as you say) and I had an unlimited number of file handles available, all ~8500 files would have been monitored, even though only ~12 would have fresh data. I hope that I do not need more than 1024 file handles to monitor less than 20 files.

The truth is, in this implementation, I would not want it to work. The configuration was an error. :-(

My consideration was that the particular edge case that we hit did not appear to be cleanly handled within the tail plugin.
Maybe a graceful exit + alert on the first error would be more expected by operational staff rather than creating many GB of log file. - I don't know.. Hopefully other people are more competent than I am.

Alternatively, is there a way to disable the stack trace in the log file?


Regards,

Matthew.
You received this message because you are subscribed to a topic in the Google Groups "Fluentd Google Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/fluentd/G5e7E2b9iwg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to fluentd+u...@googlegroups.com.

Matthew Hollick

unread,
Jun 10, 2014, 7:09:11 AM6/10/14
to flu...@googlegroups.com
Just found the --suppress-repeated-stacktrace option..

Masahiro Nakagawa

unread,
Jun 11, 2014, 1:21:45 PM6/11/14
to flu...@googlegroups.com
Maybe a graceful exit + alert on the first error would be more expected by operational staff rather than creating many GB of log file. - I don't know.. Hopefully other people are more competent than I am.

You can monitor Fluentd's error logs using 'fluent' tag.
Following section describes a configuration example.


Alternatively, is there a way to disable the stack trace in the log file?

You already knew --suppress-repeated-stacktrace to reduce dumping same stacktrace.
See also following section.



Helio Andrade

unread,
Feb 15, 2015, 3:42:24 AM2/15/15
to flu...@googlegroups.com
Guys,

- I had the same problem and supress the stacktrace does not help. If filled up the file with message like the following at 43MB per minute.

   2015-02-15 08:08:43 +0000 [error]: bignum too big to convert into `long'
   2015-02-15 08:08:43 +0000 [error]: suppressed same stacktrace


- My file descriptor configuration is already at 65535:

   # ulimit -n
   65535

- The fluentd is already running with the supress parameter configured:

   # ps -ef | grep [f]luentd
   root 51995 1 0 08:14 ? 00:00:00 /usr/bin/ruby1.9.1 /usr/local/bin/fluentd -c /etc/fluent.conf --suppress-repeated-stacktrace
   root 51997 51995 0 08:14 ? 00:00:04 /usr/bin/ruby1.9.1 /usr/local/bin/fluentd -c /etc/fluent.conf --suppress-repeated-stacktrace

Regards,

Mr. Fiber

unread,
Feb 15, 2015, 8:05:48 AM2/15/15
to flu...@googlegroups.com
What the plugin do you use?
Your record seems to have bignum but the plugin can't handle such value.


Masahiro

Helio Andrade

unread,
Feb 15, 2015, 11:24:36 AM2/15/15
to flu...@googlegroups.com
Its the in_tail.

Helio Andrade

unread,
Feb 15, 2015, 11:25:42 AM2/15/15
to flu...@googlegroups.com
Guys,

 - It seems the two error messages Matthew Hollick described could happens seprated. I was able to recreate the steps for the second error without have the first.

Environment configuations
=====================

Here is my fluentd configuration file:
########################################################################################
# cat /etc/fluent.conf
<source>
  type tail
  path /var/log/nginx/*access.log
  pos_file /var/log/fluentd/nginx.access.log.pos
  tag nginx.access
  format nginx
</source>
<source>
  type tail
  path /var/log/nginx/*error.log
  pos_file /var/log/fluentd/nginx.error.log.pos
  tag nginx.error
  format /^(?<time>[^ ]+ [^ ]+) \[(?<log_level>.*)\] (?<pid>\d*).(?<tid>[^:]*): (?<message>.*)$/
</source>
<source>
  type tail
  path /var/log/upstart/*.log
  pos_file /var/log/fluentd/application.log.log.pos
  tag application.log
  format none
</source>
<source>
  type tail
  path /var/log/upstart/celery/*.log
  pos_file /var/log/fluentd/celery.application.log.log.pos
  tag celery.application.log
  format none
</source>
<source>
  type tail
  path /var/log/apps/*.log
  pos_file /var/log/fluentd/apps.log.log.pos
  tag apps.log
  format none
</source>
<source>
  type tail
  path /var/log/apps/celery/*.log
  pos_file /var/log/fluentd/celery.apps.log.log.pos
  tag celery.apps.log
  format none
</source>

<match **>
  type s3
  path <REMOVED>
  buffer_path <REMOVED>
  aws_key_id <REMOVED>
  aws_sec_key <REMOVED>
  s3_bucket <REMOVED>
  s3_region <REMOVED>
  time_slice_format %Y%m%d%H%M
</match>
########################################################################################

 - File descriptor limits:
########################################################################################
# ulimit -n
65535

# cat /etc/security/limits.d/all_limits.conf
# Limits settings for *
* - nofile 65535
########################################################################################

 - Sysctl configuration:
########################################################################################
# sysctl -a | egrep 'net.ipv4.tcp_tw_recycle|net.ipv4.tcp_tw_reuse|net.ipv4.ip_local_port_range'
error: permission denied on key 'vm.compact_memory'
error: permission denied on key 'net.ipv4.route.flush'
net.ipv4.tcp_tw_recycle = 0
net.ipv4.ip_local_port_range = 2000 65000
net.ipv4.tcp_tw_reuse = 0
error: permission denied on key 'net.ipv6.route.flush'
########################################################################################

 - Quantity of log files:
########################################################################################
# find /var/log -type f | grep -v gz | egrep -v '\.pos$' | wc -l
42
########################################################################################

 - Quantity os log files been used at every moment:
########################################################################################
# lsof | grep '/var/log' | wc -l
22
########################################################################################

 - Processes running:
########################################################################################
nginx
nrsysmond
sensu-client
uwsgi
fluentd
########################################################################################

 - Fluentd management configuration
########################################################################################
# cat /etc/init/fluentd.conf
start on runlevel [2345]
stop on starting rc RUNLEVEL=[016]

setuid root
setgid root

respawn
respawn limit 10 5

console log

exec fluentd -c /etc/fluent.conf --suppress-repeated-stacktrace
########################################################################################

 - upstart log rotation file:
########################################################################################
# cat /etc/logrotate.d/upstart
/var/log/upstart/*.log {
        daily
        missingok
        rotate 7
        compress
        notifempty
nocreate
}
########################################################################################

OBS: As you can see: 
1) I didn't use "*" in my fluentd.conf. I instead used "*.log";
2) The logrotate conf file were not modified. ( I didn't posted it because of it);
3) I just modified the max-file-descriptors and didn't make the network configurations as the doc says i only needed to do it when the it consists of many fluentd instances and i'm not sure "How Many instances the doc were talking about";
4) There are not much log files. Even less are the quantity been used in this test;
5) Not many programs running on the server;
6) The python/uwsgi service is managed by upstart so the log file goes to /var/log/upstart;
7) Fluentd is managed by the upstart;
8) Fluentd is already using the --suppress-repeated-stacktrace option;
9) The upstart log rotation file compresses the rotated files immediately; 

Objective
=======

 - See if the logrotation could trigger the error even when my server is configured right and if i was receiving the "Too many open files";

Proceedings
==========

1) Stop fluentd service:
# service fluentd stop
fluentd stop/waiting
#

2) Confirm fluentd is not running:
# ps -ef f | egrep [f]luentd
#

3) Erase actual fluentd.log file and compressed too:
# rm -f /var/log/upstart/fluentd*.log /var/log/upstart/fluentd.log.?.gz
#

4) Confirm all logfiles were deleted:
# ls /var/log/upstart/*fluentd*
ls: cannot access /var/log/upstart/*fluentd*: No such file or directory
#

 - Now we have a clean environment to see the fluentd logfile messages;

5) Start fluentd service:
# service fluentd start
fluentd start/running, process 56390
#

6) Confirm fluentd is not running:
# ps -ef f | egrep [f]luentd
root     56390     1  2 09:54 ?        Ssl    0:00 /usr/bin/ruby1.9.1 /usr/local/bin/fluentd -c /etc/fluent.conf --suppress-repeated-stacktrace
root     56392 56390  7 09:54 ?        Sl     0:00  \_ /usr/bin/ruby1.9.1 /usr/local/bin/fluentd -c /etc/fluent.conf --suppress-repeated-stacktrace
#

7) Confirming the fluentd log file were created:
ls -l /var/log/upstart/*fluentd*
-rw-r----- 1 root root 2992 Feb 15 09:54 /var/log/upstart/fluentd.log
root@api1n:/#

8) Seeing the output of the log file:
################################################################
# cat /var/log/upstart/fluentd.log
2015-02-15 09:54:47 +0000 [info]: reading config file path="/etc/fluent.conf"
2015-02-15 09:54:47 +0000 [info]: starting fluentd-0.12.4
2015-02-15 09:54:47 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2015-02-15 09:54:47 +0000 [info]: gem 'fluent-plugin-s3' version '0.5.1'
2015-02-15 09:54:47 +0000 [info]: gem 'fluentd' version '0.12.4'
2015-02-15 09:54:47 +0000 [info]: using configuration file: <ROOT>
  <source>
    type tail
    path /var/log/nginx/*access.log
    pos_file /var/log/fluentd/nginx.access.log.pos
    tag nginx.access
    format nginx
  </source>
  <source>
    type tail
    path /var/log/nginx/*error.log
    pos_file /var/log/fluentd/nginx.error.log.pos
    tag nginx.error
    format /^(?<time>[^ ]+ [^ ]+) \[(?<log_level>.*)\] (?<pid>\d*).(?<tid>[^:]*): (?<message>.*)$/
  </source>
  <source>
    type tail
    path /var/log/upstart/*.log
    pos_file /var/log/fluentd/application.log.log.pos
    tag application.log
    format none
  </source>
  <source>
    type tail
    path /var/log/upstart/celery/*.log
    pos_file /var/log/fluentd/celery.application.log.log.pos
    tag celery.application.log
    format none
  </source>
  <source>
    type tail
    path /var/log/apps/*.log
    pos_file /var/log/fluentd/apps.log.log.pos
    tag apps.log
    format none
  </source>
  <source>
    type tail
    path /var/log/apps/celery/*.log
    pos_file /var/log/fluentd/celery.apps.log.log.pos
    tag celery.apps.log
    format none
  </source>
  <match **>
    type s3
    path <REMOVED>
    buffer_path<REMOVED>
    aws_key_id <REMOVED>
    aws_sec_key <REMOVED>
    s3_bucket <REMOVED>
    s3_region <REMOVED>
    time_slice_format %Y%m%d%H%M
  </match>
</ROOT>
2015-02-15 09:54:47 +0000 [info]: adding match pattern="**" type="s3"
2015-02-15 09:54:47 +0000 [info]: adding source type="tail"
2015-02-15 09:54:47 +0000 [info]: adding source type="tail"
2015-02-15 09:54:47 +0000 [info]: adding source type="tail"
2015-02-15 09:54:47 +0000 [info]: adding source type="tail"
2015-02-15 09:54:47 +0000 [info]: adding source type="tail"
2015-02-15 09:54:47 +0000 [info]: adding source type="tail"
2015-02-15 09:54:48 +0000 [info]: following tail of /var/log/nginx/access.log
2015-02-15 09:54:48 +0000 [info]: following tail of /var/log/nginx/stub-status_access.log
2015-02-15 09:54:48 +0000 [info]: following tail of /var/log/nginx/error.log
2015-02-15 09:54:48 +0000 [info]: following tail of /var/log/upstart/fluentd.log
2015-02-15 09:54:48 +0000 [info]: following tail of /var/log/upstart/<APPLICATION_LOG_NAME_REMOVED> 
#
################################################################

 - I removed sensitive information from the file above

9) Force the rotation of the upstart log files:
# logrotate -f /etc/logrotate.d/upstart
#

10) Verifying the old logfile for modifications:
#zcat /var/log/upstart/fluentd.log.1.gz | tail -2
2015-02-15 09:54:48 +0000 [info]: following tail of /var/log/upstart/<APPLICATION_LOG_NAME_REMOVED>
2015-02-15 10:00:58 +0000 [info]: detected rotation of /var/log/upstart/fluentd.log; waiting 5 seconds
#

11) Verifying the new log file created:
#cat /var/log/upstart/fluentd.log
2015-02-15 10:00:58 +0000 [info]: detected rotation of /var/log/upstart/<APPLICATION_LOG_NAME_REMOVED>.log; waiting 5 seconds
2015-02-15 10:00:58 +0000 [info]: following tail of /var/log/upstart/fluentd.log
#

12) Verifying if we have any log file deleted still been handled:
#lsof | grep delete
init          1          root    8w      REG              202,1       64     271599 /var/log/upstart/dbus.log.1 (deleted)
init          1          root   12w      REG              202,1       95     271607 /var/log/upstart/acpid.log.1 (deleted)
init          1          root   13w      REG              202,1      840     269572 /var/log/upstart/<APPLICATION_LOG_NAME_REMOVED>.log.1 (deleted)
#

13) Force a second rotation of the upstart log files:
# logrotate -f /etc/logrotate.d/upstart
#

14) Verifying the old logfile for modifications:
# zcat fluentd.log.1.gz
2015-02-15 10:00:58 +0000 [info]: detected rotation of /var/log/upstart/titans-learning-center-web.log; waiting 5 seconds
2015-02-15 10:00:58 +0000 [info]: following tail of /var/log/upstart/fluentd.log
2015-02-15 10:04:50 +0000 [warn]: pattern not match: "10.0.0.5 - - [15/Feb/2015:10:04:50 +0000] \"GET / HTTP/1.0\" 302 239 \"-\" \"check_http/v1.4.15 (nagios-plugins 1.4.15)\" \"201.20.34.11\""
2015-02-15 10:12:14 +0000 [info]: detected rotation of /var/log/upstart/fluentd.log; waiting 5 seconds
#

11) Verifying the new log file created for a few minutes:
# tail -f fluentd.log
2015-02-15 10:24:50 +0000 [warn]: pattern not match: "10.0.0.5 - - [15/Feb/2015:10:24:50 +0000] \"GET / HTTP/1.0\" 302 239 \"-\" \"check_http/v1.4.15 (nagios-plugins 1.4.15)\" \"201.20.34.11\""
2015-02-15 10:25:48 +0000 [info]: following tail of /var/log/upstart/fluentd.log
2015-02-15 10:32:48 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:375:in `seek'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:375:in `on_rotate'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:564:in `call'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:564:in `on_notify'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:340:in `on_notify'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:323:in `attach'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:132:in `setup_watcher'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:150:in `block in start_watchers'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:137:in `each'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:137:in `start_watchers'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:127:in `refresh_watchers'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:427:in `call'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:427:in `on_timer'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/cool.io-1.3.0/lib/cool.io/loop.rb:88:in `run_once'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/cool.io-1.3.0/lib/cool.io/loop.rb:88:in `run'
  2015-02-15 10:32:48 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:211:in `run'
2015-02-15 10:32:49 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:375:in `seek'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:375:in `on_rotate'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:564:in `call'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:564:in `on_notify'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:340:in `on_notify'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:427:in `call'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:427:in `on_timer'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/cool.io-1.3.0/lib/cool.io/loop.rb:88:in `run_once'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/cool.io-1.3.0/lib/cool.io/loop.rb:88:in `run'
  2015-02-15 10:32:49 +0000 [error]: /var/lib/gems/1.9.1/gems/fluentd-0.12.4/lib/fluent/plugin/in_tail.rb:211:in `run'
2015-02-15 10:32:50 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:50 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:51 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:51 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:52 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:52 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:53 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:53 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:54 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:54 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:55 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:55 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:56 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:56 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:57 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:57 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:58 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:58 +0000 [error]: suppressed same stacktrace
2015-02-15 10:32:59 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:32:59 +0000 [error]: suppressed same stacktrace
2015-02-15 10:33:00 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:33:00 +0000 [error]: suppressed same stacktrace
2015-02-15 10:33:01 +0000 [error]: bignum too big to convert into `long'
  2015-02-15 10:33:01 +0000 [error]: suppressed same stacktrace

Errors
#####

1) I think I've made a mistake on the fluentd.conf file because i'm tailing all files in the /var/log/upstart/ directory and the fluentd's log files are there as the upstart says.

Questions
########

1) Could this be the source of the error?
2) Could this error still happens if i remove the fluentd.log from the fluentd configuration?

...

Mr. Fiber

unread,
Feb 15, 2015, 1:33:33 PM2/15/15
to flu...@googlegroups.com
Ah, I see.
Does this error happen with only fluentd.log?
I assume your machine is 32bit and fluentd.log size is larger than the max of 32bit.
If so, removing fluentd.log from configuration resolves the problem.


Masahiro

--

Lance N.

unread,
Feb 15, 2015, 8:02:27 PM2/15/15
to flu...@googlegroups.com
Tailing the "live" fluentd log is guaranteed to create recursive self-perpetuating error storms.
I plan to logrotate it every hour (without gzip) and tail the archived versions.

joao.m...@titansgroup.com.br

unread,
Feb 20, 2015, 8:31:29 AM2/20/15
to flu...@googlegroups.com

The Helio's machine is a Linux Server with kernel 3.2.0-44-virtual #69-Ubuntu SMP Thu May 16 17:54:14 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tks
João
Reply all
Reply to author
Forward
0 new messages