Yamano,
Unfortunately I haven't made any progress. It took me a little while but I did implement Masahiro's suggestion of adding sigdump support to my plugin. Since then I haven't experenced a "total" hang but I have had one instance of data for one output stopping completely. I had to restart immediately to remedy that and wasn't able to capture a stack trace. However, I am currently alarming because that same instance of the output plugin hasn't sent data in approximately 40 minutes, despite the fact that I am continuing to receive incoming data. It seems likely that I'm in a "bad" state and the only option I have to address the problem is restarting FluentD. There are two Ruby processes--one the parent of the other. I sent a CONT signal to each and the results are attached. PID 26114 is the child process, presumably responsible for actual processing of data given the much larger stack trace. The two stack traces are attached. To be honest, I don't really know how to interpret these. Could you tell me what function/method led to conclude that the problem was Ruby's native code in your case? I see "block in connect" from the HTTP client and I also see that in my case, though followed by a call to "connect". I also see a number of sleeping threads in FluentD's output plugin code ("output.rb"). I'd love some help on this...
The stack traces are attached but also here inline.
PARENT PROCESS (26111):
Sigdump at 2015-08-23 01:33:09 +0000 process 26111 (/usr/sbin/td-agent)
Thread #<Thread:0x007fb69b19e7a0> status=run priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:38:in `backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:38:in `dump_backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:24:in `block in dump_all_thread_backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:23:in `each'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:23:in `dump_all_thread_backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:16:in `block in dump'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:107:in `open'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:107:in `_open_dump_path'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:14:in `dump'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:7:in `block in setup'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:252:in `call'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:252:in `waitpid'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:252:in `supervise'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:134:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/command/fluentd.rb:167:in `<top (required)>'
/opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:73:in `require'
/opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:73:in `require'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/bin/fluentd:6:in `<top (required)>'
/opt/td-agent/embedded/bin/fluentd:23:in `load'
/opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
/usr/sbin/td-agent:7:in `load'
/usr/sbin/td-agent:7:in `<main>'
Built-in objects:
59,100: TOTAL
29,039: T_STRING
12,244: T_NODE
7,368: T_ARRAY
4,731: T_DATA
1,868: T_CLASS
1,558: T_OBJECT
850: FREE
610: T_REGEXP
426: T_HASH
147: T_ICLASS
126: T_MODULE
59: T_RATIONAL
38: T_STRUCT
19: T_FILE
9: T_FLOAT
4: T_BIGNUM
3: T_MATCH
1: T_COMPLEX
All objects:
21,868: String
3,753: RubyVM::InstructionSequence
3,740: Array
831: Class
586: Gem::Requirement
465: Regexp
386: Gem::Dependency
352: Hash
204: Gem::Version
173: Proc
126: Module
115: RubyVM::Env
100: Encoding
99: Gem::Specification
99: Gem::StubSpecification
99: Time
91: Gem::StubSpecification::StubLine
59: Rational
38: Range
21: Fluent::Config::ConfigureProxy
16: MatchData
14: Fluent::Config::Element
13: OptionParser::Switch::RequiredArgument
13: Mutex
11: OptionParser::Switch::NoArgument
9: Float
6: OptionParser::OptionMap
5: IO
4: File
4: Fluent::Config::V1Parser
4: StringScanner
4: Bignum
3: Object
3: OptionParser::List
3: Fluent::Registry
3: IRB::Notifier::LeveledNotifier
3: JSON::Ext::Generator::State
2: OptionParser::Switch::OptionalArgument
2: UnboundMethod
2: OptionParser::Switch::PlacedArgument
2: ThreadSafe::Cache
2: Binding
1: Fluent::Supervisor
1: Fluent::Supervisor::LoggerInitializer
1: OptionParser::CompletingHash
1: Fluent::Log
1: Fluent::StatusClass
1: Thread::Backtrace
1: IRB::Notifier::NoMsgNotifier
1: IRB::StdioOutputMethod
1: IRB::Notifier::CompositeNotifier
1: Resolv
1: Resolv::Hosts
1: OptionParser
1: URI::Generic
1: Resolv::DNS
1: URI::Parser
1: Monitor
1: Gem::PathSupport
1: Fluent::EngineClass
1: Gem::Platform
1: Resolv::DNS::Config
1: OpenSSL::X509::Store
1: Fluent::PluginClass
1: Complex
1: ThreadGroup
1: IOError
1: Thread
1: RubyVM
1: NoMemoryError
1: SystemStackError
1: Random
1: ARGF.class
1: Data
1: fatal
String 437,281 bytes
Array 2 elements
Hash 0 pairs
CHILD PROCESS (26114):
Sigdump at 2015-08-23 01:33:11 +0000 process 26114 (/usr/sbin/td-agent)
Thread #<Thread:0x007fb69b19e7a0> status=run priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:38:in `backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:38:in `dump_backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:24:in `block in dump_all_thread_backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:23:in `each'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:23:in `dump_all_thread_backtrace'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:16:in `block in dump'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:107:in `open'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:107:in `_open_dump_path'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:14:in `dump'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.2/lib/sigdump.rb:7:in `block in setup'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `call'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run_once'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/engine.rb:163:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:481:in `run_engine'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:140:in `block in start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:266:in `call'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:266:in `main_process'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:241:in `block in supervise'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:240:in `fork'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:240:in `supervise'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/supervisor.rb:134:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/command/fluentd.rb:167:in `<top (required)>'
/opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:73:in `require'
/opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:73:in `require'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/bin/fluentd:6:in `<top (required)>'
/opt/td-agent/embedded/bin/fluentd:23:in `load'
/opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
/usr/sbin/td-agent:7:in `load'
/usr/sbin/td-agent:7:in `<main>'
Thread #<Thread:0x007fb68f56a5e8> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `sleep'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `wait'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `cond_wait'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:149:in `run'
Thread #<Thread:0x007fb68f48b190> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:920:in `connect'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:920:in `block in connect'
/opt/td-agent/embedded/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:920:in `connect'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:858:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:965:in `reset'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:628:in `connection_for'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
/etc/td-agent/plugin/test_http_buf.rb:88:in `block in write'
/etc/td-agent/plugin/test_http_buf.rb:87:in `upto'
/etc/td-agent/plugin/test_http_buf.rb:87:in `write'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in `write_chunk'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:in `pop'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in `try_flush'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:in `run'
Thread #<Thread:0x007fb68f489e80> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:920:in `connect'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:920:in `block in connect'
/opt/td-agent/embedded/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:920:in `connect'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/opt/td-agent/embedded/lib/ruby/2.1.0/net/http.rb:858:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:965:in `reset'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:628:in `connection_for'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
/etc/td-agent/plugin/test_http_buf.rb:88:in `block in write'
/etc/td-agent/plugin/test_http_buf.rb:87:in `upto'
/etc/td-agent/plugin/test_http_buf.rb:87:in `write'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in `write_chunk'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:in `pop'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in `try_flush'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:in `run'
Thread #<Thread:0x007fb68f488b70> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `sleep'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `wait'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `cond_wait'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:149:in `run'
Thread #<Thread:0x007fb68f486910> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `sleep'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `wait'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:165:in `cond_wait'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:149:in `run'
Thread #<Thread:0x007fb68f4859e8> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run_once'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/plugin/socket_util.rb:126:in `run'
Thread #<Thread:0x007fb68f484e58> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/2.1.0/drb/drb.rb:959:in `accept'
/opt/td-agent/embedded/lib/ruby/2.1.0/drb/drb.rb:959:in `accept'
/opt/td-agent/embedded/lib/ruby/2.1.0/drb/drb.rb:1614:in `main_loop'
/opt/td-agent/embedded/lib/ruby/2.1.0/drb/drb.rb:1464:in `block in run'
Thread #<Thread:0x007fb68f536d60> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/2.1.0/webrick/server.rb:170:in `select'
/opt/td-agent/embedded/lib/ruby/2.1.0/webrick/server.rb:170:in `block in start'
/opt/td-agent/embedded/lib/ruby/2.1.0/webrick/server.rb:32:in `start'
/opt/td-agent/embedded/lib/ruby/2.1.0/webrick/server.rb:160:in `start'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/plugin/in_monitor_agent.rb:215:in `block in start'
Thread #<Thread:0x007fb68f5368d8> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run_once'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/plugin/socket_util.rb:126:in `run'
Thread #<Thread:0x007fb68f536518> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run_once'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/plugin/socket_util.rb:126:in `run'
Thread #<Thread:0x007fb68f536158> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run_once'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/plugin/socket_util.rb:126:in `run'
Thread #<Thread:0x007fb68f535d20> status=sleep priority=0
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run_once'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.3.0/lib/
cool.io/loop.rb:88:in `run'
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/plugin/socket_util.rb:126:in `run'
Built-in objects:
191,173: TOTAL
103,613: FREE
42,287: T_STRING
15,087: T_DATA
12,949: T_ARRAY
6,445: T_NODE
3,170: T_OBJECT
2,632: T_CLASS
2,321: T_HASH
1,384: T_STRUCT
646: T_REGEXP
173: T_ICLASS
157: T_MODULE
121: T_MATCH
100: T_FILE
59: T_RATIONAL
19: T_BIGNUM
9: T_FLOAT
1: T_COMPLEX
Thank you!
Damon