Hi Kevin,
I'm worried about the number of httpfs server threads.
If your httpfs server (tomcat) comes from CDH4, tomcat runs with its default configuration,
which provides only a few threads. Many parallel requests to the httpfs server may fail.
The buffer overflow in ruby is very curious. It might be a ruby bug, but we may be able to avoid the problem
with a configuration change.
Here is what I think you can try, to avoid the current trouble:
* check httpfs server logs and status
Can you operate on hdfs through the httpfs server while fluentd is in trouble?
If you cannot, the httpfs server itself may be in trouble. Check its logs.
* use webhdfs protocol
In our environment, the httpfs server was the traffic bottleneck, and a great many I/O failures occurred.
For heavy traffic, the webhdfs protocol is better than httpfs.
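As a rough sketch of what that change looks like in fluent-plugin-webhdfs, assuming a NameNode at
namenode.example.com (the hostname, tag pattern and path below are placeholders, not taken from your setup):

    <match hdfs.**>
      type webhdfs
      # hypothetical NameNode host; 50070 is the usual webhdfs port
      host namenode.example.com
      port 50070
      # hypothetical path; time placeholders are expanded by the plugin
      path /log/access/access.log.%Y%m%d_%H.log
    </match>

To go through an httpfs server instead, you would point host/port at it (14000 by default) and add
"httpfs true". Note that webhdfs has to be enabled on the cluster (dfs.webhdfs.enabled), and the fluentd
host needs to reach the DataNodes as well as the NameNode.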
* use fluent-plugin-forest (to reduce number of <match> directives in configuration)
> If we limit the number of match definitions, we are able to run fine,
> but I don't know exactly where the breaking point is.
How many <match> directives are in your configuration?
As far as I know, configurations with hundreds of <match> directives may never have been tested.
To reduce Fluentd's match overhead, you can use fluent-plugin-forest.
fluent-plugin-forest uses its internal match mechanism instead of fluentd's.
    <match tag.x.*>
      type forest
      subtype webhdfs
      <template>
        # webhdfs settings
      </template>
    </match>
    <match tag.y.x*>
      type forest
      subtype webhdfs
      <template>
        # ....
      </template>
    </match>
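
To make the template part more concrete, here is a minimal sketch. The host, tag pattern and path are
only assumptions for illustration. forest creates one webhdfs output per tag it actually receives,
and replaces ${tag} in the template with that tag:

    <match tag.**>
      type forest
      subtype webhdfs
      <template>
        # hypothetical NameNode host and port
        host namenode.example.com
        port 50070
        # ${tag} is filled in by forest, so each tag gets its own file
        path /log/${tag}/access.%Y%m%d_%H.log
      </template>
    </match>

This way one <match> directive can stand in for many per-tag definitions.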
> My other crazy idea is to fork the webhdfs plugin to make a single connection to HDFS but write to multiple files.
I'm the developer of fluent-plugin-webhdfs.
If you seriously need such a feature, I would welcome patches rather than forks.
tagomoris