Hi Kellan,
NXLog is a great, fast tool, and I've successfully married it with Fluentd. My architecture is as follows: nxlog agents tail the Apache access logs on the web cluster machines, and all of those machines send their access logs to one place, a log collector running another nxlog instance, Fluentd, Elasticsearch, nginx and Kibana.
Here is the critical part of /etc/nxlog/nxlog.conf on the Apache hosts (Ubuntu 12.04):
<Input access_log>
Module im_file
File "/var/log/apache2/mywebvhohst.access.log"
Exec if $raw_event =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] \"(\S+) (.+) HTTP.\d\.\d\" (\d+) (\d+) \"([^\"]+)\" \"([^\"]+)\"/\
{ \
$Hostname = $1; \
if $3 != '-' $AccountName = $3; \
$EventTime = parsedate($4); \
$HTTPMethod = $5; \
$HTTPURL = $6; \
$HTTPResponseStatus = $7; \
$FileSize = $8; \
$HTTPReferer = $9; \
$HTTPUserAgent = $10; \
}
</Input>
<Output out_udp>
Module om_udp
Port 514
Host 10.1.1.9
</Output>
########################################
# Routes #
########################################
<Route apache>
Path access_log => out_udp
</Route>
Now here is the nxlog config on the log collector machine (10.1.1.9):
# The buffer is needed so we do NOT lose events when fluentd restarts
<Processor buffer_udp>
Module pm_buffer
# 1 MB buffer (MaxSize is given in kilobytes)
MaxSize 1024
Type Mem
# warn at 512 KB
WarnLimit 512
</Processor>
<Input in1_udp>
Module im_udp
Host 0.0.0.0
Port 514
Exec if $raw_event =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] \"(\S+) (.+) HTTP.\d\.\d\" (\d+) (\d+) \"([^\"]+)\" \"([^\"]+)\"/\
{ \
$Hostname = $1; \
if $3 != '-' $AccountName = $3; \
$EventTime = parsedate($4); \
$HTTPMethod = $5; \
$HTTPURL = $6; \
$HTTPResponseStatus = $7; \
$FileSize = $8; \
$HTTPReferer = $9; \
$HTTPUserAgent = $10; \
} else drop();
# sometimes there are malformed lines (not too many though), so we drop them
</Input>
# it looks like there is no reliable way to feed events directly to fluentd, so we write all our apache access logs to a common file which is then tailed by fluentd
<Output fileout1>
Module om_file
File "/var/log/apache/apache-access.log"
# we convert it to JSON on the fly (to_json() comes from the xm_json extension, which must be loaded) - nxlog is great for this and very fast - pure C :)
Exec $raw_event = to_json();
</Output>
########################################
# Routes #
########################################
<Route routeout>
Path in1_udp => buffer_udp => fileout1
</Route>
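For anyone who wants to check the regex outside nxlog, here is a rough Python sketch of what the collector's Exec rule and to_json() step do together. The function names parse_access_line and to_json_line are made up for this illustration, and the timestamp format is an assumption; the real to_json() output from nxlog contains extra fields and formats EventTime differently, so treat this only as a picture of the shape of the data.

```python
import json
import re
from datetime import datetime

# Same pattern as the nxlog Exec rule, written as a Python regex.
ACCESS_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (.+) HTTP.\d\.\d" '
    r'(\d+) (\d+) "([^"]+)" "([^"]+)"'
)


def parse_access_line(line):
    """Return a dict of fields, or None for a line that does not match
    (the case the nxlog config handles with drop())."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    event = {
        "Hostname": m.group(1),
        # Assumption: ISO 8601 output; nxlog's parsedate() has its own format.
        "EventTime": datetime.strptime(
            m.group(4), "%d/%b/%Y:%H:%M:%S %z"
        ).isoformat(),
        "HTTPMethod": m.group(5),
        "HTTPURL": m.group(6),
        "HTTPResponseStatus": m.group(7),
        "FileSize": m.group(8),
        "HTTPReferer": m.group(9),
        "HTTPUserAgent": m.group(10),
    }
    # Mirror "if $3 != '-' $AccountName = $3;"
    if m.group(3) != "-":
        event["AccountName"] = m.group(3)
    return event


def to_json_line(line):
    """Hypothetical helper: one JSON document per matching log line."""
    event = parse_access_line(line)
    return None if event is None else json.dumps(event)
```

One thing this makes easy to see: the size field is matched with (\d+), so a line where Apache logs "-" for the size (no bytes sent) does not match and ends up dropped. That may or may not be where some of the malformed lines come from.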
Fluentd on the log collector (Ubuntu again), /etc/td-agent/td-agent.conf:
<source>
# here I wish the Fluentd developers would create another plugin like this one, but UDP/TCP enabled
type tail
path /var/log/apache/apache-access.log
pos_file /var/log/td-agent/httpd-access.log.pos
# our logs are already in JSON, thanks to nxlog, so this makes Fluentd's job easier
format json
tag apache.access
</source>
# and finally we feed it to Elasticsearch
<match apache.**>
type elasticsearch
logstash_format true
logstash_prefix adcenter
flush_interval 10s # for testing
</match>
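When you later point Kibana at Elasticsearch it helps to know what the indices will be called. As far as I know, with logstash_format true the plugin writes to daily indices named after the prefix and the date, in UTC by default. A tiny Python sketch of that naming, where logstash_index is a name invented here and not part of any plugin:

```python
from datetime import datetime, timezone


def logstash_index(prefix, event_time):
    """Daily index name in the logstash style: <prefix>-YYYY.MM.DD.

    Assumption: the date is taken in UTC. event_time should be a
    timezone-aware datetime.
    """
    return f"{prefix}-{event_time.astimezone(timezone.utc):%Y.%m.%d}"
```

With the prefix from the config above, an event from 10 October 2013 would land in an index named adcenter-2013.10.10, so an index pattern along the lines of adcenter-* should cover all of them.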
I had a nightmare getting Logstash and its agents to work properly; on Ubuntu it worked horribly, which is why I switched to nxlog + Fluentd. As for configuring Elasticsearch and Kibana, there are plenty of documents, especially for Logstash + Elasticsearch + Kibana: you throw the Logstash part away and do the rest :)
Fluentd seems to work fine. The only thing I wish for is an input plugin similar to 'in_tail', with the same functionality, parameters and supported formats, but TCP/UDP enabled, so I wouldn't need to create and tail that /var/log/apache/apache-access.log file. I think this would speed things up by saving on disk I/O. Unfortunately I'm not a strong enough programmer to write such a plugin myself.
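To make the idea a bit more concrete, here is a very rough Python sketch of what such a network input would do with each packet. This is not a Fluentd plugin (those are written in Ruby) and it is not part of my setup; it assumes the sender would put exactly one JSON document into each UDP datagram, and the names decode_datagram and serve_udp are invented for the illustration.

```python
import json
import socket


def decode_datagram(data):
    """Turn one datagram into an event dict, or None if it is not a
    valid JSON object (the network equivalent of a malformed line)."""
    try:
        event = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    return event if isinstance(event, dict) else None


def serve_udp(handle, host="0.0.0.0", port=5140, bufsize=65535):
    """Receive datagrams forever and pass each decoded event to handle().

    Assumption: host and port are placeholders; pick whatever suits you.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    try:
        while True:
            data, _addr = sock.recvfrom(bufsize)
            event = decode_datagram(data)
            if event is not None:
                handle(event)
    finally:
        sock.close()
```

Keep in mind that UDP gives no delivery guarantee, so an input like this trades the safety of the file plus pos_file approach for less disk I/O.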
Hope this helps.