I am trying to index some syslog data with the Flume morphline sink, and I keep running into this error:
11 Mar 2015 15:58:39,086 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process:130) - Morphline /etc/flume-ng/solragent/conf/morphline.conf@morphline1 failed to process record: {Facility=[10], Severity=[6], _attachment_body=[[B@344042d9], host=[topbdii03], priority=[86], producer=[syslog], timestamp=[1426085909000]}
My morphline.conf is below. I have tried three different grok message expressions (the two commented-out ones and the active one), but I still get the same error.
morphlines : [
  {
    # identification name for morphline.conf
    id : morphline1

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    #importCommands : ["com.cloudera.**", "org.apache.solr.**"]
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    # Commands that modify the stream file so we can index directly to solr
    commands : [
      {
        # Parse input attachment and emit a record for each input line
        readLine {
          charset : UTF-8
        }
      }

      {
        grok {
          # Consume the output record of the previous command and pipe another
          # record downstream.
          #
          # A grok-dictionary is a config file that contains prefabricated
          # regular expressions that can be referred to by name. grok patterns
          # specify such a regex name, plus an optional output field name.
          # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
          # The input line is expected in the "message" input field.
          # dictionaryFiles : [src/test/resources/grok-dictionaries]
          dictionaryFiles : [ "/usr/share/doc/search-1.0.0+cdh5.2.0+0/examples/solr-nrt/grok-dictionaries" ]
          expressions : {
            # message : """%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
            message : """ %{POSINT:facility} %{POSINT:severity} %{DATA:body_attachment} %{SYSLOGHOST:host} %{POSINT:priority} %{DATA:producer} %{SYSLOGTIMESTAMP:timestamp} %{GREEDYDATA:msg}"""
            # message : """<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"""
          }
        }
      }

      # Consume the output record of the previous command, convert
      # the timestamp, and pipe another record downstream.
      #
      # convert timestamp field to native Solr timestamp format
      # e.g. 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
      {
        convertTimestamp {
          field : timestamp
          inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "MMM d HH:mm:ss"]
          inputTimezone : Europe/Zurich
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : CET
        }
      }

      # Consume the output record of the previous command, transform it
      # and pipe the record downstream.
      #
      # This command deletes record fields that are unknown to Solr
      # schema.xml. Recall that Solr throws an exception on any attempt to
      # load a document that contains a field that isn't specified in
      # schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : {
            collection : SystemLogs          # Name of solr collection
            zkHost : "myserver:2181/solr"    # ZooKeeper ensemble
          }
        }
      }

      # log the record at INFO level to SLF4J
      { logInfo { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : {
            collection : SystemLogs          # Name of solr collection
            zkHost : "myserver:2181/solr"    # ZooKeeper ensemble
          }
        }
      }
    ]
  }
]
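To rule out a basic regex problem, I also tried checking the commented-out "standard" syslog expression locally, outside of grok. This Python sketch hand-expands the grok names into approximate regexes (the expansions and the sample line are my own rough stand-ins, not taken from the dictionaries file), and it does match a normal syslog line:

```python
import re

# Hand-expanded approximations of the grok pattern names used in the config.
# These are simplified stand-ins, not the exact definitions from the
# grok-dictionaries file.
POSINT = r"[1-9][0-9]*"
SYSLOGHOST = r"[a-zA-Z0-9._-]+"  # simplified; the real pattern is stricter
SYSLOGTIMESTAMP = r"[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}"
DATA = r".*?"
GREEDYDATA = r".*"

# Equivalent of the commented-out "standard" syslog expression in the config:
# %{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname}
#   %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}
pattern = re.compile(
    rf"(?P<timestamp>{SYSLOGTIMESTAMP}) (?P<hostname>{SYSLOGHOST}) "
    rf"(?P<program>{DATA})(?:\[(?P<pid>{POSINT})\])?: (?P<msg>{GREEDYDATA})$"
)

# Invented sample line (hostname taken from the failing record above).
sample = "Mar 11 15:58:29 topbdii03 sshd[1234]: Accepted publickey for root"
m = pattern.match(sample)
print(m.groupdict() if m else "no match")
```

So the pattern itself seems able to parse a plain syslog line; the failure seems to happen only inside the morphline.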
The problem is that I cannot see why I am getting this error: the active message expression lists the fields in the same order as they appear in the failing record. I set
log4j.logger.com.cloudera.cdk.morphline=TRACE in log4j.properties, but I did not get any additional log output. Any ideas?
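One thing I still plan to try is dumping the intermediate record between readLine and grok, reusing the same logInfo syntax that is already at the end of the chain (the format string here is just my own label):

```
{ logInfo { format : "record after readLine: {}", args : ["@{}"] } }
```

That should show exactly what the message field contains at the moment grok sees it.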