in_tail multiline assistance

David Jacobson

Feb 17, 2015, 9:39:05 AM
to flu...@googlegroups.com
Hi There,

I had Logstash sending multi-line Java logs to Elasticsearch without problems, using the following regexp:

filter {
  # stacktrace java as one message
  multiline {
    # type => "all" # no type means for all inputs
    pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)"
    what => "previous"
  }
}

I am trying to do the same using the in_tail multiline format but I cannot get the syntax right. Could someone point me in the right direction here? Here is an example snippet of the log:

-- snip --
2015-02-17 16:32:19,829 INFO  [LmtpServer-3866] [name=hid...@email.com;mid=835;ip=20.4.5.143;] lmtp - S: 452 4.2.2 Over quota (DATA)
2015-02-17 16:32:19,856 INFO  [ImapServer-9] [name=hid...@email.com;mid=3262;ip=20.4.5.104;oip=212.33.134.66;via=10.1.5.193(nginx/1.2.0-zimbra);ua=iPhone Mail/12B466;] imap - UID SEARCH elapsed=9
2015-02-17 16:32:19,856 INFO  [LmtpServer-3870] [ip=20.4.5.143;] lmtp - Delivering message: size=47744 bytes, nrcpts=1, sender=bounce-1465...@email.com, msgid=<LYRIS-1538802-1465487-2015.02.15-00.10.07--hidden#email...@email.com>
2015-02-17 16:32:19,857 ERROR [LmtpServer-3870] [name=hid...@email.com;mid=3279;ip=20.4.5.143;] jsieve - Evaluation failed. Reason: null
2015-02-17 16:32:19,857 WARN  [LmtpServer-3870] [name=hid...@email.com;mid=3279;ip=20.4.5.143;] filter - An error occurred while processing filter rules. Filing message to /Inbox.
com.zimbra.cs.filter.ZimbraSieveException
at com.zimbra.cs.filter.ZimbraMailAdapter.executeActions(ZimbraMailAdapter.java:281)
at org.apache.jsieve.SieveFactory.evaluate(SieveFactory.java:173)
at com.zimbra.cs.filter.RuleManager.applyRulesToIncomingMessage(RuleManager.java:340)
at com.zimbra.cs.filter.RuleManager.applyRulesToIncomingMessage(RuleManager.java:302)
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliverMessageToLocalMailboxes(ZimbraLmtpBackend.java:614)
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliver(ZimbraLmtpBackend.java:384)
at com.zimbra.cs.lmtpserver.LmtpHandler.processMessageData(LmtpHandler.java:378)
at com.zimbra.cs.lmtpserver.TcpLmtpHandler.continueDATA(TcpLmtpHandler.java:75)
at com.zimbra.cs.lmtpserver.LmtpHandler.doDATA(LmtpHandler.java:367)
at com.zimbra.cs.lmtpserver.LmtpHandler.processCommand(LmtpHandler.java:183)
at com.zimbra.cs.lmtpserver.TcpLmtpHandler.processCommand(TcpLmtpHandler.java:68)
at com.zimbra.cs.server.ProtocolHandler.processConnection(ProtocolHandler.java:190)
at com.zimbra.cs.server.ProtocolHandler.run(ProtocolHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.zimbra.cs.mailbox.MailServiceException: mailbox exceeded quota of 1073741824 bytes
ExceptionId:LmtpServer-3870:1424183539857:71a8aeb42e47fb67
Code:mail.QUOTA_EXCEEDED Arg:(limit, NUM, "1073741824")
at com.zimbra.cs.mailbox.MailServiceException.QUOTA_EXCEEDED(MailServiceException.java:355)
at com.zimbra.cs.mailbox.Mailbox.checkSizeChange(Mailbox.java:1376)
at com.zimbra.cs.mailbox.Mailbox.addMessageInternal(Mailbox.java:5909)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:5783)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:5717)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:5712)
at com.zimbra.cs.filter.IncomingMessageHandler.addMessage(IncomingMessageHandler.java:133)
at com.zimbra.cs.filter.IncomingMessageHandler.implicitKeep(IncomingMessageHandler.java:125)
at com.zimbra.cs.filter.ZimbraMailAdapter.doDefaultFiling(ZimbraMailAdapter.java:346)
at com.zimbra.cs.filter.ZimbraMailAdapter.executeActions(ZimbraMailAdapter.java:221)
... 15 more
2015-02-17 16:32:19,857 INFO  [LmtpServer-3870] [name=hid...@email.com;mid=3279;ip=20.14.5.143;] lmtp - rejecting message from=bounce-1465...@email.com,to=hid...@email.com: overquota
-- snip --

Thanks,
David

Mr. Fiber

Feb 17, 2015, 10:07:49 AM
to flu...@googlegroups.com
Hi David,

Could you paste your configuration here?


Masahiro


David Jacobson

Feb 18, 2015, 5:49:24 AM
to flu...@googlegroups.com
Hi Masahiro,

Here's what works without the multiline support:

<source>
  type tail
  format none
  path /opt/zimbra/log/mailbox.log
  pos_file /tmp/mailbox-pos.log
  tag zimbra.mailbox
</source>

<match zimbra.**>
  type forward
  host 10.1.5.93
</match>

I've tried changing format to various things including:

format /(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)/

But I received a syntax error.

Thanks,
David


Mr. Fiber

Feb 18, 2015, 2:27:04 PM
to flu...@googlegroups.com
I tried the following configuration and it seems to work.

<source>
  type tail

  tag zimbra.mailbox
  path /opt/zimbra/log/mailbox.log
  pos_file /tmp/mailbox-pos.log

  format multiline
  format_firstline /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})/  # Need to parse stacktrace parts
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3}) (?<level>[^\s]+)(?: *)\[(?<thread>.*?)\] (?<message>.*)/
</source>


Masahiro
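
To see why this configuration folds a stack trace into one event, here is a minimal Ruby sketch of the grouping idea. It assumes that in_tail tests format_firstline against each line on its own and applies format1 to the joined lines with Ruby's multiline flag, so that "." also matches newlines. The helper group_events and the shortened sample lines are made up for illustration; this is not Fluentd's actual code.

```ruby
# Patterns copied from the configuration above. The /m flag on FORMAT1 is an
# assumption about how multiline mode compiles format1 ("." matches "\n").
FIRSTLINE = /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})/
FORMAT1 = /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3}) (?<level>[^\s]+)(?: *)\[(?<thread>.*?)\] (?<message>.*)/m

# Shortened, hypothetical lines shaped like the mailbox.log snippet.
LINES = [
  "2015-02-17 16:32:19,857 WARN  [LmtpServer-3870] [ip=20.4.5.143;] filter - An error occurred",
  "com.zimbra.cs.filter.ZimbraSieveException",
  "at java.lang.Thread.run(Thread.java:745)",
  "2015-02-17 16:32:19,857 INFO  [LmtpServer-3870] [ip=20.4.5.143;] lmtp - rejecting message",
].freeze

# Hypothetical helper: start a new event at every line matching firstline and
# append every other line to the event currently being built.
def group_events(lines, firstline)
  events = []
  lines.each do |line|
    if firstline.match(line)
      events << [line]
    elsif !events.empty?
      events.last << line
    end
    # Lines seen before any first line are skipped in this sketch.
  end
  events.map { |parts| parts.join("\n") }
end

EVENTS = group_events(LINES, FIRSTLINE)
EVENTS.each do |event|
  m = FORMAT1.match(event)
  puts "#{m[:level]} #{m[:thread]} #{m[:message].lines.size} line(s)"
end
```

The continuation lines carry no timestamp, so they never start a new event and end up inside the message field of the preceding entry.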

David Jacobson

Feb 20, 2015, 2:30:47 AM
to flu...@googlegroups.com
Hi Masahiro

Worked perfectly. Did you use some sort of regexp parser - or was that just ninja skills?

Thanks a lot!

Regards,
David

Mr. Fiber

Feb 20, 2015, 5:43:35 AM
to flu...@googlegroups.com
> Did you use some sort of regexp parser - or was that just ninja skills?

No, I don't have ninja skills for regexp :p
in_tail's multiline mode is based on the deprecated fluent-plugin-tail-multiline.

I reused the above configuration and added format_firstline to detect the start line of multiline logs.
Maybe we should add more examples to the in_tail documentation: http://docs.fluentd.org/articles/in_tail

BTW, if you want to test a regexp against your logs, you can use fluentular or fluentd-ui.

These tools don't support multiline mode yet, so
only one line can be checked at a time. They are still useful, though.

fluentd-ui is bundled as td-agent-ui in td-agent 2.
You can use fluentd-ui by doing 'sudo /usr/sbin/td-agent-ui start'.


Masahiro

Lance N.

Feb 20, 2015, 4:44:40 PM
to flu...@googlegroups.com
http://rubular.com/ 
Lets you debug Ruby regular expressions



JP D

Feb 1, 2017, 4:41:38 PM
to Fluentd Google Group
Thanks for this post. I'm a newbie to log parsing/aggregation but have a similar logfile, and I found this super helpful!
I want to extend this to parse more of the stacktrace, i.e. the '# Need to parse stacktrace parts' bit.

I added a <message> capture which just grabs the rest of the stacktrace and that works well, but I would like to split it further into something like <exception> and <cause>.
<exception> would be everything before the first 'Caused by' and <cause> everything after (ideally I'd like to split it up even further, but this would be a good start).

This is my current failing regex. I keep getting a 'got incomplete line before first line' error message (right from the first line of my logfile, which is a single line).
I think it's something to do with the fact that only the multiline entries (stacktraces) have the 'Caused by' part and the single log lines do not.

  format multiline
  format_firstline /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})(?<exception>.*?)(?<cause>(Caused by.*))/  # Need to parse stacktrace parts
  format1 /^(?<time>[^,]*)(.*?)[ \t]+(?<level>[^ ]*)[ \t]+(?<class>[^ ]*)[ \t-]+(?<message>.*)$/


Any help would be much appreciated.
JP
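
The warning is consistent with in_tail testing format_firstline against each physical line on its own (an assumption based on the error text, not a statement about the plugin's source). A single line never contains a later 'Caused by', so a format_firstline that requires it never finds a first line. A small Ruby sketch with a made-up log line, since the real logfile is not shown in the thread:

```ruby
# format_firstline from the post above: it also demands "Caused by" on the line.
STRICT_FIRSTLINE = /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})(?<exception>.*?)(?<cause>(Caused by.*))/
# Timestamp-only variant, as in the earlier working configuration.
TIMESTAMP_FIRSTLINE = /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})/

# Hypothetical single-line entry.
SINGLE_LINE = "2017-02-01 10:15:30,123 INFO com.example.Service - started"

STRICT_MATCH = STRICT_FIRSTLINE.match(SINGLE_LINE)
LOOSE_MATCH = TIMESTAMP_FIRSTLINE.match(SINGLE_LINE)

puts "strict pattern matches:    #{!STRICT_MATCH.nil?}"
puts "timestamp pattern matches: #{!LOOSE_MATCH.nil?}"
```

Keeping format_firstline to the timestamp alone and moving the 'Caused by' handling into format1 (or a later filter) avoids the problem.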


JP D

Feb 1, 2017, 5:29:26 PM
to Fluentd Google Group
I have come up with a solution.

  format multiline
  format_firstline /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})/  # Need to parse stacktrace parts
  format1 /^((?<time>[^,]*)(.*?)[ \t]+(?<level>[^ ]*)[ \t]+(?<class>[^ ]*)[ \t-]+(?<exception>(.*)?)(?<cause>(Caused by.*)))|((?<time>[^,]*)(.*?)[ \t]+(?<level>[^ ]*)[ \t]+(?<class>[^ ]*)[ \t-]+(?<message>.*)$)/

format1 first tries to match a multiline exception (something containing 'Caused by'); otherwise the entry must be a single line.
I will factor out the common first section (time, level, class), followed by ((exception, cause) OR (message)), but
any suggestions on how I can improve this further would be much appreciated, as this seems rather hacky.

JP D

Feb 1, 2017, 5:41:10 PM
to Fluentd Google Group
Here's the slightly improved version:

  format_firstline /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})/  # Need to parse stacktrace parts
  format1 /^((?<time>[^,]*)(.*?)[ \t]+(?<level>[^ ]*)[ \t]+(?<class>[^ ]*)[ \t-]+)(((?<exception>(.*)?)(?<cause>(Caused by.*)))|((?<message>.*)$))/
  time_format %Y-%m-%d %T

JP
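
Because the time group is [^,]*, it stops at the comma before the milliseconds, so the captured value is only the date and the time to the second, which is what time_format %Y-%m-%d %T describes. A quick Ruby check, assuming the format string is interpreted with strptime-style directives (where %T is shorthand for %H:%M:%S); the sample line is hypothetical:

```ruby
require "time"

# Hypothetical log line in the same timestamp layout as the thread's samples.
SAMPLE = "2017-02-01 10:15:30,123 INFO com.example.Service - started"

# Same leading capture as format1: everything up to the first comma.
TIME_TEXT = SAMPLE[/^(?<time>[^,]*)/, "time"]

# Parse the captured text with the time_format from the config above.
PARSED = Time.strptime(TIME_TEXT, "%Y-%m-%d %T")

puts TIME_TEXT
puts PARSED.strftime("%H:%M:%S")
```

The milliseconds after the comma are dropped by this approach, since they are never part of the captured time text.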

JP D

Feb 2, 2017, 7:33:53 PM
to Fluentd Google Group
Have changed my implementation somewhat and thought I'd post it for anyone interested.
It has two formats.
Single line 
  • time
  • level
  • class
  • message - The rest of the line
Multi line stacktrace 
  • time
  • level
  • class
  • message - The cause of the error, e.g. "Caused by: com.foo.bar.JsonParsingException: Could not coerce bla into subtype of [abc type, class com.foo.bar.abcdef]\n at [Source: java.io.ByteArrayInputStream@1237a6d51; line: 7, column: 2631]"
  • raw - The full formatted stacktrace (including newlines)
Config

<source>
  @type tail
  path some.log
  pos_file some.log.pos
  tag some.access
  format multiline
  format_firstline /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{1,3})/  # Need to parse stacktrace parts
  format1 /^((?<time>[^,]*)(.*?)[ \t]+(?<level>[^ ]*)[ \t]+(?<class>[^ ]*)[ \t-]+)((?<message>.*)$)/
  time_format %Y-%m-%d %T
</source>

# modify tag if multiline stacktrace
<match some.access>
  @type rewrite_tag_filter
  #if message contains the 'Caused by:' append stacktrace to tag
  rewriterule1 message  \bCaused\ by\b  stacktrace.${tag}
  # retain tag of any other single line events NOTE. can't be just ${tag} as it will enter infinite loop
  rewriterule2 message .* single.${tag}
</match>

# remove newlines and save raw stacktrace
<filter stacktrace.**>
  @type record_transformer
  enable_ruby true
  <record>
    # duplicate stacktrace into raw field (full stacktrace)
    raw ${record["message"]}
    # remove newlines from message to allow regex parser below
    message ${message.gsub("\n","\\n")}
  </record>
</filter>

# pull out cause from stacktrace
<filter stacktrace.**>
  @type parser
  reserve_data true
  # override message field with 'Caused by java.lang....' up until line number
  format /^((.*)?(?<message>((Caused by.*\]))))/
  key_name message
</filter>

# add hostname and tags to all records
<filter **>
  @type record_transformer
  <record>
    #Host Specific
    hostname somehost
    tag ${tag}
  </record>
</filter>

<match ** .... send the logs somewhere

JP
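
The filter chain above can be traced in plain Ruby to check what each step does to a record. This is only a sketch of the data flow, not how Fluentd runs the plugins: the helpers rewrite_tag, transform and extract_cause and the sample message are hypothetical, and it assumes that rewrite_tag_filter applies the first rule that matches and that reserve_data keeps the other fields of the record.

```ruby
# Hypothetical multi-line event message, shaped like the stack traces in this thread.
MESSAGE = [
  "Request failed",
  "com.example.FilterException",
  "at com.example.Adapter.run(Adapter.java:281)",
  "Caused by: com.example.JsonParsingException: bad value at [Source: stream; line: 7, column: 2631]",
  "at com.example.Parser.parse(Parser.java:42)",
].join("\n")

# rewrite_tag_filter step: the first rule whose pattern matches picks the new tag.
def rewrite_tag(tag, message)
  message =~ /\bCaused\ by\b/ ? "stacktrace.#{tag}" : "single.#{tag}"
end

# record_transformer step: keep the raw text, then replace each newline in
# message with the two characters backslash and "n".
def transform(record)
  record.merge(
    "raw" => record["message"],
    "message" => record["message"].gsub("\n", "\\n")
  )
end

# parser step: same regexp as in the filter; it keeps the text from
# "Caused by" up to the last "]" on the (now single) line.
CAUSE = /^((.*)?(?<message>((Caused by.*\]))))/

def extract_cause(record)
  m = CAUSE.match(record["message"])
  m ? record.merge("message" => m[:message]) : record
end

TAG = rewrite_tag("some.access", MESSAGE)
RECORD = extract_cause(transform({ "message" => MESSAGE }))

puts TAG
puts RECORD["message"]
```

Note that the parser regexp relies on a "]" appearing after 'Caused by'; a trace without one would leave message unchanged in this sketch.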

Yasin Amadmia

Feb 7, 2017, 10:36:28 AM
to Fluentd Google Group
David (and all),

https://regex101.com/ is the best regex site I've found so far.

For help, you could also go to the IRC channel #regex.

