Match string length in Morphlines grok

Kamil Kantar

Mar 27, 2015, 11:51:16 AM
to cdk...@cloudera.org
Hi all,

I have a grok pattern in Morphlines that should match a multi-line record from a WebLogic server log (records are delimited with "####"). I want to limit the length of the matched string to, say, 20000 chars to avoid parsing HUGE stack traces that may appear in the log file. In other words, I want to trim the message to its first N chars:

message : """####<%{DATA:event_timestamp}>(\s+|\z)<%{DATA:severity}>(\s+|\z)<%{DATA:subsystem}>(\s+|\z)<%{DATA:serverName}>(\s+|\z)<%{DATA:wlsServer}>(\s+|\z)<%{DATA:threadId}>(\s+|\z)<%{DATA:user}>(\s+|\z)<%{DATA:transactionId}>(\s+|\z)<%{DATA:unknown1}>(\s+|\z)<%{DATA:unknown2}>(\s+|\z)<%{DATA:messageId}>(\s+|\z)<(?<logMessage>(.|\r|\n){1,20000})(.*\>\s*\z)"""

Example of a message:

####<Dec 20, 2014 10:31:56 AM UTC> <Warning> <Munger> <server.domain.com> <my_server1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <f5364be616ed55ae:-2195f7c3:14a6740b82d:-8000-0000000000000002> <1419071516685> <BEA-2156203> <A version attribute was not found in element application in the deployment descriptor. A version attribute is required, but this version of the Weblogic Server will assume that the JEE5 is used. Future versions of the Weblogic Server will reject descriptors that do not specify the JEE version.>

The grok debugger correctly trims the logMessage after the specified number of chars (say, {1,50}), but in Morphlines, records *longer* than the specified number of chars are not matched by my pattern...

Do you know where the problem might be?

Thanks!

Wolfgang Hoschek

Mar 27, 2015, 1:20:49 PM
to Kamil Kantar, cdk...@cloudera.org
Sounds like the trailing pattern (.*\>\s*\z)""" can't match your actual input data.



Kamil Kantar

Mar 27, 2015, 2:55:29 PM
to cdk...@cloudera.org, kamil....@gmail.com
Well, it does match (you can try it in the grok debugger). The only case where the pattern does not match is when the string is longer than 20000 chars (when I use {1,20000} in the pattern)...

Wolfgang Hoschek

Mar 27, 2015, 3:16:32 PM
to Kamil Kantar, cdk...@cloudera.org
{1,20000} matches up to 20000 chars, and whatever follows those 20000 chars must still match the rest of the regex pattern. In your actual input data, it doesn't. Try adjusting your regex pattern accordingly, or try adjusting your input data. Also note that the behavior of the logstash online grok debugger doesn't match the morphline grok implementation 100%.
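
A scaled-down sketch of that behavior, for illustration only. It uses Python's `re` module as a stand-in for the Java regex engine that Morphlines grok is assumed to use, so `\Z` replaces `\z`, and the cap is 10 chars instead of 20000. It also assumes that the oversized records contain line breaks (as stack traces do) and that the pattern runs without a "dot matches newline" flag, so the trailing `.*` cannot cross a newline. The `ADJUSTED` pattern and the sample records are hypothetical and not taken from the thread.

```python
import re

# Scaled-down stand-in for the pattern in the thread: the cap is 10 chars
# instead of 20000, and Python's \Z replaces Java's \z.
ORIGINAL = re.compile(r"<(?P<logMessage>(.|\r|\n){1,10})(.*>\s*\Z)")

# Hypothetical variant: the tail uses [\s\S]* so it can also consume
# line breaks that remain after the capped capture.
ADJUSTED = re.compile(r"<(?P<logMessage>[\s\S]{1,10})[\s\S]*>\s*\Z")

# Made-up records: one fits within the cap, one exceeds it and has
# a line break in the part that is left over.
short_record = "<abc\ndef>"
long_record = "<abcdefghijklmnop\nqrstuv>"


def captured(pattern, text):
    """Return the logMessage capture, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group("logMessage") if m else None


for name, pattern in (("original", ORIGINAL), ("adjusted", ADJUSTED)):
    for record in (short_record, long_record):
        print(name, repr(record), "->", repr(captured(pattern, record)))
```

With the original tail, the long record fails to match because `.*` stops at the line break and never reaches the closing `>`. With the adjusted tail, the same record matches and logMessage holds only the first 10 chars.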