So we're pretty familiar with td-agent configuration and have been using it successfully to deliver our logs.
However we can't solve our multi-line multi-format log parsing challenge.
12:21:38,543 INFO [RegistrationNotifier] Notified CRS by creating sequential ephemeral node at [/component/active/1.1/aws1-prod-pub/maintenance-producer_internal_1.6.6_0000002534] with data [com.somecompany.registration.domain.ComponentDep
loyment@6181b53f[location=com.somecompany.registration.domain.Location@aa26d1c[dataCenter=aws1,environmentName=prod,environmentIndex=pub],artifactDetailsDelegate=com.somecompany.registration.notifier.MavenInfo@2da94eb8[groupId=com.somecompany.eps.m
aintenance,artifactId=maintenance-producer,version=1.6.6],connectionDetailsDelegate=com.somecompany.registration.util.ReadableConnectionDetails@466f2973[inetAddress=i-9fedcf03.prod-pub.aws1/
10.10.20.232,jmxPort=1100,webAppServerPort=<null>
],site=internal,capabilities=[EPS_PRODUCER],runtimeConfigurationProperties={application=maintenance, brokerHost=10.10.20.95, brokerName=publishing-1-prod-pub, databaseUrl=jdbc:postgresql://
dbwsp.global-vip.somecompany.com:5432/DBWSP, br
okerId=ID:i-883c3879-43760-1464126857921-0:1, brokerAddress=/
10.10.20.95:1102, dataSet=maintenance}]].
[ActiveMQ Task-3] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-5] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-7] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-9] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-11] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-13] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-15] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-17] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-19] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-21] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:110212:32:13,154 INFO [MaintenanceDeltaTask] Starting single load...
12:32:13,580 INFO [MaintenanceDeltaTask] Single load complete in 0 seconds
[ActiveMQ Task-23] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-25] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-27] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-29] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-31] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-33] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-35] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:1102[ActiveMQ Task-37] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:110212:38:49,898 ERROR [ControllableProducer] newTask method error
java.lang.NullPointerException
at java.util.concurrent.FutureTask.<init>(FutureTask.java:134)
at com.somecompany.eps.producer.AbstractProducer$2.<init>(AbstractProducer.java:154)
at com.somecompany.eps.producer.AbstractProducer.execute(AbstractProducer.java:153)
at com.somecompany.eps.producer.ControllableProducer.messageReceived(ControllableProducer.java:536)
at com.somecompany.eps.producer.ControllableProducer.newTask(ControllableProducer.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at javax.management.modelmbean.RequiredModelMBean$4.run(RequiredModelMBean.java:1245)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at javax.management.modelmbean.RequiredModelMBean.invokeMethod(RequiredModelMBean.java:1239)
at javax.management.modelmbean.RequiredModelMBean.invoke(RequiredModelMBean.java:1077)
at org.springframework.jmx.export.SpringModelMBean.invoke(SpringModelMBean.java:90)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
12:39:23,620 INFO [MaintenanceDeltaTask] Starting single load...
[ActiveMQ Task-39] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully connected to tcp://
10.10.20.95:110212:39:23,733 INFO [TimingLogger] Performance Statistics 12:32:00 - 12:32:15
Tag Avg(ms) Min Max Std Dev Count
AbstractProducer.dequeue 36.0 36 36 0.0 1
You see that there's a few log formats happening and sometimes the output is multiline (so multi-line multi-format logs).
<source>
type tail
format multi_format
<pattern>
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{3}/
format1 /(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{3}\S*) (?<level>[^\s]+)\s+\[(?<thread>\S*)\s*\] (?<message>(.|\n)*)/
</pattern>
<pattern>
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\,\d{3}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\,\d{3}) \[(?<thread>.*)\] (?<level>[^\s]+)\s+(?<message>(.|\n)*)/
</pattern>
<pattern>
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{3}\S*) (?<message>(.|\n)*)/
</pattern>
<pattern>
format multiline
format_firstline /Exception in thread/
format1 /Exception in thread "(?<thread>[^\"]+)" (?<message>(.|\n)*)/
</pattern>
<pattern>
format none
</pattern>
path /var/log/td-agent/ecs/*/*-consumer:*.log,/var/log/td-agent/ecs/*/*-producer:*.log
pos_file /var/log/td-agent/ecs/eps.pos
tag eps.log.*
refresh_interval 5
</source>
The issue with the above is, while regexp is correct, multi-line logs get broken up into multiple entries because multi_format plugin doesn't support multiline parsing.
Can you suggest some workarounds or fixes short of rewriting multi_format plugin? We're really stuck here.