pattern not match: nginx


Kilian Ries

May 8, 2017, 11:41:57 AM
to Fluentd Google Group
Hi,

I'm trying to get my custom nginx access_log format into Fluentd, but it doesn't work. I tested my regex with Fluentular and fluentd-ui, and neither shows any errors:


http://fluentular.herokuapp.com/parse?regexp=%5E%28%3F%3Cremote%3E%5B%5E+%5D*%29+%28%3F%3Chost%3E%5B%5E+%5D*%29+%5C%5B%28%3F%3Ctime%3E%5B%5E%5C%5D%5D*%29%5C%5D+%22%28%3F%3Cmethod%3E%5CS%2B%29+%2B%28%3F%3Cpath%3E%5B%5E%5C%22%5D*+%5CS*%29%3F%22+%28%3F%3Ccode%3E%5B%5E+%5D*%29+%28%3F%3Csize%3E%5B%5E+%5D*%29+%28%3F%3Crequest_time%3E%5B%5E+%5D*%29+%22%28%3F%3Creferer%3E%5B%5E%5C%22%5D*%29%22+%22%28%3F%3Cagent%3E%5B%5E%5C%22%5D*%29%22%24&input=10.44.0.0+deployment-dev-1033774263-0n60n+%5B08%2FMay%2F2017%3A15%3A53%3A52+%2B0200%5D+%22GET+%2Fcore%2Fmisc%2Ffavicon.ico+HTTP%2F2.0%22+200+5430+0.006+%22https%3A%2F%2Fproxy-d.kilian-ries.de%2F%22+%22Mozilla%2F5.0+%28Macintosh%3B+Intel+Mac+OS+X+10_12_4%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F57.0.2987.133+Safari%2F537.36%22&time_format=%25d%2F%25b%2F%25Y%3A%25H%3A%25M%3A%25S+%25z


td-agent error:

2017-05-08 17:34:35 +0200 [warn]: pattern not match: "May  8 17:34:35 deployment-dev-1033774263-0n60n nginx: 10.44.0.0 deployment-dev-1033774263-0n60n [08/May/2017:17:34:35 +0200] \"GET /sites/default/files/css/A.css_zK72-b6vfmaWB0-rJ1kPTuIMeQ1GMmhvuabgTvTgalg.css,,q0+css_T-mqss2oLRLX3dQ4SIIW4vkLtYqP9I1qJ7-gfyLGexI.css,,q0,Mcc.mz5GaZoFDl.css.pagespeed.cf.jT_ESDefd7.css HTTP/2.0\" 200 12101 0.001 \"https://proxy-d.kilian-ries.de/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\""


fluentd config:

<source>
  @type syslog
  protocol_type udp
  port 9514
  format /^(?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) +(?<path>[^\"]* \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/
  tag nginx.access
  time_format %d/%b/%Y:%H:%M:%S %z 
</source>

<match nginx.access>
  @type stdout
</match>


nginx log format:

log_format fluent_access '$remote_addr $hostname [$time_local] ' '"$request" $status $body_bytes_sent $request_time "$http_referer" ' '"$http_user_agent"'; 


nginx log example:

10.44.0.0 deployment-dev-1033774263-0n60n [08/May/2017:17:34:35 +0200] "GET /sites/default/files/css/A.css_Z5jMg7P_bjcW9iUzujI7oaechMyxQTUqZhHJ_aYSq04.css,q0.pagespeed.cf.y5ovlwmdYo.css HTTP/2.0" 200 270 0.002 "https://proxy-d.kilian-ries.de/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
10.44.0.0 deployment-dev-1033774263-0n60n [08/May/2017:17:34:35 +0200] "GET /core/themes/bartik/logo.svg HTTP/2.0" 200 1883 0.005 "https://proxy-d.kilian-ries.de/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"

fluentd version:

0.12.35


Does anybody know why I'm seeing this error while Fluentular and fluentd-ui report NO errors?

Thanks
Kilian

Mr. Fiber

May 8, 2017, 11:49:31 AM
to Fluentd Google Group
2017-05-08 17:34:35 +0200 [warn]: pattern not match: "May  8 17:34:35 deployment-dev-1033774263-0n60n nginx: 10.44.0.0 deployment-dev-1033774263-0n60n [08/May/2017:17:34:35 +0200] \"GET /sites/default/files/css/A.css_zK72-b6vfmaWB0-rJ1kPTuIMeQ1GMmhvuabgTvTgalg.css,,q0+css_T-mqss2oLRLX3dQ4SIIW4vkLtYqP9I1qJ7-gfyLGexI.css,,q0,Mcc.mz5GaZoFDl.css.pagespeed.cf.jT_ESDefd7.css HTTP/2.0\" 200 12101 0.001 \"https://proxy-d.kilian-ries.de/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\""

This log line has the string "May  8 17:34:35 deployment-dev-1033774263-0n60n nginx: " in front of your log example, so your regex cannot match from the start of the line.
This does not seem to be a Fluentd issue.
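
One way to see the effect outside Fluentd is to run the pattern in plain Ruby. The following is only a sketch: the pattern and the sample lines are copied from the posts above, while the names ACCESS_RE, raw and via_syslog are made up for illustration.

```ruby
# Pattern from the <source> block in the question (unchanged).
ACCESS_RE = /^(?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) +(?<path>[^\"]* \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/

# The line as nginx writes it to a plain log file.
raw = '10.44.0.0 deployment-dev-1033774263-0n60n [08/May/2017:17:34:35 +0200] "GET /core/themes/bartik/logo.svg HTTP/2.0" 200 1883 0.005 "https://proxy-d.kilian-ries.de/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"'

# The same line with the syslog prefix that appears in the warning.
via_syslog = 'May  8 17:34:35 deployment-dev-1033774263-0n60n nginx: ' + raw

raw_match = ACCESS_RE.match(raw)            # MatchData: the pattern fits the bare line
syslog_match = ACCESS_RE.match(via_syslog)  # nil: the prefix breaks the ^-anchored match

puts "raw line matches:    #{!raw_match.nil?}"
puts "syslog line matches: #{!syslog_match.nil?}"
```

Because the pattern is anchored with ^, the extra prefix has to be covered by the regex as well (or removed before the line reaches the parser).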


Masahiro


Kilian Ries

May 8, 2017, 11:55:22 AM
to Fluentd Google Group
I thought this came from td-agent, because when I tell nginx to write a log file, each line starts correctly with "10.44.0.0 deployment-dev-103377...".

Maybe it comes from the nginx syslog logger ...

Kilian Ries

May 9, 2017, 3:37:47 AM
to Fluentd Google Group
OK, that does indeed seem to come from syslog ... I changed my regex:

http://fluentular.herokuapp.com/parse?regexp=%5E%28%3F%3Csyslogtimestamp%3E%22%5CS%2B+%5B%5Cs%7C%5Cd%5D%5Cd+%5Cd%7B2%7D%3A%5Cd%7B2%7D%3A%5Cd%7B2%7D%29+%28%3F%3Clogfile%3Enginx%3A%29+%28%3F%3Cremote%3E%5B%5E+%5D*%29+%28%3F%3Chost%3E%5B%5E+%5D*%29+%5C%5B%28%3F%3Ctime%3E%5B%5E%5C%5D%5D*%29%5C%5D+%5C%5C%22%28%3F%3Cmethod%3E%5CS%2B%29+%2B%28%3F%3Cpath%3E%5B%5E%5C%22%5D*+%5CS*%29%3F%5C%5C%22+%28%3F%3Ccode%3E%5B%5E+%5D*%29+%28%3F%3Csize%3E%5B%5E+%5D*%29+%28%3F%3Crequest_time%3E%5B%5E+%5D*%29+%5C%5C%22%28%3F%3Creferer%3E%5B%5E%5C%22%5D*%29%5C%5C%22+%5C%5C%22%28%3F%3Ca%0D%0Agent%3E%5B%5E%5C%22%5D*%29%5C%5C%22%22%24&input=%22May++9+09%3A03%3A59+nginx%3A+10.44.0.0+deployment-dev-1033774263-0n60n+%5B09%2FMay%2F2017%3A09%3A03%3A59+%2B0200%5D+%5C%22GET+%2Fcore%2Fmisc%2Ficons%2F505050%2Floupe.svg+HTTP%2F2.0%5C%22+200+491+0.012+%5C%22https%3A%2F%2Fproxy-d.kilian-ries.de%2Fsites%2Fdefault%2Ffiles%2Fcss%2FA.css_zK72-b6vfmaWB0-rJ1kPTuIMeQ1GMmhvuabgTvTgalg.css%2C%2Cq0%2Bcss_T-mqss2oLRLX3dQ4SIIW4vkLtYqP9I1qJ7-gfyLGexI.css%2C%2Cq0%2CMcc.mz5GaZoFDl.css.pagespeed.cf.jT_ESDefd7.css%5C%22+%5C%22Mozilla%2F5.0+%28Macintosh%3B+Intel+Mac+OS+X+10_12_4%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F57.0.2987.133+Safari%2F537.36%5C%22%22&time_format=%25d%2F%25b%2F%25Y%3A%25H%3A%25M%3A%25S+%25z

td-agent error:

2017-05-09 09:32:05 +0200 [warn]: pattern not match: "May  9 09:32:05 nginx: 10.44.0.0 deployment-dev-1033774263-0n60n [09/May/2017:09:32:05 +0200] \"GET /sites/default/files/css/A.css_Z5jMg7P_bjcW9iUzujI7oaechMyxQTUqZhHJ_aYSq04.css,q0.pagespeed.cf.y5ovlwmdYo.css HTTP/2.0\" 200 270 0.003 \"https://proxy-d.kilian-ries.de/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\""
2017-05-09 09:32:05 +0200 [warn]: pattern not match: "May  9 09:32:05 nginx: 10.44.0.0 deployment-dev-1033774263-0n60n [09/May/2017:09:32:05 +0200] \"GET /core/themes/bartik/logo.svg HTTP/2.0\" 200 1883 0.006 \"https://proxy-d.kilian-ries.de/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\""

fluentd regex:

format /^(?<syslogtimestamp>"\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+) +(?<path>[^\"]* \S*)?\\" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\""$/

Mr. Fiber

May 9, 2017, 7:59:18 AM
to Fluentd Google Group
Yes, because the test string you used in Fluentular is wrong.

"May  9 09:32:05 nginx: 10.44.0.0 deployment-dev-1033774263-0n60n [09/May/2017:09:32:05 +0200] \"GET /sites/default/files/css/A.css_Z5jMg7P_bjcW9iUzujI7oaechMyxQTUqZhHJ_aYSq04.css,q0.pagespeed.cf.y5ovlwmdYo.css HTTP/2.0\" 200 270 0.003 \"https://proxy-d.kilian-ries.de/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\""

This is the escaped form of the line, as printed for logging.
Please test with the actual nginx syslog content.
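
To illustrate the difference between the received text and its logged form, here is a small Ruby sketch. It assumes the warning is printed with something like String#inspect, which the thread does not confirm; the sample is a shortened copy of the line above, and all variable names are hypothetical.

```ruby
# What actually arrives (shortened sample): plain double quotes, no backslashes.
actual = 'May  9 09:32:05 nginx: 10.44.0.0 deployment-dev-1033774263-0n60n [09/May/2017:09:32:05 +0200] "GET /core/themes/bartik/logo.svg HTTP/2.0" 200 1883 0.006'

# How the same string looks once it is escaped for logging
# (assumption: an inspect-style escape, which wraps the text in quotes
# and puts a backslash in front of every inner quote).
logged = actual.inspect
puts logged

has_backslash_in_actual = actual.include?('\\')
has_backslash_in_logged = logged.include?('\\')

# A pattern expecting a literal backslash before the quote only fits the
# logged form; a pattern with a plain quote fits the real content.
escaped_quote = /\\"GET/
plain_quote = /"GET/

puts "escaped pattern on actual: #{!escaped_quote.match(actual).nil?}"
puts "plain pattern on actual:   #{!plain_quote.match(actual).nil?}"
```

In other words, the regex should be written against the unescaped content, without the surrounding quotes and without backslashes in front of the inner quotes.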


Kilian Ries

May 9, 2017, 10:25:25 AM
to Fluentd Google Group
By "escaped", do you mean the double backslashes in front of the quotes (\\")?

I have already tried it with and without escaping:

td-agent regex (with quotes at the beginning / end + escaped):
  format /^(?<syslogtimestamp>"\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+) +(?<path>[^\"]* \S*)?\\" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\""$/

td-agent regex (with quotes at the beginning / end):
  format /^(?<syslogtimestamp>"\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) +(?<path>[^\"]* \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)""$/

td-agent regex (escaped):
  format /^(?<syslogtimestamp>\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+) +(?<path>[^\"]* \S*)?\\" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\"$/

td-agent regex (no quotes at the beginning / end, not escaped):
  format /^(?<syslogtimestamp>\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) +(?<path>[^\"]* \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/


None of these work ... I don't know what td-agent expects.

Kilian Ries

May 11, 2017, 5:23:26 AM
to Fluentd Google Group
I finally got it working after lots of debugging:

format /^(?<syslogtimestamp>\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx_access:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) +(?<path>[^\"]* \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/
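
For anyone who wants to check a pattern like this outside td-agent, here is a rough Ruby sketch. The sample line is an assumption: it is one of the warning lines from earlier in the thread with the tag changed to nginx_access:, since the raw syslog content was never posted. FINAL_RE, line and record are illustrative names.

```ruby
require 'time'

# The working pattern from the post above (unchanged).
FINAL_RE = /^(?<syslogtimestamp>\S+ [\s|\d]\d \d{2}:\d{2}:\d{2}) (?<logfile>nginx_access:) (?<remote>[^ ]*) (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) +(?<path>[^\"]* \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<request_time>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/

# Assumed sample: unescaped content, no surrounding quotes.
line = 'May  9 09:32:05 nginx_access: 10.44.0.0 deployment-dev-1033774263-0n60n [09/May/2017:09:32:05 +0200] "GET /core/themes/bartik/logo.svg HTTP/2.0" 200 1883 0.006 "https://proxy-d.kilian-ries.de/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"'

m = FINAL_RE.match(line)
record = m.named_captures   # Hash of group name => captured text

# Parse the nginx timestamp with the time_format from the config.
event_time = Time.strptime(record['time'], '%d/%b/%Y:%H:%M:%S %z')

record.each { |name, value| puts "#{name}: #{value}" }
puts "parsed time (UTC): #{event_time.utc}"
```

Note that the literal tag in the pattern (nginx_access:) has to match whatever tag the nginx syslog logger actually sends.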


I tried it step by step, using the following catch-all group for the rest of the line:

(?<everything_else>.*)
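
The step-by-step approach can be sketched in Ruby roughly as follows. This is an illustration under assumptions, not the exact procedure used: the sample line is hypothetical (modelled on the warnings earlier in the thread), the list of pieces is shortened, and the second piece deliberately uses the wrong tag to show where a match breaks.

```ruby
# Hypothetical sample line (shortened).
line = 'May  9 09:32:05 nginx_access: 10.44.0.0 deployment-dev-1033774263-0n60n [09/May/2017:09:32:05 +0200] "GET /core/themes/bartik/logo.svg HTTP/2.0" 200 1883 0.006'

# Pieces of the pattern, tested one more at a time.
pieces = [
  '(?<syslogtimestamp>\S+ [\s|\d]\d \d{2}:\d{2}:\d{2})',
  ' (?<logfile>nginx:)',   # deliberately wrong: the line says nginx_access:
  ' (?<remote>[^ ]*)',
]

first_failure = nil
pieces.each_index do |i|
  # Everything not yet covered is swallowed by the catch-all group.
  source = '^' + pieces[0..i].join + '(?<everything_else>.*)$'
  m = Regexp.new(source).match(line)
  if m
    puts "ok up to piece #{i}, rest: #{m[:everything_else].inspect}"
  else
    first_failure = i
    puts "piece #{i} breaks the match: #{pieces[i]}"
    break
  end
end
```

The first piece that makes the match fail is the one to fix; the printed rest of the line shows what the next piece has to match.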