How can I write the regex for the following nginx access log format in Fluentd?


Ayman Shorman

May 25, 2014, 9:34:03 AM
to flu...@googlegroups.com

Hi,


I want to parse the following nginx log format into fluentd:

log_format main '$remote_addr - $remote_user [$time_local] $request '
                '"$status" $body_bytes_sent "$http_referer" '
                '"$http_HOST" $HOST '
                '"$http_user_agent" "$http_x_forwarded_for" '
                'upstream_response_time $upstream_response_time '
                'upstream_addr $upstream_addr '
                'msec $msec request_time $request_time';

I'm getting "pattern not match" when using format nginx.


Thanks

Masahiro Nakagawa

May 25, 2014, 10:08:49 AM
to flu...@googlegroups.com
Hi,

Please use this site for trying and testing your regex with your logs.


Masahiro



--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.

Ayman Shorman

May 25, 2014, 10:24:08 AM
to flu...@googlegroups.com
Hi Masahiro,

I can't write a regex myself to test, but I found this on the internet:

Nginx log format :


'$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';
fluentd regexp:

 format /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" "(?<forwarder>[^\"]*)")?/

How can I parse the fields I have in my nginx log?

Kiyoto Tamura

May 25, 2014, 7:05:13 PM
to flu...@googlegroups.com
Hi Ayman,

Try the following:

<source>
  type tail
  format /^(?<remote_addr>[^ ]*) - (?<remote_user>[^ ]*) \[(?<time>[^\]]*)\] "(?<request>[^\"]*)" (?<status>[^ ]*) (?<body_bytes_sent>[^ ]*) "(?<http_referer>[^\"]*)" "(?<http_user_agent>[^\"]*)" "(?<http_x_forwarded_for>[^\"]*)"$/
  path /path/to/your/file
  # other parameters
</source>

You also need to set the time_format parameter in your in_tail config, since I don't know what nginx's $time_local variable outputs.
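For reference, nginx's $time_local normally prints timestamps in the Common Log Format style, such as 25/May/2014:23:44:21 +0000, which corresponds to the strptime pattern %d/%b/%Y:%H:%M:%S %z. A quick way to sanity-check a candidate time_format before putting it in the config is to parse one timestamp with it. The sketch below uses Python rather than Fluentd's Ruby, on the assumption that these particular directives behave the same in both; the sample timestamp and the name TIME_FORMAT are only for illustration.

```python
from datetime import datetime

# Candidate value for time_format (assumed to suit nginx's default $time_local output).
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Illustrative timestamp in the $time_local style.
sample = "25/May/2014:23:44:21 +0000"

# strptime raises ValueError if the format does not fit the timestamp.
parsed = datetime.strptime(sample, TIME_FORMAT)
print(parsed.isoformat())
```

If the parse raises an error, the format string does not fit the timestamps in the log and needs adjusting.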

Kiyoto

--
Check out Fluentd, the open source data collector for high-volume data streams

Ayman Shorman

May 25, 2014, 7:48:20 PM
to flu...@googlegroups.com
Hi Kiyoto,

I tried it, but I'm still getting "pattern not match" :(

Here is a snapshot of the nginx log:

"192.168.6.118 - - [25/May/2014:23:44:21 +0000]  GET /images/templates/main/favicon_16_24_32.ico HTTP/1.1 \"200\" 1811 \"-\" \"test-domain.com\" test-domain.com \"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36\" \"-\" upstream_response_time 0.028 upstream_addr 192.168.6.7:8017 msec 1401061461.752 request_time 0.028"

Ayman

Kiyoto Tamura

May 25, 2014, 8:01:50 PM
to flu...@googlegroups.com
This format is slightly different from what you showed me earlier. After the user agent string, how many of these fields (e.g. upstream_response_time, upstream_addr, etc.) exist? Is it ALWAYS just the four fields shown in your example?
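One way to see where a pattern and a log line diverge is to match progressively longer prefixes of the pattern and note the first piece that fails. The sketch below does this in Python (named groups are written (?P<name>...) there, instead of Ruby's (?<name>...)). The split into PIECES, the helper name first_failing_piece, and the shortened sample line are all assumptions made for illustration, not part of Fluentd.

```python
import re

# Leading pieces of a pattern for the stock nginx format (Python named-group syntax).
PIECES = [
    r'^(?P<remote_addr>[^ ]*) - ',
    r'(?P<remote_user>[^ ]*) ',
    r'\[(?P<time>[^\]]*)\] ',
    r'"(?P<request>[^"]*)" ',   # expects a quoted request
    r'(?P<status>[^ ]*) ',      # expects an unquoted status
]

# Shortened copy of the sample log line (truncated here for brevity).
LINE = ('192.168.6.118 - - [25/May/2014:23:44:21 +0000]  '
        'GET /images/templates/main/favicon_16_24_32.ico HTTP/1.1 "200" 1811 "-"')


def first_failing_piece(pieces, line):
    """Return the index of the first piece whose cumulative prefix fails to match, or None."""
    for i in range(len(pieces)):
        if re.match("".join(pieces[: i + 1]), line) is None:
            return i
    return None


failing = first_failing_piece(PIECES, LINE)
print(failing, PIECES[failing] if failing is not None else "all pieces match")
```

Here the first failing piece is the quoted request: the log has two spaces and an unquoted request after the closing bracket, so the pattern has to be adapted to the custom log_format.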

Kiyoto

Ayman Shorman

May 25, 2014, 8:25:22 PM
to flu...@googlegroups.com
Dear Kiyoto,

Thank you for your help.

  • The following are 3 lines from my nginx log:

192.168.6.118 - - [25/May/2014:23:44:16 +0000]  GET /images/icons/slideshow-blt.png HTTP/1.1 "200" 342 "http://test-domain.com/en/jordan/" "test-domain.com" test-domain.com "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36" "-" upstream_response_time 0.040 upstream_addr 192.168.6.7:8017 msec 1401061456.995 request_time 0.040
192.168.6.118 - - [25/May/2014:23:44:17 +0000]  GET /images/templates/mian/3.1/blt-pt.gif HTTP/1.1 "200" 65 "http://test-domain.com/en/jordan/" "test-domain.com" test-domain.com "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36" "-" upstream_response_time 0.028 upstream_addr 192.168.6.7:8017 msec 1401061457.024 request_time 0.028
192.168.6.118 - - [25/May/2014:23:44:17 +0000]  GET /images/homepage/apps-fr.png HTTP/1.1 "200" 72558 "http://test-domain.com/en/jordan/" "test-domain.com" test-domain.com "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36" "-" upstream_response_time 0.035 upstream_addr 192.168.6.7:8017 msec 1401061457.038 request_time 0.035


  • And the following is my nginx log format from nginx.conf:

log_format main '$remote_addr - $remote_user [$time_local]  $request '
                '"$status" $body_bytes_sent "$http_referer" '
                '"$http_HOST" $HOST '
                '"$http_user_agent" "$http_x_forwarded_for" '
                'upstream_response_time $upstream_response_time '
                'upstream_addr $upstream_addr '
                'msec $msec request_time $request_time';


After the user agent, as you can see in the log format, I have: $http_x_forwarded_for, upstream_response_time, upstream_addr, msec, and request_time.


Ayman

Kiyoto Tamura

May 26, 2014, 1:51:53 AM
to flu...@googlegroups.com
OK, how about this:

http://fluentular.herokuapp.com/parse?regexp=^%28%3F%3Cremote_addr%3E[^+]*%29+-+%28%3F%3Cremote_user%3E[^+]*%29+\[%28%3F%3Ctime%3E[^\]]*%29\]\s%2B%28%3F%3Crequest_type%3E[^+]*%29+%28%3F%3Crequest_url%3E[^+]*%29+%28%3F%3Crequest_http_protocol%3E[^+]*%29+%22%28%3F%3Cstatus%3E[^%22]*%29%22+%28%3F%3Cbody_bytes_sent%3E[^+]*%29+%22%28%3F%3Chttp_referer%3E[^%22]*%29%22+%22%28%3F%3Chttp_host%3E[^%22]*%29%22+%28%3F%3Chost%3E[^+]*%29+%22%28%3F%3Chttp_user_agent%3E[^%22]*%29%22+%22%28%3F%3Chttp_x_forwarded_for%3E[^%22]*%29%22+upstream_response_time+%28%3F%3Cupstream_response_time%3E[^+]*%29+upstream_addr+%28%3F%3Cupstream_addr%3E[^+]*%29+msec+%28%3F%3Cmsec+request_time%3E[^+]*%29+request_time+%28%3F%3Crequest_time%3E[^+]*%29&input=192.168.6.118+-+-+[25%2FMay%2F2014%3A23%3A44%3A16+%2B0000]++GET+%2Fimages%2Ficons%2Fslideshow-blt.png+HTTP%2F1.1+%22200%22+342+%22http%3A%2F%2Ftest-domain.com%2Fen%2Fjordan%2F%22+%22test-domain.com%22+test-domain.com+%22Mozilla%2F5.0+%28Windows+NT+6.3%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F35.0.1916.114+Safari%2F537.36%22+%22-%22+upstream_response_time+0.040+upstream_addr+192.168.6.7%3A8017+msec+1401061456.995+request_time+0.040&time_format=%25d%2F%25b%2F%25Y%3A%25H%3A%25M%3A%25S+%25z

The format regex is:

^(?<remote_addr>[^ ]*) - (?<remote_user>[^ ]*) \[(?<time>[^\]]*)\]\s+(?<request_type>[^ ]*) (?<request_url>[^ ]*) (?<request_http_protocol>[^ ]*) "(?<status>[^"]*)" (?<body_bytes_sent>[^ ]*) "(?<http_referer>[^"]*)" "(?<http_host>[^"]*)" (?<host>[^ ]*) "(?<http_user_agent>[^"]*)" "(?<http_x_forwarded_for>[^"]*)" upstream_response_time (?<upstream_response_time>[^ ]*) upstream_addr (?<upstream_addr>[^ ]*) msec (?<msec>[^ ]*) request_time (?<request_time>[^ ]*)
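If you want to check the pattern offline, a small script can run it against one of the sample lines from this thread. The sketch below is a Python translation, so named groups are written (?P<name>...) instead of Ruby's (?<name>...), and the msec capture is named plain msec, since a group name cannot contain a space. The names PATTERN, LINE, and fields are only for illustration.

```python
import re

# Python translation of the format regex; adjacent raw strings are concatenated.
PATTERN = re.compile(
    r'^(?P<remote_addr>[^ ]*) - (?P<remote_user>[^ ]*) \[(?P<time>[^\]]*)\]\s+'
    r'(?P<request_type>[^ ]*) (?P<request_url>[^ ]*) (?P<request_http_protocol>[^ ]*) '
    r'"(?P<status>[^"]*)" (?P<body_bytes_sent>[^ ]*) "(?P<http_referer>[^"]*)" '
    r'"(?P<http_host>[^"]*)" (?P<host>[^ ]*) '
    r'"(?P<http_user_agent>[^"]*)" "(?P<http_x_forwarded_for>[^"]*)" '
    r'upstream_response_time (?P<upstream_response_time>[^ ]*) '
    r'upstream_addr (?P<upstream_addr>[^ ]*) '
    r'msec (?P<msec>[^ ]*) request_time (?P<request_time>[^ ]*)'
)

# First sample line from the log excerpt posted earlier in the thread.
LINE = (
    '192.168.6.118 - - [25/May/2014:23:44:16 +0000]  '
    'GET /images/icons/slideshow-blt.png HTTP/1.1 "200" 342 '
    '"http://test-domain.com/en/jordan/" "test-domain.com" test-domain.com '
    '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36" "-" '
    'upstream_response_time 0.040 upstream_addr 192.168.6.7:8017 '
    'msec 1401061456.995 request_time 0.040'
)

match = PATTERN.match(LINE)
fields = match.groupdict() if match else {}

# Print each captured field, one per line.
for name, value in fields.items():
    print(f"{name}: {value}")
```

An empty fields dictionary means the line did not match, which corresponds to the "pattern not match" case in Fluentd.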

Ayman Shorman

May 26, 2014, 2:51:39 PM
to flu...@googlegroups.com

Thank you Kiyoto, YOU ROCK! It's working fine for me.

Appreciate it.


Regards,