Urchin 7 - FTP Log Processing not happening.


Reggie Simmons

Sep 19, 2012, 8:55:57 AM
to urchin...@googlegroups.com
I need input on how to get Urchin v7 to process my FTP logs. I was told by Urchin support that the cs-Method must be a 'GET' in order to retrieve Visit and PageView Data. FTP logs don't use cs-Method = GET. Help!!!!
 

The consultant recommended that I replace, on every line, each match of the regular expression \[.*\]sent with GET. When I processed this log in Urchin 7, it resulted in 1091 visits and 7346 pageviews.

 

The above numbers basically say that 1091 unique IP addresses connected that day and downloaded at least one file, for a total of 7346 downloads. If you'd rather see all connections, regardless of whether they logged in or downloaded, then we have to replace every cs-method value with GET and import the log. To do this, I used the regular expression \[.*\][A-Z]+\s and replaced each match with "GET " (with the trailing space). I was just doing this with a text editor, though; the requirement here would be to process the log prior to Urchin import. Unfortunately, adding a log filter didn't work to replace these on the fly.
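For anyone who wants to script the same replacement instead of using a text editor, here is a minimal Python sketch. The two regular expressions are the ones quoted above. Everything else is an assumption: the function name rewrite_line is made up, the "[1]USER"-style token is only a guess at what the unmodified cs-method field looks like (the original log is not shown in this thread), and applying both patterns together in the "all connections" mode is my reading of the description, not something the consultant confirmed.

```python
import re

# Patterns quoted from the post. The "[1]USER"-style token they are meant to
# match is an assumed shape for the original cs-method field.
SENT_ONLY = re.compile(r"\[.*\]sent")
ANY_COMMAND = re.compile(r"\[.*\][A-Z]+\s")


def rewrite_line(line: str, all_commands: bool = True) -> str:
    """Return the log line with its cs-method replaced by GET.

    all_commands=False rewrites only file transfers ("sent");
    all_commands=True also rewrites every upper-case FTP command.
    """
    if line.startswith("#"):
        return line  # leave W3C header/directive lines untouched
    line = SENT_ONLY.sub("GET", line)
    if all_commands:
        line = ANY_COMMAND.sub("GET ", line)
    return line
```

Calling rewrite_line on each line of the raw log and writing the results to a new file would give Urchin a log in which every request looks like a GET. With all_commands=False only the downloads are rewritten, which corresponds to the first set of numbers.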

 

The result after this second processing was: 1458 visits and 57473 pageviews for just the 22nd of July.

 

The attached log shows this modification.  Basically, every line has a GET request:

2012-07-22 00:00:03 67.192.248.140 anonymous GET anonymous - 331 0 - - -

2012-07-22 00:00:03 67.192.248.140 curl_by...@haxx.se GET curl_by...@haxx.se - 230 0 - - -

2012-07-22 00:00:03 67.192.248.140 curl_by...@haxx.se GET pub - 250 0 - - -

2012-07-22 00:00:03 67.192.248.140 curl_by...@haxx.se GET special.requests - 250 0 - - -

2012-07-22 00:00:03 67.192.248.140 curl_by...@haxx.se GET cpi - 250 0 - - -

2012-07-22 00:00:03 67.192.248.140 curl_by...@haxx.se GET /pub/special.requests/cpi/cpiai.txt - 226 15176 - - -

2012-07-22 00:00:03 67.192.248.140 curl_by...@haxx.se GET - - 226 0 - - -

Jeff Sturm

Sep 19, 2012, 10:44:58 AM
to urchin...@googlegroups.com

If you need to preprocess the log files before reading them into Urchin, there are a couple things you can do.

 

One is to run a scheduled command that runs prior to Urchin's schedule to generate the logs.  So if Urchin is going to read the logs at 2 AM daily, run a job to read/transform/write the log files at 1 AM (for example).   This can be problematic if you don't know exactly how long the job requires to complete, and there is a possibility the schedules will overlap.
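A rough sketch of what such a scheduled job could look like in Python, using the cs-method regex quoted earlier in this thread. The function name preprocess_log and both paths are placeholders, and I am assuming Urchin is pointed at the output file. Writing to a temporary file and renaming it at the end does not stop the schedules from overlapping, but it should mean a reader sees either the previous complete file or the new complete file, never a half-written one.

```python
import os
import re
import tempfile

CS_METHOD = re.compile(r"\[.*\][A-Z]+\s")  # regex quoted earlier in the thread


def preprocess_log(src_path: str, dst_path: str) -> None:
    """Rewrite src_path into dst_path, publishing the result atomically."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # The temp file lives in the destination directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out, open(src_path) as src:
            for line in src:
                out.write(CS_METHOD.sub("GET ", line))
        os.replace(tmp_path, dst_path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial file behind
        raise
```

The job would be run from cron or Task Scheduler ahead of Urchin's own schedule, as described above.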

 

Another way is to fetch the logs remotely.  Urchin can use HTTP to retrieve log files, and the URL can invoke a remote CGI script to do the log file processing.  This can be a little more complex to set up but has the advantage that it will run just-in-time as Urchin needs logs.
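As an illustration only, a CGI script along these lines could be written in Python as below. LOG_PATH is a hypothetical location, the function name respond is made up, and the regex is the one quoted earlier in this thread. How the web server is configured to run the script, and what URL Urchin is given, are left out.

```python
#!/usr/bin/env python3
import os
import re
import sys

LOG_PATH = "/var/log/ftp/ex120722.log"  # hypothetical path to the raw FTP log
CS_METHOD = re.compile(r"\[.*\][A-Z]+\s")  # regex quoted earlier in the thread


def respond(lines, out) -> None:
    """Write a CGI response: header, blank line, then the rewritten log."""
    out.write("Content-Type: text/plain\r\n\r\n")
    for line in lines:
        out.write(CS_METHOD.sub("GET ", line))


# GATEWAY_INTERFACE is set by the web server when the script runs as CGI,
# so nothing is read or printed if the file is merely imported or run by hand.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    with open(LOG_PATH) as log:
        respond(log, sys.stdout)
```

The log is streamed line by line, so the rewrite happens at the moment the log is requested and no intermediate file is kept.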

 

I've used both techniques for web logs (not FTP) with good results.

 

-Jeff

 

