Failing log line, revisited

Mohanaraj

Mar 28, 2008, 5:48:55 AM
to loghetti-dev
Hello all,

Somewhat like Paul Nasrat's issue: while parsing a rather large log
file, loghetti halted. Turning on debug mode produced the following
exception:

Traceback (most recent call last):
  File "build/bdist.linux-i686/egg/CommandLineApp.py", line 386, in run
    exit_code = self.main(*main_args)
  File "loghetti.py", line 230, in main
    for line in myfilter.strainer():
  File "loghetti.py", line 45, in strainer
    for line in self.log:
  File "/home/lca/loghetti/apachelogs.py", line 82, in __iter__
    log_line = ApacheLogLine(*m.groups())
  File "/home/lca/loghetti/apachelogs.py", line 54, in __init__
    self.http_method, self.url, self.http_vers = self.request_line.split()
ValueError: need more than 1 value to unpack

It turns out the log line it was choking on looks as follows:

60.51.36.91 - - [26/Mar/2008:19:44:20 +0800] "\xff\xf4\xff\xfd\x06" 501 327 "-" "-"

So what's probably happening here is that a non-standard client sent a
request with no method or version, so the request line splits into a
single token and the three-way unpack on line 54 fails. Probably some
probe looking for vulnerabilities.
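
A minimal reproduction of that failure, assuming the request field holds the escaped text exactly as it appears in the log (the variable names here are for illustration only):

```python
# The request field as logged; the backslash escapes are literal text in the log.
request_line = r"\xff\xf4\xff\xfd\x06"

# There is no whitespace in the field, so split() returns a single token.
parts = request_line.split()
print(len(parts))

try:
    # The same three-way unpack that apachelogs.py attempts on line 54.
    http_method, url, http_vers = parts
except ValueError as exc:
    print("unpack failed:", exc)
```

Any request line that does not split into exactly three tokens would fail the same way.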

I worked around it by changing the code as follows:

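    # Split once, then unpack only when all three parts are present.
    request_line_components = self.request_line.split()
    if len(request_line_components) == 3:
        self.http_method, self.url, self.http_vers = request_line_components
    else:
        # Malformed request: keep the raw text and flag the missing fields.
        self.url = self.request_line
        self.http_method = 'UNDEFINED'
        self.http_vers = 'UNDEFINED'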

It's just a hack so I could continue parsing. Probably not code you
would want in the src tree ;)

I think Brian's earlier suggestion, that lines which do not comply
should just be dumped into an error file, makes sense.

What would be cool is to validate the log format up front and, if a
line does not comply, call an error handler. Users who have special
needs could then register their own error handler for the unexpected
cases, while the default error handler prints to stderr or to a file.
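
A rough sketch of what that could look like. `RequestLineParser`, `register_error_handler`, and `default_error_handler` are hypothetical names made up for this example and are not part of loghetti:

```python
import sys


def default_error_handler(line, reason):
    """Default behaviour: report the offending line on stderr."""
    sys.stderr.write("skipped (%s): %s\n" % (reason, line))


class RequestLineParser(object):
    """Hypothetical parser illustrating a pluggable error handler."""

    def __init__(self, error_handler=default_error_handler):
        self.error_handler = error_handler

    def register_error_handler(self, handler):
        # Replace the default handler with a user-supplied callable.
        self.error_handler = handler

    def parse(self, request_line):
        parts = request_line.split()
        if len(parts) != 3:
            # Validate first; hand anything unexpected to the handler.
            reason = "expected 3 fields, got %d" % len(parts)
            self.error_handler(request_line, reason)
            return None
        return tuple(parts)


# Usage: collect rejected lines in a list instead of printing them.
rejected = []
parser = RequestLineParser()
parser.register_error_handler(lambda line, reason: rejected.append(line))
good = parser.parse("GET /index.html HTTP/1.1")
bad = parser.parse(r"\xff\xf4\xff\xfd\x06")
```

With the default handler left in place, the bad line would be reported on stderr instead of being collected.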

My 2 cents. Keep up the good work. I look forward to my next 'svn up'

Mohan

Brian Jones

Mar 28, 2008, 8:30:24 PM
to loghet...@googlegroups.com
Hi Mohanaraj,

Thanks for reporting this. Rather than check for every single special
case as it arises, I think it'll be more efficient to let such lines
occur without upsetting the program, but also without failing silently
(which could skew results). The best approach is probably to check
that a line conforms to the defined format (Apache's combined format)
and, if it doesn't, shuffle the non-conformant line off to a separate
file (perhaps a 'loghetti.log' file).
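
One way this might be sketched is below. The regular expression is an assumed approximation of the combined format, not the pattern loghetti actually uses, and `split_conformant` is a made-up helper name. The first sample line is invented for the example; the second is the line from the report above.

```python
import io
import re

# Assumed pattern for Apache's combined format; loghetti's own regex may differ.
COMBINED = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] '
    r'"(\S+) (\S+) (\S+)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)


def split_conformant(lines, reject_file):
    """Yield matches for conformant lines; write the rest to reject_file."""
    for line in lines:
        line = line.rstrip("\n")
        m = COMBINED.match(line)
        if m is None:
            # Not silently dropped: the raw line is kept for later inspection.
            reject_file.write(line + "\n")
            continue
        yield m


sample = [
    '127.0.0.1 - - [26/Mar/2008:19:44:19 +0800] "GET / HTTP/1.1" 200 512 "-" "curl"',
    r'60.51.36.91 - - [26/Mar/2008:19:44:20 +0800] "\xff\xf4\xff\xfd\x06" 501 327 "-" "-"',
]
rejects = io.StringIO()  # stands in for an open 'loghetti.log' file
accepted = list(split_conformant(sample, rejects))
```

Here the well-formed line is yielded for normal processing, and the probe line ends up in the reject stream, so results are neither interrupted nor skewed without a trace.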

I've been forced to take a (short!) break from loghetti due to
work-related.... work :-) But I'll get on this if nobody else does,
probably toward the end of next week. Until then, please bear with me,
and keep watching svn :-)

Thanks for trying out loghetti!
brian.

--
Brian K. Jones
Python Magazine http://www.pythonmagazine.com
My Blog http://www.protocolostomy.com
