Failing log line


Paul Nasrat

Mar 19, 2008, 11:44:02 AM
to loghet...@googlegroups.com
I was trying out loghetti on a large log file and hit an exception, so
I ran it under debug. The root cause was a space in the urlbase. I'm
not sure what we should do here, but this is the traceback:

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/CommandLineApp.py", line 386, in run
    exit_code = self.main(*main_args)
  File "./loghetti.py", line 216, in main
    for line in myfilter.strainer():
  File "./loghetti.py", line 46, in strainer
    for line in self.log:
  File "/home/pnasrat/Development/loghetti-read-only/loghetti/apachelogs.py", line 85, in __iter__
    log_line = ApacheLogLine(*m.groups())
  File "/home/pnasrat/Development/loghetti-read-only/loghetti/apachelogs.py", line 57, in __init__
    self.http_method, self.url, self.http_vers = self.request_line.split()
ValueError: too many values to unpack

Attached is a failing test case and log file source. I'm not sure what
the correct behaviour should be in this case.

Paul

loghetti-urlbase-hasspace.patch

Brian Jones

Mar 19, 2008, 12:28:28 PM
to loghet...@googlegroups.com
Thanks, Paul,

That's really odd to see. It appears that the request part of that log
line has two different URLs in it. There's "/www" and then,
separately, the entire URL as the user typed it into the address bar.
To me, this looks like a non-standard log formatting configuration,
because in the standard Apache combined format log, you have the HTTP
method, followed by the *relative* URL (i.e., not including
"http://www.example.com"), followed by the HTTP version.

Can you send along:

1. the format string for the line that is labeled 'combined' in your
apache config file.
2. the apache version you're running
3. the os you're running on

Please? That'll help us to get things straightened out.
Thanks a lot!!
brian.

--
Brian K. Jones
Python Magazine http://www.pythonmagazine.com
My Blog http://www.protocolostomy.com

Paul Nasrat

Mar 19, 2008, 1:07:34 PM
to loghet...@googlegroups.com
>
> That's really odd to see. It appears that the request part of that log
> line has two different url's in it. There's "/www" and then,
> separately, the entire url as the user typed into the address bar. To
> me, this looks like a non-standard log formatting configuration,
> because in the standard apache combined format log, you have the HTTP
> method, followed by the *relative* url (i.e., not including
> "http://www.example.com"), followed by the HTTP version.

I assume it's from a broken browser. We're just using the combined log
format in Apache 2.2.6. We do have some custom stuff after User-Agent,
but I've just reproduced it on a clean Ubuntu install (gutsy) of
apache2 (2.2.4-3ubuntu0.1), which has:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Simple reproducer:

echo -e "GET /www http://www.example.com HTTP/1.0\n\n" | nc localhost 80

Or
import urllib2
urllib2.urlopen("http://localhost/www http://www.example.com")

> 1. the format string for the line that is labeled 'combined' in your
> apache config file.
> 2. the apache version you're running
> 3. the os you're running on
>
> Please? That'll help us to get things straightened out.

So I think some broken clients might not be url encoding spaces.
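
To make the failure concrete, here is a minimal sketch of what happens to the request line from the reproducer above, assuming Apache logs it verbatim through %r. It is written for Python 3 (the thread itself used Python 2), and the variable names are illustrative only, not taken from loghetti.

```python
from urllib.parse import quote  # Python 3 location of quote()

# Request line as sent by the nc reproducer above (assumed to be logged verbatim).
request_line = "GET /www http://www.example.com HTTP/1.0"

# Whitespace splitting yields four tokens, so unpacking into three names
# (method, url, version) raises "too many values to unpack".
tokens = request_line.split()
print(len(tokens))

# A client that percent-encodes the space sends a path with no whitespace,
# so the resulting request line splits into three tokens again.
raw_path = "/www http://www.example.com"
encoded_path = quote(raw_path)
encoded_line = "GET %s HTTP/1.0" % encoded_path
print(encoded_line.split())
```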

Paul

Brian Jones

Mar 19, 2008, 2:00:34 PM
to loghet...@googlegroups.com
Thanks, Paul.

If you're using the "supported" log format, from the supported web
server, and this is winding up in the log, then it would make sense
for us to account for it. So we'll do that. I probably won't get to it
immediately (I'm in the middle of some other loghetti-code) ;-P -- but
I plan to release a new tarball next week some time, and the fix is
likely to be in there. It might make it to svn sooner if you're using
that for updates.

Thanks again for pointing out this issue.
brian.

--

Paul Nasrat

Mar 19, 2008, 8:26:07 PM
to loghet...@googlegroups.com

On 19 Mar 2008, at 18:00, Brian Jones wrote:

>
> Thanks, Paul.
>
> If you're using the "supported" log format, from the supported web
> server, and this is winding up in the log, then it would make sense
> for us to account for it. So we'll do that. I probably won't get to it
> immediately (I'm in the middle of some other loghetti-code) ;-P -- but
> I plan to release a new tarball next week some time, and the fix is
> likely to be in there. It might make it to svn sooner if you're using
> that for updates.

I'm happy to supply a patch, I'm just not sure what the desired
behaviour is here.

It's a corner case, but one coming from the "be liberal in what you
accept" theory of servers.

We should fail gracefully or at least throw a meaningful exception in
this case. I'm happy to get the parser to DTRT once we have a common
understanding of what that is.

Also would you be happy with some non-functional patches refactoring
the test cases a bit?

Cheers

Paul

Brian Jones

Mar 19, 2008, 8:52:10 PM
to loghet...@googlegroups.com
On Wed, Mar 19, 2008 at 8:26 PM, Paul Nasrat <pna...@googlemail.com> wrote:

> I'm happy to supply a patch, I'm just not sure what the desired
> behaviour is here.

Yeah, that's kind of tough, actually, because of two things:

1. supporting this 'special case' means supporting untold numbers of
other special cases.
2. considering item 1., figuring out what "DTRT" actually *is* is
difficult. Is it safe to assume that if there are too many fields, we
can throw away the second field? What if that bit was on the other
side of the more traditional url? Then we'd throw away the important
bit!
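
One way to sidestep the question of which field to throw away is to discard nothing. The sketch below is only an illustration of that option, not something the project has agreed on. It assumes the first token is always the method and the last token is always the HTTP version, and it keeps everything in between as the URL. The helper name split_request_line is hypothetical.

```python
def split_request_line(request_line):
    """Hypothetical helper: return (method, url, version).

    Takes the method from the left and the HTTP version from the right,
    and treats everything in between as the URL, embedded spaces included.
    Raises ValueError if the line has fewer than three tokens.
    """
    method, rest = request_line.split(None, 1)
    url, version = rest.rsplit(None, 1)
    return method, url, version


# A normal request line and the problem line from this thread.
print(split_request_line("GET /index.html HTTP/1.0"))
print(split_request_line("GET /www http://www.example.com HTTP/1.0"))
```

The trade-off is that the URL field can then contain spaces, so anything downstream that assumes a whitespace-free URL would need checking.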

>
> We should fail gracefully or at least throw a meaningful exception in
> this case. I'm happy to get the parser to DTRT once we have common
> understanding what that is.

I wonder if we shouldn't just punt on the special cases in favor of
offering some output to STDERR saying "There were $x
invalid/unparseable lines", maybe even dump those lines to a log file?
That would probably be simple to do within a simple try/except block.
Thoughts?
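
A rough sketch of that try/except idea follows. It is written against a stand-in parser rather than the real loghetti code: parse_request_line and parse_all are hypothetical names, and the stand-in just mirrors the three-way unpack shown in the traceback.

```python
import sys


def parse_request_line(request_line):
    # Stand-in for the real parser: the same three-way unpack as in the traceback.
    method, url, version = request_line.split()
    return method, url, version


def parse_all(request_lines, err=sys.stderr):
    """Yield parsed lines, skip unparseable ones, and report the count at the end."""
    skipped = []
    for line in request_lines:
        try:
            parsed = parse_request_line(line)
        except ValueError:
            # Keep the raw line so it could also be dumped to a log file.
            skipped.append(line)
            continue
        yield parsed
    if skipped:
        err.write("There were %d invalid/unparseable lines\n" % len(skipped))


lines = ["GET / HTTP/1.0", "GET /www http://www.example.com HTTP/1.0"]
for parsed in parse_all(lines):
    print(parsed)
```

Because parse_all is a generator, the summary is only written once the caller has consumed every line.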

>
> Also would you be happy with some non-functional patches refactoring
> the test cases a bit?

I will have to expose my ignorance here. Can you explain what this
means? I definitely need to get a better grip on testing, and I should
do it sooner rather than later. :-/

brian

>
> Cheers
