bug mining

2 views
Skip to first unread message

Ray Waldin

unread,
Jun 8, 2008, 6:08:45 PM6/8/08
to wide-...@googlegroups.com
Hi everyone,

I'm trying to track down two minor differences between the coolstack results and mine.  

diff coolstack.log myresults.txt 
7c7
<      100932: /ongoing/When/200x/2003/04/26/UTF
---
>      100934: /ongoing/When/200x/2003/04/26/UTF
21c21
<    36257.0M: /ongoing/
---
>    36257.1M: /ongoing/

Since nobody has claimed any time in the next hour or two, I'm going to try combing the O.all file for relevant records in order to inspect them manually.  By using fgrep to search for fixed strings, I'm hoping this takes no more than an hour.  If it takes longer, or someone wants to do a run, please feel free to kill the process, and drop me a note.

Thanks!

-Ray

Eric Wong

unread,
Jun 8, 2008, 6:11:59 PM6/8/08
to wide-finder
On Jun 8, 3:08 pm, "Ray Waldin" <r...@waldin.net> wrote:
> Hi everyone,
> I'm trying to track down two minor differences between the coolstack results
> and mine.
> diff coolstack.log myresults.txt
> 7c7
> < 100932: /ongoing/When/200x/2003/04/26/UTF
> ---> 100934: /ongoing/When/200x/2003/04/26/UTF
>
> 21c21
> < 36257.0M: /ongoing/
> ---
>
> > 36257.1M: /ongoing/

Sorry, no help there, I forgot to tee my output from last night's
run :<

> Since nobody has claimed any time in the next hour or two, I'm going to try
> combing the O.all file for relevant records in order to inspect them
> manually. By using fgrep to search for fixed strings, I'm hoping this takes
> no more than an hour. If it takes longer, or someone wants to do a run,
> please feel free to kill the process, and drop me a note.

I'll take the time to install mawk and/or gawk and test my portability
with
those while you're doing that :)

Alex Morega

unread,
Jun 8, 2008, 6:14:13 PM6/8/08
to wide-...@googlegroups.com

On Jun 9, 2008, at 01:08 , Ray Waldin wrote:

> Hi everyone,
>
> I'm trying to track down two minor differences between the coolstack
> results and mine.
>
> diff coolstack.log myresults.txt
> 7c7
> < 100932: /ongoing/When/200x/2003/04/26/UTF
> ---
> > 100934: /ongoing/When/200x/2003/04/26/UTF
> 21c21
> < 36257.0M: /ongoing/
> ---
> > 36257.1M: /ongoing/
>

I'm seeing different numbers:
100928: /ongoing/When/200x/2003/04/26/UTF
and
36256.8M: /ongoing/

So if you get those entries mined, I'd like a copy please.

Ray Waldin

unread,
Jun 8, 2008, 6:18:50 PM6/8/08
to wide-...@googlegroups.com
Sure, of course.  And in case you missed when Tim mentioned it the other day, the output of the coolstack run is at /wf1/data/code/twbray/coolstack.log

-Ray

Tim Bray

unread,
Jun 8, 2008, 6:19:44 PM6/8/08
to wide-...@googlegroups.com
On Sun, Jun 8, 2008 at 3:18 PM, Ray Waldin <r...@waldin.net> wrote:
> Sure, of course. And in case you missed when Tim mentioned it the other
> day, the output of the coolstack run is
> at /wf1/data/code/twbray/coolstack.log

And the ruby code that produced it is right there in the same
directory, stats.rb -T

Ray Waldin

unread,
Jun 8, 2008, 7:05:39 PM6/8/08
to wide-...@googlegroups.com
Ok I found the problem.  These two lines are being missed by stats.rb because they have an extra space after protocol/version, before the closing quote of the request field:
 
dyn-88-122-239-244.ppp.tiscali.fr - - [10/Apr/2008:20:10:34 -0700] "GET /ongoing/When/200x/2003/04/26/UTF HTTP/1.0 " 200 23468 "-" "-"
dyn-88-122-238-194.ppp.tiscali.fr - - [13/Apr/2008:02:21:25 -0700] "GET /ongoing/When/200x/2003/04/26/UTF HTTP/1.0 " 200 23468 "-" "-"

Alex, I suspect you are missing these and maybe a few others as well.  Instead of waiting for fgrep to finish, I just modded my app to spit out records with this get request, and then ran a modified version of stats.rb which only prints misses, against that output.  The file is in my home directory (mybugrecords1.log).  Try running yours and spit out any line that doesn't match, like this:

ARGF.each_line do |line|
  f = line.split(/\s+/)
  next unless f[5] == '"GET'
  u, status = f[6], f[8]
  if !(u == "/ongoing/When/200x/2003/04/26/UTF" && (status == '200' || status == '304'))
      print line
  end
end

Now if I can just track down that extra 0.1M for /ongoing/...

-Ray

Ray Waldin

unread,
Jun 8, 2008, 7:58:02 PM6/8/08
to wide-...@googlegroups.com
Here are two more records missed by stats.rb due to misplaced spaces:

ded947.cotswold-internet.net - - [09/Jan/2008:05:23:29 -0800] "GET /ongoing/ When/200x/2005/03/08/BloggingIsGood HTTP/1.0" 200 15326 "-" "-"
dyn-83-156-63-220.ppp.tiscali.fr - - [14/Apr/2008:20:08:43 -0700] "GET /ongoing/ HTTP/1.0 " 200 15380 "-" "-"

These are probably the reason my byte totals for /ongoing/ are higher than those in coolstack.log

The output for this request is in my homedir in mybugrecords2.log.

I'm all done for now.

-Ray

Tim Bray

unread,
Jun 8, 2008, 8:18:57 PM6/8/08
to wide-...@googlegroups.com
On Sun, Jun 8, 2008 at 4:58 PM, Ray Waldin <r...@waldin.net> wrote:

>> Ok I found the problem. These two lines are being missed by stats.rb
>> because they have an extra space after protocol/version, before the closing
>> quote of the request field:

OK, I'll change the ruby code, but for the moment I'm not going to
burn another 26 CPU-hours re-running. I'll do that eventually but I
think the currently-recorded run is OK as a basis for performance
comparison. -Tim

Ray Waldin

unread,
Jun 8, 2008, 8:27:13 PM6/8/08
to wide-...@googlegroups.com
No, didn't mean for you to have to do that.  I just wanted to understand why my results were different and share what I learned.

-Ray

Eric Wong

unread,
Jun 8, 2008, 10:21:11 PM6/8/08
to wide-finder
Looks like my awk results are also different from Tim's:

I get the same results from both runs I just did:

mawk(worker)+perl(reducer)
mawk(worker)+gawk(reducer)

I've implemented a fix for the shifted quote after protocol
in the mawk code.

--- coolstack.log 2008-06-08 19:05:00.000000000 -0700
+++ ,sh 2008-06-08 19:04:30.000000000 -0700
@@ -1,4 +1,266 @@
-4682 resources, 128916 404s, 3573463 clients
+4682 resources, 128916 404s, 3573470 clients

Top URIs by hit:
218540: /ongoing/When/200x/2003/09/18/NXML
@@ -6,21 +268,21 @@
129746: /ongoing/When/200x/2003/10/16/Debbie
110427: /ongoing/When/200x/2003/07/17/BrowserDream
102513: /ongoing/When/200x/2003/07/30/OnSearchTOC
- 100932: /ongoing/When/200x/2003/04/26/UTF
+ 100934: /ongoing/When/200x/2003/04/26/UTF
94408: /ongoing/When/200x/2003/07/25/NotGaming
93060: /ongoing/When/200x/2003/04/06/Unicode
92006: /ongoing/When/200x/2006/01/31/Data-Protection
89795: /ongoing/When/200x/2004/12/12/BMS

Top URIs by bytes:
- 870656.0M: /ongoing/ongoing.atom
+ 870656.1M: /ongoing/ongoing.atom
373564.5M: /ongoing/potd.png
278894.3M: /ongoing/ongoing.rss
91443.9M: /ongoing/rsslogo.jpg
63061.5M: /ongoing/When/200x/2004/08/30/-big/IMGP0851.jpg
39645.0M: /ongoing/When/200x/2006/05/16/J1d0.mov
36557.5M: /ongoing/When/200x/2007/12/14/Shonen-Knife.mov
- 36257.0M: /ongoing/
+ 36257.1M: /ongoing/
33337.3M: /ongoing/moss60.jpg
31966.7M: /ongoing/When/200x/2004/02/18/Bump.png

Alex Morega

unread,
Jun 9, 2008, 12:26:45 AM6/9/08
to wide-...@googlegroups.com

On Jun 9, 2008, at 02:05 , Ray Waldin wrote:

> Ok I found the problem. These two lines are being missed by
> stats.rb because they have an extra space after protocol/version,
> before the closing quote of the request field:

As I'm splitting the line based on spaces, this will definitely bite
me. Back to regexp's, and this time I'll look for the quote marks.
Thanks!
-- Alex

Reply all
Reply to author
Forward
0 new messages