Missing test reports


Charlie Clark

Jun 24, 2015, 1:21:32 PM
to httpa...@googlegroups.com
Hiya,

It seems that some of the test reports are missing. Is there a
systematic way of checking this? I'm seeing it affect some sites more than
others, but I suspect this is random.

For example,
http://httparchive.webpagetest.org/result/150615_H_AYAN/

but
http://httparchive.webpagetest.org/result/150601_D_8665/
(the same site, one month earlier) is missing.

Is the data being lost? Or perhaps being misplaced?

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Patrick Meenan

Jun 24, 2015, 1:42:05 PM
to httpa...@googlegroups.com
Not misplaced, but I'm getting a zero-byte file when I try to extract that specific test from the Internet Archive data store where we archive the results.  We store the results in ~10GB zip files and then extract the individual tests from those as needed.  I'm pulling down the full zip file right now to see whether the problem was in the extraction or whether the test never made it into the zip file when we archived it.
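
To illustrate the distinction above (test never archived vs. archived as an empty file), here is a minimal sketch, not the actual archiving code. It assumes each test's files sit under a member path that begins with the test ID followed by a slash; that layout and the function name `check_test_in_archive` are hypothetical and not confirmed anywhere in this thread.

```python
import io
import zipfile


def check_test_in_archive(archive, test_id):
    """Classify a test inside a zip archive as 'missing', 'empty', or 'ok'.

    `archive` may be a path or a seekable file-like object.
    Assumption (hypothetical layout): a test's files are stored under
    member names that start with "<test_id>/".
    """
    prefix = test_id + "/"
    with zipfile.ZipFile(archive) as zf:
        # Collect the file entries (not directory entries) for this test.
        members = [
            info for info in zf.infolist()
            if info.filename.startswith(prefix) and not info.is_dir()
        ]
    if not members:
        # The test never made it into the archive.
        return "missing"
    if all(info.file_size == 0 for info in members):
        # Entries exist, but every one of them is zero bytes.
        return "empty"
    return "ok"


# Small in-memory archive standing in for one of the large zip files.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("150601_D_0001/results.json", "{}")
    zf.writestr("150601_D_0002/results.json", "")

for tid in ("150601_D_0001", "150601_D_0002", "150601_D_8665"):
    print(tid, check_test_in_archive(sample, tid))
```

Reading only the zip's central directory this way avoids decompressing anything, which matters when the archives are around 10GB each.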


Patrick Meenan

Jun 24, 2015, 1:53:52 PM
to httpa...@googlegroups.com
OK, something is very messed up: the 150601_D archive actually contains 150601_E tests.  I'm checking how that could possibly happen right now.
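
A mix-up like this could in principle be detected by comparing each member's test-ID prefix against the archive's name. The following is only a sketch under assumptions: that member names begin with a test ID shaped like `150601_D_8665`, and that the archive is expected to hold a single prefix such as `150601_D`. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches the leading "<YYMMDD>_<group>" part of a test ID such as
# "150601_D_8665". The ID shape is inferred from the URLs in this thread.
TEST_ID_PREFIX = re.compile(r"^(\d{6}_[0-9A-Z]+)_")


def shard_prefixes(member_names):
    """Count how many member names fall under each test-ID prefix."""
    counts = Counter()
    for name in member_names:
        match = TEST_ID_PREFIX.match(name)
        counts[match.group(1) if match else "<unrecognized>"] += 1
    return counts


def mismatched_prefixes(member_names, expected_prefix):
    """Return {prefix: count} for members that do not belong in the archive."""
    return {
        prefix: count
        for prefix, count in shard_prefixes(member_names).items()
        if prefix != expected_prefix
    }


# Example with made-up member names for an archive expected to hold 150601_D.
names = [
    "150601_E_0001/results.json",
    "150601_E_0002/results.json",
    "150601_D_0003/results.json",
]
print(mismatched_prefixes(names, "150601_D"))
```

For a real archive, the member names could come from `zipfile.ZipFile(path).namelist()`.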

Charlie Clark

Jun 29, 2015, 9:40:15 AM
to httpa...@googlegroups.com
On .06.2015, 19:53, Patrick Meenan <patm...@gmail.com> wrote:

> ok, something very messed up going on where the 150601_D archive actually
> contains 150601_E tests. Checking to see how that could possibly happen
> right now.

Just wondering, but is this the same as this bug:
https://github.com/HTTPArchive/httparchive/issues/44

Patrick Meenan

Jun 29, 2015, 9:51:17 AM
to httpa...@googlegroups.com
No, shouldn't be.  In that case the underlying WPT data is still there and available.  In this case the archived data in the underlying WPT infrastructure isn't available.

Charlie Clark

Jun 29, 2015, 9:54:18 AM
to httpa...@googlegroups.com
On .06.2015, 15:51, Patrick Meenan <patm...@gmail.com> wrote:

> No, shouldn't be. In that case the underlying WPT data is still there
> and available. In this case the archived data in the underlying WPT
> infrastructure isn't available.

Okay, I wasn't sure because it's still open.

Does this mean it's lost? Any idea why it seems to affect some sites more
than others?

Patrick Meenan

Jun 29, 2015, 9:57:11 AM
to httpa...@googlegroups.com
Yes, it looks like some set of the data is lost, and I'm not 100% sure why yet.  I've made a few changes that should help and should make it impossible, but without knowing the root cause it's hard to know for sure.  I also have no idea why it hit certain sites more than others, other than that the URLs always run in the same sequence, so it's possible that something timing-related just tends to happen at that point in the crawl.
