HAR file?

Shruti Kanda

Feb 19, 2013, 7:09:49 AM
to httpa...@googlegroups.com
Hi

I was just going through the pages and requests tables and the data they provide.

Just curious: how do you get site information like user-agent, redirection with status 200, images, etc.?
Do you get all this information from a HAR file before running the crawler?

Thanks,
Shruti


Pat Meenan

Feb 19, 2013, 1:54:27 PM
to httpa...@googlegroups.com
The individual metrics are captured by WebPagetest as part of the
testing and then stored/aggregated into the database. The "crawler" is
WebPagetest and it collects the data while running the tests.

Charlie Clark

Feb 20, 2013, 5:05:22 AM
to httpa...@googlegroups.com
Hi Shruti,

The requests table essentially contains a dump of every request for every
crawl. Please bear in mind that this means around 40 GB of disk space per
dataset and a non-normalised structure, as it is more of a log than a
relation. That said, queries for individual sites can usually be performed
at an acceptable speed.

However, the data for individual runs (there are generally three runs per
site per crawl) is available from WebPageTest.org, including the HAR file.
See the API for details.

https://sites.google.com/a/webpagetest.org/docs/advanced-features/webpagetest-restful-apis

If you are only cherry-picking a few URLs then this is probably the way to go.
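
As a rough illustration of what you can pull out of a HAR file once you have one, here is a minimal Python sketch. It relies only on the standard HAR layout (log.entries, request.headers, response.status, response.content.mimeType). The export.php URL pattern, the TEST_ID placeholder and the sample data are assumptions made for the example, so please check the API documentation for the exact endpoint.

```python
from collections import Counter


def har_export_url(test_id, base="https://www.webpagetest.org"):
    # Assumed URL pattern for the HAR export of a finished test;
    # verify it against the WebPageTest API documentation.
    return f"{base}/export.php?test={test_id}"


def summarize_har(har):
    """Summarise a HAR document that has already been parsed into a dict."""
    entries = har["log"]["entries"]

    # Count responses per HTTP status code (200, 301, ...).
    statuses = Counter(e["response"]["status"] for e in entries)

    # Count responses whose MIME type marks them as images.
    images = sum(
        1
        for e in entries
        if e["response"].get("content", {}).get("mimeType", "").startswith("image/")
    )

    # Take the first User-Agent request header that appears.
    user_agent = None
    for e in entries:
        for header in e["request"].get("headers", []):
            if header["name"].lower() == "user-agent":
                user_agent = header["value"]
                break
        if user_agent is not None:
            break

    return {
        "requests": len(entries),
        "statuses": dict(statuses),
        "images": images,
        "user_agent": user_agent,
    }


# Tiny hand-written HAR fragment (made-up data, not a real test result).
sample_har = {
    "log": {
        "entries": [
            {
                "request": {
                    "url": "http://example.com/",
                    "headers": [{"name": "User-Agent", "value": "ExampleBrowser/1.0"}],
                },
                "response": {"status": 301, "content": {"mimeType": "text/html"}},
            },
            {
                "request": {"url": "http://www.example.com/", "headers": []},
                "response": {"status": 200, "content": {"mimeType": "text/html"}},
            },
            {
                "request": {"url": "http://www.example.com/logo.png", "headers": []},
                "response": {"status": 200, "content": {"mimeType": "image/png"}},
            },
        ]
    }
}

summary = summarize_har(sample_har)
print(har_export_url("TEST_ID"))  # TEST_ID is a placeholder, not a real test id
print(summary)
```

With a real export you would load the downloaded file with json.load() and pass the result to summarize_har() instead of the sample dict.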

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Shruti Kanda

Mar 7, 2013, 12:24:17 AM
to httpa...@googlegroups.com
Thank you for clearing my queries.

So instead of using the requests table, I got the data from the pages table (IE version). I have the following questions:

1. Does pages contain only the data/URLs that the crawler ran on the specified date?
2. What is the difference between Pages IE and iPhone version?

Thanks,
Shruti

Charlie Clark

Mar 20, 2013, 7:34:16 PM
to httpa...@googlegroups.com
Hi Shruti,

On 07.03.2013 at 06:24, Shruti Kanda <shrut...@gmail.com> wrote:

> Thank you for clearing my queries.
> So instead of using the requests table, I got the data from the pages
> table (IE version). I have the following questions:

> 1. Does pages contain only the data/URLs that the crawler ran on the
> specified date?

I'm not sure if I understand you correctly. The "label" in pages refers to
when a crawl started. It's not really possible to run all the tests at
once so they happen over a period of time. The actual time and date when
the test for a particular page is run is noted in createDate.
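
To make the difference between label and createDate concrete, here is a small Python sketch using an in-memory SQLite table. Only the column names label and createDate come from the pages table; the other columns, the label format, the storage of createDate as a Unix timestamp and all the rows are assumptions made up for the example.

```python
import sqlite3
from datetime import datetime, timezone

# Toy stand-in for the pages table. pageid and url are hypothetical columns,
# and storing createDate as epoch seconds is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    "pageid INTEGER PRIMARY KEY, url TEXT, label TEXT, createDate INTEGER)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        # Both pages belong to the crawl labelled "Mar 1 2013" ...
        (1, "http://example.com/", "Mar 1 2013", 1362096000),  # 2013-03-01 UTC
        # ... but this one was actually tested two days later.
        (2, "http://example.org/", "Mar 1 2013", 1362268800),  # 2013-03-03 UTC
    ],
)

# For each crawl label, find when the first and last page tests really ran.
label, first_ts, last_ts, count = conn.execute(
    "SELECT label, MIN(createDate), MAX(createDate), COUNT(*) "
    "FROM pages GROUP BY label"
).fetchone()


def to_date(ts):
    # Convert epoch seconds to an ISO date in UTC.
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()


first = to_date(first_ts)
last = to_date(last_ts)
print(label, first, last, count)
```

So filtering on label selects a whole crawl, while filtering on createDate selects pages by the time they were actually tested.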

> 2. What is the difference between Pages IE and iPhone version?

iPhone refers to mobile websites where the crawler actually uses an
iPhone to do the surfing. That's what you see on
http://mobile.httparchive.org. IE refers to tests done with a desktop
browser, currently IE 9.