Run HTTP Archive on a given set of sites?

Shruti Kanda

Feb 15, 2013, 12:58:38 AM
to httpa...@googlegroups.com
Hi, 

I can see the results for 100/1000 sites, etc., on httparchive.

Just curious: is there a way to submit a set of sites, let's say 100 sites, so that HTTP Archive can traverse those sites and return results, like how many sites are using custom fonts, etc.?


Please advise,
Thanks,
Shruti

Steve Souders

Feb 15, 2013, 12:18:19 PM
to httpa...@googlegroups.com
Hi, Shruti.

There's no feature to generate or extract results for a user-specified set of URLs.

-Steve

Charlie Clark

Feb 15, 2013, 4:22:07 PM
to httpa...@googlegroups.com
Hi,

On 15.02.2013 at 06:58, Shruti Kanda <shrut...@gmail.com> wrote:

> Hi,
> I can see the results for 100/1000 sites, etc., on httparchive.

> Just curious: is there a way to submit a set of sites, let's say 100
> sites, so that HTTP Archive can traverse those sites and return results,
> like how many sites are using custom fonts, etc.?

Those are actually two separate questions: you cannot bulk submit URLs to
the archive, just individual ones. This makes sense, since the rights of
website owners to request that their site not be included must be respected.

A local install of the httparchive.org code would allow you to define your
own "slices" and use the existing code to generate aggregate stats for
your sites. Additionally, you can easily run your own comparative stats
on that slice. This is something I've been doing for a while, and I plan
to add support for it in my fork.

WebPageTest.org also provides an interface, essentially for low-volume
tests, that allows you to pass a list of URLs for testing, but you then
have to do the parsing and analysis yourself, or adapt Steve's code for
this.

See:
https://sites.google.com/a/webpagetest.org/docs/advanced-features/webpagetest-batch-processing-command-line-tool

for more information.

I've been hacking on this Python tool and plan to add results parsing soon.
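
A minimal sketch of how a list of sites might be turned into WebPageTest
test requests, assuming the runtest.php endpoint with its url, k, f and
runs parameters and an API key of your own. The function name and the
placeholder key are hypothetical, and this is not the batch tool linked
above; it only builds the request URLs and sends nothing.

```python
from urllib.parse import urlencode

# Public WebPageTest endpoint for starting a test run.
WPT_RUNTEST = "https://www.webpagetest.org/runtest.php"


def build_test_requests(urls, api_key, runs=1):
    """Return one runtest.php request URL per site in `urls`."""
    requests = []
    for url in urls:
        params = {
            "url": url,    # page to test
            "k": api_key,  # API key (a placeholder in this sketch)
            "f": "json",   # ask for a JSON response
            "runs": runs,  # number of test runs per page
        }
        requests.append(WPT_RUNTEST + "?" + urlencode(params))
    return requests


if __name__ == "__main__":
    sites = ["http://example.com/", "http://example.org/"]
    for request_url in build_test_requests(sites, "YOUR_API_KEY"):
        print(request_url)
```

Each returned URL would still have to be fetched, and the resulting test
polled until it finishes, before any results can be parsed.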

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Shruti Kanda

Feb 16, 2013, 5:14:26 AM
to httpa...@googlegroups.com
OK, so one way could be to download the code and then run the desired set of URLs on it.

Also, I noticed a link from the HTTP Archive news, http://httparchive.org/addsite.php, which allows adding a site URL to be crawled in the next crawl run.
But is there a way to get the stats for only that specific site?

-shruti

Charlie Clark

Feb 18, 2013, 10:32:41 AM
to httpa...@googlegroups.com
On 16.02.2013 at 11:14, Shruti Kanda <shrut...@gmail.com> wrote:

> OK, so one way could be to download the code and then run the desired
> set of URLs on it.
> Also, I noticed a link from the HTTP Archive news,
> http://httparchive.org/addsite.php, which allows adding a site URL to be
> crawled in the next crawl run.

Yes, but this is not designed for bulk use. The number of URLs has stayed
pretty flat since September 2011 so I would strongly advise against trying
to add lots of URLs at the moment. If you want to do this then the correct
thing to do is either run your own WPT instance or manage your own tests
on WPT.org using the bulk interface.

> But is there a way to get the stats for only that specific site?

The statistics for individual sites are already available. If there is a
statistic you are interested in and it is not provided then you must write
your own query for it.
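
As a rough illustration of such a query, the sketch below counts how many
sites in a given set made at least one font request. It runs against an
in-memory SQLite table standing in for a locally loaded pages table; the
column names url and reqFont, the sample rows and the function name are
assumptions for the example, not a confirmed schema.

```python
import sqlite3


def count_font_users(conn, urls):
    """Count how many of `urls` have at least one font request recorded."""
    urls = list(urls)
    if not urls:
        return 0
    # One "?" placeholder per URL keeps the query parameterised.
    placeholders = ",".join("?" for _ in urls)
    query = (
        "SELECT COUNT(*) FROM pages "
        f"WHERE url IN ({placeholders}) AND reqFont > 0"
    )
    return conn.execute(query, urls).fetchone()[0]


if __name__ == "__main__":
    # In-memory stand-in for the real table; the rows are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, reqFont INTEGER)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("http://a.example/", 2),
            ("http://b.example/", 0),
            ("http://c.example/", 1),
        ],
    )
    print(count_font_users(conn, ["http://a.example/", "http://b.example/"]))
```

The same SELECT should carry over to a MySQL install with little change,
provided the real column names are substituted.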

Shruti Kanda

Feb 19, 2013, 7:01:56 AM
to httpa...@googlegroups.com
Thanks Chris for helping me out.

Just curious if I can get specific site stats from http://httparchive.org/downloads.php?

Table pages
Location: IE: pages (mysql, CSV)?

Please advise.

Thanks,
Shruti



Charlie Clark

Feb 19, 2013, 1:02:33 PM
to httpa...@googlegroups.com
On 19.02.2013 at 13:01, Shruti Kanda <shrut...@gmail.com> wrote:

> Thanks Chris for helping me out.
> Just curious if I can get specific site stats from
> http://httparchive.org/downloads.php?
> Table pages
> Location: IE: pages
> (mysql <http://www.archive.org/download/httparchive_downloads/httparchive_Jan_15_2013_pages.gz>,
> CSV <http://www.archive.org/download/httparchive_downloads/httparchive_Jan_15_2013_pages.csv.gz>)

Shruti,

there is no such thing as a free lunch. Instructions on how to use the
data are on the downloads page. If you have no experience with databases
then you will need to find another source for your research. In the
meantime it might help if you tell us exactly what you are looking for.
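
For anyone who only wants the rows for one site without loading a
database, a hedged sketch of streaming a gzipped CSV dump and keeping the
matching rows follows. The position of the URL column and the function
name are assumptions; check the downloads page for the actual column
order, and note that a MySQL-exported CSV may need different quoting or
escaping settings than the csv module defaults used here.

```python
import csv
import gzip


def rows_for_site(path, site_url, url_column):
    """Yield every row of a gzipped CSV whose URL field equals `site_url`."""
    # Text mode with newline="" is what the csv module expects.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip short or malformed rows rather than failing on them.
            if len(row) > url_column and row[url_column] == site_url:
                yield row


if __name__ == "__main__":
    # Hypothetical local file name and column index, for illustration only.
    for row in rows_for_site("pages.csv.gz", "http://www.example.com/", 1):
        print(row)
```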