October 1 crawl will take longer


Patrick Meenan

Oct 2, 2016, 9:11:46 AM
to httpa...@googlegroups.com
I turned on timeline capture for Chrome, which also adds a step where the timelines are parsed and extends the test time. I'll be monitoring the progress to make sure it still completes before the 10/15 crawl kicks off, but it will finish later than normal (by several days).

Steve Souders

Oct 2, 2016, 1:36:06 PM
to httpa...@googlegroups.com
We only did 28K in the first day, which means at that rate we'll only get through 450K URLs. But we started late and had the high-CPU Python stuff for a while. Thanks for tracking. Fun to see how we do and what we can do with that IMPORTANT data.

-Steve


--
You received this message because you are subscribed to the Google Groups "HTTP Archive" group.
To unsubscribe from this group and stop receiving emails from it, send an email to httparchive+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Patrick Meenan

Oct 2, 2016, 1:47:33 PM
to httpa...@googlegroups.com
I'm tuning and profiling as we go. I just pushed a 4x improvement in trace parsing, for example. If it doesn't improve significantly in a day or two, I can turn off timelines and make another run at it later.

Patrick Meenan

Oct 2, 2016, 8:25:51 PM
to httpa...@googlegroups.com
The current pace looks like it should be able to complete everything in 12-13 days. I'll keep an eye on it to see whether the pace holds when averaged over longer durations and as we get deeper into the tail of URLs, and I'll look for any more room to squeeze time out of the test iterations, but it's looking good as of right now.

Patrick Meenan

Oct 4, 2016, 10:31:11 AM
to httpa...@googlegroups.com
Going to be tight. The current rate shows it will complete the first pass in 10 days, which is the day before the next crawl starts and a little too tight for my comfort. I'll spend some more time profiling the test cycles to see if there are a few more seconds that can be eked out here or there to give us some more headroom.

Charlie Clark

Oct 6, 2016, 1:45:55 PM
to httpa...@googlegroups.com
On .10.2016 at 16:31, Patrick Meenan <patm...@gmail.com> wrote:

> Going to be tight. Current rate shows it will complete the first pass in
> 10 days, which is the day before the next crawl starts and a little too
> tight for my comfort. I'll spend some more time profiling the test
> cycles to see if there are a few more seconds that can be eked out here
> or there to give us some more headroom.

Hi,

At the moment it looks to me like it won't make it: six days in and not
even close to halfway through, with some 300,000 URLs still to do in the
first pass. Before the changes it used to be about 40,000 a day, and even
that rate would be cutting it close.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Patrick Meenan

Oct 6, 2016, 2:04:20 PM
to httpa...@googlegroups.com
It got a late start (there was an issue with kicking off the crawl) that delayed it by ~1/2 a day, and changes have been landing incrementally since then. With the latest tweaks as of yesterday it is running at ~45,000 URLs per day (for each of the crawls), which should have it completing around October 13th. Still pretty tight, but a little more headroom than before.
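A quick back-of-the-envelope check of that projection, as a minimal Python sketch. It assumes the rate stays constant at ~45,000 URLs/day and takes the ~300,000 remaining first-pass URLs from Charlie's Oct 6 message as the backlog; the function name `projected_completion` and the exact start timestamp are illustrative assumptions, not part of the crawl tooling.

```python
from datetime import datetime, timedelta


def projected_completion(start: datetime, remaining_urls: int, urls_per_day: float) -> datetime:
    """Return when the backlog would finish if the daily rate stays constant."""
    if urls_per_day <= 0:
        raise ValueError("urls_per_day must be positive")
    days_needed = remaining_urls / urls_per_day  # fractional days are fine for timedelta
    return start + timedelta(days=days_needed)


# Assumed inputs: ~300,000 URLs left as of Charlie's Oct 6, 1:45 PM message,
# and ~45,000 URLs/day per crawl from the reply above.
start = datetime(2016, 10, 6, 13, 45)
finish = projected_completion(start, 300_000, 45_000)

# Assumed deadline: midnight at the start of 10/15, when the next crawl kicks off.
deadline = datetime(2016, 10, 15)
headroom = deadline - finish

print(finish.strftime("%Y-%m-%d"))  # 2016-10-13
print(headroom.days)                # 1 (full days of slack)
```

At that rate the first pass needs about 6.7 days, landing early on October 13th and leaving a little under two days before the 10/15 crawl, consistent with "still pretty tight."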

Charlie Clark

Oct 6, 2016, 2:06:18 PM
to httpa...@googlegroups.com
On .10.2016 at 20:04, Patrick Meenan <patm...@gmail.com> wrote:

> It got a late start (there was an issue with kicking off the crawl) that
> delayed it by ~1/2 a day and changes have been landing incrementally
> since then. With the latest tweaks as of yesterday it is running at
> ~45,000 per day (for each of the crawls) which should put it completing
> around October 13th. Still pretty tight but a little more headroom than
> before.

Thanks very much for the update; I'm looking forward to seeing the new data.