Hi Hynek, hi Don Boscow,
yes, currently crawls are released about every two months.
> If Meghan Markle wears a Versace gown, that becomes
> a BBC article, and that article shows up on Googling "meghan markle"
> 2-3 minutes after the publishing of the article by BBC. What is the
> equivalent time for CC?
Common Crawl crawls are sample snapshots of the web. There's never a
guarantee that any page or URL is included. The likelihood that
a page is included increases if
- the domain has a high harmonic centrality rank and is allowed to
contribute more pages to the dataset
- the link to a page is shared often on the public web or is even
listed in a sitemap. That way it's easier for the crawler to
find the link and more likely that it is followed.
The time from link discovery until the release of a crawl dataset is
a few weeks, at least.
> And secondly, is there a place where I can see CC coverage level?
I'm not aware of any such report. The most difficult part is to estimate
the size of the web as such. There's a huge variance (see for example
https://www.worldwidewebsize.com/).
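For a rough per-domain check, one option is the URL index that accompanies
each crawl. The sketch below only builds the query URL. It assumes the index
server at index.commoncrawl.org accepts the pywb-style CDX parameters
`matchType`, `showNumPages` and `output`, and the crawl ID is a placeholder
to replace with a current one. The helper name `coverage_query_url` is made
up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the public Common Crawl index server lives at this address.
INDEX_SERVER = "https://index.commoncrawl.org"


def coverage_query_url(crawl_id: str, domain: str) -> str:
    """Build a CDX index query asking how many result pages exist
    for all captures under `domain` (including subdomains) in one crawl."""
    params = {
        "url": domain,
        "matchType": "domain",     # match the domain and its subdomains
        "showNumPages": "true",    # ask for page counts, not the records
        "output": "json",
    }
    return f"{INDEX_SERVER}/{crawl_id}-index?{urlencode(params)}"


if __name__ == "__main__":
    # "CC-MAIN-2024-10" is a placeholder crawl ID, not taken from this thread.
    for domain in ("reuters.com", "vice.com"):
        print(coverage_query_url("CC-MAIN-2024-10", domain))
```

Fetching such a URL should give the number of index result pages for the
domain, which is only an order-of-magnitude hint and not a coverage ratio,
since the total number of pages on the site is still unknown.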
Best,
Sebastian
> partially, whether they cover reuters.com <http://reuters.com> at
> all, or how much of vice.com <http://vice.com> they cover, etc.?