Monthly archive info

pep...@gmail.com

Aug 11, 2015, 11:34:48 AM
to Common Crawl

Hello!

First of all, I want to say that this is a very helpful and awesome project. I would like to know more about how the data is collected and how a monthly crawl archive is defined. Surely it does not represent the entire web, so what would be a good definition of a Common Crawl monthly archive? This raises the following questions:

1. As far as I know, every month is a new crawl (that is, it starts over again). Where does it begin crawling? What is the seed?
2. What data or websites does Common Crawl decide to add to the archive?
3. What is the condition for stopping the crawl?

I am writing my MSc dissertation, and any information would be very helpful. Thanks!
