Hi,
> Am I missing something?
No, the numbers are likely to be correct.
Common Crawl is not able to crawl the web exhaustively in every monthly crawl.
Only a sample snapshot is crawled: a subset of sites/hosts/domains, and only
a subset of the pages per site.
In the past, the crawler relied on donated seed lists that were clean (mostly free of duplicates
and spam). This did not always work well, and because donations were missing or too small,
we haven't been able to keep the crawls fresh. Starting with the September crawl this has
improved, which is why the number of pages for discoverorg.com has increased.
However, because of sampling, the number of pages from a particular domain may go up and down
over time.
Best,
Sebastian