> for all internet sites
Every snapshot includes many websites, but definitely not all of them.
Recent snapshot crawls include around 45 million sites (unique host
names), which correspond to about 35 million registered domains.
> Does every snapshot take only updates, or does it take the whole site
> again? Is the retrieved data sampled, or is it whole-site data?
Web pages (or URLs) are sampled, so a snapshot does not contain whole
sites. Newly discovered URLs/links have a higher probability of being
selected during sampling, but already-known pages are also revisited
after some time.
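
To give a rough idea of what "higher probability for new URLs" can mean,
here is a minimal Python sketch of weighted sampling without replacement.
It is only an illustration, not the crawler's actual selection logic: the
function name `sample_urls`, the `candidates` mapping, and the weights
3.0 and 1.0 are all made up for this example.

```python
import random


def sample_urls(candidates, k, new_weight=3.0, seen_weight=1.0, seed=None):
    """Pick up to k distinct URLs, favouring ones not fetched before.

    candidates maps URL -> True if the URL was already fetched earlier.
    The weights are hypothetical illustration values, not real settings.
    """
    rng = random.Random(seed)
    keyed = []
    for url, seen in candidates.items():
        weight = seen_weight if seen else new_weight
        # Weighted sampling without replacement: draw u in [0, 1) and
        # use u ** (1 / weight) as the sort key. Larger weights push the
        # key towards 1, so those URLs tend to rank higher.
        keyed.append((rng.random() ** (1.0 / weight), url))
    keyed.sort(reverse=True)
    return [url for _, url in keyed[:k]]


if __name__ == "__main__":
    urls = {
        "https://example.com/": True,        # fetched in an earlier crawl
        "https://example.com/new-a": False,  # newly discovered
        "https://example.com/new-b": False,  # newly discovered
    }
    print(sample_urls(urls, k=2))
```

Already-fetched URLs keep a non-zero weight here, so they still get
picked from time to time, which is the "revisited after some time" part.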