Tim Allison writes:
> I realize this is somewhat of a crazy question. My answer so far has been ¯\_(ツ)_/¯.
>
> Is there any way to estimate the coverage of CC over the "open web"?
0%, to quite a few decimal places :-).
Because the open web is, these days, unbounded in size. It's mostly
generated just-in-time in response to, and parameterised by,
properties of HTTP requests (cookies, country of origin, request
parameters). You could in principle ask how many HTTP responses there
are on the wires within any particular interval, indeed even the
interval during which a crawl was being conducted, but I'm not aware of any
attempt to even estimate such a number.
And even that would not be what you really want, because by design CC
is only looking at what we might call the 'headline' or 'landing page'
responses to the requests, which these days are mostly very different
from what you might think of as the resulting web page, which is
likely to be built by dozens if not hundreds of json-scripted further
requests.
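To make the point about request-parameterised responses concrete, here
is a toy sketch. Everything in it is hypothetical: render_page stands in
for some imaginary server, and the cookie, country and parameter values
are made up for illustration. It has nothing to do with how CC or any
real site actually works; it only shows how a single URL fans out into
many distinct responses.

```python
from itertools import product


def render_page(url, cookie, country, params):
    """Hypothetical server: the body depends on the whole request, not just the URL."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"url={url};cookie={cookie};country={country};query={query}"


def count_variants(url, cookies, countries, param_sets):
    """Count the distinct bodies one URL yields over the given request properties."""
    bodies = {
        render_page(url, cookie, country, params)
        for cookie, country, params in product(cookies, countries, param_sets)
    }
    return len(bodies)


if __name__ == "__main__":
    n = count_variants(
        "https://example.org/",
        cookies=["", "session=a", "session=b"],
        countries=["GB", "US"],
        param_sets=[{}, {"page": "2"}],
    )
    # 3 cookies x 2 countries x 2 parameter sets
    print(n)
```

Even with three cookies, two countries and two parameter sets, one URL
gives a dozen different responses. Real cookies and request parameters
range over effectively unlimited values, so there is no finite total to
measure coverage against.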
So, apologies, but what seems like a simple question turns out to need
a lot more detail before you can even begin to get a useful answer.
ht
--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND
e-mail: h...@inf.ed.ac.uk
URL: https://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]