Hi,
apologies for this issue...
Because of a bug [1], only links from the January 2018 crawl
were used in the "Nov/Dec/Jan 2017/2018" webgraph release.
Of course, the release will be fixed to include all links from
all three monthly crawls. This will affect the host- and domain-level
graphs as well as the rankings. We may also keep the erroneous
release available (but will correct the release path and file names).
Note that previous releases are not affected: although the bug [1] was
already present in the first version of the shell script that builds
the host-level graphs, whether it is triggered depends on how the
script was called (in a single run, or step by step for each monthly
crawl and the merged graph).
A short account of how the bug was uncovered: while reading a paper about
the July 2017 webgraph release [2], I (finally) wondered why the
domain-level graph is about 25% smaller than those of the previous two
releases. For the host-level graph, a smaller size was expected because
spam domains with large numbers of hosts/subdomains were excluded
during the last crawls. However, that exclusion should have only a small
impact on the number of domains. A careful check of the log files then
provided the final evidence that the smaller size is better explained
by a bug. Very sorry about that...
Best,
Sebastian
[1] https://github.com/commoncrawl/cc-webgraph/commit/0a406f6c988678bc480340d17a2415442f75dc9a
[2] https://arxiv.org/abs/1802.05435