We rely heavily on Web Graphs throughout our work at the Common Crawl Foundation, and we are happy to share our recently launched statistics page, which features:
- Top-ranked domains and hosts by Harmonic Centrality and PageRank, with explanations of what these metrics mean
- Detailed statistics for each graph release: nodes, edges, indegree/outdegree distributions, strongly connected components (SCCs), and a lot more
- Links to related papers
Check it out here:
https://commoncrawl.github.io/cc-webgraph-statistics/