Using the columnar index, the list of subdomains is just one query away:
SELECT COUNT(*) AS pages,
       COUNT(DISTINCT url_host_registered_domain) AS domains,
       substr(url_host_name, 1,
              length(url_host_name) - length(url_host_registered_domain)) AS subdomain
  FROM ccindex
 WHERE crawl = 'CC-MAIN-2021-25'
 GROUP BY substr(url_host_name, 1,
              length(url_host_name) - length(url_host_registered_domain))
 ORDER BY pages DESC;
- that's for the June 2021 crawl only: crawl = 'CC-MAIN-2021-25'
- cf. Presto's string functions
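The substr arithmetic simply slices the registered domain off the end of the host name, leaving the subdomain prefix (including the trailing dot). As a sketch of what the query computes, the equivalent in Python (made-up example hosts, not Common Crawl data):

```python
def subdomain_prefix(host: str, registered_domain: str) -> str:
    """Mirror substr(url_host_name, 1,
    length(url_host_name) - length(url_host_registered_domain))."""
    return host[:len(host) - len(registered_domain)]

print(subdomain_prefix("forum.example.com", "example.com"))  # forum.
print(subdomain_prefix("example.com", "example.com"))        # empty string: no subdomain
```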
Here are the top results:
      pages    domains  subdomain
 1208961605   19898627  www.
   16497523      32654  forum.
   10375685     152410  m.
   10232964     175212  blog.
    9708356      97375  shop.
    6730975       5648  forums.
    4623635      12810  wiki.
    3742254       7528  community.
Alternatively, the host-level web graphs could be used:
+ also includes domain names seen as links but not crawled
-/+ no page counts; PageRank and harmonic centrality scores instead
- you would need to write a program to extract the subdomain prefixes
  from the host names and do the aggregation
- maybe ask the authors of  how they created their
- the approach of  is acceptable for a single domain with not too many
  pages, but it will definitely not scale to your use case
  ("extract all subdomains")
> You received this message because you are subscribed to the Google Groups "Common Crawl" group.