Hi Rémi,
there are two options:
- extract the domain names from the columnar index [1]
- use the latest of the webgraphs [2]
Note that the webgraphs also include domains which were not crawled
(excluded by robots.txt, not sampled, etc.) but known from links.
However, the domains are not checked for being registered; only
the format of the domain names found in links is verified.
If the crawler visited a page, then the URL, including the domain
name, is automatically verified.
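
For the first option, here is a rough sketch of how the query could be built. It assumes the columnar index is registered as the table "ccindex.ccindex" (the name is up to you when you create the table), that the registered domain is in the column url_host_registered_domain, and that the table is partitioned by crawl and subset. The helper name build_domain_query and the crawl label CC-MAIN-2018-09 are only examples; please check the names against the schema described in [1].

```python
import re


def build_domain_query(crawl: str, table: str = "ccindex.ccindex") -> str:
    """Build a SQL query listing the distinct registered domains of one crawl.

    Assumptions (not verified here): the table name, the column
    url_host_registered_domain and the partition columns crawl/subset.
    """
    # Accept only labels shaped like CC-MAIN-2018-09 so that nothing
    # unexpected ends up inside the SQL string.
    if not re.fullmatch(r"CC-MAIN-\d{4}-\d{2}", crawl):
        raise ValueError(f"unexpected crawl label: {crawl!r}")
    return (
        "SELECT DISTINCT url_host_registered_domain\n"
        f"FROM {table}\n"
        f"WHERE crawl = '{crawl}'\n"
        "  AND subset = 'warc'\n"
        "  AND url_host_registered_domain IS NOT NULL"
    )


if __name__ == "__main__":
    # Example crawl label; replace it with the crawl you are interested in.
    print(build_domain_query("CC-MAIN-2018-09"))
```

The resulting string can be run with any SQL engine that has the index registered as a table. Restricting the query to the subset 'warc' should keep it to successfully fetched pages, i.e. the domains that were actually crawled.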
Best,
Sebastian
[1]
https://commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/
[2]
https://commoncrawl.org/tag/webgraph/