I extracted all unique URLs (protocol + netloc only) from the most recent CommonCrawl index and found 19,101,716 unique URLs. I did the same for your URL index and found 55,585,805 unique URLs.
What accounts for the difference between the two datasets? Or is my method perhaps inaccurate?
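
For concreteness, here is a minimal sketch of the kind of protocol+netloc reduction described above. It is illustrative only, not necessarily the exact script used: it assumes Python's standard `urllib.parse`, and the names `origin_key`, `count_unique_origins`, and `sample` are hypothetical.

```python
from urllib.parse import urlsplit


def origin_key(url):
    """Reduce a URL to 'scheme://netloc', or return None if it cannot be parsed."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs (e.g. bad IPv6 literals)
        return None
    if not parts.scheme or not parts.netloc:
        return None
    # Lowercase both parts so 'Example.com' and 'example.com' collapse together
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}"


def count_unique_origins(urls):
    """Count distinct scheme+netloc keys across an iterable of URLs."""
    seen = set()
    for url in urls:
        key = origin_key(url)
        if key is not None:
            seen.add(key)
    return len(seen)


# Hypothetical sample data, only to show how the keying behaves
sample = [
    "https://example.com/a",
    "https://example.com/b?x=1",
    "http://example.com/",
    "https://Example.com:443/c",
]
print(count_unique_origins(sample))
```

Note that under this keying, `http` vs `https`, an explicit port, and a `www.` prefix each produce a distinct entry, so counts are sensitive to how each index records its URLs.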
Thanks!