What's the approximate total size of all the metadata for the latest Common Crawl?
I'm not an active AWS developer, and Amazon currently has a bug affecting my S3 account that I'm waiting on them to fix...
Between that and just knowing the size up front, an answer here would save me a bunch of time.
My goal is just to download all the metadata, if it's reasonably sized, and process it within our cluster.
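For what it's worth, Common Crawl also serves its files over plain HTTPS, so working S3 credentials shouldn't be strictly required. A minimal sketch of building download URLs for the WAT (metadata) listings — the host and the example crawl ID are assumptions to check against the current crawl announcement, not verified values:

```python
# Sketch: turn S3-style keys from Common Crawl's *.paths files into plain
# HTTPS URLs, sidestepping the need for an AWS account.
# The host and crawl ID below are assumptions -- verify against the
# latest crawl announcement before relying on them.

BASE = "https://data.commoncrawl.org/"   # assumed public HTTPS mirror host
CRAWL = "CC-MAIN-2023-50"                # hypothetical crawl ID, not the latest

def https_url(key: str) -> str:
    """Map an S3-style key (as found in a .paths file) to an HTTPS URL."""
    return BASE + key.lstrip("/")

def wat_paths_url(crawl: str) -> str:
    """URL of the gzipped listing of a crawl's WAT metadata files."""
    return https_url(f"crawl-data/{crawl}/wat.paths.gz")

print(wat_paths_url(CRAWL))
```

Fetching that listing and summing the sizes of the files it names (e.g. with HEAD requests) would give a concrete answer to the size question before committing the cluster to a full download.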