to common...@googlegroups.com
Hi Sergey,
good point. Sorry, these listings are not yet available. To get a listing for all CDX index files,
download the AWS CLI (https://aws.amazon.com/cli/) and run (here for the 2012 index):
aws --no-sign-request s3 ls --recursive s3://commoncrawl/cc-index/collections/CC-MAIN-2012/
The problem is that the old crawls (2008 - 2012) have a different location (path prefix) on the
bucket s3://commoncrawl/:
- crawl-001/ : 2008 - 2009
- crawl-002/ : 2009 - 2010
- parse-output/ : 2012
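
As a rough sketch, the same listing command could be pointed at each of these older prefixes. The script below only prints the commands rather than running them; the helper name listing_cmd is made up for illustration, and it assumes the three prefixes sit directly under s3://commoncrawl/ as described above. Note that this would list all files under each prefix, not only CDX index files.

```shell
#!/bin/sh
# Sketch: print one AWS CLI listing command per legacy crawl prefix.
# The prefixes are taken from the list above; listing_cmd is a
# hypothetical helper, not part of the AWS CLI.

BUCKET="s3://commoncrawl"

# Build the command that recursively lists everything under one prefix.
listing_cmd() {
    printf 'aws --no-sign-request s3 ls --recursive %s/%s\n' "$BUCKET" "$1"
}

# Print (not execute) the command for each old crawl location.
for prefix in crawl-001/ crawl-002/ parse-output/; do
    listing_cmd "$prefix"
done
```

Piping the script's output to sh would actually run the commands. The older crawls may contain a large number of objects, so redirecting each listing into its own file is probably more practical than reading it on the terminal.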
I'll prepare the missing listings over the next few days.
On 4/8/21 9:57 AM, Sergey Ivanov wrote:
> Thank you for the reply,
>
> I'm not in a hurry, so I'll wait until the indexes are ready.
>
> On Thursday, April 8, 2021 at 13:37:20 UTC+6, Sebastian Nagel wrote:
>
> Hi Sergey,
>
> good point. Sorry, these listings are not yet available. To get a listing for all CDX index files,