There's a small but critical mistake in the blog post announcing the Common Crawl Index. The S3 location given for the index data files is missing the first segment (i.e. /common-crawl) in the path.
The correct location for the top-level index is:
s3://aws-publicdatasets/common-crawl/cc-index/collections/CC-MAIN-YYYY-WW/indexes/cluster.idx
and the individual index files are (currently) split into 300 chunks, shown here for the CC-MAIN-2015-14 collection:
s3://aws-publicdatasets/common-crawl/cc-index/collections/CC-MAIN-2015-14/indexes/cdx-00000.gz
...
s3://aws-publicdatasets/common-crawl/cc-index/collections/CC-MAIN-2015-14/indexes/cdx-00299.gz
Because they're part of the AWS Public Data Sets, you don't need to pay to fetch them, so you can use the --no-sign-request switch on your copy commands:
$ aws --no-sign-request s3 cp s3://aws-publicdatasets/common-crawl/cc-index/collections/CC-MAIN-2015-14/indexes/cluster.idx .
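Since the chunk names follow a zero-padded, five-digit pattern (cdx-00000.gz through cdx-00299.gz), the full list of paths can be generated rather than typed out. The following is only a rough sketch: it assumes the chunk count stays at 300 and that the naming is as listed above, and the function name cc_index_chunks is made up for illustration.

```shell
#!/bin/sh
# Print the S3 path of every index chunk in one collection.
# Assumptions: 300 chunks named cdx-NNNNN.gz, as listed in this post.
# cc_index_chunks is a hypothetical helper name, not part of any tool.
cc_index_chunks() {
    collection="$1"      # e.g. CC-MAIN-2015-14
    count="${2:-300}"    # number of chunks (currently 300)
    base="s3://aws-publicdatasets/common-crawl/cc-index/collections/$collection/indexes"
    i=0
    while [ "$i" -lt "$count" ]; do
        # %05d zero-pads the chunk number to five digits
        printf '%s/cdx-%05d.gz\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Possible usage: feed each path to the same copy command as above.
#   cc_index_chunks CC-MAIN-2015-14 | while read -r path; do
#       aws --no-sign-request s3 cp "$path" .
#   done
```

The function only prints paths, so you can check the list (or pick a subset with head or grep) before downloading anything.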
Tom