I'm aware this is perhaps not the proper group for this question, but since Common Crawl uses AWS, I thought it might be relevant nonetheless.
On an EC2 instance I tried copying crawl data from the S3 bucket, but I got the following error:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2016-36/segments/1471982290442.1/wat/CC-MAIN-20160823195810-00000-ip-10-153-172-175.ec2.internal.warc.wat.gz /var/www/html/commoncrawl/CC-MAIN-20160823195810-00000-ip-10-153-172-175.ec2.internal.warc.wat.gz
Unable to locate credentials
Completed 1 part(s) with ... file(s) remaining
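
From what I can tell, the error just means the AWS CLI can't find any credentials on the instance. My assumption (not something the error message says) is that either configuring credentials or making an unsigned request against the public bucket would get around it, roughly:

aws configure   # prompts for an access key ID and secret access key
# or, since the commoncrawl bucket is publicly readable, skip signing entirely:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2016-36/segments/1471982290442.1/wat/CC-MAIN-20160823195810-00000-ip-10-153-172-175.ec2.internal.warc.wat.gz /var/www/html/commoncrawl/ --no-sign-request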
It works fine with wget, but I wanted to see whether there are any performance advantages to using aws s3 cp instead.
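(By "works fine with wget" I mean fetching the same object over the bucket's public HTTPS endpoint, roughly as below; I'm assuming the standard virtual-hosted S3 URL here:)

wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2016-36/segments/1471982290442.1/wat/CC-MAIN-20160823195810-00000-ip-10-153-172-175.ec2.internal.warc.wat.gz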
Any feedback is welcome and appreciated.