Hello everyone,
I have been trying to download files from the Common Crawl S3 bucket, but I frequently hit a botocore.exceptions.ClientError with the message "An error occurred (SlowDown) when calling the GetObject operation (reached max retries: 4): Please reduce your request rate."
I have tried reducing the frequency of my requests and waiting for some time before retrying, but the issue persists. In some cases I am attempting to download only a single file and still get the error.
Here is my code:
import boto3

# Prefix of the WARC files inside the commoncrawl bucket
BASE_URL = "crawl-data/CC-MAIN-2022-49/segments/1669446706285.92/warc"
filename = "CC-MAIN-20221126080725-20221126110725-00000.warc.gz"

s3 = boto3.resource("s3")
s3.meta.client.download_file("commoncrawl", f"{BASE_URL}/{filename}", filename)
Could anyone provide guidance on how to resolve this, or suggest workarounds? I would also appreciate clarification on whether there are specific request-rate limits on this bucket, or any other factors that might be causing the error.
Thank you in advance!