Rate limited when downloading from S3

Bai Li

Mar 13, 2023, 2:41:12 PM
to Common Crawl
Hello everyone,

I have been trying to download files from the Common Crawl S3 bucket, but I am frequently encountering a botocore.exceptions.ClientError with the message "An error occurred (SlowDown) when calling the GetObject operation (reached max retries: 4): Please reduce your request rate."

I have tried reducing the frequency of requests and even waiting for some time before retrying, but the issue persists. In some cases I am attempting to download only a single file and still get the error.

Here is my code:

import boto3

BASE_URL = "crawl-data/CC-MAIN-2022-49/segments/1669446706285.92/warc"
filename = "CC-MAIN-20221126080725-20221126110725-00000.warc.gz"

# Download a single WARC file from the public "commoncrawl" bucket.
s3 = boto3.resource("s3")
s3.meta.client.download_file("commoncrawl", f"{BASE_URL}/{filename}", filename)
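
For reference, the "reached max retries: 4" part of the error refers to botocore's built-in retries. The retry budget can be raised, and the client can be switched to adaptive retry mode, which adds client-side rate limiting; a minimal sketch, with an illustrative attempt count:

import boto3
from botocore.config import Config

# "adaptive" retry mode adds client-side rate limiting on top of plain
# retries; max_attempts=10 is an illustrative value, not a recommendation.
config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
s3 = boto3.client("s3", config=config)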

Could anyone provide some guidance on how to resolve this issue, or suggest any workarounds that might help? I would also appreciate it if anyone could clarify whether there are specific limits on the number of requests to S3, or whether other factors may be causing this.

Thank you in advance!

Sebastian Nagel

Mar 13, 2023, 4:25:50 PM
to common...@googlegroups.com
Hi,

as far as I can see, there have been some 503s during the last 48 hours,
but it looks like the situation has improved now.

> I have tried reducing the frequency of requests and even waiting for
> some time before retrying, but the issue persists.

Thanks! The bandwidth is shared among all users accessing the data.
So, slowing down and trying again (later) is the best option.
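
A minimal sketch of such a slow-down-and-retry loop (exponential backoff with jitter; the helper name and limits are illustrative, the bucket and key as in the snippet above):

import random
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def download_with_backoff(bucket, key, filename, max_attempts=8):
    # Retry on SlowDown/503, sleeping exponentially longer between attempts.
    for attempt in range(max_attempts):
        try:
            s3.download_file(bucket, key, filename)
            return
        except ClientError as e:
            code = e.response.get("Error", {}).get("Code", "")
            if code not in ("SlowDown", "ServiceUnavailable", "503"):
                raise
            # 1, 2, 4, ... seconds, capped at 60, plus random jitter.
            time.sleep(min(2 ** attempt, 60) + random.random())
    raise RuntimeError(f"Giving up on s3://{bucket}/{key} after {max_attempts} attempts")

download_with_backoff(
    "commoncrawl",
    "crawl-data/CC-MAIN-2022-49/segments/1669446706285.92/warc/"
    "CC-MAIN-20221126080725-20221126110725-00000.warc.gz",
    "CC-MAIN-20221126080725-20221126110725-00000.warc.gz",
)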

Best,
Sebastian

Damien Cram

Mar 29, 2023, 10:55:41 AM
to Common Crawl
Hi Sebastian,

I am also facing this exact same issue with S3. Are there any rate limits on this dataset? Is there a recommended rate?

Best,
Damien

kasper...@gmail.com

Apr 8, 2023, 7:00:25 PM
to Common Crawl
It seems the issue still persists; I have had a number of exceptions on the news crawls over the last few days.

Davood Hadiannejad

Apr 14, 2023, 4:53:35 AM
to common...@googlegroups.com
Hi everyone, 
I am facing the same issue. Did anyone find a solution?
Best 
Davood 


Nikolay Kadochnikov

Apr 25, 2023, 1:51:52 PM
to Common Crawl
It has been heavily affecting me as well.

Craig Schmidt

Apr 26, 2023, 6:07:01 PM
to Common Crawl
Simple Athena queries against the index are failing with the rate-limiting error. Is there any solution other than waiting for it to start working again someday?

Nikolay Kadochnikov

Apr 27, 2023, 11:18:56 AM
to Common Crawl
Has anyone figured out a workaround for this "An error occurred (SlowDown) when calling the GetObject operation" issue? For me it is completely erratic, and on some days it prevents me from downloading any data at all.