Hi Colin, hi Ekku,
> Is there something I can change? I was hoping that an authenticated
> S3 request could carve out some small amount of bandwidth, but it
> seems like it's subject to the same limits as the unauthenticated
> things.
In the end, all requests (via CloudFront or S3, authenticated or not)
are sent to the same bucket.
If the requests are sent from the AWS cloud in us-east-1, using the
S3 API should always be the better choice.
> under heavy pressure and will just take some time to recoup. If this
> is the case, would be nice to know if there was some estimate of time
> it will take to recover!
Since the introduction of CloudFront-backed access in March 2022,
repeated 503s have been observed only infrequently and temporarily
(lasting no more than a few hours). So, maybe wait one day and try again.
As Colin mentioned, retrying a few times should also succeed. This
could be a solution for a single but urgent download, e.g. path listings.
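To illustrate the retry approach for a single file over HTTPS, here is a minimal Python sketch. It assumes the 503 responses are transient and retries with exponential backoff plus jitter. The function name, the parameters and the default delays are made up for illustration and are not an official recommendation.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, max_attempts=6, base_delay=1.0, max_delay=60.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on HTTP 503 with exponential backoff and jitter.

    Hypothetical helper: max_attempts and the delays are illustrative
    assumptions, not values taken from any documentation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Retry only on 503 (Service Unavailable); re-raise any other
            # error, and give up once the last attempt has failed.
            if err.code != 503 or attempt == max_attempts:
                raise
        # Base wait of 1s, 2s, 4s, ... (capped at max_delay), plus random
        # jitter so that many clients do not retry at the same moment.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay))


# Example call (performs a real download, so it is left commented out):
# data = fetch_with_retries(
#     "https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-27/wat.paths.gz")
# with open("wat.paths.gz", "wb") as f:
#     f.write(data)
```

The growing delay matters more than the number of attempts: retrying immediately in a tight loop only adds to the request rate that triggers the 503s.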
Best,
Sebastian
On 12/8/22 18:35, Ekku Jokinen wrote:
> I'm also experiencing these and I do remember having the same issue some
> time back, as well. I'm assuming it starts to throttle if the system is
> under heavy pressure and will just take some time to recoup. If this is
> the case, would be nice to know if there was some estimate of time it
> will take to recover!
>
> On Thursday, December 8, 2022 at 6:11:18 PM UTC+2 clde...@gmail.com wrote:
>
> I'm seeing a very high rate of SlowDown responses. Unfortunately,
> it's such that the service is unusable -- even the first step of
> fetching the list of paths fails.
>
> These steps failed, for example:
>
> $ aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2022-27/wat.paths.gz .
> download failed:
> s3://commoncrawl/crawl-data/CC-MAIN-2022-27/wat.paths.gz to
> ./wat.paths.gz An error occurred (SlowDown) when calling the
> GetObject operation (reached max retries: 4): Please reduce your
> request rate.
>
> $ wget https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-27/wat.paths.gz
> 2022-12-08 16:07:20 ERROR 503: Service Unavailable.
>
> $ wget 'https://index.commoncrawl.org/CC-MAIN-2022-40-index?url=https%3A%2F%2Fattributz.github.io%2F&output=json'