Hi,
> Is this also caused by the bandwidth limit mentioned by the OP, or
> is it a different issue?
It's related: the CDX index server also needs to access data on
s3://commoncrawl/, and it cannot serve as many requests as it should
if access to S3 is slow. This includes the server adaptively slowing
its own request rate when it receives "HTTP 503 Slow Down" responses.
Unfortunately, some users keep sending 100k or more requests per hour,
while the server can successfully respond to only a few thousand.
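A client that stays within what the index server can handle might pace its queries and back off when it gets a 503 "Slow Down" or a 504 response. The sketch below is a minimal illustration under stated assumptions, not Common Crawl tooling: the endpoint and the `url`/`output`/`limit` parameters come from the example query in this thread, while the helper names (`build_query_url`, `backoff_delays`, `fetch_with_backoff`, `query_many`), the retry policy (2 s doubling up to a 60 s cap), the set of retried status codes, and the one-second spacing between queries are hypothetical choices, not documented server limits.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

# Endpoint taken from the example query in this thread; each crawl has its own index name.
INDEX_BASE = "https://index.commoncrawl.org/CC-MAIN-2015-11-index"

# ASSUMPTION: treat 503 (Slow Down) and 504 (Gateway Time-out) as "retry later".
RETRYABLE_STATUS = {503, 504}


def build_query_url(target: str, limit: int = 1) -> str:
    """Build a query URL with the url/output/limit parameters used in this thread."""
    params = {"url": target, "output": "json", "limit": str(limit)}
    return INDEX_BASE + "?" + urllib.parse.urlencode(params)


def backoff_delays(retries: int = 5, base: float = 2.0, cap: float = 60.0) -> List[float]:
    """ASSUMED policy: wait base, 2*base, 4*base, ... seconds, never more than cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def default_fetch(url: str) -> bytes:
    """Plain GET; urlopen raises urllib.error.HTTPError for error status codes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_with_backoff(
    url: str,
    fetch: Callable[[str], bytes] = default_fetch,
    sleep: Callable[[float], None] = time.sleep,
    delays: Optional[List[float]] = None,
) -> bytes:
    """Retry on 503/504, waiting longer after each failure; re-raise other errors."""
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUS:
                raise
            sleep(delay)
    # One last attempt after the final wait; any error now reaches the caller.
    return fetch(url)


def query_many(
    targets: List[str],
    min_interval: float = 1.0,  # HYPOTHETICAL pacing, not a documented server limit
    fetch: Callable[[str], bytes] = default_fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Run queries one after another, pausing between them instead of bursting."""
    results = []
    for i, target in enumerate(targets):
        if i > 0:
            sleep(min_interval)
        results.append(fetch_with_backoff(build_query_url(target), fetch=fetch, sleep=sleep))
    return results
```

For example, `query_many(["wikipedia.org"])` sends the same query as Volo's example in this thread and, if the server answers with 503 or 504, waits 2, 4, 8, ... seconds before retrying. Passing `fetch` and `sleep` in as parameters keeps the retry and pacing logic testable without network access.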
Sorry about this. We know that the CDX server needs to be fixed, and
we'll try to, but it may take some time.
Best,
Sebastian
On 5/11/23 14:09, Volo wrote:
> I've just discovered the Common Crawl Index Server, but nothing is working
> for me. Whenever I try to open or query a search page (e.g.
> https://index.commoncrawl.org/CC-MAIN-2015-11-index?url=wikipedia.org&output=json&limit=1),
> it fails with a "504 Gateway Time-out" response. Is this also caused by the
> bandwidth limit mentioned by the OP, or is it a different issue?
>
> On Monday, May 8, 2023 at 10:52:26 PM UTC+2 Greg Lindahl wrote:
>
> CloudFront is a very distributed system and it is what it is.
>
> I've started monitoring data.commoncrawl.org and saw only one 503 in the
> past 48 hours. But that's just the San Francisco Bay Area endpoint.
>
>
> index.commoncrawl.org was working
>
> > --
> > You received this message because you are subscribed to the
> > Google Groups "Common Crawl" group.
> > To unsubscribe from this group and stop receiving emails from it,
> > send an email to common-crawl...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/common-crawl/255d2563-f25d-4bf1-a895-4709f868ed65n%40googlegroups.com.
>
>
> ------------------------------------------------------------------------
> *Snapp Mobile Germany GmbH*
> Holzstrasse 28
> 80469, München
>
> www.snappmobile.io
>
> /Registered office: München/
> /Register court: Amtsgericht München, HRB 229710/
> /Managing directors: Jasper Alan Colville Morgan, Pasi Juhani Lehtimäki/
>