Instagram blocking App Engine's urlfetch/sockets IP block

Ryan Barrett

May 2, 2016, 12:30:58 AM
to Google App Engine
hi all! just FYI, it looks like Instagram is blocking/rate limiting App Engine's IPs from fetching www.instagram.com, both urlfetch and sockets, across apps. e.g. this session from https://shell-hrd.appspot.com/ :

>>> urllib2.urlopen('https://www.instagram.com/snarfed/')
Traceback (most recent call last):
...
  File "/base/data/home/runtimes/python/python_dist/lib/python2.5/urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Unknown

it's not 100% consistent - i occasionally see requests make it through - but the majority get 429ed.

not holding my breath, but i figured you all might want to know, especially in case cloud support people have lines of communication open with instagram/facebook for this kind of thing.

Nickolas Daskalou

May 2, 2016, 11:57:15 AM
to Google App Engine
Hi Ryan,

It seems to be working fine for us (SocialPage.me).

Are you accessing their API using separate access tokens for each user?

Nick


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/be7f6ead-fe34-45c4-9ee0-00956b5f89de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nick (Cloud Platform Support)

May 2, 2016, 1:52:11 PM
to Google App Engine
Hey Ryan,

I'm not sure this indicates that App Engine specifically is being rate-limited. It's likely that the 429 response is directly related to the frequency with which you're making requests, regardless of the origin of those requests. While not impossible, I suppose, it would be surprising if they were keeping track of App Engine IP ranges and applying a different rate limit, and it would require some thorough A/B testing to prove. So, I recommend just checking their documentation or, if the rate limit is undocumented, benchmarking to attempt to determine it, and trying to fly under it. Generally, exponential backoff is a good tactic when dealing with rate limiting.
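
As a rough illustration of the exponential backoff tactic mentioned above, here is a minimal sketch. It assumes Python 3's urllib.request rather than the Python 2 urllib2 used in the original session, and the names fetch_with_backoff, opener, max_tries, and base_delay are hypothetical, chosen only for this example. It retries only on 429 and makes no claim about what Instagram's actual limits are.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, opener=urllib.request.urlopen, max_tries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Fetch url, retrying on HTTP 429 with exponentially growing delays."""
    for attempt in range(max_tries):
        try:
            return opener(url)
        except urllib.error.HTTPError as e:
            # Only rate-limit responses are retried; anything else, or a 429
            # on the final attempt, is re-raised to the caller.
            if e.code != 429 or attempt == max_tries - 1:
                raise
        # Delay doubles each attempt (1s, 2s, 4s, ...), plus random jitter so
        # that many clients do not retry in lockstep.
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

The opener and sleep parameters exist only so the function can be exercised without touching the network or actually waiting; in normal use they can be left at their defaults.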

Sincerely,

Nick
Cloud Platform Community Support


Ryan Barrett

May 4, 2016, 1:09:35 PM
to Google App Engine
thanks for the replies! i should have emphasized that this is for www.instagram.com, not the API. API requests are working fine.

you're right that IP blocking wouldn't usually be the first culprit in general, especially for 429s. i tried from a few different apps, though, including shell-hrd (log in my first post), which pretty much never uses urlfetch otherwise based on its quota numbers, so i doubt it's User-Agent blocking. i tried an entirely new www.instagram.com URL and still got a 429, so it's probably not specific URLs, at least due to my own traffic. and i can fetch the same URL fine from my local machine. hence my IP suspicion.

i've already worked around this, so it's not urgent. just figured you all might want to know. thanks again!


Nick (Cloud Platform Support)

May 4, 2016, 3:02:05 PM
to Google App Engine
Hey Ryan,

So, you're attempting merely to fetch http://www.instagram.com/, and you receive 429 on the first request, and you're not launching many other requests at the same time? It seems odd that a rate-limit response would come without a condition being reached requiring rate-limiting... Let me know what you think in your reply.

Cheers,


Nick
Cloud Platform Community Support 

Ryan B

May 4, 2016, 3:21:46 PM
to google-a...@googlegroups.com
On Wed, May 4, 2016 at 12:02 PM, 'Nick (Cloud Platform Support)' via
Google App Engine <google-a...@googlegroups.com> wrote:
> So, you're attempting merely to fetch http://www.instagram.com/, and you
> receive 429 on the first request, and you're not launching many other
> requests at the same time? It seems odd that a rate-limit response would
> come without a condition being reached requiring rate-limiting... Let me

i'm actually fetching profile URLs, not the front page. eg `import
urllib2; urllib2.urlopen('https://www.instagram.com/kevin/')` in
https://shell-hrd.appspot.com/ gets 429ed even though i'm not fetching
that particular URL in any of my apps.

it definitely seems odd, agreed. i only suspect rate limiting/blocking
at the IP level because i exhausted the other obvious causes. i'd be
happy to be proven wrong!

--
https://snarfed.org/

Nick (Cloud Platform Support)

May 5, 2016, 5:13:49 PM
to Google App Engine
Hey Ryan,

After some extensive testing, I've determined that the 429 you're receiving is expected behaviour from Instagram, and it does relate to a windowed average, although the window may not be the same as the one published in their documentation for APIs. After sending a few thousand requests in a span of ~15 seconds, I began to receive 429 responses, with some 200s intermixed.
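
For anyone wanting to repeat a small-scale version of this kind of measurement, the following is a minimal sketch that counts the status codes returned over a series of fetches. It assumes Python 3's urllib.request; tally_statuses and its opener parameter are hypothetical names for this example, not part of any App Engine or Instagram API, and it is not the test harness actually used above.

```python
import collections
import urllib.error
import urllib.request


def tally_statuses(url, n, opener=urllib.request.urlopen):
    """Fetch url n times and count the HTTP status codes seen."""
    counts = collections.Counter()
    for _ in range(n):
        try:
            response = opener(url)
            counts[response.status] += 1
        except urllib.error.HTTPError as e:
            # urlopen raises for 4xx/5xx responses; the status code is
            # available on the exception.
            counts[e.code] += 1
    return counts
```

A result with both 200 and 429 keys would match the intermixed responses described above. Keep n small when pointing this at a site you do not operate.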

Cheers,

Nick
Cloud Platform Community Support 

Ryan B

May 5, 2016, 6:36:12 PM
to google-a...@googlegroups.com
thanks for going above and beyond, nick! much appreciated. i'm
currently working around it by using a reverse proxy outside of app
engine, so that my requests are charged to a different IP and isolated
from other app engine apps. glad this info is here now for other
people too.

Nick (Cloud Platform Support)

May 6, 2016, 12:47:57 PM
to Google App Engine
Hey Ryan,

Glad to be of assistance, and I really want to get to the bottom of this. Reviewing the infrastructure used by UrlFetch, this absolutely does make sense, when we consider this tantalizing detail from the documentation:

The URL Fetch service uses an HTTP/1.1 compliant proxy to fetch the result.

This is likely the cause of the 429 issues. Could you share some details, for my edification and for the benefit of future users, as to which sort of proxy configuration you're using, the type of proxy, the average request load, etc.?

Cheers


Nick
Cloud Platform Community Support

Ryan Barrett

May 6, 2016, 2:57:49 PM
to google-a...@googlegroups.com
On Fri, May 6, 2016 at 9:47 AM, 'Nick (Cloud Platform Support)' via
Google App Engine <google-a...@googlegroups.com> wrote:
> Hey Ryan,
>
> Glad to be of assistance, and I really want to get to the bottom of this.
> Reviewing the infrastructure used by UrlFetch, this absolutely does make
> sense, when we consider this tantalizing detail from the documentation:
>
>> The URL Fetch service uses an HTTP/1.1 compliant proxy to fetch the
>> result.

yup. if urlfetch is behind a small or even medium sized set of VIPs or
IP blocks, and instagram rate limits their www based on IP (individual
or block), that's it. you could data mine urlfetch's logs and find the
offending app(s), if any, and play the abuse whack-a-mole game,
but...meh.
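
To make the shared-IP hypothesis above concrete, here is a toy simulation of a per-IP sliding-window limiter. Everything in it is an assumption for illustration: the class name, the limit of 3 requests per 60 seconds, and the documentation-range IP addresses. Nothing is known here about how Instagram actually implements its limits.

```python
import collections


class PerIpWindowLimiter:
    """Toy limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.seen = collections.defaultdict(collections.deque)

    def status_for(self, ip, now):
        times = self.seen[ip]
        # Drop timestamps that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.limit:
            return 429
        times.append(now)
        return 200


limiter = PerIpWindowLimiter(limit=3, window=60)
# A busy app uses up the budget of a shared egress IP...
busy = [limiter.status_for('203.0.113.10', now=t) for t in range(3)]
# ...so a quiet app's first request through the same IP is rejected,
quiet_shared = limiter.status_for('203.0.113.10', now=3)
# while the same request from a separate IP goes through.
quiet_own_ip = limiter.status_for('198.51.100.7', now=3)
print(busy, quiet_shared, quiet_own_ip)  # prints: [200, 200, 200] 429 200
```

Under these assumptions, an app that sends almost no traffic of its own can still see a 429 on its first request, which is consistent with the shell-hrd observation earlier in the thread.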

> which sort of proxy configuration you're using, the type of proxy, the average
> request load, etc.?

sure! it's dirt simple, just apache mod_proxy with these lines in httpd.conf:

SSLProxyEngine on
<Location "/instagram/">
ProxyPass "https://www.instagram.com/"
</Location>
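
On the client side, using a proxy like the one configured above mostly comes down to rewriting the URL before fetching it. The sketch below shows one way to do that in Python 3. The proxy hostname proxy.example.com and the via_proxy helper are hypothetical, and only the /instagram/ path prefix comes from the config above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host running the mod_proxy config above.
PROXY_BASE = 'https://proxy.example.com/instagram/'


def via_proxy(url, proxy_base=PROXY_BASE):
    """Rewrite a www.instagram.com URL to go through the reverse proxy."""
    parts = urlsplit(url)
    if parts.netloc != 'www.instagram.com':
        return url  # leave other hosts (e.g. the API) untouched
    base = urlsplit(proxy_base)
    # Join the proxy's path prefix with the original path.
    path = base.path.rstrip('/') + parts.path
    return urlunsplit((base.scheme, base.netloc, path,
                       parts.query, parts.fragment))


print(via_proxy('https://www.instagram.com/snarfed/'))
# prints: https://proxy.example.com/instagram/snarfed/
```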

my load is minuscule and pretty constant, roughly 1-2qm on average.
most of that is to profile URLs (evenly spread across ~500 users), the
rest to individual photo URLs like
https://www.instagram.com/p/BE4xLpmABFz/.

i used to do 3-5x that much before i throttled down recently. i
haven't tried, but i expect i could ramp back up to that on the
reverse proxy and not get 429ed.

Nick (Cloud Platform Support)

May 6, 2016, 4:25:11 PM
to Google App Engine
Thanks for the details! Hopefully this thread is useful to future users.

Ryan B

May 15, 2016, 11:55:07 PM
to google-a...@googlegroups.com
just to follow up, instagram has evidently stopped blocking/throttling
app engine's IPs, or whatever else was happening here. i can now
successfully fetch www.instagram.com profile and photo pages from a
few different app engine apps.

Nick (Cloud Platform Support)

May 16, 2016, 1:09:07 PM
to Google App Engine
An interesting update! Hard to say whether something on our side or theirs has changed, or if it's merely a matter of the average load coming from the UrlFetch proxies... If this rears its head again in future, I think it's worth reporting to the Public Issue Tracker.


Cheers,

Nick
Cloud Platform Community Support
