Reliability of URLFetchService

Deepak Singh

unread,
Oct 25, 2012, 12:12:06 PM10/25/12
to google-a...@googlegroups.com
Hi All,

I want to discuss your experience with the GAE Java URLFetchService.

We are using the async feature of this service to retrieve data from third-party servers, and our business depends heavily on the data received from them.
I have observed that URLFetch fails frequently with a java.io.IOException, and we lose business as a result.

So I would like to hear about your experience with its reliability, DeadlineExceededException cases, and ways to handle them.

How reliable is URLFetchService (GAE Java)?

Regards
Deepak Singh

Joshua Smith

unread,
Oct 25, 2012, 12:57:55 PM10/25/12
to google-a...@googlegroups.com
I use the Python version, and get a couple of failures a day. The easy answer is to treat it just like mail: always run it from a task, so that if it fails, it will retry.
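
(The task-queue retry idea boils down to "retry on transient failure." Here is a plain-Python sketch of it; `fetch_with_retries` and `flaky_fetch` are illustrative names, not App Engine API, and a real task queue retries across requests rather than in a loop:)

```python
import time

def fetch_with_retries(fetch, url, max_attempts=3, backoff_seconds=0.0):
    """Call fetch(url), retrying on IOError; a task queue does this for you across requests."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except IOError as e:
            last_error = e
            if backoff_seconds:
                # Back off exponentially between attempts.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise last_error

# A flaky stand-in for urlfetch: fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient failure")
    return "payload from " + url

result = fetch_with_retries(flaky_fetch, "http://example.com", max_attempts=5)
```

With the task queue, each retry is a fresh request with its own deadline, which is exactly why it tolerates the occasional urlfetch failure.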

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Deepak Singh

unread,
Oct 25, 2012, 1:27:16 PM10/25/12
to google-a...@googlegroups.com
No, I cannot use a task because this is not backend work; I need to return the result within the same request.

So would it be better to use the old HttpURLConnection with ThreadManager?

What do you say, guys?
--
Deepak Singh

Vinny P

unread,
Oct 25, 2012, 1:29:28 PM10/25/12
to google-a...@googlegroups.com
In my experience, the reliability of URLFetch itself is rock-solid. The problem is the external server you're connecting to. The external server can be relatively fast to respond (many web APIs, such as Google's goo.gl shortener) or relatively slow and error-prone. The Reddit API in particular is absolutely terrible to access from App Engine: since all GAE urlfetches come from the same pool of IPs, the Reddit servers deliberately throttle requests because they think all of the requests are coming from a single poorly-behaved app, not multiple apps. And no, the Reddit API does not offer OAuth or similar authentication.

There are a couple of ways to mitigate this: use task queues to keep retrying a urlfetch, use backends to continuously urlfetch and cache the results, find a different third-party service, etc.
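
(The "backend that continuously urlfetches and caches the results" approach amounts to serving stale-but-recent data instead of hitting the external server on every request. A minimal sketch, with a plain dict standing in for memcache or the datastore; all names here are illustrative:)

```python
import time

class TTLCache:
    """Tiny time-to-live cache; on App Engine, memcache would play this role."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for deterministic testing
        self._store = {}            # url -> (value, fetched_at)

    def get_or_fetch(self, url, fetch):
        entry = self._store.get(url)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]         # still fresh: no network call
        value = fetch(url)          # missing or stale: fetch and re-cache
        self._store[url] = (value, now)
        return value

# Example with a fake clock so expiry is deterministic.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
fetches = []

def fetch(url):
    fetches.append(url)     # count how often we actually hit the network
    return "data"

cache.get_or_fetch("http://example.com/api", fetch)  # fetched
cache.get_or_fetch("http://example.com/api", fetch)  # served from cache
fake_now[0] = 61.0
cache.get_or_fetch("http://example.com/api", fetch)  # TTL expired: fetched again
```

The user-facing request then reads the cache and never blocks on the flaky external server.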

-Vinny P

Joshua Smith

unread,
Oct 25, 2012, 1:51:19 PM10/25/12
to google-a...@googlegroups.com

On Oct 25, 2012, at 1:29 PM, Vinny P <vinn...@gmail.com> wrote:

In my experience, the reliability of URLFetch is rock-solid.

Then your experience is limited.

For example, I get occasional timeouts connecting to Google Analytics to do a hit count. And I get occasional timeouts connecting to a server that I put in a data center myself and that has basically no load whatsoever.

Like I said, I see one or two timeouts per day across all my GAE apps. Similar to the number of failures I see using the mail API.

-Joshua

Vinny P

unread,
Oct 25, 2012, 2:17:28 PM10/25/12
to google-a...@googlegroups.com


On Thursday, October 25, 2012 12:51:51 PM UTC-5, Joshua Smith wrote:

Like I said, I see one or two timeouts per day across all my GAE apps. Similar to the number of failures I see using the mail API.


A handful of timeouts per day, when I'm doing hundreds of thousands of urlfetches per day, is utterly inconsequential and still qualifies as rock-solid in my book. There are better places to spend my time optimizing. I would dare say that any hosted service experiences the odd, rare timeout, not because of the software but simply because of randomness. Perhaps the external server was having a slow day. Perhaps a router in the middle decided to break down. There are countless things that could go wrong.

If you're experiencing continuous problems with a specific server, then you might want to check that particular server for any issues.

-Vinny P

Vinny P

unread,
Oct 25, 2012, 2:44:37 PM10/25/12
to google-a...@googlegroups.com
If you absolutely must return the urlfetch result within the same request, I would go with a UI change.

For instance, if you go to Expedia/Travelocity and search for flights, you'll see a long "Expedia is searching for the best deal" page with a stylized loading bar. On the backend, obviously these travel sites are doing a lot of work, connecting to computer systems of airlines, etc. When the site is done searching, it redirects you (or changes with AJAX) to a page with travel deals.

You can do the same. When your request comes in on the server, immediately enqueue a task and send back a "Loading..." page to the user. The task queue can invoke another servlet to handle all the processing and retry urlfetches as needed. When everything is done, write the result to the datastore. In the interim, the client-side loading page will periodically check back in with the server (with AJAX, or by simply reloading the same loading page). When the loading page detects that processing is complete, the server sends the results back to the user (with AJAX, or by redirecting to another URL).
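
(That flow can be sketched end to end. An in-memory dict stands in for the datastore and a direct call stands in for the task queue; all names are illustrative, not App Engine API:)

```python
import uuid

RESULTS = {}  # stands in for the datastore

def handle_request():
    """The user-facing handler: record a pending job and return its id immediately.

    A real handler would also enqueue a task and render the "Loading..." page.
    """
    job_id = str(uuid.uuid4())
    RESULTS[job_id] = {"status": "pending", "result": None}
    return job_id

def run_task(job_id, fetch):
    """The task-queue worker: do the slow urlfetch and store the outcome."""
    try:
        RESULTS[job_id] = {"status": "done", "result": fetch()}
    except IOError:
        pass  # leave the job pending; the task queue would retry it

def poll(job_id):
    """What the client's periodic AJAX poll hits."""
    return RESULTS[job_id]

# Simulated round trip.
job = handle_request()
assert poll(job)["status"] == "pending"   # client is shown the loading page
run_task(job, lambda: "flight deals")     # worker finishes in the background
final = poll(job)                         # the next poll sees the result
```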

-Vinny P

Jeff Schnitzer

unread,
Oct 25, 2012, 4:40:07 PM10/25/12
to google-a...@googlegroups.com
Translated: "URLFetch is rock-solid, except because it uses a shared
IP pool it will erratically fail if you use it to fetch from almost
any third-party service that pays attention to load." Which really
isn't very rock-solid at all.

The shared IP pool is a significant problem with URLFetch, and you
really need to be careful when using it. The standard workaround is
to set up your own proxy servers elsewhere on the net - a PITA, but not
optional for many services. Here's the issue to star, in the hope of
getting Google to do something about it:

http://code.google.com/p/googleappengine/issues/detail?id=6644

FWIW, I've also found that URLFetch is occasionally less than snappy.
But there are a lot of moving parts involved so it's hard to figure
out exactly where to lay blame.

One thing to watch out for is that the default URLFetch timeout is
fairly short. I usually find it necessary to increase the timeout,
especially with services with erratic performance (eg Facebook).

Jeff

Deepak Singh

unread,
Oct 26, 2012, 3:09:14 PM10/26/12
to google-a...@googlegroups.com
Is it better to use the old HttpURLConnection instead of URLFetch?
Deepak Singh

Jeff Schnitzer

unread,
Oct 26, 2012, 3:11:59 PM10/26/12
to google-a...@googlegroups.com
They are one and the same: HttpURLConnection uses the urlfetch service under the hood.

Jeff

Deepak Singh

unread,
Oct 26, 2012, 3:39:55 PM10/26/12
to google-a...@googlegroups.com
So whether I use the URLFetch/fetchAsync feature or make a connection through HttpURLConnection and getInputStream, they are the same?

I mean, behind the scenes they work through the same service.
So when URLFetch fails, does that mean HttpURLConnection also fails?

Regards
Deepak

Vinny P

unread,
Oct 26, 2012, 4:45:31 PM10/26/12
to google-a...@googlegroups.com
On Friday, October 26, 2012 2:40:38 PM UTC-5, Deepak Singh wrote:
I mean, behind the scenes they work through the same service.
So when URLFetch fails, does that mean HttpURLConnection also fails?


Correct. It doesn't matter how you write your networking code or which classes you use; internally, AppEngine relies on URLFetch for *everything*. The only alternative is to sign up for the Sockets Trusted Tester program that was recently announced.

-Vinny P 

Julie Smith

unread,
Oct 27, 2012, 1:33:54 AM10/27/12
to google-a...@googlegroups.com
I am using Python (not Java), but I was receiving significant numbers of DeadlineExceeded errors (many per day). Since using this patch to increase the timeout to 10 seconds, I have not seen any DeadlineExceeded errors.

The fix for Python was to add the following code before importing httplib2:

# Fix for DeadlineExceeded, based on code from 
from google.appengine.api import urlfetch
from google.appengine.api import logservice  # needed for the flush() call below

real_fetch = urlfetch.fetch
def fetch_with_deadline(url, *args, **argv):
    argv['deadline'] = 10  # 10-second timeout instead of the short default
    logservice.flush()
    return real_fetch(url, *args, **argv)
urlfetch.fetch = fetch_with_deadline

I don't know what the equivalent is for Java, but this may give you some ideas.

Regards,

Julie
