New Pricing + Long-Running Request Optimization


ESPR!T

Sep 13, 2011, 7:43:33 PM
to Google App Engine
Hello guys,

I would like some advice on how to deal with long-running requests
once the new pricing applies.

My app searches for the cheapest book prices on the internet. When a
user searches for a book and the price is not in the database, I have
to check 20+ sellers for the current price (some of them have an API;
sometimes I get the info by parsing their web pages). I have to
display these prices to the user almost instantly - basically I would
like to stay within 10 seconds at most. I am using tasks: when a user
displays a page, I create a task for each supplier and update the
page through ajax while the user waits for the results. Under the old
pricing scheme I had no reason to optimize anything, as these
requests are low in CPU intensity; they just take 1-10 seconds to
finish depending on the speed of the supplier's source data. App
Engine simply started extra instances when needed, and I never got
over 1 hour of CPU in a day.

But now that they are switching to instance time, my projected
payments went up from $0 to $2+ per day (just for instance time, plus
additional fees for database writes/reads). I've set max idle
instances to 1 and raised the pending latency to the maximum, which
got me back down to 15 hours of instance time (so I would now pay 0-5
cents per day, just for DB operations). Then I implemented memcache,
and I am going to experiment with cache headers for generated pages.

The main issue for me now is that my single instance can't handle 20+
task requests plus users browsing, so it's slow, and it still
eventually starts a new instance at peaks (which I was expecting). So
my idea is to have one front instance for all user-related stuff and
process tasks immediately on another instance. On App Engine I could
use backends, but I am not really keen to pay almost $2 per day to
run a minimal backend for this type of task.

So I am considering 'outsourcing' the task processing to some
external service that is friendlier to low-CPU/high-latency requests.
AFAIK there is Amazon with AWS Elastic Beanstalk (http://
aws.amazon.com/elasticbeanstalk/) and Heroku (which can run Java
now) - and I still have the option of putting my little worker app on
some very cheap VPS or my own machine. So basically my GAE instance
would queue up all the tasks and send them to my external workers,
which would then call my app back with an HTTP POST containing the
results.

Do you think this is a good approach for my task, or can you see any
issues with it? Or maybe you have ideas for other providers
compatible with a Java environment?

JH

Sep 13, 2011, 8:09:35 PM
to Google App Engine
Sounds like you could use async urlfetch instead of 20 separate tasks?

Jeff Schnitzer

Sep 13, 2011, 8:17:16 PM
to google-a...@googlegroups.com
+1 to this.

There's currently a limit of 10 async requests pending at once, but
the worst case will still be the latency of two calls.

Jeff
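The wave behavior Jeff describes - at most 10 fetches pending, so 20 lookups cost roughly two rounds of latency instead of 20 sequential ones - can be sketched outside App Engine with a plain thread pool standing in for async urlfetch. The supplier names and the 100 ms simulated latency below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class WaveFetch {
    // Simulate one supplier lookup; a real version would do an HTTP fetch.
    static String fetchPrice(String supplier) throws InterruptedException {
        Thread.sleep(100); // stand-in for ~100 ms of network latency
        return supplier + ":9.99";
    }

    // Run all lookups with at most `maxPending` in flight at once.
    static List<String> fetchAll(List<String> suppliers, int maxPending)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxPending);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String s : suppliers) {
                futures.add(pool.submit(() -> fetchPrice(s)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> suppliers = new ArrayList<>();
        for (int i = 0; i < 20; i++) suppliers.add("seller" + i);

        long start = System.nanoTime();
        List<String> results = fetchAll(suppliers, 10);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // 20 lookups at 100 ms each, 10 at a time: about two waves, not 2 s.
        System.out.println(results.size() + " prices in " + elapsedMs + " ms");
    }
}
```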

> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>
>

ESPR!T

Sep 13, 2011, 8:41:04 PM
to Google App Engine
Sorry, could you be more specific?

I have to display a page to the user while I fetch the prices in the
background (sometimes I already have prices for 10 suppliers in the
db and just need to fetch 10 more). I thought async url fetch meant
you start the fetches in the background and then display the prices,
so in my case, if the fastest supplier answers in 1 sec and the
slowest in 30 secs, the user would have to wait 30 secs for the
results - or maybe I just didn't get what you meant.

Tim Hoffman

Sep 13, 2011, 9:43:12 PM
to google-a...@googlegroups.com
Hi

You could submit the request via ajax back to your App Engine app,
which can then do an async request on all the URLs.

In your case you already have some of the info and have to fetch the
rest. So it might be two ajax calls: one to get the list of books,
where the result is book prices for the items you know, plus an
indicator of the books that will need a further request. Your front
end can then display the details you have and submit another ajax
request to App Engine to fetch results for the books you currently
have no info on, which can then async urlfetch the rest of the
details.

This way the user gets some info straight away, you keep your
requests to a minimum, and you fill in the results later.

Just a thought ;-)

T
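Tim's first ajax call - return the prices you already have plus the list of books that still need fetching - amounts to partitioning the requested books against the price database. A minimal sketch, where `firstCall` and its field names are hypothetical helpers, not App Engine API:

```java
import java.util.*;

public class PricePartition {
    // Result of the first ajax call: prices we already have,
    // plus the books the client should ask about in a second call.
    record FirstResponse(Map<String, Double> known, List<String> needsFetch) {}

    static FirstResponse firstCall(List<String> isbns, Map<String, Double> priceDb) {
        Map<String, Double> known = new LinkedHashMap<>();
        List<String> needsFetch = new ArrayList<>();
        for (String isbn : isbns) {
            Double p = priceDb.get(isbn);
            if (p != null) known.put(isbn, p);
            else needsFetch.add(isbn);
        }
        return new FirstResponse(known, needsFetch);
    }

    public static void main(String[] args) {
        Map<String, Double> db = Map.of("isbn-1", 12.50, "isbn-3", 7.20);
        FirstResponse r = firstCall(List.of("isbn-1", "isbn-2", "isbn-3"), db);
        System.out.println(r.known());      // prices to render immediately
        System.out.println(r.needsFetch()); // books for the second ajax call
    }
}
```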

Rishi Arora

Sep 13, 2011, 9:57:02 PM
to google-a...@googlegroups.com
My app is nearly identical to yours - several concurrent URL fetches are performed to "gather" content, and when users access my site, that content is nicely formatted for display.  My solution - part of which has already been suggested by Jeff - is to use async URL fetch.  Spinning an instance for 5 to 10 seconds waiting for your supplier's website to return book pricing data is a waste of resources.  So, whenever you need to do a URL fetch, consider deferring it by queueing it up in a pull queue.  Then, once you have enough deferred (10 is a good number), call URL fetch simultaneously for all 10 requests.

Another optimization is to use the 9 free hours of backend time.  I agree you can't have the backend running all the time.  So, wake the backend up with a cron job that runs once every hour.  This incurs a minimum cost of 15 minutes per hour = 6 instance hours per day - which is under the free quota.  Each time your backend wakes up, it looks up all the URL fetch requests deferred in the last hour and processes them.  My app does exactly this, and it takes me about 45 seconds to fetch and process all data for ~300 URL fetches every hour.

If you are attempting to stay within the free quota, absolutely use the backend hours in any way you can.  It would be a pity not to use those 9 free instance hours.
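The lease-in-batches step Rishi describes could be sketched as follows, with a plain in-memory deque standing in for an App Engine pull queue (the real API would be the task queue's `leaseTasks`, not shown here):

```java
import java.util.*;

public class DeferredFetches {
    // Lease up to `batch` deferred fetch requests, mimicking a pull
    // queue lease; the deque stands in for App Engine's queue storage.
    static List<String> lease(Deque<String> queue, int batch) {
        List<String> leased = new ArrayList<>();
        while (leased.size() < batch && !queue.isEmpty()) {
            leased.add(queue.poll());
        }
        return leased;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 23; i++) queue.add("http://seller" + i + "/price");

        int wave = 0;
        while (!queue.isEmpty()) {
            List<String> batch = lease(queue, 10);
            wave++;
            // each batch would then be fetched concurrently with async urlfetch
            System.out.println("wave " + wave + ": " + batch.size() + " fetches");
        }
    }
}
```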


Gerald Tan

Sep 13, 2011, 10:58:07 PM
to google-a...@googlegroups.com
You can try setting your Max Idle Instances to 1 but leaving your Min Pending Latency at a small number. This allows additional instances to spin up to handle other requests, but it will not increase your total instance hours significantly, as you will now only be charged for Active Instances (the orange line) instead of Total Instances (the blue line). The scheduler will spin up additional instances to handle your requests, but you only pay for them when they are actually active.

Once the free quota for Total Instance Hours increases to 28, it shouldn't be too hard to remain under that.
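In 2011 these scheduler knobs were set in the Admin Console; in later App Engine Java runtimes the same settings can also be expressed in `appengine-web.xml`. A hedged sketch with the values Gerald suggests (the 30ms figure is an arbitrary "small number"):

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <automatic-scaling>
    <max-idle-instances>1</max-idle-instances>
    <!-- keep pending latency small so extra instances spin up quickly -->
    <min-pending-latency>30ms</min-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```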

ESPR!T

Sep 13, 2011, 11:40:33 PM
to Google App Engine
With the backends I can't display data to the user when he asks for
it - he can't wait an hour to get the book prices; he wants them now.
So I would need to run the backend all the time and process price
requests as soon as possible (and since there are 5+ million books, I
can't pre-fetch some books on the backend and then shut it down).

I will try to implement that async url fetch for a batch of requests;
that can actually be fast. Some suppliers are really slow and some
are fast, so maybe I could also use two queues (one for slow
responders and one for fast) so the user gets info from the fast
suppliers quickly and doesn't have to wait. The only issue is that
technically I should process the pull queue from a backend, which
means I would need it running all the time ;/ (I really need to
display the prices as soon as possible and can't delay the
processing).

I will also try lowering the pending latency so GAE can spin up more
instances, and will see how it affects the instance time.

Thank you guys!
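The fast/slow queue split mentioned above can be sketched with two separate thread pools, so slow suppliers never hold up the fast ones; the supplier names and latencies below are invented for illustration:

```java
import java.util.*;
import java.util.concurrent.*;

public class FastSlowQueues {
    // Simulate one supplier lookup with a given response time.
    static String fetch(String supplier, long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs); // stand-in for the supplier's response time
        return supplier + ":9.99";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService fastPool = Executors.newFixedThreadPool(4);
        ExecutorService slowPool = Executors.newFixedThreadPool(4);

        List<Future<String>> fast = new ArrayList<>();
        List<Future<String>> slow = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int n = i;
            fast.add(fastPool.submit(() -> fetch("fast" + n, 50)));
            slow.add(slowPool.submit(() -> fetch("slow" + n, 500)));
        }

        // Push fast results to the page first; slow ones arrive later via ajax.
        for (Future<String> f : fast) System.out.println("early: " + f.get());
        for (Future<String> f : slow) System.out.println("late:  " + f.get());

        fastPool.shutdown();
        slowPool.shutdown();
    }
}
```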

Rishi Arora

Sep 14, 2011, 12:20:40 AM
to google-a...@googlegroups.com
Just out of curiosity - do book prices really change more than once every hour?  Surely you could use some kind of hybrid approach where you pre-fetch the prices of the 5-10% most common books, so you don't have to fetch them when a user wants to see the prices.  This would alleviate some of the front-end instance time by offloading it to the backend that runs every hour.  Also, perhaps not all suppliers change their prices every hour - maybe only some fraction of them do.  So run the backend only once a day for slow-changing prices and once an hour for faster-moving ones, and for the 0.001% of books that need more dynamic pricing, run a cron job every 2 minutes on a front-end, assuming it finishes in a handful of seconds so the front-end can serve user-facing requests at the same time.

I apologize if I don't fully understand your problem, but I'm trying to throw out as many ideas as possible.
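Rishi's tiered schedule could be expressed in an App Engine `cron.xml` along these lines; the handler URLs are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/prefetch/fast-moving</url>
    <schedule>every 1 hours</schedule>
  </cron>
  <cron>
    <url>/prefetch/slow-moving</url>
    <schedule>every 24 hours</schedule>
  </cron>
  <cron>
    <url>/prefetch/dynamic</url>
    <schedule>every 2 minutes</schedule>
  </cron>
</cronentries>
```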

ESPR!T

Sep 14, 2011, 12:53:50 AM
to Google App Engine
1) I fetch the price for a book only once a day, when someone asks
for it (after that it is served from the database).
2) There is nothing like "most common books" - a user comes in and
asks for anything. I can't show him blank prices and tell him to come
back in an hour.
3) 5-10% of the books is more than 500,000 books * 50 suppliers =>
millions of requests just for a prefetch (I don't want to fetch all
that each day just because I suppose someone might want to see the
prices for a particular book).

Let's say I have 2,000 common books that I prefetch on a backend each
day - that's fine, but my issue is how to serve the people who ask
for a book I do not prefetch. It's normal that I will need to process
hundreds of fetches every 2 minutes - this is not going to take a few
seconds, as each request to the outside takes ~1-2 seconds on average
(though I guess I could run 200 fetches at one time to get it done in
10 secs).

BTW the URL is http://www.librari.st/ if you want to check how it
works now (don't laugh, it's not ideal and it's just a project for
fun :)

Cheers, and thanks for your ideas - I really appreciate it.

Cheers and thanks for your ideas - I really appreciate that.