Is MapReduce still a flexible solution on AppEngine under the new pricing model?


Raymond C.

unread,
May 20, 2011, 8:56:01 AM5/20/11
to google-a...@googlegroups.com
As I understand it, MapReduce relies on a relatively large number of instances (on top of the normal traffic) to perform the calculation efficiently in parallel.  Under the new pricing model each instance will bill you for 15 min of idle time after the job is done, so 15 min times n instances is wasted (billed without being used).  If n=8 (for a relatively small and slow task), that is an additional cost of $0.16 for a single MapReduce run.  It becomes very costly if you are doing something like an hourly reporting job: 8 instances amount to $115.20/month of idle time for an hourly MapReduce task, which is in addition to the cost of the actual run time, just for MapReduce tasks.

My question is: is MapReduce still a flexible mechanism on AppEngine?  Or should we rely on an external service for this kind of calculation?  (More complex, but possibly more cost effective?)
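The cost arithmetic above can be sketched in a few lines of Python. The $0.08/instance-hour on-demand rate is an assumption for illustration; substitute whatever rate the final pricing model ships with:

```python
# Sketch of the idle-instance cost arithmetic for a MapReduce job.
# The on-demand rate below is an ASSUMPTION, not a published price.
ON_DEMAND_RATE = 0.08   # $ per instance-hour (assumed)
IDLE_MINUTES = 15       # idle time billed after the job finishes
N_INSTANCES = 8         # instances spun up for the job

def idle_cost_per_run(n=N_INSTANCES, idle_min=IDLE_MINUTES, rate=ON_DEMAND_RATE):
    """Idle cost of one run: n instances each billed for idle_min idle minutes."""
    return n * (idle_min / 60.0) * rate

def monthly_idle_cost(runs_per_day=24, days=30):
    """Idle cost of an hourly job over a 30-day month."""
    return idle_cost_per_run() * runs_per_day * days

print(f"${idle_cost_per_run():.2f} per run")       # $0.16 per run
print(f"${monthly_idle_cost():.2f} per month")     # $115.20 per month
```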


Barry Hunter

unread,
May 20, 2011, 9:11:02 AM5/20/11
to google-a...@googlegroups.com
There are two mitigating things here:

1) The scheduler and instance concurrency are set to improve - meaning
fewer instances anyway, to do the same workload.

2) If you're doing a regular job, then reserved instances would most
likely save you money.


(Another possibility: with Backends, you could perhaps use a more
powerful instance to zip through the tasks even quicker.)


(Or the shorter answer: this is all still based on speculation about
the new system - don't make decisions yet.)


johnP

unread,
May 20, 2011, 9:56:57 AM5/20/11
to Google App Engine
Wow. So many fundamental design assumptions are being turned on their
heads with the new incentive model!!!

It is unfortunate that Google failed to make the 100% granular cost
model work. The promise that made Appengine attractive was: You build
an app (adhering to our limitations). We will make it scale, and you
pay only for what you use. This was a clear promise that only an
amazing company could provide.

But as time went on, the promise has crumbled, brick by brick.
- The limitations became more intolerable (no SSL, no data backups,
reliability and uptime problems, >2,500 open issues in the bug tracker).
- The 'build it and we will scale it' promise retracted to 'build it
and we will scale it IF your response time is under 800 ms (wasn't
600 ms also mentioned?)'.
- The 'pay for what you use' promise has become 'Amazon charges this
way - we can too'.
- Finally, 'Our database architecture was wrong. Pay to migrate to
our new datastore, which has the advantage of being reliable.'

Google has been stating recently, "The new pricing makes it viable for
us to continue to provide Appengine." But Appengine will exist into
the future only if it is profitable for Google *AND* if customers find
it valuable.

It would be nice to see the Appengine value proposition restated.
Given the new incentive model, what makes Appengine amazing?

Thanks for listening.





johnP

unread,
May 20, 2011, 10:11:46 AM5/20/11
to Google App Engine

One more point. We all expected prices to increase. What was a
surprise is for the incentive model to flip as much as it did.

Maybe a compromise pricing model would be based not on the number of
instances, but on the sum of response times?  This would eliminate the
customer paying for inefficiencies in the scheduler, would return the
customer value proposition to exactly "pay for what you use", and
would bill the very resource you intend to bill.  Maybe it's a
semantic difference, but one that would retain the attractiveness of
Appengine's original, amazing, revolutionary customer promise.

johnP

Vinuth Madinur

unread,
May 20, 2011, 10:20:09 AM5/20/11
to google-a...@googlegroups.com
+1

Interesting suggestion on pricing as a sum of response times - that is what a user should be worried about, rather than tinkering with the scheduler.


Stephen

unread,
May 20, 2011, 10:44:07 AM5/20/11
to google-a...@googlegroups.com
On Fri, May 20, 2011 at 2:11 PM, Barry Hunter <barryb...@gmail.com> wrote:
>
> 2) If you doing a regular  job, then reserved instances would most
> likely save you money.


I don't think so. Although reserved instance hours cost less, you pay
for hours you don't use as well as those you do. Your average price
only approaches the reserved price if you use every single hour.

Reserved instance hours are a bad fit for bursty map-reduce type jobs.
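Stephen's utilization point can be made concrete with a small sketch. The rates are assumptions (on-demand at $0.08/instance-hour, reserved at 5/8 of that, as discussed in this thread); the shape of the curve is what matters:

```python
# Effective price per USED instance-hour when pre-paying a reserved
# block: unused hours are still paid for, so the effective rate is the
# reserved rate divided by utilization.  Rates below are assumptions.
ON_DEMAND = 0.08            # $ per instance-hour (assumed)
RESERVED = ON_DEMAND * 5/8  # $0.05 per instance-hour (assumed 5/8 discount)

def effective_rate(utilization):
    """Price per used hour at a given fraction of the block actually used."""
    assert 0 < utilization <= 1
    return RESERVED / utilization

# Full utilization beats on-demand; a bursty hourly job that runs
# 15 min/hour uses only 25% of each hour and ends up paying more:
print(effective_rate(1.0))   # 0.05 -- cheaper than on-demand
print(effective_rate(0.25))  # 0.2  -- 2.5x the on-demand rate
```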

Stephen

unread,
May 20, 2011, 10:51:19 AM5/20/11
to google-a...@googlegroups.com
On Fri, May 20, 2011 at 3:11 PM, johnP <jo...@thinkwave.com> wrote:
>
> One more point.  We all expected prices to increase.  What was a
> surprise is for the incentive model to flip as much as it did.


I expected prices to decrease. After 3 years of Moore's Law, why would
it cost more?

nickmilon

unread,
May 20, 2011, 6:49:09 PM5/20/11
to Google App Engine
Very interesting question:
"Is MapReduce still a flexible solution on AppEngine under the new
pricing model?"
My answer: probably not. The new pricing model makes MapReduce
operations a no-no. The price will be prohibitive for such operations,
especially ones that depend on many instances to run a job fast -
unless you would rather they take hours instead of minutes to
complete.
So I guess the team can drop the "reduce" part and the query-based
MapReduce items from the roadmap; the new model renders them
irrelevant for most use cases.
Also, drawing a "danger - high $$$" icon as a precaution next to the
copy/delete buttons on the control panel would be a good idea.

Nick


Jason Collins

unread,
May 21, 2011, 12:01:35 PM5/21/11
to Google App Engine
We have applications that do large batches of work - mapper-driven,
fantasm-driven, other custom fan-out driven. It is a very, very
powerful feature of App Engine to be able to scale out massively to
handle these large jobs in a single spike of work. No other platform
provides this capability and it is a very important part of how our
applications operate.

I really, really hope the scheduler can be made very aggressive. Maybe
there is a way to identify one of these jobs (a custom HTTP header?)
and allow the scheduler to be very aggressive about spinning up and
spinning down these instances? The nature of these jobs is such that
state is not important; we just want many, many instances each doing a
very small chunk of work in parallel. I realize that the startup
requests for these instances need to be slimmed down to allow this
rapid ramp-up.

Does anyone have any thoughts as to how we might approach this under
the new billing model?

j

Raymond C.

unread,
May 31, 2011, 2:20:43 AM5/31/11
to google-a...@googlegroups.com
> 2) If you doing a regular  job, then reserved instances would most
likely save you money.

But you pay for 24 hours per day for each reserved instance even when you are not using it.  So in my hourly-report case, instead of wasting 15 min x N instances per hour, you would be wasting 45 min x N instances if the MapReduce task finishes in 15 min (more if it completes earlier).  Reserved instances cost 5/8 of the on-demand price, which means reserved instances will cost you 5/8 x 3 = 1.875 times as much for the idle CPU.
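That comparison, sketched numerically under the same assumptions (prices normalized to the on-demand rate, 5/8 as the assumed reserved discount, and an hourly job that finishes in 15 minutes):

```python
# Idle waste per hour: on-demand bills 15 idle minutes at full price;
# a 24/7 reserved instance sits idle 45 min/hour at 5/8 price.
# Both rates are assumptions from the discussion above.
ON_DEMAND = 1.0       # on-demand price normalized to 1
RESERVED = 5 / 8      # assumed reserved discount

idle_on_demand = (15 / 60) * ON_DEMAND  # 15 idle min at full price
idle_reserved = (45 / 60) * RESERVED    # 45 idle min at 5/8 price

print(idle_reserved / idle_on_demand)   # 1.875 -> reserved wastes ~1.9x more
```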

Gregory D'alesandre

unread,
May 31, 2011, 5:57:54 PM5/31/11
to google-a...@googlegroups.com
Hi Raymond, reserved instances are actually reserved instance-hours.  They are a pre-commitment to buying a certain number of instance-hours per week, and those instance-hours can be spent as you see fit.  So, to be clear, you are NOT paying 24 hours per day for each reserved instance if you are not using it.

Greg