Re: Project too successful - due to Arctic Sea Ice Drama.


Richard Watson

Aug 22, 2012, 5:40:59 AM
to google-a...@googlegroups.com
Which resources are being hit the hardest? Outgoing bandwidth?

On Wednesday, August 22, 2012 6:51:45 AM UTC+2, Torsten Becker wrote:
Hi,

For two years I have been running a blog on GAE focusing on the Arctic; as a unique feature it includes a Google Map with daily high-resolution Arctic satellite images from NASA. The NASA images need to be tiled, cached and served, and this process runs on GAE, too.


Usually interest is low during the dark Arctic winter and rises in September, when sea ice reaches its minimum extent. So far the free quota on outgoing bandwidth has been no problem.

This year is different: the latest Arctic storm reduced sea ice extent by a million square kilometers in a week, and public interest was so high that the free quota was exhausted 5 hours after reset.

Now, at the end of August, it is absolutely clear that this year will break or has already broken all records in terms of sea ice minimum, and that it marks a major step towards an ice-free Arctic. When in a few years the Arctic lacks sea ice completely in September, it will change weather patterns all over the northern hemisphere, which is one explanation for the accelerating public interest.

I’d like to mention that the project is ad-free and has no economic interests whatsoever. All I want is to keep it running and give everybody on the planet the chance to see with their own eyes how dramatic the situation in the Arctic is. True-color satellite images are free of interpretation and do not lead to discussions about whether there is sea ice or not.

Here is the thing: if I enable billing to satisfy the need for pure information, I'm bankrupt next month. If not, 99% of the users are going to see nothing, get frustrated and possibly never come back.

So my best option is to close the site now.

What do you think?

-- Torsten

Rob Coops

Aug 22, 2012, 5:55:13 AM
to google-a...@googlegroups.com
The big question for me is: where are you serving these images from?
If you are serving them directly from NASA servers or from an alternative source, then you would most likely see very little traffic, as most of it will just be URIs pointing to the images. There are a lot of hosting companies out there that claim to deliver unlimited bandwidth (not true, I am sure, but worth giving it a shot). GAE seems to be aimed purely at delivering functionality; actual content should be served from other locations if you want to keep the bill within reason.

If you are already showing people the images from a location other than GAE, you will have to limit the amount of data people can consume (only a static image; no zooming, panning or any other fancy stuff) and inform visitors that you are unable to afford this functionality due to the high bandwidth demands. If that still won't do it, you could attempt to find some company willing to sponsor your efforts. Unfortunately, the nature of GAE means that portability is not too great, so moving to an alternative host will be difficult at best.

I have seen many projects that got sponsorship from companies or government organisations to allow them to continue to provide the unique information to viewers.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/OTYSfYfbDgoJ.

To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

noiv

Aug 22, 2012, 7:09:34 AM
to google-a...@googlegroups.com
Thanks Rob, Richard,

the image tiles are served directly from GAE, and only outgoing bandwidth limits capacity. I'm going to reach out for sponsorship in winter, but the question is how to survive the next 8 weeks?

If I understand you correctly there might be an option to put a provider with unlimited bandwidth in front, so GAE serves each image only once. How could I start that?

-- Torsten 

Barry Hunter

Aug 22, 2012, 7:25:28 AM
to google-a...@googlegroups.com
I would perhaps suggest trying a Linux Amazon Micro instance sitting
in front. Just a basic install of Varnish should do the trick*
http://harish11g.blogspot.co.uk/2012/03/varnish-page-cache-aws-configure.html
(and make sure your application is serving headers that allow caching)

Can get it free for a year:
http://aws.amazon.com/free/

Although you do only get 15 GB of outgoing bandwidth free. After that
it is $0.120 per GB, pay as you go, no minimum fee.

But that might be enough to tide you over for a few weeks.


I don't know of a provider that gives you unlimited bandwidth for free.
If they do, it's probably really bad (slow/unreliable), so as to be not
worth the bother.
You can get a reasonable VPS though, for about $15 a month. Some offer
generous bandwidth allowances.


But if you are going to be paying about $15 a month, you may as well pay that
for App Engine directly; it's also $0.12/GB. Your $9 a month minimum
gives you 75 GB a month.
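
The arithmetic behind these figures can be checked with a few lines of Python. This is only a rough sketch: the prices are the ones quoted in this message, while the function names and the assumption that a spend converts linearly into bandwidth (with a simple free allowance) are illustrative and not App Engine's or Amazon's actual billing rules.

```python
def gb_covered(spend_usd, price_per_gb_usd):
    """Return how many GB of outgoing bandwidth a given spend buys."""
    return spend_usd / price_per_gb_usd


def overage_cost_usd(total_gb, free_gb, price_per_gb_usd):
    """Return the cost of traffic beyond a free allowance (assumed linear pricing)."""
    billable_gb = max(0.0, total_gb - free_gb)
    return round(billable_gb * price_per_gb_usd, 2)


# App Engine figures quoted above: $9 minimum at $0.12 per GB.
print(round(gb_covered(9.0, 0.12)))
# A $15 budget at the same rate.
print(round(gb_covered(15.0, 0.12)))
# AWS figures quoted above: 15 GB free, then $0.12 per GB, for 100 GB of traffic.
print(overage_cost_usd(100.0, 15.0, 0.12))
```

At the same per-GB price, putting a paid proxy in front only helps if it comes with a larger free allowance or a lower rate.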




* If one instance struggles with the load, can also get a free windows
instance. And an elastic load balancer, but the configuration is
getting much more complicated then.

Rob Coops

Aug 22, 2012, 7:42:42 AM
to google-a...@googlegroups.com
That really depends on how you get your images...

Assuming the images are accessible via http on the NASA servers I would begin by using that and let NASA deal with the traffic. Contact their web admin and inform them about the expected traffic and ask them if this is going to be a problem before you do so. No reason to upset a US government agency if you don't have to ;-)

If NASA is not happy with the idea of so much additional traffic (I would expect them to protest), or you cannot access the images via HTTP but only after logging in or via some other protocol, then I would find any provider offering web hosting where I could run code from cron or a similar scheduler and, of course, with the unlimited bandwidth bit in the conditions. After that, simply run a script from a scheduler to pull the images from NASA over to this host, and figure out a way to inform your app about the URI of the images and any additional data required (I would expect some geo-coding information to be needed to place the images correctly).
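
A scheduled pull script of this kind could look roughly like the following Python sketch. Everything specific in it is an assumption: the thread does not describe NASA's file layout, so the list of source URLs and the destination directory are supplied by the caller, and `mirror_images` and `http_get` are hypothetical names.

```python
import os
import urllib.request


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def mirror_images(urls, dest_dir, fetch=http_get):
    """Copy each image into dest_dir, skipping files that are already there.

    Returns the list of newly written file paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path segment of the URL as the local file name.
        name = url.rstrip("/").rsplit("/", 1)[-1]
        path = os.path.join(dest_dir, name)
        if os.path.exists(path):
            continue  # already mirrored on an earlier run
        data = fetch(url)
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved
```

Run from cron once a day, a script like this means each source image is fetched only once; the app then only needs to be told the base URI of the mirror host.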

In both situations your current application provides the Google Maps API with a URI to each of the images you want to overlay. This URI happens to point to your GAE instance, but there is no reason it has to; it could just as well point to another machine (like the one where you now have a copy of the NASA images stored). As far as the Maps API is concerned there is no problem: it does its thing and points the client to the image location using the URI. The client follows the URI, requests the image, and you save a lot of bandwidth.

That should do the trick: every image is now served by NASA or another provider, and you only point the clients to a source where they can cheaply access the images.



Barry Hunter

Aug 22, 2012, 7:52:22 AM
to google-a...@googlegroups.com
Or just MaxCDN

http://www.maxcdn.com/pricing

You can get a terabyte of free bandwidth. After that it's $0.070/GB, but
cheap packages are often available.


(I do use MaxCDN, and the service is pretty good. I have had problems
with large numbers of concurrent requests, i.e. they struggle serving a
page with many hundreds of images. But for fewer than 50 or so, it is
fine.)

Richard Watson

Aug 22, 2012, 9:20:27 AM
to google-a...@googlegroups.com
Some options:

1) At the very least, ensure Google's Edge Cache is able to cache your images by adjusting cache control.  Search this forum for some thoughts on how.

2) You could try Google's PageSpeed service, although it could take time to set up (if they accept you).

3) Cloudflare.com, which should reduce the load significantly, and is fairly quick to set up.  No changes to your code.

4) Put the images on Amazon's S3 and CloudFront, but that'll mean some changes to publish images to it.

5) Use another CDN, as already mentioned.

Sameer Lodha

Aug 22, 2012, 11:03:30 AM
to google-a...@googlegroups.com
I would suggest Cloudflare, as they have no limits on bandwidth utilization and the bandwidth is completely free. You can make do with their free plan. Setup is trivial as well.



Jeff Schnitzer

Aug 22, 2012, 12:12:36 PM
to google-a...@googlegroups.com
The general consensus here is right - use a cache. This is useful on
any system that generates tiles, not just GAE. Hopefully your traffic
tends to predominantly hit a core set of tiles.

Make sure that you have a Cache-Control header with a reasonable
max-age. Nothing will work right without this.

GAE has a built-in edge cache with undocumented behavior and no
guarantee of service. That's the first layer of defense. If you
already have Cache-Control set then this isn't good enough.

Try CloudFlare. It's free, documented, and you will be able to see
charts of exactly how effective the cache is.

If CF isn't effective enough, look around at other CDNs. You could
set up a varnish instance yourself somewhere but you need to look
carefully at the bandwidth costs; it doesn't do any good to push the
cost around if you're still paying the same amount per gigabyte.

A quick web search turned up these guys advertising $0.06/gb:
http://www.scaleengine.com/esc/

(Referenced from
http://stackoverflow.com/questions/72369/whats-the-best-cdn-for-image-hosting-on-a-high-volume-web-site.
Also numerous recommendations for MaxCDN, already mentioned)

Jeff

Barry Hunter

Aug 22, 2012, 12:20:14 PM
to google-a...@googlegroups.com
> GAE has a built-in edge cache with undocumented behavior and no
> guarantee of service. That's the first layer of defense. If you
> already have Cache-Control set then this isn't good enough.

Unless I missed something, you still pay for outgoing bandwidth, so it
won't help. It will not save anything on a bandwidth-bound app.

It's just saving the requests from hitting instances, and the cost
savings that that brings.

But also pretty sure the edge-cache only works on Billing Enabled apps
anyway. (And only works on custom domains - not on appspot.com urls )

noiv

Aug 22, 2012, 2:26:54 PM
to google-a...@googlegroups.com
Thanks a lot for all the ideas and input.

It's a custom domain, not served via appspot.com.

There are about 3200 new tiles/images every day. The problem does not seem to be a few users requesting the same tile every minute or so.

MaxCDN currently offers the first TB for free and the next 10 for $700. But since today GoogleMapsMania links to the site and new referrers are popping up fast. I calculate 1 MB per visitor; that's just another recipe for bankruptcy in case of getting slashdotted.
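
To put numbers on that worry, here is a small Python sketch of the visitor arithmetic. It uses the figures from this message (1 MB per visitor, first TB free, then 10 TB for $700, which works out to $0.07 per GB); the decimal unit conversion, the linear pricing beyond the free terabyte, and the function name `cdn_cost_usd` are assumptions made for illustration only.

```python
MB_PER_GB = 1000  # decimal units, an assumption


def cdn_cost_usd(visitors, mb_per_visitor=1.0, free_gb=1000.0, price_per_gb=0.07):
    """Estimate the CDN bill for a number of visitors over one billing period."""
    total_gb = visitors * mb_per_visitor / MB_PER_GB
    billable_gb = max(0.0, total_gb - free_gb)
    return round(billable_gb * price_per_gb, 2)


for visitors in (100_000, 1_000_000, 5_000_000):
    print(visitors, cdn_cost_usd(visitors))
```

Under these assumptions the free terabyte covers about a million visitors, and the bill only starts to grow beyond that.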

I don't get CloudFlare. The wiki is full of dead links, and a change to the DNS records would solve everything? How do devs access the site then? But they promise not to charge for bandwidth. If all of that is true, it seems perfect. I've started to google for detailed instructions.

Thanks again. I now believe there are more options to consider and shutting down the site is no longer on top of the list.

-- Torsten   





Wilson MacGyver

Aug 22, 2012, 3:45:52 PM
to google-a...@googlegroups.com
Others have made very good suggestions. I don't see Google Cloud
Storage for developers mentioned, so I'll toss it out there.
Basically it's a CDN with very good App Engine integration.



--
Omnem crede diem tibi diluxisse supremum.

Jeff Schnitzer

Aug 22, 2012, 4:01:09 PM
to google-a...@googlegroups.com
On Wed, Aug 22, 2012 at 2:26 PM, noiv <noi...@gmail.com> wrote:
>
> There are about 3200 new tiles/images every day. The problem does not seem to be
> a few users requesting the same tile every minute or so.

The most helpful stats would be:

* How many tile requests per day total?
* How many unique tile requests per day?
* What is the average size of a tile?
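
These three numbers can be pulled from a request log with a few lines of Python. The sketch below assumes the log has already been parsed into `(tile_path, size_bytes)` pairs, one per served tile; that input format and the name `tile_stats` are hypothetical. The hit ratio it reports is a best case, as if a cache in front served every repeated request.

```python
def tile_stats(requests):
    """Summarize tile traffic from an iterable of (tile_path, size_bytes) pairs."""
    total = 0
    total_bytes = 0
    unique = {}
    for path, size in requests:
        total += 1
        total_bytes += size
        unique[path] = size  # remember one size per distinct tile
    if total == 0:
        return {"total_requests": 0, "unique_tiles": 0, "avg_tile_bytes": 0.0,
                "best_case_hit_ratio": 0.0, "best_case_origin_bytes": 0}
    return {
        "total_requests": total,
        "unique_tiles": len(unique),
        "avg_tile_bytes": total_bytes / total,
        # Share of requests a perfect cache could answer without the origin.
        "best_case_hit_ratio": 1 - len(unique) / total,
        # Bytes the origin still serves if each distinct tile is fetched once.
        "best_case_origin_bytes": sum(unique.values()),
    }


sample = [("t/1.jpg", 100), ("t/1.jpg", 100), ("t/2.jpg", 300), ("t/1.jpg", 100)]
print(tile_stats(sample))
```

If the unique count is close to the total, a cache will not help much; if it is far smaller, most of the bandwidth can move off GAE.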

> MaxCDN currently offers the first TB for free and the next 10 for $700. But since
> today GoogleMapsMania links to the site and new referrers are popping up fast. I
> calculate 1 MB per visitor; that's just another recipe for bankruptcy in
> case of getting slashdotted.

Maybe you can get Exxon to sponsor it ;-)

I presume you have already decreased the jpg quality as much as you are willing?

The first thing I notice when looking at your site is that you are not
caching anything. Not even in the browser - it refetches every single
time. I get this:

Cache-Control:private
Cache-Control:must-revalidate, post-check=0, pre-check=0

Switch to:

Cache-Control: public, max-age=3600

Make the age as long as you are willing. If the imagery changes every
day at midnight, you could use an Expires header to make it expire at
a particular time.
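
As a rough illustration of the header advice above, the following Python sketch builds both headers for a tile response. It assumes the imagery rolls over at midnight UTC (the thread does not say which timezone applies) and caps `max-age` so it never outlives that rollover, since caches give `max-age` precedence over `Expires` when both are present. The function name `tile_cache_headers` is hypothetical, and how the headers get attached to the response depends on the framework in use.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def tile_cache_headers(now=None, max_age=3600):
    """Return caching headers for a tile that is replaced at the next midnight UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    seconds_left = int((next_midnight - now).total_seconds())
    return {
        # Never let max-age run past the moment the imagery changes.
        "Cache-Control": "public, max-age=%d" % min(max_age, seconds_left),
        # HTTP date format, e.g. "Thu, 23 Aug 2012 00:00:00 GMT".
        "Expires": format_datetime(next_midnight, usegmt=True),
    }


print(tile_cache_headers(datetime(2012, 8, 22, 23, 30, tzinfo=timezone.utc)))
```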

> I don't get CloudFlare. The wiki is full of dead links, and a change to the
> DNS records would solve everything? How do devs access the site then? But
> they promise not to charge for bandwidth. If all of that is true, it seems perfect.
> I've started to google for detailed instructions.

After you have fixed your cache headers, this is going to be your best
bet. CloudFlare is a reverse proxy. What makes them a little
different is that they handle DNS for you so it's trivial to turn on
and off the proxy. I actually use their DNS service for domains which
don't use CloudFlare's proxy because their DNS service is free and the
UI is not awful (unlike, say, Rackspace's).

Set up a copy of your DNS records in CloudFlare. Make sure they are
correct. Then switch authority for your domain to CloudFlare. You
can turn on/off the proxy with a buttonclick; this causes DNS to
resolve either through CloudFlare's servers or directly to yours
(ghs.google.com). Keep in mind that it takes a few minutes for DNS
changes to propagate.

By the way, right now your app is making each tile request to
ice-map.appspot.com which gets 302 redirected to
lance-modis.eosdis.nasa.gov. This slows things down even more.

> Thanks again. I now believe there are more options to consider and shutting
> down the site is no longer on top of the list.

#1: Make tile requests to the correct URL
#2: Add a correct caching header
#3: Try CloudFlare

If that doesn't dramatically improve things, come back for more advice :-)

Jeff