I have Java Servlet code (currently hosted elsewhere) that responds
very quickly to requests. My code relies on a background spider
process that collects and updates data from a number of sources once
every hour. This process takes around 2 minutes on test hardware, but
could take as long as 15 minutes on bulk production data. I have
built it so it is low intensity (using only one socket), so that user
HTTP requests can be handled concurrently without any lag.
Due to third-party timeout requirements, I don’t think I can split
this background activity into 30-second chunks.
Questions:
Is there an exception in the 30 second rule for background tasks?
If not, should there be?
> very quickly to requests. My code relies on a background spider
> process that collects and updates data from a number of sources once
> every hour. This process takes around 2 minutes on test hardware, but
> could take as long as 15 minutes on bulk production data.
I think one of the flip sides of zero-configuration scalability is that
you have to keep your requests short and snappy. That's why there is a
limit on the size of data structures and a limit on how long your
instances of web code live. Those limits make the scalability *work*;
without them, you would just create other constraints elsewhere. That
would be a shame when pulling lots of data out of your app only happens
a fraction of the time. Imagine if enabling that forced you to make
compromises with the other 99.99% of your page impressions?
There will be a way to break up your task. Even if you think this is
crazy, and that it would be so much simpler if you could just dump
20 MB of data across the network as on other platforms, don't forget
about the amazing benefits App Engine gives you in reward for coding
within these constraints.
On Apr 23, 11:19 am, Whitemice <markbr...@zedray.co.uk> wrote:
I agree with what you’ve written, it’s just that I’m describing a
different kind of thread lifecycle requirement. My background thread
does not need to scale, and because of my architecture the other page
impressions benefit from having updated data stored locally. Other
approaches are orders of magnitude less efficient.
Coming from an Android programming background (my requesting
application is an Android device), I see that Google have recognised
this requirement by creating a “background Service” process which can
run for longer and have different responsiveness requirements to
regular Activities.
It just sounds like hitting the same process every 30 seconds (to
simulate a longer running process) breaks the spirit of the original
rule.
On Apr 23, 2:42 pm, Whitemice <markbr...@zedray.co.uk> wrote:
> Coming from an Android programming background (my requesting
> application is an Android device), I see that Google have recognised
> this requirement by creating a “background Service” process which can
> run for longer and have different responsiveness requirements to
> regular Activities.
I too am looking at Android and would like App Engine to act as its
backend.
To be fair to Google, their decision to implement services on Android
and their decision *not* to implement them on App Engine aren't
contradictory in my opinion, just different approaches for very
different platforms.
> It just sounds like hitting the same process every 30 seconds (to
> simulate a longer running process) breaks the spirit of the original
> rule.
I know *exactly* where you are coming from! It's almost as if you are
abusing the system if you do this! But what *is* the 'spirit of the
rule'?
I'm guessing at the reasoning behind the decision to limit the process
to 30 seconds (which used to be 10 seconds), but I'm sure it is all
about making scaling easier and *not* about stopping you hogging
resources. Not having to worry about users hogging resources is
what Google gets by making it scale well. Hosting providers who do not
do the scaling for you automatically still have to worry about
developers hogging resources on shared hosting.
If you chop processing time and storage requirements into smaller
chunks, they are easier to distribute. If you chop your big job
into 100 smaller jobs, it isn't cheating or hacking around a
limitation; it's removing the impact that your large job has by making
it easy to distribute. It also frees up the hardware that other
application developers *need* for scaling their apps.
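To make the "100 smaller jobs" idea concrete, here is a minimal sketch in plain Java, not App Engine code. The class name `JobSplitter`, the `split` method and the batch size are all made up for illustration; it only shows how a list of sources could be cut into batches that are each small enough to handle in one short request.

```java
import java.util.ArrayList;
import java.util.List;

public class JobSplitter {

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * (Hypothetical helper; the name and signature are illustrative only.)
     */
    public static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the original list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Assumed example data: 250 sources to spider, 25 per batch.
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            sources.add("source-" + i);
        }
        List<List<String>> batches = split(sources, 25);
        System.out.println(batches.size() + " batches");
    }
}
```

How big a batch can be depends on how long each source takes to fetch, so the size would have to be tuned against the request time limit.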
>>I too am looking at android and would like appengine to act as its backend. To be fair to google, their decision to implement services on Android and their decision *not* to implement them on app engine isn't contradictory in my opinion, just different approaches for very different platforms.<<
Agreed, I was just reaching for the closest available metaphor.
>>I know *exactly* where you are coming from! It's almost as if you are abusing the system if you do this! But what *is* the 'spirit of the rule'?....<<
You make a very good point, as I was assuming that the rule was about
making code responsive (which I know a lot about) rather than about
process distribution (of which I know nothing). *If* you are correct,
then I could rewrite my cron loop to generate self-referencing
requests (in chunks of up to 30 seconds) which would hammer the server
and make it perform the required work.
Could someone who knows recommend a strategy for doing this, and
confirm that this kind of thing is permitted?
Another approach for Google would be to limit process intensity
(CPU, memory usage, etc.) rather than elapsed runtime (i.e. for
non-HTTP request activities), although I now understand that this is a
less efficient solution for distributed systems (as well as a way to
let bad code run for longer).
On Thu, Apr 23, 2009 at 11:48 AM, Whitemice <markbr...@zedray.co.uk> wrote:
> Could someone who knows recommend a strategy for doing this, and
> confirm that this kind of thing is permitted?
I can't give you a detailed strategy for doing this at the moment, but I can
confirm that it is reasonable and permitted to break up longer background
tasks into shorter background tasks.
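Not a detailed strategy, but as a rough sketch of the general shape: each short task processes sources until a time budget runs out, then reports where it stopped so the next task can resume from there. This is plain Java with no App Engine APIs; `ChunkedSpider`, `runChunk`, the cursor and the 25-second budget are assumptions made up for illustration. In a real deployment the cursor would have to be persisted between requests, and the loop in `main` stands in for whatever re-invokes the handler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedSpider {

    /**
     * Processes sources starting at cursor until the list is finished or the
     * time budget is used up. Always handles at least one source per call so
     * that every chunk makes progress.
     *
     * @return index of the next unprocessed source; equals sources.size()
     *         when everything has been processed
     */
    public static int runChunk(List<String> sources, int cursor,
                               long budgetMillis, Consumer<String> fetch) {
        long start = System.nanoTime();
        long budgetNanos = budgetMillis * 1_000_000L;
        int next = cursor;
        while (next < sources.size()) {
            fetch.accept(sources.get(next));
            next++;
            // Stop once the budget is spent; the caller resumes from 'next'.
            if (System.nanoTime() - start >= budgetNanos) {
                break;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Assumed example data; each loop iteration simulates one short request.
        List<String> sources = Arrays.asList("feed-a", "feed-b", "feed-c", "feed-d");
        int cursor = 0;
        int requests = 0;
        while (cursor < sources.size()) {
            cursor = runChunk(sources, cursor, 25_000,
                    s -> System.out.println("fetching " + s));
            requests++;
        }
        System.out.println("finished in " + requests + " simulated request(s)");
    }
}
```

The budget is checked only between sources, so a single slow fetch can still overrun it; the budget should leave headroom below the real request limit.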
Hi Toby,
Judging by your @google.com email address, I’ll regard this as a
semi-official Google response and start thinking about my implementation.
May I suggest that the App Engine documentation be updated to include
this scenario, as it currently puts off developers? Also (if no one
else does), I will put a tutorial together on my blog showing my
implementation for other developers to follow (and to solicit outside
critique).
Have you looked at the App Engine roadmap? Asynchronous processing
driven off an event queue is high on the list. Google now has a pretty
good track record of stating intentions and delivering them in App
Engine, so my guess is we will hear more about async tasks at Google I/O.
>> Asynchronous processing driven off an event queue is high on the list.<<
Yes, that’s exactly what I was looking for! :-D
I will delay my Java Google App Engine migration till after this
rollout, and instead concentrate on getting the Android bit finished.
Thanks everyone for improving my understanding.
Mark