----
Hello, dummy here.
I'm just beginning my first experiments with python 2.7 apps, using
"threadsafe: true". But I'm a clueless n00b as far as python goes.
Well, not a n00b, but still a beginner. And then this multi-threading
thing turns up, and I find myself groaning "oh man, really, does it
have to get this complex?" I think I hear a lot of similar groans out
there ;-)
I'm betting that the whole "multithreaded" thing in python appengine
apps is scaring plenty of people. I've done a lot of concurrent
programming, but the prospect of dealing with threading in python has
daunted me a bit because I'm a beginner with python and appengine as
it is - this just makes life harder. But hey, it's being added for a
reason; I'd best quit complaining and start figuring it out!
Thinking about threads and python, I realised that I didn't know how I
needed to actually use multi-threading to make my apps leaner and
meaner. I mean, why would I use them? They're for doing inherently
concurrent things. Serving up pages isn't inherently concurrent stuff,
at the app development level. What exactly is expected here? Shouldn't
the framework be doing that kind of thing for me?
And of course that was the aha moment. The framework *is* doing the work for me.
The situation with python appengine development up until now has been
that instances process serially. They take a request, see it through
to its end. They take another request. And so on. That's cool, but
instances spend a lot of time sitting around waiting when they could
be doing more work.
But with the new python 2.7 support, you can tell appengine that it
would be ok to give instances more work when they are blocked waiting
for something. eg: if they are doing a big url fetch, or a long query
from datastore, something like that, then it's cool to give them
another request to begin working on, and come back to the waiting
request later when it's ready. You do that by setting "threadsafe:
true" in your app.yaml.
Being threadsafe sounds scary! But actually it shouldn't be a huge
deal. Pretty much it's about what you shouldn't do.
Multi-threading means having multiple points of execution on the one
codebase in the one address space. Anything you do to touch things
external to that (like datastore, memcache, url fetches) shouldn't
care about that (assuming the client libraries are threadsafe). And
normal code touching local variables will be fine.
Probably the only real thing you've got to worry about is using
instance memory (global variables more or less). That's because
multiple requests, ie: multiple threads, can come in and fiddle with
that global memory at the same time. You can fix that with some
concurrency primitives, but if that sounds scary you can just avoid
touching global memory in the first place.
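For anyone wondering what "some concurrency primitives" might look like in practice, here is a minimal sketch using a plain threading.Lock around a module-level counter. The names (_hits, record_hit, simulate_requests) are made up for illustration, and the threads only stand in for concurrent requests; this is not how App Engine itself schedules work.

```python
import threading

# Hypothetical module-level ("instance memory") state, shared by every
# request thread running in the same instance.
_hits = 0
_hits_lock = threading.Lock()


def record_hit():
    """Increment the shared counter; the lock keeps read-modify-write atomic."""
    global _hits
    with _hits_lock:
        _hits += 1
        return _hits


def current_hits():
    """Read the shared counter under the same lock."""
    with _hits_lock:
        return _hits


def simulate_requests(num_threads=8, hits_per_thread=1000):
    """Stand-in for concurrent requests: each thread bumps the counter."""
    def worker():
        for _ in range(hits_per_thread):
            record_hit()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current_hits()
```

Without the lock, two threads can read the same old value and one increment gets lost. With it, every increment is counted.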
So if you're using instance memory as part of a caching strategy, for
instance (caching like instance-memory -> memcache -> datastore), then
you either need to make the instance memory caching threadsafe, or
just stop using instance memory for that purpose.
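As a rough sketch of the "make the instance memory caching threadsafe" option, the layered lookup could be wrapped like this. InstanceCache and cached_fetch are hypothetical names, and the memcache and datastore layers are passed in as plain callables so the example stays self-contained; in a real app they would be the actual API calls.

```python
import threading


class InstanceCache(object):
    """A dict guarded by a lock, so request threads can share it safely."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value


_MISSING = object()          # sentinel so a cached None is still a hit
_instance_cache = InstanceCache()


def cached_fetch(key, memcache_get, memcache_set, datastore_get,
                 cache=_instance_cache):
    """Look up key in instance memory, then memcache, then datastore.

    memcache_get, memcache_set and datastore_get are stand-in callables
    (assumptions for this sketch), not real App Engine API signatures.
    """
    value = cache.get(key, _MISSING)
    if value is not _MISSING:
        return value
    value = memcache_get(key)
    if value is None:
        value = datastore_get(key)
        memcache_set(key, value)
    cache.set(key, value)
    return value
```

Two threads that miss at the same moment may both do the slower lookup, which wastes a little work but is harmless here. Note that the sketch never evicts anything, so it does nothing to bound how much instance memory the cache uses.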
The other big gotcha, implied by this issue with global memory, is
libraries. Which libraries are threadsafe? Plenty probably aren't,
especially some of those shady 3rd party python libs you found lying
around on code.google.com . Why not? Because they use global memory.
But the built in libs should be ok, unless we've been specifically
told they're not, and I don't recall any information like that.
Oh, and your app needs to use WSGI script handlers, presumably because
the cgi method we were recommended to use in py 2.5 apps is not
threadsafe.
So to sum up, if you aren't too sure about multi threading and want to
keep it simple, it seems like you can get your existing app processing
parallel requests by doing the following:
- Remove uses of global instance memory (if you don't know what that
means you're probably not doing it anyway)
- Remove/replace non threadsafe libraries (tricky - do more
experienced pythonistas know of any way to easily determine this? eg
pre-existing lists?)
- Modify your app starting point, the bit that wrangles your
WSGIApplication, so that it works like this:
http://code.google.com/appengine/docs/python/gettingstartedpython27/helloworld.html
and not like this:
http://code.google.com/appengine/docs/python/gettingstarted/usingwebapp.html
- Set up your app.yaml properly, as per:
http://code.google.com/appengine/docs/python/gettingstartedpython27/helloworld.html
- Update your SDK to 1.5.5 (or later) otherwise it'll refuse to upload.
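To make the WSGI step in the list above concrete, here is a minimal sketch of the module-level application object that the python27 runtime looks for. It uses a bare WSGI callable so it runs without the SDK; with webapp2 the same role would be played by a module-level app = webapp2.WSGIApplication([...]). The file name main.py and the app id are placeholders.

```python
# main.py: a minimal sketch. The matching app.yaml would look roughly like:
#
#     application: your-app-id      # placeholder
#     version: 1
#     runtime: python27
#     api_version: 1
#     threadsafe: true
#
#     handlers:
#     - url: /.*
#       script: main.app            # module.object, not a .py file path


def app(environ, start_response):
    """Module-level WSGI callable that the handler in app.yaml points at."""
    body = b"Hello, world!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# The older CGI-style entry point looked roughly like this, and is the
# part that has to go when threadsafe is true:
#
#     def main():
#         run_wsgi_app(application)
#
#     if __name__ == "__main__":
#         main()
```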
I don't think the dev appserver will run your code concurrently yet,
but you can always set threadsafe: false for local development, then
change it before you upload.
On a related note, there is other stuff that you need to check to make
sure your app is ready for python 2.7, largely around newer versions
of libraries being used (eg: webob has changed). Check this page:
http://code.google.com/appengine/docs/python/python27/newin27.html
--
Emlyn
http://my.syyn.cc - Synchronise Google+, Facebook, WordPress and Google
Buzz posts,
comments and all.
http://point7.wordpress.com - My blog
Find me on Facebook and Buzz
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
Thanks for reading the long email. Sorry, I should keep them shorter,
but I'm a natural blatherer.
I want to run some tests on the efficacy of using threadsafe:true.
Actually hitting real resources in those tests is a bit rude
(datastore might be ok, but urlfetch is a bit tough on the
target/victim).
If I use time.sleep() (eg: use frontend tasks that basically go
time.sleep(10)), is that going to block in a similar way to urlfetch
or db gets/puts, ie: in a way that'll let the instance process more
work?
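One thing that can be checked locally is how time.sleep() behaves with plain threads. In CPython, time.sleep() releases the GIL, so sleeping threads overlap rather than queue up. The sketch below only demonstrates that local behaviour; whether the App Engine scheduler treats a sleeping request the same way as one blocked on urlfetch or a datastore call is an assumption it cannot confirm. The function names are made up.

```python
import threading
import time


def sleepy_request(seconds, results, index):
    """Stand-in for a request that just blocks for a while."""
    start = time.time()
    time.sleep(seconds)
    results[index] = time.time() - start


def run_concurrently(num_threads=4, seconds=0.2):
    """Start several sleeping threads and time the whole batch."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=sleepy_request, args=(seconds, results, i))
        for i in range(num_threads)
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start, results
```

If the sleeps overlap, the elapsed time for four threads sleeping 0.2 seconds each should come out close to 0.2 seconds rather than 0.8.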
Multi-threaded Python 2.7 WTFAQ?
http://appenginedevelopment.blogspot.com/2011/10/multi-threaded-python-27-wtfaq.html
On 15 October 2011 13:00, Emlyn <emlyn...@gmail.com> wrote:
> On 15 October 2011 09:12, Ikai Lan (Google) <ika...@google.com> wrote:
>> Yep, your thinking here is correct! Be careful when using global memory as a
>> cache, though. Instances are capped at 128mb of memory, and if you exceed
>> that, your instance will be killed. This could lead to instance thrashing.
>> [On another note: congrats, you got me to read a long email ;).]
>> --
>> Ikai Lan
>> Developer Programs Engineer, Google App Engine
>> plus.ikailan.com | twitter.com/ikai
--
thanks Anand, good to know you guys are on the case. Brandon's comment
gave me hope so I looked into my code a little deeper. Here are a
couple of comments (I'm by no means a Python expert so I'm unsure
about the GIL details)
1) I still see the issue above, random delays between calls under
load. For example, a call to memcache takes a few ms on one request
but then takes 200-300ms on another. Under no load it's always quick
2) access from the native appspot.com domain seems faster than a
custom one (I can't remember if this is an issue with the existing
runtime too)
3) on one of my larger tests involving existing code that was running
horribly in the threaded runtime, the main culprit turned out to be
the xml minidom. I'm assuming it's not threadsafe so I'll need to move
to lxml
4) I use a lot of static class methods, I'm not sure if the class def
is locked by the GIL (I had thought not)
Is this threadsafe with the GIL?
class Util():
    @staticmethod
    def DoSomething():
        pass
5) I'm using webapp2, are there any known issues there?
6) Are async RPC calls necessary to get the threadsafe benefits (eg:
async urlfetch)? Or can we use the standard sync calls as well?
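On point 4, a hedged sketch of the general Python picture: a static method is just a function, and what matters for thread safety is whether it touches shared state, not whether it is static. The GIL keeps individual bytecode operations from corrupting the interpreter, but it does not make a check-then-update sequence atomic. The class below extends the Util example with made-up methods (slugify, remember) to show both cases.

```python
import threading


class Util(object):
    _lock = threading.Lock()
    _seen = []  # shared class-level state: this is the part needing care

    @staticmethod
    def slugify(text):
        """Uses only its argument and locals, so concurrent calls are fine."""
        return "-".join(text.lower().split())

    @staticmethod
    def remember(item):
        """Check-then-append on shared state, so it takes the lock."""
        with Util._lock:
            if item not in Util._seen:
                Util._seen.append(item)
            return len(Util._seen)
```

Calling Util.slugify() from many threads needs no locking at all. Util.remember() without the lock could let two threads both pass the "not in" check and append the same item twice.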