Blocking, long running function [async again]


grundic

unread,
Feb 28, 2010, 10:39:31 AM2/28/10
to Tornado Web Server
Hello everyone!

I have a problem that I encountered two weeks ago - and still no
success :(
I've read all the messages in the Google group and found several
solutions, but they are not applicable to me.

So, the problem: say I would like to create a host monitoring
application, using Python and Tornado, certainly.
Let it be just a simple ping function:

def ping(host):
    ...
    # some long-executing/scanning mechanism
    ...

So, how could I run it asynchronously via a "get" request? I've read
that Tornado is a single-threaded application and that I shouldn't use
threads (maybe I'm wrong and misunderstood something). I saw the chat
example, but I don't understand it clearly :(
I found a solution that executes a shell command asynchronously
(http://brianglass.wordpress.com/2009/11/29/asynchronous-shell-commands-with-tornado/).
So I can put my function in a file ping.py and make a call:

....
self.pipe = p = os.popen('python ping.py')
self.ioloop.add_handler(p.fileno(),
    self.async_callback(self.on_response), self.ioloop.READ)
# -- save results to file and read them back --
....

But it looks like a hack :(

I looked at the PeriodicCallback example, but it also doesn't work for
me:


class RandHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.scheduler = tornado.ioloop.PeriodicCallback(self._check, 1000)
        self.scheduler.start()

    def _check(self):
        # Here bad things happen - sleep() blocks the whole application.
        # The same happens with any long-running function :(
        sleep(10)
        self.scheduler.stop()
        self.write("Hello from sleep request!")
        self.finish()


So, all I want to do is call some long-running function from a "get"
RequestHandler using the @tornado.web.asynchronous decorator, and then
be notified when it's finished - to return the data back.

For example, when I navigate my browser to http://localhost:8888/ - it
says "Hello!" - it's simple.
And when I go to http://localhost:8888/ping - it executes my function,
waits, and after some loading time gives me the result. During that
loading time the root page should not be blocked, of course.

Thanks in advance, and sorry if there are some mistakes...
Feel free to ask any questions if my explanation is not very clear.

Douglas Stanley

unread,
Feb 28, 2010, 2:19:24 PM2/28/10
to python-...@googlegroups.com

I posted something to the list a few weeks ago.
Basically, you can use the process pool from the multiprocessing module to perform asynchronous actions in a separate process.

I made a modification to the hello world example and posted it as a gist on github. I'm not at a computer now, or I'd post the link again. If you search the archive for multiprocessing, you'll probably find it. Or if you're interested, I can repost the link when I get back.

Doug

Matthew Ferguson

unread,
Feb 28, 2010, 2:25:49 PM2/28/10
to python-...@googlegroups.com
Grundic, welcome to our Python-Tornado group.

It's a little confusing what exactly you're trying to accomplish. If you could make a list of the things you'd like to accomplish using Tornado, we'll have an easier time boiling down the issue and steering you in the proper direction.

Working with what you've given us so far:

When you add the @tornado.web.asynchronous decorator to a method, it doesn't block while your socket is idle; it simply goes into the IOLoop and is handled when ready. However, when you explicitly call time.sleep(), or any other "blocking" function, it blocks your application. Tornado applications run one thread per process, and because of the GIL (Global Interpreter Lock, read up on it if you're not familiar with it), "true" concurrency within a Python application is a bit harder. Kqueue, epoll, poll, and select enable Tornado's magic.

How exactly is your ping function going to work? If all you're trying to do is resolve a URL, Tornado includes an asynchronous HTTPClient which takes care of the dirty work for you. See: http://github.com/facebook/tornado/blob/master/tornado/httpclient.py#L78.

grundic

unread,
Mar 1, 2010, 12:36:52 PM3/1/10
to Tornado Web Server
Thanks a lot, Douglas!
Yep, I've found your example code - here it is, if someone is
interested - http://gist.github.com/312676.

Matthew, after your answer I began to think about what I actually
want :)
So, the problem: I have several servers which I would like to monitor,
and some processes running on them.
For now, I've got a multi-threaded application which checks that each
host is alive (gets a response from the ping command), connects to it
via SSH, checks that a process is running, and does some other stuff.
Then I generate a plain HTML page from the scan results. The page is
saved and I can view it. My script runs via cron.

Now I would like to upgrade my monitoring system - to start using
Tornado and AJAX with long-polling.
So, the client navigates to my URL, JavaScript (jQuery or Prototype)
connects to my server and waits for results. Meanwhile my program runs
a long scan of all my servers, so two or more clients can wait until
the scan is complete. After that, all clients get a response from my
scanner that the job is complete - with the results of that job (via
XML or JSON).

So, to start, I would just like to implement a simple host-pinger. Can
you give me advice on what the best solution is?
I suppose there is some repeatedly executed method that pings the
hosts from a list. But how do I connect this method with Tornado's
"get", and how do I push the result back to the client?

Sorry, my explanation is very confusing, but I hope I've let you know
what I want :)
Any help is appreciated!

> > For example, when i navigate my browser to http://localhost:8888/ - it
> > says "Hello!" - it's simple.
> > And when I go to http://localhost:8888/ping - it executes my function,

Douglas Stanley

unread,
Mar 2, 2010, 11:37:17 AM3/2/10
to python-...@googlegroups.com
I'd still look into using the process pool from the multiprocessing
module, and specifically the apply_async method
(I think that's what it's called; it's whatever I used in my example).

So you create all of your scanning logic in your separate module, and
expose it via a function or encapsulate it
in a class. Then in your tornado code, you simply create an async
handler (like the one in my example), and
have that async request handler call the apply_async method on your
process pool (also like I did in my example),
and pass your scanning function/method as the first argument to
apply_async. Then have apply_async run whatever
callback finishes your tornado request and send the json result to the client.
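The flow Doug describes can be sketched with the standard library alone, leaving out the Tornado wiring: apply_async runs a blocking "scan" in a worker process and hands a JSON string to a callback, which is where an async RequestHandler would call self.finish(). The scan_hosts function and the host names here are made up purely for illustration.

```python
import json
import multiprocessing

def scan_hosts(hosts):
    # Stand-in for the long-running ping/SSH scan; runs in a worker process.
    return json.dumps({h: "alive" for h in hosts})

results = []

def on_scan_done(result):
    # In a Tornado handler this is where you'd self.write(result); self.finish().
    results.append(result)

# On platforms using the 'spawn' start method, wrap the following in an
# `if __name__ == "__main__":` guard.
pool = multiprocessing.Pool(processes=2)
pool.apply_async(scan_hosts, (["web01", "web02"],), callback=on_scan_done)
pool.close()
pool.join()  # wait for the worker here; a real server would keep serving instead
```

The key point is that the blocking work never runs in the Tornado process, so the IOLoop stays free to accept other requests while the scan is in flight.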

I honestly think that's the cleanest method. I still haven't seen
anyone on the list talk about the multiprocessing
module and whether or not it's a good fit for tornado. I'd really like
to hear some of the developers give their 2 cents.
So maybe there's some nasty thing where the multiprocessing module is
a bad idea, but as far as I can tell, it's
great.

Can anyone else comment on it? I'm also thinking it'd be a decent way
to call blocking db servers...but again,
I could be wrong and perhaps it's horribly inefficient, or maybe the
universe will implode if you do it...

Doug

--
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

Arek Bochinski

unread,
Mar 2, 2010, 2:58:27 PM3/2/10
to python-...@googlegroups.com
I think multiprocessing adds a very clean approach.
I added some code to Douglas' multiprocessing example to make it a simple 'pinger'.

Python code: http://pastebin.com/isCqpupZ
Template: http://pastebin.com/p6vcmkun

There are caveats to spawning sub-processes instead of threads, but each approach has its trade-offs in the end.
One upside of this approach is the bonus of bypassing the global interpreter lock.

@grundic, take a look at the code and see if that's close to your idea.

-Arek

grundic

unread,
Mar 7, 2010, 11:04:04 AM3/7/10
to Tornado Web Server
Hello, again!

I've completed my task - it works now as I wanted.
Everyone can have a look at result here: http://github.com/grundic/thePinger

Thanks a lot everybody for help =)

Hari

unread,
Apr 14, 2010, 3:40:47 PM4/14/10
to Tornado Web Server
Douglas, I started looking at tornado's async functionality myself and
came across your post. I am wondering why multiprocessing is
recommended over threading, unless there is a specific need that is
met by spawning a new process (e.g., I had to resort to using
multiprocessing when calling into R, as R is not thread-safe). One
specific reason I prefer threading (other than being lighter in
weight) over multiprocessing is for the handling of exceptions.
Multiprocessing makes it harder to propagate exceptions and impossible
to preserve some exception handling semantics. In this regard,
wouldn't a simple worker-thread solution such as the one below be good
enough?

class AsyncRequestHelperThread(Thread):
    def __init__(self, callback, action, *args):
        self.callback = callback
        self.action = action
        self.args = args
        super(AsyncRequestHelperThread, self).__init__()

    def run(self):
        try:
            result = self.action(*self.args)
        except:
            result = sys.exc_info()

        self.callback(result)

Just spawn a new thread and finish() the result in the callback method
(or handle the exception). The callback needs to check the type of the
result to handle it differently in case of an exception (using some
expression such as: "result and isinstance(result, tuple) and
len(result) == 3 and isinstance(result[0], type)").
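As a rough, self-contained illustration of the pattern Hari describes (the risky() function and all names are invented for the example), a callback can tell a normal result apart from a sys.exc_info() triple like this:

```python
import sys
import threading

def risky(x):
    # Stand-in for a blocking action that may fail.
    if x < 0:
        raise ValueError("negative input")
    return x * 2

outcomes = []

def callback(result):
    # Distinguish a normal result from a sys.exc_info() triple.
    if (isinstance(result, tuple) and len(result) == 3
            and isinstance(result[0], type)
            and issubclass(result[0], BaseException)):
        outcomes.append(("error", result[0].__name__))
    else:
        outcomes.append(("ok", result))

def run_in_thread(callback, action, *args):
    def worker():
        try:
            result = action(*args)
        except Exception:
            result = sys.exc_info()
        callback(result)
    t = threading.Thread(target=worker)
    t.start()
    return t

run_in_thread(callback, risky, 21).join()   # -> ("ok", 42)
run_in_thread(callback, risky, -1).join()   # -> ("error", "ValueError")
```

The issubclass check tightens Hari's isinstance(result[0], type) test a bit, so an ordinary 3-tuple result can't be mistaken for an exception triple.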

-- Hari

On Mar 1, 10:36 am, grundic <grun...@gmail.com> wrote:
> Thanks a lot, Douglas!
> Yep, I've found your example code - here it is, if someone is
> interested - http://gist.github.com/312676.
>

[snip]

Douglas Stanley

unread,
Apr 14, 2010, 4:24:02 PM4/14/10
to python-...@googlegroups.com
No, you're correct. Normally threading is the better option. However,
due to how the GIL works, if one of your threads blocks, it could
block incoming connections, sort of defeating the purpose of making
your project async.

At least that's what I've read...

Doug

Claudio Freire

unread,
Apr 14, 2010, 4:29:23 PM4/14/10
to python-...@googlegroups.com
On Wed, Apr 14, 2010 at 5:24 PM, Douglas Stanley <douglas....@gmail.com> wrote:
> No, you're correct. Normally threading is the better option. However,
> due to how the GIL works, if one of your threads blocks, it could
> block incoming connections, sort of defeating the purpose of making
> your project async.
>
> At least that's what I've read...

That kind of GIL blocking is rare and a bug in python.
Usually, any kind of blocking C-level API call has to release the GIL.
All of python's standard I/O libraries do that, although I remember there were a few obscure exceptions.

In any case, it's not the norm.
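Claudio's point that blocking C-level calls release the GIL can be checked with a small stdlib-only sketch: two threads calling time.sleep() finish in roughly the time of one sleep, which would be impossible if the GIL were held for the duration.

```python
import threading
import time

def sleeper():
    # time.sleep is a blocking C call that releases the GIL,
    # so other threads keep running while this one waits.
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=sleeper) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start  # roughly 0.2s, not 0.4s: the sleeps overlapped
```

Replace the sleep with a long pure-Python loop and the two threads would instead take about twice as long, since the interpreter holds the GIL while executing bytecode.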

Claudio Freire

unread,
Apr 14, 2010, 4:32:12 PM4/14/10
to python-...@googlegroups.com

Sorry for the double posting... what CAN happen is a long-running opcode within one of the threads blocking your main thread. That is a common source of priority inversion issues in Python threading.

I.e.: filter(bool, huge_list_of_strings) will block for the duration of the entire operation, with the GIL acquired.

So threading isn't the best option if you absolutely need responsive threads. Doing I/O in threads is OK.


Hari

unread,
Apr 14, 2010, 4:40:35 PM4/14/10
to Tornado Web Server
That is interesting, I will have to find more information on this.
Thank you.

On Apr 14, 1:32 pm, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 5:29 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
[snip]
