Thread.join() timeout param not respected


Ben Simmons

Jun 29, 2015, 6:53:52 PM
to gev...@googlegroups.com
First of all, let me say that I know it's silly to use Threads instead of Greenlets. This is purely an issue of legacy code.

I recently switched our gunicorn server to use gunicorn's GeventWorkers. I was under the impression that gevent patched the essential bits of the thread and threading python modules, and that it would be okay to leave in the Threads rather than to explicitly replace them with Greenlets.

There were a number of cases in our codebase where we directly instantiated multiple Thread() objects to wrap long-running service calls, so that they could be joined with an explicit timeout. The timeout is useful because we run on Heroku, which imposes a hard 30-second request limit; the timeout allows us a buffer to kill the service calls that probably won't return in time.

After switching to gevent workers, this timeout param is no longer respected. My question is, does monkey.patch_all() actually tie Thread.join() to the gevent hub? And if it does, is it expected that the timeout param would be ignored?


Jason Madden

Jun 29, 2015, 6:56:25 PM
to gev...@googlegroups.com
Can you tell us what version of Python and what version of gevent you're using?

Ben Simmons

Jun 29, 2015, 7:15:34 PM
to gev...@googlegroups.com
Sure thing:

* Python 2.7.9
* gevent 1.0.2
* greenlet 0.4.7
* gunicorn 19.3.0

I'm still investigating, so I will let you know if there is a different root cause. But so far it looks like it's an issue with the timeout param.

Jason Madden

Jun 30, 2015, 10:47:47 AM
to gev...@googlegroups.com
If the system is monkey-patched soon enough, I would expect `Thread.join` to respect the `timeout` parameter. But expectations can be wrong, so let's try a quick test:

I begin by monkey-patching the system and checking the version:

$ python
Python 2.7.10 (default, May 25 2015, 13:06:17)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gevent, gevent.monkey; gevent.monkey.patch_all()
>>> gevent.version_info
(1, 0, 2, 'final', None)

Next, we'll create and start a thread that's going to sleep for a while to give us a chance to join it:

>>> import threading, time
>>> def target():
...     print("in target")
...     time.sleep(20)
...     print("done target")
...
>>> t = threading.Thread(target=target)
>>> t.start(); t.join(timeout=1); print("timed out?", t.is_alive())
in target
timed out? True
>>> t.is_alive()
True

So above we saw that the timeout parameter worked as expected. First the new thread ran, then it went to sleep and the main thread ran, joined the new thread, but the join timed out and the new thread is still running.

>>> # 20 seconds pass
>>> t.is_alive()
True

Here, I simply left the command prompt sitting there for more than 20 seconds so the new thread should have finished its sleep call and returned to us. But it's still alive, and we don't have any printed output. Why is that? What happens if I join the new thread now?

>>> t.join(timeout=1)
done target

As soon as I join the new thread, it prints its last output (and the join returns "immediately"). This demonstrates that in a monkey-patched system, Threads are in fact greenlets under the covers, and they run cooperatively. Leaving the command prompt sitting there didn't give the other greenlet a chance to switch in, so nothing after its (cooperative) `sleep` call got a chance to run. Joining it again switched to it and let it run and finish.

But that suggests something. The same could arise with any blocking operations in any greenlet. And since `join` is itself cooperative, the timeout parameter won't work if what you're joining isn't cooperative. That may be what's happening to you. We can see this if we try the same scenario, but using the non-cooperative sleep function:

>>> def target():
...    print("in target", gevent.monkey.get_original('time', 'sleep'))
...    gevent.monkey.get_original('time', 'sleep')(20)
...    print("done target")
...
>>> t = threading.Thread(target=target)
>>> t.start(); t.join(timeout=1); print("timed out?", t.is_alive())
in target <built-in function sleep>
done target
timed out? False

Here, the `timeout` parameter appeared to malfunction. The new thread/greenlet ran to completion, taking 20 seconds, before the main thread got to print "timed out" and check the status of the thread. Using a non-cooperative function in the new thread/greenlet prevented any switching from happening, and so the main greenlet didn't get a chance to notice its timeout had elapsed.
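
As a rough illustration of the point above, here is a toy model in plain Python (no gevent involved). The names `FakeClock`, `worker`, and `toy_join` are made up for this sketch, and it does not reflect how gevent's hub or timers are actually implemented. It only shows that a cooperative timeout can be noticed no earlier than the next switch point.

```python
class FakeClock(object):
    """Hypothetical fake clock, so the sketch runs instantly."""
    def __init__(self):
        self.now = 0

    def advance(self, seconds):
        self.now += seconds


def worker(clock, total, slice_size):
    """Work for `total` fake seconds, switching out every `slice_size` seconds."""
    remaining = total
    while remaining > 0:
        step = min(slice_size, remaining)
        clock.advance(step)   # blocking work: nothing else can run meanwhile
        remaining -= step
        if remaining > 0:
            yield             # cooperative switch point


def toy_join(clock, task, timeout):
    """Toy join: the timeout is only checked when the task switches out.

    Returns (still_alive, fake_seconds_elapsed).
    """
    start = clock.now
    for _ in task:            # each iteration: the task ran until its next switch
        if clock.now - start >= timeout:
            return True, clock.now - start
    return False, clock.now - start   # the task ran to completion


clock = FakeClock()
# Switches every second, so a 1-second timeout is noticed on time.
print(toy_join(clock, worker(clock, 20, 1), timeout=1))    # (True, 1)

clock = FakeClock()
# One 20-second chunk with no switch: the join regains control only at the end.
print(toy_join(clock, worker(clock, 20, 20), timeout=1))   # (False, 20)
```

The second call mirrors the "timed out? False" result above: by the time the joiner gets to look at its timeout, the task has already finished.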

Ben Simmons

Jun 30, 2015, 7:55:03 PM
to gev...@googlegroups.com
Thank you, your response was very informative! It's given me a more intuitive understanding of how greenlets cooperatively yield to each other.

I ran some further tests, and it appears that this is an issue with the 'requests' library (as well as 'urllib2') making HTTPS calls from a gunicorn server. Apparently the greenletified threads do not yield under this scenario, whereas they yield when making HTTP calls from gunicorn or when making either HTTP/HTTPS calls from a python console.

At this point, I think it's safe to say that the issue is either with gunicorn or my configuration of it, so I will follow up with the gunicorn project. Do you have any ideas why this might be happening? I would assume it's something to do with how Python's ssl module is monkey-patched under gunicorn, but their GeventWorker appears to call monkey.patch_all() with hardly any tweaks.

Results from python console:

>>> test(is_secure=False)
All threads joined after 5 seconds.
>>> test(is_secure=True)
All threads joined after 5 seconds.

Results from a gunicorn server:
(code slightly refactored to fit within the django framework)

>>> test(is_secure=False)
All threads joined after 5 seconds.
>>> test(is_secure=True)
All threads joined after 30 seconds, and the first join takes 30 seconds to complete. Timeout param not respected.

Example code:

import gevent, gevent.monkey; gevent.monkey.patch_all()
import datetime
import requests
import threading

def target(name="", is_secure=False):
    '''Make an HTTP or HTTPS call that takes 30 seconds to complete'''
    print(name + ": in target")
    scheme = "https" if is_secure else "http"
    url = scheme + "://localhost:8000/sleep/30" # response takes 30 seconds
    print(name + ": getting " + url)
    response = requests.get(url)
    print(name + ": response: " + str(response))
    print(name + ": done target")

def calc_thread_timeout(start_time, seconds=5):
    '''Work around the additive nature of Thread.join() timeouts, to do an aggregate timeout'''
    elapsed = (datetime.datetime.now() - start_time).total_seconds()
    remaining = float(seconds) - elapsed
    return 0.0 if remaining < 0.0 else remaining

def test(is_secure=False):
    # four identical workers, named t1..t4
    threads = [
        threading.Thread(target=target, kwargs={"name": "t%d" % i, "is_secure": is_secure})
        for i in range(1, 5)
    ]
    print("STARTING THREADS")
    for t in threads:
        t.start()
    print("JOINING THREADS")
    join_start = datetime.datetime.now()
    # it should take 5 seconds wall time for all joining to complete
    for t in threads:
        t.join(timeout=calc_thread_timeout(start_time=join_start, seconds=5))
    join_end = datetime.datetime.now()
    join_total_secs = (join_end - join_start).total_seconds()
    print("joined all threads after " + str(join_total_secs) + " seconds")
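
For reference, the aggregate-timeout idea in `calc_thread_timeout` can also be written against a single deadline. This is only a sketch: `join_all` is a hypothetical helper name, and it assumes Python 3.3+ for `time.monotonic()` (on Python 2.7, `time.time()` would be the closest substitute). It uses plain threads and an `Event` instead of HTTP calls so that it runs on its own.

```python
import threading
import time


def join_all(threads, total_timeout):
    """Join every thread against one shared deadline; return those still alive."""
    deadline = time.monotonic() + total_timeout
    for t in threads:
        # Each join only gets whatever is left of the overall budget.
        t.join(timeout=max(0.0, deadline - time.monotonic()))
    return [t for t in threads if t.is_alive()]


release = threading.Event()
slow = [threading.Thread(target=release.wait) for _ in range(3)]  # block until released
fast = threading.Thread(target=lambda: None)                      # returns immediately
for t in slow + [fast]:
    t.start()

start = time.monotonic()
stragglers = join_all(slow + [fast], total_timeout=0.5)
elapsed = time.monotonic() - start
print("%d thread(s) still running after %.1f seconds" % (len(stragglers), elapsed))

release.set()  # let the stragglers exit so the process can shut down
```

Under gevent the same caveat applies as in the rest of this thread: the deadline is only honored if whatever the threads are doing is cooperative.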

Jason Madden

Jun 30, 2015, 8:02:34 PM
to gev...@googlegroups.com

> On Jun 30, 2015, at 18:55, Ben Simmons <b...@skilljar.com> wrote:
>
> I ran some further tests, and it appears that this is an issue with the 'requests' library (as well as 'urllib2') making HTTPS calls from a gunicorn server. Apparently the greenletified threads do not yield under this scenario, whereas they yield when making HTTP calls from gunicorn or when making either HTTP/HTTPS calls from a python console.
>
> At this point, I think it's safe to say that the issue is either with gunicorn or my configuration of it, so I will follow up with the gunicorn project. Do you have any ideas why this might be happening? I would assume it's something to do with how Python's ssl module is monkey-patched under gunicorn, but their GeventWorker appears to call monkey.patch_all() with hardly any tweaks.

Is there any chance you are preloading your application in gunicorn (or otherwise loading code before gunicorn forks its workers)? If so, monkey patching is not being done soon enough: your application and its imports are loaded in gunicorn's arbiter process, but only the forked worker processes get monkey patched. The `requests` library includes `urllib3`, which imports specific functions directly from the `ssl` module at import time, so it's too late to monkey patch once `requests` has been imported. If `requests` gets imported in the arbiter, you're hosed. That's the only thing I can think of.
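
To show why import order matters here, the following sketch uses a stand-in module built with `types.ModuleType`. `fake_ssl` and `wrap` are hypothetical names, not the real `ssl` API; the point is only that a name copied out of a module at import time keeps pointing at the old function after the module attribute is replaced.

```python
import types

# A stand-in for a blocking stdlib module (hypothetical, not the real ssl module).
fake_ssl = types.ModuleType("fake_ssl")
fake_ssl.wrap = lambda: "blocking"

# This is what `from fake_ssl import wrap` effectively does at import time:
# it copies the current function object into the importing namespace.
early_binding = fake_ssl.wrap

# Monkey patching afterwards replaces the attribute on the module...
fake_ssl.wrap = lambda: "cooperative"

# ...but the earlier copy still refers to the original function.
print(early_binding())   # blocking
print(fake_ssl.wrap())   # cooperative
```

Code that looks the function up through the module at call time sees the patched version; code that grabbed its own reference before patching does not.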

Ben Simmons

Jun 30, 2015, 8:15:09 PM
to gev...@googlegroups.com
AFAIK we're not loading any code before gunicorn forks workers. Our gunicorn config is pretty simple, and we're not using the preload option. We've overridden the config post_fork() to gevent-ify postgres, but that's about it.

Ben Simmons

Jul 1, 2015, 4:31:05 PM
to gev...@googlegroups.com
Opened an issue with gunicorn:
https://github.com/benoitc/gunicorn/issues/1062

Jason Madden

Jul 1, 2015, 4:43:04 PM
to gev...@googlegroups.com

> On Jul 1, 2015, at 15:31, Ben Simmons <b...@skilljar.com> wrote:
>
> Opened an issue with gunicorn:
> https://github.com/benoitc/gunicorn/issues/1062
>
> On Tuesday, June 30, 2015 at 5:15:09 PM UTC-7, Ben Simmons wrote:
> AFAIK we're not loading any code before gunicorn forks workers. Our gunicorn config is pretty simple, and we're not using the preload option. We've overridden the config post_fork() to gevent-ify postgres, but that's about it.

We just found and fixed an "ssl is hanging" issue. It still comes down to an order of imports vs monkey-patching issue, but you might want to try the patch found here: https://github.com/gevent/gevent/commit/853b8b2cfc5869a48b6eede786db11f6c769bb66#diff-1b0b77850f218d6bdcb4b9c997ac7969

Ben Simmons

Jul 1, 2015, 6:41:41 PM
to gev...@googlegroups.com
That fixes the issue! Thanks. Any idea when it will be officially released?

Installation steps for reference:
