gevent and thread safety


John Byrne

unread,
Mar 15, 2019, 1:38:55 PM
to gevent: coroutine-based Python network library
Hi, I have some code that I know is not POSIX-thread safe. I'm calling it in a celery task, and I have configured celery to use gevent concurrency. This particular section of code doesn't do any blocking IO, but I'm getting errors that look like it's getting preempted, which suggests multiple OS threads.

After some investigation, I found that my celery worker typically has from 4 to 7 OS threads, and these threads appear to be getting created by gevent (via ThreadPool). It looks like there's a pool size of 10 by default. So I think this is why I'm having issues. I can fix this piece of code, but it raises a bigger question for me.

I've been reading up on everything I can find about this, and I found this github issue[1] which states:

the simplification that gevent offers is that, unlike threading.Thread, a switch to a different greenlet can only occur when specific API calls to gevent are made, instead of at any arbitrary time. So if you "know" that no such API call can happen during a compound operation, you can elide the locks

But if there can be up to 10 OS threads by default, doesn't this mean that I can't avoid locks in those situations? In other words, doesn't all code intended to be run with gevent need to be thread-safe, regardless of whether it contains IO statements?
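For concreteness, here's the kind of compound operation that quote is talking about — a minimal sketch (assumes gevent is installed) where a read-modify-write needs no lock under gevent, though it would under preemptive OS threads:

```python
import gevent

counter = {"value": 0}

def bump():
    # Read-modify-write with no gevent API call in between: under
    # cooperative scheduling, no other greenlet can run during this
    # window, so no lock is needed (it WOULD be with OS threads).
    value = counter["value"]
    counter["value"] = value + 1

gevent.joinall([gevent.spawn(bump) for _ in range(100)])
print(counter["value"])  # 100
```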

Thanks!

Jason Madden

unread,
Mar 15, 2019, 1:41:34 PM
to gev...@googlegroups.com
By default, gevent will create threads to handle DNS resolution in a cooperative fashion (invisibly to the caller). gevent will *never* run user code in a separate thread implicitly without being explicitly instructed to do so by direct usage of a thread pool. Unless your code or celery is doing something odd, those threads are not running your task code.
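In other words, user code only ends up on one of those pool threads if you put it there yourself. A sketch (assumes gevent is installed):

```python
from gevent.threadpool import ThreadPool

pool = ThreadPool(3)  # up to 3 worker OS threads

# Only a function explicitly handed to the pool runs on a worker
# thread; apply() blocks the calling greenlet until it returns.
result = pool.apply(lambda: 6 * 7)
print(result)  # 42
```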

~Jason

John Byrne

unread,
Mar 16, 2019, 4:40:44 PM
to gevent: coroutine-based Python network library
Thanks for clearing that up! I've taken a closer look, and I'm pretty sure there aren't any other OS threads being created by celery, so if it's not OS threads causing my issue, that makes me reconsider whether my code has anything that's causing a gevent switch. The only thing I can find in this piece of code that I know is monkey patched is the __import__ function. Can that cause a gevent switch? Either in the original or in the wrapper? I'm not sure how to tell for certain.

Jason Madden

unread,
Mar 16, 2019, 4:57:12 PM
to gev...@googlegroups.com


> On Mar 16, 2019, at 14:50, John Byrne <jhn...@gmail.com> wrote:
>
> Thanks for clearing that up! I've taken a closer look, and I'm pretty sure there aren't any other OS threads being created by celery, so if it's not OS threads causing my issue, that makes me reconsider whether my code has anything that's causing a gevent switch. The only thing I can find in this piece of code that I know is monkey patched is the __import__ function. Can that cause a gevent switch? Either in the original or in the wrapper? I'm not sure how to tell for certain.

On Python 2 the global builtins `__import__` function is monkey-patched to use greenlet-aware locks. Those locks are per-module (I *think* this also happens under Python 3, but I'd have to check; it's not as explicit). So in principle it's possible for a greenlet to block when attempting an import that another greenlet is *also* attempting, and thus allow switching. But getting into that situation is unusual: the module being imported (which is running in the greenlet holding the lock) must have done something to allow switching to yet another greenlet in the first place for that other greenlet to block on the lock. (Yet this has been known to happen in the real world; sometimes modules spawn greenlets and such at module import time. Issue https://github.com/gevent/gevent/issues/108 was where this was first reported.)

Jason

John Byrne

unread,
Mar 17, 2019, 10:12:05 AM
to gevent: coroutine-based Python network library
Once again, thanks for the quick response. I think "__import__" can't be my issue then, because if something is causing a switch in the first greenlet, that's enough to cause my error anyway.

I discovered that my code is creating a socket - I'm importing requests, and somewhere down the stack, urllib3 creates a socket to test IPv6 support - it calls bind() and close(). The bind() function doesn't appear to be patched, and while close() is patched, I can't tell whether it causes a switch. From my tests it doesn't appear to. Is there one place I can insert a print statement that will tell me definitively if there's switching going on? Would hub.switch be called every time?

grady player

unread,
Mar 17, 2019, 10:22:03 AM
to gev...@googlegroups.com
I don't think bind is a blocking call, I think it either succeeds or returns right away...
You could have something importing something before you monkeypatch, and that could be the problem.

We could also be going down the wrong path with counting threads... what kind of preemption are you getting? a signal? an exception? killed by oom killer?


Jason Madden

unread,
Mar 17, 2019, 10:39:39 AM
to gev...@googlegroups.com


> On Mar 16, 2019, at 17:35, John Byrne <jhn...@gmail.com> wrote:
>
> Is there one place I can insert a print statement that will tell me definitively if there's switching going on? Would hub.switch be called every time?

`greenlet.settrace` is one central place that can know if switching is going on. https://greenlet.readthedocs.io/en/latest/#tracing-support
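A sketch of using it (assumes gevent and greenlet are installed) that records every switch in the current thread:

```python
import greenlet
import gevent

events = []

def trace(event, args):
    # greenlet calls this with ("switch", (origin, target)) or
    # ("throw", (origin, target)) for every switch in this thread.
    events.append(event)

previous = greenlet.settrace(trace)  # returns the prior trace function
gevent.sleep(0)                      # yields to the hub, causing switches
greenlet.settrace(previous)          # restore the old trace function

print(events)
```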

Matt Billenstein

unread,
Mar 17, 2019, 10:46:04 AM
to gev...@googlegroups.com
On Sun, Mar 17, 2019 at 08:21:51AM -0600, grady player wrote:
> You could have something importing something before you monkeypatch, and
> that could be the problem.

I feel like there should be an "assert 'socket' not in sys.modules" right at
the top of gevent.monkey. This alone would fix most of the issues people have
using gevent... Like, if you want to monkey patch, it better be the first
thing you do in the interpreter.
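That is, the program's very first statements should look like this (a sketch assuming gevent is installed; the module-name check at the end is just one way to see the patch took effect):

```python
# First statements of the main module, before anything else is imported:
from gevent import monkey
monkey.patch_all()

import socket

# The stdlib name now resolves to gevent's cooperative implementation.
print(socket.socket.__module__)
```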

m

--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

John Byrne

unread,
Mar 17, 2019, 12:15:20 PM
to gevent: coroutine-based Python network library
> I don't think bind is a blocking call, I think it either succeeds or returns right away...
> You could have something importing something before you monkeypatch, and that could be the problem.
>
> We could also be going down the wrong path with counting threads... what kind of preemption are you getting? a signal? an exception? killed by oom killer?

What about close()? That's being called too.

I do actually have this before monkey patching:

from __future__ import (
    absolute_import, division, print_function, unicode_literals,
)


Could that be an issue?

I am definitely considering that it might not be related to threads. It's hard to explain the issue I'm seeing without getting into a ton of detail, but basically I'm using a context manager that sets up a shared object whenever you enter a "with" block and deletes it when the "with" block is finished. I'm maintaining a stack of context managers so that only the last greenlet to finish deletes the shared object, but I'm occasionally getting an exception indicating it's been deleted prematurely. I have many celery tasks using this "with" block.

So far, the only way I've been able to reproduce the issue outside celery is by deliberately creating OS threads. But I've monitored celery with gdb and it doesn't seem to be creating any threads other than the gevent thread pool I mentioned in my first email. When I discovered that my context manager setup code actually does some socket stuff via the requests import, I was sure I had solved the mystery! But now it seems that although that socket close() is definitely monkey patched, it doesn't seem to be causing a switch. What I really need to know is whether it *could* cause a switch.
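One way to answer that directly is to wrap the suspect calls in a greenlet trace hook (a sketch; assumes gevent and greenlet are installed, and uses a throwaway UDP socket rather than urllib3's actual probe):

```python
from gevent import monkey
monkey.patch_all()

import socket
import greenlet

events = []
previous = greenlet.settrace(lambda event, args: events.append(event))

# Mimic the bind()/close() sequence on a monkey-patched socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))
sock.close()

greenlet.settrace(previous)

# Any "switch" entries here mean bind()/close() yielded to the hub.
print(events)
```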