TL;DR: I have a gevent program that hangs. Neither the "monitor thread" nor `gevent.util.print_run_info` sheds light on why or where, probably because I don't really understand the output. I need strategies for finding this hang!
This is the project that's troubling me: https://www.latimes.com/entertainment-arts/story/2019-09-06/operate-on-a-puppet-in-dr-botchers-minute-medical-school
It's pretty unusual!
It doesn't spawn thousands of greenlets to handle web requests! Rather, there's just a handful of greenlets, and those greenlets mainly block waiting for messages from Redis pub/sub subscriptions, `sleep`, or `join` each other. All the I/O goes through a monkey-patched connection to a Redis server.
The entire gevent loop will occasionally freeze up, requiring a restart of the Python process. When this happens, every greenlet stops working and nothing helpful is printed to the console. In this state, however, I once saw a `Timeout` exception raised and printed to the console.
I'm basically looking for strategies to find this bug. I'm a long-time user of gevent. I know how to write greenlets that cooperate. But I'm completely unsophisticated about gevent's internals. Debugging this issue is the first time I've encountered a "hub."
My code kills greenlets. I might be falling victim to this caution from the `Greenlet.kill` documentation:
> Use care when killing greenlets. If the code executing is not exception safe (e.g., makes proper use of `finally`) then an unexpected exception could result in corrupted state. Using a `link()` or `rawlink()` (cheaper) may be a safer way to clean up resources.
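
To make that caution concrete, here is a minimal sketch of a kill-safe greenlet. It is not my actual code: `worker`, `released` and `on_finished` are hypothetical names, and `gevent.sleep(60)` only stands in for blocking on a pubsub subscription.

```python
import gevent

released = []  # hypothetical stand-in for real cleanup (closing a pubsub, etc.)


def worker(name):
    try:
        # Stands in for blocking on a redis pubsub subscription.
        gevent.sleep(60)
    finally:
        # Runs even when kill() raises GreenletExit inside this greenlet.
        released.append(name)


def on_finished(glet):
    # Link callbacks are called with the finished greenlet.
    print("finished:", glet.dead)


g = gevent.spawn(worker, "sub-1")
g.link(on_finished)
gevent.sleep(0)  # yield once so the worker starts and blocks in sleep(60)
g.kill()         # by default, blocks until the worker is dead
gevent.sleep(0)  # yield so any pending link callback gets to run
print(released)
```

Note that the `finally` block only helps once the greenlet has actually started running. A greenlet killed before its first switch never enters the `try` at all, which is one reason a `link()` callback can be the safer place for cleanup.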
I've hooked a SIGUSR1 handler to `gevent.util.print_run_info` and I've gone through the output line by line, but all the greenlets "look like" they're doing reasonable things (like, they're inside blocking I/O calls, joining, sleeping, or doing other ordinary things). But honestly I don't know what I'm looking for. Here's a copy of the output from one crash run: https://www.dropbox.com/s/gmeqj9s6ad9r37y/gevent_log.txt?dl=0
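For reference, the SIGUSR1 hook looks roughly like the sketch below. This is an approximation rather than the exact project code: the handler name `dump_run_info` is hypothetical, and installing it with the standard library's `signal.signal` (rather than a gevent-level handler) is an assumption.

```python
import signal
import sys

from gevent.util import print_run_info


def dump_run_info(signum, frame):
    # Write the stacks of all threads and greenlets to stderr.
    print_run_info(file=sys.stderr)


# Must be called from the main thread of the process.
signal.signal(signal.SIGUSR1, dump_run_info)
```

With this in place, `kill -USR1 <pid>` from another shell (or `docker kill --signal=USR1 <container>`) should trigger the dump, provided the main thread is still executing Python bytecode.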
I tried enabling the monitor thread early on in my code:
```python
from gevent import config
config.monitor_thread = True
```
But it produces no output...
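According to the gevent docs, the monitor thread reports greenlets that block the event loop for longer than `config.max_blocking_time`. A minimal sketch of a case that should trigger a report, assuming the settings are applied before the hub first runs; `hog` is a hypothetical, deliberately misbehaving greenlet and is not part of my project:

```python
import time

import gevent
from gevent import config

# Apply these before the hub first runs.
config.monitor_thread = True
config.max_blocking_time = 0.1  # seconds; this is also gevent's documented default


def hog(seconds):
    # Hypothetical misbehaving greenlet: busy-waits without ever yielding.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


# The busy-wait holds the loop well past max_blocking_time, so the monitor
# thread should write a report about the blocking greenlet to stderr.
gevent.spawn(hog, 0.5).join()
```

The same switch can also be set from the environment with `GEVENT_MONITOR_THREAD_ENABLE`, which may be easier to guarantee "early enough" in a Docker setup.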
So I'm stumped!
Some extra info:
This project runs on Python 3.10 on Ubuntu 22.04 x86_64 in Docker. I'm using whatever the latest gevent is. I'm not pinning my versions, though I might start doing that. I'm connecting to a Redis server that also runs in Docker. Monkey-patching happens first thing. I have quite a few systems with this same configuration that all run like a champ.