We’re currently running v0.33 (from EPEL) using:
/usr/sbin/gearmand -d --worker-wakeup=10 -R --log-file=/var/log/gearmand/gearmand.log
I’ve also compiled V18.104.22.168 from source (on CentOS 7) and tried it with:
/usr/local/sbin/gearmand --daemon -l /var/log/gearmand/gearmand.log --worker-wakeup=10 -R --verbose=WARNING
But the source build hangs much sooner than 0.33. v0.33 can last for days before it hangs; the source build hangs within a day. It lasted approximately 9 hours yesterday before stopping (the daemon stays running, but gearadmin --status and gearman_top can’t connect, and Naemon+mod_gearman complains of no workers for any queue). Killing gearmand and restarting it resolves the issue.
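To catch the hang without waiting for Naemon to complain, a small probe can ask the server for its status over the admin text protocol and treat a timeout as a hang. This is only a rough sketch, not anything shipped with gearmand: the function name gearmand_responds is made up for illustration, and it assumes the daemon listens on the default port 4730 and that the reply to "status" ends with a line containing a single dot.

```python
import socket


def gearmand_responds(host="127.0.0.1", port=4730, timeout=5.0):
    """Return True if the server answers the admin 'status' command in time.

    Hypothetical helper: host, port and timeout are assumptions to adjust.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"status\n")
            data = b""
            # The status listing is terminated by a line holding only "."
            while not (data == b".\n" or data.endswith(b"\n.\n")):
                chunk = sock.recv(4096)
                if not chunk:
                    # Peer closed the connection before finishing the reply
                    return False
                data += chunk
            return True
    except OSError:
        # Covers connection refused as well as connect/read timeouts
        return False


if __name__ == "__main__":
    print("responsive" if gearmand_responds() else "NOT responsive")
```

If this reports "NOT responsive" while the gearmand process is still alive, that matches the state described above. Running it from cron and logging the first failure time would at least narrow down when the hang starts.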
The logfile doesn’t seem to include any clearly useful messages. I tried running with --verbose=DEBUG but the logfile filled up the disk within a few hours. Last night’s hang was preceded by just two error messages:
ERROR 2021-06-26 10:07:18.000000 [ main ] write(Bad file descriptor) -> libgearman-server/gearmand_thread.cc:213
Line 213 of that file isn’t even a write statement, so I’m confused about what went wrong.
Yes, I know it’s currently July and the date in the message says June, but that date is what appeared last night just after 10pm EDT. The system clock is correct.