LocalExecutor w/ Postgres database (on a separate machine).
The processes are definitely still running. In fact, they continue writing to their task logfile.
I think I may have found some candidates for the problem. After getting airflow working in an environment under my own user, I redeployed it under root. However, "ps aux | grep airflow" reveals a bunch of scheduler processes (the workers?) still running under the "old" environment. So perhaps I have an extra scheduler running, and something weird is happening in the database as a result?
I've been using supervisor to start and stop the scheduler and webserver, and I've noticed that stopping them through supervisor doesn't always kill the gunicorn processes or the scheduler's workers. Perhaps this is causing my problem?
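(For what it's worth, supervisor by default only signals the direct child process, so a gunicorn master's forked workers can survive a stop. A sketch of the config options that address this, with hypothetical program names and paths; adjust to your setup:

```ini
; Hypothetical supervisor program section -- names/paths are assumptions.
[program:airflow-webserver]
command=/usr/local/anaconda/envs/airflow/bin/airflow webserver
stopasgroup=true   ; send the stop signal to the whole process group
killasgroup=true   ; send SIGKILL to the whole group on forced kill
```

With those two flags set, "supervisorctl stop airflow-webserver" should take the forked gunicorn workers down along with the master.)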
Another candidate: here is the ps aux output for one of my "killed as zombie" tasks, supposedly killed ~21 hours ago:
root 31110 0.0 0.0 1015216 39224 ? S Jan05 3:06 /usr/local/anaconda/envs/airflow/bin/python /usr/local/anaconda/envs/airflow/bin/airflow run Pen03_Rgt_AP2350_ML1400__Site04_Z2023__B957_cat_P03_S04_Epc07-14 phy_spikesort 2016-01-01T16:27:23.698899 --local --pool phy -sd DAGS_FOLDER/jk.py
root 31138 0.0 0.0 1013704 2296 ? S Jan05 0:15 /usr/local/anaconda/envs/airflow/bin/python /usr/local/anaconda/envs/airflow/bin/airflow run Pen03_Rgt_AP2350_ML1400__Site04_Z2023__B957_cat_P03_S04_Epc07-14 phy_spikesort 2016-01-01T16:27:23.698899 --job_id 688 --pool phy --raw -sd DAGS_FOLDER/jk.py
There seem to be 2 "airflow run" processes running, each with slightly different arguments ("--local" vs "--job_id 688 ... --raw"). Is this normal or suspicious?
I'm going to try shutting down, killing anything with "airflow", and restarting.
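For the record, here is the kind of cleanup I mean. This is just a sketch: the bare "airflow" pattern is an assumption and may need to match your actual install path, and you'd want to stop supervisor-managed services first so nothing gets respawned.

```shell
#!/bin/sh
# Sketch: hunt down stray scheduler / gunicorn / worker processes
# before restarting. The 'airflow' pattern is an assumption -- tighten
# it (e.g. to the full binary path) if it matches too much.

# 1. List survivors with their full command lines (-a prints the cmdline):
pgrep -af airflow || echo "no airflow processes found"

# 2. Ask them to exit cleanly, give them a moment, then force-kill holdouts:
pkill -f airflow || true
sleep 2
pkill -9 -f airflow || true

# 3. Confirm everything is gone:
pgrep -f airflow > /dev/null || echo "clean"
```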
Justin