I have a theory now - let's see if I'm correct...
A task may be marked as pending indefinitely in the following scenario:
- The task runs once and then fails. Maybe it even runs twice and fails both times, etc.
- The worker never reschedules the task again. Or it did reschedule it, but it failed again.
- In scheduler.py line 145, a failed task is marked as pending if retry_delay > 0 (the default is 900s) and its retry time is still in the future.
So the result is that the task remains pending - regardless of whether it has dependencies that are now complete, or has no dependencies at all.
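To make sure I'm reading the scheduler correctly, here is a minimal sketch of the behavior I think I'm seeing (the function and field names here are my own, not the scheduler's actual internals):

```python
import time

RETRY_DELAY = 900  # seconds; the server-side default mentioned above


def reported_status(task, now=None):
    """Hypothetical sketch: a failed task whose retry time is still in
    the future is reported as PENDING rather than FAILED."""
    now = time.time() if now is None else now
    if task["last_failure"] is not None and RETRY_DELAY > 0:
        retry_time = task["last_failure"] + RETRY_DELAY
        if now < retry_time:
            return "PENDING"  # retry window not open yet
    return task["status"]


task = {"status": "FAILED", "last_failure": 1000.0}
print(reported_status(task, now=1000.0 + 300))   # 5 min after failure -> PENDING
print(reported_status(task, now=1000.0 + 1200))  # 20 min after failure -> FAILED
```

If that's roughly what line 145 does, it would explain why the task looks pending forever when nothing ever retries it.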
Is this correct?
What's gotten me confused is that I assumed there was automated retry. I found the retry_delay parameter, which belongs to the server, so I assumed it somehow magically took care of worker retries as well.
What I now realize (I hope it's correct) is that the retry mechanism has two sides. One is the server side, which means "the server *allows* a retry only after retry_delay has passed" - but that isn't enough: the worker also needs to *want to retry*, by polling the server until it's allowed to run the task again.
Is this correct?
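In other words, I'm picturing the worker side as a polling loop like the sketch below (names are mine; `get_work` stands in for whatever RPC the worker actually uses to ask the server for runnable tasks):

```python
import time


def poll_for_retry(get_work, poll_interval=60, deadline=None):
    """Hypothetical worker-side loop: the server only *allows* a retry
    after retry_delay, so the worker must keep polling until the server
    hands the failed task back (or a deadline passes)."""
    while deadline is None or time.time() < deadline:
        task = get_work()             # ask the server for runnable work
        if task is not None:
            return task               # server re-allowed the failed task
        time.sleep(poll_interval)     # nothing runnable yet; poll again
    return None                       # gave up before the server re-allowed it


# Demo with a fake server that only re-allows the task on the 3rd poll:
calls = {"n": 0}

def fake_get_work():
    calls["n"] += 1
    return "task-A" if calls["n"] >= 3 else None

print(poll_for_retry(fake_get_work, poll_interval=0))  # task-A
```

If the worker exits (or its keep-alive window ends) before the server re-allows the task, nobody ever polls again and the task stays pending - which matches what I'm seeing.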
What's gotten me confused even more (and that's beside the point) is that in my case I did have retries, via cron every 30 minutes - but they were limited to up to 2h from the initial run. So if a task failed and more than 2 hours had passed since its first run, there would be no retry. That's why retries worked in most cases (via cron) but failed in others (where more than 2h had passed).
If my conclusion is correct, then the right thing for me to do is keep retrying for up to 24h (business logic).
What I am missing is a max_retry parameter. Is there such a thing? I don't want my tasks to keep running and failing more than, say, 3 times. But if I just stop trying (from the worker side), the failed task would appear pending, and that's confusing. I want it to show as failed, not pending. Maybe I could get that by setting retry_delay = 0, but is there a way to combine retry_delay > 0 with a max_attempts limit?
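To be concrete, here is the combined policy I'm after, sketched as a status function (I don't know whether the scheduler exposes anything like this - max_attempts and all the names here are hypothetical):

```python
def status_after_failure(failure_count, retry_delay, max_attempts,
                         last_failure, now):
    """Hypothetical combined policy: retry with a delay until
    max_attempts is reached, then stay FAILED permanently instead of
    looking pending forever."""
    if failure_count >= max_attempts:
        return "FAILED"       # give up for good: failed, not pending
    if retry_delay > 0 and now < last_failure + retry_delay:
        return "PENDING"      # retry window not open yet
    return "RUNNABLE"         # server may hand this task out again


print(status_after_failure(3, 900, 3, 1000.0, 1100.0))  # FAILED
print(status_after_failure(1, 900, 3, 1000.0, 1100.0))  # PENDING
print(status_after_failure(1, 900, 3, 1000.0, 2000.0))  # RUNNABLE
```

That's the behavior I want: a delayed retry a bounded number of times, and a terminal failed state afterwards.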