Currently, PDB failures are enqueued into the "scheduler" queue and resubmitted to the regular queue after some amount of time has passed. Instead, the flow should be as follows:
- Upon failure, the message is retried in-memory 10* times.
- After 10* failures, the message progresses to a 2nd level failure.
- The message is retried 10* times for each 2nd level failure.
- After 5* 2nd level failures, the message progresses to 3rd level.
- At 3rd level, the message is discarded, i.e. sent to the DLQ.
Any number marked with a '*' should be configurable. For now, the 2nd level can be the normal failure case (i.e. the message is delayed and re-enqueued). Once the threadpools are in place, we will change this to scheduling a thread in memory to retry the message. That requires the new threadpool to be in place first (separate ticket), followed by the work to change the 2nd level behavior (separate ticket).
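A rough sketch of the proposed flow, for discussion only. All names here (`RetryConfig`, `Message`, `process`, `Outcome`) are hypothetical and not taken from the existing code. It assumes that "retried 10* times" means 10 attempts in total per pass, and that the 2nd level failure count travels with the message across re-enqueues.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    DELIVERED = "delivered"
    DELAYED = "delayed"      # 2nd level: delay and re-enqueue (current behavior)
    DISCARDED = "discarded"  # 3rd level: send to the DLQ


@dataclass
class RetryConfig:
    # Hypothetical setting names; defaults are the starred values above.
    in_memory_retries: int = 10
    max_second_level_failures: int = 5


@dataclass
class Message:
    body: str
    second_level_failures: int = 0  # assumed to persist across re-enqueues


def process(msg: Message, handler: Callable[[Message], None],
            cfg: RetryConfig) -> Outcome:
    """Run one pass for msg: first arrival, or arrival after a re-enqueue."""
    # 1st level: retry in memory up to the configured number of attempts.
    for _ in range(cfg.in_memory_retries):
        try:
            handler(msg)
            return Outcome.DELIVERED
        except Exception:
            continue

    # Every in-memory attempt failed: record one 2nd level failure.
    msg.second_level_failures += 1
    if msg.second_level_failures >= cfg.max_second_level_failures:
        return Outcome.DISCARDED  # 3rd level
    return Outcome.DELAYED        # caller delays and re-enqueues the message
```

With the defaults, a message that never succeeds is attempted 10 x 5 = 50 times before it is discarded. When the threadpool work lands, only the `DELAYED` branch should need to change.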