Is it possible to delay SuperFetch#cleanup_the_dead?

Roberto Schneiders

Aug 11, 2022, 4:23:59 PM
to Sidekiq
Hey, 

We are using Sidekiq Enterprise, but we don't have any of the reliability features turned on yet. We plan to start with `super_fetch!`, since we have been losing jobs when ECS kills the Sidekiq containers.

During testing, I noticed that if Sidekiq is killed and restarted right away, the jobs are not recovered because of the 60-second job TTL. If Sidekiq is restarted more than 60 seconds after the process was killed, everything works as expected.

1. Am I missing something?
2. Is it possible to delay the recovery methods so that all the jobs that were killed are "dead" (TTL expired) and eligible for recovery?
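One way to picture the 60-second behavior described above: recovery only touches private queues whose owning process is considered dead, and a process only counts as dead once its liveness record has gone unrefreshed for the full TTL. The following is a hypothetical, in-memory Ruby model of that idea, not Sidekiq Enterprise code; plain integers stand in for Redis keys with a TTL, the names `HEARTBEAT_TTL`, `dead?`, and `recoverable_queues` are invented for illustration, and whether SuperFetch decides liveness exactly this way is an assumption.

```ruby
# Hypothetical model of TTL-based orphan detection; not Sidekiq's actual code.
# Timestamps are plain integers (seconds) instead of Redis keys with a TTL.
HEARTBEAT_TTL = 60 # seconds, the TTL observed in the post above

# A process counts as dead once its last heartbeat is at least TTL old.
def dead?(last_heartbeat_at, now, ttl: HEARTBEAT_TTL)
  now - last_heartbeat_at >= ttl
end

# Only private queues owned by dead processes are eligible for recovery.
# A missing heartbeat entry models a Redis key that has already expired.
def recoverable_queues(private_queues, heartbeats, now)
  private_queues.select do |owner|
    last = heartbeats[owner]
    last.nil? || dead?(last, now)
  end
end

killed_at = 0
heartbeats = { "old-process" => killed_at } # last beat right before the kill
queues = ["old-process"]

p recoverable_queues(queues, heartbeats, killed_at + 10) # => []
p recoverable_queues(queues, heartbeats, killed_at + 61) # => ["old-process"]
```

In this model, a replacement process that starts within the TTL still sees the killed process as alive, so its private queue is left alone until the TTL has passed.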


Mike Perham

Aug 11, 2022, 4:52:14 PM
to sid...@googlegroups.com
Hey Roberto, it's not obvious, but this would be a bad idea. You would lose protection from poison-pill jobs: a killer job would lead to a constant cycle of death in your Sidekiq processes. Make sense?
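Poison-pill protection of this general kind is usually built on a per-job recovery counter: each time an orphaned job is recovered the count goes up, and past some limit the job is killed instead of being re-queued, so it cannot keep taking down fresh processes. The sketch below is a hypothetical, in-memory Ruby model of that idea only; it is not Sidekiq's implementation, the names `MAX_RECOVERIES`, `Job`, and `recover` are invented, and the limit of 3 is an assumption chosen to match the "killed after 3 retries" behavior reported later in this thread.

```ruby
# Hypothetical model of poison-pill protection; not Sidekiq's actual code.
MAX_RECOVERIES = 3 # assumed limit, matching "killed after 3 retries"

Job = Struct.new(:jid, :recoveries)

# Handle one orphaned job found in a dead process's private queue:
# re-queue it until it has been recovered MAX_RECOVERIES times, then
# move it to the morgue so it cannot keep crashing new processes.
def recover(job, queue, morgue, max: MAX_RECOVERIES)
  job.recoveries += 1
  if job.recoveries > max
    morgue << job
    :killed
  else
    queue << job
    :requeued
  end
end

pill = Job.new("abc123", 0) # a job that crashes its worker every time
queue = [pill]
morgue = []

# Each iteration stands for: the job is fetched and kills its process,
# a new process starts, and recovery finds the job orphaned again.
results = 4.times.map do
  queue.shift
  recover(pill, queue, morgue)
end

p results # => [:requeued, :requeued, :requeued, :killed]
```

Without a cap like this, the job in the loop above would be re-queued forever, which is the "constant cycle of death" described in the reply.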

--
You received this message because you are subscribed to the Google Groups "Sidekiq" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sidekiq+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sidekiq/d0a571b9-d9d8-4f29-9581-1249c8bfef73n%40googlegroups.com.


--
Mike Perham – CEO, Contributed Systems
Smart, effective open source infrastructure for your apps.

Roberto Schneiders

Aug 12, 2022, 10:44:35 AM
to sid...@googlegroups.com
Hey Mike, that makes sense, although in this specific test delaying `cleanup_the_dead` didn't cause issues with poison pills (they were killed after 3 retries, as expected). I imagine there may be cases where it would cause issues. Would it be any better to try to recover dead jobs at startup and again 5 minutes or so later?

Have you seen this before? During these tests, I sometimes get three "reliable" labels in the dashboard, even though I only have one process running:
[Screenshot: Sidekiq dashboard showing three "reliable" labels]

When this happens, some messages appear three times in the logs (with different timestamps and tids), for example:
"timestamp":"2022-08-12T10:28:13.354869-04:00","message":"SuperFetch activated","hostname":"cf58f498a514","sidekiq":{"tid":"1t2c"}
"timestamp":"2022-08-12T10:28:13.354981-04:00","message":"SuperFetch activated","hostname":"cf58f498a514","sidekiq":{"tid":"1t2o"}
"timestamp":"2022-08-12T10:28:13.355202-04:00","message":"SuperFetch activated","hostname":"cf58f498a514","sidekiq":{"tid":"1t0c"}

In those cases, I can see `cleanup_the_dead` being executed three times.

Mike Perham

Aug 12, 2022, 11:20:04 AM
to sid...@googlegroups.com
If your process is crashing and creating orphaned jobs, that's a bug for you to fix. I don't think quicker job recovery is the priority.

As for the log in triplicate, you'll need to supply version numbers. I don't see how that is possible in the latest version unless you are calling `config.super_fetch!` multiple times.

Roberto Schneiders

Aug 12, 2022, 11:37:44 AM
to sid...@googlegroups.com
OK, I checked, and `config.super_fetch!` was being called multiple times. Thank you.
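For reference, a minimal sketch of enabling SuperFetch exactly once per process, assuming the usual pattern of a single server-side `Sidekiq.configure_server` block. The file location is an assumption, and how the call ended up running several times in this case isn't stated in the thread; the point is simply that `config.super_fetch!` should appear once.

```ruby
# config/initializers/sidekiq.rb (assumed location)
# Enable SuperFetch exactly once, inside the server-only configuration block.
# Calling config.super_fetch! more than once per process is what produced
# the triplicated "SuperFetch activated" log lines in this thread.
Sidekiq.configure_server do |config|
  config.super_fetch!
end
```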
