How to gracefully recover from "too many open files" fatal exception.

Mike Hall

Dec 23, 2015, 3:07:20 AM
to Sidekiq
I've been digging into why I started receiving `too many open files` errors while running even 1 concurrent worker on 1 process of Sidekiq. I've even stripped out all other Redis calls and bumped my ulimit directly in the redis-server startup, but that's another problem.

I want to be able to gracefully recover from the fatal job crashes at the very least. I'm running Sidekiq 4.0.1 on Ruby 2.2.4 in Rails 4.2.5. The jobs are failing with the following error.

Too many open files @ rb_sysopen - /home/tsr/klobomedia/datastore/simplified/679311506442485760.json

"/home/tsr/apps/twitter-charts/releases/20151223070938/app/jobs/loader/app_job.rb:15:in `read'",
"/home/tsr/apps/twitter-charts/releases/20151223070938/app/jobs/loader/app_job.rb:15:in `perform'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:150:in `execute_job'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:132:in `block (2 levels) in process'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:127:in `block in invoke'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/newrelic_rpm-3.14.1.311/lib/new_relic/agent/instrumentation/sidekiq.rb:33:in `block in call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/newrelic_rpm-3.14.1.311/lib/new_relic/agent/instrumentation/controller_instrumentation.rb:362:in `perform_action_with_newrelic_trace'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/newrelic_rpm-3.14.1.311/lib/new_relic/agent/instrumentation/sidekiq.rb:29:in `call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/server/active_record.rb:6:in `call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/server/retry_jobs.rb:74:in `call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/logging.rb:30:in `with_context'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/server/logging.rb:7:in `call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:132:in `call'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/middleware/chain.rb:132:in `invoke'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:127:in `block in process'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:166:in `stats'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:126:in `process'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:79:in `process_one'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/processor.rb:67:in `run'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/util.rb:16:in `watchdog'",
"/home/tsr/apps/twitter-charts/shared/bundle/ruby/2.2.0/gems/sidekiq-4.0.1/lib/sidekiq/util.rb:24:in `block in safe_thread'"

My job looks like this:

module Loader
  class AppJob
    include Sidekiq::Worker
    sidekiq_options queue: :loader, retry: true, backtrace: true

    def perform
      # stuff
    rescue => ex
      # log
    ensure
      # cleanup
    end
  end
end

I'd like to handle the failure gracefully, but it appears that the job never even gets far enough to reach the ensure block. It just dies completely before it starts. Any advice will be greatly appreciated.

Thanks,
Mike
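
A minimal Ruby sketch of recovering inside the job, under stated assumptions: `read_with_recovery` and its `reader:` parameter are hypothetical names, not part of Sidekiq or the code posted above. It relies on `Errno::EMFILE` (the exception class behind "Too many open files") being a `StandardError` subclass, so a `rescue` in the method that performs the read can catch it, and the `ensure` clause still runs. The failure is simulated with a lambda so the sketch runs without exhausting real descriptors.

```ruby
# Hypothetical helper, not part of Sidekiq: reads a file and recovers when
# the process has run out of file descriptors.
def read_with_recovery(path, reader: File.method(:read))
  # File.read opens the file, reads it, and closes the descriptor itself,
  # so a successful call does not leave a descriptor open.
  reader.call(path)
rescue Errno::EMFILE => ex
  # Errno::EMFILE < SystemCallError < StandardError, so a bare `rescue => ex`
  # would catch it too; naming the class keeps unrelated errors visible.
  warn "descriptor limit hit while reading #{path}: #{ex.message}"
  nil
ensure
  # Cleanup goes here; this clause runs whether or not the read raised.
  $stderr.flush
end

# Simulate the failure instead of actually running out of descriptors.
simulated_failure = ->(_path) { raise Errno::EMFILE, "simulated" }
result = read_with_recovery("example.json", reader: simulated_failure)
puts "recovered, result is #{result.inspect}"
```

Swallowing the exception like this means Sidekiq sees the job as successful and will not retry it. Re-raise after logging if the retry behaviour from `retry: true` is still wanted.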


Mike Perham

Dec 23, 2015, 11:45:13 AM
to sid...@googlegroups.com
Use lsof to see what files your process has open.  Use sysconf to increase the allowed number of open files.
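
As a complement to running lsof from outside the process, here is a minimal Ruby sketch, assuming a POSIX system, that reports the descriptor limit and the IO objects Ruby still holds open from inside the process. The helper names `descriptor_limits` and `open_io_objects` are hypothetical; `Process.getrlimit` and `ObjectSpace.each_object` are standard Ruby.

```ruby
# Soft and hard limits on open file descriptors for the current process.
def descriptor_limits
  Process.getrlimit(Process::RLIMIT_NOFILE)
end

# Ruby IO objects (files, sockets, pipes) that are still open in this process.
def open_io_objects
  ObjectSpace.each_object(IO).reject do |io|
    begin
      io.closed?
    rescue IOError
      true # uninitialized IO objects hold no descriptor, so skip them
    end
  end
end

soft, hard = descriptor_limits
puts "descriptor limit: soft=#{soft} hard=#{hard}"

# IO#inspect includes the path or descriptor number where Ruby knows it.
open_io_objects.each { |io| puts "open: #{io.inspect}" }
```

A list that keeps growing between jobs suggests files opened without a matching close. If the soft limit is simply too low, `Process.setrlimit(Process::RLIMIT_NOFILE, hard, hard)` can usually raise it as far as the hard limit without extra privileges; anything beyond that has to be configured outside Ruby, for example with `ulimit -n` in the shell that starts the process.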


--
Mike Perham – CEO, Contributed Systems
Smart, effective open source infrastructure for your apps.