Creating a "Honeypot" sidekiq server... Options?

Dave Collins

Jun 30, 2017, 6:02:08 PM

to Sidekiq

We run a varying number of Sidekiq Pro instances on EC2. They scale up and down throughout the day, but 4-6 is a typical number of instances.

Each is set to run concurrency: 30 in config/sidekiq.yml. We use super_fetch and reliable_scheduler.

Over the past few years everything seems to hum along nicely until one day, BAM, the Sidekiq processes consume huge amounts of RAM until they are killed by the OOM killer.

Naturally, like all good problems, this never occurs in QA or staging environments - only in production. Usually on a Friday.

In this high-volume scenario it has proven extremely difficult for us to determine the root cause, i.e. which worker was consuming the RAM. We've added tons of logging but still can't pin down the class that was gobbling the memory.

One idea we've had is to make a "honeypot" Sidekiq instance that would run only one job at a time. This way, if a worker on that honeypot ever runs away with RAM and gets killed, we'll know with 100% certainty which class was the culprit.

Has anyone ever tried this on AWS? Good ways to set this up? Other recommendations?

Thanks,
Dave

Dave Collins

Jul 6, 2017, 6:07:32 PM

to Sidekiq

Just wanted to provide an update:

We were able to use this concept to manually tweak an EC2 instance to run just one thread. Using this technique we were able to identify a job that, in production in certain edge cases, would attempt a large runaway query on an unindexed table.

Also, I would recommend people check out Sidekiq Middleware (https://github.com/mperham/sidekiq/wiki/Middleware), as it makes it very easy to wrap functionality around your job processing.
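
For anyone wanting a starting point, here is a rough sketch of the kind of server middleware that could help with this sort of hunt. It is not the exact code we ran. The class name MemoryGrowthLogger, the log format, and the rss_reader hook are all made up for illustration, and reading /proc/self/status assumes Linux. It logs a line before the job starts (so the last "start" line in the log points at the job that was running if the process gets OOM-killed) and a second line with the RSS delta if the job finishes.

```ruby
require "logger"

# Hypothetical middleware (name and log format are illustrative only).
# Logs process RSS before and after each job. RSS is process-wide, so the
# delta is only attributable to a single job when concurrency is 1.
class MemoryGrowthLogger
  def initialize(logger: Logger.new($stdout), rss_reader: nil)
    @logger = logger
    # rss_reader is an assumption added for testability; defaults to /proc.
    @rss_reader = rss_reader || method(:current_rss_kb)
  end

  # Sidekiq server middleware interface: call(worker, job, queue) plus a block.
  def call(worker, job, queue)
    before = @rss_reader.call
    tag = "class=#{worker.class.name} jid=#{job['jid']} queue=#{queue}"
    # Logged up front: an OOM-killed process never reaches the "done" line.
    @logger.info("start #{tag} rss_kb=#{before}")
    result = yield
    after = @rss_reader.call
    @logger.info("done #{tag} rss_kb=#{after} delta_kb=#{after - before}")
    result
  end

  private

  # Linux only: parse the VmRSS line from /proc; returns 0 where unavailable.
  def current_rss_kb
    line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
    line ? line.split[1].to_i : 0
  rescue SystemCallError
    0
  end
end

# Register with Sidekiq only when it is loaded (e.g. from an initializer).
if defined?(Sidekiq)
  Sidekiq.configure_server do |config|
    config.server_middleware { |chain| chain.add MemoryGrowthLogger }
  end
end
```

On a normal instance running 30 threads the delta is noisy because every thread shares the same process memory, so this is most useful on the one-thread honeypot box.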