Celery scheduled tasks sometimes not being picked up by worker

Paul Walsh

May 3, 2013, 7:25:54 AM
to celery...@googlegroups.com
Amongst other things, I use Celery on several Python web apps I maintain to run scheduled data backups, using crontab.

I sometimes have a few web apps on a single server, with each app in a virtualenv, and each virtualenv has its own celery app too.

Recently, for no reason I can figure out, one Django app started dropping some scheduled tasks.

I set up a logger that records entry into each task function and again just before it exits. With this, I can see that celery beat always sends the task, but sometimes the worker just doesn't pick it up.

I can see no pattern in when the worker fails to pick a task up. Both of my tasks, backup_to_json and replicate_json_backup, can fail to execute, not just one or the other.

On this Django app, there are two scheduled tasks: (i) data backup at 15 minutes after every hour, and (ii) emailing that same data backup at 30 minutes after every hour. 

The tasks:

def backup_to_json():
    logger.info('Inside backup_to_json: pre-command')
    # create this_file somewhere
    with gzip.open(this_file, 'w') as f:
        pass  # write the backup data to f (omitted in this post)
    logger.info('Inside backup_to_json: post-command')

def replicate_json_backup():
    logger.info('Inside replicate_json_backup: pre-command')
    # get the file created at 15 past the hour
    # email the file as an attachment to someone
    logger.info('Inside replicate_json_backup: post-command') 

The settings:

from celery.schedules import crontab
import djcelery
BROKER_URL = 'redis://'
CELERYBEAT_SCHEDULE = {
    'backup_to_json': {
        'task': 'tasks.backup_to_json',
        'schedule': crontab(hour='*/1', minute=15),
    },
    'replicate_json_backup': {
        'task': 'tasks.replicate_json_backup',
        'schedule': crontab(hour='*/1', minute=30),
    },
}

Recent celery log where you can see at 10:15, the backup_to_json task was sent, but then no worker picked it up (there are no log records for the function being executed):

[2013-05-03 09:15:13,529: INFO/MainProcess] Inside backup_to_json: post-command
[2013-05-03 09:15:13,535: INFO/MainProcess] Task tasks.backup_to_json[654b1498-ee4e-4338-98f9-25a77270995e] succeeded in 13.4751849174s: None
[2013-05-03 09:30:00,002: INFO/Beat] Scheduler: Sending due task replicate_json_backup (tasks.replicate_json_backup)
[2013-05-03 09:30:00,004: INFO/MainProcess] Got task from broker: tasks.replicate_json_backup[7540be3b-b468-44ab-b956-a60b6860d0cd]
[2013-05-03 09:30:00,030: INFO/MainProcess] Inside replicate_json_backup: pre-command
[2013-05-03 09:30:04,416: INFO/MainProcess] Inside replicate_json_backup: post-command
[2013-05-03 09:30:04,634: INFO/MainProcess] Task tasks.replicate_json_backup[7540be3b-b468-44ab-b956-a60b6860d0cd] succeeded in 4.60526013374s: None
[2013-05-03 10:00:00,028: INFO/Beat] Scheduler: Sending due task rebuild_index (tasks.rebuild_index)
[2013-05-03 10:00:00,031: INFO/MainProcess] Got task from broker: tasks.rebuild_index[216778a7-7fb7-4337-bfe8-a3e2742e4444]
[2013-05-03 10:00:00,059: INFO/MainProcess] Inside rebuild_index: pre-command
[2013-05-03 10:00:24,404: INFO/MainProcess] Inside rebuild_index: post-command
[2013-05-03 10:00:24,584: INFO/MainProcess] Task tasks.rebuild_index[216778a7-7fb7-4337-bfe8-a3e2742e4444] succeeded in 24.5246071815s: None
[2013-05-03 10:15:00,101: INFO/Beat] Scheduler: Sending due task backup_to_json (tasks.backup_to_json)
[2013-05-03 10:30:00,009: INFO/Beat] Scheduler: Sending due task replicate_json_backup (tasks.replicate_json_backup)
[2013-05-03 10:30:00,012: INFO/MainProcess] Got task from broker: tasks.replicate_json_backup[1899991f-69f2-4496-9cf0-1f5c9a9c1f6f]
[2013-05-03 10:30:00,026: INFO/MainProcess] Inside replicate_json_backup: pre-command
[2013-05-03 10:30:02,900: INFO/MainProcess] Inside replicate_json_backup: post-command
[2013-05-03 10:30:02,919: INFO/MainProcess] Task tasks.replicate_json_backup[1899991f-69f2-4496-9cf0-1f5c9a9c1f6f] succeeded in 2.89332604408s: None
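
Spotting the gap in a log like this by eye gets tedious, so here is a rough sketch of how the check could be scripted. It assumes the exact line formats shown above ("Scheduler: Sending due task ..." from beat and "Got task from broker: ..." from the worker). The function name find_unreceived is made up for illustration, and pairing sends with receipts by task name in log order is only a heuristic.

```python
import re
from collections import defaultdict, deque

# Patterns are assumptions based on the log lines in this post.
SENT = re.compile(
    r"^\[(?P<ts>[\d\- :,]+): INFO/Beat\] Scheduler: Sending due task "
    r"\S+ \((?P<task>[^)]+)\)"
)
RECEIVED = re.compile(r"Got task from broker: (?P<task>[\w.]+)\[")


def find_unreceived(lines):
    """Return sorted (timestamp, task) pairs that beat sent but no worker logged receiving."""
    pending = defaultdict(deque)  # task name -> timestamps still waiting for a receipt
    for line in lines:
        sent = SENT.search(line)
        if sent:
            pending[sent.group("task")].append(sent.group("ts"))
            continue
        received = RECEIVED.search(line)
        if received and pending[received.group("task")]:
            # Pair this receipt with the oldest outstanding send of the same task.
            pending[received.group("task")].popleft()
    return sorted((ts, task) for task, stamps in pending.items() for ts in stamps)
```

Feeding it the lines of the log file (for example, find_unreceived(open(path)) with path pointing at the celery log) lists every send that has no matching receipt after it.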

Any ideas?

Ask Solem

May 3, 2013, 8:47:59 AM
to celery...@googlegroups.com
Sounds like you have a broken worker process still running,
make sure you have stopped all other workers (and use --pidfile)
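
For context, a pidfile makes it possible to tell whether a previously started worker is still alive. The sketch below is only an illustration of that idea, not Celery's own implementation: the function name pidfile_status and its return values are hypothetical, and it assumes a POSIX system where signal 0 can be used to probe a process.

```python
import errno
import os


def pidfile_status(path):
    """Return 'missing', 'unreadable', 'stale', or 'running' for the pidfile at path."""
    if not os.path.exists(path):
        return "missing"
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return "unreadable"
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return "stale"  # pidfile left behind by a process that is gone
        return "running"  # e.g. EPERM: the process exists but belongs to another user
    return "running"
```

A result of "running" before starting a new worker would suggest that an older worker is still attached to the same broker.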

Ask Solem

Paul Walsh

May 3, 2013, 9:41:06 AM
to celery...@googlegroups.com
Ask, thanks for the help. You are right that I don't have any pid config on any of my celery apps. They are all daemonized via Upstart.

And I do have a single zombie process on my system from around the same time these problems started, but I haven't been able to identify it, and therefore kill it.

I'll try with the pidfile, and I'll do more work to hunt down the zombie.
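
One way to hunt for leftover workers is to scan the process table for command lines that mention celery. The following is a minimal sketch that assumes a Linux-style /proc filesystem; find_processes and the proc_root parameter are hypothetical names used for illustration. Note that a truly defunct (zombie) process has an empty command line and would not match, so this targets workers that are still running unnoticed.

```python
import os


def find_processes(keyword, proc_root="/proc"):
    """Return sorted (pid, command line) pairs whose command line contains keyword."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except (IOError, OSError):
            continue  # the process exited or is not readable
        # Arguments in cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if keyword in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)
```

Calling find_processes("celery") and comparing the result against the workers Upstart is supposed to be managing should show any extra worker that survived a restart.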

I am managing celery (and gunicorn) via upstart configs. Obviously each virtualenv has its own celery and gunicorn confs, so per virtualenv, I do this:

sudo service myapp start|stop|restart
sudo service myapp_scheduler start|stop|restart
