reloading imported modules

389 views
Skip to first unread message

Mher Movsisyan

unread,
Jun 4, 2011, 5:04:51 AM6/4/11
to celery-d...@googlegroups.com
Currently celery doesn't have a built-in support for dynamically adding a module/function to running workers.

There is an issue about this https://github.com/ask/celery/issues/300 and a discussion in celery-users group https://groups.google.com/group/celery-users/browse_thread/thread/4eddcc433253a991

Adding a new field to the message body, as suggested in the celery-users discussion, will partially solve the issue letting dynamically add modules but not reload existing ones. Separate update requests can also bring to inconsistency among workers.

I think a special built-in broadcast command is a better solution.

from celery.task.control import broadcast
broadcast("reload", arguments={"CELERY_IMPORTS": ("tasks", "aditional_module")}

Mher

Ask Solem

unread,
Jun 10, 2011, 8:58:24 AM6/10/11
to celery-d...@googlegroups.com


Sorry for the late reply, just moved to London and have lots of unanswered mail :(


This sounds good, but there may be some issues,

How do you intent to reload modules? It is very prone to errors,
e.g. see http://stackoverflow.com/questions/437589/how-do-i-unload-reload-a-python-module

Also remote control commands are handled by the main celeryd process,
not the workers, there is no way to direct a function target to a specific
pool worker, so the pool worker that receives it is random.

I tried to update all pool workers using code like the following once:

import __builtin__
import os
from celery import current_app
from celery.worker.control.registry import Panel


def reload_in_pool_worker(modules):
for module in modules:
__builtin__.reload(module)
return os.getpid()

@Panel.register
def reload(panel, CELERY_IMPORTS=()):
conf = current_app.conf
pool = panel.consumer.pool
m = conf.CELERY_IMPORTS = tuple(set(CELERY_IMPORTS) |
set(conf.CELERY_IMPORTS))
pids = set(pool.info["processes"])
pids_reloaded = set()
while pids ^ pids_reloaded:
pids_reloaded.add(pool.apply_async(reload_in_pool_worker,
(m, )).get()


But this didn't work very well, it could take a very long time
until all of the processes got the job... :(

--
Ask Solem
twitter.com/asksol | +44 (0)7713357179

Mher Movsisyan

unread,
Jun 10, 2011, 11:10:28 AM6/10/11
to celery-d...@googlegroups.com

On Jun 10, 2011, at 5:58 PM, Ask Solem wrote:

> This sounds good, but there may be some issues,
>
> How do you intent to reload modules? It is very prone to errors,
> e.g. see http://stackoverflow.com/questions/437589/how-do-i-unload-reload-a-python-module
>
> Also remote control commands are handled by the main celeryd process,
> not the workers, there is no way to direct a function target to a specific
> pool worker, so the pool worker that receives it is random.

The main celeryd process will restart all worker processes when receives a reloading command. The actual restart of worker will take place after completing the current task.

In some cases reloading of all workers can take a long time but I think it is acceptable. The reloading command can be asynchronous. The main requirement is that all tasks submitted after initiation of reloading should be processed by reloaded workers.

Mher

Ask Solem

unread,
Jun 10, 2011, 5:44:54 PM6/10/11
to celery-d...@googlegroups.com

Ah, this may be doable, but remember that fork copies the current process memory, so ensure that the modules imported in the parent and "copied" to the new process is not outdated

Mher Movsisyan

unread,
Jun 17, 2011, 11:09:31 AM6/17/11
to celery-d...@googlegroups.com
This patch implements dynamic module loading:

https://github.com/mher/celery/commit/97369467b0706510ea1da6d4280ea2c8869e23b3

1) For importing new modules use:

broadcast("pool_restart", arguments={"imports":["newmodule"]})

2) For reloading all modules in CELERY_IMPORTS use:

broadcast("pool_restart", arguments={"reload_modules":True})

3) For reloading some of the modules mentioned in CELERY_IMPORTS use:

broadcast("pool_restart", arguments={"imports":["importedmodule"]})

The patch uses python's standard reload method for reloading previously imported modules. As documented in the standard library reload method has its restrictions [1]. Therefore reloading should be used only for "safe" modules.

By default the pool_restart command doesn't reload modules. It allows to explicitly mention modules requiring reloading. This approach helps to avoid reloading complex library modules. At the same time, it is possible to implement a custom command similar to pool_restart which will use a custom reloading method [2,3] and restart the pool.

[1] http://docs.python.org/library/functions.html#reload
[2] http://pyunit.sourceforge.net/notes/reloading.html
[3] http://www.indelible.org/ink/python-reloading/

Mher

Reply all
Reply to author
Forward
0 new messages