Hi all,
with the latest version of the scheduler I started experimenting with a script that autoscales workers. I'd like to brainstorm this with all of you ^_^ .
As you can see, I added a new column with "worker statistics" (worker_stats) to the scheduler_worker table. Like any metric, it helps to understand a worker's state a bit better without going through all the logs. So far I've managed to pull out the metrics without any additional queries, so it's a nice bonus that comes "free of (additional) charge".
I notice, now and then, that many users start a boatload of workers just to accommodate the occasional spike of tasks. Until now, I handled this through the "max_empty_runs" parameter, since I knew when such spikes hit during the day/week/month (and I didn't have strict deadlines for results). So I scheduled (with the system's cron facilities) a batch of "auto-expiring" workers that eventually die out when there are no more tasks to process.
Other task processors let you start a process that, once certain criteria are met, starts/stops workers. Until now, at least I think, users who needed that had to turn to third-party process monitors (circus, supervisord, etc.) to manage workers.
I was thinking of letting people define their own "start/stop" criteria, checked by an external process, through a simple function (with a fixed name) that returns True/False.
Something like this (in models):
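def myautoscale():
    sw = db.scheduler_worker
    ticker = db(sw.is_ticker == True).select(sw.worker_stats).first()
    if not ticker:
        # no ticker found: ask for a worker to be started
        return True
    # otherwise ask for more workers when the reported queue is above 30
    return ticker.worker_stats['queue'] > 30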
The "autoscale process" could then do something like this (pseudo-code):
from gluon.shell import run
from gluon import current
import time

code_check = "from gluon import current; current._autoscale = myautoscale() or False"
check_args = (appname, True, True, None, False, code_check)
while True:
    run(*check_args)
    if current._autoscale:
        ...start process
    else:
        ...kill process
    time.sleep(15)
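To make the loop above easier to reason about (and to test without a running web2py app), here is a minimal sketch with the three decisions passed in as plain callables. The names autoscale_loop, check, start_worker, stop_worker and max_rounds are all made up for illustration; none of them exist in the scheduler.

```python
import time


def autoscale_loop(check, start_worker, stop_worker,
                   interval=15, max_rounds=None, sleep=time.sleep):
    """Poll check() and start or stop one worker per round.

    check, start_worker and stop_worker are hypothetical hooks supplied
    by the caller; they are not web2py APIs.
    max_rounds=None loops forever, like the pseudo-code above.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        if check():
            start_worker()
        else:
            stop_worker()
        rounds += 1
        sleep(interval)
    return rounds
```

In the real thing, check would wrap the run(*check_args) call and read current._autoscale, while the other two hooks would hold whatever start/kill logic we settle on.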
What do you think?
Problems to solve/brainstorm:
- cmdline syntax. For my use cases so far, just the max number of allowed processes would be fine, as I can have (at this point) an external process checking whether I need workers started (e.g. by counting queued tasks). Others use "--autoscale min,max", as in "--autoscale 1,3", to allow "from 1 to 3 workers"... this means, though, that the autoscale process should also become a "monitoring" process, spawning new workers until the min value is fulfilled (a good added value too, because right now the "syntax" to start, e.g., 6 workers is -K appname,appname,appname,appname,appname,appname and it would then be, e.g., -K appname --autoscale 1,6).
- group_names... workers can be assigned group_names to work with, and on different servers... this means we'd have to either impose some limits on this start/stop process and/or figure out a "solid" logic to deal with this additional layer of complexity. I'd leave the possibility of autoscaling "different" apps completely out of the picture (right now you can do -K appname1,appname2): if needed, two autoscalers will have to be run.
- I think users should either use autoscale or not. The reasoning is that either there's a "managed" logic or there isn't, and once an "autoscaler" is launched, it will probably affect ALL workers. Figuring out additional logic to avoid messing with workers not started by the "autoscaler" seems a bit of a stretch.
- starting workers is relatively easy, given that we can start processes whenever we want. Killing them is a tad more difficult, because they may be processing a task. Now, with the additional "limit" argument to the terminate() function, it should be easier.
Following the same logic as before, these pieces should fit together pretty nicely...
# assuming Process is multiprocessing's Process
from multiprocessing import Process

code_start = "from gluon import current;current._scheduler.loop()"
code_stop = "from gluon import current;current._scheduler.terminate(limit=1);db.commit()"
start_args = (appname, True, True, None, False, code_start)
stop_args = (appname, True, True, None, False, code_stop)

## logic to start: spawn a new worker in its own process
p = Process(target=run, args=start_args)
p.start()

## logic to terminate
run(*stop_args)  # runs in-process, as we don't need an external process
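For the "--autoscale min,max" idea, here is a rough sketch of how the bounds could be combined with the True/False answer from the user's function. Both helpers (parse_autoscale and workers_delta) are hypothetical names, and the "--autoscale" option itself is only the proposal discussed above, not something that exists today.

```python
def parse_autoscale(value):
    """Parse a hypothetical "--autoscale min,max" value such as "1,6"."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("expected 'min,max', got %r" % value)
    low, high = int(parts[0]), int(parts[1])
    if low < 0 or high < low:
        raise ValueError("need 0 <= min <= max, got %r" % value)
    return low, high


def workers_delta(running, wants_more, low, high):
    """Return +1 to start a worker, -1 to stop one, 0 to do nothing.

    running    -- number of workers currently alive
    wants_more -- the True/False answer of the user-defined criteria
    """
    if running < low:
        return 1   # "monitoring" role: keep at least `low` workers alive
    if running > high:
        return -1  # never stay above the allowed maximum
    if wants_more and running < high:
        return 1
    if not wants_more and running > low:
        return -1
    return 0
```

A +1 would then map to the Process(target=run, args=start_args) branch and a -1 to run(*stop_args), one worker per polling round, so the pool never leaves the min/max range.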