how to correctly use the scheduler (if at all)


noam cohen

Mar 2, 2016, 10:21:46 AM
to web2py-users
Hi,
I have the following scenario:
A scheduled task runs every T seconds (e.g. 60) and executes a function F().
In F() I scan some db tables and, depending on the business logic, may need to send several email messages.

The book (http://www.web2py.com/books/default/chapter/29/08/emails-and-sms#Sending-messages-using-a-background-task) uses an ad-hoc queue rather than the Scheduler. I want to use the Scheduler instead.
1. How can the worker process access the mailer object? I don't think it can be passed through queue_task().

2. "At the time of writing web2py does not support attachments and encrypted emails on Google App Engine. Notice cron and scheduler do not work on GAE." Is this still true? If yes, I should keep away from the scheduler.

3. Scheduler workers doing work in parallel: "As noted in Chapter 4, this type of background process should not be executed via cron (except perhaps for cron @reboot) because you need to be sure that no more than one instance is running at the same time."
-- so if I need to send 5 emails, do they have to be serialized? Why?

4. Why are old tasks kept in the scheduler_run table? Are they purged automatically, or do I have to delete them manually?

* I DID read the other posts on this topic before writing my own...

(running v 2.12.3)

Thanks!
Noam C.


Niphlod

Mar 2, 2016, 3:56:09 PM
to web2py-users
Let's go in order:

1. The scheduler environment is the same as your app's. As long as the mailer is defined and working for the app, you can use it in a queued task.
2. There's no way to keep an external process alive on GAE, and the scheduler is a totally separate process from the web-related one, so the scheduler can't be used on GAE. That is not a web2py limitation but a GAE one; GAE does provide an "offloading tasks" pattern with its "Task Queue" solution.
GAE attachments and encryption: some Python libraries aren't available at all on GAE. Again, not a web2py limitation but a GAE one.
3. The whole point is that if you code a long-running task and it doesn't complete, with cron you may end up with two processes sending the same mails (as the snippet shown commit()s at the end of the loop). The scheduler is a lot safer in that regard: every task can only be executed by a single worker at any given time. Concurrency with the scheduler is achieved by starting more than one worker process and queuing more than one task (e.g. 5 workers with 10 similar queued tasks will run in parallel, with each worker getting a slot of 2 tasks).
4. They are kept for statistical purposes. If you don't need the scheduler_run records, you can either:
- code a task that doesn't return any data (the scheduler_run entry won't be kept, except for failing tasks)
- pass the discard_results=True argument to the Scheduler
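To make point 3 more concrete, here is a simplified, self-contained model of why one queued task is never run by two workers at once. It is not web2py's actual implementation, only the general "atomic claim" idea, assuming tasks live in a table with a status column and a worker claims a task with a conditional UPDATE. The table layout and the helpers make_queue(), claim_next() and run_round_robin() are hypothetical.

```python
import sqlite3
from collections import Counter


def make_queue(n_tasks):
    """Create an in-memory task table with n_tasks QUEUED rows (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT NOT NULL, worker TEXT)"
    )
    db.executemany("INSERT INTO task (status) VALUES (?)", [("QUEUED",)] * n_tasks)
    db.commit()
    return db


def claim_next(db, worker):
    """Claim the oldest QUEUED task for `worker`; return its id, or None if empty.

    The UPDATE only matches while the row is still QUEUED, so if two workers
    picked the same candidate, only one of them would see rowcount == 1.
    """
    while True:
        row = db.execute(
            "SELECT id FROM task WHERE status = 'QUEUED' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = db.execute(
            "UPDATE task SET status = 'RUNNING', worker = ? "
            "WHERE id = ? AND status = 'QUEUED'",
            (worker, row[0]),
        )
        db.commit()
        if cur.rowcount == 1:
            return row[0]
        # Another worker won the race for this row: try the next candidate.


def run_round_robin(n_workers, n_tasks):
    """Let workers take turns claiming tasks until the queue is drained."""
    db = make_queue(n_tasks)
    workers = ["worker%d" % i for i in range(n_workers)]
    claimed = []  # (worker, task_id) pairs in claim order
    progress = True
    while progress:
        progress = False
        for w in workers:
            task_id = claim_next(db, w)
            if task_id is not None:
                claimed.append((w, task_id))
                progress = True
    return claimed


claims = run_round_robin(n_workers=5, n_tasks=10)
per_worker = Counter(w for w, _ in claims)
print(sorted(per_worker.values()))  # [2, 2, 2, 2, 2]
```

The `AND status = 'QUEUED'` condition is what matters: a worker that loses the race updates zero rows and moves on, so the same mail is not sent twice, whereas two cron copies of a plain loop would both happily read the same unsent rows.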

noam cohen

Mar 6, 2016, 3:06:05 PM
to web2py-users
Here is a summary of what I came up with, in case anyone has a similar situation.
 
modules/long_task.py:
import logging

def nwm_send_email(mailer, *args, **kwargs):
    """Send an email using the supplied dict args.
    This function is called by the scheduler (through a springboard in the models)."""
    # The springboard passes its positional-args tuple and its kwargs dict as
    # two positional arguments, so args == ((), {...}): args[1] is the pvars
    # dict, and kwargs here is always empty.
    params = args[1]
    email_addr = params['email_addr']
    s = params['subject']
    m = params['message']

    if mailer is None:
        raise ValueError("mailer must not be None")
    logging.info("sending mail...")
    mailer.send(to=[email_addr],
                subject=s,
                message=m)
    logging.info("finished sending mail...")
    return 200  # or just return True


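models/schedule.py:
from long_task import nwm_send_email

def nwm_task_send_email(*pargs, **kwargs):
    """Springboard function: `mail` is defined in the models, so it is
    available here and can be handed over to the module function."""
    # queue_task(pvars=...) delivers the dict as keyword arguments, so it
    # arrives in **kwargs; *pargs stays empty unless pargs were queued.
    return nwm_send_email(mail, pargs, kwargs)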
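The "first element is (), the second is the dict" puzzle comes from how arguments are forwarded: the worker calls the task roughly as func(*pargs, **pvars), so the pvars dict lands in the springboard's **kwargs, and the springboard then passes the empty tuple and the dict to nwm_send_email() as two positional arguments. A possible flatter shape is sketched below, where the task takes named parameters directly. StubMailer stands in for the app's `mail` object and run_like_scheduler() is a hypothetical helper imitating the worker call; both are assumptions for illustration only.

```python
import logging


class StubMailer(object):
    """Hypothetical stand-in for the app's `mail` object (sketch only)."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject, message):
        self.sent.append((tuple(to), subject, message))
        return True  # web2py's Mail.send() also returns True when the mail is accepted


def nwm_send_email(mailer, email_addr, subject, message):
    """Task body with explicit parameters instead of unpacking args[1]."""
    if mailer is None:
        raise ValueError("mailer must not be None")
    logging.info("sending mail to %s", email_addr)
    if not mailer.send(to=[email_addr], subject=subject, message=message):
        # Raising lets the scheduler record the task as failed.
        raise RuntimeError("mail was not accepted for %s" % email_addr)
    return True


mail = StubMailer()  # in a web2py app this would be the `mail` defined in models


def nwm_task_send_email(**kwargs):
    """Springboard: forward the queued pvars as keyword arguments."""
    return nwm_send_email(mail, **kwargs)


def run_like_scheduler(func, pargs=(), pvars=None):
    """Hypothetical helper: call a task the way a worker would, func(*pargs, **pvars)."""
    return func(*pargs, **(pvars or {}))


result = run_like_scheduler(
    nwm_task_send_email,
    pvars={"email_addr": "someone@example.com",
           "subject": "a test message",
           "message": "this is the body !"},
)
```

With this shape a missing key in pvars shows up as a TypeError naming the missing parameter in the task's traceback, instead of a KeyError inside the function.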



modules/reporter.py (this one is called by the periodic scheduled task):
import logging

def longop(scheduler):
    email_vars = {'email_addr': 'n...@gmail.com', 'subject': 'a test message', 'message': 'this is the body !'}
    rtn = scheduler.queue_task('nwm_task_send_email', task_name="long_email",
                               immediate=True, pvars=email_vars)
    if rtn['id'] is None:
        logging.error('error inserting task to DB: ' + str(rtn))
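For the original scenario, where F() scans some tables every T seconds and may need to send several emails, one option is to queue one task per message and mark the source rows as queued, so the next periodic run does not queue them again. The sketch below is hypothetical and self-contained: the `notification` table with its `queued` flag stands in for the app's real tables, and FakeScheduler only records calls and mimics the `id`/`errors` fields that queue_task() returns.

```python
import logging
import sqlite3


class FakeScheduler(object):
    """Hypothetical stand-in for the web2py Scheduler: records queue_task calls."""

    def __init__(self):
        self.queued = []

    def queue_task(self, function, task_name=None, immediate=False, pvars=None):
        self.queued.append((function, task_name, immediate, dict(pvars or {})))
        # Mimic the returned record: an id on success, errors otherwise.
        return {"id": len(self.queued), "errors": {}}


def longop(db, scheduler):
    """Queue one email task per pending row; return how many pending rows were found."""
    rows = db.execute(
        "SELECT id, email_addr, subject, message FROM notification "
        "WHERE queued = 0 ORDER BY id"
    ).fetchall()
    for row_id, addr, subject, message in rows:
        rtn = scheduler.queue_task(
            "nwm_task_send_email",
            task_name="email_%d" % row_id,
            immediate=True,
            pvars={"email_addr": addr, "subject": subject, "message": message},
        )
        if rtn["id"] is None:
            logging.error("error inserting task to DB: %s", rtn["errors"])
            continue  # leave the row unmarked so the next run retries it
        db.execute("UPDATE notification SET queued = 1 WHERE id = ?", (row_id,))
    db.commit()
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE notification (id INTEGER PRIMARY KEY, email_addr TEXT, "
    "subject TEXT, message TEXT, queued INTEGER NOT NULL DEFAULT 0)"
)
db.executemany(
    "INSERT INTO notification (email_addr, subject, message) VALUES (?, ?, ?)",
    [("a@example.com", "hi", "body a"), ("b@example.com", "hi", "body b")],
)
db.commit()

sched = FakeScheduler()
first = longop(db, sched)   # queues both pending rows
second = longop(db, sched)  # nothing left to queue on the next run
```

Because each message becomes its own task, several workers can send them in parallel, while the `queued` flag keeps the periodic F() from queuing the same message twice.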



(thanks, Niphlod!)


Noam C.

