Has anyone integrated user-driven multiprocessing?


Phillip

unread,
Sep 15, 2015, 3:12:31 PM9/15/15
to web2py-users
It is my understanding that, despite being able to queue tasks from the controller, workers can only be started from the command line (perhaps a security issue?).

I have a long-running Python script that needs to be executed by the user for many files. Multiprocessing is essential.

Based on a post regarding the multiprocessing module, it seems that a user's reloading/resubmitting may be the main problem.

Please correct any misunderstanding or misdirection, and provide any hints if possible. 

Thanks,

Anthony

unread,
Sep 15, 2015, 3:21:59 PM9/15/15
to web2py-users
Why can't you start one or more workers and then have the user simply schedule a task in the queue?
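Something along these lines would do it (a minimal sketch; it assumes a Scheduler instance named scheduler and a task function process_file are defined in a model file, and the names are illustrative):

    # controllers/default.py
    def run_job():
        # the user triggers this action (e.g. via Ajax); the heavy work is not
        # done here, it is only queued for a background worker to pick up
        task = scheduler.queue_task(process_file,
                                    pvars=dict(file_id=request.vars.file_id))
        return response.json(dict(task_id=task.id))

The action returns immediately with the task id, and whichever worker is free next runs process_file.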

Phillip

unread,
Sep 16, 2015, 6:11:08 PM9/16/15
to web...@googlegroups.com
Are you saying that if I start the workers before any tasks are queued (by an arbitrary number of users), the workers will sit idle waiting for queued tasks?

If the answer is no:
Here is the setup: there is a grid of files from which a user can generate 'offspring' files in all possible combinations. Previous inquiry has led me to think the scheduler could be used for multiprocessing here (instead of the ostensibly problematic multiprocessing module).
The queued tasks are derived from the Python script that processes the user-selected files; it is called in the controller when the file IDs are passed via Ajax.
Otherwise,
Is there a brick wall here? If, for instance, the app were on Google App Engine, could a large number of idle workers simply be started to handle spikes in user requests?

Thank you for the response

Anthony

unread,
Sep 17, 2015, 12:45:16 PM9/17/15
to web2py-users
Niphlod can probably provide better advice regarding managing scheduler workers and system resources, but yes, you should be able to have a few workers running, ready to handle incoming tasks.

Niphlod

unread,
Sep 17, 2015, 3:19:55 PM9/17/15
to web2py-users
The basic concept of the scheduler is to have one or more processes, NOT managed by the webserver, that are ready to do some work when told to do so.
Why? 99% of the cases fall into:
- long-running computations that would otherwise make the webserver drop the process on timeout
- async processing of a task to make the webapp feel "snappier" while providing the same functionality

Starting an external process from a "web-served" process is a contradiction in terms: we don't want zombie processes, and as soon as the webserver kills the "originator", the worker would be terminated too.
Another good reason not to provide that functionality is that one may want to run the task on a totally different server than the "web-serving" one (decoupling).
This is on top of the fact that managing long-running processes is best done by software specifically engineered to do so, which is highly platform-dependent.
Given that starting those processes (and restarting them if they die, etc.) is fairly easy to do with what the underlying OS provides, and that most of the options are covered in the book, it shouldn't be hard to set up a scheduler.

That being said, the "scheduler architecture" relies on a never-ending process that loops infinitely, searching for tasks to be executed and sleeping when no tasks are ready to be processed. Every process is in charge of executing one task at a time, and the same task can't be executed at the same time by another worker.
In your case, that translates to starting one - or a few - scheduler processes and queuing tasks from your application. As soon as a worker is free, the tasks will get picked up and executed.
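To make that concrete, here is a sketch of the shared model file. The worker process is started from the OS (e.g. python web2py.py -K myapp, one invocation per worker) and loads the same models as the web app, so both sides see the same Scheduler instance and task function (process_file and the table name below are illustrative):

    # models/scheduler.py
    from gluon.scheduler import Scheduler

    def process_file(file_id):
        # the long-running, CPU-heavy work for a single file goes here
        row = db.my_files(file_id)   # hypothetical table holding the source files
        # ... generate the 'offspring' file from row ...
        return 'done'

    scheduler = Scheduler(db, heartbeat=3)   # heartbeat: seconds between polls for new tasks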


Phillip

unread,
Sep 18, 2015, 2:14:31 PM9/18/15
to web...@googlegroups.com

If I understand, using the scheduler in my case would only be a viable option for my own processing purposes, not for multiple users. If so, it appears that my only option would be to export a desktop version of the interface to be used for this processing. If I am off track here, please let me know. Otherwise, any other comments are welcome.

Niphlod

unread,
Sep 18, 2015, 3:01:30 PM9/18/15
to web2py-users
imho the scenario isn't clear. Why is having a slew of workers waiting for tasks enqueued by your app not viable for multiple users?

Phillip

unread,
Sep 18, 2015, 3:37:26 PM9/18/15
to web...@googlegroups.com
This was my basic interpretation of your post: scheduler processes shouldn't be managed by the webserver (shouldn't be controlled by user requests), which could basically create zombie processes and/or drop long-running processes.

If you see no reason the scheduler shouldn't work for this purpose (while preventing long-running processes from being dropped, which is what I thought the timeout feature might also handle):
 
Could there be scalability issues with too many users attempting to run too many processes (on GAE for instance)?

Niphlod

unread,
Sep 18, 2015, 3:45:56 PM9/18/15
to web2py-users
I still can't get the 

"""viable option for my own processing purposes, not multiple users"""

If they're viable for you, they're viable for multiple users. To a limit.

BTW: you can't run external processes on GAE; there's only the TaskQueue (with some limitations).

As for the number of concurrent scheduler processes, they're of course limited by the hardware you'll run them on.
If you're looking at more than 50 concurrent processes, I'd say you have to leave web2py's scheduler and resort to state-of-the-art async processing (e.g., Celery).
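For reference, the Celery equivalent is roughly this (just a sketch; it assumes a Redis broker on localhost, and the module and task names are illustrative):

    # tasks.py
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def process_file(file_id):
        # long-running per-file work goes here
        return file_id

Tasks are then queued with process_file.delay(file_id), and you scale by starting workers with something like celery -A tasks worker --concurrency=50.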




Dave S

unread,
Sep 18, 2015, 3:48:34 PM9/18/15
to web2py-users

Is there some confusion regarding "to have a process (or multiple ones) NOT managed by the webserver"? It should be pointed out that these processes are run on the server (because they need access to the resources) and are usually owned by the same user that owns the web2py process. There won't be access to session variables or a request object, but there are the pvars and pargs in the task DB, which the queuing controller can fill in from its session and request, right?

<URL:http://web2py.com/books/default/chapter/29/04/the-core#queue_task->
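For example, a sketch of filling pvars from the current request (the action and field names are illustrative, and it assumes Auth is enabled):

    def queue_selected():
        # the worker has no access to this request or session, so anything it
        # needs is captured now and passed through pvars
        task = scheduler.queue_task(
            process_file,
            pvars=dict(file_id=request.vars.file_id,
                       owner_id=auth.user_id),
            timeout=3600,   # seconds before the task is marked TIMEOUT
        )
        return response.json(dict(task_id=task.id))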

/dps

Anthony

unread,
Sep 18, 2015, 3:52:46 PM9/18/15
to web...@googlegroups.com
On Friday, September 18, 2015 at 3:37:26 PM UTC-4, Phillip wrote:
This was my basic interpretation of your post: scheduler processes shouldn't be managed by the webserver (shouldn't be controlled by user requests), which could basically create zombie processes and/or drop long-running processes.

This just means that the web server (or the app) should not start the workers. Instead, when you start the server, you also start a bunch of background worker processes. Then, as users make requests for the long-running jobs, those jobs are passed off to one of the available workers. The number of workers that can be running depends on the resources of the machine. Even if all the workers are busy at a given moment, though, as soon as one frees up, the next task will be pulled off the queue. So, as long as your app isn't receiving requests faster, over the long run, than the hardware can complete the jobs, all the jobs will eventually finish.

The scheduler was designed for exactly this kind of scenario.
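A sketch of the completion check the app could expose (polled from the browser; task_id is the id returned when the task was queued, and the action name is illustrative):

    def job_status():
        row = scheduler.task_status(int(request.vars.task_id), output=True)
        if row is None:
            raise HTTP(404)
        # row.scheduler_task.status is QUEUED/RUNNING/COMPLETED/FAILED/...,
        # row.result holds the task function's return value once it has run
        return response.json(dict(status=row.scheduler_task.status,
                                  result=row.result))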

Anthony