Spawning a thread or a process for a long-running task [urgent!]


Yassen D.

Dec 2, 2013, 1:39:18 AM
to web...@googlegroups.com
Hello guys,

I am trying to help with a web2py application (web2py v. 2.2.1, ancient, yes) that needs an urgent patch as follows:

There is a controller function that fetches all Facebook friends of the user and stores them in the db (MySQL). Currently this is done directly in that function, during the request-response cycle, so the browser waits for the contacts import to finish and only then gets the response. For people with many friends (thousands) this takes minutes, so uWSGI kills the Python process.

I tried to resolve this in two different ways; neither has worked so far:

(1) The controller function spawns a separate thread for the import, lets it run, and returns the response. This kind of worked, but the import pauses strangely every couple of seconds (for a random period, typically 2-6 seconds, sometimes longer). I don't know why that happens, but it makes the whole process unbearably slow.

(2) I tried to spawn a separate process using multiprocessing, but import multiprocessing fails.

(3) Using the scheduler is another option, but I hesitate to go for it because of the ancient version of web2py.

(4) Another approach that comes to my mind is to somehow send the response but keep the thread of that response and do the job in that thread before letting it go.

What would you recommend? This is an urgent issue, so any prompt help is much appreciated! Thanks!

Yassen

Massimo Di Pierro

Dec 2, 2013, 2:41:40 AM
to web...@googlegroups.com
The first recommendation would be to upgrade to 2.8.2 (nothing should break) and use the scheduler.

Do not use threads because they may still be killed by the web server.
You can run a simple background process that every minute checks if there is something to do. Then use an auxiliary table to queue the fb credentials of those users that need to be processed.
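
A minimal sketch of the auxiliary-queue idea, using the standard-library sqlite3 module as a stand-in for the app's MySQL database and web2py's DAL. The table name import_queue, its columns, and the helper names are hypothetical and not taken from the application; the import function itself is passed in by the caller.

```python
import sqlite3


def make_queue(conn):
    # Hypothetical auxiliary table: one row per user whose friends still need importing.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "user_id INTEGER NOT NULL, "
        "fb_token TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'QUEUED')"
    )


def enqueue(conn, user_id, fb_token):
    # Called from the controller: a cheap insert, so the request can return at once.
    conn.execute(
        "INSERT INTO import_queue (user_id, fb_token) VALUES (?, ?)",
        (user_id, fb_token),
    )
    conn.commit()


def process_pending(conn, import_friends):
    # One pass of the background process: handle every queued row, oldest first.
    rows = conn.execute(
        "SELECT id, user_id, fb_token FROM import_queue "
        "WHERE status = 'QUEUED' ORDER BY id"
    ).fetchall()
    for row_id, user_id, fb_token in rows:
        import_friends(user_id, fb_token)
        conn.execute(
            "UPDATE import_queue SET status = 'DONE' WHERE id = ?", (row_id,)
        )
        conn.commit()
    return len(rows)
```

The background process would then call process_pending in a loop with a sleep between passes (60 seconds for the "every minute" check described above), while the controller only calls enqueue.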

Yassen D.

Dec 2, 2013, 6:51:26 AM
to web...@googlegroups.com
Massimo, thanks a lot for your help!

I will upgrade right away then. (I have very limited time, so I was afraid to break things.)

My comments:

(a) We need the import to start right away on a user UI action (a link click). A couple of seconds later is okay, but half a minute later is not.

(b) We may have several users doing that at the same time, thus n such tasks need to be able to run in parallel.

Will the scheduler allow me to implement an acceptable solution?

Thanks!
YD

Massimo Di Pierro

Dec 2, 2013, 11:20:54 AM
to web...@googlegroups.com
The scheduler should do what you need. You can start many workers to manage the load.

Yassen D.

Dec 4, 2013, 2:08:07 AM
to web...@googlegroups.com

On Monday, December 2, 2013 6:20:54 PM UTC+2, Massimo Di Pierro wrote:
The scheduler should do what you need. You can start many workers to manage the load


Thanks, Massimo! I upgraded to 2.8.2 with minor issues (MySQL date fields seem to have been mapped to strings before and are now mapped to datetime objects).

I am now experimenting with the scheduler, but the lowest interval I get between task re-runs is about 15 seconds, which is not ideal for this use case. Is it possible to lower it even further, to about 5 seconds?

Thanks again!
YD

Leonel Câmara

Dec 4, 2013, 3:28:43 PM
to web...@googlegroups.com
Instead of rerunning the task with period, schedule new tasks every time you need to go get the user's friends and have many scheduler workers if there's a chance you will have to do many of these in parallel. You should be able to lower the interval as much as you need, you could even reduce the scheduler heartbeat from the default 3 seconds.

Yassen D.

Dec 5, 2013, 12:05:28 AM
to web...@googlegroups.com

Leonel, thanks so much for the advice!


On Wednesday, December 4, 2013 10:28:43 PM UTC+2, Leonel Câmara wrote:
Instead of rerunning the task with period, schedule new tasks every time you need to go get the user's friends and have many scheduler workers if there's a chance you will have to do many of these in parallel.

Of course, this seems great for my use case!
 
You should be able to lower the interval as much as you need, you could even reduce the scheduler heartbeat from the default 3 seconds.

Can you elaborate here? Is this a config setting somewhere? (Forgive my ignorance.)
Y.

Yassen D.

Dec 5, 2013, 1:40:28 AM
to web...@googlegroups.com


On Wednesday, December 4, 2013 10:28:43 PM UTC+2, Leonel Câmara wrote:
Instead of rerunning the task with period, schedule new tasks every time you need to go get the user's friends

I guess I have to create a record in the scheduler_task table, with some reasonable values, and then the first available worker will pick that up, correct?

Yassen D.

Dec 5, 2013, 2:33:15 AM
to web...@googlegroups.com

What I do currently: create a scheduler in a dedicated model:

from gluon.scheduler import Scheduler
from jobs import testfunc
scheduler = Scheduler(db, dict(testfunc_task=testfunc))


Then in a controller, I do:

    task_kwargs = { 'immediate': True, 'task_name': 'ImportContacts-' + str(time.time())[:-4], }
    from jobs import testfunc
    scheduler.queue_task(testfunc, pargs=[request.vars.sna], kwargs=task_kwargs)

and I get an exception saying that table socialjack.scheduler_task does not exist (traceback below). But the table IS there, I can see it in the db admin interface and on the MySQL console ... Please help! Thanks!

Traceback (most recent call last):
File "/home/www-data/web2py/gluon/restricted.py", line 217, in restricted
exec ccode in environment
File "/home/www-data/web2py-2.8.2/applications/socialjack/controllers/contacts.py", line 236, in <module>
File "/home/www-data/web2py/gluon/globals.py", line 372, in <lambda>
self._caller = lambda f: f()
File "/home/www-data/web2py/gluon/tools.py", line 3239, in f
return action(*a, **b)
File "/home/www-data/web2py-2.8.2/applications/socialjack/controllers/contacts.py", line 54, in importcontacts
scheduler.queue_task(testfunc, pargs=[request.vars.sna], kwargs=task_kwargs)
File "/home/www-data/web2py/gluon/scheduler.py", line 983, in queue_task
**kwargs)
File "/home/www-data/web2py/gluon/dal.py", line 9114, in validate_and_insert
value,error = self[key].validate(value)
File "/home/www-data/web2py/gluon/dal.py", line 10036, in validate
(value, error) = validator(value)
File "/home/www-data/web2py/gluon/validators.py", line 668, in __call__
row = subset.select(table._id, field, limitby=(0, 1), orderby_on_limitby=False).first()
File "/home/www-data/web2py/gluon/dal.py", line 10450, in select
return adapter.select(self.query,fields,attributes)
File "/home/www-data/web2py/gluon/dal.py", line 1861, in select
return self._select_aux(sql,fields,attributes)
File "/home/www-data/web2py/gluon/dal.py", line 1826, in _select_aux
self.execute(sql)
File "/home/www-data/web2py/gluon/dal.py", line 1948, in execute
return self.log_execute(*a, **b)
File "/home/www-data/web2py/gluon/dal.py", line 1942, in log_execute
ret = self.cursor.execute(command, *a[1:], **b)
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
ProgrammingError: (1146, "Table 'socialjack.scheduler_task' doesn't exist")

Yassen D.

Dec 5, 2013, 2:41:19 AM
to web...@googlegroups.com


On Thursday, December 5, 2013 9:33:15 AM UTC+2, Yassen D. wrote:
... I get an exception saying that table socialjack.scheduler_task does not exist (traceback below). But the table IS there, I can see it in the db admin interface and on the MySQL console ...

Correction: the table is shown in the database admin interface but is NOT present on the SQL server, unfortunately.
What should I do to create it?

Niphlod

Dec 5, 2013, 4:53:43 PM
to web...@googlegroups.com
Check for
db = DAL(..., migrate_enabled=False)

Either this prevented the creation of the table, or there is some other migrate-related glitch.
To create scheduler's tables, you just need to do (as you did)

Scheduler(db, ...)

NB: (just a naming-convention advice) use
from gluon.scheduler import Scheduler
mysched = Scheduler(db, ....)
instead of
scheduler = Scheduler(db, ...)

to clearly state the difference between the module and the object that is used by your app ^_^

If your tables are in the admin interface but not on the backend, check that your migrations are enabled and delete any pre-existing *scheduler*.table files in the databases/ folder. This will trigger the creation both of the new .table files and the tables on the backend.

BTW: for tasks that need **immediate** care from the scheduler, instead of lowering the heartbeat (and thus having all workers "hammer" the database asking for new tasks), just leave the heartbeat as it is and use
mysched.queue_task(....., immediate=True)
for any user needing that task to run.
In a few seconds the first available worker will pick up the task.
Plan the number of workers according to how many concurrent users in your app will need a task processed: a worker can only process one task at a time.

for @all: if you need 20 or 30 concurrent tasks (and thus need 20 or 30 workers), test the scheduler carefully: you may find a dedicated database more performant, and you may also need to put workers in a sleeping state (DISABLED) to "alleviate" the db pressure when they are not needed. Then you'll need to set them to ACTIVE (or just delete all the records in the scheduler_worker table) before queueing a new task... they'll resume their (working) state in a heartbeat.
If you need more than 30 workers, use a different task processor: unfortunately the polling nature of the scheduler makes it a poor fit for such high demands.
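
A rough back-of-the-envelope sketch of why more workers or a shorter heartbeat increase database pressure. It assumes, as a simplification, that each worker touches the database about once per heartbeat; the real scheduler may issue more than one query per cycle, so treat the numbers as a lower bound. The function name is hypothetical.

```python
def polls_per_second(workers, heartbeat_seconds):
    # Simplifying assumption: each worker touches the database once per heartbeat.
    return workers / float(heartbeat_seconds)


for workers in (5, 30):
    for heartbeat in (3, 1):
        print("%d workers, %ds heartbeat: ~%.1f polls/s"
              % (workers, heartbeat, polls_per_second(workers, heartbeat)))
```

Under this assumption, cutting the heartbeat from 3 seconds to 1 second triples the polling rate for every worker, whereas immediate=True leaves the steady-state rate unchanged.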

Yassen Damyanov

Dec 6, 2013, 12:08:05 AM
to web...@googlegroups.com
Niphlod, HUGE THANKS! I'll check that right now and post back.

Yassen D.

Dec 6, 2013, 1:15:45 AM
to web...@googlegroups.com

Niphlod, yes, that was it!!

It was not in db.py, but there was a 0.py with 'settings.migrate = False' (this is not my app, I was just asked to hack there to fix an issue).

An easy workaround might be to also enable fake migrations (fake_migrate_all=True in the DAL() call) and then copy and execute the proper statements from databases/sql.log. That's what I did, because just switching settings.migrate to True caused errors about tables already existing.

I spent half a day digging into this. An idea: show the migration status of the app somewhere in the header of the database admin page.

Huge thanks again for your great help!
Yassen