Using scheduler with facebook api calls


Andre

Jul 31, 2016, 12:42:13 PM
to web2py-users
Hi,

I've created a website that uses the Facebook API, and I'd like to move my Facebook requests out of the web server's request-handling loop. I've played around with the scheduler and have a working prototype in place; however, I'm not sure how many workers I should spawn for the scheduler. Between waiting for a response from Facebook and processing the results, these "processes" can take anywhere from 30 seconds to upwards of 15 minutes. Has anyone else run into a similar problem? Would the built-in scheduler be appropriate to use? I'm thinking of just spawning a bunch of workers (25-50 or so?) and using trial and error to home in on the right number.

-Andre

Niphlod

Aug 3, 2016, 4:29:20 PM
to web2py-users
15 minutes is a LOT to wait for a reply from an external source: are you sure you can't reduce that timespan (maybe using "cascading" steps)?
Assume this when working with ANY task-queue solution: if your 15-minute task fails at the 14th minute, you've wasted 14 minutes for nothing. This holds especially true for anything calling external resources (which may very well be unavailable for some time, e.g. network hiccups). If instead you can break that up into 5 steps of 3 minutes each, you "waste" 3 minutes at most. When you face a possibly-parallelizable scenario (which is quite often the case in task-queue solutions), you get the additional benefit of being able to balance each and every step among the available "processors".
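The "cascading steps" idea can be sketched without any web2py specifics: split the long job into short, resumable steps and persist the accumulated state after each one, so a failure costs at most one step's worth of time. A minimal sketch (the step functions `fetch_pages`, `parse_results`, and `store_results` are hypothetical placeholders, not part of any real API):

```python
# Instead of one long task, run a chain of short steps and keep the
# accumulated state after each one. If a step fails, only that step's
# time is lost and the chain can resume from it.

def fetch_pages(state):
    state["pages"] = ["p1", "p2"]            # e.g. call the external API here
    return state

def parse_results(state):
    state["parsed"] = [p.upper() for p in state["pages"]]
    return state

def store_results(state):
    state["stored"] = True                   # e.g. write results to the DB here
    return state

STEPS = [fetch_pages, parse_results, store_results]

def run_chain(state, start=0):
    """Run steps from `start`; return (index of first unfinished step, state)."""
    for i in range(start, len(STEPS)):
        try:
            state = STEPS[i](state)
        except Exception:
            # only this step's time is wasted; resume later from step i
            return i, state
    return len(STEPS), state

done, state = run_chain({})
```

With the web2py scheduler, each step could instead be queued as its own task (e.g. via `scheduler.queue_task`), passing the accumulated state along through `pvars`, which also lets the workers balance the individual steps among themselves.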

That being said, a few points on "sizing" the scheduler processes: the "standard" scheduler can't really support more than 20-30 workers (no SQLite, please! :P). Yep, with 50 they'll all be running, but they won't churn through more tasks than 20 would, and they'll hammer your backend pretty heavily. The "redis-backed" one always works better, but even with it you won't get past 50. Now that you have the MAX limit, let's talk about what really matters, that is, how many concurrent tasks you'll need.
A single worker can process one task at a time. But it will happily process 5 tasks per second (given that the task ACTUALLY processes something and doesn't wait around): this translates to a single worker processing 300 tasks per minute, if they are already queued and fast.
The "sweet spot" you want to reach (assuming everything you queue needs to be processed as soon as you queue it) is where you have at least one worker available to do the actual job (i.e. you have one "slot" free at the moment you queue tasks).
Let's say you are at the lower end of the "sweet spot", and assume every task takes 5 minutes, with only one worker available (the others are already churning on a task)... you queue a task, and the result will be available in 5 minutes. During that period, any other queued task won't be processed, and if you queue 2 tasks at the same time, the result of the second queued task will be available in 10 minutes, unless some worker frees itself because another task has completed.
With 4 available workers, you can basically queue 4 tasks at the same time and get each result back within 5 minutes. The fifth queued task's result will be available in 10 minutes (again, unless some other worker frees itself).
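The arithmetic above fits in one line: with w free workers and tasks that each take d minutes, the k-th task you queue finishes after ceil(k / w) * d minutes (assuming, as above, that no busy worker frees up in the meantime). A quick sketch:

```python
import math

def completion_minutes(task_index, free_workers, task_minutes):
    """Minutes until the result of the task_index-th queued task (1-based)
    is ready, given free_workers idle workers and a fixed per-task duration,
    assuming no busy worker frees up in the meantime."""
    return math.ceil(task_index / free_workers) * task_minutes

# One free worker: the second queued task waits for the first.
completion_minutes(2, 1, 5)   # 10 minutes
# Four free workers: the fifth task is the first one that has to wait.
completion_minutes(5, 4, 5)   # 10 minutes
```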

Going up the ladder one more step, from personal experience... I feel inclined to say that if your users are willing to wait anywhere from 30 seconds to 15 minutes, I'd hardly spin up lots of workers and leave them with no work to do: IMHO, anything that lands on the upper end of 2 minutes doesn't need to be reported to the user within 2 minutes, for the simple fact that they won't be around to read it 2 minutes later (they probably went somewhere else in the meantime, and they'll be back maybe in 10 minutes, maybe the next day). A simple email at the conclusion of the whole process, saying "hey, the thing you wanted is ready", seals the deal.
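The "mail at the end" pattern is just a final step appended to the chain; a tiny sketch that only composes the notification body (the task name is made up, and actually sending the message, e.g. with web2py's built-in `mail.send`, is left out):

```python
def completion_message(task_name, minutes_taken):
    """Build the notification body sent when a long-running task finishes."""
    return ("Hey, the thing you wanted is ready!\n"
            "Task '%s' completed after %d minute(s)." % (task_name, minutes_taken))

completion_message("facebook-import", 12)
```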

tl;dr: staying on the "lower" side won't consume unneeded resources, and EVEN if a task took only 5 minutes to process for some user AND your server spat out the result after 10 because it was busy processing some other user's tasks, nobody would really notice.

Andre

Aug 4, 2016, 4:17:04 PM
to web2py-users
Thank you for the thoughtful response - some great points I need to mull over some more.

I think there needs to be a Niphlod tip jar.

Dave S

Aug 4, 2016, 4:26:19 PM
to web2py-users


On Thursday, August 4, 2016 at 1:17:04 PM UTC-7, Andre wrote:
Thank you for the thoughtful response - some great points I need to mull over some more.

I think there needs to be a Niphlod tip jar.


I think the way we can best make him happy is by adding test coverage. (I have that on my list, but I'm not yet over-achieving.)

/dps
