scheduler sincronization

Paolo valleri

unread,

Mar 25, 2013, 11:23:56 AM3/25/13

to web...@googlegroups.com

Dear all,

I started to use scheduler. I've created a simple starting example task: https://github.com/ilvalle/scapp/blob/master/models/scheduler.py

and with the great niphlod's plugin cs_monitor_plugin I created an initial repetitive task.

However, although I set repeats: 0 and period: 300, tasks are not repeated as expected exactly every 300s. The first five tasks were executed at:

2013-03-25 15:34:43

2013-03-25 15:40:48

2013-03-25 15:46:52

2013-03-25 15:52:08

2013-03-25 15:57:22

The time between two consecutive execution is not so straightforward. what am I missing? Is there a way to impose more accuracy ?

paolo

szimszon

unread,

Mar 25, 2013, 11:37:45 AM3/25/13

to web...@googlegroups.com

I think the period starts as the last execution ends.

Sometimes I miss the feature to start the task exactly at fixed interval rater than fixed interval starting after the last run ends but I can live without it. :) It's a great tool anyway :-D

paolo....@gmail.com

unread,

Mar 25, 2013, 11:42:04 AM3/25/13

to web...@googlegroups.com

Accordingly to the book http://web2py.com/books/default/chapter/29/04?search=scheduler#Scheduler-(experimental)

It doesn't start when the previous task ends but when the previous task starts instead.

Paolo

2013/3/25 szimszon <szim...@gmail.com>

--

---
You received this message because you are subscribed to a topic in the Google Groups "web2py-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/web2py/u_PgzKLuQmw/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

szimszon

unread,

Mar 25, 2013, 12:04:18 PM3/25/13

to web...@googlegroups.com

My bad. Never looked closely :(

Than my guess is that the scheduler is exec the task at some point after the next run time and the actual running time is a base of the next run time not the original (first) intended run time.

Niphlod

unread,

Mar 25, 2013, 12:57:35 PM3/25/13

to web...@googlegroups.com

The scheduler is not as "precise" as you would because there are some design considerations to think of....

The "uber-costraint" is that a worker can execute only one task at a time --> if the scheduler is busy with something else, queued tasks can be "delayed" down the line
If the worker is free, new tasks are checked every 5*heartbeats seconds, so (unless immediate=True) you can get a timeframe for execution that spans in the worst case scenario with the default values by 15 seconds

This means that in the worst case scenario you could have a repeating task with a 300 seconds period that are actually executed every 315 seconds.

To clear out doubts, for repetitive tasks, as documented in more detail in the w2p_scheduler_tests app, the start time of the next execution is calculated adding period seconds after the start time of the current execution.
Lockings on the db can add more "imprecision", cause by the default the scheduler is eager to store/update whatever it needs to, to ensure consinstency between the real executions and the data stored on the tables.

paolo....@gmail.com

unread,

Mar 25, 2013, 3:57:09 PM3/25/13

to web...@googlegroups.com

Hi, given the fact that I left untouched the heartbeats value and that my worker was performing only the test task, I got odd starting values:

t1: 2013-03-25 15:34:43

t2: 2013-03-25 15:40:48 ( t2 started after 6m + 5s = 365seconds )

t3: 2013-03-25 15:46:52 ( t3 started after 6m + 4s = 364seconds )

t4: 2013-03-25 15:52:08 ( t4 started after 5m +16s = 316seconds )

t5: 2013-03-25 15:57:22 ( t5 started after 5m +14s = 314seconds )

Given these starting values, it seems that only the last one is under 315. I think we should investigate more how scheduler runs repetitive tasks.

Paolo

2013/3/25 Niphlod <nip...@gmail.com>

--

Niphlod

unread,

Mar 25, 2013, 4:58:04 PM3/25/13

to web...@googlegroups.com

can't reproduce the issue, even with SQLite.
Using the demo1() function from the w2p_scheduler_test app, if I schedule it with period=300, repeats = 0, this is what I get back

r1: 2013-03-25 21:32:56 r2: 2013-03-25 21:38:12 -- 316s r3: 2013-03-25 21:43:28 -- 316s r4: 2013-03-25 21:48:31 -- 303s r5: 2013-03-25 21:53:36 -- 305s

paolo....@gmail.com

unread,

Mar 25, 2013, 5:07:45 PM3/25/13

to web...@googlegroups.com

My task is available here: https://github.com/ilvalle/scapp/blob/master/models/scheduler.py

As you can see the difference is related to the external request. Asap I will try your app too.

Paolo

2013/3/25 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 25, 2013, 5:30:32 PM3/25/13

to web...@googlegroups.com

I don't think that's the issue cause next_time_run gets calculated from the start_time of the previous task, i.e. if I put time.sleep(60) in demo1(), I still get the same execution times...

The only bit missing at this point is remembering that if you have a single worker it can't ASSIGN a new task while a task is RUNNING, so if, e.g., your task is still RUNNING while the scheduler reaches "the 5th round" it will skip the assignment and do it on the 10th, i.e. the first time it's not processing any task and 15 seconds are passed (still, worst case scenario)

paolo....@gmail.com

unread,

Mar 25, 2013, 5:46:12 PM3/25/13

to web...@googlegroups.com

I didn't get your point, with one repetitive task, should I start the scheduler with two or more workers? If so, I will try it.
Actually I have just seen the stop time, on average the task completes it cycle in just a few seconds (~1-2). Given that, is what you have suggested still valid?

Last but not least, demo1 has gone in timeout after one successful cycle, this is very odd, How I can debug the scheduler application and find its errors?

I am running scheduler as a linux service, as described here: http://web2py.com/books/default/chapter/29/13#Start-the-scheduler-as-a-Linux-service-%28upstart%29

Paolo

2013/3/25 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 25, 2013, 6:52:29 PM3/25/13

to web...@googlegroups.com

On Monday, March 25, 2013 10:46:12 PM UTC+1, Paolo valleri wrote:

I didn't get your point, with one repetitive task, should I start the scheduler with two or more workers? If so, I will try it.

The point is that the thread that manages some logic every heartbeat seconds is the one in charge of "waiting" 5 loops to trigger the additional logic to pick up new tasks (a repetitive task is just a new task to execute). If the process "doing the work" is busy processing the task and the underlying thread reaches the "let's assign tasks" loop, the logic will be skipped (it's unuseful to assign tasks if a worker is already processing them). So it can happen that even if the "assignment" time has come, if the worker is processing tasks it will skip the "assignment"

Actually I have just seen the stop time, on average the task completes it cycle in just a few seconds (~1-2). Given that, is what you have suggested still valid?

Nope. As I said it guaranteed that even in the case that the assignment loop falls into the timeframe of a RUNNING task, at the next round it will be picked up

Last but not least, demo1 has gone in timeout after one successful cycle, this is very odd, How I can debug the scheduler application and find its errors?
I am running scheduler as a linux service, as described here: http://web2py.com/books/default/chapter/29/13#Start-the-scheduler-as-a-Linux-service-%28upstart%29

SQLite locking is the most probable cause.
The fastest way is to see what's happening is starting the scheduler with debug logging ....
web2py.py -K appname -D 0

paolo....@gmail.com

unread,

Mar 26, 2013, 6:33:53 AM3/26/13

to web...@googlegroups.com

I executed again demo1, I run it several times, I got even in this case elapsed time between two consecutive executions around 360 and even more instead of 300. What can I do to understand what is not working correctly?

Moreover, I would suggest to add the timestamp to the scheduler debug log.

Paolo

2013/3/25 Niphlod <nip...@gmail.com>

--

Niphlod

unread,

Mar 26, 2013, 7:42:14 AM3/26/13

to web...@googlegroups.com

with the default logging.conf the timestamp is present as in all other web2py-related logging ....

PS: are you sure that the worker is not killed/restarted by any chance (see the worker_name in the scheduler_run table)

Paolo valleri

unread,

Mar 26, 2013, 9:21:21 AM3/26/13

to web...@googlegroups.com

When yesterday I saw demo1 in timeout with ps auxf I have seen that a new process was created. For this reason I started to debug scheduler and I asked how to log etc.

Moreover, I restarted the scheduler manually so I am not able to understand if the other different names are for an internal problem or something different.

Do you think that should be fixed by using a different db engine?

Paolo

Niphlod

unread,

Mar 26, 2013, 9:32:50 AM3/26/13

to web...@googlegroups.com

I find hard to believe that with a single worker, with that function that basically just prints something and an execution every 300 seconds the problem lies into a lock, unless the SQLite library available on your system is reallly old.

Paolo valleri

unread,

Mar 26, 2013, 9:37:52 AM3/26/13

to web...@googlegroups.com

>>> import sqlite3
>>> print sqlite3.version
2.6.0
>>> print sqlite3.sqlite_version
3.7.9

But, if the db lock is not the problem, the test application is very easy, where is it supposed to be the problem?

paolo....@gmail.com

unread,

Mar 26, 2013, 11:24:17 AM3/26/13

to web...@googlegroups.com

If can be useful, I attached part of the log file in which demo1 is executed.

First execution: 2013-03-26 15:52:31

second execution: 2013-03-26 15:58:55 (+384s)

Paolo

2013/3/26 Paolo valleri <paolo....@gmail.com>

paolo....@gmail.com

unread,

Mar 26, 2013, 11:26:11 AM3/26/13

to web...@googlegroups.com

the flle! sorry...

Paolo

2013/3/26 paolo....@gmail.com <paolo....@gmail.com>

demo1.log

Niphlod

unread,

Mar 26, 2013, 12:56:49 PM3/26/13

to web...@googlegroups.com

whoa. seems that something wrong is happening trying to assing new tasks....
normally, every
web2py.scheduler.mapserver#7791 - INFO - TICKER: I'm a ticker
should be followed closely by the lines
web2py.scheduler.mapserver#7791 - INFO - TICKER: workers are 1
web2py.scheduler.mapserver#7791 - INFO - TICKER: tasks are 0
While in your case for several times those lines are not present....
The fact is that the assignment is wrapped yet in a try except clause and every exception should be logged as well, but your log doesn't show anything of that.

I can add more debug lines to the scheduler but this didn't ever happen on all my platforms, so without reproducing it I'm a little bit unsure what the fix can be.

paolo....@gmail.com

unread,

Mar 26, 2013, 3:05:44 PM3/26/13

to web...@googlegroups.com

I can make few tests but only tomorrow, I will be out for the rest of the week.
If you send me a patch with the new log statement, I will come back with the result asap.

Paolo

2013/3/26 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 26, 2013, 4:32:01 PM3/26/13

to web...@googlegroups.com

here's the patch. I purposedly blocked the underlying db from another terminal to see what could be the issue, but I can't reproduce in other way what is happening on your system. Enough said, as soon as the db is unlocked, normal operations resume.
There are a few error() calls to the logger, now if something goes wrong it's reported accordingly.

PS: if you want to stick to SQLite, it's better to install the most updated sqlite adapter, on unix-like OSes is as simple as (giving standard build tools are available)

pip install http://pysqlite.googlecode.com/files/pysqlite-2.6.3.tar.gz --global-option build_static

and then activating WAL (reduces the chances of a locked db. Not lock free, but certainly helps out).
WAL can be activated once on every db with a simple
PRAGMA journal_mode=WAL
or, if you are on a recent web2py distribution

def activate_wal(db_instance):
db_instance.execute('PRAGMA journal_mode=WAL;')

db = DAL('sqlite://storage.sqlite', after_connection=activate_wal)

scheduler.py

paolo....@gmail.com

unread,

Mar 26, 2013, 6:02:22 PM3/26/13

to web...@googlegroups.com

I will try to update sqlite as you suggested asap.
I tried your scheduler but I cannot see any error. In the meanwhile I have seen that when the following logs are missing

"INFO - TICKER: workers are 1"

"INFO - TICKER: tasks are 0"

even the log: "DEBUG - Assigning tasks..." is missing.
could this mean that the function wrapped_assign_tasks is not called at all?

Paolo

2013/3/26 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 27, 2013, 5:03:26 AM3/27/13

to web...@googlegroups.com

well, something dark and obscure is happening ......
either
logger.error('Error cleaning up')
logger.error('Error retrieving status')
logger.error(' error popping tasks')
logger.debug('Assigning tasks...')
"must" happen

paolo....@gmail.com

unread,

Mar 27, 2013, 6:57:47 AM3/27/13

to web...@googlegroups.com

odd behavior, is possible that the function is not called at all?

I added a log line in pop_task(), in the else branch of the first test, to understand why fails the test: The statement is logger.info("pop task %s %s" % (self.is_a_ticker , self.do_assign_tasks))

The log says:

2013-03-27 11:38:01,276 - web2py.scheduler.mapserver#17922 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 11:38:01,277 - web2py.scheduler.mapserver#17922 - DEBUG - freeing workers that have not sent heartbeat

2013-03-27 11:38:01,278 - web2py.scheduler.mapserver#17922 - INFO - TICKER: I'm a ticker

2013-03-27 11:38:01,315 - web2py.scheduler.mapserver#17922 - DEBUG - looping...

2013-03-27 11:38:01,316 - web2py.scheduler.mapserver#17922 - INFO - pop task True False

2013-03-27 11:38:01,445 - web2py.scheduler.mapserver#17922 - INFO - nothing to do

2013-03-27 11:38:01,445 - web2py.scheduler.mapserver#17922 - DEBUG - sleeping...

2013-03-27 11:38:04,432 - web2py.scheduler.mapserver#17922 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 11:38:04,449 - web2py.scheduler.mapserver#17922 - DEBUG - looping...

2013-03-27 11:38:04,461 - web2py.scheduler.mapserver#17922 - INFO - pop task True False

2013-03-27 11:38:04,590 - web2py.scheduler.mapserver#17922 - INFO - nothing to do

2013-03-27 11:38:04,590 - web2py.scheduler.mapserver#17922 - DEBUG - sleeping...

2013-03-27 11:38:07,581 - web2py.scheduler.mapserver#17922 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 11:38:07,594 - web2py.scheduler.mapserver#17922 - DEBUG - looping...

2013-03-27 11:38:07,610 - web2py.scheduler.mapserver#17922 - INFO - pop task True False

2013-03-27 11:38:07,740 - web2py.scheduler.mapserver#17922 - INFO - nothing to do

2013-03-27 11:38:07,740 - web2py.scheduler.mapserver#17922 - DEBUG - sleeping...

2013-03-27 11:38:10,731 - web2py.scheduler.mapserver#17922 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 11:38:10,741 - web2py.scheduler.mapserver#17922 - DEBUG - looping...

2013-03-27 11:38:10,741 - web2py.scheduler.mapserver#17922 - INFO - pop task True False

2013-03-27 11:38:10,921 - web2py.scheduler.mapserver#17922 - INFO - nothing to do

2013-03-27 11:38:10,922 - web2py.scheduler.mapserver#17922 - DEBUG - sleeping...

2013-03-27 11:38:13,888 - web2py.scheduler.mapserver#17922 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 11:38:13,925 - web2py.scheduler.mapserver#17922 - DEBUG - looping...

2013-03-27 11:38:13,925 - web2py.scheduler.mapserver#17922 - INFO - pop task True False

2013-03-27 11:38:14,055 - web2py.scheduler.mapserver#17922 - INFO - nothing to do

2013-03-27 11:38:14,055 - web2py.scheduler.mapserver#17922 - DEBUG - sleeping...

2013-03-27 11:38:17,047 - web2py.scheduler.mapserver#17922 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 11:38:17,048 - web2py.scheduler.mapserver#17922 - DEBUG - freeing workers that have not sent heartbeat

so, pop_task is not called because do_assign_task is False, hope it helps

Paolo

2013/3/27 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 27, 2013, 7:23:41 AM3/27/13

to web...@googlegroups.com

yeps.
this is expected, but every ~15 seconds that turns to "True" (meaning more or less "I'm the Ticker, I'm free, I was free in the last 15 seconds, let's see if there are any submitted (or repeating) tasks to assign").
I don't have a python interpreter ATM but let's break up the logic, so everyone can follow (and explaining most of the times lights up the underlying problem, let's hope so)

Who "tells" if that process is a Ticker is the "being_a_ticker()" function.
being_a_ticker() is wrapped in a try_except clause, so you should see the error logging in case something goes wrong --> Error coordinating TICKER

one level up, do_assign_task is set every 5 cycles (or when "PICK" is found), with the "prune dead workers" logic in the same statements...that too is wrapped and should log --> Error cleaning up

one level up, the whole thread checking is wrapped and should log --> Error retrieving status

Now, you get "I'm a TICKER", so the being_a_ticker() is getting executed, and you don't have any errors reported...... this is the relevant snippet
                    try:
                        self.is_a_ticker = self.being_a_ticker()
                    except:
                        logger.error('Error coordinating TICKER')
                    if self.worker_status[0] == ACTIVE:
                        self.do_assign_tasks = True

So...... either the worker status is not ACTIVE at that moment, or self.do_assign_task is set to True .....

paolo....@gmail.com

unread,

Mar 27, 2013, 8:42:22 AM3/27/13

to web...@googlegroups.com

So, I added: logger.debug('is_a_ticker := %s, worker_status[0] := %s', self.is_a_ticker, self.worker_status[0]) before the last if of relevant snippet you posted,logs:

2013-03-27 13:25:01,785 - web2py.scheduler.mapserver#29806 - INFO - TICKER: I'm a ticker

2013-03-27 13:25:01,797 - web2py.scheduler.mapserver#29806 - DEBUG - looping...

2013-03-27 13:25:01,798 - web2py.scheduler.mapserver#29806 - INFO - pop task True False

2013-03-27 13:25:01,950 - web2py.scheduler.mapserver#29806 - DEBUG - is_a_ticker := True, worker_status[0] := ACTIVE

2013-03-27 13:25:01,978 - web2py.scheduler.mapserver#29806 - INFO - nothing to do

I supposed to see "is_a_ticker := True, worker_status[0] := ACTIVE" after "INFO - TICKER: I'm a ticker" and then something like: 'DEBUG - Assigning tasks..' but I didn't see it because I have 'INFO - pop task True False', the last False, means that self.do_assign_tasks was False. Later, the worker_status[0] is ACTIVE so it would set do_assign_tasks to True but it is too late.

Given that, in some cases pop_task comes before the do_assign_tasks = True statement. This could be the problem, Is It correct?

Paolo

2013/3/27 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 27, 2013, 9:27:41 AM3/27/13

to web...@googlegroups.com

the do_assign_task variable gets eventually shifted (read True to False, False to True) every loop.
The issue is .... how can the execution be pushed far (in your case even 60 seconds) with no errors whatsoever ?

Usually when a TICKER has been chosen (or re-confirmed), do_assign_task is set to True and the thread is put to sleep for 3 seconds.
There's quite no logic between the do_assign_task = True and the self.sleep() line, so there's a "sure" timeframe of 3 seconds for the main process to pick up the tasks.
Are you suggesting that sleeping the main loop and the heartbeat thread "at the same pace" generates the problem you're seeing ?
If so, as a trial, can you please replace the self.sleep() call on at the end of the loop() function to be, e.g. time.sleep(3.5) ?

paolo....@gmail.com

unread,

Mar 27, 2013, 1:15:54 PM3/27/13

to web...@googlegroups.com

I tried it, it seems to be more sporadic but the problem is not gone.

log:

2013-03-27 18:07:11,809 - web2py.scheduler.mapserver#30515 - DEBUG - freeing workers that have not sent heartbeat

2013-03-27 18:07:11,811 - web2py.scheduler.mapserver#30515 - INFO - TICKER: I'm a ticker

2013-03-27 18:07:11,843 - web2py.scheduler.mapserver#30515 - DEBUG - looping...

2013-03-27 18:07:11,853 - web2py.scheduler.mapserver#30515 - INFO - pop task True False

2013-03-27 18:07:11,970 - web2py.scheduler.mapserver#30515 - DEBUG - is_a_ticker := True, worker_status[0] := ACTIVE

2013-03-27 18:07:11,983 - web2py.scheduler.mapserver#30515 - INFO - nothing to do

2013-03-27 18:07:11,983 - web2py.scheduler.mapserver#30515 - DEBUG - sleeping...

2013-03-27 18:07:14,971 - web2py.scheduler.mapserver#30515 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 18:07:14,986 - web2py.scheduler.mapserver#30515 - DEBUG - looping...

2013-03-27 18:07:15,009 - web2py.scheduler.mapserver#30515 - INFO - pop task True False

2013-03-27 18:07:15,190 - web2py.scheduler.mapserver#30515 - INFO - nothing to do

2013-03-27 18:07:15,190 - web2py.scheduler.mapserver#30515 - DEBUG - sleeping...

2013-03-27 18:07:18,148 - web2py.scheduler.mapserver#30515 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 18:07:18,192 - web2py.scheduler.mapserver#30515 - DEBUG - looping...

2013-03-27 18:07:18,193 - web2py.scheduler.mapserver#30515 - INFO - pop task True False

2013-03-27 18:07:18,323 - web2py.scheduler.mapserver#30515 - INFO - nothing to do

2013-03-27 18:07:18,324 - web2py.scheduler.mapserver#30515 - DEBUG - sleeping...

2013-03-27 18:07:21,314 - web2py.scheduler.mapserver#30515 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 18:07:21,327 - web2py.scheduler.mapserver#30515 - DEBUG - looping...

2013-03-27 18:07:21,327 - web2py.scheduler.mapserver#30515 - INFO - pop task True False

2013-03-27 18:07:21,457 - web2py.scheduler.mapserver#30515 - INFO - nothing to do

2013-03-27 18:07:21,458 - web2py.scheduler.mapserver#30515 - DEBUG - sleeping...

2013-03-27 18:07:24,454 - web2py.scheduler.mapserver#30515 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 18:07:24,461 - web2py.scheduler.mapserver#30515 - DEBUG - looping...

2013-03-27 18:07:24,461 - web2py.scheduler.mapserver#30515 - INFO - pop task True False

2013-03-27 18:07:24,641 - web2py.scheduler.mapserver#30515 - INFO - nothing to do

2013-03-27 18:07:24,642 - web2py.scheduler.mapserver#30515 - DEBUG - sleeping...

2013-03-27 18:07:27,604 - web2py.scheduler.mapserver#30515 - DEBUG - ........recording heartbeat (ACTIVE)

2013-03-27 18:07:27,605 - web2py.scheduler.mapserver#30515 - DEBUG - freeing workers that have not sent heartbeat

2013-03-27 18:07:27,607 - web2py.scheduler.mapserver#30515 - INFO - TICKER: I'm a ticker

Paolo

2013/3/27 Niphlod <nip...@gmail.com>

Niphlod

unread,

Mar 27, 2013, 4:58:34 PM3/27/13

to web...@googlegroups.com

it was just a trial.
Explaining it further, let's see if someone spots the **supposed** implementation error....

In order to separate transactions and avoid "task contention" among several workers, we need the **master** steps to both assign tasks and process tasks.

The **thread** steps inserts a virtual "assign the task" job setting the do_assign_task flag to True when necessary.
When that is found to be true, **worker** assigns tasks and sleeps for 3 seconds, giving the change to the send_heartbeat() to reset the do_assign_task to False, so you don't have consecutive "assign_tasks" over and over.

Skipping over the actual steps taken, a **master** does a loop and sleeps.

In that loop, if do_assign_task is True, it assign tasks, returns "None" --> goes to sleep for 3 seconds

A **thread** does some cleanup, sleeps for 3 seconds and sets the do_assign_task every 5 cycles.

Ok, to be fair it's not guaranteed that a loop in either of those completes in 3 seconds, but the **steps** for both (when the **worker** doesn't anything) take a few ms (i.e. scheduler doesn't take into account how many ms passed between the start of either loop before setting the sleep of 3 seconds...)

So, self.sleep() in normal condition gets called:
- at every loop of the **thread** (at the end of the send_heartbeat())
- at every loop of the **master** when it has no tasks or when it assign tasks (at the end of the loop() function)

What you are experiencing is that, even if the **thread** sets correctly do_assign_tasks = True, when the **worker** tries to pop something for some reason do_assign_task is False.
I can imagine that such a thing happens because do_assign_task is reset to False, but that reset happens only after (at least) 3 seconds, at the next send_heartbeat() call.
What I can't reproduce is this behaviour exactly.... how is it possible that in your logs the "I'm a ticker" message comes 40 ms before pop_task, and do_assign_task is yet False ?
Can you add a logging line just between these two lines

            self.do_assign_tasks = False
            if counter % 5 == 0 or mybackedstatus == PICK:

and between

                    if self.worker_status[0] == ACTIVE:
                        self.do_assign_tasks = True

so we can monitor exactly when the flag is switched ?

paolo....@gmail.com

unread,

Apr 2, 2013, 5:05:11 AM4/2/13

to web...@googlegroups.com

Hi Niphlod, sorry for the late answer, I hadn't access to the involved server.

I added the two logs lines you proposed, below the logs:

2013-04-02 09:38:14,561 - web2py.scheduler.mapserver#4162 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-02 09:38:14,562 - web2py.scheduler.mapserver#4162 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-02 09:38:14,562 - web2py.scheduler.mapserver#4162 - DEBUG - freeing workers that have not sent heartbeat

2013-04-02 09:38:14,564 - web2py.scheduler.mapserver#4162 - INFO - TICKER: I'm a ticker

2013-04-02 09:38:14,644 - web2py.scheduler.mapserver#4162 - DEBUG - looping...

2013-04-02 09:38:14,652 - web2py.scheduler.mapserver#4162 - INFO - pop task True False

2013-04-02 09:38:14,723 - web2py.scheduler.mapserver#4162 - DEBUG - is_a_ticker := True, worker_status[0] := ACTIVE

2013-04-02 09:38:14,732 - web2py.scheduler.mapserver#4162 - INFO - nothing to do

2013-04-02 09:38:14,732 - web2py.scheduler.mapserver#4162 - DEBUG - sleeping...

2013-04-02 09:38:17,727 - web2py.scheduler.mapserver#4162 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-02 09:38:17,728 - web2py.scheduler.mapserver#4162 - DEBUG - send_heartbeat(), do_assign_tasks=True

2013-04-02 09:38:17,735 - web2py.scheduler.mapserver#4162 - DEBUG - looping...

2013-04-02 09:38:17,736 - web2py.scheduler.mapserver#4162 - INFO - pop task True False

2013-04-02 09:38:17,916 - web2py.scheduler.mapserver#4162 - INFO - nothing to do

2013-04-02 09:38:17,916 - web2py.scheduler.mapserver#4162 - DEBUG - sleeping...

2013-04-02 09:38:20,871 - web2py.scheduler.mapserver#4162 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-02 09:38:20,872 - web2py.scheduler.mapserver#4162 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-02 09:38:20,919 - web2py.scheduler.mapserver#4162 - DEBUG - looping...

2013-04-02 09:38:20,920 - web2py.scheduler.mapserver#4162 - INFO - pop task True False

2013-04-02 09:38:21,050 - web2py.scheduler.mapserver#4162 - INFO - nothing to do

2013-04-02 09:38:21,050 - web2py.scheduler.mapserver#4162 - DEBUG - sleeping...

2013-04-02 09:38:24,045 - web2py.scheduler.mapserver#4162 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-02 09:38:24,046 - web2py.scheduler.mapserver#4162 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-02 09:38:24,054 - web2py.scheduler.mapserver#4162 - DEBUG - looping...

2013-04-02 09:38:24,054 - web2py.scheduler.mapserver#4162 - INFO - pop task True False

2013-04-02 09:38:24,234 - web2py.scheduler.mapserver#4162 - INFO - nothing to do

2013-04-02 09:38:24,234 - web2py.scheduler.mapserver#4162 - DEBUG - sleeping...

2013-04-02 09:38:27,193 - web2py.scheduler.mapserver#4162 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-02 09:38:27,193 - web2py.scheduler.mapserver#4162 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-02 09:38:27,238 - web2py.scheduler.mapserver#4162 - DEBUG - looping...

2013-04-02 09:38:27,238 - web2py.scheduler.mapserver#4162 - INFO - pop task True False

2013-04-02 09:38:27,368 - web2py.scheduler.mapserver#4162 - INFO - nothing to do

2013-04-02 09:38:27,368 - web2py.scheduler.mapserver#4162 - DEBUG - sleeping...

2013-04-02 09:38:30,360 - web2py.scheduler.mapserver#4162 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-02 09:38:30,361 - web2py.scheduler.mapserver#4162 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-02 09:38:30,361 - web2py.scheduler.mapserver#4162 - DEBUG - freeing workers that have not sent heartbeat

2013-04-02 09:38:30,363 - web2py.scheduler.mapserver#4162 - INFO - TICKER: I'm a ticker

Paolo

2013/3/27 Niphlod <nip...@gmail.com>

it was just a trial.

Niphlod

unread,

Apr 2, 2013, 5:57:45 PM4/2/13

to web...@googlegroups.com

I can't tell what logs line have been added (I see new lines but I don't know what is the line following "if counter ...." and what is the one following "if self.worker_status[0] ....")

paolo....@gmail.com

unread,

Apr 3, 2013, 2:55:41 AM4/3/13

to web...@googlegroups.com

Right, I updated the log too, now I have:

self.do_assign_tasks = False

logger.debug('send_heartbeat(), do_assign_tasks=%s', self.do_assign_tasks )

if counter % 5 == 0 or mybackedstatus == PICK:

and

if self.worker_status[0] == ACTIVE:

logger.debug('Inside If, do_assign_tasks=%s', self.do_assign_tasks )

self.do_assign_tasks = True

I can send a patch with the added logs if it helps, and the complete log file as well.

The log are:

2013-04-03 08:49:47,990 - web2py.scheduler.mapserver#5108 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-03 08:49:47,990 - web2py.scheduler.mapserver#5108 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-03 08:49:47,990 - web2py.scheduler.mapserver#5108 - DEBUG - freeing workers that have not sent heartbeat

2013-04-03 08:49:47,992 - web2py.scheduler.mapserver#5108 - INFO - TICKER: I'm a ticker

2013-04-03 08:49:48,029 - web2py.scheduler.mapserver#5108 - DEBUG - looping...

2013-04-03 08:49:48,029 - web2py.scheduler.mapserver#5108 - INFO - pop task True False

2013-04-03 08:49:48,143 - web2py.scheduler.mapserver#5108 - DEBUG - is_a_ticker := True, worker_status[0] := ACTIVE

2013-04-03 08:49:48,143 - web2py.scheduler.mapserver#5108 - DEBUG - Inside If, do_assign_tasks=False

2013-04-03 08:49:48,159 - web2py.scheduler.mapserver#5108 - INFO - nothing to do

2013-04-03 08:49:48,159 - web2py.scheduler.mapserver#5108 - DEBUG - sleeping...

2013-04-03 08:49:51,147 - web2py.scheduler.mapserver#5108 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-03 08:49:51,148 - web2py.scheduler.mapserver#5108 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-03 08:49:51,162 - web2py.scheduler.mapserver#5108 - DEBUG - looping...

2013-04-03 08:49:51,162 - web2py.scheduler.mapserver#5108 - INFO - pop task True False

2013-04-03 08:49:51,292 - web2py.scheduler.mapserver#5108 - INFO - nothing to do

2013-04-03 08:49:51,292 - web2py.scheduler.mapserver#5108 - DEBUG - sleeping...

2013-04-03 08:49:54,288 - web2py.scheduler.mapserver#5108 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-03 08:49:54,289 - web2py.scheduler.mapserver#5108 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-03 08:49:54,295 - web2py.scheduler.mapserver#5108 - DEBUG - looping...

2013-04-03 08:49:54,295 - web2py.scheduler.mapserver#5108 - INFO - pop task True False

2013-04-03 08:49:54,475 - web2py.scheduler.mapserver#5108 - INFO - nothing to do

2013-04-03 08:49:54,475 - web2py.scheduler.mapserver#5108 - DEBUG - sleeping...

2013-04-03 08:49:57,430 - web2py.scheduler.mapserver#5108 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-03 08:49:57,430 - web2py.scheduler.mapserver#5108 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-03 08:49:57,479 - web2py.scheduler.mapserver#5108 - DEBUG - looping...

2013-04-03 08:49:57,479 - web2py.scheduler.mapserver#5108 - INFO - pop task True False

2013-04-03 08:49:57,583 - web2py.scheduler.mapserver#5108 - INFO - nothing to do

2013-04-03 08:49:57,584 - web2py.scheduler.mapserver#5108 - DEBUG - sleeping...

2013-04-03 08:50:00,579 - web2py.scheduler.mapserver#5108 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-03 08:50:00,579 - web2py.scheduler.mapserver#5108 - DEBUG - send_heartbeat(), do_assign_tasks=False

2013-04-03 08:50:00,587 - web2py.scheduler.mapserver#5108 - DEBUG - looping...

2013-04-03 08:50:00,587 - web2py.scheduler.mapserver#5108 - INFO - pop task True False

2013-04-03 08:50:00,717 - web2py.scheduler.mapserver#5108 - INFO - nothing to do

2013-04-03 08:50:00,717 - web2py.scheduler.mapserver#5108 - DEBUG - sleeping...

2013-04-03 08:50:03,711 - web2py.scheduler.mapserver#5108 - DEBUG - ........recording heartbeat (ACTIVE)

2013-04-03 08:50:03,712 - web2py.scheduler.mapserver#5108 - DEBUG - send_heartbeat(), do_assign_tasks=False

Regards,

Paolo

2013/4/2 Niphlod <nip...@gmail.com>

Niphlod

unread,

Apr 3, 2013, 3:22:02 AM4/3/13

to web...@googlegroups.com

attach the complete log please.
From the log lines you posted basically from 8:49:47 to 8:50:03 your assign task is never set to True, however, I'm more interested to see where the "switch" happens in your flow rather than seeing that is always False :P

paolo....@gmail.com

unread,

Apr 3, 2013, 3:28:44 AM4/3/13

to web...@googlegroups.com

Sorry :), please find attached the log file.

Paolo

2013/4/3 Niphlod <nip...@gmail.com>

web2py_scheduler.log

Niphlod

unread,

Apr 3, 2013, 3:51:58 AM4/3/13

to web...@googlegroups.com

ok, at least it seems consistent:
there's:
- no send_heartbeat(), do_assign_tasks=True line
- no Inside If, do_assign_tasks=True line
in all the log.

PS: I still can't reproduce the "erratic" behaviour, but a baseline question rises: do you get this also using another db engine ?

paolo....@gmail.com

unread,

Apr 4, 2013, 5:16:49 AM4/4/13

to web...@googlegroups.com

Unfortunately I can't install any dbms on that machine, I will make a test on an other 'similar' machine where I have postgres installed.

Paolo

2013/4/3 Niphlod <nip...@gmail.com>

--

---
You received this message because you are subscribed to a topic in the Google Groups "web2py-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/web2py/u_PgzKLuQmw/unsubscribe?hl=en.

paolo....@gmail.com

unread,

Apr 8, 2013, 4:29:34 AM4/8/13

to web...@googlegroups.com

Hi Niphlod, sorry for the delay I had time only on the weekend to run few tests. On my pc, I run it for hours using postgres. The maximum delay found in starting a new task was 17s.

Paolo

2013/4/4 paolo....@gmail.com <paolo....@gmail.com>

Niphlod

unread,

Apr 8, 2013, 5:22:31 AM4/8/13

to web...@googlegroups.com

ok, so to recap, the problem lies only with sqlite.
I'm not saying it's not an issue I'll continue looking into, I'm just glad that at least with a db that has no concurrency issues (like postgres) the code runs fine.
I'll be back on testing it with sqlite this evening: I'm not that confident that I'll reproduce the behaviour you're seeing but at least I'll try to inspect what can go wrong.

Reply all

Reply to author

Forward