v2.6.3 ignores the flag files used by @check_if_uptodate and re-runs the task regardless

Jafar Taghiyar

Jun 9, 2015, 1:15:11 PM
to ruffus_...@googlegroups.com
Hi everyone,

Version 2.6.3 seems to ignore the result of the @check_if_uptodate decorator and runs the task regardless.
Here is an example:
import os
from ruffus import *

def sentinel_file_exists(output_file):
    if not os.path.exists(output_file):
        return True, "Missing file %s" % output_file
    else:
        return False, "File %s exists" % output_file

@posttask(touch_file("task1_completed.flag"))
@parallel([["task1_completed.flag"]])
@check_if_uptodate(sentinel_file_exists)
def task1(x):
    print 'in task 1'

@follows(task1)
@posttask(touch_file("task2_completed.flag"))
@parallel([["task2_completed.flag"]])
@check_if_uptodate(sentinel_file_exists)
def task2(x):
    print 'in task2'

pipeline_run(task2)
This code produces the following output when it is run for the first time (v2.6.3):
"
________________________________________
Tasks which will be run:


Task enters queue = 'task1'
in task 1
Completed Task = 'task1'
Task enters queue = 'task2'
in task2
Completed Task = 'task2'
"

On a re-run, 'task1' and 'task2' should NOT run, because the flag files 'task1_completed.flag' and 'task2_completed.flag' already exist. However, the same output is produced again.
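
For reference, the expected behaviour can be sketched without Ruffus at all. The snippet below only exercises the checker's return convention, where the first element of the tuple is True when the task needs to run. The temporary directory and flag path are assumptions for illustration; Ruffus is not imported, and this does not reproduce the 2.6.3 bug itself.

```python
import os
import tempfile


def sentinel_file_exists(output_file):
    """Return (needs_update, reason), the same convention as the checker above."""
    if not os.path.exists(output_file):
        return True, "Missing file %s" % output_file
    return False, "File %s exists" % output_file


# Hypothetical flag location inside a throwaway directory (illustration only).
work_dir = tempfile.mkdtemp()
flag = os.path.join(work_dir, "task1_completed.flag")

# First run: the flag is absent, so the task should be run.
first_run = sentinel_file_exists(flag)

# Stand-in for the post-task touch step: create an empty flag file.
open(flag, "a").close()

# Re-run: the flag now exists, so the task should be skipped.
second_run = sentinel_file_exists(flag)

print(first_run[0], second_run[0])
```

If the decorator honours the checker, the second call is the one that should cause the task to be skipped.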

Version 2.4.1, on the other hand, works well.

Cheers,
Jafar

Leo Goodstadt 顧維斌

Jun 11, 2015, 10:16:58 AM
to ruffus_...@googlegroups.com
Dear Jafar,
Thank you for the bug report.
The bug description and fix are at https://github.com/bunbun/ruffus/issues/53
Thanks again.
Leo
P.S. I personally don't really use @parallel. Can I ask why you are bypassing timestamping? Is this to get around some other limitation in Ruffus?

Jafar Taghiyar

Jun 12, 2015, 6:06:46 PM
to ruffus_...@googlegroups.com, llewgo...@gmail.com
Dear Leo,

Thank you for your prompt reply and for fixing the bug.
This is actually a simplified version of my code, and I did not give much thought to removing the @parallel.
Anyway, I'm wondering how I should pass the file name to @check_if_uptodate(sentinel_file_exists) if I don't use @parallel?

Cheers,
Jafar

Peter Ebert

Aug 25, 2015, 3:58:41 AM
to ruffus_discuss, llewgo...@gmail.com
Hi Leo,
I just stumbled across your question about the use of @parallel and possible limitations of Ruffus. I use @parallel quite extensively in my currently largest project, for those tasks where the number of (partially intermediate) files explodes. I have observed that timestamping, and later checking the timestamps, for hundreds of thousands of files slows down Ruffus quite considerably (let's call it the "start-up" time). Of course, my money is on the I/O being the slow part here, despite it being a parallel cluster file system, and not the Ruffus SQLite backend. So my view is that this is not really a limitation inherent to Ruffus, yet it motivates the use of @parallel.
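
A rough sketch of the trade-off described here, and not Ruffus's actual implementation: comparing modification times costs at least one stat call per file, whereas a sentinel flag costs a single existence check. The function names, file names, and the count of 1000 files are all assumptions chosen for illustration.

```python
import os
import tempfile


def outputs_up_to_date(input_files, output_files):
    """Timestamp-style check: touches every file via os.stat."""
    if not all(os.path.exists(f) for f in output_files):
        return False
    newest_input = max(os.stat(f).st_mtime for f in input_files)
    oldest_output = min(os.stat(f).st_mtime for f in output_files)
    return oldest_output >= newest_input


def flag_up_to_date(flag_file):
    """Sentinel-style check: one existence test, however many outputs exist."""
    return os.path.exists(flag_file)


# Hypothetical files in a throwaway directory (illustration only).
work_dir = tempfile.mkdtemp()
input_file = os.path.join(work_dir, "input.txt")
output_files = [os.path.join(work_dir, "out_%d.txt" % i) for i in range(1000)]
flag_file = os.path.join(work_dir, "stage_completed.flag")

for path in [input_file] + output_files + [flag_file]:
    open(path, "a").close()

# Set explicit timestamps so the comparison is deterministic:
# the input is older than every output.
os.utime(input_file, (1000, 1000))
for path in output_files:
    os.utime(path, (2000, 2000))

timestamps_ok = outputs_up_to_date([input_file], output_files)
flag_ok = flag_up_to_date(flag_file)
print(timestamps_ok, flag_ok)
```

The flag check is cheaper, but it will not notice an input that changed after the flag was written, which is the price of bypassing timestamps.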
Best,
Peter