Creating tasks from a data structure

Prabhas Pokharel

Sep 5, 2013, 6:02:20 PM

to pa...@googlegroups.com

Hi all, I have a big data processing pipeline to run, involving many CSV files and scripts (in this case R scripts) that turn CSVs into other CSVs. I want to turn my make-based process into a Paver-based process, and I'm wondering (1) whether this seems like a good task for Paver and (2) how to create tasks programmatically from my data structure.

Currently, my Makefile has many rules that look like this (one for each of 20 or so scripts):

a2.csv b2.csv c2.csv: a1.csv b1.csv c1.csv d1.csv e1.csv f1.csv Make1to2.R
	Rscript Make1to2.R

and it's a huge headache to keep that whole list updated, get all the filenames right, etc.

I have been considering Paver as a replacement for this process. I have written a simple Python script that goes through all of my scripts and extracts the files each one reads and writes, so Python can now produce a data structure that looks like:

{ 'Make1to2.R': {
    'inputs': ['a1.csv', 'b1.csv', ...],
    'outputs': ['a2.csv', 'b2.csv', 'c2.csv']
  },
  ...
}

Now I would like to do timestamp and dependency analysis, and rerun scripts (in the right order) whenever their input files change. How would I get such a thing done using Paver?
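To make the question concrete, here is a minimal sketch of that analysis in plain Python, without Paver. It assumes Python 3.9+ (for the standard-library graphlib module) and a dict in the shape shown above. The names SCRIPTS, run_order, is_stale and build, and the Make2to3.R entry, are hypothetical and only there for illustration.

```python
import os
import subprocess
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical example data in the shape described above.
SCRIPTS = {
    "Make1to2.R": {"inputs": ["a1.csv", "b1.csv"], "outputs": ["a2.csv", "b2.csv"]},
    "Make2to3.R": {"inputs": ["a2.csv"], "outputs": ["a3.csv"]},
}


def run_order(scripts):
    """Return script names ordered so that producers come before consumers."""
    # Map each output file to the script that writes it.
    producer = {out: name for name, spec in scripts.items() for out in spec["outputs"]}
    # For each script, collect the other scripts whose outputs it reads.
    graph = {
        name: {
            producer[f]
            for f in spec["inputs"]
            if f in producer and producer[f] != name
        }
        for name, spec in scripts.items()
    }
    return list(TopologicalSorter(graph).static_order())


def is_stale(name, spec):
    """True if an output is missing or older than the script or any input."""
    if not all(os.path.exists(f) for f in spec["outputs"]):
        return True
    # Raises FileNotFoundError if the script or an input file is missing.
    newest_dep = max(os.path.getmtime(f) for f in [name] + spec["inputs"])
    oldest_out = min(os.path.getmtime(f) for f in spec["outputs"])
    return newest_dep > oldest_out


def build(scripts):
    """Run every stale script with Rscript, in dependency order."""
    for name in run_order(scripts):
        if is_stale(name, scripts[name]):
            subprocess.run(["Rscript", name], check=True)
```

Calling build(SCRIPTS) would rerun only the scripts whose outputs are out of date. Staleness is checked just before each script runs, so a script whose inputs were regenerated by an earlier step is picked up in the same pass.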

Writing a Makefile out based on this data structure and then running make is one option; I'm wondering if there is a better one.
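For reference, the Makefile option could be sketched roughly as follows, assuming the dict shape shown above. The function name makefile_rules is hypothetical, and each rule simply mirrors the hand-written rule from earlier in this message.

```python
def makefile_rules(scripts):
    """Render one make rule per script from the inputs/outputs mapping."""
    rules = []
    for name, spec in sorted(scripts.items()):
        targets = " ".join(spec["outputs"])
        # The script itself is a prerequisite, as in the hand-written rule.
        prereqs = " ".join(spec["inputs"] + [name])
        # Recipe lines in a Makefile must start with a tab character.
        rules.append(f"{targets}: {prereqs}\n\tRscript {name}\n")
    return "\n".join(rules)
```

The returned string can be written to a file and passed to make with -f. One caveat carried over from the hand-written version: make treats a rule with several targets as independent targets that share a recipe, so a parallel build may run the same script more than once.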

Thanks in advance!

jhermann

Sep 17, 2013, 8:01:57 PM

to pa...@googlegroups.com

Consider http://pydoit.org/ and use it either directly or via Paver.
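A minimal sketch of what the direct doit route might look like, assuming the dict from the original post is available as SCRIPTS (a hypothetical name, with hypothetical example entries here). doit collects functions whose names start with task_ from a dodo.py file, and a task function may yield several sub-task dicts with name, actions, file_dep and targets keys.

```python
# dodo.py: a sketch, not a tested pipeline.

# Hypothetical example data in the shape from the original post.
SCRIPTS = {
    "Make1to2.R": {"inputs": ["a1.csv", "b1.csv"], "outputs": ["a2.csv", "b2.csv"]},
    "Make2to3.R": {"inputs": ["a2.csv"], "outputs": ["a3.csv"]},
}


def task_rscript():
    """Yield one doit sub-task per R script."""
    for name, spec in SCRIPTS.items():
        yield {
            "name": name,
            # A string action is run as a shell command.
            "actions": [f"Rscript {name}"],
            # Depend on the script itself as well as on its input files.
            "file_dep": spec["inputs"] + [name],
            "targets": spec["outputs"],
        }
```

Running doit in the directory containing this file should execute only the sub-tasks whose dependencies changed, and it works out the order from matching targets and file_dep entries. As far as I know, doit's default up-to-date check compares file checksums rather than timestamps, so check its documentation if timestamp semantics matter.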
