Dear Ashwin,
The obvious way to do this would be something like
chromosome_names = list(map(str, list(range(1, 20)) + ["X", "Y"]))
@transform(chromosome_names, add_inputs(["a.vcf", "b.vcf"]), ...)
etc
Unfortunately, in Ruffus, strings are assumed to be file names, and Ruffus will complain that the files "1", "X", etc. do not exist.
A future version of Ruffus, currently in development, will let you get around that. In the meantime, there are two things you can do:
1) Working with Ruffus
Bite the bullet and create a list of files called "1.chr", "2.chr" etc. (with some consistent extension, as usual) using @originate, and then work with these files in the rest of the pipeline. The @product decorator is especially useful here (and was in fact introduced specifically for this use case), as it allows you to have each chromosome of your vcf files analysed in parallel. The syntax can be a bit tricky, so please feel free to post another question if you get stuck. The following (untested) code should hopefully point you in the right direction.
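from ruffus import *

chromosome_names = ["%s.chr" % cc for cc in (list(range(1, 20)) + ["X", "Y"])]

# Create one empty placeholder file per chromosome
@originate(chromosome_names)
def create_chromosome_names(output_file):
    with open(output_file, "w") as oo:
        pass

# Pair every chromosome file with every vcf file (vcf_files is your own list of vcf file names)
@product(create_chromosome_names, formatter(),
         vcf_files, formatter(),
         "chr{basename[0][0]}.{basename[1][0]}.output")
def analyse_chromosome(input_file_names, output_file_name):  # placeholder function name
    chr_name, vcf_file_name = input_file_names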
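If it helps to check which jobs and output names to expect before running the pipeline, the pairing can be emulated with the standard library alone. This is only a rough sketch, not Ruffus itself: it assumes that {basename} means the file name with its directory and extension removed, it borrows the "a.vcf" and "b.vcf" names from the example above, and expected_jobs and basename_without_ext are made-up helper names.

```python
import itertools
import os


def basename_without_ext(path):
    # Assumed to mirror what {basename} expands to: no directory, no extension
    return os.path.splitext(os.path.basename(path))[0]


def expected_jobs(chromosome_files, vcf_files):
    # One job per (chromosome file, vcf file) pair, i.e. the Cartesian product
    jobs = []
    for chr_file, vcf_file in itertools.product(chromosome_files, vcf_files):
        output = "chr%s.%s.output" % (basename_without_ext(chr_file),
                                      basename_without_ext(vcf_file))
        jobs.append(((chr_file, vcf_file), output))
    return jobs


chromosome_files = ["%s.chr" % cc for cc in list(range(1, 20)) + ["X", "Y"]]
jobs = expected_jobs(chromosome_files, ["a.vcf", "b.vcf"])
```

Each entry in jobs pairs the two input file names with the output name built from them, so every vcf file is matched with every chromosome file exactly once.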
The only downside is that you get a list of extra files (called "1.chr", "2.chr" etc.). We have found that it helps to organise them in their own "chromosome" directory.
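A minimal sketch of how the placeholder files could be kept in their own directory, and how the chromosome name can be recovered from a file name inside a task. The directory name "chromosome" comes from the suggestion above; the helper names (chromosome_file_names, chromosome_from_file, touch) are made up for illustration.

```python
import os

CHROMOSOME_DIR = "chromosome"  # assumed directory name


def chromosome_file_names(directory=CHROMOSOME_DIR):
    # e.g. chromosome/1.chr ... chromosome/19.chr, chromosome/X.chr, chromosome/Y.chr
    names = [str(cc) for cc in list(range(1, 20)) + ["X", "Y"]]
    return [os.path.join(directory, name + ".chr") for name in names]


def chromosome_from_file(path):
    # "chromosome/X.chr" -> "X"
    return os.path.splitext(os.path.basename(path))[0]


def touch(path):
    # Create the parent directory if needed, then an empty placeholder file
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w"):
        pass
```

The list from chromosome_file_names() could be passed to @originate, with touch() as the body of the originating function.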
2) Working around Ruffus
If you wrap your strings in any other type, they will no longer be regarded as file names. However, you lose the handy way Ruffus constructs output file names from input strings.
class not_a_string(object):
    def __init__(self, name):
        self.name = name

@transform(your_vcfs, add_inputs([not_a_string(cc) for cc in list(range(1, 20)) + ["X", "Y"]]), ...)
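A slightly fuller sketch of the wrapper idea, runnable without Ruffus. Since the automatic output naming is lost with this approach, it also shows one way to build the output name by hand inside the task. The class name NotAString, the helper output_name_for and the "chr<name>.<vcf>.output" pattern are illustrative assumptions only.

```python
import os


class NotAString(object):
    """Wraps a chromosome name so that it is not a plain string."""

    def __init__(self, name):
        self.name = str(name)

    def __str__(self):
        return self.name

    def __repr__(self):
        return "NotAString(%r)" % self.name


chromosomes = [NotAString(cc) for cc in list(range(1, 20)) + ["X", "Y"]]


def output_name_for(vcf_file, chromosome):
    # Build the output file name by hand from the vcf file name and the chromosome
    stem = os.path.splitext(os.path.basename(vcf_file))[0]
    return "chr%s.%s.output" % (chromosome, stem)
```

Inside the task function, str(chromosome) gives back the original name, and output_name_for("a.vcf", chromosome) would take over the naming that the format string did in the first approach.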
Hope that helps.
Leo