@transform: dealing with directories


Olabode Ajayi

Apr 8, 2016, 3:21:11 PM
to ruffus_discuss
Hi All,

I am using a regular expression (via formatter) to filter paired-end FASTQ files in one step of my workflow. However, when I run the pipeline I get the following warning for the novoalign task:

WARNING:
        In Task 'complexo::novoalign':
        No jobs were run because no file names matched.
        Please make sure that the regular expression is correctly specified.
        Job Warning: Input substitution failed:
          Missing key = {path} in '{path[0]}/{sample[0]}_R2_val_2.fq.gz'.
                input =  "'.+/(?P<sample>[a-zA-Z0-9]+)_R1_val_1.fq.gz'",
               filter = formatter(["'/usr/people/ajayi/test/complexo_pipeline/example/fastqs'/.fq.gz"]).. 


Here is my question:

How can I specify an output directory that stores the paired-end FASTQ files (with their extensions) so that the next task can pick them up, as in the example above?

My understanding is that @transform only deals with file names (and extensions) within a directory. I am not sure whether @transform has a way to create or specify a directory in which to store the output files (results).

For instance, Trim Galore takes paired-end FASTQ files as input, trims them, and writes the results either to the same directory or to any directory you choose.
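As a rough illustration, a paired-end Trim Galore call with an explicit output directory can be assembled in Python, assuming the `--paired` and `--output_dir` options and Trim Galore's usual `<name>_val_1.fq.gz` / `<name>_val_2.fq.gz` naming for gzipped input. `trim_galore_command`, `strip_fastq_ext` and the paths are hypothetical. Working out the output names up front matters because a Ruffus `output` template has to spell out the same names.

```python
import os


def strip_fastq_ext(filename):
    """Remove a trailing .fastq.gz / .fq.gz / .fastq / .fq extension."""
    for ext in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if filename.endswith(ext):
            return filename[: -len(ext)]
    return filename


def trim_galore_command(read1, read2, output_dir):
    """Build a paired-end trim_galore command line plus the file names it is
    expected to write (assumes gzipped input, hence .fq.gz output)."""
    cmd = ["trim_galore", "--paired", "--output_dir", output_dir, read1, read2]
    expected = [
        os.path.join(output_dir, os.path.basename(strip_fastq_ext(read1)) + "_val_1.fq.gz"),
        os.path.join(output_dir, os.path.basename(strip_fastq_ext(read2)) + "_val_2.fq.gz"),
    ]
    return cmd, expected


cmd, expected = trim_galore_command(
    "/data/fastqs/S1_R1.fastq.gz", "/data/fastqs/S1_R2.fastq.gz", "/data/trimmed"
)
print(" ".join(cmd))
print(expected)
# ['/data/trimmed/S1_R1_val_1.fq.gz', '/data/trimmed/S1_R2_val_2.fq.gz']
```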

Please take a look at how my trim_galore step handles the output directory:

    # To trim paired end reads in FASTQ and output F_R1_val_1.fq.gz...expected output file
    pipeline.transform(
        task_func=stages.trim_galore,
        name='trim_galore',
        input=output_from('original_fastqs'),
        filter=formatter('.+/(?P<F>[a-zA-Z0-9]+)_R1.fastq.gz'),
        add_inputs=add_inputs('{path[0]}/{F[0]}_R2.fastq.gz'),
        #extras=['{F[0]}'],
        output=r"'{path[0]}'/.fq.gz" # expected to store output PE file after trimmed
    )
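The warning is consistent with the `output` template in this call: `r"'{path[0]}'/.fq.gz"` contains literal single quotes and no file name, so the only name passed downstream has the form `'<dir>'/.fq.gz`, which is the string shown in the warning's `filter = formatter([...])` line and cannot match `_R1_val_1.fq.gz`. The sketch here imitates formatter's `{path[0]}` and named-group substitution with plain `re` and `str.format`. `formatter_fields` is a hypothetical helper that only approximates Ruffus; the example path is made up, and the dots in the patterns are escaped.

```python
import os
import re


def formatter_fields(filename, pattern):
    """Rough, hypothetical stand-in for Ruffus formatter(): build the
    substitution fields for one input file, or return None on no match."""
    match = re.match(pattern, filename)
    if match is None:
        return None
    fields = {"path": [os.path.dirname(filename)]}
    # Named regex groups become fields too, indexed by input position ([0]).
    for name, value in match.groupdict().items():
        fields[name] = [value]
    return fields


TRIM_FILTER = r".+/(?P<F>[a-zA-Z0-9]+)_R1\.fastq\.gz"
NOVO_FILTER = r".+/(?P<sample>[a-zA-Z0-9]+)_R1_val_1\.fq\.gz"

trim_fields = formatter_fields("/data/fastqs/S1_R1.fastq.gz", TRIM_FILTER)

# Output template as written in the question: literal quotes, no file name.
buggy = "'{path[0]}'/.fq.gz".format(**trim_fields)
print(buggy)                                 # '/data/fastqs'/.fq.gz
print(formatter_fields(buggy, NOVO_FILTER))  # None: nothing for novoalign

# Output named after the files trim_galore actually writes.
fixed = ["{path[0]}/{F[0]}_R1_val_1.fq.gz".format(**trim_fields),
         "{path[0]}/{F[0]}_R2_val_2.fq.gz".format(**trim_fields)]
novo_fields = formatter_fields(fixed[0], NOVO_FILTER)
mate = "{path[0]}/{sample[0]}_R2_val_2.fq.gz".format(**novo_fields)
print(mate)                                  # /data/fastqs/S1_R2_val_2.fq.gz
```

If this is the cause, the pipeline change would be to name both trimmed files explicitly, e.g. `output=['{path[0]}/{F[0]}_R1_val_1.fq.gz', '{path[0]}/{F[0]}_R2_val_2.fq.gz']`, with `{path[0]}` replaced by an existing (or previously created) directory if the results should go elsewhere.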

As far as I can see, @transform (for both input and output) only matches files in an existing directory; it does not create a directory to store the output files.

I would appreciate any help.

Thanks,

AJ

Leo Goodstadt 顧維斌

Apr 12, 2016, 2:51:39 AM
to ruffus_...@googlegroups.com
Dear AJ,
Does "{path[0]}" already exist? Ruffus does not make directories for you unless so directed.
(There is pipeline.mkdir)
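pipeline.mkdir adds directory creation as its own Ruffus task. A lighter alternative, sketched here with only the standard library, is for the task function to create the parent directories of its outputs before writing them. Ruffus passes the input and output file names to a transform task function as its first two arguments; `trim_task` and `ensure_output_dirs` are hypothetical names, and the empty files stand in for running trim_galore.

```python
import os
import tempfile


def ensure_output_dirs(output_files):
    """Create the parent directory of every output file if it is missing."""
    if isinstance(output_files, str):
        output_files = [output_files]
    for path in output_files:
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)


def trim_task(input_files, output_files):
    """Hypothetical task function with the (input, output) argument order
    Ruffus uses; it only creates empty outputs instead of running trim_galore."""
    ensure_output_dirs(output_files)
    for path in output_files:
        open(path, "w").close()


with tempfile.TemporaryDirectory() as tmp:
    outputs = [os.path.join(tmp, "trimmed", "S1_R1_val_1.fq.gz"),
               os.path.join(tmp, "trimmed", "S1_R2_val_2.fq.gz")]
    trim_task(["S1_R1.fastq.gz", "S1_R2.fastq.gz"], outputs)  # inputs unused here
    created = sorted(os.listdir(os.path.join(tmp, "trimmed")))
    print(created)  # ['S1_R1_val_1.fq.gz', 'S1_R2_val_2.fq.gz']
```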
Can you create a minimal example with some dummy files
("touch" files with open("my.file", "w") or use pipeline.originate) which we can try?
It is quite difficult to debug remotely otherwise.
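Following that suggestion, a self-contained way to produce test inputs is to "touch" empty files with open(..., "w") in a scratch directory and check, before running the pipeline, which R1 files the filter regex picks up and whether each has its R2 partner. The sample names and the helpers `make_dummy_fastqs` and `find_pairs` are made up for illustration; pipeline.originate would be the Ruffus-native way to generate such files as a pipeline task.

```python
import os
import re
import tempfile

R1_PATTERN = re.compile(r".+/(?P<F>[a-zA-Z0-9]+)_R1\.fastq\.gz$")


def make_dummy_fastqs(directory, samples):
    """'Touch' empty paired-end files named <sample>_R1/_R2.fastq.gz."""
    paths = []
    for sample in samples:
        for read in ("R1", "R2"):
            path = os.path.join(directory, "{}_{}.fastq.gz".format(sample, read))
            open(path, "w").close()
            paths.append(path)
    return paths


def find_pairs(paths):
    """Map each matching R1 file to its R2 partner (None if the R2 is missing)."""
    pairs = {}
    for path in sorted(paths):
        match = R1_PATTERN.match(path)
        if match:
            mate = os.path.join(os.path.dirname(path), match.group("F") + "_R2.fastq.gz")
            pairs[path] = mate if os.path.exists(mate) else None
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    files = make_dummy_fastqs(tmp, ["S1", "S2"])
    pairs = find_pairs(files)
    n_pairs = sum(mate is not None for mate in pairs.values())
    print(n_pairs)  # 2
```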
Leo

