How to get full path from .fromFilePairs?

ChriKub

Jan 16, 2017, 8:17:13 AM

to Nextflow

I'm trying to run a program in parallel that needs two input files of the format *_{1,2}.fq.gz

But I run into problems getting the correct path from the input channel. With the current version I get something like: [/home/methylationPipeline/data/source/reads/paired/th_M2-T_St_1.fq.gz, /home/methylationPipeline/data/source/reads/paired/th_M2-T_St_2.fq.gz], and when using file() I end up with just the file names without the path (which the program cannot process).
How do I split my input in a way that I end up with the complete paths (like: /home/methylationPipeline/data/source/reads/paired/th_M2-T_St_1.fq.gz /home/methylationPipeline/data/source/reads/paired/th_M2-T_St_2.fq.gz )?

My code:

#!/usr/bin/env nextflow

params.ref="refPath"
params.paired = 'methylationPipeline/data/source/reads/paired/*_{1,2}.fq.gz'

reference=file(params.ref)

Channel
        .fromFilePairs(params.paired)
        .set { read_pairs }


process bwaMeth {

        input:
        file REFERENCE from reference
        set pair_id, reads from read_pairs


        output:
        set pair_id, 'methylationPipeline/data/mappings/mapped_reads.bam' into bam

        """
        echo $reads
        echo $pair_id
        /home/tools/bwa-meth-0.10/bwameth.py --reference $REFERENCE $reads
        """

}

Paolo Di Tommaso

Jan 16, 2017, 8:27:01 AM

to nextflow

By design, file paths in the process script are resolved as relative paths. You will need either to modify the Python script so that it can handle relative paths, or to prefix the file name with the $PWD environment variable (taking care to escape the $ character).

Cheers,
Paolo
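
For readers following along, here is a minimal shell sketch of what the reply above describes. It is not Nextflow itself: the `source` and `task` directories and the file name `sample_1.fq.gz` are hypothetical stand-ins, and the sketch assumes the input is staged into the task directory as a symlink. It only illustrates that, from inside the task directory, the bare file name and the `$PWD`-prefixed name reach the same file.

```shell
# Hypothetical sandbox: "source" plays the data directory and "task" plays
# the per-task working directory.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source" "$sandbox/task"
echo "reads" > "$sandbox/source/sample_1.fq.gz"

# Stage the input into the task directory as a symlink (assumed staging mode).
ln -s "$sandbox/source/sample_1.fq.gz" "$sandbox/task/sample_1.fq.gz"

# Everything below runs from inside the task directory.
cd "$sandbox/task"
relative="sample_1.fq.gz"
absolute="$PWD/$relative"

# Both names resolve to the same content.
cat "$relative"   # prints: reads
cat "$absolute"   # prints: reads
```

A tool that insists on absolute paths can be given the second form; a tool that accepts relative paths needs nothing beyond the bare file name.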

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

ChriKub

Jan 19, 2017, 4:19:02 AM

to Nextflow

That's the problem. The path returned by file(reads) is not a relative path. I input methylationPipeline/data/source/reads/paired/*_{1,2}.fq.gz (which is the relative path from the current working directory) and get just *_{1,2}.fq.gz as a result. It cuts away the relative path and leaves only the bare file name. Adding $PWD doesn't work either. When I add it to the initial file path, it is not resolved when the wildcards are expanded, and when I add it to the command line that runs the tool, it adds the path to the current working directory, not the path to the files. In addition, the path is only added to the first of the files and not to the second.
Is there no way to change this, or to access the individual paths in the set returned by .fromFilePairs?


Cheers
Chris


Paolo Di Tommaso

Jan 19, 2017, 7:25:24 AM

to nextflow

The file name *is* the path relative to the current working directory. Bear in mind that NF automatically assigns a unique working directory to each task and stages the input files in that directory.

Thus the file name is enough to access the files. Since your Python script requires an absolute path, you can prefix each file name with the PWD variable, as shown in the example below:

process foo {
    input:
    set id, file(reads) from read_pairs

    """
    echo ${reads.collect{'$PWD/'+it}.join(' ')}
    """
}
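
The same per-file prefixing can be sketched in plain shell. This is only an illustration: `prefix_with_pwd` is a hypothetical helper, not part of Nextflow or bwa-meth. It also suggests why writing `$PWD/` once in front of a space-separated list of names would only affect the first name, as reported earlier in the thread: the prefix has to be applied to each element separately.

```shell
# Hypothetical helper: prefix every file name argument with the current
# directory and print the results separated by single spaces.
prefix_with_pwd() {
    result=""
    for name in "$@"; do
        # ${result:+$result } adds a separating space only after the first item.
        result="${result:+$result }$PWD/$name"
    done
    printf '%s\n' "$result"
}

# Usage with the two read files from the question, run from the task directory.
prefix_with_pwd th_M2-T_St_1.fq.gz th_M2-T_St_2.fq.gz
```

The Groovy expression `reads.collect{'$PWD/'+it}.join(' ')` in the process above plays the same role, except that it emits a literal `$PWD` for the task's shell to expand at run time.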