I'm trying to run a program in parallel that takes two input files matching the pattern `*_{1,2}.fq.gz`.
However, I'm having trouble getting the correct paths from the input channel. With the current version, I get something like:
`[/home/methylationPipeline/data/source/reads/paired/th_M2-T_St_1.fq.gz, /home/methylationPipeline/data/source/reads/paired/th_M2-T_St_2.fq.gz]`
When I use `file()` instead, I end up with just the file names without the path, which the program cannot process.
How do I split the input so that I end up with the complete paths, i.e. `/home/methylationPipeline/data/source/reads/paired/th_M2-T_St_1.fq.gz` and `/home/methylationPipeline/data/source/reads/paired/th_M2-T_St_2.fq.gz`?
My code:
```groovy
#!/usr/bin/env nextflow

params.ref = "refPath"
params.paired = 'methylationPipeline/data/source/reads/paired/*_{1,2}.fq.gz'

reference = file(params.ref)

Channel
    .fromFilePairs(params.paired)
    .set { read_pairs }

process bwaMeth {
    input:
    file REFERENCE from reference
    set pair_id, reads from read_pairs

    output:
    set pair_id, 'methylationPipeline/data/mappings/mapped_reads.bam' into bam

    """
    echo $reads
    echo $pair_id
    /home/tools/bwa-meth-0.10/bwameth.py --reference $REFERENCE $reads
    """
}
```
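The bracketed output appears to come from how Groovy, which Nextflow scripts are built on, turns a list into a string. `fromFilePairs` emits each item as a pair ID plus a list of the two matching paths, so with `set pair_id, reads` the script's `$reads` interpolates the whole list. The plain-Groovy sketch below is not a Nextflow pipeline and does not touch the file system; the paths are simply the ones from the question. It contrasts interpolating the list, joining it with spaces, and indexing it.

```groovy
// Sketch only: shows how a list of paths renders under Groovy string
// interpolation, which is what `$reads` does in a Nextflow script block.
// The paths are placeholders copied from the question; no files are read.
import java.nio.file.Path
import java.nio.file.Paths

List<Path> reads = [
    Paths.get('/home/methylationPipeline/data/source/reads/paired/th_M2-T_St_1.fq.gz'),
    Paths.get('/home/methylationPipeline/data/source/reads/paired/th_M2-T_St_2.fq.gz')
]

// Interpolating the list directly renders it as "[a, b]": brackets and a comma.
String asList = "$reads"

// Joining with a space yields one shell argument per file, full paths kept.
String asArgs = reads.join(' ')

// Indexing gives each path separately.
String first = "${reads[0]}"
String second = "${reads[1]}"

println asList
println asArgs
println first
println second
```

Applied to the process above, and assuming `reads` still arrives as a plain list of paths via `set pair_id, reads`, the script could use `${reads.join(' ')}` or `${reads[0]} ${reads[1]}` in place of `$reads`. The more common Nextflow idiom is to declare the input as `set pair_id, file(reads) from read_pairs`, which stages both files into the task's work directory; `$reads` then expands to space-separated file names that resolve relative to that directory, where the script runs.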