Hi Paolo,
I was hoping you could give me some insight as to how to handle a rather complicated (IMO) splitting strategy considering:
-I have samples from tumor, normal pairs.
i.e: (Sample_N, Sample_T)
-Along the way the samples are being split on chromosome as well.
i.e: (Sample_N_1, Sample_N_2, Sample_T_1, ...)
I don't need the T/N pairs to be aware of each other until downstream, so I don't use fromFilePairs...
process Index {
input:
file(sample) from samples
output:
set sample, file("${sample}.bai") into bam_indices
"""
samtools index ${sample}
"""
}
process Mpileup {
tag { "${bam.baseName}_${chrom}" }
input:
set file(bam), file(bai) from bam_indices
each chrom from (1..22) + ['X', 'Y']
output:
file "${bam.baseName}_${chrom}.pileup.gz" into sample_pileups
"""
samtools mpileup -r ${chrom} -f ${genome} -Q 20 ${bam} | gzip > ${bam.baseName}_${chrom}.pileup.gz
"""
}
Things get complicated when I need to bring the matching tumor and normal back together, chromosome by chromosome, in the same command later on. So I'm trying something like:
process Build {
tag { [tumor, normal] }
input:
no idea
output:
file("${sample.baseName}.gz") into out_files
"""
some_command -n sample_N_1.bam -t sample_T_1.bam > output_1.bam
"""
}
Using collect or groupTuple gets me 80% of the way there, but I just can't seem to pull it off. Eventually the outputs from this step will just be concatenated into the final product. Any advice would be greatly appreciated.
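To make the target concrete, here is a minimal sketch in plain Groovy (not Nextflow channel code) of the grouping I'm after. The file names and the Sample_<N|T>_<chrom>.pileup.gz naming pattern are assumptions for illustration only; the real names depend on how the upstream outputs are named.

```groovy
// Hypothetical per-chromosome outputs, named Sample_<N|T>_<chrom>.pileup.gz (assumed pattern).
def pileups = [
    'Sample_N_1.pileup.gz', 'Sample_T_1.pileup.gz',
    'Sample_N_2.pileup.gz', 'Sample_T_2.pileup.gz',
    'Sample_N_X.pileup.gz', 'Sample_T_X.pileup.gz'
]

// Split a file name into sample, type (N or T) and chromosome.
def parse = { String name ->
    def m = (name =~ /(.+)_([NT])_([^_.]+)\.pileup\.gz/)
    assert m.matches() : "unexpected file name: ${name}"
    [sample: m.group(1), type: m.group(2), chrom: m.group(3), file: name]
}

// Group by (sample, chrom) so that each group holds one normal and one tumor file.
def pairs = pileups.collect(parse)
    .groupBy { [it.sample, it.chrom] }
    .collect { key, members ->
        def normal = members.find { it.type == 'N' }.file
        def tumor  = members.find { it.type == 'T' }.file
        [sample: key[0], chrom: key[1], normal: normal, tumor: tumor]
    }

// One line per (sample, chrom) pair, mirroring the -n / -t arguments of the Build step.
pairs.each { println "${it.sample} chr${it.chrom}: -n ${it.normal} -t ${it.tumor}" }
```

In channel terms, I imagine this means attaching a (sample, chrom) key to each output and grouping on that key, but that is the part I can't get working.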
Thanks!