Hello,
I've been using Nextflow a bit recently, but I cannot get this to work properly. Here is the situation:
I have a number of files, each of which I split into 4 smaller parts; each part is processed independently, and the outputs should then be merged into a new, final big file.
Suppose the initial files are "BigFile1.bf", "BigFile2.bf", etc. I can't find a way to group the processed parts by base name and merge them accordingly.
Maybe this example clarifies:
Channel.fromPath(params.bigfiles).into{ big_files }
process split {
input:
file big_file from big_files
output:
file '*.part_[0-9].bf' into split_files mode flatten
"""
split_files.sh $big_file
"""
}
process workOnFiles {
input:
file(partial) from split_files
output:
file "${partial.baseName}.EDIT.bf" into edited_files
script:
"""
edit_files.sh ${partial} > ${partial.baseName}.EDIT.bf
"""
}
process merge {
input:
file edited_file from edited_files
output:
file '*.FINAL.bf' into final_file
script:
prefix = edited_file.toString() - ~/(\.part_[0-9])?(\.EDIT)?(\.bf)?$/
"""
merge_file.sh ${edited_file} -o ${prefix}.FINAL.bf
"""
}
How do I make sure that the 'merge' process merges together only the parts of one file (e.g. BigFile1.part_*.EDIT.bf), and not parts from other files?
I've been trying the 'groupBy' operator, with something like
edited_files
.groupBy { String str -> str - ~/(\.part_[0-9])?(\.EDIT)?(\.bf)?$/ }
but it didn't work.
Other operators can emit tuples containing a fixed number of values, but I would need to make sure that those values all come from the same source file.
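To make the goal concrete, here is a minimal sketch in plain Groovy (outside Nextflow) of the grouping I'm after. The file names are made up for illustration, the helper name baseNameOf is hypothetical, and groupBy here is the ordinary Groovy list method, not the Nextflow channel operator:

```groovy
// Hypothetical file names, as they might look after the workOnFiles step.
def names = [
    'BigFile1.part_0.EDIT.bf',
    'BigFile2.part_0.EDIT.bf',
    'BigFile1.part_1.EDIT.bf',
    'BigFile2.part_1.EDIT.bf',
]

// Strip the trailing ".part_N", ".EDIT" and ".bf" to recover the base name.
def baseNameOf = { String name -> name - ~/(\.part_[0-9])?(\.EDIT)?(\.bf)?$/ }

// Plain Groovy groupBy on a list: one entry per base name, holding its parts.
def grouped = names.groupBy(baseNameOf)

grouped.each { base, parts -> println "${base} <- ${parts}" }
```

What I would like is for each 'merge' task to receive one such group (all parts sharing a base name) rather than a single file.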
Any advice would be of great help,
Fabio