I am having a problem splitting an input file and getting Bpipe to recognize the generated files as outputs.
This is the part of my pipeline that gets stuck:
...+ [ Indels + splitClusters ] + "%_R*.split" * [ extractReads ]
The "Indels" stage generates three different files, which are correctly recognized as outputs:
Indels = {
doc "analyze and visualize deletions"
output.dir = "DELETIONS"
outputs = [
file(input.txt).name.replaceAll(~/.txt/ , ".Crispr.pdf"),
file(input.txt).name.replaceAll(~/.txt/ , ".CrisprReads.txt"),
file(input.txt).name.replaceAll(~/.txt/ , ".CrisprResults.txt"),
]
produce(outputs){
exec"""
module add R/latest;
Rscript bin/evaluateCrispr_Single.R $input.txt $RANGE $AMPLICON $POSITION;
"""
}
}
The problem is in the next stage. I want to take the second output of the previous stage (this part works) and split that file with awk on the 5th column:
splitClusters = {
doc "splitting the reads of each deletion cluster"
output.dir = "DELETIONS"
exec """
awk -v var="$input.prefix" '{f = var"."\$5".split"; print >> f; close(f)}' $input2;
"""
}
This does generate the files I want, but they are not recognized as outputs.
I tried all kinds of variations, using "produce" or moving the files again, to make Bpipe see them:
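For reference, this is roughly what the awk step does when run on its own, outside of Bpipe. It is only a minimal sketch: the file name sample.CrisprReads.txt, the prefix, and the three data lines are made up for illustration, and I am assuming whitespace-separated columns with the cluster label in column 5. Outside of a Bpipe exec block the dollar sign in $5 does not need the backslash, since there is no Groovy string interpolation to escape from.

```shell
#!/bin/sh
# Standalone sketch of the split step; all file names and data are hypothetical.
set -eu

# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Made-up 5-column input; column 5 is the cluster label.
cat > sample.CrisprReads.txt <<'EOF'
read1 chr1 100 200 clusterA
read2 chr1 150 250 clusterB
read3 chr1 300 400 clusterA
EOF

# Stand-in for the prefix that the pipeline stage passes in as "var".
prefix="sample.CrisprReads"

# Append each line to <prefix>.<column 5>.split and close that same file
# after every write, so many distinct labels do not exhaust open file handles.
awk -v var="$prefix" '{ f = var "." $5 ".split"; print >> f; close(f) }' sample.CrisprReads.txt

# Show which split files were created.
ls *.split
```

With this made-up input the sketch leaves one .split file per distinct label in column 5, named after the prefix and the label. Note that none of these names contain "_R".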
//for FILE in DELETIONS/*.split; do name=\$( basename $FILE .split); mv $FILE $output.dir/\${name}.split; done
Essentially, the pipeline reports that it finished correctly, but the next stage fails because it can't find any "*.split" files. It tries to continue with the inputs from the "Indels" stage instead.
extractReads = {
doc "extract reads which were found to harbour interesting deletions"
output.dir = "CRISPRed_RESULTS"
exec"""
bin/extract_scaffold_version4.pl -f $input.fasta -i $input.split -s > $output.fasta;
"""
}
Note: a pattern '%_R*.split' was provided, but did not match any of the files provided as input [ shortened.Crispr.pdf ,shortened.CrisprReads.txt, shortened.CrisprResults.txt]