Thanks, Paolo,
We noticed that the resume issue is caused by the '.groupTuple()' operator, which has already been discussed elsewhere.
Before the process that fails to resume, we apply a groupTuple operator to the channel:
sample_grouped_sorted_bam_files = sorted_bam_ch
    .groupTuple()
    // extract the key string from `GroupKey`
    .map { sample_id, read_group, output_bam -> tuple(sample_id.toString(), read_group, output_bam) }
We noticed that the order in which the keys are emitted is not deterministic.
In the initial run we see these logs:
root@ip-172-26-122-180:/opt/manager# cat /efs/projects/cbu-664-6/nf/.nextflow.log.1 | grep "GroupTuple dynamic size"
Jun-29 07:44:56.628 [Actor Thread 10] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2157 size=1
Jun-29 07:44:56.694 [Actor Thread 10] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2389 size=1
But in the resumed run we see:
[ec2-user@ip-172-26-127-225 ~]$ cat /efs/projects/cbu-664-6/nf/.nextflow.log | grep "GroupTuple dynamic size"
Jun-29 07:35:10.809 [Actor Thread 5] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2389 size=1
Jun-29 07:35:10.811 [Actor Thread 5] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2157 size=1
The key in our case is a GroupKey, and it seems that we need to define a custom sort to make the order deterministic. We have tried:
sample_grouped_sorted_bam_files = sorted_bam_ch
    .groupTuple(sort: { it.toString() })
    // extract the key string from `GroupKey`
    .map { sample_id, read_group, output_bam -> tuple(sample_id.toString(), read_group, output_bam) }
but it doesn't help.
Can you point us to the right way to do this sorting (by the key's value)?
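To make the ordering we are after concrete, here is a minimal sketch, not a verified fix. It assumes that waiting for all groups before emitting any of them is acceptable, which gives up the early emission that GroupKey provides. The toy channel and its file names are hypothetical stand-ins for our real `sorted_bam_ch`. `toSortedList` and `flatMap` are standard channel operators, but we have not confirmed that this restores resume.

```groovy
// Toy stand-in for sorted_bam_ch; read groups and file names are hypothetical.
sorted_bam_ch = Channel.of(
    tuple(groupKey('s_arcanum_la2389', 1), 'rg1', 'la2389.sorted.bam'),
    tuple(groupKey('s_arcanum_la2157', 1), 'rg1', 'la2157.sorted.bam')
)

sample_grouped_sorted_bam_files = sorted_bam_ch
    .groupTuple()
    // extract the key string from `GroupKey`
    .map { sample_id, read_group, output_bam -> tuple(sample_id.toString(), read_group, output_bam) }
    // collect every grouped tuple into one list, ordered by the key string
    .toSortedList { a, b -> a[0] <=> b[0] }
    // re-emit the grouped tuples one by one, now in key order
    .flatMap { it }

sample_grouped_sorted_bam_files.view()
```

Our understanding is that the `sort` option of groupTuple orders the items inside each group rather than the groups themselves, which is why the sketch sorts after grouping instead.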
Thanks!