Resume

Eldad Kdoshim

Jun 29, 2022, 2:15:18 AM

to Nextflow

Hi All,
Is there a log where I can understand why Nextflow did not use the cache?
I'm running a pipeline with 3 processes, and I had a problem in the 3rd process's script.
I've fixed it and reran with -resume <session_id>, and although the other two processes had finished successfully, Nextflow ran them again instead of using the cache.

Any ideas?
Thanks,
Eldad

Paolo Di Tommaso

Jun 29, 2022, 3:00:04 AM

to next...@googlegroups.com

Use the command line option -dump-hashes
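
As a rough sketch of how that option can be used to find the tasks whose hash changed: run both the original and the resumed execution with -dump-hashes, each writing to its own log, then compare the hashes per task. The log file names and the two helper functions below are hypothetical, and the sed pattern assumes each dumped line looks like "[task name] cache hash: <hex>; ...", which may differ between Nextflow versions.

```shell
# Assumed workflow (file names are placeholders):
#   nextflow -log run_1.log run main.nf -dump-hashes
#   nextflow -log run_2.log run main.nf -dump-hashes -resume

# Print "hash task-name" for every "cache hash:" line in a log, sorted.
# Assumes the task name sits in brackets right before "cache hash:".
extract_hashes() {
  grep 'cache hash:' "$1" \
    | sed -E 's/.*\[([^]]+)\] cache hash: ([0-9a-f]+).*/\2 \1/' \
    | sort
}

# Print the names of tasks whose hash appears in only one of the two logs.
changed_tasks() {
  a=$(mktemp)
  b=$(mktemp)
  extract_hashes "$1" > "$a"
  extract_hashes "$2" > "$b"
  # comm -3 drops lines common to both files; tr strips its column indent
  comm -3 "$a" "$b" | tr -d '\t' | cut -d' ' -f2- | sort -u
  rm -f "$a" "$b"
}

# Usage: changed_tasks run_1.log run_2.log
```

Any task listed by changed_tasks got a different hash in the second run, so the dumped entries for that task in both logs are the place to look for the input that differs.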

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nextflow/49d72f48-ad0d-486c-8529-59e2cf217aa1n%40googlegroups.com.

Eldad Kdoshim

Jun 29, 2022, 4:00:37 AM

to Nextflow

Thanks, Paolo,

We noticed that the resume issue was caused by the '.groupTuple()' operator, which has already been discussed in other threads.

Before the process that fails to resume, we apply a groupTuple operator to the channel:

sample_grouped_sorted_bam_files = sorted_bam_ch
    .groupTuple()
    // extract the key string from `GroupKey`
    .map { sample_id, read_group, output_bam -> tuple(sample_id.toString(), read_group, output_bam) }

We noticed that it makes the order of the keys non-deterministic.
In the initial run we see these logs:

root@ip-172-26-122-180:/opt/manager# cat /efs/projects/cbu-664-6/nf/.nextflow.log.1 | grep "GroupTuple dynamic size"
Jun-29 07:44:56.628 [Actor Thread 10] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2157 size=1
Jun-29 07:44:56.694 [Actor Thread 10] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2389 size=1

But in the resume run we see:

[ec2-user@ip-172-26-127-225 ~]$ cat /efs/projects/cbu-664-6/nf/.nextflow.log | grep "GroupTuple dynamic size"
Jun-29 07:35:10.809 [Actor Thread 5] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2389 size=1
Jun-29 07:35:10.811 [Actor Thread 5] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2157 size=1
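
A small sketch for checking this on larger logs, where comparing by eye is not practical: extract the keys from the "GroupTuple dynamic size" lines of both logs and compare them with and without sorting. The function names are hypothetical; the pattern only relies on the "key=... size=N" format visible in the lines above.

```shell
# Print the GroupTuple keys from a Nextflow log, in the order they were logged.
group_keys() {
  grep 'GroupTuple dynamic size' "$1" \
    | sed -E 's/.*key=(.*) size=[0-9]+.*/\1/'
}

# Report whether two logs hold the same keys and whether the order matches.
compare_key_order() {
  first=$(group_keys "$1")
  second=$(group_keys "$2")
  if [ "$first" = "$second" ]; then
    echo "same keys, same order"
  elif [ "$(printf '%s\n' "$first" | sort)" = "$(printf '%s\n' "$second" | sort)" ]; then
    echo "same keys, different order"
  else
    echo "different keys"
  fi
}

# Usage:
#   compare_key_order /efs/projects/cbu-664-6/nf/.nextflow.log.1 \
#                     /efs/projects/cbu-664-6/nf/.nextflow.log
```

This only confirms that the emission order differs between the two runs. It does not show by itself which task input changed its hash.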

The key in our case is a GroupKey, and it seems that we need to define a custom sort to make the order deterministic. We have tried:

sample_grouped_sorted_bam_files = sorted_bam_ch
    .groupTuple(sort: { it.toString() })
    // extract the key string from `GroupKey`
    .map { sample_id, read_group, output_bam -> tuple(sample_id.toString(), read_group, output_bam) }

but it doesn't help.
Can you point us to the right way to do this sorting (by the key's value)?

Thanks!

drhp...@gmail.com

Jun 29, 2022, 5:48:29 AM

to Nextflow

Hi Eldad!

It might be a good idea to do the sort directly in the process itself like here. This means you won't have to always sort the channels before passing to the process if the module is to be re-used elsewhere.

Hope that helps!

Cheers,

Harshil

Eldad Kdoshim

Jun 29, 2022, 6:58:22 AM

to next...@googlegroups.com

Harshil, thanks for the reply.

We don't need the sorting in the process. I want the resume to work, and for that I need the result of groupTuple() to be sorted.

Maybe my previous mail wasn't clear (I wrote it in a Markdown editor, but Gmail sent it as plain text):
After the first run I see in the log:

DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2157 size=1

DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2389 size=1

but on the 2nd run, with the -resume flag, the same lines appear in a different order:

DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2389 size=1

DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=s_arcanum_la2157 size=1

Since the order is not preserved, the resume mechanism does not work: the processes run again instead of getting their results from the cache.

So we tried to sort with

sorted_bam_ch.groupTuple( sort: {it.toString()})

but it doesn't seem to work.
sorted_bam_ch contains tuples of 3 values, where the first value is the key, of type GroupKey (that is why we use toString()).

Thanks!

