sub-pipelines?


Mike Smoot

Apr 20, 2016, 4:08:58 PM
to Nextflow
Hi,

Is there any facility in Nextflow for one pipeline to use another pipeline, i.e. as a sub-pipeline?  If so, is there an example anywhere that demonstrates this?


thanks,
Mike

Paolo Di Tommaso

Apr 21, 2016, 8:18:36 AM
to nextflow
Hi Mike, 

Sub-pipeline composition is not supported at this time. However, there are other approaches that let you reuse parts of your pipeline code, including Java and Groovy libraries, externalised process scripts, and process templates.
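
As an illustration of the process-template approach mentioned above, here is a minimal sketch. It assumes a hypothetical template file `templates/count_reads.sh` stored next to the pipeline script; the process name, channel names, and input glob are likewise made up for the example.

```groovy
// main.nf: hypothetical example, all names are illustrative
reads_ch = Channel.fromPath('data/*.fastq')

process count_reads {
  input:
  file reads from reads_ch

  output:
  file 'count.txt' into counts

  // The command lives in templates/count_reads.sh, so several
  // pipelines can share the same script. The template could contain:
  //
  //   #!/bin/bash
  //   wc -l < $reads > count.txt
  //
  // where $reads is filled in by Nextflow from the process input.
  script:
  template 'count_reads.sh'
}
```

Another pipeline in the same project could declare a process that points at the same template file, which gives reuse at the level of a single step rather than a whole sub-pipeline.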

Hope it helps. 

Cheers,
Paolo   


Mike Smoot

Apr 21, 2016, 2:16:00 PM
to Nextflow
Thanks Paolo, but do you mind elaborating on the solutions?  Here is what I understand from your response:
  • Java/Groovy library:  Write a Java or Groovy library that wraps a call to a sub-pipeline and then trigger this library with a function call somewhere in your primary pipeline script.  If my understanding is correct, can you point me in the direction of the class or classes that I should be using to trigger a pipeline?
  • process scripts externalization:  I don't understand what you mean here.
  • process templates:  Write a shell script that calls the nextflow executable with a templated .nf file.
FWIW, here's what I'd like to be able to do:

process run_sub_pipeline_whatever {
  input:
  file file_for_pipeline from some_channel

  output:
  file file_from_pipeline into some_other_channel

  """
  some_syntax_for_executing_pipeline($file_for_pipeline)
  """
}


If adding this sort of functionality is relatively straightforward, I'd be happy to implement and contribute back.  This is a crucial feature for our group as our pipelines are already over 50 steps (and growing) in some cases. 

thanks,
Mike

Paolo Di Tommaso

Apr 21, 2016, 2:39:26 PM
to nextflow
Hi, 

I'm happy to elaborate more if you can elaborate your needs a bit more :) 

Joking aside, what is your use case in detail? Which parts of your pipeline would you like to modularise, and how?


Thanks a lot.

Cheers,
Paolo
 

Mike Smoot

Apr 21, 2016, 4:10:14 PM
to Nextflow
Sure, so the immediate use case is that I've got one pipeline that does some taxonomic analysis on a directory full of sequence reads.  We've got this currently working with Ruffus, but Ruffus has been a struggle when pipelines get longer. This pipeline currently has 14 steps (or processes in Nextflow).  I'd like to be able to run this pipeline standalone, but I'd also need to be able to run this pipeline at the end of another pipeline we've got.  I can't simply copy/paste those 14 steps into another script.  

This other beast of a pipeline has something in the neighborhood of 50 steps.  Trying to get it into Ruffus is what's persuaded me to look elsewhere (among other reasons).  Moreover, as I've been looking at the beast pipeline, it's clear that it could benefit from some modularity.  There are sections that should clearly be separated into their own little workflows.  

Finally, I've got at least 3 other pipelines that I'll be converting over and I expect that many of these pipelines will be doing similar things to the first two.  If I can modularize things into sub-pipelines then we'll be well on our way to a composable library of pipelines, which is our ultimate goal.

I'm not sure I can share actual pipeline code as that's proprietary, but I should be able to contribute back to Nextflow if that works out.


thanks,
Mike

Paolo Di Tommaso

Apr 21, 2016, 5:52:30 PM
to nextflow
Hi Mike, 

Thanks for your detailed feedback. As I wrote in my previous email, Nextflow does not provide native language support for sub-pipeline execution. I agree that such a feature would be useful for a use case like yours, and any contribution towards it is more than welcome (though I expect it would not be a trivial task).

That said, note that in Nextflow the process is the unit of component reuse. A process can execute any script or tool accessible in the hosting environment, so it can also be used to run Nextflow scripts themselves.

For example, if you have two (or more) pipeline scripts living in the same project repository, let's say alpha.nf and omega.nf, you can easily aggregate them in a single script by writing something similar to the following:


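process A {
  input:
  ..

  output:
  ..

  // double quotes, so that Nextflow expands $baseDir before the script runs
  """
  nextflow run $baseDir/alpha.nf
  """
}

process B {
  input:
  ..

  output:
  ..

  // double quotes, so that Nextflow expands $baseDir before the script runs
  """
  nextflow run $baseDir/omega.nf
  """
}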
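
To make the wrapper idea above a little more concrete, the following sketch passes data into the nested run and collects its results. It is only an illustration under stated assumptions: it assumes that alpha.nf reads its input from `params.reads` and writes its results to the directory given by `params.outdir`; those parameter names, the channel names, and the paths are hypothetical.

```groovy
// main.nf: hypothetical wrapper around alpha.nf, all names are illustrative
params.reads = "$baseDir/data/reads"

// Emit the reads directory as a single item
reads_ch = Channel.fromPath(params.reads, type: 'dir')

process run_alpha {
  input:
  file reads_dir from reads_ch

  output:
  file 'alpha_out' into alpha_results

  // --reads and --outdir set params.reads and params.outdir inside
  // alpha.nf (assumed to exist there); the nested run uses the task
  // directory as its launch directory.
  """
  nextflow run $baseDir/alpha.nf --reads $reads_dir --outdir alpha_out
  """
}

// Downstream processes could consume alpha_results; here it is only printed
alpha_results.subscribe { println "alpha finished: $it" }
```

Because alpha.nf takes its input through a parameter, the same script can still be launched on its own with `nextflow run alpha.nf --reads <dir>`.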


Also keep in mind that the configuration file plays an important role in Nextflow: it is used to externalise and parametrise (and, in your use case, to share) many aspects of the pipeline definition.
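
As a rough illustration of sharing settings through the configuration file, a `nextflow.config` placed next to alpha.nf and omega.nf might look like the sketch below. The parameter names (`reads`, `outdir`) and the resource values are assumptions made for the example, not settings taken from any real pipeline.

```groovy
// nextflow.config: hypothetical settings shared by alpha.nf and omega.nf
params {
  reads  = "$baseDir/data/reads"   // assumed default input location
  outdir = 'results'               // assumed output directory name
}

// Defaults applied to every process in the scripts that load this file
process {
  cpus   = 2
  memory = '4 GB'
}
```

Each script would then refer to `params.reads` and `params.outdir`, and a value can still be overridden on the command line, for example with `--reads /some/other/dir`.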


The only problem I see with this approach is that if you are running it on a computing cluster (which I guess you are), your "main" pipeline jobs need to be submitted to a queue whose nodes are allowed to submit other jobs (the alpha and omega ones) to the cluster. In some scenarios this configuration is actually considered a best practice.

In my opinion this can be a valid workaround for handling sub-pipelines effectively until Nextflow has native support for them.
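
One way to express this, sketched here with made-up values, is to set the `executor` and `queue` directives on the wrapper process so that its job lands on nodes that may submit further jobs. The executor name `sge` and the queue name `submit-enabled` are placeholders; the real values depend on the cluster.

```groovy
// Hypothetical executor and queue names; use the ones defined on your cluster
process A {
  executor 'sge'
  queue 'submit-enabled'

  // The nested run submits its own jobs according to the
  // configuration that applies to alpha.nf
  """
  nextflow run $baseDir/alpha.nf
  """
}
```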


Do you think it can work for you? 


Cheers,
Paolo



Mike Smoot

Apr 21, 2016, 6:56:42 PM
to Nextflow
Thanks Paolo, I think that should work, at least for now.  Your guidance is much appreciated!


Mike

Paolo Di Tommaso

Apr 22, 2016, 7:38:42 AM
to Nextflow
Great. Feel free to join the discussion on Gitter if you need to.

Cheers,
Paolo

Fabien Campagne

Apr 27, 2016, 8:17:32 AM
to Nextflow
Hi Mike, Paolo,

This is a use case I am also interested in supporting in NextflowWorkbench (NW). We also need to reuse previously developed workflows as steps of larger workflows. 

I think in this case an important requirement is the ability to easily substitute input data defined in the reused workflows with data from the enclosing, reusing workflow. We should be able to do this using standard Nextflow because we can generate alternative text outputs from the same NW source (i.e., when the smaller workflow is used by itself, its inputs will be used, but when reused in another workflow, data provided by this workflow will be used). A couple of interns are joining the lab this summer, so let me know if you would be interested in discussing requirements or providing feedback if we add this to the NW roadmap. 

Fabien

Paolo Di Tommaso

Apr 27, 2016, 10:34:44 AM
to nextflow
This is a feature that at some point we need to implement in Nextflow.

I will try to organise some ideas and propose a possible implementation. It would be nice if we could coordinate the effort across both projects.


Cheers,
Paolo

