Hi Pete,
While we wait for responses from the advanced users and developers of Nextflow, I was wondering if you could give a few more details about the workflow in which you need this join operation on channels.
Before experimenting with Nextflow, I was building my software pipelines with a Python-based Workflow Management System (WMS) named doit. I also briefly tried Snakemake. In both Snakemake and doit, you specify the input and output files for a task along with the commands that generate the outputs from the inputs, and the WMS ensures efficient execution of the tasks in the pipeline. This is similar to workflows in Nextflow.
However, I faced one problem in both systems: if I needed to build the pipeline under various parameter settings (for example, different underlying algorithms, cutoff values, etc.), writing the pipeline became tedious because it required a lot of repeated code. To solve this problem, I came up with an extension of doit in which each file and task has an associated parameter table (implemented as a pandas DataFrame), where the columns represent the parameter names and the rows represent different settings of the parameters. Once you properly associate the parameter tables with the definitions of the files and tasks, the extended WMS (which I call JUDI - Just Do It) makes sure that the pipeline is executed optimally for each setting of the parameters. I have given links to the documentation at the end of this message.
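To make the parameter table idea concrete, here is a minimal sketch in plain pandas. It is not JUDI's actual API; the parameter names, values, and the file naming scheme are made up purely for illustration.

```python
import itertools
import pandas as pd

# Hypothetical parameters and values, chosen only for illustration.
params = {"algorithm": ["algoA", "algoB"], "cutoff": [0.01, 0.05]}

# Parameter table: columns are parameter names, and each row is one
# setting (here, the Cartesian product of all parameter values).
param_table = pd.DataFrame(
    list(itertools.product(*params.values())),
    columns=list(params.keys()),
)

def output_path(setting):
    # Hypothetical naming scheme: one output file per parameter setting.
    return f"results/{setting['algorithm']}_cutoff{setting['cutoff']}.txt"

# One concrete file instance per row of the parameter table.
paths = [output_path(row) for _, row in param_table.iterrows()]
```

The point is that the file or task is defined once, and its instances are derived from the rows of the table rather than written out by hand.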
I mention this because I guess your need for join comes from the fact that you need to match the common settings of some 'parameters' in the two channels. It would be great to know whether that is the case.
I came to Nextflow because the underlying WMS I was using (doit) does not support execution in different environments such as HPC clusters, the cloud, Docker, etc. The developer of doit does not seem to have the resources to support these, and it would be a burden for me to implement them myself. Besides, why should I reinvent the wheel when systems like Nextflow already offer excellent support for them?
In fact, I believe that the handling of parameters could be implemented nicely in Nextflow if some kind of 'indexed' channel were supported. Unlike the current implementation of channels in Nextflow, these indexed channels would be similar to a pandas Series, where each value of the channel is associated with one setting of the index variables. For your example, the indexed version [[0,1], [1,2], [2,3]] of your channel with values [1,2,3] could easily be implemented using such channels. Here, there is one unnamed index (the first element of each tuple). The proposed channels should support any number of index variables, each of them named.
If a process takes more than one input channel and those channels share one or more index variables, the Nextflow executor should create one process instance for each setting in the join of the indexes on the common variables.
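As a rough illustration of what I mean by an indexed channel, here is the analogy written in pandas. This is only a model of the idea, not anything Nextflow provides today; the index names and file names in the second part are hypothetical.

```python
import pandas as pd

# The example channel [1, 2, 3] with one unnamed positional index.
indexed = pd.Series([1, 2, 3], index=[0, 1, 2])

# Pairing each value with its index gives the [index, value] tuples.
pairs = [[i, v] for i, v in indexed.items()]

# The same idea with several named index variables (hypothetical names):
# each channel value is tied to one setting of (algorithm, cutoff).
named = pd.Series(
    ["algoA_0.01.txt", "algoB_0.01.txt"],
    index=pd.MultiIndex.from_tuples(
        [("algoA", 0.01), ("algoB", 0.01)],
        names=["algorithm", "cutoff"],
    ),
)
```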
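The join on common index variables could behave roughly like the pandas merge below. Again, this is just a sketch of the proposed semantics under my own assumptions: the two channels, their index variables, and the file names are all invented for the example.

```python
import pandas as pd

# Hypothetical channel A, indexed by 'sample'.
a_index = ["sample"]
a = pd.DataFrame({"sample": ["s1", "s2"], "reads": ["s1.fq", "s2.fq"]})

# Hypothetical channel B, indexed by 'sample' and 'algorithm'.
b_index = ["sample", "algorithm"]
b = pd.DataFrame({
    "sample": ["s1", "s1", "s2"],
    "algorithm": ["algoA", "algoB", "algoA"],
    "index_file": ["s1_A.idx", "s1_B.idx", "s2_A.idx"],
})

# Index variables shared by both channels.
common = [name for name in a_index if name in b_index]

# Inner join on the common variables: each resulting row corresponds to
# one process instance, with its inputs taken from both channels.
tasks = a.merge(b, on=common, how="inner")
```

With these inputs the join yields three rows, so the process would run three times: twice for s1 (once per algorithm) and once for s2.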
If you like this idea, could you please express your support for such an extension request?
References:
Thank you,
Soumitra