Best approach to handle two different input modes for RNA seq pipeline?

rickard....@scilifelab.se

Apr 19, 2016, 9:44:24 AM
to Nextflow

Hi! 

I'm currently developing an RNA-Seq pipeline using Nextflow. The pipeline is supposed to handle both single-end (SE) and paired-end (PE) data, and I have some questions about how to implement such a switch. Conditional script execution seems to work well and is easy to use, but defining different input/output files for each process is not as straightforward. I tried to run the code below but ran into problems.


process fastqc {

    module 'bioinfo-tools'
    module 'FastQC'

    memory '2 GB'
    time '1h'

    publishDir "$results_path/fastqc"

    if (mode == 'SE') {
        input:
        file read1 from read1
    } else if (mode == 'PE') {
        input:
        file read1 from read1
        file read2 from read2
    }

    output:
    file '*_fastqc.html' into fastqc_html
    file '*_fastqc.zip' into fastqc_zip

    script:
    if (mode == 'SE')
        """
        fastqc -q ${read1}
        """
    else if (mode == 'PE')
        """
        fastqc -q ${read1} ${read2}
        """
}


I am assuming that the issue is with the input: block; from the docs I gathered that Nextflow only allows one input block. I also tested something like this:

    input:
    file read1 from read1

    if (mode == 'PE') {
        file read2 from read2
    }
 
This does not work either, so the input block does not seem to tolerate if statements.


So the kind of conditional input I am trying above does not seem to be supported in Nextflow. My question is then how I should design the pipeline to handle a varying number of input and output files. An easy solution would be to simply have two different Nextflow scripts, one for PE and one for SE. But that's a very redundant and inelegant solution.

Is it better to have entirely different processes that are fired based on which mode is specified? That also seems to lead to some code redundancy, but might perhaps be the suggested method?


Is there a suggested way to approach this?


Sincerely

Rickard Hammarén

Bioinformatician at the Genomics Applications Development Facility
National Genomics Infrastructure at SciLifeLab Stockholm, Sweden



Paolo Di Tommaso

Apr 19, 2016, 10:09:06 AM
to nextflow
Hi Rickard, 

You are right, conditional inputs are not supported by Nextflow. 

However, it is definitely possible to handle multiple inputs in the same process. For example, we are doing something very similar to what you are trying to implement in this pipeline script.

You may notice that the `mapping` process declares an input of type `set`. In this example the first element represents the name of the read(s), while the second element can hold one or more read files.

Then it is enough to check whether the `reads` input element is an instance of `Path` (just a single file) or not (in the case of multiple files).
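
A minimal sketch of that check, not the code from the linked pipeline. It assumes a hypothetical channel named `read_files` that emits a sample name plus one or two read files, and it uses Trim Galore purely as an illustrative command:

```groovy
// Sketch only: `read_files` is a hypothetical channel emitting
// [ sample_name, reads ], where `reads` is one file (SE) or two files (PE).
process trim {
    input:
    set val(name), file(reads) from read_files

    output:
    file '*fq.gz' into trimmed_reads

    script:
    if (reads instanceof Path)
        // A single file arrives as a Path: single-end data
        """
        trim_galore --gzip ${reads}
        """
    else
        // Several files arrive as a list: paired-end data
        """
        trim_galore --paired --gzip ${reads[0]} ${reads[1]}
        """
}
```

Because the SE/PE decision is made per item from the value of `reads`, no global `mode` switch and no conditional `input:` block is needed.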

The tricky part is creating the channel that emits the reads (or read pairs) and groups them properly, which is done by the code snippet at these lines
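
Since the referenced lines are not reproduced in this thread, the following is only a rough sketch of one way such a channel could be built, not the code from that pipeline. The `params.reads` pattern, the `_1`/`_2` file-name suffix convention, and the channel name `read_files` are all assumptions made for illustration:

```groovy
// Hypothetical glob: matches sample_1.fastq.gz and, when present, sample_2.fastq.gz
params.reads = 'data/*_{1,2}.fastq.gz'

Channel
    .fromPath(params.reads)
    // Key each file by its name with the assumed _1/_2 suffix stripped
    .map { path -> [ path.name.replaceAll('_[12]\\.fastq\\.gz$', ''), path ] }
    // Collect files sharing a key: [ name, [read1] ] for SE, [ name, [read1, read2] ] for PE
    .groupTuple(sort: true)
    .set { read_files }
```

With single-end data each group holds one file, and with paired-end data it holds two, so the same channel can feed a process in either case.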
 
  
Let me know if I can help further. 

Cheers,
Paolo





--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+u...@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

rickard....@scilifelab.se

Apr 20, 2016, 2:25:33 AM
to Nextflow
Hi Paolo! 
Thanks for the quick reply. This is very helpful and I'll probably end up using your code. 
I'll let you know if I end up having more questions.
Thanks!
Rickard

rickard....@scilifelab.se

Apr 21, 2016, 3:08:11 AM
to Nextflow
Hi again Paolo! 
I have another question, this time about how to use the output of one process as the input to another. In this case Trim Galore will output one file for SE data and two for PE data. The following STAR aligner process will then get either one or two input files.

    output:
    file '*fq.gz' into trimmed_reads

That would probably work for catching the output files, but then I have a channel of files, and I want STAR to use both files as input in a single run rather than running once per file, which I gathered is the default behaviour. So I need to do something along the lines of what you did with the input files, i.e. declare a set of files. The question is then: do I declare a separate channel for that as before, or can I handle it in one of the processes?
/Rickard

Marc Logghe

Apr 21, 2016, 3:44:24 AM
to Nextflow
Hi Rickard,
If you want both (or rather all) produced files in the trimmed_reads channel you could do something like this:

processB {
  input:
  file('*') from trimmed_reads
}

As a result of that, in the work directory of processB you will have links to all '*fq.gz' files produced in trimmed_reads.

On Thursday, April 21, 2016 at 09:08:11 UTC+2, rickard....@scilifelab.se wrote:

Paolo Di Tommaso

Apr 21, 2016, 7:58:54 AM
to nextflow
Yes, this should work. The two processes should look something like this: 

process trim {
  output: 
  file '*.fq.gz' into trimmed_reads

  """
  trim command line ..
  """
}

process align {
  input: 
  file '*' from trimmed_reads

  """ 
  STAR --readFilesIn *.fq.gz 
  """ 
}


Note that, as long as the only `*.fq.gz` files are the expected read file(s), Bash will automatically expand the glob pattern to the actual file names in a consistent manner, i.e. sorted alphabetically.

However, if for whatever reason you need to reference those files explicitly in the script with a variable, you can do it this way:

process align {
  input: 
  file (reads:'*') from trimmed_reads

  """ 
  STAR --readFilesIn ${reads} 
  """ 
}
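
If the sample name also needs to travel with the trimmed reads (for example to name the STAR output), the same idea could be combined with a `set` declaration. This is only a sketch under assumptions: the `read_files` channel, the `params.index` parameter, and the chosen STAR options are illustrative and are not taken from the pipeline discussed above:

```groovy
process trim {
    input:
    // Hypothetical channel emitting [ name, reads ] with one or two read files
    set val(name), file(reads) from read_files

    output:
    // The one or two trimmed files are emitted together with the sample name
    set val(name), file('*fq.gz') into trimmed_reads

    """
    trim command line ..
    """
}

process align {
    input:
    set val(name), file(reads) from trimmed_reads

    // ${reads} renders as one or two space-separated file names
    """
    STAR --genomeDir ${params.index} --readFilesIn ${reads} --readFilesCommand zcat --outFileNamePrefix ${name}_
    """
}
```

Each item in `trimmed_reads` then carries all files of one sample, so `align` runs once per sample rather than once per file.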


Hope it helps. 

Cheers,
Paolo

Rickard Hammarén

Apr 21, 2016, 8:07:41 AM
to next...@googlegroups.com
Great! Yeah, that seems like a very simple way of handling it.
Thanks for the help guys!

Rickard
