How do I make NF use a group of files as input in e.g. MarkDuplicates?

Oskarv

Jul 24, 2017, 9:20:10 AM
to Nextflow
I need to merge a bunch of bam files with MarkDuplicates, and the syntax is
java -jar MarkDuplicates --input file-1 --input file-2 --input file-n etc...

How do I make NF repeat "--input" for each file from the channel?

Paolo Di Tommaso

Jul 24, 2017, 9:25:54 AM
to nextflow
Provided `bam_files` is the input list of files, something like the following should work:

script: 
def input_args = bam_files.collect{ "--input $it" }.join(" ")

"""
java -jar MarkDuplicates $input_args
"""






--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

Oskarv

Jul 24, 2017, 9:48:54 AM
to Nextflow
Something seems to be off. I'm getting this error message:
Process `null` script contains error(s)

The previous tool produces the bam files with these lines:
output:
file "bwa.fastqtosam.mergebam.bam" into MergeBamAlignment_output

Here's my code:
process MarkDup {

    input:
    file gatk4

    output:
    file("mergebam.fastqtosam.bwa.bam") into MarkDup_bamoutput
    file("mergebam.fastqtosam.bwa.bai") into MarkDup_baioutput
    file("markduplicates.metrics")

    script:
    def input_args = MergeBamAlignment_output.collect{ "--input $it" }.join(" ")

    '''
    java -Dsnappy.disable=true -Xmx16G -XX:ParallelGCThreads=16 -Djava.io.tmpdir=`pwd`/tmp -jar \
      !{gatk4} \
      MarkDuplicates \
      $input_args \
      -O mergebam.fastqtosam.bwa.bam \
      --VALIDATION_STRINGENCY LENIENT \
      --METRICS_FILE markduplicates.metrics \
      --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 200000 \
      --CREATE_INDEX true
    '''

}

What am I missing?




Paolo Di Tommaso

Jul 24, 2017, 9:54:30 AM
to nextflow
The process is missing the input declaration for that channel; without it, the process cannot access the channel.

See some example 




p


Oskarv

Jul 25, 2017, 2:57:31 AM
to Nextflow
I'm sorry but this doesn't make any sense to me. It's the fourth time I'm writing code in NF, so I don't have the basics down yet. Could you elaborate on your answer?

I've tried to figure it out. From what I can gather from your code example, it defines `input_args` by collecting all files from the previous tool that produced the files to be merged. So `Previous_Tool.collect{ "--input $it" }` takes all files from the previous tool and puts each one in the `$it` variable. But `$it` hasn't been declared anywhere, so I tried defining it under `input:` with `file it from Previous_Tool`. That's not allowed; it says "Channel "Previous_Tool" has been used as an input by more than a process or an operator", and that other use is of course the `def input_args = etc` process/operator. Either way, it ends with `.join(" ")`, which I suppose puts each `--input $it` created by `.collect` in a row.

The only thing I can imagine would solve it is if I define "$it", but how do I do that if I can't use "Previous_Tool" twice? It doesn't work if I just create "Previous_Tool2" with the same input files, so that "file it from Previous_Tool2" would have defined "$it" for the "def" line.

I'm not sure how much sense this makes, but I've done some trial and error at least, what am I missing?
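
As a small illustration of the point in question, here is a plain Groovy sketch of what the `collect{ ... }.join(" ")` line does. The file names are hypothetical stand-ins for the staged BAM files. `it` is not something that has to be declared: it is the implicit parameter Groovy gives to a one-argument closure, so it takes the value of each list element in turn.

```groovy
// Hypothetical file names standing in for the BAM files staged by Nextflow.
def bam_files = ['sample1.bam', 'sample2.bam', 'sample3.bam']

// `it` is the implicit parameter of the closure: each list element in turn.
def input_args = bam_files.collect { "--input $it" }.join(' ')

// The same thing with an explicitly named closure parameter.
def same_args = bam_files.collect { bam -> "--input $bam" }.join(' ')

println input_args
assert input_args == same_args
```

The list method `collect` returns a new list with one string per file, and `join(' ')` concatenates those strings with a space between them.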

Paolo Di Tommaso

Jul 25, 2017, 3:35:07 AM
to nextflow
Referring to your previous example, you are trying to access the input variable `MergeBamAlignment_output`, which is not declared anywhere in the process MarkDup. You need to add a definition like `file MergeBamAlignment_output` to the input section; otherwise it won't be accessible.



Oskarv

Jul 25, 2017, 3:51:19 AM
to Nextflow
That kind of worked, but now it no longer finds !gatk4 or $gatk4; it says it's an unbound variable. So I hardcoded the absolute path, and then NF complained that $input_args is an unbound variable. Using !input_args instead makes GATK complain that !input_args is an invalid argument.
1. What's the difference between ! and $?
2. How do I make the variable bound? I.e., how did !gatk4 become unbound when I added "def ...etc"?

Thanks for your swift replies btw.

Paolo Di Tommaso

Jul 25, 2017, 4:23:56 AM
to nextflow
Check script vs shell 
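
A rough illustration of why the quoting matters, in plain Groovy (the language Nextflow scripts are written in). A command block in single quotes is passed through untouched, so Bash receives a literal `$input_args` and reports it as unbound. In a block in double quotes the variable is substituted before the command runs. The `!{...}` form is only substituted inside a `shell:` block. The value of `input_args` below is made up for the example.

```groovy
// Hypothetical value, standing in for the string built in the process.
def input_args = '--input a.bam --input b.bam'

// Triple single quotes: a plain string, nothing is substituted.
def single = '''MarkDuplicates $input_args'''

// Triple double quotes: $input_args is replaced by its value.
def dbl = """MarkDuplicates $input_args""".toString()

println single
println dbl
```

The first line printed still contains the text `$input_args`, which is what Bash would then try (and fail) to expand; the second contains the actual arguments.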


Oskarv

Jul 25, 2017, 4:54:58 AM
to Nextflow
I needed to change
'''
code here
'''

to
"""
code here
"""

But it turns out it still did the wrong thing. It was supposed to create
java -jar MarkDuplicates --input file1 --input file2 etc
but instead it created two processes with
java -jarMarkduplicates --input file1
for file 1 and the same for file 2.

What went wrong?

Oskarv

Jul 25, 2017, 4:56:57 AM
to Nextflow
It would be nice to be able to edit a post, I wrote "java -jarMarkDuplicates" but there's supposed to be a space between -jar and MarkDuplicates.

Paolo Di Tommaso

Jul 25, 2017, 5:03:19 AM
to nextflow
If you want MarkDup to gather all the BAM files from the previous step, you will need to use `collect` on the channel to gather all the files from that step.

For example: 

 process MarkDup {

    input:
    file gatk4
    file bam_files from file MergeBamAlignment_output.collect() 

    etc 

}


Note that the BAM file variable is now `bam_files`. Do not confuse the earlier `collect` in the script section (a Groovy list method) with the `collect()` above (a channel operator).


p



Oskarv

Jul 25, 2017, 5:25:31 AM
to Nextflow
Alright, it works now. You accidentally wrote
file bam_files from file MergeBamAlignment_output.collect()
which confused me a bit. In case anyone else reads this, it's supposed to be
file bam_files from MergeBamAlignment_output.collect()

And I also had to change
def input_args = MergeBamAlignment_output.collect{ "--input $it" }.join(" ")
to

def input_args = bam_files.collect{ "--input $it" }.join(" ")
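
Putting the corrections from this thread together, the process might look roughly like the sketch below. It uses the same `from`/`into` syntax as the rest of the thread and assumes that `gatk4` and `MergeBamAlignment_output` are defined earlier in the pipeline. Writing the jar as `${gatk4}` inside a double-quoted block is an assumption based on the script-vs-shell hint, not something spelled out in the thread.

```groovy
// Nextflow process sketch; `gatk4` and `MergeBamAlignment_output`
// are assumed to be defined earlier in the pipeline.
process MarkDup {

    input:
    file gatk4
    // Channel operator collect(): gathers every emitted BAM into one list,
    // so the process runs once with all files instead of once per file.
    file bam_files from MergeBamAlignment_output.collect()

    output:
    file("mergebam.fastqtosam.bwa.bam") into MarkDup_bamoutput
    file("mergebam.fastqtosam.bwa.bai") into MarkDup_baioutput
    file("markduplicates.metrics")

    script:
    // Groovy list method collect{}: one "--input <file>" string per BAM.
    def input_args = bam_files.collect{ "--input $it" }.join(" ")

    // Double quotes, so ${gatk4} and $input_args are substituted
    // before the command is handed to Bash.
    """
    java -Dsnappy.disable=true -Xmx16G -XX:ParallelGCThreads=16 -Djava.io.tmpdir=`pwd`/tmp -jar \
      ${gatk4} \
      MarkDuplicates \
      $input_args \
      -O mergebam.fastqtosam.bwa.bam \
      --VALIDATION_STRINGENCY LENIENT \
      --METRICS_FILE markduplicates.metrics \
      --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 200000 \
      --CREATE_INDEX true
    """
}
```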

Thanks for all the help! I've been using WDL and Cromwell, but NF seems to have the same features without needing a MySQL server and a Cromwell server for call caching. I can probably switch, and possibly gain some features that are missing in WDL.

Cheers!
/Oskar

Paolo Di Tommaso

Jul 25, 2017, 5:32:40 AM
to nextflow
Nice. That's interesting. I would like to see a more consistent comparison between WDL and Nextflow. 

Let me know if you need help. 


Best,
Paolo



Oskarv

Jul 25, 2017, 7:37:01 AM
to Nextflow
If you have any questions about WDL, feel free to reply in a PM, I've used it for almost a year now so I should be able to provide some insight.

/Oskar

Paolo Di Tommaso

Jul 25, 2017, 7:49:59 AM
to nextflow
Frankly, I have no real experience with it, so I don't have any specific questions. But I'm interested in supporting such a comparison, ideally with a real-world use case.

I would very much welcome your feedback on that.


Cheers,
Paolo

