cwltool - use intermediate step's output file as next step's secondaryFile

Phillis Tang

Apr 28, 2016, 4:19:23 PM
to common-workflow-language
Hey guys,
we have a scenario where a workflow's first step takes a bam file as input and generates a bai file. The next step expects the workflow's input bam with the first step's bai as a secondary file. The problem is that I can't get those two files into the same directory, because the bai file sits in the temporary directory that cwltool created. Do you know if there is a way to achieve that?

Nebojsa Tijanic

May 5, 2016, 11:11:27 AM
to Phillis Tang, common-workflow-language
Hi Phillis,

Indexers are tricky to handle. For this use case, I think the first tool should have an output that "attaches" the index as a secondary file to the original input bam. So the workflow would be something like bam -> indexer() -> indexed_bam -> variant_caller()


Since the indexer will likely create the .bai in the same folder where the .bam file is located, you should also likely stage the input bam in the working (output) directory.

So, the indexer would look something like:

class: CommandLineTool
# can stage the input using this requirement.
requirements:
  - class: CreateFileRequirement
    fileDef:
      - {filename: input.bam, fileContent: $(job.bam_file)}
inputs:
  - id: bam_file
    type: File
    inputBinding: {}
outputs:
  - id: indexed_bam
    type: File
    outputBinding: {glob: input.bam, secondaryFiles: [.bai]}
baseCommand: [samtools, index]

I'll see if I can find/write an example workflow that forwards indexed files.


--
You received this message because you are subscribed to the Google Groups "common-workflow-language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to common-workflow-la...@googlegroups.com.
To post to this group, send email to common-workf...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/common-workflow-language/5f754f18-7d58-4fef-b6cf-9e5d79906bb7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Phillis Tang

May 10, 2016, 3:19:00 PM
to Nebojsa Tijanic, Jeremiah Savage, Shenglai Li, common-workflow-language

Shenglai Li

May 10, 2016, 5:43:09 PM
to Phillis Tang, Nebojsa Tijanic, Jeremiah Savage, common-workflow-language
Hi all,

I've tried CreateFileRequirement but, please correct me if I got this wrong, it only worked when I hard-coded the actual name of the input bam into the CWL. Is it possible to use something like $(job.bam_file) for the filename? (It failed when I tried, but that could be my mistake.) Thanks!

Shenglai
________________________________________
From: Phillis Tang [phill...@gmail.com]
Sent: Tuesday, May 10, 2016 2:18 PM
To: Nebojsa Tijanic; Jeremiah Savage; Shenglai Li
Cc: common-workflow-language
Subject: Re: cwltool - use intermediate step's output file as next step's secondaryFile

On Thu, May 5, 2016 at 10:11 AM Nebojsa Tijanic <nebojsa...@sbgenomics.com> wrote:
Hi Phillis,

Indexers are tricky to handle. For this use case, I think the first tool should have an output that "attaches" the index as a secondary file to the original input bam. So the workflow would be something like bam -> indexer() -> indexed_bam -> variant_caller()

You can see an example of attaching secondary files here: https://github.com/common-workflow-language/workflows/blob/master/tools/samtools-faidx.cwl#L35

Since the indexer will likely create the .bai in the same folder where the .bam file is located, you should also likely stage the input bam in the working (output) directory.

So, the indexer would look something like:

class: CommandLineTool
# can stage the input using this requirement.
# see http://www.commonwl.org/draft-3/CommandLineTool.html#CreateFileRequirement
requirements:
  - class: CreateFileRequirement
    fileDef:
      - {filename: input.bam, fileContent: $(job.bam_file)}
inputs:
  - id: bam_file
    type: File
    inputBinding: {}
outputs:
  - id: indexed_bam
    type: File
    outputBinding: {glob: input.bam, secondaryFiles: [.bai]}
baseCommand: [samtools, index]

I'll see if I can find/write an example workflow that forwards indexed files.



Nebojsa Tijanic

May 11, 2016, 6:12:48 AM
to Shenglai Li, Phillis Tang, Jeremiah Savage, common-workflow-language
Hi,

It should be possible to use a JavaScript snippet for the file name. Perhaps replace "input.bam" (both in CreateFileRequirement and in the outputBinding glob) with $(job.bam_file.path.split('/').pop())
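
To illustrate what that expression evaluates to, here is a minimal plain-JavaScript sketch that can be run outside of CWL. The `job` object is only a hypothetical stand-in for what the engine exposes to expressions, the path is made up, and the `basename` helper name is introduced here purely for illustration.

```javascript
// Hypothetical stand-in for the job object that expressions can read.
// The path is an invented example, not taken from a real run.
const job = { bam_file: { class: "File", path: "/data/run1/sample.bam" } };

// Same logic as the suggested expression: split on "/" and keep the last
// component, i.e. the file name without any directory part.
function basename(path) {
  return path.split('/').pop();
}

const stagedName = basename(job.bam_file.path);
console.log(stagedName); // sample.bam
```

Note that everything before the last "/" is discarded, so two inputs with the same file name in different directories would map to the same staged name.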

Samuel Lampa

May 16, 2016, 10:41:51 AM
to common-workflow-language, sl...@uchicago.edu, phill...@gmail.com, jhsa...@uchicago.edu
On Wednesday, May 11, 2016 at 12:12:48 PM UTC+2, Nebojsa Tijanic wrote:
it should be possible to add a javascript snippet for file name. Perhaps replace "input.bam" (in CreateFileRequirement and in outputBinding glob) with $(job.bam_file.path.split('/').pop())

Hi folks, 

I had the same problem, and this last solution is the only one that works.

The problem is that it does not work when the input data files are located in a sub-directory (or any other relative path) of the script folder; in that case you lose that relative folder path.

I guess one would need something like runtime.indir available in JavaScript, so that one could do:

$(job.bam_file.path.replace(runtime.indir + "/", "") ... and any other path manipulations here ... )

... to solve this.
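
As a rough sketch of that idea in plain JavaScript: `runtime.indir` is the property proposed above, not something assumed to exist in CWL, and the `relativeToIndir` helper and all paths below are hypothetical, for illustration only.

```javascript
// `indir` mirrors the proposed runtime.indir; it is an assumed value here,
// not an existing runtime field. All paths are invented examples.
const runtime = { indir: "/home/user/project" };
const job = { bam_file: { path: "/home/user/project/data/run1/sample.bam" } };

// Strip the input directory prefix so the sub-directory part of the path
// survives. Paths outside of indir are returned unchanged.
function relativeToIndir(path, indir) {
  const prefix = indir.endsWith("/") ? indir : indir + "/";
  return path.startsWith(prefix) ? path.slice(prefix.length) : path;
}

console.log(relativeToIndir(job.bam_file.path, runtime.indir)); // data/run1/sample.bam
```

Unlike a bare `replace(...)`, this variant only strips the directory when it appears at the start of the path, so a matching substring further along the path is left alone.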

Comments?