I wanted to discuss the need for building generators of CWL.
Currently, there is a GUI-based CWL generator (Rabix). Although it greatly enhances the usability of CWL for a casual user, for a developer (like myself) a scripting generator is preferable.
Of course, one can write CWL directly by hand, but this quickly gets tiresome because CWL is so verbose. For the same reason, CWL is very difficult to code-review and maintain. We already had a quick discussion about this with folks from Seven Bridges.
There is scriptcwl (https://github.com/NLeSC/scriptcwl), which has been getting multiple updates recently.
I have lately been working on another Python generator of CWL workflows. It was inspired by the scriptcwl package, but instead of creating a separate object model like scriptcwl does, its classes inherit from the native Python counterparts of YAML structures (dict and list). This way, one can always manipulate the data structures directly in Python for a final polish and immediately dump them as CWL. The objective is to generate the final CWL code from Python and never edit CWL by hand (at least for the workflows). Then only the Python codebase needs to be maintained and kept under revision control.
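To make the dict-inheritance idea concrete, here is a minimal sketch. It is not the generator's actual code: the class names `Workflow` and `Step` and the method bodies are assumptions for illustration. It dumps JSON from the standard library rather than YAML, which works because CWL documents may be written in either.

```python
import json


class Step(dict):
    """A workflow step that is itself a plain dict (hypothetical class)."""

    def __init__(self, run, step_id):
        super().__init__(run=run, id=step_id)
        self["in"] = []
        self["out"] = []

    def add_in(self, id, source=None, valueFrom=None):
        # By default, connect the step input to a workflow input of the same name.
        entry = {"id": id, "source": source if source is not None else id}
        if valueFrom is not None:
            entry["valueFrom"] = valueFrom
        self["in"].append(entry)
        return entry


class Workflow(dict):
    """A workflow that is itself a plain dict (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self["class"] = "Workflow"
        self["cwlVersion"] = "v1.0"
        self["inputs"] = []
        self["outputs"] = []
        self["steps"] = []

    def add_step(self, run, step_id):
        step = Step(run, step_id)
        self["steps"].append(step)
        return step


wf = Workflow()
s = wf.add_step("ariba_run.cwl", "ariba_run")
s.add_in("reads_1", source="sample", valueFrom="$(self.file1)")

# "Final polish" needs no special API: the objects are ordinary dicts and lists.
wf["inputs"].append({"id": "sample", "type": "sample_reads"})

# Dict subclasses serialize like dicts, so dumping is a one-liner.
cwl_text = json.dumps(wf, indent=2)
```

Because every object is a dict or list, any serializer that handles plain Python containers can write the result, and nothing has to be converted back from a separate object model first.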
My generator provides various methods to perform bulk generation of CWL in order to greatly reduce the verbosity. For example, there are methods for adding all tool outputs that match a wildcard pattern, or for creating workflow inputs for all tool inputs (again, based on pattern matching if required).
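As a rough sketch of what such bulk methods could look like on plain dicts (the function names, signatures and the `__` separator are assumptions for illustration, not the generator's real API), shell-style patterns from the standard `fnmatch` module are enough:

```python
from fnmatch import fnmatchcase


def add_outs(step, tool, pattern="*"):
    """Expose every tool output whose id matches `pattern` as a step output."""
    present = {o["id"] for o in step["out"]}
    for out in tool["outputs"]:
        if fnmatchcase(out["id"], pattern) and out["id"] not in present:
            step["out"].append({"id": out["id"]})
            present.add(out["id"])


def add_ins(workflow, step, tool, pattern="*", sep="__"):
    """Create a workflow input for each unconnected tool input matching `pattern`.

    The new workflow input gets a step-specific prefix and is wired to the step.
    """
    connected = {i["id"] for i in step["in"]}
    for inp in tool["inputs"]:
        if inp["id"] in connected or not fnmatchcase(inp["id"], pattern):
            continue
        wf_id = step["id"] + sep + inp["id"]
        workflow["inputs"].append({"id": wf_id, "type": inp["type"]})
        step["in"].append({"id": inp["id"], "source": wf_id})


# A toy tool description with two inputs and two outputs.
tool = {
    "inputs": [
        {"id": "reads_1", "type": "File"},
        {"id": "threads", "type": ["null", "int"]},
    ],
    "outputs": [
        {"id": "report", "type": "File"},
        {"id": "log_clusters", "type": "File"},
    ],
}
wf = {"inputs": [], "steps": []}
step = {"id": "ariba_run", "in": [{"id": "reads_1", "source": "sample"}], "out": []}
wf["steps"].append(step)

add_outs(step, tool, "rep*")  # only outputs matching the wildcard
add_ins(wf, step, tool)       # every input not connected yet
```

Two calls replace one hand-written entry per tool parameter, which is where most of the verbosity comes from.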
Reducing verbosity and boilerplate is my major reason for using a generator.
Below I have pasted an annotated example of the generating code and its output.
I saw some snippets of discussions about defining a CWL object model and the APIs in various languages. If that is going to be done, then the generator would be better off using the object model.
I would like to get an idea of the current thinking here on the subject of CWL generators.
Thanks,
Andrey
# load a directory of tools
tool_lib = tools.tool_library(pjoin(cwl_tool_dir, "*.cwl"), path_start=cwl_tool_dir)
# create a workflow; `wf` inherits from Python dict
wf = workflow(tool_lib=tool_lib)
wf.add_inputs(dict(prepareref_tgz="File",
                   manifest="File",
                   sample="sample_reads"))
# create a step in the workflow; `s` inherits from Python dict
s = wf.add_step("ariba_run.cwl")
# add all tool outputs as step outputs
s.add_outs()
s.add_in("prepareref_tgz")
# reads are extracted from a 'record' object
s.add_in(id="reads_1", source="sample", valueFrom="$(self.file1)")
s.add_in(id="reads_2", source="sample", valueFrom="$(self.file2)")
# add all remaining needed inputs of the tool by creating a workflow input for each
# and connecting it to the step input (will add step-specific prefix by default)
s.add_ins()
wf.add_output("ariba_run/assembled_genes")
# add workflow outputs: first wildcard entry selects all matching step outputs;
# second entry passes workflow input to the output with a prefix
wf.add_outputs(["ariba_run/*",
                ("out", "manifest")])
# hand-copy requirements from a tool
wf["requirements"] = wf.get_tool_lib().tools["gene_extractor.cwl"]["requirements"]
wf.save("test_wf_02.cwl")
The generated CWL code:
#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.0
inputs:
- id: prepareref_tgz
  type: File
- id: manifest
  type: File
- id: sample
  type: sample_reads
- id: ariba_run__assembled_threshold
  type:
  - 'null'
  - float
- id: ariba_run__assembly_cov
  type:
  - 'null'
  - int
- id: ariba_run__force
  type:
  - 'null'
  - boolean
- id: ariba_run__gene_nt_extend
  type:
  - 'null'
  - int
- id: ariba_run__min_scaff_depth
  type:
  - 'null'
  - int
- id: ariba_run__noclean
  type:
  - 'null'
  - boolean
- id: ariba_run__nucmer_breaklen
  type:
  - 'null'
  - int
- id: ariba_run__nucmer_min_id
  type:
  - 'null'
  - int
- id: ariba_run__nucmer_min_len
  type:
  - 'null'
  - int
- id: ariba_run__outdir
  type:
  - 'null'
  - string
- id: ariba_run__threads
  type:
  - 'null'
  - int
- id: ariba_run__unique_threshold
  type:
  - 'null'
  - float
- id: ariba_run__verbose
  type:
  - 'null'
  - boolean
outputs:
- id: assembled_genes
  type: File
  outputSource:
  - ariba_run/assembled_genes
- id: ariba_run__assembled_seqs
  type: File
  outputSource:
  - ariba_run/assembled_seqs
- id: ariba_run__assemblies
  type: File
  outputSource:
  - ariba_run/assemblies
- id: ariba_run__log_clusters
  type: File
  outputSource:
  - ariba_run/log_clusters
- id: ariba_run__report
  type: File
  outputSource:
  - ariba_run/report
- id: ariba_run__version_info
  type: File
  outputSource:
  - ariba_run/version_info
- id: out__manifest
  type: File
  outputSource:
  - manifest
requirements:
- class: ScatterFeatureRequirement
- class: InlineJavascriptRequirement
- class: StepInputExpressionRequirement
- class: SubworkflowFeatureRequirement
- class: MultipleInputFeatureRequirement
- class: SchemaDefRequirement
  types:
  - fields:
    - name: sample_reads/file1
      type: File
    - name: sample_reads/file2
      type: File
    - name: sample_reads/SampleID
      type: string
    name: sample_reads
    type: record
steps:
- run: ariba_run.cwl
  id: ariba_run
  in:
  - id: prepareref_tgz
    source: prepareref_tgz
  - id: reads_1
    source: sample
    valueFrom: $(self.file1)
  - id: reads_2
    source: sample
    valueFrom: $(self.file2)
  - id: assembled_threshold
    source: ariba_run__assembled_threshold
  - id: assembly_cov
    source: ariba_run__assembly_cov
  - id: force
    source: ariba_run__force
  - id: gene_nt_extend
    source: ariba_run__gene_nt_extend
  - id: min_scaff_depth
    source: ariba_run__min_scaff_depth
  - id: noclean
    source: ariba_run__noclean
  - id: nucmer_breaklen
    source: ariba_run__nucmer_breaklen
  - id: nucmer_min_id
    source: ariba_run__nucmer_min_id
  - id: nucmer_min_len
    source: ariba_run__nucmer_min_len
  - id: outdir
    source: ariba_run__outdir
  - id: threads
    source: ariba_run__threads
  - id: unique_threshold
    source: ariba_run__unique_threshold
  - id: verbose
    source: ariba_run__verbose
  out:
  - id: assembled_genes
  - id: assembled_seqs
  - id: assemblies
  - id: log_clusters
  - id: report
  - id: version_info
To view this discussion on the web visit https://groups.google.com/d/msgid/common-workflow-language/3e553efa-697d-464e-b333-68cbff5d594e%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "common-workflow-language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to common-workflow-language+unsub...@googlegroups.com.
To post to this group, send email to common-workflow-language@googlegroups.com.