compilation error!


Joe

Apr 5, 2021, 7:21:39 PM
to Nextflow
Hello nextflowers!

I have a script that works with just the Mutect2 process, but it fails as soon as I add a concatVCF process.

Kindly see the following script:

process Mutect2 {

      conda '/path/to/.conda/envs/gatk4'

      publishDir "${params.out}/VCF_files/", mode:'copy'

      input:
      each chromosome from (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,'X','Y')
      set sampleID, file(normal), file(primary), file(recurrent) from bam_trio_ch

      output:
      set sampleID , \
      file('*.vcf.gz') , \
      file('*.vcf.gz.stats') into splitted_vcf_ch

      script:
      """
      mkdir -p ${params.tmpdir}/${workflow.runName}/${sampleID}/chr${chromosome}
      gatk Mutect2  \
      -R $ref \
      -L chr${chromosome} \
      -I ${normal} \
      -I ${primary} \
      -I ${recurrent} \
      -normal ${sampleID}_normal \
      --germline-resource $hg38_GnomAD \
      -O ${sampleID}_chr${chromosome}_raw.vcf.gz \
      --tmp-dir ${params.tmpdir}
      rm -r ${params.tmpdir}/${workflow.runName}/${sampleID}/chr${chromosome}
      """
      }

process concatVCF {

      conda '/path/to/.conda/envs/gatk4'

      publishDir "${params.out}/VCF_files/", mode:'copy'

      input:
      set sampleID, file(vcf), file(stats) from splitted_vcf_ch.groupTuple()

      output:
      set sampleID, file("${sampleID}_chr*_raw.vcf"),
      file("${sampleID}_raw.vcf"),
      file("${sampleID}_raw.vcf.gz"),
      file("${sampleID}_raw.vcf.gz.tbi"),
      file("${sampleID}_raw.vcf.gz.stats")  into raw_vcf_ch

      script:
      """
      gunzip ${vcf}

      cat ${*.vcf} > ${sampleID}_raw.vcf

      bgzip ${sampleID}_raw.vcf

      gatk MergeMutectStats \
      -stats ${sampleID}_chr1.vcf.gz.stats -stats ${sampleID}_chr2.vcf.gz.stats \
      -stats ${sampleID}_chr3.vcf.gz.stats -stats ${sampleID}_chr4.vcf.gz.stats \
      -stats ${sampleID}_chr5.vcf.gz.stats -stats ${sampleID}_chr6.vcf.gz.stats \
      -stats ${sampleID}_chr7.vcf.gz.stats -stats ${sampleID}_chr8.vcf.gz.stats \
      -stats ${sampleID}_chr9.vcf.gz.stats -stats ${sampleID}_chr10.vcf.gz.stats \
      -stats ${sampleID}_chr11.vcf.gz.stats -stats ${sampleID}_chr12.vcf.gz.stats \
      -stats ${sampleID}_chr13.vcf.gz.stats -stats ${sampleID}_chr14.vcf.gz.stats \
      -stats ${sampleID}_chr15.vcf.gz.stats -stats ${sampleID}_chr16.vcf.gz.stats \
      -stats ${sampleID}_chr17.vcf.gz.stats -stats ${sampleID}_chr18.vcf.gz.stats \
      -stats ${sampleID}_chr19.vcf.gz.stats -stats ${sampleID}_chr20.vcf.gz.stats \
      -stats ${sampleID}_chr21.vcf.gz.stats -stats ${sampleID}_chr22.vcf.gz.stats \
      -stats ${sampleID}_chrX.vcf.gz.stats -stats ${sampleID}_chrY.vcf.gz.stats \
      -stats ${sampleID}_chrM.vcf.gz.stats -O ${sampleID}_raw.vcf.gz.stats

      gatk IndexFeatureFile -I ${sampleID}_raw.vcf.gz -O ${sampleID}_raw.vcf.gz.tbi
      """
      }

process filterMutectCalls {

      conda '/path/to/.conda/envs/gatk4'

      publishDir "${params.out}/VCF_files/", mode:'copy'

      input:
      set sampleID, file(rawVCF), file(rawVCFtbi), file(rawVCFstats) from raw_vcf_ch

      output:
      set sampleID, file("${sampleID}_filtered.vcf.gz"), file("${sampleID}_filtered.vcf.gz.tbi") into filtered_vcf_ch

      script:
      """
      gatk FilterMutectCalls \
      -R $ref \
      -V ${sampleID}_raw.vcf.gz \
      -O ${sampleID}_filtered.vcf.gz
      #index VCF
      gatk IndexFeatureFile -I ${sampleID}_filtered.vcf.gz -O ${sampleID}_filtered.vcf.gz.tbi
      """
      }


As I said above, the script works with just the Mutect2 process, but since I added the concatVCF process I always get a compilation error:

- cause: Unexpected input: '{' @ line 93, column 19.

   process concatVCF {
                     ^

I don't know what I'm doing wrong, so I kinda gave up...

In the concatVCF process, I want to gather all 24 VCF files emitted by the Mutect2 process for each sample and merge them, assuming that groupTuple will help me with that.

Any help will be highly appreciated.

Thanks,
Joe

Paolo Di Tommaso

Apr 6, 2021, 4:37:59 AM
to nextflow
The problem is the `${*.vcf}` in the `cat` line: inside the script string, the compiler tries to resolve `${...}` as a variable name, and `*.vcf` is not one.


Philippe La Rosa

Apr 6, 2021, 5:27:40 AM
to 'A Viehweger' via Nextflow, Philippe La Rosa
Hi,

You can use:
   cat *.vcf > ${sampleID}_raw.vcf

Philippe

Alan Hoyle

Apr 6, 2021, 9:56:45 AM
to next...@googlegroups.com
A couple more hints:

For your stats merge, do something like the following. It does not hard-code the file names, so it will keep working when you decide to get fancy and split your intervals evenly with gatk SplitIntervals.

script:
joined_stats = stats.join(' -stats ')
"""
# [...]
gatk MergeMutectStats \
      -stats ${joined_stats}
[...]

Also, you should be able to combine the gunzip, cat, bgzip together in a single command:

zcat *.vcf.gz | bgzip > ${sampleID}_raw.vcf.gz

However, you might also use gatk MergeVcfs instead, with a .join() much like the other suggestion above to create the command line parameters.  

We do something like that in our somatic workflow.  
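
One caveat with plain concatenation: every per-chromosome VCF carries its own header, so running `cat` or `zcat` over all of them repeats the header block inside the merged file. Below is a minimal shell sketch of a header-aware concatenation. The function name `concat_vcfs` is hypothetical, it uses `gzip -dc` (which also reads bgzip output) in place of `zcat`, and it assumes the inputs are passed in the desired chromosome order. gatk MergeVcfs is likely the more robust route.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concat_vcfs OUT.vcf IN1.vcf.gz [IN2.vcf.gz ...]
# Keeps the header (lines starting with '#') from the first input only.
concat_vcfs() {
  local out=$1
  shift
  local first=1
  local f
  : > "$out"                       # start from an empty output file
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      gzip -dc "$f" >> "$out"      # first file: header plus records
      first=0
    else
      # later files: records only; awk (unlike grep -v) exits 0 on empty output
      gzip -dc "$f" | awk '!/^#/' >> "$out"
    fi
  done
}
```

Inside a Nextflow script block the shell variables would need escaping (for example `\$f`) so that they are not treated as Groovy variables, and the resulting file can then be compressed with bgzip as before.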



--
  -  Alan Hoyle  -  al...@alanhoyle.com  -  http://www.alanhoyle.com/  -



Joe

Apr 9, 2021, 3:25:34 PM
to Nextflow
Thank you for your suggestions.

It partially works.

MergeMutectStats is not working; it throws a GATK error. However, when I run MergeMutectStats from a plain bash script, it works!

******************

A USER ERROR has occurred: Encountered an IO exception while reading from ID_chr1_raw.vcf.gz.stats

******************
Could the symlinks be the reason?

10.vcf.gz -> /nextflow_work_dir/88/460102274043de292eb9f2ff2238df/ID_chr10_raw.vcf.gz
10.vcf.gz.stats -> /nextflow_work_dir/88/460102274043de292eb9f2ff2238df/ID_chr10_raw.vcf.gz.stats
11.vcf.gz -> /nextflow_work_dir/78/ed69c045bb1f31a8433f9f97858dcf/ID_chr11_raw.vcf.gz
11.vcf.gz.stats -> /nextflow_work_dir/78/ed69c045bb1f31a8433f9f97858dcf/ID_chr11_raw.vcf.gz.stats
12.vcf.gz -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz
12.vcf.gz.stats -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz.stats
13.vcf.gz -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz
13.vcf.gz.stats -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz.stats
14.vcf.gz -> /nextflow_work_dir/fc/c83ba7a31cbd7bdcb10a1544d9f105/ID_chr15_raw.vcf.gz
14.vcf.gz.stats -> /nextflow_work_dir/fc/c83ba7a31cbd7bdcb10a1544d9f105/ID_chr15_raw.vcf.gz.stats
...


@Alan,
thanks for your detailed answer.

I didn't quite get the idea behind joined_stats = stats.join(' -stats ').

So, should I write it like this, without listing -stats and the individual stats files anymore?

gatk MergeMutectStats \
      -stats ${joined_stats} -O merged.vcf.gz.stats

Also, I get an error saying -stats is not defined, although I did:

Alan Hoyle

Apr 9, 2021, 3:38:24 PM
to next...@googlegroups.com
It appears there is a bug in my example: it has "-stats" instead of "--stats". gatk MergeMutectStats can take multiple stats files as input, but each one needs to be preceded by the command-line parameter.

I.e. if you were typing it out, it would be something like:

gatk MergeMutectStats --stats blah.1.stats --stats blah.2.stats --output blah.merged.stats

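The stats.join(" --stats ") turns the stats array into a single string of file names separated by " --stats ", so together with the leading --stats written in the script, every file ends up with its own --stats.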

We do our stats merge in a stand-alone process, and I've pasted it below:  

process m2_parallel_stats_merge {
    cpus 3
    memory '4 GB'
    tag "${subject}"

    publishDir "${output_dir}/${subject}",
        mode: publish_dir_mode,
        pattern: '*.stats'

    input:
    tuple val(subject),
        file (stats) from m2_parallel_stats.groupTuple()

    output:
    tuple val(subject),
        "mutect2.merged.vcf.gz.stats" into m2_parallel_stats_merged

    script:
    input = stats.join(' --stats ')
    """
    gatk MergeMutectStats \
        --stats ${input} \
        --output mutect2.merged.vcf.gz.stats
    """
}
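
The effect of the join can be reproduced outside Nextflow. The sketch below is plain shell rather than Groovy, and `build_stats_args` is a hypothetical helper name. It only shows the string that ends up on the command line: one `--stats` per input file, regardless of how many files there are or what they are called. It assumes file names without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "--stats FILE" for every argument, separated by
# spaces, i.e. the argument list that the join-based script produces.
build_stats_args() {
  local args=""
  local f
  for f in "$@"; do
    # add a separating space only when args already holds something
    args="${args:+$args }--stats $f"
  done
  printf '%s\n' "$args"
}

build_stats_args blah.1.stats blah.2.stats
# prints: --stats blah.1.stats --stats blah.2.stats
```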



Philippe La Rosa

Apr 9, 2021, 4:29:55 PM
to 'A Viehweger' via Nextflow, Philippe La Rosa
Hi,

Or you can do:

        ...

input:
        set sampleID, file(vcf), file(statsFile) from splitted_vcf_ch.groupTuple()
...

script:
stats = statsFile.collect{ "-stats ${it}" }.join(' ')
"""
gatk MergeMutectStats ${stats} \
      -O ${sampleID}.vcf.gz.stats
"""




Alan Hoyle

Apr 9, 2021, 4:39:42 PM
to next...@googlegroups.com
Something like this would work ('--stats' instead of '-stats' notwithstanding). However, I find it clearer to keep the actual command-line options in the """bash""" part of the script rather than hiding them in the Groovy. It also makes the command easier to copy and paste to run outside Nextflow.

-alan



Joe

Apr 9, 2021, 7:24:41 PM
to Nextflow
Thanks Alan and Philippe,

I really appreciate your support.

It's working fine now.

In this bit (gatk MergeMutectStats --stats blah.1.stats --stats blah.2.stats --output blah.merged.stats) there seems to be an issue with how the symlinks are made: I guess they don't match the order used in the script (chr1, chr2, ...), see below:

1.vcf.gz -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz
1.vcf.gz.stats -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz.stats
(I'm assuming that the code looks for ID_chr1_raw.vcf.gz but finds ID_chr19_raw.vcf.gz instead and then fails.)

2.vcf.gz -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz
2.vcf.gz.stats -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz.stats
(I'm assuming that the code looks for ID_chr2_raw.vcf.gz but finds ID_chr14_raw.vcf.gz instead and then fails.)

...
and so on.

But anyway, your solutions work for me.

I will try to get the tuple sorted, just for practice, because I'm new to bioinformatics and Nextflow.
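
For what it's worth, a quick way to see the natural chromosome order for such file names is GNU `sort -V` (version sort). This is only a shell sketch with made-up names in the thread's `ID_chrN_raw.vcf.gz` pattern, and `sort_by_chromosome` is a hypothetical helper name. It assumes GNU coreutils, and sorting inside the workflow itself would have to be done on the Groovy side instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given file names in natural (version) order,
# so that chr2 comes before chr10. Requires GNU sort for the -V option.
sort_by_chromosome() {
  printf '%s\n' "$@" | sort -V
}

sort_by_chromosome ID_chr10_raw.vcf.gz ID_chr2_raw.vcf.gz ID_chr1_raw.vcf.gz
# prints ID_chr1_raw.vcf.gz, ID_chr2_raw.vcf.gz, ID_chr10_raw.vcf.gz, one per line
```

Non-numeric names such as chrX and chrY are not covered by this sketch.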

Have a great weekend folks..

Alan Hoyle

Apr 9, 2021, 7:52:58 PM
to next...@googlegroups.com
Take a look at the .command.sh file in the task’s work directory and you should be able to see what’s going on.


Alan Hoyle

Apr 10, 2021, 10:02:47 AM
to next...@googlegroups.com
Did you really put the nextflow_work_dir in your root directory?


Joe

Apr 10, 2021, 7:18:26 PM
to Nextflow
It is the same nextflow_work_dir with no changes at all, and it worked when I applied your solution!