compilation error!

Joe

Apr 5, 2021, 7:21:39 PM

to Nextflow

Hello nextflowers!

I have a script that works with the Mutect2 process alone, but it fails as soon as I add a concatVCF process.

Please see the script below:

process Mutect2 {
    conda '/path/to/.conda/envs/gatk4'
    publishDir "${params.out}/VCF_files/", mode:'copy'

    input:
    each chromosome from (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,'X','Y')
    set sampleID, file(normal), file(primary), file(recurrent) from bam_trio_ch

    output:
    set sampleID , \
        file('*.vcf.gz') , \
        file('*.vcf.gz.stats') into splitted_vcf_ch

    script:
    """
    mkdir -p ${params.tmpdir}/${workflow.runName}/${sampleID}/chr${chromosome}

    gatk Mutect2 \
        -R $ref \
        -L chr${chromosome} \
        -I ${normal} \
        -I ${primary} \
        -I ${recurrent} \
        -normal ${sampleID}_normal \
        --germline-resource $hg38_GnomAD \
        -O ${sampleID}_chr${chromosome}_raw.vcf.gz \
        --tmp-dir ${params.tmpdir}

    rm -r ${params.tmpdir}/${workflow.runName}/${sampleID}/chr${chromosome}
    """
}

process concatVCF {
    conda '/path/to/.conda/envs/gatk4'
    publishDir "${params.out}/VCF_files/", mode:'copy'

    input:
    set sampleID, file(vcf), file(stats) from splitted_vcf_ch.groupTuple()

    output:
    set sampleID, file("${sampleID}_chr*_raw.vcf"),
        file("${sampleID}_raw.vcf"),
        file("${sampleID}_raw.vcf.gz"),
        file("${sampleID}_raw.vcf.gz.tbi"),
        file("${sampleID}_raw.vcf.gz.stats") into raw_vcf_ch

    script:
    """
    gunzip ${vcf}

    cat ${*.vcf} > ${sampleID}_raw.vcf

    bgzip ${sampleID}_raw.vcf

    gatk MergeMutectStats \
        -stats ${sampleID}_chr1.vcf.gz.stats -stats ${sampleID}_chr2.vcf.gz.stats \
        -stats ${sampleID}_chr3.vcf.gz.stats -stats ${sampleID}_chr4.vcf.gz.stats \
        -stats ${sampleID}_chr5.vcf.gz.stats -stats ${sampleID}_chr6.vcf.gz.stats \
        -stats ${sampleID}_chr7.vcf.gz.stats -stats ${sampleID}_chr8.vcf.gz.stats \
        -stats ${sampleID}_chr9.vcf.gz.stats -stats ${sampleID}_chr10.vcf.gz.stats \
        -stats ${sampleID}_chr11.vcf.gz.stats -stats ${sampleID}_chr12.vcf.gz.stats \
        -stats ${sampleID}_chr13.vcf.gz.stats -stats ${sampleID}_chr14.vcf.gz.stats \
        -stats ${sampleID}_chr15.vcf.gz.stats -stats ${sampleID}_chr16.vcf.gz.stats \
        -stats ${sampleID}_chr17.vcf.gz.stats -stats ${sampleID}_chr18.vcf.gz.stats \
        -stats ${sampleID}_chr19.vcf.gz.stats -stats ${sampleID}_chr20.vcf.gz.stats \
        -stats ${sampleID}_chr21.vcf.gz.stats -stats ${sampleID}_chr22.vcf.gz.stats \
        -stats ${sampleID}_chrX.vcf.gz.stats -stats ${sampleID}_chrY.vcf.gz.stats \
        -stats ${sampleID}_chrM.vcf.gz.stats -O ${sampleID}_raw.vcf.gz.stats

    gatk IndexFeatureFile -I ${sampleID}_raw.vcf.gz -O ${sampleID}_raw.vcf.gz.tbi
    """
}

process filterMutectCalls {
    conda '/path/to/.conda/envs/gatk4'
    publishDir "${params.out}/VCF_files/", mode:'copy'

    input:
    set sampleID, file(rawVCF), file(rawVCFtbi), file(rawVCFstats) from raw_vcf_ch

    output:
    set sampleID, file("${sampleID}_filtered.vcf.gz"), file("${sampleID}_filtered.vcf.gz.tbi") into filtered_vcf_ch

    script:
    """
    gatk FilterMutectCalls \
        -R $ref \
        -V ${sampleID}_raw.vcf.gz \
        -O ${sampleID}_filtered.vcf.gz

    #index VCF
    gatk IndexFeatureFile -I ${sampleID}_filtered.vcf.gz -O ${sampleID}_filtered.vcf.gz.tbi
    """
}

As I said above, the script works with the Mutect2 process alone, but once I add the concatVCF process I always get a compilation error:

- cause: Unexpected input: '{' @ line 93, column 19.

process concatVCF {

^

I don't know what I'm doing wrong, so I kind of gave up...

In the concatVCF process, I want to gather all 24 VCF files emitted by the Mutect2 tasks and merge them. I assumed groupTuple would help me with that.

Any help would be highly appreciated.

Thanks, Joe

Paolo Di Tommaso

Apr 6, 2021, 4:37:59 AM

to nextflow

The problem is the `${*.vcf}` which the compiler tries to resolve as a variable name.

Philippe La Rosa

Apr 6, 2021, 5:27:40 AM

to 'A Viehweger' via Nextflow, Philippe La Rosa

Hi,

You can use a plain shell glob instead:

cat *.vcf > ${sampleID}_raw.vcf

Philippe

Alan Hoyle

Apr 6, 2021, 9:56:45 AM

to next...@googlegroups.com

A couple more hints:

For your stats merge, do something like the following. It keeps working when you later decide to get fancy and split your intervals evenly with gatk SplitIntervals, because the number of stats files is no longer hard-coded.

script:
joined_stats = stats.join(' -stats ')
"""
# [...]
gatk MergeMutectStats \
    -stats ${joined_stats}
[...]

Also, you should be able to combine the gunzip, cat, and bgzip steps into a single command:

zcat *.vcf.gz | bgzip > ${sampleID}_raw.vcf.gz

However, you might also use gatk MergeVcfs instead, with a .join() much like the other suggestion above to create the command line parameters.

We do something like that in our somatic workflow.
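One caveat with a plain zcat pipeline: every per-chromosome VCF carries its own header, so straight concatenation repeats the header lines in the middle of the output. The following is a minimal shell sketch of a header-aware concatenation. It assumes that all shards share the same header and are passed in the desired chromosome order; the function name concat_vcf_shards is hypothetical. Inside a Nextflow script block, the shell variables would have to be written as \$shard and so on, so a helper like this may be easier to keep in a separate script.

```shell
#!/bin/sh
# Sketch: concatenate gzip-compressed VCF shards, keeping only the first header.
# Assumption: all shards share one header and arrive in the desired order.
concat_vcf_shards() {
    first=1
    for shard in "$@"; do
        if [ "$first" -eq 1 ]; then
            # First shard: emit header lines (starting with '#') and records.
            gzip -dc "$shard"
            first=0
        else
            # Later shards: drop the repeated header, keep the records.
            # '|| true' covers shards that contain a header but no records.
            gzip -dc "$shard" | grep -v '^#' || true
        fi
    done
}
```

Usage could look like `concat_vcf_shards ID_chr1_raw.vcf.gz ID_chr2_raw.vcf.gz | bgzip > ID_raw.vcf.gz`. gatk MergeVcfs, as suggested above, avoids hand-rolling this step.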

--
- Alan Hoyle - al...@alanhoyle.com - http://www.alanhoyle.com/ -

Joe

Apr 9, 2021, 3:25:34 PM

to Nextflow

Thank you for your suggestions and corrections.

It partially works.

MergeMutectStats is not working: it throws a GATK error. However, when I run MergeMutectStats from a bash script, it works!

******************

A USER ERROR has occurred: Encountered an IO exception while reading from ID_chr1_raw.vcf.gz.stats

******************

Could the symlinks be the reason?

10.vcf.gz -> /nextflow_work_dir/88/460102274043de292eb9f2ff2238df/ID_chr10_raw.vcf.gz
10.vcf.gz.stats -> /nextflow_work_dir/88/460102274043de292eb9f2ff2238df/ID_chr10_raw.vcf.gz.stats
11.vcf.gz -> /nextflow_work_dir/78/ed69c045bb1f31a8433f9f97858dcf/ID_chr11_raw.vcf.gz
11.vcf.gz.stats -> /nextflow_work_dir/78/ed69c045bb1f31a8433f9f97858dcf/ID_chr11_raw.vcf.gz.stats
12.vcf.gz -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz
12.vcf.gz.stats -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz.stats
13.vcf.gz -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz
13.vcf.gz.stats -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz.stats
14.vcf.gz -> /nextflow_work_dir/fc/c83ba7a31cbd7bdcb10a1544d9f105/ID_chr15_raw.vcf.gz
14.vcf.gz.stats -> /nextflow_work_dir/fc/c83ba7a31cbd7bdcb10a1544d9f105/ID_chr15_raw.vcf.gz.stats

...

@Alan,

thanks for your detailed answer.

I didn't understand the idea behind joined_stats = stats.join(' -stats ').

So, should I write it like this:

gatk MergeMutectStats \
    -stats ${joined_stats} -O merged.vcf.gz.stats

without listing -stats and the individual stats files anymore?

Also, I get the error "-stats not defined", although I did:

Alan Hoyle

Apr 9, 2021, 3:38:24 PM

to next...@googlegroups.com

It appears there is a bug in my example where it has "-stats" instead of "--stats". gatk MergeMutectStats can take multiple stats files as input, but each one needs to be prepended by the command-line parameter.

I.e. if you were typing it out, it would be something like:

gatk MergeMutectStats --stats blah.1.stats --stats blah.2.stats --output blah.merged.stats

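The stats.join(" --stats ") call turns the stats array into a single string of file names separated by " --stats ". Together with the leading --stats written in the script, every file then ends up preceded by the parameter.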

We do our stats merge in a stand-alone process, and I've pasted it below:

process m2_parallel_stats_merge {
    cpus 3
    memory '4 GB'
    tag "${subject}"

    publishDir "${output_dir}/${subject}",
        mode: publish_dir_mode,
        pattern: '*.stats'

    input:
        tuple val(subject),
            file (stats) from m2_parallel_stats.groupTuple()
    output:
        tuple val(subject),
            "mutect2.merged.vcf.gz.stats" into m2_parallel_stats_merged

    script:

    input = stats.join(' --stats ')

    """
        gatk MergeMutectStats \
            --stats ${input} \
            --output mutect2.merged.vcf.gz.stats
    """
}
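
Outside Nextflow, the same flag list can be assembled in plain shell, which may help when reproducing a task by hand. The sketch below is illustrative only: stats_flags is a hypothetical helper, and it assumes file names without whitespace. It prints the command rather than running it, so it does not need GATK.

```shell
#!/bin/sh
# Sketch: build "--stats f1 --stats f2 ..." from a list of file names,
# the shell counterpart of stats.join(' --stats ') plus the leading --stats.
# Assumption: file names contain no whitespace.
stats_flags() {
    flags=""
    for f in "$@"; do
        # ${flags:+$flags } adds a separating space only once flags is non-empty.
        flags="${flags:+$flags }--stats $f"
    done
    printf '%s\n' "$flags"
}

# Print the command instead of executing it.
echo "gatk MergeMutectStats $(stats_flags blah.1.stats blah.2.stats) --output blah.merged.stats"
```

If shell code like this were placed directly in a Nextflow script block, each shell `$` would need to be escaped as `\$`, since unescaped `${...}` is interpolated by Nextflow before the shell ever sees it.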

Philippe La Rosa

Apr 9, 2021, 4:29:55 PM

to 'A Viehweger' via Nextflow, Philippe La Rosa

Hi,

Or you can do:

...

input:
set sampleID, file(vcf), file(statsFile) from splitted_vcf_ch.groupTuple()

...

script:
stats = statsFile.collect{ "-stats ${it} " }.join(' ')
"""
gatk MergeMutectStats ${stats} \
    -O ${idSamplePair}.vcf.gz.stats
"""

Alan Hoyle

Apr 9, 2021, 4:39:42 PM

to next...@googlegroups.com

Something like this would work ('--stats' instead of '-stats' notwithstanding). However I would find it clearer to have the actual command line options that are being used in the """bash""" part of the script, rather than hiding them in the Groovy. Makes it easier to copy/paste to run outside in any case.

-alan

Joe

Apr 9, 2021, 7:24:41 PM

to Nextflow

Thanks Alan and Philippe,

I really appreciate your support.

It's working fine now.

Regarding this bit (gatk MergeMutectStats --stats blah.1.stats --stats blah.2.stats --output blah.merged.stats): there seems to be an issue with how the symlinks are made. I guess they do not match the order used in the script (chr1, chr2, ...); see below:

1.vcf.gz -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz
1.vcf.gz.stats -> /nextflow_work_dir/fc/83b4cc10a2172472919f95b12e199f/ID_chr19_raw.vcf.gz.stats
(I'm assuming the code looks for ID_chr1_raw.vcf.gz, finds ID_chr19_raw.vcf.gz instead, and then fails.)

2.vcf.gz -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz
2.vcf.gz.stats -> /nextflow_work_dir/c6/e4a327edabbe66ffde36a57e79b20c/ID_chr14_raw.vcf.gz.stats
(I'm assuming the code looks for ID_chr2_raw.vcf.gz, finds ID_chr14_raw.vcf.gz instead, and then fails.)

...

and so on.

Anyway, your solutions work for me.

I will try to get the tuple sorted, just for practice, because I'm new to bioinformatics and Nextflow.

Have a great weekend folks..
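
If a deterministic chromosome order is wanted, for example before concatenating VCF shards, one option on the shell side is a natural (version) sort of the file names. This is only a sketch: natural_order is a hypothetical helper, and it assumes a `sort` that supports `-V` (as GNU coreutils does) and file names without newlines.

```shell
#!/bin/sh
# Sketch: print the given file names one per line in natural (version) order,
# so that chr2 sorts before chr10.
# Assumptions: sort supports -V; names contain no newlines.
natural_order() {
    printf '%s\n' "$@" | sort -V
}

# Example with names shaped like the ones in this thread.
natural_order ID_chr10_raw.vcf.gz ID_chr2_raw.vcf.gz ID_chr1_raw.vcf.gz
```

A plain lexical sort would place chr10 before chr2, which is why the version sort is used here.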

Alan Hoyle

Apr 9, 2021, 7:52:58 PM

to next...@googlegroups.com

Take a look in the process’s .command.sh and you should be able to see what’s going on.
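
A small shell sketch of that inspection follows. It prints the `.command.sh` that Nextflow generated for a task and then lists every staged symlink in the task directory together with its target, flagging any that do not resolve. The function name inspect_task_dir is hypothetical, and the task directory path has to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: inspect a Nextflow task work directory.
# Assumption: the directory passed in is the work directory of the failing task.
inspect_task_dir() {
    task_dir="$1"
    # The script exactly as the shell received it, after variable interpolation.
    if [ -f "$task_dir/.command.sh" ]; then
        cat "$task_dir/.command.sh"
    fi
    # Staged inputs are symlinks; report each target and flag dangling links.
    for entry in "$task_dir"/*; do
        [ -L "$entry" ] || continue
        if [ -e "$entry" ]; then
            printf 'ok      %s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
        else
            printf 'BROKEN  %s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
        fi
    done
}
```

Comparing the file names used in `.command.sh` with the link names listed here shows quickly whether the command refers to files that were never staged under those names.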

Alan Hoyle

Apr 10, 2021, 10:02:47 AM

to next...@googlegroups.com

Did you really put the nextflow_work_dir in your root directory?

Joe

Apr 10, 2021, 7:18:26 PM

to Nextflow

It is the same nextflow_work_dir, with no changes at all, because it worked when I applied your solution!
