Puzzled about variable not making it through to a process


igtr...@colorado.edu

Aug 1, 2018, 12:01:21 AM
to Nextflow
Hi guys,

I was wondering what I'm missing here... probably something really obvious, but I'm stuck. I'm reading a directory that may contain my input files (I take either SRAs from GEO or FASTQ files, so I account for both, hence the alternative empty channel). I pass the basename along to customize my outputs, together with the full path to each file. In this example the directory contains only one file:

if (params.sra_dir_pattern) {
    println("pattern for SRAs provided")
    read_files_sra = Channel
                        .fromPath(params.sra_dir_pattern)
                        .map { file -> tuple(file.baseName, file) }
}
else {
    read_files_sra = Channel.empty()
}

... then in a process that is supposed to kick in once this channel is populated, this is what the input looks like:

process sra_dump {
    publishDir "${params.outdir}/fastq-dump/", mode: 'copy', pattern: '*.fastq'
    tag "$fname"

    input:
    println(read_files_sra)
    set val(fname), file(reads) from read_files_sra
    println(val(fname))

    [...]
}

That first println() statement outputs the following:

DataflowQueue(queue=[DataflowVariable(value=[SRRxxxxxxx, /full/path/to/sra/SRRxxxxxxx.sra]), DataflowVariable(value=groovyx.gpars.dataflow.operator.PoisonPill@26b894bd)])

... and when I try to use "fname" anywhere in the process, I get an error from Nextflow:

ERROR ~ No such variable: fname

If the process is being called, it must indeed be receiving the tuple ("filename", "file path")... so why am I unable to use the filename? The file path variable does seem to contain what I expect. Any help would be appreciated!

Cheers,

-i



Paolo Di Tommaso

Aug 1, 2018, 2:58:26 AM
to nextflow
Because you cannot mix `println` statements into the input/output declaration blocks. The correct syntax is shown below:


process sra_dump {
    publishDir "${params.outdir}/fastq-dump/", mode: 'copy', pattern: '*.fastq'
    tag "$fname"

    input:
    set val(fname), file(reads) from read_files_sra

    script:
    println fname

    """
    your_command .. etc
    """
}



Hope it helps. 

p

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

igtr...@colorado.edu

Aug 1, 2018, 9:14:51 AM
to Nextflow
Oh, sorry, I should have mentioned... I only put those println() statements there for debugging purposes, and to show that I was indeed getting the expected tuple. I also get that same error when I try to use the "fname" variable in the body of the script, the way you show in your example. What is strange is that it works for some processes but not others, so there must be some subtle difference that I'm missing.

-i

Paolo Di Tommaso

Aug 1, 2018, 10:07:00 AM
to nextflow
Please provide a repeatable test case, i.e. a snippet that can be executed and reproduces the problem you are reporting.


p


igtr...@colorado.edu

Aug 2, 2018, 5:54:18 PM
to Nextflow
Apologies for the delay. I'm pasting below a stripped-down, minimal *.nf example that causes that error I mentioned. The config file only contains two entries:


params {
  fastq_dir_pattern = "/path/to/fastq/SRR*.fastq"
  sra_dir_pattern = "/path/to/sra/SRR*.sra"
}


Either could be provided, so I was trying to account for either or both types of files. The only difference is that if an SRA is provided, it is first unpacked into a FASTQ and the pipeline then continues normally... akin to having a step 1, an optional pre-1 step, with steps 2-n being the same. I get the error on the conditional in the output section of `sra_dump` below: "ERROR ~ No such variable: fname"

I guess I should rephrase my question as: is there a better design pattern than this to take alternate input file types? I had to add that conditional in the output section of a process because even if a channel is empty, the process would still be invoked, and the error would instead become: "ERROR ~ Channel `fastq_reads_for_reverse_complement` has been used twice as an output by process `sra_dump` and another operator"

The minimal nf script is:

---------------------------------------------------------------------------------------
#!/usr/bin/env nextflow

if (params.fastq_dir_pattern) {
    println("pattern for FASTQs provided")
    fastq_reads_for_qc = Channel
                        .fromPath(params.fastq_dir_pattern)
                        .map { file -> tuple(file.baseName, file) }
    fastq_reads_for_reverse_complement = Channel
                                          .fromPath(params.fastq_dir_pattern)
                                          .map { file -> tuple(file.baseName, file) }
}
else {
    Channel
        .empty()
        .into { fastq_reads_for_qc; fastq_reads_for_reverse_complement }
}

if (params.sra_dir_pattern) {
    println("pattern for SRAs provided")
    read_files_sra = Channel
                        .fromPath(params.sra_dir_pattern)
                        .map { file -> tuple(file.baseName, file) }
}
else {
    read_files_sra = Channel.empty()
}


process sra_dump {
    publishDir "${params.outdir}/fastq-dump/", mode: 'copy', pattern: '*.fastq'
    tag "$fname"

    input:
    set val(fname), file(reads) from read_files_sra

    output:
    if (read_files_sra) {
        set val(fname), file("${fname}.fastq") into fastq_reads_for_reverse_complement
        set val(fname), file("${fname}.fastq") into fastq_reads_for_qc
    }
    else {
        println("No SRA provided.")
    }

    script:
    """
    module load sra/2.8.0
    echo ${fname}

    fastq-dump ${reads}
    """
}


process reverse_complement {
    validExitStatus 0,1
    tag "$fname"
    publishDir "${params.outdir}/fastx/", mode: 'copy', pattern: '*.flip.fastq'

    input:
    set val(fname), file(reads) from fastq_reads_for_reverse_complement

    output:
    set val(fname), file("*.flip.fastq") into flipped_reads_ch

    script:
    """
    module load fastx-toolkit/0.0.13
    echo ${fname}

    /opt/fastx-toolkit/0.0.13/bin/fastx_reverse_complement \
        -Q33 \
        -i ${reads} \
        -o ${fname}.flip.fastq
    """
}


process fastqc {
    tag "$fname"
    publishDir "${params.outdir}/fastqc", mode: 'copy',
        saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"}

    input:
    set val(fname), file(reads) from fastq_reads_for_qc

    output:
    file "*_fastqc.{zip,html}" into fastqc_results

    script:
    """
    module load fastqc/0.11.5
    echo ${fname}

    fastqc $reads
    """
}


workflow.onComplete {
    log.info "[GROFlow] Pipeline Complete"
}
---------------------------------------------------------------------------------------


Thanks again!

-i

Paolo Di Tommaso

Aug 2, 2018, 5:59:25 PM
to nextflow
You cannot use an `if` statement in the input/output declaration blocks:


    output:
    if (read_files_sra) {
        set val(fname), file("${fname}.fastq") into fastq_reads_for_reverse_complement
        set val(fname), file("${fname}.fastq") into fastq_reads_for_qc
    }
    else {
        println("No SRA provided.")
    }



Ignacio Tripodi

Aug 2, 2018, 6:02:23 PM
to next...@googlegroups.com
Ahh, I suspected that was the case... is there a way that I can avoid producing an output if the input channel I'm expecting is empty? I wasn't expecting the process to be called at all if the input channel existed but was empty.

-i

Paolo Di Tommaso

Aug 2, 2018, 6:10:55 PM
to nextflow
Ahh, Ignacio!  :)

Yes, have a look at the Nextflow implementation patterns reference.
You should be able to parametrise it by doing something like:

output: 
file 'foo' optional(params.flag=='bar') into channel 
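To flesh that out a little, here is a minimal, hedged sketch of an optional output in DSL1 syntax (the qualifier takes a boolean, written here as `optional true`; the process name `sra_dump_sketch` and channel `fastq_out` are made up for illustration):

```groovy
// Hypothetical sketch: the FASTQ output is marked optional, so a missing
// output file does not raise an error when a task produces none.
process sra_dump_sketch {
    input:
    set val(fname), file(sra) from read_files_sra

    output:
    file("${fname}.fastq") optional true into fastq_out

    script:
    """
    fastq-dump ${sra}
    """
}
```

(As it turns out later in the thread, the optional route didn't solve this particular case, but the qualifier is the standard way to declare an output that may legitimately be absent.)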



Hope it helps (otherwise we'll continue tomorrow :))


p
 




igtr...@colorado.edu

Aug 3, 2018, 1:42:03 AM
to Nextflow
I figured it out! (thanks for that reference, BTW, those design patterns are extremely useful)

The trick was not to make the output optional (that didn't work), but to create an empty channel when an input directory pattern isn't set, and then mix the two channels in the downstream process's input. It might be a useful design pattern to add to the list, for situations like:

Input_type_A --> process_0 --> process_1 --> process_2 --> ...
or
Input_type_B --> process_1 --> process_2 --> ...

Then we can do:

if (params.dir_pattern_A) {
    input_A = Channel
                  .fromPath(params.dir_pattern_A)
                  .map { file -> tuple(file.baseName, file) }
}
else {
    Channel
        .empty()
        .into { input_A }
}

Same for input_B, and then mix the output of process_0 and input_B:

process process_1 {
    input:
    set val(fname), file(input_file) from input_B.mix(process_0_output)

    [...]

This way I can process fastq files, or SRRs, or both in the same workflow! :-) 
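Putting the whole pattern together, a minimal hedged sketch (the channel names, process names, and commands here are placeholders based on this thread, not a definitive implementation):

```groovy
// Input type A (SRA) needs a pre-step (sra_dump); input type B (FASTQ) does not.
if (params.sra_dir_pattern) {
    sra_ch = Channel
                .fromPath(params.sra_dir_pattern)
                .map { file -> tuple(file.baseName, file) }
}
else {
    sra_ch = Channel.empty()
}

if (params.fastq_dir_pattern) {
    fastq_ch = Channel
                  .fromPath(params.fastq_dir_pattern)
                  .map { file -> tuple(file.baseName, file) }
}
else {
    fastq_ch = Channel.empty()
}

// process_0: runs once per item in sra_ch; never fires if the channel is empty
process sra_dump {
    input:
    set val(fname), file(sra) from sra_ch

    output:
    set val(fname), file("${fname}.fastq") into dumped_ch

    script:
    """
    fastq-dump ${sra}
    """
}

// process_1: consumes FASTQs from either source via mix()
process fastq_step {
    input:
    set val(fname), file(reads) from fastq_ch.mix(dumped_ch)

    script:
    """
    echo ${fname} ${reads}
    """
}
```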

Grazie mille!

-i
