specifying output directory and postprocessing said directory

Karin Lagesen

Feb 25, 2017, 4:23:49 AM
to Nextflow
Hi!

I am new to Nextflow, so apologies if this is a FAQ.

I am trying to learn Nextflow by setting up a pipeline that runs FastQC on some files. This is how far I've gotten:

params.reads = "fastq_files/*R{1,2}_001.fastq.gz"

Channel
    .fromFilePairs( params.reads )                                             
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }  
    .set { read_pairs }

process run_fastqc {
    input:
    set pair_id, file(reads) from read_pairs
    
    output:
    file '*_fastqc.{zip,html}' into fastqc_results
    
    """
    mkdir ${pair_id}
    fastqc -q ${reads} -o ${pair_id}
    """
}

My questions are as follows:

1. I can see that the pair_id directory is being created in the work directory. However, I would like to put these resulting directories in a directory of my choosing. How do I do that?

2. I would like to run a script on the directory containing the resulting FastQC zip files once all of the FastQC runs are done. How do I do that?

Thanks!

Karin 

Paolo Di Tommaso

Feb 25, 2017, 4:46:39 AM
to Nextflow
Hi Karin, 

Regarding point 1, just define a `publishDir` directive in the fastqc process. All declared output files will be copied (actually symlinked, by default) to that folder.
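A minimal sketch of that, assuming the DSL1 syntax used throughout this thread (only older Nextflow releases accept it) and a `params.outputdir` parameter (the name Karin adopts later; the value here is just an example). The `mode: 'copy'` option asks Nextflow to copy the files instead of symlinking them, and a relative path is resolved against the directory Nextflow is launched from:

```nextflow
params.reads     = "fastq_files/*R{1,2}_001.fastq.gz"
// Example destination for the published results (assumed name)
params.outputdir = "fastqc_results"

Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process run_fastqc {
    // Publish every declared output into params.outputdir;
    // mode: 'copy' copies the files instead of the default symlink
    publishDir params.outputdir, mode: 'copy'

    input:
    set pair_id, file(reads) from read_pairs

    output:
    // Declaring the per-sample directory makes it the thing that gets published
    file "$pair_id" into fastqc_results

    """
    mkdir ${pair_id}
    fastqc -q ${reads} -o ${pair_id}
    """
}
```

Only files and directories listed under `output:` are published; anything else the task writes stays in the work directory.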

Point 2: the most trivial solution is to run that script just after the `fastqc` command in the same process. It's perfectly fine to do so. NF processes are meant to exploit parallelisation, but in this case a separate process won't bring any particular benefit, since that script depends on the result of fastqc.
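A sketch of that first option, again assuming DSL1 syntax; `your-script` is the same placeholder used in the example below, standing in for whatever post-processing command you have. Note that this runs the script once per read pair, on that pair's directory, not once over all of the results:

```nextflow
params.reads = "fastq_files/*R{1,2}_001.fastq.gz"

Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process run_fastqc {
    input:
    set pair_id, file(reads) from read_pairs

    output:
    file "$pair_id" into fastqc_results

    // FastQC and the post-processing step run in the same task, one after
    // the other; `your-script` is a placeholder command
    """
    mkdir ${pair_id}
    fastqc -q ${reads} -o ${pair_id}
    your-script ${pair_id}
    """
}
```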

If for any reason you want to execute it as a separate process, you will need to use the fastqc output as the input of this new process. For example:


process run_fastqc {
    input:
    set pair_id, file(reads) from read_pairs
    
    output:
    file "$pair_id" into fastqc_results
    
    """
    mkdir ${pair_id}
    fastqc -q ${reads} -o ${pair_id}
    """
}

process post_fastqc {
  input: 
  file dir from fastqc_results
  
  """
  your-script $dir
  """ 
}


Note that in this case the `run_fastqc` output is the directory, not the individual files.
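As written, `post_fastqc` runs once for each directory. If the script should instead run a single time, after every FastQC task has finished (as in question 2), one option is to gather all the directories with the `collect()` operator, which emits one list when the upstream channel completes. A sketch in DSL1 syntax, with `your-script` as a placeholder that is assumed to accept several directories as arguments:

```nextflow
params.reads = "fastq_files/*R{1,2}_001.fastq.gz"

Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process run_fastqc {
    input:
    set pair_id, file(reads) from read_pairs

    output:
    file "$pair_id" into fastqc_results

    """
    mkdir ${pair_id}
    fastqc -q ${reads} -o ${pair_id}
    """
}

process post_fastqc {
    input:
    // collect() waits for all run_fastqc tasks and emits a single list,
    // so this process runs exactly once
    file dirs from fastqc_results.collect()

    // ${dirs} expands to the staged directory names separated by spaces
    """
    your-script ${dirs}
    """
}
```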


Hope it helps 


Cheers,
Paolo


Karin Lagesen

Feb 25, 2017, 10:44:50 AM
to Nextflow
Hi!

After talking to Paolo on the chat and reading the docs, this is where I am now:

params.reads = "$baseDir/fastq_files/*R{1,2}_001.fastq.gz"
params.outputdir = "$baseDir/fastqc_results"

Channel
    .fromFilePairs( params.reads )                                             
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }  
    .set { read_pairs }

process run_fastqc {
    input:
    set pair_id, file(reads) from read_pairs
    
    output:
    file "$pair_id" into fastqc_results
    
    """
    mkdir ${pair_id}
    fastqc -q ${reads} -o ${pair_id}
    """
}

process run_fastqc_qc {
    publishDir "$baseDir/outputresults"


    input:
    file "testout/*" from fastqc_results.toSortedList()
    
    output:
    file "woow"
    
    """
    python3 $HOME/PycharmProjects/Bifrost/bifrost_py/fastqc_eval.py -d testout -o woow
    
    """
    
}

Everything seems to run, but I can't get hold of the output. I am pretty certain I am making an amateur error (which jibes with me being an amateur), but I can't figure out what...

Karin (who is very impressed with Nextflow and likes its efficiency!)


Paolo Di Tommaso

Feb 25, 2017, 11:13:31 AM
to nextflow
Does it create the `outputresults` folder with the expected output?


p


Karin Lagesen

Feb 25, 2017, 11:33:51 AM
to Nextflow
No. No output folder is created (and I've looked in the work directory too).

K

Paolo Di Tommaso

Feb 25, 2017, 11:44:49 AM
to nextflow
Try replacing `publishDir "$baseDir/outputresults"` with `publishDir "outputresults"`. You should then find that folder in the launch directory.

p

Karin Lagesen

Feb 25, 2017, 11:51:58 AM
to Nextflow
No dice.... 

Question: since this is turning out to be problematic, I suspect I am not doing this "the Nextflow way". If so, may I ask what it _should_ look like?

Karin