The output files are saved only to the work/ dir even though publishDir is set

guaira.mel...@gmail.com

Jun 26, 2018, 5:57:21 PM

to Nextflow

Hello there!

I am pretty much a novice in Nextflow. I am trying to create a simple pipeline that runs fastp (quite similar to FastQC, which is widely used in the bioinformatics community). Here is what I have done:

#!/usr/bin/env nextflow

params.fastq_files = '/home/tain/Documents/RNA_seq/Data/*.fastq'
params.outdir = '/home/tain/Documents/RNA_seq/QC/'

Channel
  .fromPath( params.fastq_files )
  .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
  .into { files_QC_ch }

process fastp {
  publishDir params.outdir, mode: 'copy', pattern: '*.html'

  input:
  file(fastq_file) from files_QC_ch

  output:
  set file('fastp_*'), file('*.html') into fastp_results_ch

  """
  fastp -w 17 -i ${fastq_file} -o fastp_${fastq_file} -h ${fastq_file}_fastp.html
  """
}

However, the .html and fastp_ files (the output) are not saved to the QC folder; instead they are saved to the work/process/ folder. What am I doing wrong? I have already looked at other pipelines (https://github.com/SciLifeLab/NGI-RNAseq/blob/master/main.nf) but I still cannot figure out why this happens.

I would deeply appreciate any help.

Tain

Paolo Di Tommaso

Jun 27, 2018, 10:08:22 AM

to nextflow

Those files are not copied to the QC folder because in publishDir you have specified `pattern: '*.html'`. Therefore only files with that suffix are *copied* there.

p

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.
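A minimal sketch of the publishDir behaviour described in the reply above, written in the DSL1 syntax used throughout this thread. It assumes fastp's `-j` option to give the JSON report an explicit name; the paths are copied from the original post. Two points are worth keeping in mind: publishDir only considers files declared in the `output:` block, and `pattern:` further restricts those to matching names. Publishing never removes the originals from the task's work directory.

```groovy
#!/usr/bin/env nextflow

// Paths copied from the original post; adjust for your system.
params.fastq_files = '/home/tain/Documents/RNA_seq/Data/*.fastq'
params.outdir = '/home/tain/Documents/RNA_seq/QC/'

Channel
  .fromPath( params.fastq_files )
  .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
  .set { files_QC_ch }

process fastp {
  // publishDir only looks at files declared in the output: block below,
  // and pattern: keeps only the ones whose names match '*.html'.
  // mode 'copy' copies them; without a mode, Nextflow creates symlinks.
  // Either way, the original files also stay in this task's work/ dir.
  publishDir params.outdir, mode: 'copy', pattern: '*.html'

  input:
  file(fastq_file) from files_QC_ch

  output:
  file('fastp_*') into trimmed_ch   // trimmed reads: declared, but no match, not published
  file('*.html') into html_ch       // HTML report: declared and matches, published
  file('*.json') into json_ch       // JSON report: declared, but no match, not published

  """
  fastp -w 15 -i ${fastq_file} -o fastp_${fastq_file} -h ${fastq_file}_fastp.html -j ${fastq_file}_fastp.json
  """
}
```

With this script, params.outdir should end up containing only the `*_fastp.html` reports, while all three files remain under work/.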

guaira.mel...@gmail.com

Jul 5, 2018, 12:56:08 PM

to Nextflow

Hello.

Maybe I did not explain myself clearly enough. The fastp command generates three output files (one .html, one .json and one .fastq). What I want is to save only the .html files to a specific folder (so I specified the pattern "*.html" in the publishDir directive); however, the files are not copied there, and instead all three files are saved in the work/process/ folder.

Tain.
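If the goal is to publish only the HTML reports, an alternative to `pattern:` is a `saveAs` closure. It is called with the name of each declared output file and can return `null` to skip publishing that file. This is a sketch under the same assumptions as the original script; the `fastp/` subfolder is a hypothetical choice used for illustration.

```groovy
#!/usr/bin/env nextflow

params.fastq_files = '/home/tain/Documents/RNA_seq/Data/*.fastq'
params.outdir = '/home/tain/Documents/RNA_seq/QC/'

Channel
  .fromPath( params.fastq_files )
  .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
  .set { files_QC_ch }

process fastp {
  // saveAs receives the name of each declared output file.
  // Returning a relative name publishes it under params.outdir;
  // returning null skips it, so only the HTML reports are copied.
  // "fastp/" is a hypothetical subfolder used for illustration.
  publishDir params.outdir, mode: 'copy',
    saveAs: { filename -> filename.endsWith('.html') ? "fastp/${filename}" : null }

  input:
  file(fastq_file) from files_QC_ch

  output:
  set file('fastp_*'), file('*.html') into fastp_results_ch

  """
  fastp -w 15 -i ${fastq_file} -o fastp_${fastq_file} -h ${fastq_file}_fastp.html
  """
}
```

`pattern:` is simpler when the only goal is filtering; `saveAs` is useful when different outputs should also be routed to different subfolders.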

guaira.mel...@gmail.com

Jul 5, 2018, 1:10:57 PM

to Nextflow

It is not even working with the example from the documentation:

#!/usr/bin/env nextflow

process foo {

    publishDir '/home/tain/Documents/RNA_seq/QC/'

    output:
    file 'chunk_*' into letters

    '''
    printf 'Hola' | split -b 1 - chunk_
    '''
}

However, there is nothing in the path:

'/home/tain/Documents/RNA_seq/QC/'

And the output is in the work dir:

work/14/8be1c980a52d25a0564de0239eda95$ ls
chunk_aa  chunk_ab  chunk_ac  chunk_ad

Why is this happening?

Paolo Di Tommaso

Jul 6, 2018, 3:20:41 AM

to nextflow

Please include the `.nextflow.log` file produced by the execution.

p

guaira.mel...@gmail.com

Jul 9, 2018, 11:08:21 AM

to Nextflow

Attached you will find the .nextflow.log file resulting from running:

#!/usr/bin/env nextflow

params.fastq_files = '/home/tain/Documents/RNA_seq/Data/*.fastq'
params.outdir = '/home/tain/Documents/RNA_seq/QC/'

Channel
  .fromPath( params.fastq_files )
  .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
  .into { files_QC_ch }

process fastp {
  publishDir pattern: "*.html",
             path: { params.outdir + "fastp/" }, mode: 'copy'

  input:
  file(fastq_file) from files_QC_ch

  output:
  set file('fastp_*'), file('*.html') into fastp_results_ch

  """
  fastp -w 15 -i ${fastq_file} -o fastp_${fastq_file} -h ${fastq_file}_fastp.html
  """
}

nextflow run fastq_test.nf

nextflow.log

Paolo Di Tommaso

Jul 9, 2018, 11:14:12 AM

to nextflow

You are using a very old NF version (0.14.3); please update to the latest one (0.30.2).

If the problem persists, please send the log created by running Nextflow with this command line:

nextflow -trace nextflow run fastq_test.nf

p

guaira.mel...@gmail.com

Jul 9, 2018, 1:01:01 PM

to Nextflow

After updating Nextflow, everything works smoothly. Next time I will check that first.

Thank you very much.

Sam Tischfield

Jul 9, 2018, 1:09:53 PM

to Nextflow

I have a question related to this. I have set publishDir "s3://mybucket/output/", mode: 'copy', overwrite: true at the start of my process.

The process script correctly produces the output file and publishes it to the S3 bucket; however, it also keeps it in the S3 work directory that was created (e.g. s3://mybucket/output/q0/).

The output is a fairly large file and I'd like to have it only in the publishDir output. I tried setting mode to "move" but this just threw an error.

Paolo Di Tommaso

Jul 10, 2018, 3:27:21 AM

to nextflow

Please open an issue reporting the complete error stack trace in the GitHub project.

However, I think the simplest way to manage this is to keep scratch data in a separate bucket that is cleaned up automatically by an S3 lifecycle rule.
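A minimal sketch of that setup on the Nextflow side, assuming a dedicated scratch bucket; `my-scratch-bucket` is a hypothetical name. The lifecycle rule itself (for example, an expiration rule on the scratch bucket) is configured in AWS, not in Nextflow. Expiring scratch data also deletes what `-resume` relies on, so the expiration period should be longer than the window in which you might want to resume a run.

```groovy
// nextflow.config (sketch; bucket names are placeholders)

// Per-task work directories go to a dedicated scratch bucket, so the
// large intermediate copies do not accumulate next to the results.
// An S3 lifecycle rule on this bucket can then expire them automatically.
workDir = 's3://my-scratch-bucket/work'

// Final results are still published by the process itself, e.g.:
//   publishDir 's3://mybucket/output/', mode: 'copy', overwrite: true
```

The same work directory can also be set for a single run with the `-w` (`-work-dir`) option of `nextflow run`.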