Mapping process not working

Carlos Guzman

May 19, 2016, 5:07:14 PM

to Nextflow

I am building a simple ChIP-seq pipeline based on the chip-nf pipeline by the Guigo lab.

My mapping process doesn't seem to be working. I'm not getting any errors; it just doesn't progress.

My first step works fine, but the second step won't continue and I can't figure out what's wrong. I could use some help; it's probably something ridiculously simple.

Here's my code:


// Default parameters
params.mismatches = 1
params.genomeSize = "hs"
params.config = "config.txt"
params.threads = 2

// Print Help

if (params.help) {
    log.info ''
    log.info 'C I P H E R - N F ~ version 1.0.0'
    log.info '================================='
    log.info 'ChIP-Seq Pipeline'
    log.info ''
    log.info 'Usage: '
    log.info '     cipher.nf --config --index [OPTIONS]...'
    log.info ''
    log.info 'Options: GENERAL'
    log.info '     --help          Show this message and exit.'
    log.info '     --config        A tab separated file containing read information. (Default: config.txt)'
    log.info '     --index         A Bowtie2 index directory.'
    log.info '     --threads       Allow N_THREADS per sample. (Default: 2)'
    log.info 'Options: BOWTIE2'
    log.info '     --mismatches    Allow max N_MISMATCHES for a read. (Default: 1)'
    log.info ''
    exit 1
}

config_file = file(params.config)

// Check required parameters

if (!params.config) {
    exit 1, "Please specify a config file"
}

if (!params.index) {
    exit 1, "Please specify a bowtie2 index directory"
}

// Create fastq channel

fastqs = Channel
    .from(config_file.readLines())
    .map { line ->
        // 'def' keeps these variables local to the closure
        def list = line.split()
        def id = list[0]
        def path = file(list[1])
        [ id, path ]
    }

/*
 * Step 1. Trimming fastq files via trim_galore
 */

process trimming {

    input:
    set id, file(fastq_file) from fastqs

    output:
    set id, file("${id}_trimmed.fq.gz") into trimmedfastqs

    script:
    """
    trim_galore --fastqc ${fastq_file}
    """
}

/*
 * Step 2. Map trimmed fastq files to bowtie2 index
 */

process mapping {

    input:
    set id, file(trimmed_fastq_file) from trimmedfastqs

    output:
    set id, file("${id}.sam") into sams

    script:
    """
    bowtie2 -q -N ${params.mismatches} --quiet -p ${params.threads} -x ${params.index} -U ${trimmed_fastq_file} -S > ${id}.sam
    """
}

The config file is just a two-column, tab-delimited file where the first column is the id (the file prefix) and the second column is the path to the file. The index is a Bowtie2 index, passed by its path prefix.
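
For illustration, a file in that layout could be written and read back roughly as sketched below. The sample IDs and FASTQ paths are hypothetical placeholders, not taken from the actual run, and the loop only mimics how the `fastqs` channel splits each line into an id and a path.

```shell
# Sketch: write a two-column, tab-delimited config file and parse it.
# The sample IDs and FASTQ paths are hypothetical placeholders.
config_file=$(mktemp)
tab=$(printf '\t')

# Column 1: sample id (file prefix); column 2: path to the FASTQ file.
printf 'sampleA\t/path/to/sampleA.fastq.gz\n'  > "$config_file"
printf 'sampleB\t/path/to/sampleB.fastq.gz\n' >> "$config_file"

# Read the file line by line, splitting each line on the tab character.
parsed=$(
    while IFS="$tab" read -r id path; do
        printf 'id=%s path=%s\n' "$id" "$path"
    done < "$config_file"
)

printf '%s\n' "$parsed"
rm -f "$config_file"
```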

I ran the script using the following:

nextflow run '/dataA/code/cipher-nf/cipher.nf' --index '/dataA/data/hg19_ref/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome'

The first step works fine: it trims the files and outputs the appropriate trimmed, gzipped files and FastQC graphs. The second step appears to start (I see "submitted process > mapping", etc.), but nothing actually happens. I can tell because no CPUs are being used, and I even let it run for a few hours with no progress.

Any ideas?

Maria Chatzou

May 19, 2016, 5:32:06 PM

to next...@googlegroups.com

Hi Carlos,

Just from a quick look at your code, it might be that the "trimmedfastqs" channel is empty. From what I see in process 1, it's not clear how you fill the channel.

Cheers,
Maria

Maria Chatzou

May 19, 2016, 5:46:09 PM

to next...@googlegroups.com

If I understand correctly, your "trim_galore" process produces many *_trimmed.fq.gz files, which you want to put in the "trimmedfastqs" channel.

So try replacing the ${id} in

file("${id}_trimmed.fq.gz")

with a "*":

file("*_trimmed.fq.gz")

to indicate that it should collect all the matching files.

Paolo Di Tommaso

May 19, 2016, 8:28:18 PM

to nextflow

Could you please include the ".nextflow.log" file for the run in which you are experiencing this problem?

Thanks,
Paolo

Carlos Guzman

May 20, 2016, 9:17:08 AM

to Nextflow

I managed to fix this. All I had to do was remove the `>` from the command in the second process, so that the output file name directly follows the `-S` parameter. How embarrassing.
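
To make the fix concrete: the shell strips `> file` from the command line before the program starts, so with `-S > ${id}.sam` the `-S` option reaches the program without a file name. The sketch below shows this with a stand-in function, `show_args` (a hypothetical helper, not part of bowtie2), and hypothetical file names; it does not run bowtie2 itself.

```shell
# Sketch: how the shell handles "-S > file" versus "-S file".
# show_args is a hypothetical stand-in that prints each argument it receives.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

workdir=$(mktemp -d)

# Broken form: the shell consumes "> file" as a redirection, so the command
# only sees a bare -S as its last argument; its stdout lands in the file.
show_args -U reads_trimmed.fq.gz -S > "$workdir/broken.txt"
broken_last=$(tail -n 1 "$workdir/broken.txt")

# Fixed form: the file name is passed as the value of -S.
fixed_last=$(show_args -U reads_trimmed.fq.gz -S sample.sam | tail -n 1)

printf 'broken: last argument is %s\n' "$broken_last"
printf 'fixed:  last argument is %s\n' "$fixed_last"
rm -rf "$workdir"
```

In the mapping process this corresponds to ending the bowtie2 command with `-S ${id}.sam` instead of `-S > ${id}.sam`.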

Paolo Di Tommaso

May 20, 2016, 9:33:55 AM

to nextflow

Good. Anyway, a useful debugging practice is to change into the process working directory created by Nextflow and inspect the files created by your command, or to re-execute it by running `bash .command.run`.
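
A minimal sketch of that practice follows. The task directory is whichever one Nextflow reports for the stalled task (under `work/` by default). `inspect_task_dir` is a hypothetical helper name, and the hidden files it looks for (`.command.sh`, `.command.out`, `.command.err`, `.exitcode`) are the ones Nextflow normally writes into a task directory, though which of them exist depends on how far the task got.

```shell
# Sketch: inspect a Nextflow task working directory.
# inspect_task_dir is a hypothetical helper; pass it the task directory.
inspect_task_dir() {
    task_dir=$1
    if [ ! -d "$task_dir" ]; then
        printf 'not a directory: %s\n' "$task_dir" >&2
        return 1
    fi

    # List everything, including the hidden .command.* files.
    ls -la "$task_dir"

    # Print each file that exists: the script, its output, and exit status.
    for f in .command.sh .command.out .command.err .exitcode; do
        if [ -f "$task_dir/$f" ]; then
            printf '==> %s <==\n' "$f"
            cat "$task_dir/$f"
        fi
    done
}

# Example (the path is a placeholder for the directory of the stalled task):
#   inspect_task_dir work/<task-hash-dir>
#   cd work/<task-hash-dir> && bash .command.run
```

Reading `.command.sh` shows the exact command line after variable substitution, which is often enough to spot a problem such as a stray redirection.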