BWA Index -> BWA Mem

scala.s...@gmail.com

Sep 23, 2016, 3:10:42 PM
to Nextflow
I'm having trouble getting this to work properly. I figured it would be pretty simple to solve, but two hours of scouring Google and reading through the docs haven't helped.

I'm trying to create a BWA index file for a reference fasta file and then use that index to map my fastq files using BWA mem.

Here's what I have so far (it hasn't worked):

process create_index {

    input:
    file(fasta_file)

    output:
    file 'genome.fa*' into bwa_index

    script:
    """
    bwa index -p genome ${fasta_file}
    """

}

// STEP 3 - MAPPING
process bwa_mem {

    input:
    set mergeId, id, file(fastq), controlid, mark, quality from trimmed_fastqs
    file genome_index from bwa_index.first()
    file fasta_file

    output:
    set mergeId, id, file("${id}.sorted.rmdups.bam"), controlid, mark, quality into bams1, bams2
    file("${id}_bwa_mem_summary.txt")
    file("${id}_picard_metrics.txt")

    script:
    """
    bwa mem -t -M ${params.threads} ${genome_index} ${fastq} 2> ${id}_bwa_mem_summary.txt | samtools view -b -F 1804 - | samtools sort -T ${id} -o ${id}.sorted.bam
    picard MarkDuplicates I=${id}.sorted.bam O=${id}.sorted.rmdups.bam M=${id}_picard_metrics.txt MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true
    """

}

The first process generates multiple files, such as genome.ann, genome.pac, genome.amb and so on.

The error I'm getting is that BWA can't find my genome index files: [E::bwa_idx_load_from_disk] fail to locate the index files

Any idea how to solve this?

Paolo Di Tommaso

Sep 23, 2016, 4:21:25 PM
to nextflow
Take into consideration that the `${genome_index}` variable expands to the list of files matched by the `'genome.fa*'` pattern. I'm not familiar with the bwa mem command line, but I guess that is not what it expects.

Since you know the name of the index file(s), I would suggest replacing:

   file genome_index from bwa_index.first()

with: 

   file '*' from bwa_index.first()

Then reference the index file name statically on the bwa command line, that is, without using the variable. Also try listing the contents of the `bwa_mem` process working directory to verify that all the expected files are available.
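
A rough sketch of that directory check, assuming the index was built with `bwa index -p genome` as in the question, so the prefix is `genome`. The scratch directory and the empty `touch` stand-ins are hypothetical and only make the example runnable without bwa. The five extensions are the ones bwa index normally writes (the question mentions .amb, .ann and .pac among them). In a real run you would skip the setup lines and run the two loops inside the task's working directory.

# Hypothetical setup: a scratch directory standing in for the task work dir
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Empty stand-ins for the files that `bwa index -p genome` writes
touch genome.amb genome.ann genome.bwt genome.pac genome.sa

# Count how many files the 'genome.fa*' output pattern would match
fa_matches=0
for f in genome.fa*; do
    # An unmatched glob stays literal, so test that the file really exists
    if [ -e "$f" ]; then
        fa_matches=$((fa_matches + 1))
    fi
done
echo "files matching genome.fa*: ${fa_matches}"

# Check that every index file expected for the static prefix is present
prefix=genome
missing=0
for ext in amb ann bwt pac sa; do
    if [ ! -e "${prefix}.${ext}" ]; then
        echo "missing: ${prefix}.${ext}"
        missing=$((missing + 1))
    fi
done
echo "missing index files: ${missing}"

If the second loop reports nothing missing, passing the bare prefix `genome` to bwa mem should let it locate the index. The first loop shows why a `genome.fa*` pattern is worth double-checking when the index was written under the prefix `genome` rather than `genome.fa`.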


Hope it helps. 


Cheers,
Paolo



