BWA and ABySS with Illumina Paired End Data

ozborn

unread,

Aug 3, 2011, 4:16:04 PM8/3/11

to ABySS

Hello,
I'm trying to generate a high quality consensus file for a dozen or so
small bacterial genomes using Ilumina paired end technology with 100bp
reads. I have done alignments using BWA against two reference genomes,
but I have slight discrepancies (~60 SNPs, some indels too) depending
on which of the two reference sequences I align against. Sometimes one
reference is better, in other regions of the genome the other
reference is better. I would like to resolve such problems as close to
automatically as possible so I have been trying to use ABySS. I want
to ensure I don't miss any novel sequence not present in the 2
reference genomes.

I first generated my contigs.fa file for my genome:
abyss-pe k=25 n=20 in='/home/ozborn/consults/k/fastq/
142_8_s_3_1.fastq /home/ozborn/consults/k/fastq/142_8_s_3_2.fastq'
name=142dot8_abyss

This gave me ~2200 contigs which was a little high I thought since the
genome has ~ 500-fold coverage but I proceeded anyway and ran into
trouble when subsequently trying to use abyss-bwa. My two questions
are:

1) When running abyss-bwa like this:
abyss-bwa /home/ozborn/consults/k/abyss/bwa/142dot8_abyss-contigs.fa /
home/ozborn/galaxy/database/galaxy_ng_indices/bugs/mp/bwa/m_fh

I get an error:
[bwa_aln_core] calculate SA coordinate... [bns_restore_core] fail to
open file '/home/ozborn/galaxy/database/galaxy_ng_indices/bugs/mp/bwa/
m_fh.nt.ann'. Abort!

The current version of bwa (0.5.9-r16) doesn't generate this index
file, is ABySS trying to call its own bwa hidden somewhere? Is this a
mismatch between abyss-bwa in 1.2.7 and the current version of bwa? I
have m_fh.ann but not m_fh.nt.ann

2) Is it possible to have take in one or more BAM files from my BWA
alignments and use that as input to ABySS? Is this even a good idea? I
would like to be able to identify any novel sequence not present in my
reference genome. I see the aligner=bwa option for abyss-pe but it is
not clear to me how it works in conjunction with de-novo assembly.

Any advice appreciated,

-John

ozborn

unread,

Aug 3, 2011, 4:58:17 PM8/3/11

to ABySS

I just realized it is looking for colorspace index, and I can make it
happy by pre-building a colorspace index. However I don't have
colorspace reads and have no idea why it is looking for them...

Also, I'm not sure what to do with the output. It appears to output a
SAM file to STDOUT, but nothing looks like it is mapped. Ex)

0 4 * 0 0 * * 0
0 CGCACGACATCCAGCCCCCCATTGA *

-John

Shaun Jackman

unread,

Aug 3, 2011, 7:16:14 PM8/3/11

to John Ozborn, ABySS

Hi John,

At one point during the assembly, the reads are aligned back to an
intermediate assembly before proceeding with the paired-end assembly.
The default aligner is KAligner, which is included with ABySS. The
abyss-bwa script is a small wrapper script to facilitate using BWA to
align the reads to the contigs.

To align contigs to a reference you do not want to use the abyss-bwa
script. Instead use `bwa bwasw'.

k=25 is quite small. What is your read length? Have you tried other
values of k? Higher coverage data sets require larger values of k.

Cheers,
Shaun

Rod Docking

unread,

Aug 3, 2011, 7:40:09 PM8/3/11

to Shaun Jackman, John Ozborn, ABySS

Hi John:

To answer your second question, you can use BAM (or SAM) files as input to ABySS. A use-case for this might be something like "try to assemble all the reads that didn't align to my genome reference".

In general though, we usually prefer to work from the unmodified qseq or fastq files.

Cheers,
Rod

ozborn

unread,

Aug 8, 2011, 12:37:46 PM8/8/11

to ABySS

Hi Shaun and Rod, thanks for your replies, they are quite helpful.

Shaun, it sounds like abyss-bwa is just switching from KAligner to BWA
after the intermediate assembly has been generated. You're right, this
doesn't have the utility I am looking for - I already have the bwa
generated alignments to my references.

One question - I have read size of 100 and coverage of around 500,
what should I be shooting for in terms of k to maximize my contig
size?

Rod, thanks for answering my last question. It sounds like I would
only like to use the BAM input option if I suspect I have a
significant number of reads not aligning to my reference genome.

-John

Shaun Jackman

unread,

Aug 9, 2011, 4:47:21 PM8/9/11

to ozborn, ABySS

Hi John,

With 500x coverage the optimal k will be pretty close to your read
length. I'd try assembling every 8th k from 64 to 96 and compare the N50
of those assemblies. If you find a peek in the N50, you can try a few
more assemblies around that value of k.

Cheers,
Shaun

Tony Travis

unread,

Aug 11, 2011, 5:54:07 AM8/11/11

to abyss...@googlegroups.com

On 09/08/11 21:47, Shaun Jackman wrote:
> Hi John,
>

> With 500x coverage the optimal k will be pretty close to your read
> length. I'd try assembling every 8th k from 64 to 96 and compare the N50
> of those assemblies. If you find a peek in the N50, you can try a few
> more assemblies around that value of k.

Hi, Shaun.

I've been labouring under the misapprehension that we should always use
odd kmer values for ABySS to avoid palindromes - Is that wrong?

I tried a monotonically increasing sequence of kmer values with ABySS at
first, but the even ones clearly belong to a different set than the odd
ones. After some reading around, I discovered the problem of hash keys
with palindromes using even kmer values...

Bye,

Tony.

Shaun Jackman

unread,

Aug 12, 2011, 6:53:40 PM8/12/11

to Tony Travis, abyss...@googlegroups.com

Hi Tony,

Even values of k should not be a problem for ABySS. I'd be happy to see
a counterexample if you have one.

Cheers,
Shaun

Reply all

Reply to author

Forward