KAligner gives better results..

Matthew MacManes

Jul 20, 2011, 3:00:05 PM
to abyss...@googlegroups.com
A couple of days ago, I asked whether anyone had checked if KAligner or Bowtie works better for assembly. For me anyway, using ~300Gb of reads and a 1Gb genome, KAligner is MUCH better. This is using k=50. With Bowtie, the final PE assembly (*.contigs.fa) had essentially the same N50 and mean as the *-3.fa.

Using Bowtie:
n >250bp 754458
n50 1720
n:n50 111265
median 527
mean 1018
max 72336
total assembly: 770Mb

Using KAligner:
n > 250bp 580232
n50 3394
n:n50 76403
median 867
mean 1737
max 75777
total assembly: 1Gb

It's just too bad that KAligner is such a memory (and time) hog. For me (and, I bet, a lot of other people), this is a major issue, and one I'd love to see improved in future releases.
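
For readers less familiar with the columns in the two tables above, here is a minimal Python sketch of how such figures can be derived from a list of contig lengths. It is not the tool that produced the numbers in this thread. The function name `assembly_stats`, the strict "longer than 250bp" cut-off, and the N50 definition used (the length of the contig at which the running total of the longest-first lengths reaches half the total) are assumptions made for illustration.

```python
from statistics import median


def assembly_stats(lengths, min_len=250):
    """Summarise contig lengths (hypothetical helper, assumed definitions)."""
    # Keep only contigs longer than the cut-off, longest first.
    kept = sorted((n for n in lengths if n > min_len), reverse=True)
    if not kept:
        raise ValueError("no contigs longer than min_len")
    total = sum(kept)
    running = 0
    # Walk down the sorted lengths until half of the total is covered.
    for count, length in enumerate(kept, start=1):
        running += length
        if 2 * running >= total:
            break
    return {
        "n": len(kept),       # number of contigs kept
        "n50": length,        # contig length at the halfway point
        "n:n50": count,       # number of contigs needed to reach it
        "median": median(kept),
        "mean": total / len(kept),
        "max": kept[0],
        "total": total,
    }
```

Under these definitions, a higher n50 reached with fewer contigs (a lower n:n50) means the same amount of sequence sits in longer pieces, which is the pattern in the KAligner column.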

Shaun Jackman

Jul 20, 2011, 5:21:17 PM
to abyss...@googlegroups.com
Hi Matthew,

My experience has been that using KAligner produces better assemblies,
with BWA as a close second. I didn't see as drastic a difference as
you're showing here, which is interesting. Can you try using BWA to add a
third data point to this comparison? How long are the reads?

Cheers,
Shaun

Alejandro Sanchez

Jul 20, 2011, 6:30:45 PM
to Shaun Jackman, abyss...@googlegroups.com
I would also recommend using SMALT. In our experience it works better
than BWA or Bowtie. You can get it from the Sanger web page. Although,
I have to say that I always use KAligner!

Cheers.

Tony Raymond

Jul 20, 2011, 7:35:22 PM
to abyss...@googlegroups.com
Hi Matthew,

I've actually been focusing on improving the performance of KAligner, and these improvements should be in the next release of ABySS.

First, KAligner now creates the number of worker threads specified, instead of one worker for each file. This greatly improves performance when there are only one or two read files. Second, the output is now deterministic. The goal here is to make it relatively trivial to split the contigs we're aligning across many machines and then merge the results, so that we can align anything ABYSS-P is able to assemble.

Tony
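
The two changes described above (a fixed-size worker pool, and output that does not depend on thread scheduling) can be illustrated with a small Python sketch. This is not KAligner's code. `align_chunk` is a hypothetical placeholder that does no real alignment, and splitting the input into chunks of reads is an assumption made to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def align_chunk(chunk):
    """Hypothetical stand-in for aligning one chunk of reads."""
    return [f"{read}\taligned" for read in chunk]


def align_all(chunks, threads):
    """Process chunks with a fixed number of workers, in input order."""
    # The pool size comes from the caller, not from the number of inputs.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        results = pool.map(align_chunk, chunks)
        return [line for chunk_result in results for line in chunk_result]
```

Because the output order depends only on the input order, two halves processed separately (for example on two machines) can simply be concatenated and will match a single run over the whole input.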

Steve

Jul 21, 2011, 2:54:15 AM
to ABySS
Out of interest, did you do any quality trimming on the reads before
the assembly with Bowtie? From what I can gather, KAligner breaks
the reads into k-mers before mapping and will therefore perform much
better than Bowtie on reads where the quality falls off towards the
end. However, if the reads are quality filtered, would there still be
the same difference? Also, would the selection of the n parameter have
to be different depending on the aligner used? When KAligner breaks
up the read into k-mers, is each individual k-mer counted as an
individual link when deciding to join contigs together?
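
To make the k-mer argument above concrete, here is a toy Python sketch. It is not KAligner's algorithm, only an illustration of the idea that a read with an error near its 3' end still shares most of its k-mers with the contig, whereas an exact whole-read match fails. The sequences, the value k=4, and the helper names are all made up for the example.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(contigs, k):
    """Map each k-mer to the (contig name, position) pairs where it occurs."""
    index = {}
    for name, seq in contigs.items():
        for pos, kmer in enumerate(kmers(seq, k)):
            index.setdefault(kmer, []).append((name, pos))
    return index


def matching_kmers(read, index, k):
    """Count how many k-mers of the read occur somewhere in the index."""
    return sum(1 for kmer in kmers(read, k) if kmer in index)


contigs = {"c1": "ACGTTGCAAGGCTTAC"}  # toy contig
index = build_index(contigs, k=4)
clean_read = "GTTGCAAGGC"             # exact substring of c1
bad_read = "GTTGCAAGGA"               # same read with a wrong final base
```

Here `clean_read` has seven 4-mers and all of them hit the index. `bad_read` loses only the last one, so six of seven still anchor it to the contig, even though the read as a whole no longer matches exactly.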

Matthew MacManes

Jul 21, 2011, 10:32:59 AM
to ABySS
Hi Steve, 

Yes, I used the preprocess step in Jared Simpson's SGA, with a quality threshold of 1% error. Even after trimming, however, there is still this issue of decreasing quality; it's just that I've essentially narrowed the dynamic range.

I thought about modifying the abyss-bowtie script to ignore the last few bases to see if that improves things, but I doubt that this modification will improve things up to where KAligner is. I bet that a BOWTIE_OPTIONS="-m40 …-3 10 -seedlength 20 ..." option would be easy to incorporate in future releases of ABySS.

Matt 
____________________________________________
Matthew MacManes, Ph.D.
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/

Shaun Jackman

Jul 21, 2011, 3:33:25 PM
to Steve, ABySS
Hi Steve,

> Also, would the selection of the n parameter have
> to be different depending on the aligner used - when kaligner breaks
> up the read into kmers, is each individual kmer counted as an
> individual link when deciding to join contigs together?

Each read is counted as one link, not each k-mer.

Cheers,
Shaun
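
A small Python sketch of the counting rule stated above: however many k-mer hits a read produces, the pair it belongs to contributes at most one link between two contigs. This is an illustration only, not ABySS code. The `(pair_id, mate, contig)` hit format is hypothetical, the sketch assumes all k-mers of one mate hit the same contig, and it treats n as a minimum number of links, as the question implies.

```python
from collections import Counter, defaultdict


def count_links(kmer_hits):
    """Count links between contigs, one per read pair (hypothetical format).

    kmer_hits: iterable of (pair_id, mate, contig), one entry per k-mer hit,
    where mate is 1 or 2.
    """
    contig_by_mate = defaultdict(dict)  # pair_id -> {mate: contig}
    for pair_id, mate, contig in kmer_hits:
        # Repeated k-mer hits from the same read overwrite, they do not add.
        contig_by_mate[pair_id][mate] = contig
    links = Counter()
    for mates in contig_by_mate.values():
        # A pair links two contigs only if its mates land on different ones.
        if len(mates) == 2 and mates[1] != mates[2]:
            links[tuple(sorted((mates[1], mates[2])))] += 1
    return links


def supported_joins(links, n):
    """Contig pairs with at least n links (n assumed to be a minimum count)."""
    return {pair for pair, count in links.items() if count >= n}
```

With this rule, a read that yields many k-mer hits carries no more weight than one that yields a single hit, so the choice of n should not need to scale with the number of k-mers per read.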
