Illumina MP library for scaffolding only

Nathaniel Street

Dec 7, 2010, 3:02:03 AM
to ABySS
I have a data set that includes 3 Illumina PE libraries of
different insert sizes, all 2x100 bp reads. For these I've been
using a k-mer size of 48 because it gave me the best stats for
longest contig, N50 (based on genome size) and number of contigs
longer than 2 kb.
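
For reference, the three stats mentioned above can be computed from a list of contig lengths roughly as follows. This is a minimal Python sketch, not part of ABySS: the function names and the example lengths are made up for illustration, and "N50 based on genome size" is taken to mean the variant often called NG50 (threshold at half the estimated genome size rather than half the assembled length).

```python
def n50(lengths, genome_size=None):
    """Return the N50 of a list of contig lengths.

    If genome_size is given, the threshold is half the genome size
    (the variant often called NG50); otherwise it is half the total
    assembled length. Returns 0 if the contigs never reach the threshold.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    # Walk the contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0


def count_longer_than(lengths, min_length=2000):
    """Count contigs strictly longer than min_length (default 2 kb)."""
    return sum(1 for length in lengths if length > min_length)


# Hypothetical contig lengths, for illustration only.
contigs = [5000, 4000, 3000, 2000, 1000, 500]

print("longest contig:", max(contigs))
print("N50 (assembly size):", n50(contigs))
print("N50 (genome size 20 kb):", n50(contigs, genome_size=20000))
print("contigs > 2 kb:", count_longer_than(contigs))
```

Using the genome size as the denominator makes assemblies at different k comparable, because the threshold does not shift when the total assembled length changes.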

I now also have a 3 kb MP library, but here the reads are 36 bp after
all filtering steps. To use this library in the assembly I need to
drop the k-mer size below 36, so I've been using 28, but this 'harms'
the assembly of the PE data. So, what's the best way to use the
MP data only to inform scaffolding?
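
The constraint described above comes down to simple counting: a read of length L contains L - k + 1 k-mers, so a read shorter than k contributes nothing at that k. The following is a minimal Python sketch of that arithmetic only. The function name is hypothetical and nothing here is specific to ABySS.

```python
def kmers_per_read(read_length, k):
    """Number of k-mers in one read: L - k + 1, or 0 if the read is shorter than k."""
    return max(0, read_length - k + 1)


# Read lengths and k values taken from the post above.
libraries = (("PE, 100 bp reads", 100), ("MP, 36 bp reads", 36))

for k in (48, 28):
    for label, read_length in libraries:
        count = kmers_per_read(read_length, k)
        print(f"k={k}: {label}: {count} k-mers per read")
```

At k=48 the 36 bp MP reads yield no k-mers at all, while at k=28 they yield only a few each. The 100 bp PE reads still produce plenty of k-mers at the smaller k, but each one is shorter and so resolves fewer repeats, which is one plausible reason the PE assembly gets worse.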

I'm a little unsure how and where to stop the PE assembly and how to
add in the MP data. I guess I need the PE contigs before scaffolding,
map my reads to those using bwa/bowtie, get the BAM file, use fixmate
and then restart the assembly? A brief example of the steps and
abyss-pe commands would be really useful. I'd also be interested to
know if there are any BAM/SAM format issues I should watch out for.

Similarly, does anyone have advice about other scaffolding tools to
try? I've looked at Bambus but it doesn't seem well suited to a large
genome assembly where you have very many contigs and a large number of
Illumina MP reads.

Thanks

Andreas Sjödin

Dec 14, 2010, 9:07:19 AM
to ABySS
You may try SSPACE
(http://www.baseclear.com/sequencing/data-analysis/bioinformatics-tools/sspace/).
A software note is published as advance access in Bioinformatics. They
use their software to scaffold giant panda contigs and then compare
their results with SOAP scaffolds. Scaffolding of smaller genomes is
compared with results from ABySS.

/Andreas

Shaun Jackman

Dec 14, 2010, 2:55:33 PM
to Nathaniel Street, ABySS
Hi Nathaniel,

An iterative assembly of the sort that you're describing is not yet
supported by ABySS. It is on the wishlist. I'd be interested to hear
from anyone that is using scaffolding software with ABySS contigs.

Cheers,
Shaun

Andreas Sjödin

Dec 14, 2010, 3:54:27 PM
to ABySS
Hi Shaun and Nathaniel,
I have evaluated different scaffolding methods for bacterial genomes
where I can compare to a very close relative (finished genome). So not
exactly large genomes, but I think the same approach would work for
larger genomes too. The main reason I decided to go for an iterative
process is that I want to add external information (known genome
structure including duplications etc.) to my scaffolding runs, plus
ABySS had some problems with my mate-pair data. In the end I am using
ABySS contigs as the starting point together with other known
information, and I am then doing the scaffolding iteratively in SSPACE,
with contig extension, to more or less close the genome. I also get
similar results using BAMBUS with ABySS contigs as the starting point.
However, the combination of ABySS and SSPACE gives me the most control
of the process, and I don't need to spend time converting any files
(as with other options such as BAMBUS or Velvet).

I am looking forward to having a similar interactive function in
ABySS in the future.

regards,
Andreas

Zhang Di

Jan 11, 2011, 3:17:20 AM
to Andreas Sjödin, ABySS
Hi, Andreas
   How long does SSPACE take to finish a contig extension run for a typical large genome (such as 1 Gbp)? 1 day, 2 days, a week, or a month?
--
Zhang Di

Nathaniel Street

Jan 11, 2011, 12:41:56 PM
to Zhang Di, Andreas Sjödin, ABySS
Hi

I have now tried SSPACE on a genome of 450 Mb and it took about 2 days with contig extension and less than a day without. I had two fragment libraries (PE) and one jumping library (MP). I'm not sure how that will scale with genome size. I was quite happy with the results from the quick look I've had so far.

Hope that helps a little

Nathaniel

--
Nathaniel Street
Umeå Plant Science Centre
Department of Plant Physiology
Umeå University
SE901-87 Umeå
SWEDEN

email: nathanie...@plantphys.umu.se
skype: nathaniel_street
tel: 0046907865473
fax: 0046907866676

www.popgenie.org
